About these documents ********************* These documents are generated from reStructuredText sources by Sphinx, a document processor specifically written for the Python documentation. Development of the documentation and its toolchain is an entirely volunteer effort, just like Python itself. If you want to contribute, please take a look at the Dealing with Bugs page for information on how to do so. New volunteers are always welcome! Many thanks go to: * Fred L. Drake, Jr., the creator of the original Python documentation toolset and writer of much of the content; * the Docutils project for creating reStructuredText and the Docutils suite; * Fredrik Lundh for his Alternative Python Reference project from which Sphinx got many good ideas. Contributors to the Python Documentation ======================================== Many people have contributed to the Python language, the Python standard library, and the Python documentation. See Misc/ACKS in the Python source distribution for a partial list of contributors. It is only with the input and contributions of the Python community that Python has such wonderful documentation – Thank You! Dealing with Bugs ***************** Python is a mature programming language which has established a reputation for stability. In order to maintain this reputation, the developers would like to know of any deficiencies you find in Python. It can be sometimes faster to fix bugs yourself and contribute patches to Python as it streamlines the process and involves less people. Learn how to contribute. Documentation bugs ================== If you find a bug in this documentation or would like to propose an improvement, please submit a bug report on the tracker. If you have a suggestion on how to fix it, include that as well. If you’re short on time, you can also email documentation bug reports to docs@python.org (behavioral bugs can be sent to python- list@python.org). ‘docs@’ is a mailing list run by volunteers; your request will be noticed, though it may take a while to be processed. See also: Documentation bugs A list of documentation bugs that have been submitted to the Python issue tracker. Issue Tracking Overview of the process involved in reporting an improvement on the tracker. Helping with Documentation Comprehensive guide for individuals that are interested in contributing to Python documentation. Documentation Translations A list of GitHub pages for documentation translation and their primary contacts. Using the Python issue tracker ============================== Issue reports for Python itself should be submitted via the GitHub issues tracker (https://github.com/python/cpython/issues). The GitHub issues tracker offers a web form which allows pertinent information to be entered and submitted to the developers. The first step in filing a report is to determine whether the problem has already been reported. The advantage in doing so, aside from saving the developers’ time, is that you learn what has been done to fix it; it may be that the problem has already been fixed for the next release, or additional information is needed (in which case you are welcome to provide it if you can!). To do this, search the tracker using the search box at the top of the page. If the problem you’re reporting is not already in the list, log in to GitHub. If you don’t already have a GitHub account, create a new account using the “Sign up” link. It is not possible to submit a bug report anonymously. Being now logged in, you can submit an issue. Click on the “New issue” button in the top bar to report a new issue. The submission form has two fields, “Title” and “Comment”. For the “Title” field, enter a *very* short description of the problem; less than ten words is good. In the “Comment” field, describe the problem in detail, including what you expected to happen and what did happen. Be sure to include whether any extension modules were involved, and what hardware and software platform you were using (including version information as appropriate). Each issue report will be reviewed by a developer who will determine what needs to be done to correct the problem. You will receive an update each time an action is taken on the issue. See also: How to Report Bugs Effectively Article which goes into some detail about how to create a useful bug report. This describes what kind of information is useful and why it is useful. Bug Writing Guidelines Information about writing a good bug report. Some of this is specific to the Mozilla project, but describes general good practices. Getting started contributing to Python yourself =============================================== Beyond just reporting bugs that you find, you are also welcome to submit patches to fix them. You can find more information on how to get started patching Python in the Python Developer’s Guide. If you have questions, the core-mentorship mailing list is a friendly place to get answers to any and all questions pertaining to the process of fixing issues in Python. Abstract Objects Layer ********************** The functions in this chapter interact with Python objects regardless of their type, or with wide classes of object types (e.g. all numerical types, or all sequence types). When used on object types for which they do not apply, they will raise a Python exception. It is not possible to use these functions on objects that are not properly initialized, such as a list object that has been created by "PyList_New()", but whose items have not been set to some non-"NULL" value yet. * Object Protocol * Call Protocol * The *tp_call* Protocol * The Vectorcall Protocol * Recursion Control * Vectorcall Support API * Object Calling API * Call Support API * Number Protocol * Sequence Protocol * Mapping Protocol * Iterator Protocol * Buffer Protocol * Buffer structure * Buffer request types * request-independent fields * readonly, format * shape, strides, suboffsets * contiguity requests * compound requests * Complex arrays * NumPy-style: shape and strides * PIL-style: shape, strides and suboffsets * Buffer-related functions * Old Buffer Protocol Allocating Objects on the Heap ****************************** PyObject* _PyObject_New(PyTypeObject *type) *Return value: New reference.* PyVarObject* _PyObject_NewVar(PyTypeObject *type, Py_ssize_t size) *Return value: New reference.* PyObject* PyObject_Init(PyObject *op, PyTypeObject *type) *Return value: Borrowed reference.* Initialize a newly-allocated object *op* with its type and initial reference. Returns the initialized object. If *type* indicates that the object participates in the cyclic garbage detector, it is added to the detector’s set of observed objects. Other fields of the object are not affected. PyVarObject* PyObject_InitVar(PyVarObject *op, PyTypeObject *type, Py_ssize_t size) *Return value: Borrowed reference.* This does everything "PyObject_Init()" does, and also initializes the length information for a variable-size object. TYPE* PyObject_New(TYPE, PyTypeObject *type) *Return value: New reference.* Allocate a new Python object using the C structure type *TYPE* and the Python type object *type*. Fields not defined by the Python object header are not initialized; the object’s reference count will be one. The size of the memory allocation is determined from the "tp_basicsize" field of the type object. TYPE* PyObject_NewVar(TYPE, PyTypeObject *type, Py_ssize_t size) *Return value: New reference.* Allocate a new Python object using the C structure type *TYPE* and the Python type object *type*. Fields not defined by the Python object header are not initialized. The allocated memory allows for the *TYPE* structure plus *size* fields of the size given by the "tp_itemsize" field of *type*. This is useful for implementing objects like tuples, which are able to determine their size at construction time. Embedding the array of fields into the same allocation decreases the number of allocations, improving the memory management efficiency. void PyObject_Del(void *op) Releases memory allocated to an object using "PyObject_New()" or "PyObject_NewVar()". This is normally called from the "tp_dealloc" handler specified in the object’s type. The fields of the object should not be accessed after this call as the memory is no longer a valid Python object. PyObject _Py_NoneStruct Object which is visible in Python as "None". This should only be accessed using the "Py_None" macro, which evaluates to a pointer to this object. See also: "PyModule_Create()" To allocate and create extension modules. API and ABI Versioning ********************** "PY_VERSION_HEX" is the Python version number encoded in a single integer. For example if the "PY_VERSION_HEX" is set to "0x030401a2", the underlying version information can be found by treating it as a 32 bit number in the following manner: +---------+---------------------------+--------------------------------------------------+ | Bytes | Bits (big endian order) | Meaning | |=========|===========================|==================================================| | "1" | "1-8" | "PY_MAJOR_VERSION" (the "3" in "3.4.1a2") | +---------+---------------------------+--------------------------------------------------+ | "2" | "9-16" | "PY_MINOR_VERSION" (the "4" in "3.4.1a2") | +---------+---------------------------+--------------------------------------------------+ | "3" | "17-24" | "PY_MICRO_VERSION" (the "1" in "3.4.1a2") | +---------+---------------------------+--------------------------------------------------+ | "4" | "25-28" | "PY_RELEASE_LEVEL" ("0xA" for alpha, "0xB" for | | | | beta, "0xC" for release candidate and "0xF" for | | | | final), in this case it is alpha. | +---------+---------------------------+--------------------------------------------------+ | | "29-32" | "PY_RELEASE_SERIAL" (the "2" in "3.4.1a2", zero | | | | for final releases) | +---------+---------------------------+--------------------------------------------------+ Thus "3.4.1a2" is hexversion "0x030401a2". All the given macros are defined in Include/patchlevel.h. Parsing arguments and building values ************************************* These functions are useful when creating your own extensions functions and methods. Additional information and examples are available in Extending and Embedding the Python Interpreter. The first three of these functions described, "PyArg_ParseTuple()", "PyArg_ParseTupleAndKeywords()", and "PyArg_Parse()", all use *format strings* which are used to tell the function about the expected arguments. The format strings use the same syntax for each of these functions. Parsing arguments ================= A format string consists of zero or more “format units.” A format unit describes one Python object; it is usually a single character or a parenthesized sequence of format units. With a few exceptions, a format unit that is not a parenthesized sequence normally corresponds to a single address argument to these functions. In the following description, the quoted form is the format unit; the entry in (round) parentheses is the Python object type that matches the format unit; and the entry in [square] brackets is the type of the C variable(s) whose address should be passed. Strings and buffers ------------------- These formats allow accessing an object as a contiguous chunk of memory. You don’t have to provide raw storage for the returned unicode or bytes area. In general, when a format sets a pointer to a buffer, the buffer is managed by the corresponding Python object, and the buffer shares the lifetime of this object. You won’t have to release any memory yourself. The only exceptions are "es", "es#", "et" and "et#". However, when a "Py_buffer" structure gets filled, the underlying buffer is locked so that the caller can subsequently use the buffer even inside a "Py_BEGIN_ALLOW_THREADS" block without the risk of mutable data being resized or destroyed. As a result, **you have to call** "PyBuffer_Release()" after you have finished processing the data (or in any early abort case). Unless otherwise stated, buffers are not NUL-terminated. Some formats require a read-only *bytes-like object*, and set a pointer instead of a buffer structure. They work by checking that the object’s "PyBufferProcs.bf_releasebuffer" field is "NULL", which disallows mutable objects such as "bytearray". Note: For all "#" variants of formats ("s#", "y#", etc.), the type of the length argument (int or "Py_ssize_t") is controlled by defining the macro "PY_SSIZE_T_CLEAN" before including "Python.h". If the macro was defined, length is a "Py_ssize_t" rather than an "int". This behavior will change in a future Python version to only support "Py_ssize_t" and drop "int" support. It is best to always define "PY_SSIZE_T_CLEAN". "s" ("str") [const char *] Convert a Unicode object to a C pointer to a character string. A pointer to an existing string is stored in the character pointer variable whose address you pass. The C string is NUL-terminated. The Python string must not contain embedded null code points; if it does, a "ValueError" exception is raised. Unicode objects are converted to C strings using "'utf-8'" encoding. If this conversion fails, a "UnicodeError" is raised. Note: This format does not accept *bytes-like objects*. If you want to accept filesystem paths and convert them to C character strings, it is preferable to use the "O&" format with "PyUnicode_FSConverter()" as *converter*. Changed in version 3.5: Previously, "TypeError" was raised when embedded null code points were encountered in the Python string. "s*" ("str" or *bytes-like object*) [Py_buffer] This format accepts Unicode objects as well as bytes-like objects. It fills a "Py_buffer" structure provided by the caller. In this case the resulting C string may contain embedded NUL bytes. Unicode objects are converted to C strings using "'utf-8'" encoding. "s#" ("str", read-only *bytes-like object*) [const char *, int or "Py_ssize_t"] Like "s*", except that it doesn’t accept mutable objects. The result is stored into two C variables, the first one a pointer to a C string, the second one its length. The string may contain embedded null bytes. Unicode objects are converted to C strings using "'utf-8'" encoding. "z" ("str" or "None") [const char *] Like "s", but the Python object may also be "None", in which case the C pointer is set to "NULL". "z*" ("str", *bytes-like object* or "None") [Py_buffer] Like "s*", but the Python object may also be "None", in which case the "buf" member of the "Py_buffer" structure is set to "NULL". "z#" ("str", read-only *bytes-like object* or "None") [const char *, int or "Py_ssize_t"] Like "s#", but the Python object may also be "None", in which case the C pointer is set to "NULL". "y" (read-only *bytes-like object*) [const char *] This format converts a bytes-like object to a C pointer to a character string; it does not accept Unicode objects. The bytes buffer must not contain embedded null bytes; if it does, a "ValueError" exception is raised. Changed in version 3.5: Previously, "TypeError" was raised when embedded null bytes were encountered in the bytes buffer. "y*" (*bytes-like object*) [Py_buffer] This variant on "s*" doesn’t accept Unicode objects, only bytes- like objects. **This is the recommended way to accept binary data.** "y#" (read-only *bytes-like object*) [const char *, int or "Py_ssize_t"] This variant on "s#" doesn’t accept Unicode objects, only bytes- like objects. "S" ("bytes") [PyBytesObject *] Requires that the Python object is a "bytes" object, without attempting any conversion. Raises "TypeError" if the object is not a bytes object. The C variable may also be declared as "PyObject*". "Y" ("bytearray") [PyByteArrayObject *] Requires that the Python object is a "bytearray" object, without attempting any conversion. Raises "TypeError" if the object is not a "bytearray" object. The C variable may also be declared as "PyObject*". "u" ("str") [const Py_UNICODE *] Convert a Python Unicode object to a C pointer to a NUL-terminated buffer of Unicode characters. You must pass the address of a "Py_UNICODE" pointer variable, which will be filled with the pointer to an existing Unicode buffer. Please note that the width of a "Py_UNICODE" character depends on compilation options (it is either 16 or 32 bits). The Python string must not contain embedded null code points; if it does, a "ValueError" exception is raised. Changed in version 3.5: Previously, "TypeError" was raised when embedded null code points were encountered in the Python string. Deprecated since version 3.3, will be removed in version 3.12: Part of the old-style "Py_UNICODE" API; please migrate to using "PyUnicode_AsWideCharString()". "u#" ("str") [const Py_UNICODE *, int or "Py_ssize_t"] This variant on "u" stores into two C variables, the first one a pointer to a Unicode data buffer, the second one its length. This variant allows null code points. Deprecated since version 3.3, will be removed in version 3.12: Part of the old-style "Py_UNICODE" API; please migrate to using "PyUnicode_AsWideCharString()". "Z" ("str" or "None") [const Py_UNICODE *] Like "u", but the Python object may also be "None", in which case the "Py_UNICODE" pointer is set to "NULL". Deprecated since version 3.3, will be removed in version 3.12: Part of the old-style "Py_UNICODE" API; please migrate to using "PyUnicode_AsWideCharString()". "Z#" ("str" or "None") [const Py_UNICODE *, int or "Py_ssize_t"] Like "u#", but the Python object may also be "None", in which case the "Py_UNICODE" pointer is set to "NULL". Deprecated since version 3.3, will be removed in version 3.12: Part of the old-style "Py_UNICODE" API; please migrate to using "PyUnicode_AsWideCharString()". "U" ("str") [PyObject *] Requires that the Python object is a Unicode object, without attempting any conversion. Raises "TypeError" if the object is not a Unicode object. The C variable may also be declared as "PyObject*". "w*" (read-write *bytes-like object*) [Py_buffer] This format accepts any object which implements the read-write buffer interface. It fills a "Py_buffer" structure provided by the caller. The buffer may contain embedded null bytes. The caller have to call "PyBuffer_Release()" when it is done with the buffer. "es" ("str") [const char *encoding, char **buffer] This variant on "s" is used for encoding Unicode into a character buffer. It only works for encoded data without embedded NUL bytes. This format requires two arguments. The first is only used as input, and must be a "const char*" which points to the name of an encoding as a NUL-terminated string, or "NULL", in which case "'utf-8'" encoding is used. An exception is raised if the named encoding is not known to Python. The second argument must be a "char**"; the value of the pointer it references will be set to a buffer with the contents of the argument text. The text will be encoded in the encoding specified by the first argument. "PyArg_ParseTuple()" will allocate a buffer of the needed size, copy the encoded data into this buffer and adjust **buffer* to reference the newly allocated storage. The caller is responsible for calling "PyMem_Free()" to free the allocated buffer after use. "et" ("str", "bytes" or "bytearray") [const char *encoding, char **buffer] Same as "es" except that byte string objects are passed through without recoding them. Instead, the implementation assumes that the byte string object uses the encoding passed in as parameter. "es#" ("str") [const char *encoding, char **buffer, int or "Py_ssize_t" *buffer_length] This variant on "s#" is used for encoding Unicode into a character buffer. Unlike the "es" format, this variant allows input data which contains NUL characters. It requires three arguments. The first is only used as input, and must be a "const char*" which points to the name of an encoding as a NUL-terminated string, or "NULL", in which case "'utf-8'" encoding is used. An exception is raised if the named encoding is not known to Python. The second argument must be a "char**"; the value of the pointer it references will be set to a buffer with the contents of the argument text. The text will be encoded in the encoding specified by the first argument. The third argument must be a pointer to an integer; the referenced integer will be set to the number of bytes in the output buffer. There are two modes of operation: If **buffer* points a "NULL" pointer, the function will allocate a buffer of the needed size, copy the encoded data into this buffer and set **buffer* to reference the newly allocated storage. The caller is responsible for calling "PyMem_Free()" to free the allocated buffer after usage. If **buffer* points to a non-"NULL" pointer (an already allocated buffer), "PyArg_ParseTuple()" will use this location as the buffer and interpret the initial value of **buffer_length* as the buffer size. It will then copy the encoded data into the buffer and NUL- terminate it. If the buffer is not large enough, a "ValueError" will be set. In both cases, **buffer_length* is set to the length of the encoded data without the trailing NUL byte. "et#" ("str", "bytes" or "bytearray") [const char *encoding, char **buffer, int or "Py_ssize_t" *buffer_length] Same as "es#" except that byte string objects are passed through without recoding them. Instead, the implementation assumes that the byte string object uses the encoding passed in as parameter. Numbers ------- "b" ("int") [unsigned char] Convert a nonnegative Python integer to an unsigned tiny int, stored in a C "unsigned char". "B" ("int") [unsigned char] Convert a Python integer to a tiny int without overflow checking, stored in a C "unsigned char". "h" ("int") [short int] Convert a Python integer to a C "short int". "H" ("int") [unsigned short int] Convert a Python integer to a C "unsigned short int", without overflow checking. "i" ("int") [int] Convert a Python integer to a plain C "int". "I" ("int") [unsigned int] Convert a Python integer to a C "unsigned int", without overflow checking. "l" ("int") [long int] Convert a Python integer to a C "long int". "k" ("int") [unsigned long] Convert a Python integer to a C "unsigned long" without overflow checking. "L" ("int") [long long] Convert a Python integer to a C "long long". "K" ("int") [unsigned long long] Convert a Python integer to a C "unsigned long long" without overflow checking. "n" ("int") ["Py_ssize_t"] Convert a Python integer to a C "Py_ssize_t". "c" ("bytes" or "bytearray" of length 1) [char] Convert a Python byte, represented as a "bytes" or "bytearray" object of length 1, to a C "char". Changed in version 3.3: Allow "bytearray" objects. "C" ("str" of length 1) [int] Convert a Python character, represented as a "str" object of length 1, to a C "int". "f" ("float") [float] Convert a Python floating point number to a C "float". "d" ("float") [double] Convert a Python floating point number to a C "double". "D" ("complex") [Py_complex] Convert a Python complex number to a C "Py_complex" structure. Other objects ------------- "O" (object) [PyObject *] Store a Python object (without any conversion) in a C object pointer. The C program thus receives the actual object that was passed. The object’s reference count is not increased. The pointer stored is not "NULL". "O!" (object) [*typeobject*, PyObject *] Store a Python object in a C object pointer. This is similar to "O", but takes two C arguments: the first is the address of a Python type object, the second is the address of the C variable (of type "PyObject*") into which the object pointer is stored. If the Python object does not have the required type, "TypeError" is raised. "O&" (object) [*converter*, *anything*] Convert a Python object to a C variable through a *converter* function. This takes two arguments: the first is a function, the second is the address of a C variable (of arbitrary type), converted to "void *". The *converter* function in turn is called as follows: status = converter(object, address); where *object* is the Python object to be converted and *address* is the "void*" argument that was passed to the "PyArg_Parse*()" function. The returned *status* should be "1" for a successful conversion and "0" if the conversion has failed. When the conversion fails, the *converter* function should raise an exception and leave the content of *address* unmodified. If the *converter* returns "Py_CLEANUP_SUPPORTED", it may get called a second time if the argument parsing eventually fails, giving the converter a chance to release any memory that it had already allocated. In this second call, the *object* parameter will be "NULL"; *address* will have the same value as in the original call. Changed in version 3.1: "Py_CLEANUP_SUPPORTED" was added. "p" ("bool") [int] Tests the value passed in for truth (a boolean **p**redicate) and converts the result to its equivalent C true/false integer value. Sets the int to "1" if the expression was true and "0" if it was false. This accepts any valid Python value. See Truth Value Testing for more information about how Python tests values for truth. New in version 3.3. "(items)" ("tuple") [*matching-items*] The object must be a Python sequence whose length is the number of format units in *items*. The C arguments must correspond to the individual format units in *items*. Format units for sequences may be nested. It is possible to pass “long” integers (integers whose value exceeds the platform’s "LONG_MAX") however no proper range checking is done — the most significant bits are silently truncated when the receiving field is too small to receive the value (actually, the semantics are inherited from downcasts in C — your mileage may vary). A few other characters have a meaning in a format string. These may not occur inside nested parentheses. They are: "|" Indicates that the remaining arguments in the Python argument list are optional. The C variables corresponding to optional arguments should be initialized to their default value — when an optional argument is not specified, "PyArg_ParseTuple()" does not touch the contents of the corresponding C variable(s). "$" "PyArg_ParseTupleAndKeywords()" only: Indicates that the remaining arguments in the Python argument list are keyword-only. Currently, all keyword-only arguments must also be optional arguments, so "|" must always be specified before "$" in the format string. New in version 3.3. ":" The list of format units ends here; the string after the colon is used as the function name in error messages (the “associated value” of the exception that "PyArg_ParseTuple()" raises). ";" The list of format units ends here; the string after the semicolon is used as the error message *instead* of the default error message. ":" and ";" mutually exclude each other. Note that any Python object references which are provided to the caller are *borrowed* references; do not decrement their reference count! Additional arguments passed to these functions must be addresses of variables whose type is determined by the format string; these are used to store values from the input tuple. There are a few cases, as described in the list of format units above, where these parameters are used as input values; they should match what is specified for the corresponding format unit in that case. For the conversion to succeed, the *arg* object must match the format and the format must be exhausted. On success, the "PyArg_Parse*()" functions return true, otherwise they return false and raise an appropriate exception. When the "PyArg_Parse*()" functions fail due to conversion failure in one of the format units, the variables at the addresses corresponding to that and the following format units are left untouched. API Functions ------------- int PyArg_ParseTuple(PyObject *args, const char *format, ...) Parse the parameters of a function that takes only positional parameters into local variables. Returns true on success; on failure, it returns false and raises the appropriate exception. int PyArg_VaParse(PyObject *args, const char *format, va_list vargs) Identical to "PyArg_ParseTuple()", except that it accepts a va_list rather than a variable number of arguments. int PyArg_ParseTupleAndKeywords(PyObject *args, PyObject *kw, const char *format, char *keywords[], ...) Parse the parameters of a function that takes both positional and keyword parameters into local variables. The *keywords* argument is a "NULL"-terminated array of keyword parameter names. Empty names denote positional-only parameters. Returns true on success; on failure, it returns false and raises the appropriate exception. Changed in version 3.6: Added support for positional-only parameters. int PyArg_VaParseTupleAndKeywords(PyObject *args, PyObject *kw, const char *format, char *keywords[], va_list vargs) Identical to "PyArg_ParseTupleAndKeywords()", except that it accepts a va_list rather than a variable number of arguments. int PyArg_ValidateKeywordArguments(PyObject *) Ensure that the keys in the keywords argument dictionary are strings. This is only needed if "PyArg_ParseTupleAndKeywords()" is not used, since the latter already does this check. New in version 3.2. int PyArg_Parse(PyObject *args, const char *format, ...) Function used to deconstruct the argument lists of “old-style” functions — these are functions which use the "METH_OLDARGS" parameter parsing method, which has been removed in Python 3. This is not recommended for use in parameter parsing in new code, and most code in the standard interpreter has been modified to no longer use this for that purpose. It does remain a convenient way to decompose other tuples, however, and may continue to be used for that purpose. int PyArg_UnpackTuple(PyObject *args, const char *name, Py_ssize_t min, Py_ssize_t max, ...) A simpler form of parameter retrieval which does not use a format string to specify the types of the arguments. Functions which use this method to retrieve their parameters should be declared as "METH_VARARGS" in function or method tables. The tuple containing the actual parameters should be passed as *args*; it must actually be a tuple. The length of the tuple must be at least *min* and no more than *max*; *min* and *max* may be equal. Additional arguments must be passed to the function, each of which should be a pointer to a "PyObject*" variable; these will be filled in with the values from *args*; they will contain borrowed references. The variables which correspond to optional parameters not given by *args* will not be filled in; these should be initialized by the caller. This function returns true on success and false if *args* is not a tuple or contains the wrong number of elements; an exception will be set if there was a failure. This is an example of the use of this function, taken from the sources for the "_weakref" helper module for weak references: static PyObject * weakref_ref(PyObject *self, PyObject *args) { PyObject *object; PyObject *callback = NULL; PyObject *result = NULL; if (PyArg_UnpackTuple(args, "ref", 1, 2, &object, &callback)) { result = PyWeakref_NewRef(object, callback); } return result; } The call to "PyArg_UnpackTuple()" in this example is entirely equivalent to this call to "PyArg_ParseTuple()": PyArg_ParseTuple(args, "O|O:ref", &object, &callback) Building values =============== PyObject* Py_BuildValue(const char *format, ...) *Return value: New reference.* Create a new value based on a format string similar to those accepted by the "PyArg_Parse*()" family of functions and a sequence of values. Returns the value or "NULL" in the case of an error; an exception will be raised if "NULL" is returned. "Py_BuildValue()" does not always build a tuple. It builds a tuple only if its format string contains two or more format units. If the format string is empty, it returns "None"; if it contains exactly one format unit, it returns whatever object is described by that format unit. To force it to return a tuple of size 0 or one, parenthesize the format string. When memory buffers are passed as parameters to supply data to build objects, as for the "s" and "s#" formats, the required data is copied. Buffers provided by the caller are never referenced by the objects created by "Py_BuildValue()". In other words, if your code invokes "malloc()" and passes the allocated memory to "Py_BuildValue()", your code is responsible for calling "free()" for that memory once "Py_BuildValue()" returns. In the following description, the quoted form is the format unit; the entry in (round) parentheses is the Python object type that the format unit will return; and the entry in [square] brackets is the type of the C value(s) to be passed. The characters space, tab, colon and comma are ignored in format strings (but not within format units such as "s#"). This can be used to make long format strings a tad more readable. "s" ("str" or "None") [const char *] Convert a null-terminated C string to a Python "str" object using "'utf-8'" encoding. If the C string pointer is "NULL", "None" is used. "s#" ("str" or "None") [const char *, int or "Py_ssize_t"] Convert a C string and its length to a Python "str" object using "'utf-8'" encoding. If the C string pointer is "NULL", the length is ignored and "None" is returned. "y" ("bytes") [const char *] This converts a C string to a Python "bytes" object. If the C string pointer is "NULL", "None" is returned. "y#" ("bytes") [const char *, int or "Py_ssize_t"] This converts a C string and its lengths to a Python object. If the C string pointer is "NULL", "None" is returned. "z" ("str" or "None") [const char *] Same as "s". "z#" ("str" or "None") [const char *, int or "Py_ssize_t"] Same as "s#". "u" ("str") [const wchar_t *] Convert a null-terminated "wchar_t" buffer of Unicode (UTF-16 or UCS-4) data to a Python Unicode object. If the Unicode buffer pointer is "NULL", "None" is returned. "u#" ("str") [const wchar_t *, int or "Py_ssize_t"] Convert a Unicode (UTF-16 or UCS-4) data buffer and its length to a Python Unicode object. If the Unicode buffer pointer is "NULL", the length is ignored and "None" is returned. "U" ("str" or "None") [const char *] Same as "s". "U#" ("str" or "None") [const char *, int or "Py_ssize_t"] Same as "s#". "i" ("int") [int] Convert a plain C "int" to a Python integer object. "b" ("int") [char] Convert a plain C "char" to a Python integer object. "h" ("int") [short int] Convert a plain C "short int" to a Python integer object. "l" ("int") [long int] Convert a C "long int" to a Python integer object. "B" ("int") [unsigned char] Convert a C "unsigned char" to a Python integer object. "H" ("int") [unsigned short int] Convert a C "unsigned short int" to a Python integer object. "I" ("int") [unsigned int] Convert a C "unsigned int" to a Python integer object. "k" ("int") [unsigned long] Convert a C "unsigned long" to a Python integer object. "L" ("int") [long long] Convert a C "long long" to a Python integer object. "K" ("int") [unsigned long long] Convert a C "unsigned long long" to a Python integer object. "n" ("int") ["Py_ssize_t"] Convert a C "Py_ssize_t" to a Python integer. "c" ("bytes" of length 1) [char] Convert a C "int" representing a byte to a Python "bytes" object of length 1. "C" ("str" of length 1) [int] Convert a C "int" representing a character to Python "str" object of length 1. "d" ("float") [double] Convert a C "double" to a Python floating point number. "f" ("float") [float] Convert a C "float" to a Python floating point number. "D" ("complex") [Py_complex *] Convert a C "Py_complex" structure to a Python complex number. "O" (object) [PyObject *] Pass a Python object untouched (except for its reference count, which is incremented by one). If the object passed in is a "NULL" pointer, it is assumed that this was caused because the call producing the argument found an error and set an exception. Therefore, "Py_BuildValue()" will return "NULL" but won’t raise an exception. If no exception has been raised yet, "SystemError" is set. "S" (object) [PyObject *] Same as "O". "N" (object) [PyObject *] Same as "O", except it doesn’t increment the reference count on the object. Useful when the object is created by a call to an object constructor in the argument list. "O&" (object) [*converter*, *anything*] Convert *anything* to a Python object through a *converter* function. The function is called with *anything* (which should be compatible with "void*") as its argument and should return a “new” Python object, or "NULL" if an error occurred. "(items)" ("tuple") [*matching-items*] Convert a sequence of C values to a Python tuple with the same number of items. "[items]" ("list") [*matching-items*] Convert a sequence of C values to a Python list with the same number of items. "{items}" ("dict") [*matching-items*] Convert a sequence of C values to a Python dictionary. Each pair of consecutive C values adds one item to the dictionary, serving as key and value, respectively. If there is an error in the format string, the "SystemError" exception is set and "NULL" returned. PyObject* Py_VaBuildValue(const char *format, va_list vargs) *Return value: New reference.* Identical to "Py_BuildValue()", except that it accepts a va_list rather than a variable number of arguments. Boolean Objects *************** Booleans in Python are implemented as a subclass of integers. There are only two booleans, "Py_False" and "Py_True". As such, the normal creation and deletion functions don’t apply to booleans. The following macros are available, however. int PyBool_Check(PyObject *o) Return true if *o* is of type "PyBool_Type". This function always succeeds. PyObject* Py_False The Python "False" object. This object has no methods. It needs to be treated just like any other object with respect to reference counts. PyObject* Py_True The Python "True" object. This object has no methods. It needs to be treated just like any other object with respect to reference counts. Py_RETURN_FALSE Return "Py_False" from a function, properly incrementing its reference count. Py_RETURN_TRUE Return "Py_True" from a function, properly incrementing its reference count. PyObject* PyBool_FromLong(long v) *Return value: New reference.* Return a new reference to "Py_True" or "Py_False" depending on the truth value of *v*. Buffer Protocol *************** Certain objects available in Python wrap access to an underlying memory array or *buffer*. Such objects include the built-in "bytes" and "bytearray", and some extension types like "array.array". Third- party libraries may define their own types for special purposes, such as image processing or numeric analysis. While each of these types have their own semantics, they share the common characteristic of being backed by a possibly large memory buffer. It is then desirable, in some situations, to access that buffer directly and without intermediate copying. Python provides such a facility at the C level in the form of the buffer protocol. This protocol has two sides: * on the producer side, a type can export a “buffer interface” which allows objects of that type to expose information about their underlying buffer. This interface is described in the section Buffer Object Structures; * on the consumer side, several means are available to obtain a pointer to the raw underlying data of an object (for example a method parameter). Simple objects such as "bytes" and "bytearray" expose their underlying buffer in byte-oriented form. Other forms are possible; for example, the elements exposed by an "array.array" can be multi-byte values. An example consumer of the buffer interface is the "write()" method of file objects: any object that can export a series of bytes through the buffer interface can be written to a file. While "write()" only needs read-only access to the internal contents of the object passed to it, other methods such as "readinto()" need write access to the contents of their argument. The buffer interface allows objects to selectively allow or reject exporting of read-write and read-only buffers. There are two ways for a consumer of the buffer interface to acquire a buffer over a target object: * call "PyObject_GetBuffer()" with the right parameters; * call "PyArg_ParseTuple()" (or one of its siblings) with one of the "y*", "w*" or "s*" format codes. In both cases, "PyBuffer_Release()" must be called when the buffer isn’t needed anymore. Failure to do so could lead to various issues such as resource leaks. Buffer structure ================ Buffer structures (or simply “buffers”) are useful as a way to expose the binary data from another object to the Python programmer. They can also be used as a zero-copy slicing mechanism. Using their ability to reference a block of memory, it is possible to expose any data to the Python programmer quite easily. The memory could be a large, constant array in a C extension, it could be a raw block of memory for manipulation before passing to an operating system library, or it could be used to pass around structured data in its native, in- memory format. Contrary to most data types exposed by the Python interpreter, buffers are not "PyObject" pointers but rather simple C structures. This allows them to be created and copied very simply. When a generic wrapper around a buffer is needed, a memoryview object can be created. For short instructions how to write an exporting object, see Buffer Object Structures. For obtaining a buffer, see "PyObject_GetBuffer()". Py_buffer void *buf A pointer to the start of the logical structure described by the buffer fields. This can be any location within the underlying physical memory block of the exporter. For example, with negative "strides" the value may point to the end of the memory block. For *contiguous* arrays, the value points to the beginning of the memory block. void *obj A new reference to the exporting object. The reference is owned by the consumer and automatically decremented and set to "NULL" by "PyBuffer_Release()". The field is the equivalent of the return value of any standard C-API function. As a special case, for *temporary* buffers that are wrapped by "PyMemoryView_FromBuffer()" or "PyBuffer_FillInfo()" this field is "NULL". In general, exporting objects MUST NOT use this scheme. Py_ssize_t len "product(shape) * itemsize". For contiguous arrays, this is the length of the underlying memory block. For non-contiguous arrays, it is the length that the logical structure would have if it were copied to a contiguous representation. Accessing "((char *)buf)[0] up to ((char *)buf)[len-1]" is only valid if the buffer has been obtained by a request that guarantees contiguity. In most cases such a request will be "PyBUF_SIMPLE" or "PyBUF_WRITABLE". int readonly An indicator of whether the buffer is read-only. This field is controlled by the "PyBUF_WRITABLE" flag. Py_ssize_t itemsize Item size in bytes of a single element. Same as the value of "struct.calcsize()" called on non-"NULL" "format" values. Important exception: If a consumer requests a buffer without the "PyBUF_FORMAT" flag, "format" will be set to "NULL", but "itemsize" still has the value for the original format. If "shape" is present, the equality "product(shape) * itemsize == len" still holds and the consumer can use "itemsize" to navigate the buffer. If "shape" is "NULL" as a result of a "PyBUF_SIMPLE" or a "PyBUF_WRITABLE" request, the consumer must disregard "itemsize" and assume "itemsize == 1". const char *format A *NUL* terminated string in "struct" module style syntax describing the contents of a single item. If this is "NULL", ""B"" (unsigned bytes) is assumed. This field is controlled by the "PyBUF_FORMAT" flag. int ndim The number of dimensions the memory represents as an n-dimensional array. If it is "0", "buf" points to a single item representing a scalar. In this case, "shape", "strides" and "suboffsets" MUST be "NULL". The macro "PyBUF_MAX_NDIM" limits the maximum number of dimensions to 64. Exporters MUST respect this limit, consumers of multi-dimensional buffers SHOULD be able to handle up to "PyBUF_MAX_NDIM" dimensions. Py_ssize_t *shape An array of "Py_ssize_t" of length "ndim" indicating the shape of the memory as an n-dimensional array. Note that "shape[0] * ... * shape[ndim-1] * itemsize" MUST be equal to "len". Shape values are restricted to "shape[n] >= 0". The case "shape[n] == 0" requires special attention. See complex arrays for further information. The shape array is read-only for the consumer. Py_ssize_t *strides An array of "Py_ssize_t" of length "ndim" giving the number of bytes to skip to get to a new element in each dimension. Stride values can be any integer. For regular arrays, strides are usually positive, but a consumer MUST be able to handle the case "strides[n] <= 0". See complex arrays for further information. The strides array is read-only for the consumer. Py_ssize_t *suboffsets An array of "Py_ssize_t" of length "ndim". If "suboffsets[n] >= 0", the values stored along the nth dimension are pointers and the suboffset value dictates how many bytes to add to each pointer after de-referencing. A suboffset value that is negative indicates that no de-referencing should occur (striding in a contiguous memory block). If all suboffsets are negative (i.e. no de-referencing is needed), then this field must be "NULL" (the default value). This type of array representation is used by the Python Imaging Library (PIL). See complex arrays for further information how to access elements of such an array. The suboffsets array is read-only for the consumer. void *internal This is for use internally by the exporting object. For example, this might be re-cast as an integer by the exporter and used to store flags about whether or not the shape, strides, and suboffsets arrays must be freed when the buffer is released. The consumer MUST NOT alter this value. Buffer request types ==================== Buffers are usually obtained by sending a buffer request to an exporting object via "PyObject_GetBuffer()". Since the complexity of the logical structure of the memory can vary drastically, the consumer uses the *flags* argument to specify the exact buffer type it can handle. All "Py_buffer" fields are unambiguously defined by the request type. request-independent fields -------------------------- The following fields are not influenced by *flags* and must always be filled in with the correct values: "obj", "buf", "len", "itemsize", "ndim". readonly, format ---------------- PyBUF_WRITABLE Controls the "readonly" field. If set, the exporter MUST provide a writable buffer or else report failure. Otherwise, the exporter MAY provide either a read-only or writable buffer, but the choice MUST be consistent for all consumers. PyBUF_FORMAT Controls the "format" field. If set, this field MUST be filled in correctly. Otherwise, this field MUST be "NULL". "PyBUF_WRITABLE" can be |’d to any of the flags in the next section. Since "PyBUF_SIMPLE" is defined as 0, "PyBUF_WRITABLE" can be used as a stand-alone flag to request a simple writable buffer. "PyBUF_FORMAT" can be |’d to any of the flags except "PyBUF_SIMPLE". The latter already implies format "B" (unsigned bytes). shape, strides, suboffsets -------------------------- The flags that control the logical structure of the memory are listed in decreasing order of complexity. Note that each flag contains all bits of the flags below it. +-------------------------------+---------+-----------+--------------+ | Request | shape | strides | suboffsets | |===============================|=========|===========|==============| | PyBUF_INDIRECT | yes | yes | if needed | +-------------------------------+---------+-----------+--------------+ | PyBUF_STRIDES | yes | yes | NULL | +-------------------------------+---------+-----------+--------------+ | PyBUF_ND | yes | NULL | NULL | +-------------------------------+---------+-----------+--------------+ | PyBUF_SIMPLE | NULL | NULL | NULL | +-------------------------------+---------+-----------+--------------+ contiguity requests ------------------- C or Fortran *contiguity* can be explicitly requested, with and without stride information. Without stride information, the buffer must be C-contiguous. +-------------------------------------+---------+-----------+--------------+----------+ | Request | shape | strides | suboffsets | contig | |=====================================|=========|===========|==============|==========| | PyBUF_C_CONTIGUOUS | yes | yes | NULL | C | +-------------------------------------+---------+-----------+--------------+----------+ | PyBUF_F_CONTIGUOUS | yes | yes | NULL | F | +-------------------------------------+---------+-----------+--------------+----------+ | PyBUF_ANY_CONTIGUOUS | yes | yes | NULL | C or F | +-------------------------------------+---------+-----------+--------------+----------+ | "PyBUF_ND" | yes | NULL | NULL | C | +-------------------------------------+---------+-----------+--------------+----------+ compound requests ----------------- All possible requests are fully defined by some combination of the flags in the previous section. For convenience, the buffer protocol provides frequently used combinations as single flags. In the following table *U* stands for undefined contiguity. The consumer would have to call "PyBuffer_IsContiguous()" to determine contiguity. +---------------------------------+---------+-----------+--------------+----------+------------+----------+ | Request | shape | strides | suboffsets | contig | readonly | format | |=================================|=========|===========|==============|==========|============|==========| | PyBUF_FULL | yes | yes | if needed | U | 0 | yes | +---------------------------------+---------+-----------+--------------+----------+------------+----------+ | PyBUF_FULL_RO | yes | yes | if needed | U | 1 or 0 | yes | +---------------------------------+---------+-----------+--------------+----------+------------+----------+ | PyBUF_RECORDS | yes | yes | NULL | U | 0 | yes | +---------------------------------+---------+-----------+--------------+----------+------------+----------+ | PyBUF_RECORDS_RO | yes | yes | NULL | U | 1 or 0 | yes | +---------------------------------+---------+-----------+--------------+----------+------------+----------+ | PyBUF_STRIDED | yes | yes | NULL | U | 0 | NULL | +---------------------------------+---------+-----------+--------------+----------+------------+----------+ | PyBUF_STRIDED_RO | yes | yes | NULL | U | 1 or 0 | NULL | +---------------------------------+---------+-----------+--------------+----------+------------+----------+ | PyBUF_CONTIG | yes | NULL | NULL | C | 0 | NULL | +---------------------------------+---------+-----------+--------------+----------+------------+----------+ | PyBUF_CONTIG_RO | yes | NULL | NULL | C | 1 or 0 | NULL | +---------------------------------+---------+-----------+--------------+----------+------------+----------+ Complex arrays ============== NumPy-style: shape and strides ------------------------------ The logical structure of NumPy-style arrays is defined by "itemsize", "ndim", "shape" and "strides". If "ndim == 0", the memory location pointed to by "buf" is interpreted as a scalar of size "itemsize". In that case, both "shape" and "strides" are "NULL". If "strides" is "NULL", the array is interpreted as a standard n-dimensional C-array. Otherwise, the consumer must access an n-dimensional array as follows: ptr = (char *)buf + indices[0] * strides[0] + ... + indices[n-1] * strides[n-1]; item = *((typeof(item) *)ptr); As noted above, "buf" can point to any location within the actual memory block. An exporter can check the validity of a buffer with this function: def verify_structure(memlen, itemsize, ndim, shape, strides, offset): """Verify that the parameters represent a valid array within the bounds of the allocated memory: char *mem: start of the physical memory block memlen: length of the physical memory block offset: (char *)buf - mem """ if offset % itemsize: return False if offset < 0 or offset+itemsize > memlen: return False if any(v % itemsize for v in strides): return False if ndim <= 0: return ndim == 0 and not shape and not strides if 0 in shape: return True imin = sum(strides[j]*(shape[j]-1) for j in range(ndim) if strides[j] <= 0) imax = sum(strides[j]*(shape[j]-1) for j in range(ndim) if strides[j] > 0) return 0 <= offset+imin and offset+imax+itemsize <= memlen PIL-style: shape, strides and suboffsets ---------------------------------------- In addition to the regular items, PIL-style arrays can contain pointers that must be followed in order to get to the next element in a dimension. For example, the regular three-dimensional C-array "char v[2][2][3]" can also be viewed as an array of 2 pointers to 2 two- dimensional arrays: "char (*v[2])[2][3]". In suboffsets representation, those two pointers can be embedded at the start of "buf", pointing to two "char x[2][3]" arrays that can be located anywhere in memory. Here is a function that returns a pointer to the element in an N-D array pointed to by an N-dimensional index when there are both non-"NULL" strides and suboffsets: void *get_item_pointer(int ndim, void *buf, Py_ssize_t *strides, Py_ssize_t *suboffsets, Py_ssize_t *indices) { char *pointer = (char*)buf; int i; for (i = 0; i < ndim; i++) { pointer += strides[i] * indices[i]; if (suboffsets[i] >=0 ) { pointer = *((char**)pointer) + suboffsets[i]; } } return (void*)pointer; } Buffer-related functions ======================== int PyObject_CheckBuffer(PyObject *obj) Return "1" if *obj* supports the buffer interface otherwise "0". When "1" is returned, it doesn’t guarantee that "PyObject_GetBuffer()" will succeed. This function always succeeds. int PyObject_GetBuffer(PyObject *exporter, Py_buffer *view, int flags) Send a request to *exporter* to fill in *view* as specified by *flags*. If the exporter cannot provide a buffer of the exact type, it MUST raise "PyExc_BufferError", set "view->obj" to "NULL" and return "-1". On success, fill in *view*, set "view->obj" to a new reference to *exporter* and return 0. In the case of chained buffer providers that redirect requests to a single object, "view->obj" MAY refer to this object instead of *exporter* (See Buffer Object Structures). Successful calls to "PyObject_GetBuffer()" must be paired with calls to "PyBuffer_Release()", similar to "malloc()" and "free()". Thus, after the consumer is done with the buffer, "PyBuffer_Release()" must be called exactly once. void PyBuffer_Release(Py_buffer *view) Release the buffer *view* and decrement the reference count for "view->obj". This function MUST be called when the buffer is no longer being used, otherwise reference leaks may occur. It is an error to call this function on a buffer that was not obtained via "PyObject_GetBuffer()". Py_ssize_t PyBuffer_SizeFromFormat(const char *format) Return the implied "itemsize" from "format". On error, raise an exception and return -1. New in version 3.9. int PyBuffer_IsContiguous(Py_buffer *view, char order) Return "1" if the memory defined by the *view* is C-style (*order* is "'C'") or Fortran-style (*order* is "'F'") *contiguous* or either one (*order* is "'A'"). Return "0" otherwise. This function always succeeds. void* PyBuffer_GetPointer(Py_buffer *view, Py_ssize_t *indices) Get the memory area pointed to by the *indices* inside the given *view*. *indices* must point to an array of "view->ndim" indices. int PyBuffer_FromContiguous(Py_buffer *view, void *buf, Py_ssize_t len, char fort) Copy contiguous *len* bytes from *buf* to *view*. *fort* can be "'C'" or "'F'" (for C-style or Fortran-style ordering). "0" is returned on success, "-1" on error. int PyBuffer_ToContiguous(void *buf, Py_buffer *src, Py_ssize_t len, char order) Copy *len* bytes from *src* to its contiguous representation in *buf*. *order* can be "'C'" or "'F'" or "'A'" (for C-style or Fortran-style ordering or either one). "0" is returned on success, "-1" on error. This function fails if *len* != *src->len*. void PyBuffer_FillContiguousStrides(int ndims, Py_ssize_t *shape, Py_ssize_t *strides, int itemsize, char order) Fill the *strides* array with byte-strides of a *contiguous* (C-style if *order* is "'C'" or Fortran-style if *order* is "'F'") array of the given shape with the given number of bytes per element. int PyBuffer_FillInfo(Py_buffer *view, PyObject *exporter, void *buf, Py_ssize_t len, int readonly, int flags) Handle buffer requests for an exporter that wants to expose *buf* of size *len* with writability set according to *readonly*. *buf* is interpreted as a sequence of unsigned bytes. The *flags* argument indicates the request type. This function always fills in *view* as specified by flags, unless *buf* has been designated as read-only and "PyBUF_WRITABLE" is set in *flags*. On success, set "view->obj" to a new reference to *exporter* and return 0. Otherwise, raise "PyExc_BufferError", set "view->obj" to "NULL" and return "-1"; If this function is used as part of a getbufferproc, *exporter* MUST be set to the exporting object and *flags* must be passed unmodified. Otherwise, *exporter* MUST be "NULL". Byte Array Objects ****************** PyByteArrayObject This subtype of "PyObject" represents a Python bytearray object. PyTypeObject PyByteArray_Type This instance of "PyTypeObject" represents the Python bytearray type; it is the same object as "bytearray" in the Python layer. Type check macros ================= int PyByteArray_Check(PyObject *o) Return true if the object *o* is a bytearray object or an instance of a subtype of the bytearray type. This function always succeeds. int PyByteArray_CheckExact(PyObject *o) Return true if the object *o* is a bytearray object, but not an instance of a subtype of the bytearray type. This function always succeeds. Direct API functions ==================== PyObject* PyByteArray_FromObject(PyObject *o) *Return value: New reference.* Return a new bytearray object from any object, *o*, that implements the buffer protocol. PyObject* PyByteArray_FromStringAndSize(const char *string, Py_ssize_t len) *Return value: New reference.* Create a new bytearray object from *string* and its length, *len*. On failure, "NULL" is returned. PyObject* PyByteArray_Concat(PyObject *a, PyObject *b) *Return value: New reference.* Concat bytearrays *a* and *b* and return a new bytearray with the result. Py_ssize_t PyByteArray_Size(PyObject *bytearray) Return the size of *bytearray* after checking for a "NULL" pointer. char* PyByteArray_AsString(PyObject *bytearray) Return the contents of *bytearray* as a char array after checking for a "NULL" pointer. The returned array always has an extra null byte appended. int PyByteArray_Resize(PyObject *bytearray, Py_ssize_t len) Resize the internal buffer of *bytearray* to *len*. Macros ====== These macros trade safety for speed and they don’t check pointers. char* PyByteArray_AS_STRING(PyObject *bytearray) Macro version of "PyByteArray_AsString()". Py_ssize_t PyByteArray_GET_SIZE(PyObject *bytearray) Macro version of "PyByteArray_Size()". Bytes Objects ************* These functions raise "TypeError" when expecting a bytes parameter and called with a non-bytes parameter. PyBytesObject This subtype of "PyObject" represents a Python bytes object. PyTypeObject PyBytes_Type This instance of "PyTypeObject" represents the Python bytes type; it is the same object as "bytes" in the Python layer. int PyBytes_Check(PyObject *o) Return true if the object *o* is a bytes object or an instance of a subtype of the bytes type. This function always succeeds. int PyBytes_CheckExact(PyObject *o) Return true if the object *o* is a bytes object, but not an instance of a subtype of the bytes type. This function always succeeds. PyObject* PyBytes_FromString(const char *v) *Return value: New reference.* Return a new bytes object with a copy of the string *v* as value on success, and "NULL" on failure. The parameter *v* must not be "NULL"; it will not be checked. PyObject* PyBytes_FromStringAndSize(const char *v, Py_ssize_t len) *Return value: New reference.* Return a new bytes object with a copy of the string *v* as value and length *len* on success, and "NULL" on failure. If *v* is "NULL", the contents of the bytes object are uninitialized. PyObject* PyBytes_FromFormat(const char *format, ...) *Return value: New reference.* Take a C "printf()"-style *format* string and a variable number of arguments, calculate the size of the resulting Python bytes object and return a bytes object with the values formatted into it. The variable arguments must be C types and must correspond exactly to the format characters in the *format* string. The following format characters are allowed: +---------------------+-----------------+----------------------------------+ | Format Characters | Type | Comment | |=====================|=================|==================================| | "%%" | *n/a* | The literal % character. | +---------------------+-----------------+----------------------------------+ | "%c" | int | A single byte, represented as a | | | | C int. | +---------------------+-----------------+----------------------------------+ | "%d" | int | Equivalent to "printf("%d")". | | | | [1] | +---------------------+-----------------+----------------------------------+ | "%u" | unsigned int | Equivalent to "printf("%u")". | | | | [1] | +---------------------+-----------------+----------------------------------+ | "%ld" | long | Equivalent to "printf("%ld")". | | | | [1] | +---------------------+-----------------+----------------------------------+ | "%lu" | unsigned long | Equivalent to "printf("%lu")". | | | | [1] | +---------------------+-----------------+----------------------------------+ | "%zd" | "Py_ssize_t" | Equivalent to "printf("%zd")". | | | | [1] | +---------------------+-----------------+----------------------------------+ | "%zu" | size_t | Equivalent to "printf("%zu")". | | | | [1] | +---------------------+-----------------+----------------------------------+ | "%i" | int | Equivalent to "printf("%i")". | | | | [1] | +---------------------+-----------------+----------------------------------+ | "%x" | int | Equivalent to "printf("%x")". | | | | [1] | +---------------------+-----------------+----------------------------------+ | "%s" | const char* | A null-terminated C character | | | | array. | +---------------------+-----------------+----------------------------------+ | "%p" | const void* | The hex representation of a C | | | | pointer. Mostly equivalent to | | | | "printf("%p")" except that it is | | | | guaranteed to start with the | | | | literal "0x" regardless of what | | | | the platform’s "printf" yields. | +---------------------+-----------------+----------------------------------+ An unrecognized format character causes all the rest of the format string to be copied as-is to the result object, and any extra arguments discarded. [1] For integer specifiers (d, u, ld, lu, zd, zu, i, x): the 0-conversion flag has effect even when a precision is given. PyObject* PyBytes_FromFormatV(const char *format, va_list vargs) *Return value: New reference.* Identical to "PyBytes_FromFormat()" except that it takes exactly two arguments. PyObject* PyBytes_FromObject(PyObject *o) *Return value: New reference.* Return the bytes representation of object *o* that implements the buffer protocol. Py_ssize_t PyBytes_Size(PyObject *o) Return the length of the bytes in bytes object *o*. Py_ssize_t PyBytes_GET_SIZE(PyObject *o) Macro form of "PyBytes_Size()" but without error checking. char* PyBytes_AsString(PyObject *o) Return a pointer to the contents of *o*. The pointer refers to the internal buffer of *o*, which consists of "len(o) + 1" bytes. The last byte in the buffer is always null, regardless of whether there are any other null bytes. The data must not be modified in any way, unless the object was just created using "PyBytes_FromStringAndSize(NULL, size)". It must not be deallocated. If *o* is not a bytes object at all, "PyBytes_AsString()" returns "NULL" and raises "TypeError". char* PyBytes_AS_STRING(PyObject *string) Macro form of "PyBytes_AsString()" but without error checking. int PyBytes_AsStringAndSize(PyObject *obj, char **buffer, Py_ssize_t *length) Return the null-terminated contents of the object *obj* through the output variables *buffer* and *length*. If *length* is "NULL", the bytes object may not contain embedded null bytes; if it does, the function returns "-1" and a "ValueError" is raised. The buffer refers to an internal buffer of *obj*, which includes an additional null byte at the end (not counted in *length*). The data must not be modified in any way, unless the object was just created using "PyBytes_FromStringAndSize(NULL, size)". It must not be deallocated. If *obj* is not a bytes object at all, "PyBytes_AsStringAndSize()" returns "-1" and raises "TypeError". Changed in version 3.5: Previously, "TypeError" was raised when embedded null bytes were encountered in the bytes object. void PyBytes_Concat(PyObject **bytes, PyObject *newpart) Create a new bytes object in **bytes* containing the contents of *newpart* appended to *bytes*; the caller will own the new reference. The reference to the old value of *bytes* will be stolen. If the new object cannot be created, the old reference to *bytes* will still be discarded and the value of **bytes* will be set to "NULL"; the appropriate exception will be set. void PyBytes_ConcatAndDel(PyObject **bytes, PyObject *newpart) Create a new bytes object in **bytes* containing the contents of *newpart* appended to *bytes*. This version decrements the reference count of *newpart*. int _PyBytes_Resize(PyObject **bytes, Py_ssize_t newsize) A way to resize a bytes object even though it is “immutable”. Only use this to build up a brand new bytes object; don’t use this if the bytes may already be known in other parts of the code. It is an error to call this function if the refcount on the input bytes object is not one. Pass the address of an existing bytes object as an lvalue (it may be written into), and the new size desired. On success, **bytes* holds the resized bytes object and "0" is returned; the address in **bytes* may differ from its input value. If the reallocation fails, the original bytes object at **bytes* is deallocated, **bytes* is set to "NULL", "MemoryError" is set, and "-1" is returned. Call Protocol ************* CPython supports two different calling protocols: *tp_call* and vectorcall. The *tp_call* Protocol ====================== Instances of classes that set "tp_call" are callable. The signature of the slot is: PyObject *tp_call(PyObject *callable, PyObject *args, PyObject *kwargs); A call is made using a tuple for the positional arguments and a dict for the keyword arguments, similarly to "callable(*args, **kwargs)" in Python code. *args* must be non-NULL (use an empty tuple if there are no arguments) but *kwargs* may be *NULL* if there are no keyword arguments. This convention is not only used by *tp_call*: "tp_new" and "tp_init" also pass arguments this way. To call an object, use "PyObject_Call()" or another call API. The Vectorcall Protocol ======================= New in version 3.9. The vectorcall protocol was introduced in **PEP 590** as an additional protocol for making calls more efficient. As rule of thumb, CPython will prefer the vectorcall for internal calls if the callable supports it. However, this is not a hard rule. Additionally, some third-party extensions use *tp_call* directly (rather than using "PyObject_Call()"). Therefore, a class supporting vectorcall must also implement "tp_call". Moreover, the callable must behave the same regardless of which protocol is used. The recommended way to achieve this is by setting "tp_call" to "PyVectorcall_Call()". This bears repeating: Warning: A class supporting vectorcall **must** also implement "tp_call" with the same semantics. A class should not implement vectorcall if that would be slower than *tp_call*. For example, if the callee needs to convert the arguments to an args tuple and kwargs dict anyway, then there is no point in implementing vectorcall. Classes can implement the vectorcall protocol by enabling the "Py_TPFLAGS_HAVE_VECTORCALL" flag and setting "tp_vectorcall_offset" to the offset inside the object structure where a *vectorcallfunc* appears. This is a pointer to a function with the following signature: PyObject *(*vectorcallfunc)(PyObject *callable, PyObject *const *args, size_t nargsf, PyObject *kwnames) * *callable* is the object being called. * *args* is a C array consisting of the positional arguments followed by the values of the keyword arguments. This can be *NULL* if there are no arguments. * *nargsf* is the number of positional arguments plus possibly the "PY_VECTORCALL_ARGUMENTS_OFFSET" flag. To get the actual number of positional arguments from *nargsf*, use "PyVectorcall_NARGS()". * *kwnames* is a tuple containing the names of the keyword arguments; in other words, the keys of the kwargs dict. These names must be strings (instances of "str" or a subclass) and they must be unique. If there are no keyword arguments, then *kwnames* can instead be *NULL*. PY_VECTORCALL_ARGUMENTS_OFFSET If this flag is set in a vectorcall *nargsf* argument, the callee is allowed to temporarily change "args[-1]". In other words, *args* points to argument 1 (not 0) in the allocated vector. The callee must restore the value of "args[-1]" before returning. For "PyObject_VectorcallMethod()", this flag means instead that "args[0]" may be changed. Whenever they can do so cheaply (without additional allocation), callers are encouraged to use "PY_VECTORCALL_ARGUMENTS_OFFSET". Doing so will allow callables such as bound methods to make their onward calls (which include a prepended *self* argument) very efficiently. To call an object that implements vectorcall, use a call API function as with any other callable. "PyObject_Vectorcall()" will usually be most efficient. Note: In CPython 3.8, the vectorcall API and related functions were available provisionally under names with a leading underscore: "_PyObject_Vectorcall", "_Py_TPFLAGS_HAVE_VECTORCALL", "_PyObject_VectorcallMethod", "_PyVectorcall_Function", "_PyObject_CallOneArg", "_PyObject_CallMethodNoArgs", "_PyObject_CallMethodOneArg". Additionally, "PyObject_VectorcallDict" was available as "_PyObject_FastCallDict". The old names are still defined as aliases of the new, non- underscored names. Recursion Control ----------------- When using *tp_call*, callees do not need to worry about recursion: CPython uses "Py_EnterRecursiveCall()" and "Py_LeaveRecursiveCall()" for calls made using *tp_call*. For efficiency, this is not the case for calls done using vectorcall: the callee should use *Py_EnterRecursiveCall* and *Py_LeaveRecursiveCall* if needed. Vectorcall Support API ---------------------- Py_ssize_t PyVectorcall_NARGS(size_t nargsf) Given a vectorcall *nargsf* argument, return the actual number of arguments. Currently equivalent to: (Py_ssize_t)(nargsf & ~PY_VECTORCALL_ARGUMENTS_OFFSET) However, the function "PyVectorcall_NARGS" should be used to allow for future extensions. This function is not part of the limited API. New in version 3.8. vectorcallfunc PyVectorcall_Function(PyObject *op) If *op* does not support the vectorcall protocol (either because the type does not or because the specific instance does not), return *NULL*. Otherwise, return the vectorcall function pointer stored in *op*. This function never raises an exception. This is mostly useful to check whether or not *op* supports vectorcall, which can be done by checking "PyVectorcall_Function(op) != NULL". This function is not part of the limited API. New in version 3.8. PyObject* PyVectorcall_Call(PyObject *callable, PyObject *tuple, PyObject *dict) Call *callable*’s "vectorcallfunc" with positional and keyword arguments given in a tuple and dict, respectively. This is a specialized function, intended to be put in the "tp_call" slot or be used in an implementation of "tp_call". It does not check the "Py_TPFLAGS_HAVE_VECTORCALL" flag and it does not fall back to "tp_call". This function is not part of the limited API. New in version 3.8. Object Calling API ================== Various functions are available for calling a Python object. Each converts its arguments to a convention supported by the called object – either *tp_call* or vectorcall. In order to do as little conversion as possible, pick one that best fits the format of data you have available. The following table summarizes the available functions; please see individual documentation for details. +--------------------------------------------+--------------------+----------------------+-----------------+ | Function | callable | args | kwargs | |============================================|====================|======================|=================| | "PyObject_Call()" | "PyObject *" | tuple | dict/"NULL" | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallNoArgs()" | "PyObject *" | — | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallOneArg()" | "PyObject *" | 1 object | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallObject()" | "PyObject *" | tuple/"NULL" | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallFunction()" | "PyObject *" | format | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallMethod()" | obj + "char*" | format | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallFunctionObjArgs()" | "PyObject *" | variadic | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallMethodObjArgs()" | obj + name | variadic | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallMethodNoArgs()" | obj + name | — | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_CallMethodOneArg()" | obj + name | 1 object | — | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_Vectorcall()" | "PyObject *" | vectorcall | vectorcall | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_VectorcallDict()" | "PyObject *" | vectorcall | dict/"NULL" | +--------------------------------------------+--------------------+----------------------+-----------------+ | "PyObject_VectorcallMethod()" | arg + name | vectorcall | vectorcall | +--------------------------------------------+--------------------+----------------------+-----------------+ PyObject* PyObject_Call(PyObject *callable, PyObject *args, PyObject *kwargs) *Return value: New reference.* Call a callable Python object *callable*, with arguments given by the tuple *args*, and named arguments given by the dictionary *kwargs*. *args* must not be *NULL*; use an empty tuple if no arguments are needed. If no named arguments are needed, *kwargs* can be *NULL*. Return the result of the call on success, or raise an exception and return *NULL* on failure. This is the equivalent of the Python expression: "callable(*args, **kwargs)". PyObject* PyObject_CallNoArgs(PyObject *callable) Call a callable Python object *callable* without any arguments. It is the most efficient way to call a callable Python object without any argument. Return the result of the call on success, or raise an exception and return *NULL* on failure. New in version 3.9. PyObject* PyObject_CallOneArg(PyObject *callable, PyObject *arg) Call a callable Python object *callable* with exactly 1 positional argument *arg* and no keyword arguments. Return the result of the call on success, or raise an exception and return *NULL* on failure. This function is not part of the limited API. New in version 3.9. PyObject* PyObject_CallObject(PyObject *callable, PyObject *args) *Return value: New reference.* Call a callable Python object *callable*, with arguments given by the tuple *args*. If no arguments are needed, then *args* can be *NULL*. Return the result of the call on success, or raise an exception and return *NULL* on failure. This is the equivalent of the Python expression: "callable(*args)". PyObject* PyObject_CallFunction(PyObject *callable, const char *format, ...) *Return value: New reference.* Call a callable Python object *callable*, with a variable number of C arguments. The C arguments are described using a "Py_BuildValue()" style format string. The format can be *NULL*, indicating that no arguments are provided. Return the result of the call on success, or raise an exception and return *NULL* on failure. This is the equivalent of the Python expression: "callable(*args)". Note that if you only pass "PyObject *" args, "PyObject_CallFunctionObjArgs()" is a faster alternative. Changed in version 3.4: The type of *format* was changed from "char *". PyObject* PyObject_CallMethod(PyObject *obj, const char *name, const char *format, ...) *Return value: New reference.* Call the method named *name* of object *obj* with a variable number of C arguments. The C arguments are described by a "Py_BuildValue()" format string that should produce a tuple. The format can be *NULL*, indicating that no arguments are provided. Return the result of the call on success, or raise an exception and return *NULL* on failure. This is the equivalent of the Python expression: "obj.name(arg1, arg2, ...)". Note that if you only pass "PyObject *" args, "PyObject_CallMethodObjArgs()" is a faster alternative. Changed in version 3.4: The types of *name* and *format* were changed from "char *". PyObject* PyObject_CallFunctionObjArgs(PyObject *callable, ...) *Return value: New reference.* Call a callable Python object *callable*, with a variable number of "PyObject *" arguments. The arguments are provided as a variable number of parameters followed by *NULL*. Return the result of the call on success, or raise an exception and return *NULL* on failure. This is the equivalent of the Python expression: "callable(arg1, arg2, ...)". PyObject* PyObject_CallMethodObjArgs(PyObject *obj, PyObject *name, ...) *Return value: New reference.* Call a method of the Python object *obj*, where the name of the method is given as a Python string object in *name*. It is called with a variable number of "PyObject *" arguments. The arguments are provided as a variable number of parameters followed by *NULL*. Return the result of the call on success, or raise an exception and return *NULL* on failure. PyObject* PyObject_CallMethodNoArgs(PyObject *obj, PyObject *name) Call a method of the Python object *obj* without arguments, where the name of the method is given as a Python string object in *name*. Return the result of the call on success, or raise an exception and return *NULL* on failure. This function is not part of the limited API. New in version 3.9. PyObject* PyObject_CallMethodOneArg(PyObject *obj, PyObject *name, PyObject *arg) Call a method of the Python object *obj* with a single positional argument *arg*, where the name of the method is given as a Python string object in *name*. Return the result of the call on success, or raise an exception and return *NULL* on failure. This function is not part of the limited API. New in version 3.9. PyObject* PyObject_Vectorcall(PyObject *callable, PyObject *const *args, size_t nargsf, PyObject *kwnames) Call a callable Python object *callable*. The arguments are the same as for "vectorcallfunc". If *callable* supports vectorcall, this directly calls the vectorcall function stored in *callable*. Return the result of the call on success, or raise an exception and return *NULL* on failure. This function is not part of the limited API. New in version 3.9. PyObject* PyObject_VectorcallDict(PyObject *callable, PyObject *const *args, size_t nargsf, PyObject *kwdict) Call *callable* with positional arguments passed exactly as in the vectorcall protocol, but with keyword arguments passed as a dictionary *kwdict*. The *args* array contains only the positional arguments. Regardless of which protocol is used internally, a conversion of arguments needs to be done. Therefore, this function should only be used if the caller already has a dictionary ready to use for the keyword arguments, but not a tuple for the positional arguments. This function is not part of the limited API. New in version 3.9. PyObject* PyObject_VectorcallMethod(PyObject *name, PyObject *const *args, size_t nargsf, PyObject *kwnames) Call a method using the vectorcall calling convention. The name of the method is given as a Python string *name*. The object whose method is called is *args[0]*, and the *args* array starting at *args[1]* represents the arguments of the call. There must be at least one positional argument. *nargsf* is the number of positional arguments including *args[0]*, plus "PY_VECTORCALL_ARGUMENTS_OFFSET" if the value of "args[0]" may temporarily be changed. Keyword arguments can be passed just like in "PyObject_Vectorcall()". If the object has the "Py_TPFLAGS_METHOD_DESCRIPTOR" feature, this will call the unbound method object with the full *args* vector as arguments. Return the result of the call on success, or raise an exception and return *NULL* on failure. This function is not part of the limited API. New in version 3.9. Call Support API ================ int PyCallable_Check(PyObject *o) Determine if the object *o* is callable. Return "1" if the object is callable and "0" otherwise. This function always succeeds. Capsules ******** Refer to Providing a C API for an Extension Module for more information on using these objects. New in version 3.1. PyCapsule This subtype of "PyObject" represents an opaque value, useful for C extension modules who need to pass an opaque value (as a "void*" pointer) through Python code to other C code. It is often used to make a C function pointer defined in one module available to other modules, so the regular import mechanism can be used to access C APIs defined in dynamically loaded modules. PyCapsule_Destructor The type of a destructor callback for a capsule. Defined as: typedef void (*PyCapsule_Destructor)(PyObject *); See "PyCapsule_New()" for the semantics of PyCapsule_Destructor callbacks. int PyCapsule_CheckExact(PyObject *p) Return true if its argument is a "PyCapsule". This function always succeeds. PyObject* PyCapsule_New(void *pointer, const char *name, PyCapsule_Destructor destructor) *Return value: New reference.* Create a "PyCapsule" encapsulating the *pointer*. The *pointer* argument may not be "NULL". On failure, set an exception and return "NULL". The *name* string may either be "NULL" or a pointer to a valid C string. If non-"NULL", this string must outlive the capsule. (Though it is permitted to free it inside the *destructor*.) If the *destructor* argument is not "NULL", it will be called with the capsule as its argument when it is destroyed. If this capsule will be stored as an attribute of a module, the *name* should be specified as "modulename.attributename". This will enable other modules to import the capsule using "PyCapsule_Import()". void* PyCapsule_GetPointer(PyObject *capsule, const char *name) Retrieve the *pointer* stored in the capsule. On failure, set an exception and return "NULL". The *name* parameter must compare exactly to the name stored in the capsule. If the name stored in the capsule is "NULL", the *name* passed in must also be "NULL". Python uses the C function "strcmp()" to compare capsule names. PyCapsule_Destructor PyCapsule_GetDestructor(PyObject *capsule) Return the current destructor stored in the capsule. On failure, set an exception and return "NULL". It is legal for a capsule to have a "NULL" destructor. This makes a "NULL" return code somewhat ambiguous; use "PyCapsule_IsValid()" or "PyErr_Occurred()" to disambiguate. void* PyCapsule_GetContext(PyObject *capsule) Return the current context stored in the capsule. On failure, set an exception and return "NULL". It is legal for a capsule to have a "NULL" context. This makes a "NULL" return code somewhat ambiguous; use "PyCapsule_IsValid()" or "PyErr_Occurred()" to disambiguate. const char* PyCapsule_GetName(PyObject *capsule) Return the current name stored in the capsule. On failure, set an exception and return "NULL". It is legal for a capsule to have a "NULL" name. This makes a "NULL" return code somewhat ambiguous; use "PyCapsule_IsValid()" or "PyErr_Occurred()" to disambiguate. void* PyCapsule_Import(const char *name, int no_block) Import a pointer to a C object from a capsule attribute in a module. The *name* parameter should specify the full name to the attribute, as in "module.attribute". The *name* stored in the capsule must match this string exactly. If *no_block* is true, import the module without blocking (using "PyImport_ImportModuleNoBlock()"). If *no_block* is false, import the module conventionally (using "PyImport_ImportModule()"). Return the capsule’s internal *pointer* on success. On failure, set an exception and return "NULL". int PyCapsule_IsValid(PyObject *capsule, const char *name) Determines whether or not *capsule* is a valid capsule. A valid capsule is non-"NULL", passes "PyCapsule_CheckExact()", has a non-"NULL" pointer stored in it, and its internal name matches the *name* parameter. (See "PyCapsule_GetPointer()" for information on how capsule names are compared.) In other words, if "PyCapsule_IsValid()" returns a true value, calls to any of the accessors (any function starting with "PyCapsule_Get()") are guaranteed to succeed. Return a nonzero value if the object is valid and matches the name passed in. Return "0" otherwise. This function will not fail. int PyCapsule_SetContext(PyObject *capsule, void *context) Set the context pointer inside *capsule* to *context*. Return "0" on success. Return nonzero and set an exception on failure. int PyCapsule_SetDestructor(PyObject *capsule, PyCapsule_Destructor destructor) Set the destructor inside *capsule* to *destructor*. Return "0" on success. Return nonzero and set an exception on failure. int PyCapsule_SetName(PyObject *capsule, const char *name) Set the name inside *capsule* to *name*. If non-"NULL", the name must outlive the capsule. If the previous *name* stored in the capsule was not "NULL", no attempt is made to free it. Return "0" on success. Return nonzero and set an exception on failure. int PyCapsule_SetPointer(PyObject *capsule, void *pointer) Set the void pointer inside *capsule* to *pointer*. The pointer may not be "NULL". Return "0" on success. Return nonzero and set an exception on failure. Cell Objects ************ “Cell” objects are used to implement variables referenced by multiple scopes. For each such variable, a cell object is created to store the value; the local variables of each stack frame that references the value contains a reference to the cells from outer scopes which also use that variable. When the value is accessed, the value contained in the cell is used instead of the cell object itself. This de- referencing of the cell object requires support from the generated byte-code; these are not automatically de-referenced when accessed. Cell objects are not likely to be useful elsewhere. PyCellObject The C structure used for cell objects. PyTypeObject PyCell_Type The type object corresponding to cell objects. int PyCell_Check(ob) Return true if *ob* is a cell object; *ob* must not be "NULL". This function always succeeds. PyObject* PyCell_New(PyObject *ob) *Return value: New reference.* Create and return a new cell object containing the value *ob*. The parameter may be "NULL". PyObject* PyCell_Get(PyObject *cell) *Return value: New reference.* Return the contents of the cell *cell*. PyObject* PyCell_GET(PyObject *cell) *Return value: Borrowed reference.* Return the contents of the cell *cell*, but without checking that *cell* is non-"NULL" and a cell object. int PyCell_Set(PyObject *cell, PyObject *value) Set the contents of the cell object *cell* to *value*. This releases the reference to any current content of the cell. *value* may be "NULL". *cell* must be non-"NULL"; if it is not a cell object, "-1" will be returned. On success, "0" will be returned. void PyCell_SET(PyObject *cell, PyObject *value) Sets the value of the cell object *cell* to *value*. No reference counts are adjusted, and no checks are made for safety; *cell* must be non-"NULL" and must be a cell object. Code Objects ************ Code objects are a low-level detail of the CPython implementation. Each one represents a chunk of executable code that hasn’t yet been bound into a function. PyCodeObject The C structure of the objects used to describe code objects. The fields of this type are subject to change at any time. PyTypeObject PyCode_Type This is an instance of "PyTypeObject" representing the Python "code" type. int PyCode_Check(PyObject *co) Return true if *co* is a "code" object. This function always succeeds. int PyCode_GetNumFree(PyCodeObject *co) Return the number of free variables in *co*. PyCodeObject* PyCode_New(int argcount, int kwonlyargcount, int nlocals, int stacksize, int flags, PyObject *code, PyObject *consts, PyObject *names, PyObject *varnames, PyObject *freevars, PyObject *cellvars, PyObject *filename, PyObject *name, int firstlineno, PyObject *lnotab) *Return value: New reference.* Return a new code object. If you need a dummy code object to create a frame, use "PyCode_NewEmpty()" instead. Calling "PyCode_New()" directly can bind you to a precise Python version since the definition of the bytecode changes often. PyCodeObject* PyCode_NewWithPosOnlyArgs(int argcount, int posonlyargcount, int kwonlyargcount, int nlocals, int stacksize, int flags, PyObject *code, PyObject *consts, PyObject *names, PyObject *varnames, PyObject *freevars, PyObject *cellvars, PyObject *filename, PyObject *name, int firstlineno, PyObject *lnotab) *Return value: New reference.* Similar to "PyCode_New()", but with an extra “posonlyargcount” for positional-only arguments. New in version 3.8. PyCodeObject* PyCode_NewEmpty(const char *filename, const char *funcname, int firstlineno) *Return value: New reference.* Return a new empty code object with the specified filename, function name, and first line number. It is illegal to "exec()" or "eval()" the resulting code object. Codec registry and support functions ************************************ int PyCodec_Register(PyObject *search_function) Register a new codec search function. As side effect, this tries to load the "encodings" package, if not yet done, to make sure that it is always first in the list of search functions. int PyCodec_KnownEncoding(const char *encoding) Return "1" or "0" depending on whether there is a registered codec for the given *encoding*. This function always succeeds. PyObject* PyCodec_Encode(PyObject *object, const char *encoding, const char *errors) *Return value: New reference.* Generic codec based encoding API. *object* is passed through the encoder function found for the given *encoding* using the error handling method defined by *errors*. *errors* may be "NULL" to use the default method defined for the codec. Raises a "LookupError" if no encoder can be found. PyObject* PyCodec_Decode(PyObject *object, const char *encoding, const char *errors) *Return value: New reference.* Generic codec based decoding API. *object* is passed through the decoder function found for the given *encoding* using the error handling method defined by *errors*. *errors* may be "NULL" to use the default method defined for the codec. Raises a "LookupError" if no encoder can be found. Codec lookup API ================ In the following functions, the *encoding* string is looked up converted to all lower-case characters, which makes encodings looked up through this mechanism effectively case-insensitive. If no codec is found, a "KeyError" is set and "NULL" returned. PyObject* PyCodec_Encoder(const char *encoding) *Return value: New reference.* Get an encoder function for the given *encoding*. PyObject* PyCodec_Decoder(const char *encoding) *Return value: New reference.* Get a decoder function for the given *encoding*. PyObject* PyCodec_IncrementalEncoder(const char *encoding, const char *errors) *Return value: New reference.* Get an "IncrementalEncoder" object for the given *encoding*. PyObject* PyCodec_IncrementalDecoder(const char *encoding, const char *errors) *Return value: New reference.* Get an "IncrementalDecoder" object for the given *encoding*. PyObject* PyCodec_StreamReader(const char *encoding, PyObject *stream, const char *errors) *Return value: New reference.* Get a "StreamReader" factory function for the given *encoding*. PyObject* PyCodec_StreamWriter(const char *encoding, PyObject *stream, const char *errors) *Return value: New reference.* Get a "StreamWriter" factory function for the given *encoding*. Registry API for Unicode encoding error handlers ================================================ int PyCodec_RegisterError(const char *name, PyObject *error) Register the error handling callback function *error* under the given *name*. This callback function will be called by a codec when it encounters unencodable characters/undecodable bytes and *name* is specified as the error parameter in the call to the encode/decode function. The callback gets a single argument, an instance of "UnicodeEncodeError", "UnicodeDecodeError" or "UnicodeTranslateError" that holds information about the problematic sequence of characters or bytes and their offset in the original string (see Unicode Exception Objects for functions to extract this information). The callback must either raise the given exception, or return a two-item tuple containing the replacement for the problematic sequence, and an integer giving the offset in the original string at which encoding/decoding should be resumed. Return "0" on success, "-1" on error. PyObject* PyCodec_LookupError(const char *name) *Return value: New reference.* Lookup the error handling callback function registered under *name*. As a special case "NULL" can be passed, in which case the error handling callback for “strict” will be returned. PyObject* PyCodec_StrictErrors(PyObject *exc) *Return value: Always NULL.* Raise *exc* as an exception. PyObject* PyCodec_IgnoreErrors(PyObject *exc) *Return value: New reference.* Ignore the unicode error, skipping the faulty input. PyObject* PyCodec_ReplaceErrors(PyObject *exc) *Return value: New reference.* Replace the unicode encode error with "?" or "U+FFFD". PyObject* PyCodec_XMLCharRefReplaceErrors(PyObject *exc) *Return value: New reference.* Replace the unicode encode error with XML character references. PyObject* PyCodec_BackslashReplaceErrors(PyObject *exc) *Return value: New reference.* Replace the unicode encode error with backslash escapes ("\x", "\u" and "\U"). PyObject* PyCodec_NameReplaceErrors(PyObject *exc) *Return value: New reference.* Replace the unicode encode error with "\N{...}" escapes. New in version 3.5. Complex Number Objects ********************** Python’s complex number objects are implemented as two distinct types when viewed from the C API: one is the Python object exposed to Python programs, and the other is a C structure which represents the actual complex number value. The API provides functions for working with both. Complex Numbers as C Structures =============================== Note that the functions which accept these structures as parameters and return them as results do so *by value* rather than dereferencing them through pointers. This is consistent throughout the API. Py_complex The C structure which corresponds to the value portion of a Python complex number object. Most of the functions for dealing with complex number objects use structures of this type as input or output values, as appropriate. It is defined as: typedef struct { double real; double imag; } Py_complex; Py_complex _Py_c_sum(Py_complex left, Py_complex right) Return the sum of two complex numbers, using the C "Py_complex" representation. Py_complex _Py_c_diff(Py_complex left, Py_complex right) Return the difference between two complex numbers, using the C "Py_complex" representation. Py_complex _Py_c_neg(Py_complex num) Return the negation of the complex number *num*, using the C "Py_complex" representation. Py_complex _Py_c_prod(Py_complex left, Py_complex right) Return the product of two complex numbers, using the C "Py_complex" representation. Py_complex _Py_c_quot(Py_complex dividend, Py_complex divisor) Return the quotient of two complex numbers, using the C "Py_complex" representation. If *divisor* is null, this method returns zero and sets "errno" to "EDOM". Py_complex _Py_c_pow(Py_complex num, Py_complex exp) Return the exponentiation of *num* by *exp*, using the C "Py_complex" representation. If *num* is null and *exp* is not a positive real number, this method returns zero and sets "errno" to "EDOM". Complex Numbers as Python Objects ================================= PyComplexObject This subtype of "PyObject" represents a Python complex number object. PyTypeObject PyComplex_Type This instance of "PyTypeObject" represents the Python complex number type. It is the same object as "complex" in the Python layer. int PyComplex_Check(PyObject *p) Return true if its argument is a "PyComplexObject" or a subtype of "PyComplexObject". This function always succeeds. int PyComplex_CheckExact(PyObject *p) Return true if its argument is a "PyComplexObject", but not a subtype of "PyComplexObject". This function always succeeds. PyObject* PyComplex_FromCComplex(Py_complex v) *Return value: New reference.* Create a new Python complex number object from a C "Py_complex" value. PyObject* PyComplex_FromDoubles(double real, double imag) *Return value: New reference.* Return a new "PyComplexObject" object from *real* and *imag*. double PyComplex_RealAsDouble(PyObject *op) Return the real part of *op* as a C "double". double PyComplex_ImagAsDouble(PyObject *op) Return the imaginary part of *op* as a C "double". Py_complex PyComplex_AsCComplex(PyObject *op) Return the "Py_complex" value of the complex number *op*. If *op* is not a Python complex number object but has a "__complex__()" method, this method will first be called to convert *op* to a Python complex number object. If "__complex__()" is not defined then it falls back to "__float__()". If "__float__()" is not defined then it falls back to "__index__()". Upon failure, this method returns "-1.0" as a real value. Changed in version 3.8: Use "__index__()" if available. Concrete Objects Layer ********************** The functions in this chapter are specific to certain Python object types. Passing them an object of the wrong type is not a good idea; if you receive an object from a Python program and you are not sure that it has the right type, you must perform a type check first; for example, to check that an object is a dictionary, use "PyDict_Check()". The chapter is structured like the “family tree” of Python object types. Warning: While the functions described in this chapter carefully check the type of the objects which are passed in, many of them do not check for "NULL" being passed instead of a valid object. Allowing "NULL" to be passed in can cause memory access violations and immediate termination of the interpreter. Fundamental Objects =================== This section describes Python type objects and the singleton object "None". * Type Objects * Creating Heap-Allocated Types * The "None" Object Numeric Objects =============== * Integer Objects * Boolean Objects * Floating Point Objects * Complex Number Objects * Complex Numbers as C Structures * Complex Numbers as Python Objects Sequence Objects ================ Generic operations on sequence objects were discussed in the previous chapter; this section deals with the specific kinds of sequence objects that are intrinsic to the Python language. * Bytes Objects * Byte Array Objects * Type check macros * Direct API functions * Macros * Unicode Objects and Codecs * Unicode Objects * Unicode Type * Unicode Character Properties * Creating and accessing Unicode strings * Deprecated Py_UNICODE APIs * Locale Encoding * File System Encoding * wchar_t Support * Built-in Codecs * Generic Codecs * UTF-8 Codecs * UTF-32 Codecs * UTF-16 Codecs * UTF-7 Codecs * Unicode-Escape Codecs * Raw-Unicode-Escape Codecs * Latin-1 Codecs * ASCII Codecs * Character Map Codecs * MBCS codecs for Windows * Methods & Slots * Methods and Slot Functions * Tuple Objects * Struct Sequence Objects * List Objects Container Objects ================= * Dictionary Objects * Set Objects Function Objects ================ * Function Objects * Instance Method Objects * Method Objects * Cell Objects * Code Objects Other Objects ============= * File Objects * Module Objects * Initializing C modules * Single-phase initialization * Multi-phase initialization * Low-level module creation functions * Support functions * Module lookup * Iterator Objects * Descriptor Objects * Slice Objects * Ellipsis Object * MemoryView objects * Weak Reference Objects * Capsules * Generator Objects * Coroutine Objects * Context Variables Objects * DateTime Objects * Objects for Type Hinting Context Variables Objects ************************* Note: Changed in version 3.7.1: In Python 3.7.1 the signatures of all context variables C APIs were **changed** to use "PyObject" pointers instead of "PyContext", "PyContextVar", and "PyContextToken", e.g.: // in 3.7.0: PyContext *PyContext_New(void); // in 3.7.1+: PyObject *PyContext_New(void); See bpo-34762 for more details. New in version 3.7. This section details the public C API for the "contextvars" module. PyContext The C structure used to represent a "contextvars.Context" object. PyContextVar The C structure used to represent a "contextvars.ContextVar" object. PyContextToken The C structure used to represent a "contextvars.Token" object. PyTypeObject PyContext_Type The type object representing the *context* type. PyTypeObject PyContextVar_Type The type object representing the *context variable* type. PyTypeObject PyContextToken_Type The type object representing the *context variable token* type. Type-check macros: int PyContext_CheckExact(PyObject *o) Return true if *o* is of type "PyContext_Type". *o* must not be "NULL". This function always succeeds. int PyContextVar_CheckExact(PyObject *o) Return true if *o* is of type "PyContextVar_Type". *o* must not be "NULL". This function always succeeds. int PyContextToken_CheckExact(PyObject *o) Return true if *o* is of type "PyContextToken_Type". *o* must not be "NULL". This function always succeeds. Context object management functions: PyObject *PyContext_New(void) *Return value: New reference.* Create a new empty context object. Returns "NULL" if an error has occurred. PyObject *PyContext_Copy(PyObject *ctx) *Return value: New reference.* Create a shallow copy of the passed *ctx* context object. Returns "NULL" if an error has occurred. PyObject *PyContext_CopyCurrent(void) *Return value: New reference.* Create a shallow copy of the current thread context. Returns "NULL" if an error has occurred. int PyContext_Enter(PyObject *ctx) Set *ctx* as the current context for the current thread. Returns "0" on success, and "-1" on error. int PyContext_Exit(PyObject *ctx) Deactivate the *ctx* context and restore the previous context as the current context for the current thread. Returns "0" on success, and "-1" on error. Context variable functions: PyObject *PyContextVar_New(const char *name, PyObject *def) *Return value: New reference.* Create a new "ContextVar" object. The *name* parameter is used for introspection and debug purposes. The *def* parameter specifies a default value for the context variable, or "NULL" for no default. If an error has occurred, this function returns "NULL". int PyContextVar_Get(PyObject *var, PyObject *default_value, PyObject **value) Get the value of a context variable. Returns "-1" if an error has occurred during lookup, and "0" if no error occurred, whether or not a value was found. If the context variable was found, *value* will be a pointer to it. If the context variable was *not* found, *value* will point to: * *default_value*, if not "NULL"; * the default value of *var*, if not "NULL"; * "NULL" Except for "NULL", the function returns a new reference. PyObject *PyContextVar_Set(PyObject *var, PyObject *value) *Return value: New reference.* Set the value of *var* to *value* in the current context. Returns a new token object for this change, or "NULL" if an error has occurred. int PyContextVar_Reset(PyObject *var, PyObject *token) Reset the state of the *var* context variable to that it was in before "PyContextVar_Set()" that returned the *token* was called. This function returns "0" on success and "-1" on error. String conversion and formatting ******************************** Functions for number conversion and formatted string output. int PyOS_snprintf(char *str, size_t size, const char *format, ...) Output not more than *size* bytes to *str* according to the format string *format* and the extra arguments. See the Unix man page *snprintf(3)*. int PyOS_vsnprintf(char *str, size_t size, const char *format, va_list va) Output not more than *size* bytes to *str* according to the format string *format* and the variable argument list *va*. Unix man page *vsnprintf(3)*. "PyOS_snprintf()" and "PyOS_vsnprintf()" wrap the Standard C library functions "snprintf()" and "vsnprintf()". Their purpose is to guarantee consistent behavior in corner cases, which the Standard C functions do not. The wrappers ensure that "str[size-1]" is always "'\0'" upon return. They never write more than *size* bytes (including the trailing "'\0'") into str. Both functions require that "str != NULL", "size > 0" and "format != NULL". If the platform doesn’t have "vsnprintf()" and the buffer size needed to avoid truncation exceeds *size* by more than 512 bytes, Python aborts with a "Py_FatalError()". The return value (*rv*) for these functions should be interpreted as follows: * When "0 <= rv < size", the output conversion was successful and *rv* characters were written to *str* (excluding the trailing "'\0'" byte at "str[rv]"). * When "rv >= size", the output conversion was truncated and a buffer with "rv + 1" bytes would have been needed to succeed. "str[size-1]" is "'\0'" in this case. * When "rv < 0", “something bad happened.” "str[size-1]" is "'\0'" in this case too, but the rest of *str* is undefined. The exact cause of the error depends on the underlying platform. The following functions provide locale-independent string to number conversions. double PyOS_string_to_double(const char *s, char **endptr, PyObject *overflow_exception) Convert a string "s" to a "double", raising a Python exception on failure. The set of accepted strings corresponds to the set of strings accepted by Python’s "float()" constructor, except that "s" must not have leading or trailing whitespace. The conversion is independent of the current locale. If "endptr" is "NULL", convert the whole string. Raise "ValueError" and return "-1.0" if the string is not a valid representation of a floating-point number. If endptr is not "NULL", convert as much of the string as possible and set "*endptr" to point to the first unconverted character. If no initial segment of the string is the valid representation of a floating-point number, set "*endptr" to point to the beginning of the string, raise ValueError, and return "-1.0". If "s" represents a value that is too large to store in a float (for example, ""1e500"" is such a string on many platforms) then if "overflow_exception" is "NULL" return "Py_HUGE_VAL" (with an appropriate sign) and don’t set any exception. Otherwise, "overflow_exception" must point to a Python exception object; raise that exception and return "-1.0". In both cases, set "*endptr" to point to the first character after the converted value. If any other error occurs during the conversion (for example an out-of-memory error), set the appropriate Python exception and return "-1.0". New in version 3.1. char* PyOS_double_to_string(double val, char format_code, int precision, int flags, int *ptype) Convert a "double" *val* to a string using supplied *format_code*, *precision*, and *flags*. *format_code* must be one of "'e'", "'E'", "'f'", "'F'", "'g'", "'G'" or "'r'". For "'r'", the supplied *precision* must be 0 and is ignored. The "'r'" format code specifies the standard "repr()" format. *flags* can be zero or more of the values "Py_DTSF_SIGN", "Py_DTSF_ADD_DOT_0", or "Py_DTSF_ALT", or-ed together: * "Py_DTSF_SIGN" means to always precede the returned string with a sign character, even if *val* is non-negative. * "Py_DTSF_ADD_DOT_0" means to ensure that the returned string will not look like an integer. * "Py_DTSF_ALT" means to apply “alternate” formatting rules. See the documentation for the "PyOS_snprintf()" "'#'" specifier for details. If *ptype* is non-"NULL", then the value it points to will be set to one of "Py_DTST_FINITE", "Py_DTST_INFINITE", or "Py_DTST_NAN", signifying that *val* is a finite number, an infinite number, or not a number, respectively. The return value is a pointer to *buffer* with the converted string or "NULL" if the conversion failed. The caller is responsible for freeing the returned string by calling "PyMem_Free()". New in version 3.1. int PyOS_stricmp(const char *s1, const char *s2) Case insensitive comparison of strings. The function works almost identically to "strcmp()" except that it ignores the case. int PyOS_strnicmp(const char *s1, const char *s2, Py_ssize_t  size) Case insensitive comparison of strings. The function works almost identically to "strncmp()" except that it ignores the case. Coroutine Objects ***************** New in version 3.5. Coroutine objects are what functions declared with an "async" keyword return. PyCoroObject The C structure used for coroutine objects. PyTypeObject PyCoro_Type The type object corresponding to coroutine objects. int PyCoro_CheckExact(PyObject *ob) Return true if *ob*’s type is "PyCoro_Type"; *ob* must not be "NULL". This function always succeeds. PyObject* PyCoro_New(PyFrameObject *frame, PyObject *name, PyObject *qualname) *Return value: New reference.* Create and return a new coroutine object based on the *frame* object, with "__name__" and "__qualname__" set to *name* and *qualname*. A reference to *frame* is stolen by this function. The *frame* argument must not be "NULL". DateTime Objects **************** Various date and time objects are supplied by the "datetime" module. Before using any of these functions, the header file "datetime.h" must be included in your source (note that this is not included by "Python.h"), and the macro "PyDateTime_IMPORT" must be invoked, usually as part of the module initialisation function. The macro puts a pointer to a C structure into a static variable, "PyDateTimeAPI", that is used by the following macros. Macro for access to the UTC singleton: PyObject* PyDateTime_TimeZone_UTC Returns the time zone singleton representing UTC, the same object as "datetime.timezone.utc". New in version 3.7. Type-check macros: int PyDate_Check(PyObject *ob) Return true if *ob* is of type "PyDateTime_DateType" or a subtype of "PyDateTime_DateType". *ob* must not be "NULL". This function always succeeds. int PyDate_CheckExact(PyObject *ob) Return true if *ob* is of type "PyDateTime_DateType". *ob* must not be "NULL". This function always succeeds. int PyDateTime_Check(PyObject *ob) Return true if *ob* is of type "PyDateTime_DateTimeType" or a subtype of "PyDateTime_DateTimeType". *ob* must not be "NULL". This function always succeeds. int PyDateTime_CheckExact(PyObject *ob) Return true if *ob* is of type "PyDateTime_DateTimeType". *ob* must not be "NULL". This function always succeeds. int PyTime_Check(PyObject *ob) Return true if *ob* is of type "PyDateTime_TimeType" or a subtype of "PyDateTime_TimeType". *ob* must not be "NULL". This function always succeeds. int PyTime_CheckExact(PyObject *ob) Return true if *ob* is of type "PyDateTime_TimeType". *ob* must not be "NULL". This function always succeeds. int PyDelta_Check(PyObject *ob) Return true if *ob* is of type "PyDateTime_DeltaType" or a subtype of "PyDateTime_DeltaType". *ob* must not be "NULL". This function always succeeds. int PyDelta_CheckExact(PyObject *ob) Return true if *ob* is of type "PyDateTime_DeltaType". *ob* must not be "NULL". This function always succeeds. int PyTZInfo_Check(PyObject *ob) Return true if *ob* is of type "PyDateTime_TZInfoType" or a subtype of "PyDateTime_TZInfoType". *ob* must not be "NULL". This function always succeeds. int PyTZInfo_CheckExact(PyObject *ob) Return true if *ob* is of type "PyDateTime_TZInfoType". *ob* must not be "NULL". This function always succeeds. Macros to create objects: PyObject* PyDate_FromDate(int year, int month, int day) *Return value: New reference.* Return a "datetime.date" object with the specified year, month and day. PyObject* PyDateTime_FromDateAndTime(int year, int month, int day, int hour, int minute, int second, int usecond) *Return value: New reference.* Return a "datetime.datetime" object with the specified year, month, day, hour, minute, second and microsecond. PyObject* PyDateTime_FromDateAndTimeAndFold(int year, int month, int day, int hour, int minute, int second, int usecond, int fold) *Return value: New reference.* Return a "datetime.datetime" object with the specified year, month, day, hour, minute, second, microsecond and fold. New in version 3.6. PyObject* PyTime_FromTime(int hour, int minute, int second, int usecond) *Return value: New reference.* Return a "datetime.time" object with the specified hour, minute, second and microsecond. PyObject* PyTime_FromTimeAndFold(int hour, int minute, int second, int usecond, int fold) *Return value: New reference.* Return a "datetime.time" object with the specified hour, minute, second, microsecond and fold. New in version 3.6. PyObject* PyDelta_FromDSU(int days, int seconds, int useconds) *Return value: New reference.* Return a "datetime.timedelta" object representing the given number of days, seconds and microseconds. Normalization is performed so that the resulting number of microseconds and seconds lie in the ranges documented for "datetime.timedelta" objects. PyObject* PyTimeZone_FromOffset(PyDateTime_DeltaType* offset) *Return value: New reference.* Return a "datetime.timezone" object with an unnamed fixed offset represented by the *offset* argument. New in version 3.7. PyObject* PyTimeZone_FromOffsetAndName(PyDateTime_DeltaType* offset, PyUnicode* name) *Return value: New reference.* Return a "datetime.timezone" object with a fixed offset represented by the *offset* argument and with tzname *name*. New in version 3.7. Macros to extract fields from date objects. The argument must be an instance of "PyDateTime_Date", including subclasses (such as "PyDateTime_DateTime"). The argument must not be "NULL", and the type is not checked: int PyDateTime_GET_YEAR(PyDateTime_Date *o) Return the year, as a positive int. int PyDateTime_GET_MONTH(PyDateTime_Date *o) Return the month, as an int from 1 through 12. int PyDateTime_GET_DAY(PyDateTime_Date *o) Return the day, as an int from 1 through 31. Macros to extract fields from datetime objects. The argument must be an instance of "PyDateTime_DateTime", including subclasses. The argument must not be "NULL", and the type is not checked: int PyDateTime_DATE_GET_HOUR(PyDateTime_DateTime *o) Return the hour, as an int from 0 through 23. int PyDateTime_DATE_GET_MINUTE(PyDateTime_DateTime *o) Return the minute, as an int from 0 through 59. int PyDateTime_DATE_GET_SECOND(PyDateTime_DateTime *o) Return the second, as an int from 0 through 59. int PyDateTime_DATE_GET_MICROSECOND(PyDateTime_DateTime *o) Return the microsecond, as an int from 0 through 999999. int PyDateTime_DATE_GET_FOLD(PyDateTime_DateTime *o) Return the fold, as an int from 0 through 1. New in version 3.6. Macros to extract fields from time objects. The argument must be an instance of "PyDateTime_Time", including subclasses. The argument must not be "NULL", and the type is not checked: int PyDateTime_TIME_GET_HOUR(PyDateTime_Time *o) Return the hour, as an int from 0 through 23. int PyDateTime_TIME_GET_MINUTE(PyDateTime_Time *o) Return the minute, as an int from 0 through 59. int PyDateTime_TIME_GET_SECOND(PyDateTime_Time *o) Return the second, as an int from 0 through 59. int PyDateTime_TIME_GET_MICROSECOND(PyDateTime_Time *o) Return the microsecond, as an int from 0 through 999999. int PyDateTime_TIME_GET_FOLD(PyDateTime_Time *o) Return the fold, as an int from 0 through 1. New in version 3.6. Macros to extract fields from time delta objects. The argument must be an instance of "PyDateTime_Delta", including subclasses. The argument must not be "NULL", and the type is not checked: int PyDateTime_DELTA_GET_DAYS(PyDateTime_Delta *o) Return the number of days, as an int from -999999999 to 999999999. New in version 3.3. int PyDateTime_DELTA_GET_SECONDS(PyDateTime_Delta *o) Return the number of seconds, as an int from 0 through 86399. New in version 3.3. int PyDateTime_DELTA_GET_MICROSECONDS(PyDateTime_Delta *o) Return the number of microseconds, as an int from 0 through 999999. New in version 3.3. Macros for the convenience of modules implementing the DB API: PyObject* PyDateTime_FromTimestamp(PyObject *args) *Return value: New reference.* Create and return a new "datetime.datetime" object given an argument tuple suitable for passing to "datetime.datetime.fromtimestamp()". PyObject* PyDate_FromTimestamp(PyObject *args) *Return value: New reference.* Create and return a new "datetime.date" object given an argument tuple suitable for passing to "datetime.date.fromtimestamp()". Descriptor Objects ****************** “Descriptors” are objects that describe some attribute of an object. They are found in the dictionary of type objects. PyTypeObject PyProperty_Type The type object for the built-in descriptor types. PyObject* PyDescr_NewGetSet(PyTypeObject *type, struct PyGetSetDef *getset) *Return value: New reference.* PyObject* PyDescr_NewMember(PyTypeObject *type, struct PyMemberDef *meth) *Return value: New reference.* PyObject* PyDescr_NewMethod(PyTypeObject *type, struct PyMethodDef *meth) *Return value: New reference.* PyObject* PyDescr_NewWrapper(PyTypeObject *type, struct wrapperbase *wrapper, void *wrapped) *Return value: New reference.* PyObject* PyDescr_NewClassMethod(PyTypeObject *type, PyMethodDef *method) *Return value: New reference.* int PyDescr_IsData(PyObject *descr) Return true if the descriptor objects *descr* describes a data attribute, or false if it describes a method. *descr* must be a descriptor object; there is no error checking. PyObject* PyWrapper_New(PyObject *, PyObject *) *Return value: New reference.* Dictionary Objects ****************** PyDictObject This subtype of "PyObject" represents a Python dictionary object. PyTypeObject PyDict_Type This instance of "PyTypeObject" represents the Python dictionary type. This is the same object as "dict" in the Python layer. int PyDict_Check(PyObject *p) Return true if *p* is a dict object or an instance of a subtype of the dict type. This function always succeeds. int PyDict_CheckExact(PyObject *p) Return true if *p* is a dict object, but not an instance of a subtype of the dict type. This function always succeeds. PyObject* PyDict_New() *Return value: New reference.* Return a new empty dictionary, or "NULL" on failure. PyObject* PyDictProxy_New(PyObject *mapping) *Return value: New reference.* Return a "types.MappingProxyType" object for a mapping which enforces read-only behavior. This is normally used to create a view to prevent modification of the dictionary for non-dynamic class types. void PyDict_Clear(PyObject *p) Empty an existing dictionary of all key-value pairs. int PyDict_Contains(PyObject *p, PyObject *key) Determine if dictionary *p* contains *key*. If an item in *p* is matches *key*, return "1", otherwise return "0". On error, return "-1". This is equivalent to the Python expression "key in p". PyObject* PyDict_Copy(PyObject *p) *Return value: New reference.* Return a new dictionary that contains the same key-value pairs as *p*. int PyDict_SetItem(PyObject *p, PyObject *key, PyObject *val) Insert *val* into the dictionary *p* with a key of *key*. *key* must be *hashable*; if it isn’t, "TypeError" will be raised. Return "0" on success or "-1" on failure. This function *does not* steal a reference to *val*. int PyDict_SetItemString(PyObject *p, const char *key, PyObject *val) Insert *val* into the dictionary *p* using *key* as a key. *key* should be a "const char*". The key object is created using "PyUnicode_FromString(key)". Return "0" on success or "-1" on failure. This function *does not* steal a reference to *val*. int PyDict_DelItem(PyObject *p, PyObject *key) Remove the entry in dictionary *p* with key *key*. *key* must be hashable; if it isn’t, "TypeError" is raised. If *key* is not in the dictionary, "KeyError" is raised. Return "0" on success or "-1" on failure. int PyDict_DelItemString(PyObject *p, const char *key) Remove the entry in dictionary *p* which has a key specified by the string *key*. If *key* is not in the dictionary, "KeyError" is raised. Return "0" on success or "-1" on failure. PyObject* PyDict_GetItem(PyObject *p, PyObject *key) *Return value: Borrowed reference.* Return the object from dictionary *p* which has a key *key*. Return "NULL" if the key *key* is not present, but *without* setting an exception. Note that exceptions which occur while calling "__hash__()" and "__eq__()" methods will get suppressed. To get error reporting use "PyDict_GetItemWithError()" instead. PyObject* PyDict_GetItemWithError(PyObject *p, PyObject *key) *Return value: Borrowed reference.* Variant of "PyDict_GetItem()" that does not suppress exceptions. Return "NULL" **with** an exception set if an exception occurred. Return "NULL" **without** an exception set if the key wasn’t present. PyObject* PyDict_GetItemString(PyObject *p, const char *key) *Return value: Borrowed reference.* This is the same as "PyDict_GetItem()", but *key* is specified as a "const char*", rather than a "PyObject*". Note that exceptions which occur while calling "__hash__()" and "__eq__()" methods and creating a temporary string object will get suppressed. To get error reporting use "PyDict_GetItemWithError()" instead. PyObject* PyDict_SetDefault(PyObject *p, PyObject *key, PyObject *defaultobj) *Return value: Borrowed reference.* This is the same as the Python-level "dict.setdefault()". If present, it returns the value corresponding to *key* from the dictionary *p*. If the key is not in the dict, it is inserted with value *defaultobj* and *defaultobj* is returned. This function evaluates the hash function of *key* only once, instead of evaluating it independently for the lookup and the insertion. New in version 3.4. PyObject* PyDict_Items(PyObject *p) *Return value: New reference.* Return a "PyListObject" containing all the items from the dictionary. PyObject* PyDict_Keys(PyObject *p) *Return value: New reference.* Return a "PyListObject" containing all the keys from the dictionary. PyObject* PyDict_Values(PyObject *p) *Return value: New reference.* Return a "PyListObject" containing all the values from the dictionary *p*. Py_ssize_t PyDict_Size(PyObject *p) Return the number of items in the dictionary. This is equivalent to "len(p)" on a dictionary. int PyDict_Next(PyObject *p, Py_ssize_t *ppos, PyObject **pkey, PyObject **pvalue) Iterate over all key-value pairs in the dictionary *p*. The "Py_ssize_t" referred to by *ppos* must be initialized to "0" prior to the first call to this function to start the iteration; the function returns true for each pair in the dictionary, and false once all pairs have been reported. The parameters *pkey* and *pvalue* should either point to "PyObject*" variables that will be filled in with each key and value, respectively, or may be "NULL". Any references returned through them are borrowed. *ppos* should not be altered during iteration. Its value represents offsets within the internal dictionary structure, and since the structure is sparse, the offsets are not consecutive. For example: PyObject *key, *value; Py_ssize_t pos = 0; while (PyDict_Next(self->dict, &pos, &key, &value)) { /* do something interesting with the values... */ ... } The dictionary *p* should not be mutated during iteration. It is safe to modify the values of the keys as you iterate over the dictionary, but only so long as the set of keys does not change. For example: PyObject *key, *value; Py_ssize_t pos = 0; while (PyDict_Next(self->dict, &pos, &key, &value)) { long i = PyLong_AsLong(value); if (i == -1 && PyErr_Occurred()) { return -1; } PyObject *o = PyLong_FromLong(i + 1); if (o == NULL) return -1; if (PyDict_SetItem(self->dict, key, o) < 0) { Py_DECREF(o); return -1; } Py_DECREF(o); } int PyDict_Merge(PyObject *a, PyObject *b, int override) Iterate over mapping object *b* adding key-value pairs to dictionary *a*. *b* may be a dictionary, or any object supporting "PyMapping_Keys()" and "PyObject_GetItem()". If *override* is true, existing pairs in *a* will be replaced if a matching key is found in *b*, otherwise pairs will only be added if there is not a matching key in *a*. Return "0" on success or "-1" if an exception was raised. int PyDict_Update(PyObject *a, PyObject *b) This is the same as "PyDict_Merge(a, b, 1)" in C, and is similar to "a.update(b)" in Python except that "PyDict_Update()" doesn’t fall back to the iterating over a sequence of key value pairs if the second argument has no “keys” attribute. Return "0" on success or "-1" if an exception was raised. int PyDict_MergeFromSeq2(PyObject *a, PyObject *seq2, int override) Update or merge into dictionary *a*, from the key-value pairs in *seq2*. *seq2* must be an iterable object producing iterable objects of length 2, viewed as key-value pairs. In case of duplicate keys, the last wins if *override* is true, else the first wins. Return "0" on success or "-1" if an exception was raised. Equivalent Python (except for the return value): def PyDict_MergeFromSeq2(a, seq2, override): for key, value in seq2: if override or key not in a: a[key] = value Exception Handling ****************** The functions described in this chapter will let you handle and raise Python exceptions. It is important to understand some of the basics of Python exception handling. It works somewhat like the POSIX "errno" variable: there is a global indicator (per thread) of the last error that occurred. Most C API functions don’t clear this on success, but will set it to indicate the cause of the error on failure. Most C API functions also return an error indicator, usually "NULL" if they are supposed to return a pointer, or "-1" if they return an integer (exception: the "PyArg_*()" functions return "1" for success and "0" for failure). Concretely, the error indicator consists of three object pointers: the exception’s type, the exception’s value, and the traceback object. Any of those pointers can be "NULL" if non-set (although some combinations are forbidden, for example you can’t have a non-"NULL" traceback if the exception type is "NULL"). When a function must fail because some function it called failed, it generally doesn’t set the error indicator; the function it called already set it. It is responsible for either handling the error and clearing the exception or returning after cleaning up any resources it holds (such as object references or memory allocations); it should *not* continue normally if it is not prepared to handle the error. If returning due to an error, it is important to indicate to the caller that an error has been set. If the error is not handled or carefully propagated, additional calls into the Python/C API may not behave as intended and may fail in mysterious ways. Note: The error indicator is **not** the result of "sys.exc_info()". The former corresponds to an exception that is not yet caught (and is therefore still propagating), while the latter returns an exception after it is caught (and has therefore stopped propagating). Printing and clearing ===================== void PyErr_Clear() Clear the error indicator. If the error indicator is not set, there is no effect. void PyErr_PrintEx(int set_sys_last_vars) Print a standard traceback to "sys.stderr" and clear the error indicator. **Unless** the error is a "SystemExit", in that case no traceback is printed and the Python process will exit with the error code specified by the "SystemExit" instance. Call this function **only** when the error indicator is set. Otherwise it will cause a fatal error! If *set_sys_last_vars* is nonzero, the variables "sys.last_type", "sys.last_value" and "sys.last_traceback" will be set to the type, value and traceback of the printed exception, respectively. void PyErr_Print() Alias for "PyErr_PrintEx(1)". void PyErr_WriteUnraisable(PyObject *obj) Call "sys.unraisablehook()" using the current exception and *obj* argument. This utility function prints a warning message to "sys.stderr" when an exception has been set but it is impossible for the interpreter to actually raise the exception. It is used, for example, when an exception occurs in an "__del__()" method. The function is called with a single argument *obj* that identifies the context in which the unraisable exception occurred. If possible, the repr of *obj* will be printed in the warning message. An exception must be set when calling this function. Raising exceptions ================== These functions help you set the current thread’s error indicator. For convenience, some of these functions will always return a "NULL" pointer for use in a "return" statement. void PyErr_SetString(PyObject *type, const char *message) This is the most common way to set the error indicator. The first argument specifies the exception type; it is normally one of the standard exceptions, e.g. "PyExc_RuntimeError". You need not increment its reference count. The second argument is an error message; it is decoded from "'utf-8'". void PyErr_SetObject(PyObject *type, PyObject *value) This function is similar to "PyErr_SetString()" but lets you specify an arbitrary Python object for the “value” of the exception. PyObject* PyErr_Format(PyObject *exception, const char *format, ...) *Return value: Always NULL.* This function sets the error indicator and returns "NULL". *exception* should be a Python exception class. The *format* and subsequent parameters help format the error message; they have the same meaning and values as in "PyUnicode_FromFormat()". *format* is an ASCII-encoded string. PyObject* PyErr_FormatV(PyObject *exception, const char *format, va_list vargs) *Return value: Always NULL.* Same as "PyErr_Format()", but taking a "va_list" argument rather than a variable number of arguments. New in version 3.5. void PyErr_SetNone(PyObject *type) This is a shorthand for "PyErr_SetObject(type, Py_None)". int PyErr_BadArgument() This is a shorthand for "PyErr_SetString(PyExc_TypeError, message)", where *message* indicates that a built-in operation was invoked with an illegal argument. It is mostly for internal use. PyObject* PyErr_NoMemory() *Return value: Always NULL.* This is a shorthand for "PyErr_SetNone(PyExc_MemoryError)"; it returns "NULL" so an object allocation function can write "return PyErr_NoMemory();" when it runs out of memory. PyObject* PyErr_SetFromErrno(PyObject *type) *Return value: Always NULL.* This is a convenience function to raise an exception when a C library function has returned an error and set the C variable "errno". It constructs a tuple object whose first item is the integer "errno" value and whose second item is the corresponding error message (gotten from "strerror()"), and then calls "PyErr_SetObject(type, object)". On Unix, when the "errno" value is "EINTR", indicating an interrupted system call, this calls "PyErr_CheckSignals()", and if that set the error indicator, leaves it set to that. The function always returns "NULL", so a wrapper function around a system call can write "return PyErr_SetFromErrno(type);" when the system call returns an error. PyObject* PyErr_SetFromErrnoWithFilenameObject(PyObject *type, PyObject *filenameObject) *Return value: Always NULL.* Similar to "PyErr_SetFromErrno()", with the additional behavior that if *filenameObject* is not "NULL", it is passed to the constructor of *type* as a third parameter. In the case of "OSError" exception, this is used to define the "filename" attribute of the exception instance. PyObject* PyErr_SetFromErrnoWithFilenameObjects(PyObject *type, PyObject *filenameObject, PyObject *filenameObject2) *Return value: Always NULL.* Similar to "PyErr_SetFromErrnoWithFilenameObject()", but takes a second filename object, for raising errors when a function that takes two filenames fails. New in version 3.4. PyObject* PyErr_SetFromErrnoWithFilename(PyObject *type, const char *filename) *Return value: Always NULL.* Similar to "PyErr_SetFromErrnoWithFilenameObject()", but the filename is given as a C string. *filename* is decoded from the filesystem encoding ("os.fsdecode()"). PyObject* PyErr_SetFromWindowsErr(int ierr) *Return value: Always NULL.* This is a convenience function to raise "WindowsError". If called with *ierr* of "0", the error code returned by a call to "GetLastError()" is used instead. It calls the Win32 function "FormatMessage()" to retrieve the Windows description of error code given by *ierr* or "GetLastError()", then it constructs a tuple object whose first item is the *ierr* value and whose second item is the corresponding error message (gotten from "FormatMessage()"), and then calls "PyErr_SetObject(PyExc_WindowsError, object)". This function always returns "NULL". Availability: Windows. PyObject* PyErr_SetExcFromWindowsErr(PyObject *type, int ierr) *Return value: Always NULL.* Similar to "PyErr_SetFromWindowsErr()", with an additional parameter specifying the exception type to be raised. Availability: Windows. PyObject* PyErr_SetFromWindowsErrWithFilename(int ierr, const char *filename) *Return value: Always NULL.* Similar to "PyErr_SetFromWindowsErrWithFilenameObject()", but the filename is given as a C string. *filename* is decoded from the filesystem encoding ("os.fsdecode()"). Availability: Windows. PyObject* PyErr_SetExcFromWindowsErrWithFilenameObject(PyObject *type, int ierr, PyObject *filename) *Return value: Always NULL.* Similar to "PyErr_SetFromWindowsErrWithFilenameObject()", with an additional parameter specifying the exception type to be raised. Availability: Windows. PyObject* PyErr_SetExcFromWindowsErrWithFilenameObjects(PyObject *type, int ierr, PyObject *filename, PyObject *filename2) *Return value: Always NULL.* Similar to "PyErr_SetExcFromWindowsErrWithFilenameObject()", but accepts a second filename object. Availability: Windows. New in version 3.4. PyObject* PyErr_SetExcFromWindowsErrWithFilename(PyObject *type, int ierr, const char *filename) *Return value: Always NULL.* Similar to "PyErr_SetFromWindowsErrWithFilename()", with an additional parameter specifying the exception type to be raised. Availability: Windows. PyObject* PyErr_SetImportError(PyObject *msg, PyObject *name, PyObject *path) *Return value: Always NULL.* This is a convenience function to raise "ImportError". *msg* will be set as the exception’s message string. *name* and *path*, both of which can be "NULL", will be set as the "ImportError"’s respective "name" and "path" attributes. New in version 3.3. PyObject* PyErr_SetImportErrorSubclass(PyObject *exception, PyObject *msg, PyObject *name, PyObject *path) *Return value: Always NULL.* Much like "PyErr_SetImportError()" but this function allows for specifying a subclass of "ImportError" to raise. New in version 3.6. void PyErr_SyntaxLocationObject(PyObject *filename, int lineno, int col_offset) Set file, line, and offset information for the current exception. If the current exception is not a "SyntaxError", then it sets additional attributes, which make the exception printing subsystem think the exception is a "SyntaxError". New in version 3.4. void PyErr_SyntaxLocationEx(const char *filename, int lineno, int col_offset) Like "PyErr_SyntaxLocationObject()", but *filename* is a byte string decoded from the filesystem encoding ("os.fsdecode()"). New in version 3.2. void PyErr_SyntaxLocation(const char *filename, int lineno) Like "PyErr_SyntaxLocationEx()", but the *col_offset* parameter is omitted. void PyErr_BadInternalCall() This is a shorthand for "PyErr_SetString(PyExc_SystemError, message)", where *message* indicates that an internal operation (e.g. a Python/C API function) was invoked with an illegal argument. It is mostly for internal use. Issuing warnings ================ Use these functions to issue warnings from C code. They mirror similar functions exported by the Python "warnings" module. They normally print a warning message to *sys.stderr*; however, it is also possible that the user has specified that warnings are to be turned into errors, and in that case they will raise an exception. It is also possible that the functions raise an exception because of a problem with the warning machinery. The return value is "0" if no exception is raised, or "-1" if an exception is raised. (It is not possible to determine whether a warning message is actually printed, nor what the reason is for the exception; this is intentional.) If an exception is raised, the caller should do its normal exception handling (for example, "Py_DECREF()" owned references and return an error value). int PyErr_WarnEx(PyObject *category, const char *message, Py_ssize_t stack_level) Issue a warning message. The *category* argument is a warning category (see below) or "NULL"; the *message* argument is a UTF-8 encoded string. *stack_level* is a positive number giving a number of stack frames; the warning will be issued from the currently executing line of code in that stack frame. A *stack_level* of 1 is the function calling "PyErr_WarnEx()", 2 is the function above that, and so forth. Warning categories must be subclasses of "PyExc_Warning"; "PyExc_Warning" is a subclass of "PyExc_Exception"; the default warning category is "PyExc_RuntimeWarning". The standard Python warning categories are available as global variables whose names are enumerated at Standard Warning Categories. For information about warning control, see the documentation for the "warnings" module and the "-W" option in the command line documentation. There is no C API for warning control. int PyErr_WarnExplicitObject(PyObject *category, PyObject *message, PyObject *filename, int lineno, PyObject *module, PyObject *registry) Issue a warning message with explicit control over all warning attributes. This is a straightforward wrapper around the Python function "warnings.warn_explicit()"; see there for more information. The *module* and *registry* arguments may be set to "NULL" to get the default effect described there. New in version 3.4. int PyErr_WarnExplicit(PyObject *category, const char *message, const char *filename, int lineno, const char *module, PyObject *registry) Similar to "PyErr_WarnExplicitObject()" except that *message* and *module* are UTF-8 encoded strings, and *filename* is decoded from the filesystem encoding ("os.fsdecode()"). int PyErr_WarnFormat(PyObject *category, Py_ssize_t stack_level, const char *format, ...) Function similar to "PyErr_WarnEx()", but use "PyUnicode_FromFormat()" to format the warning message. *format* is an ASCII-encoded string. New in version 3.2. int PyErr_ResourceWarning(PyObject *source, Py_ssize_t stack_level, const char *format, ...) Function similar to "PyErr_WarnFormat()", but *category* is "ResourceWarning" and it passes *source* to "warnings.WarningMessage()". New in version 3.6. Querying the error indicator ============================ PyObject* PyErr_Occurred() *Return value: Borrowed reference.* Test whether the error indicator is set. If set, return the exception *type* (the first argument to the last call to one of the "PyErr_Set*()" functions or to "PyErr_Restore()"). If not set, return "NULL". You do not own a reference to the return value, so you do not need to "Py_DECREF()" it. The caller must hold the GIL. Note: Do not compare the return value to a specific exception; use "PyErr_ExceptionMatches()" instead, shown below. (The comparison could easily fail since the exception may be an instance instead of a class, in the case of a class exception, or it may be a subclass of the expected exception.) int PyErr_ExceptionMatches(PyObject *exc) Equivalent to "PyErr_GivenExceptionMatches(PyErr_Occurred(), exc)". This should only be called when an exception is actually set; a memory access violation will occur if no exception has been raised. int PyErr_GivenExceptionMatches(PyObject *given, PyObject *exc) Return true if the *given* exception matches the exception type in *exc*. If *exc* is a class object, this also returns true when *given* is an instance of a subclass. If *exc* is a tuple, all exception types in the tuple (and recursively in subtuples) are searched for a match. void PyErr_Fetch(PyObject **ptype, PyObject **pvalue, PyObject **ptraceback) Retrieve the error indicator into three variables whose addresses are passed. If the error indicator is not set, set all three variables to "NULL". If it is set, it will be cleared and you own a reference to each object retrieved. The value and traceback object may be "NULL" even when the type object is not. Note: This function is normally only used by code that needs to catch exceptions or by code that needs to save and restore the error indicator temporarily, e.g.: { PyObject *type, *value, *traceback; PyErr_Fetch(&type, &value, &traceback); /* ... code that might produce other errors ... */ PyErr_Restore(type, value, traceback); } void PyErr_Restore(PyObject *type, PyObject *value, PyObject *traceback) Set the error indicator from the three objects. If the error indicator is already set, it is cleared first. If the objects are "NULL", the error indicator is cleared. Do not pass a "NULL" type and non-"NULL" value or traceback. The exception type should be a class. Do not pass an invalid exception type or value. (Violating these rules will cause subtle problems later.) This call takes away a reference to each object: you must own a reference to each object before the call and after the call you no longer own these references. (If you don’t understand this, don’t use this function. I warned you.) Note: This function is normally only used by code that needs to save and restore the error indicator temporarily. Use "PyErr_Fetch()" to save the current error indicator. void PyErr_NormalizeException(PyObject **exc, PyObject **val, PyObject **tb) Under certain circumstances, the values returned by "PyErr_Fetch()" below can be “unnormalized”, meaning that "*exc" is a class object but "*val" is not an instance of the same class. This function can be used to instantiate the class in that case. If the values are already normalized, nothing happens. The delayed normalization is implemented to improve performance. Note: This function *does not* implicitly set the "__traceback__" attribute on the exception value. If setting the traceback appropriately is desired, the following additional snippet is needed: if (tb != NULL) { PyException_SetTraceback(val, tb); } void PyErr_GetExcInfo(PyObject **ptype, PyObject **pvalue, PyObject **ptraceback) Retrieve the exception info, as known from "sys.exc_info()". This refers to an exception that was *already caught*, not to an exception that was freshly raised. Returns new references for the three objects, any of which may be "NULL". Does not modify the exception info state. Note: This function is not normally used by code that wants to handle exceptions. Rather, it can be used when code needs to save and restore the exception state temporarily. Use "PyErr_SetExcInfo()" to restore or clear the exception state. New in version 3.3. void PyErr_SetExcInfo(PyObject *type, PyObject *value, PyObject *traceback) Set the exception info, as known from "sys.exc_info()". This refers to an exception that was *already caught*, not to an exception that was freshly raised. This function steals the references of the arguments. To clear the exception state, pass "NULL" for all three arguments. For general rules about the three arguments, see "PyErr_Restore()". Note: This function is not normally used by code that wants to handle exceptions. Rather, it can be used when code needs to save and restore the exception state temporarily. Use "PyErr_GetExcInfo()" to read the exception state. New in version 3.3. Signal Handling =============== int PyErr_CheckSignals() This function interacts with Python’s signal handling. It checks whether a signal has been sent to the processes and if so, invokes the corresponding signal handler. If the "signal" module is supported, this can invoke a signal handler written in Python. In all cases, the default effect for "SIGINT" is to raise the "KeyboardInterrupt" exception. If an exception is raised the error indicator is set and the function returns "-1"; otherwise the function returns "0". The error indicator may or may not be cleared if it was previously set. void PyErr_SetInterrupt() Simulate the effect of a "SIGINT" signal arriving. The next time "PyErr_CheckSignals()" is called, the Python signal handler for "SIGINT" will be called. If "SIGINT" isn’t handled by Python (it was set to "signal.SIG_DFL" or "signal.SIG_IGN"), this function does nothing. int PySignal_SetWakeupFd(int fd) This utility function specifies a file descriptor to which the signal number is written as a single byte whenever a signal is received. *fd* must be non-blocking. It returns the previous such file descriptor. The value "-1" disables the feature; this is the initial state. This is equivalent to "signal.set_wakeup_fd()" in Python, but without any error checking. *fd* should be a valid file descriptor. The function should only be called from the main thread. Changed in version 3.5: On Windows, the function now also supports socket handles. Exception Classes ================= PyObject* PyErr_NewException(const char *name, PyObject *base, PyObject *dict) *Return value: New reference.* This utility function creates and returns a new exception class. The *name* argument must be the name of the new exception, a C string of the form "module.classname". The *base* and *dict* arguments are normally "NULL". This creates a class object derived from "Exception" (accessible in C as "PyExc_Exception"). The "__module__" attribute of the new class is set to the first part (up to the last dot) of the *name* argument, and the class name is set to the last part (after the last dot). The *base* argument can be used to specify alternate base classes; it can either be only one class or a tuple of classes. The *dict* argument can be used to specify a dictionary of class variables and methods. PyObject* PyErr_NewExceptionWithDoc(const char *name, const char *doc, PyObject *base, PyObject *dict) *Return value: New reference.* Same as "PyErr_NewException()", except that the new exception class can easily be given a docstring: If *doc* is non-"NULL", it will be used as the docstring for the exception class. New in version 3.2. Exception Objects ================= PyObject* PyException_GetTraceback(PyObject *ex) *Return value: New reference.* Return the traceback associated with the exception as a new reference, as accessible from Python through "__traceback__". If there is no traceback associated, this returns "NULL". int PyException_SetTraceback(PyObject *ex, PyObject *tb) Set the traceback associated with the exception to *tb*. Use "Py_None" to clear it. PyObject* PyException_GetContext(PyObject *ex) *Return value: New reference.* Return the context (another exception instance during whose handling *ex* was raised) associated with the exception as a new reference, as accessible from Python through "__context__". If there is no context associated, this returns "NULL". void PyException_SetContext(PyObject *ex, PyObject *ctx) Set the context associated with the exception to *ctx*. Use "NULL" to clear it. There is no type check to make sure that *ctx* is an exception instance. This steals a reference to *ctx*. PyObject* PyException_GetCause(PyObject *ex) *Return value: New reference.* Return the cause (either an exception instance, or "None", set by "raise ... from ...") associated with the exception as a new reference, as accessible from Python through "__cause__". void PyException_SetCause(PyObject *ex, PyObject *cause) Set the cause associated with the exception to *cause*. Use "NULL" to clear it. There is no type check to make sure that *cause* is either an exception instance or "None". This steals a reference to *cause*. "__suppress_context__" is implicitly set to "True" by this function. Unicode Exception Objects ========================= The following functions are used to create and modify Unicode exceptions from C. PyObject* PyUnicodeDecodeError_Create(const char *encoding, const char *object, Py_ssize_t length, Py_ssize_t start, Py_ssize_t end, const char *reason) *Return value: New reference.* Create a "UnicodeDecodeError" object with the attributes *encoding*, *object*, *length*, *start*, *end* and *reason*. *encoding* and *reason* are UTF-8 encoded strings. PyObject* PyUnicodeEncodeError_Create(const char *encoding, const Py_UNICODE *object, Py_ssize_t length, Py_ssize_t start, Py_ssize_t end, const char *reason) *Return value: New reference.* Create a "UnicodeEncodeError" object with the attributes *encoding*, *object*, *length*, *start*, *end* and *reason*. *encoding* and *reason* are UTF-8 encoded strings. Deprecated since version 3.3: 3.11"Py_UNICODE" is deprecated since Python 3.3. Please migrate to "PyObject_CallFunction(PyExc_UnicodeEncodeError, "sOnns", ...)". PyObject* PyUnicodeTranslateError_Create(const Py_UNICODE *object, Py_ssize_t length, Py_ssize_t start, Py_ssize_t end, const char *reason) *Return value: New reference.* Create a "UnicodeTranslateError" object with the attributes *object*, *length*, *start*, *end* and *reason*. *reason* is a UTF-8 encoded string. Deprecated since version 3.3: 3.11"Py_UNICODE" is deprecated since Python 3.3. Please migrate to "PyObject_CallFunction(PyExc_UnicodeTranslateError, "Onns", ...)". PyObject* PyUnicodeDecodeError_GetEncoding(PyObject *exc) PyObject* PyUnicodeEncodeError_GetEncoding(PyObject *exc) *Return value: New reference.* Return the *encoding* attribute of the given exception object. PyObject* PyUnicodeDecodeError_GetObject(PyObject *exc) PyObject* PyUnicodeEncodeError_GetObject(PyObject *exc) PyObject* PyUnicodeTranslateError_GetObject(PyObject *exc) *Return value: New reference.* Return the *object* attribute of the given exception object. int PyUnicodeDecodeError_GetStart(PyObject *exc, Py_ssize_t *start) int PyUnicodeEncodeError_GetStart(PyObject *exc, Py_ssize_t *start) int PyUnicodeTranslateError_GetStart(PyObject *exc, Py_ssize_t *start) Get the *start* attribute of the given exception object and place it into **start*. *start* must not be "NULL". Return "0" on success, "-1" on failure. int PyUnicodeDecodeError_SetStart(PyObject *exc, Py_ssize_t start) int PyUnicodeEncodeError_SetStart(PyObject *exc, Py_ssize_t start) int PyUnicodeTranslateError_SetStart(PyObject *exc, Py_ssize_t start) Set the *start* attribute of the given exception object to *start*. Return "0" on success, "-1" on failure. int PyUnicodeDecodeError_GetEnd(PyObject *exc, Py_ssize_t *end) int PyUnicodeEncodeError_GetEnd(PyObject *exc, Py_ssize_t *end) int PyUnicodeTranslateError_GetEnd(PyObject *exc, Py_ssize_t *end) Get the *end* attribute of the given exception object and place it into **end*. *end* must not be "NULL". Return "0" on success, "-1" on failure. int PyUnicodeDecodeError_SetEnd(PyObject *exc, Py_ssize_t end) int PyUnicodeEncodeError_SetEnd(PyObject *exc, Py_ssize_t end) int PyUnicodeTranslateError_SetEnd(PyObject *exc, Py_ssize_t end) Set the *end* attribute of the given exception object to *end*. Return "0" on success, "-1" on failure. PyObject* PyUnicodeDecodeError_GetReason(PyObject *exc) PyObject* PyUnicodeEncodeError_GetReason(PyObject *exc) PyObject* PyUnicodeTranslateError_GetReason(PyObject *exc) *Return value: New reference.* Return the *reason* attribute of the given exception object. int PyUnicodeDecodeError_SetReason(PyObject *exc, const char *reason) int PyUnicodeEncodeError_SetReason(PyObject *exc, const char *reason) int PyUnicodeTranslateError_SetReason(PyObject *exc, const char *reason) Set the *reason* attribute of the given exception object to *reason*. Return "0" on success, "-1" on failure. Recursion Control ================= These two functions provide a way to perform safe recursive calls at the C level, both in the core and in extension modules. They are needed if the recursive code does not necessarily invoke Python code (which tracks its recursion depth automatically). They are also not needed for *tp_call* implementations because the call protocol takes care of recursion handling. int Py_EnterRecursiveCall(const char *where) Marks a point where a recursive C-level call is about to be performed. If "USE_STACKCHECK" is defined, this function checks if the OS stack overflowed using "PyOS_CheckStack()". In this is the case, it sets a "MemoryError" and returns a nonzero value. The function then checks if the recursion limit is reached. If this is the case, a "RecursionError" is set and a nonzero value is returned. Otherwise, zero is returned. *where* should be a UTF-8 encoded string such as "" in instance check"" to be concatenated to the "RecursionError" message caused by the recursion depth limit. Changed in version 3.9: This function is now also available in the limited API. void Py_LeaveRecursiveCall(void) Ends a "Py_EnterRecursiveCall()". Must be called once for each *successful* invocation of "Py_EnterRecursiveCall()". Changed in version 3.9: This function is now also available in the limited API. Properly implementing "tp_repr" for container types requires special recursion handling. In addition to protecting the stack, "tp_repr" also needs to track objects to prevent cycles. The following two functions facilitate this functionality. Effectively, these are the C equivalent to "reprlib.recursive_repr()". int Py_ReprEnter(PyObject *object) Called at the beginning of the "tp_repr" implementation to detect cycles. If the object has already been processed, the function returns a positive integer. In that case the "tp_repr" implementation should return a string object indicating a cycle. As examples, "dict" objects return "{...}" and "list" objects return "[...]". The function will return a negative integer if the recursion limit is reached. In that case the "tp_repr" implementation should typically return "NULL". Otherwise, the function returns zero and the "tp_repr" implementation can continue normally. void Py_ReprLeave(PyObject *object) Ends a "Py_ReprEnter()". Must be called once for each invocation of "Py_ReprEnter()" that returns zero. Standard Exceptions =================== All standard Python exceptions are available as global variables whose names are "PyExc_" followed by the Python exception name. These have the type "PyObject*"; they are all class objects. For completeness, here are all the variables: +-------------------------------------------+-----------------------------------+------------+ | C Name | Python Name | Notes | |===========================================|===================================|============| | "PyExc_BaseException" | "BaseException" | [1] | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_Exception" | "Exception" | [1] | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_ArithmeticError" | "ArithmeticError" | [1] | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_AssertionError" | "AssertionError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_AttributeError" | "AttributeError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_BlockingIOError" | "BlockingIOError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_BrokenPipeError" | "BrokenPipeError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_BufferError" | "BufferError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_ChildProcessError" | "ChildProcessError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_ConnectionAbortedError" | "ConnectionAbortedError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_ConnectionError" | "ConnectionError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_ConnectionRefusedError" | "ConnectionRefusedError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_ConnectionResetError" | "ConnectionResetError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_EOFError" | "EOFError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_FileExistsError" | "FileExistsError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_FileNotFoundError" | "FileNotFoundError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_FloatingPointError" | "FloatingPointError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_GeneratorExit" | "GeneratorExit" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_ImportError" | "ImportError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_IndentationError" | "IndentationError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_IndexError" | "IndexError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_InterruptedError" | "InterruptedError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_IsADirectoryError" | "IsADirectoryError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_KeyError" | "KeyError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_KeyboardInterrupt" | "KeyboardInterrupt" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_LookupError" | "LookupError" | [1] | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_MemoryError" | "MemoryError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_ModuleNotFoundError" | "ModuleNotFoundError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_NameError" | "NameError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_NotADirectoryError" | "NotADirectoryError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_NotImplementedError" | "NotImplementedError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_OSError" | "OSError" | [1] | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_OverflowError" | "OverflowError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_PermissionError" | "PermissionError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_ProcessLookupError" | "ProcessLookupError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_RecursionError" | "RecursionError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_ReferenceError" | "ReferenceError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_RuntimeError" | "RuntimeError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_StopAsyncIteration" | "StopAsyncIteration" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_StopIteration" | "StopIteration" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_SyntaxError" | "SyntaxError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_SystemError" | "SystemError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_SystemExit" | "SystemExit" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_TabError" | "TabError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_TimeoutError" | "TimeoutError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_TypeError" | "TypeError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_UnboundLocalError" | "UnboundLocalError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_UnicodeDecodeError" | "UnicodeDecodeError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_UnicodeEncodeError" | "UnicodeEncodeError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_UnicodeError" | "UnicodeError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_UnicodeTranslateError" | "UnicodeTranslateError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_ValueError" | "ValueError" | | +-------------------------------------------+-----------------------------------+------------+ | "PyExc_ZeroDivisionError" | "ZeroDivisionError" | | +-------------------------------------------+-----------------------------------+------------+ New in version 3.3: "PyExc_BlockingIOError", "PyExc_BrokenPipeError", "PyExc_ChildProcessError", "PyExc_ConnectionError", "PyExc_ConnectionAbortedError", "PyExc_ConnectionRefusedError", "PyExc_ConnectionResetError", "PyExc_FileExistsError", "PyExc_FileNotFoundError", "PyExc_InterruptedError", "PyExc_IsADirectoryError", "PyExc_NotADirectoryError", "PyExc_PermissionError", "PyExc_ProcessLookupError" and "PyExc_TimeoutError" were introduced following **PEP 3151**. New in version 3.5: "PyExc_StopAsyncIteration" and "PyExc_RecursionError". New in version 3.6: "PyExc_ModuleNotFoundError". These are compatibility aliases to "PyExc_OSError": +---------------------------------------+------------+ | C Name | Notes | |=======================================|============| | "PyExc_EnvironmentError" | | +---------------------------------------+------------+ | "PyExc_IOError" | | +---------------------------------------+------------+ | "PyExc_WindowsError" | [2] | +---------------------------------------+------------+ Changed in version 3.3: These aliases used to be separate exception types. Notes: [1] This is a base class for other standard exceptions. [2] Only defined on Windows; protect code that uses this by testing that the preprocessor macro "MS_WINDOWS" is defined. Standard Warning Categories =========================== All standard Python warning categories are available as global variables whose names are "PyExc_" followed by the Python exception name. These have the type "PyObject*"; they are all class objects. For completeness, here are all the variables: +--------------------------------------------+-----------------------------------+------------+ | C Name | Python Name | Notes | |============================================|===================================|============| | "PyExc_Warning" | "Warning" | [3] | +--------------------------------------------+-----------------------------------+------------+ | "PyExc_BytesWarning" | "BytesWarning" | | +--------------------------------------------+-----------------------------------+------------+ | "PyExc_DeprecationWarning" | "DeprecationWarning" | | +--------------------------------------------+-----------------------------------+------------+ | "PyExc_FutureWarning" | "FutureWarning" | | +--------------------------------------------+-----------------------------------+------------+ | "PyExc_ImportWarning" | "ImportWarning" | | +--------------------------------------------+-----------------------------------+------------+ | "PyExc_PendingDeprecationWarning" | "PendingDeprecationWarning" | | +--------------------------------------------+-----------------------------------+------------+ | "PyExc_ResourceWarning" | "ResourceWarning" | | +--------------------------------------------+-----------------------------------+------------+ | "PyExc_RuntimeWarning" | "RuntimeWarning" | | +--------------------------------------------+-----------------------------------+------------+ | "PyExc_SyntaxWarning" | "SyntaxWarning" | | +--------------------------------------------+-----------------------------------+------------+ | "PyExc_UnicodeWarning" | "UnicodeWarning" | | +--------------------------------------------+-----------------------------------+------------+ | "PyExc_UserWarning" | "UserWarning" | | +--------------------------------------------+-----------------------------------+------------+ New in version 3.2: "PyExc_ResourceWarning". Notes: [3] This is a base class for other standard warning categories. File Objects ************ These APIs are a minimal emulation of the Python 2 C API for built-in file objects, which used to rely on the buffered I/O ("FILE*") support from the C standard library. In Python 3, files and streams use the new "io" module, which defines several layers over the low-level unbuffered I/O of the operating system. The functions described below are convenience C wrappers over these new APIs, and meant mostly for internal error reporting in the interpreter; third-party code is advised to access the "io" APIs instead. PyObject* PyFile_FromFd(int fd, const char *name, const char *mode, int buffering, const char *encoding, const char *errors, const char *newline, int closefd) *Return value: New reference.* Create a Python file object from the file descriptor of an already opened file *fd*. The arguments *name*, *encoding*, *errors* and *newline* can be "NULL" to use the defaults; *buffering* can be *-1* to use the default. *name* is ignored and kept for backward compatibility. Return "NULL" on failure. For a more comprehensive description of the arguments, please refer to the "io.open()" function documentation. Warning: Since Python streams have their own buffering layer, mixing them with OS-level file descriptors can produce various issues (such as unexpected ordering of data). Changed in version 3.2: Ignore *name* attribute. int PyObject_AsFileDescriptor(PyObject *p) Return the file descriptor associated with *p* as an "int". If the object is an integer, its value is returned. If not, the object’s "fileno()" method is called if it exists; the method must return an integer, which is returned as the file descriptor value. Sets an exception and returns "-1" on failure. PyObject* PyFile_GetLine(PyObject *p, int n) *Return value: New reference.* Equivalent to "p.readline([n])", this function reads one line from the object *p*. *p* may be a file object or any object with a "readline()" method. If *n* is "0", exactly one line is read, regardless of the length of the line. If *n* is greater than "0", no more than *n* bytes will be read from the file; a partial line can be returned. In both cases, an empty string is returned if the end of the file is reached immediately. If *n* is less than "0", however, one line is read regardless of length, but "EOFError" is raised if the end of the file is reached immediately. int PyFile_SetOpenCodeHook(Py_OpenCodeHookFunction handler) Overrides the normal behavior of "io.open_code()" to pass its parameter through the provided handler. The handler is a function of type "PyObject *(*)(PyObject *path, void *userData)", where *path* is guaranteed to be "PyUnicodeObject". The *userData* pointer is passed into the hook function. Since hook functions may be called from different runtimes, this pointer should not refer directly to Python state. As this hook is intentionally used during import, avoid importing new modules during its execution unless they are known to be frozen or available in "sys.modules". Once a hook has been set, it cannot be removed or replaced, and later calls to "PyFile_SetOpenCodeHook()" will fail. On failure, the function returns -1 and sets an exception if the interpreter has been initialized. This function is safe to call before "Py_Initialize()". Raises an auditing event "setopencodehook" with no arguments. New in version 3.8. int PyFile_WriteObject(PyObject *obj, PyObject *p, int flags) Write object *obj* to file object *p*. The only supported flag for *flags* is "Py_PRINT_RAW"; if given, the "str()" of the object is written instead of the "repr()". Return "0" on success or "-1" on failure; the appropriate exception will be set. int PyFile_WriteString(const char *s, PyObject *p) Write string *s* to file object *p*. Return "0" on success or "-1" on failure; the appropriate exception will be set. Floating Point Objects ********************** PyFloatObject This subtype of "PyObject" represents a Python floating point object. PyTypeObject PyFloat_Type This instance of "PyTypeObject" represents the Python floating point type. This is the same object as "float" in the Python layer. int PyFloat_Check(PyObject *p) Return true if its argument is a "PyFloatObject" or a subtype of "PyFloatObject". This function always succeeds. int PyFloat_CheckExact(PyObject *p) Return true if its argument is a "PyFloatObject", but not a subtype of "PyFloatObject". This function always succeeds. PyObject* PyFloat_FromString(PyObject *str) *Return value: New reference.* Create a "PyFloatObject" object based on the string value in *str*, or "NULL" on failure. PyObject* PyFloat_FromDouble(double v) *Return value: New reference.* Create a "PyFloatObject" object from *v*, or "NULL" on failure. double PyFloat_AsDouble(PyObject *pyfloat) Return a C "double" representation of the contents of *pyfloat*. If *pyfloat* is not a Python floating point object but has a "__float__()" method, this method will first be called to convert *pyfloat* into a float. If "__float__()" is not defined then it falls back to "__index__()". This method returns "-1.0" upon failure, so one should call "PyErr_Occurred()" to check for errors. Changed in version 3.8: Use "__index__()" if available. double PyFloat_AS_DOUBLE(PyObject *pyfloat) Return a C "double" representation of the contents of *pyfloat*, but without error checking. PyObject* PyFloat_GetInfo(void) *Return value: New reference.* Return a structseq instance which contains information about the precision, minimum and maximum values of a float. It’s a thin wrapper around the header file "float.h". double PyFloat_GetMax() Return the maximum representable finite float *DBL_MAX* as C "double". double PyFloat_GetMin() Return the minimum normalized positive float *DBL_MIN* as C "double". Function Objects **************** There are a few functions specific to Python functions. PyFunctionObject The C structure used for functions. PyTypeObject PyFunction_Type This is an instance of "PyTypeObject" and represents the Python function type. It is exposed to Python programmers as "types.FunctionType". int PyFunction_Check(PyObject *o) Return true if *o* is a function object (has type "PyFunction_Type"). The parameter must not be "NULL". This function always succeeds. PyObject* PyFunction_New(PyObject *code, PyObject *globals) *Return value: New reference.* Return a new function object associated with the code object *code*. *globals* must be a dictionary with the global variables accessible to the function. The function’s docstring and name are retrieved from the code object. *__module__* is retrieved from *globals*. The argument defaults, annotations and closure are set to "NULL". *__qualname__* is set to the same value as the function’s name. PyObject* PyFunction_NewWithQualName(PyObject *code, PyObject *globals, PyObject *qualname) *Return value: New reference.* As "PyFunction_New()", but also allows setting the function object’s "__qualname__" attribute. *qualname* should be a unicode object or "NULL"; if "NULL", the "__qualname__" attribute is set to the same value as its "__name__" attribute. New in version 3.3. PyObject* PyFunction_GetCode(PyObject *op) *Return value: Borrowed reference.* Return the code object associated with the function object *op*. PyObject* PyFunction_GetGlobals(PyObject *op) *Return value: Borrowed reference.* Return the globals dictionary associated with the function object *op*. PyObject* PyFunction_GetModule(PyObject *op) *Return value: Borrowed reference.* Return the *__module__* attribute of the function object *op*. This is normally a string containing the module name, but can be set to any other object by Python code. PyObject* PyFunction_GetDefaults(PyObject *op) *Return value: Borrowed reference.* Return the argument default values of the function object *op*. This can be a tuple of arguments or "NULL". int PyFunction_SetDefaults(PyObject *op, PyObject *defaults) Set the argument default values for the function object *op*. *defaults* must be "Py_None" or a tuple. Raises "SystemError" and returns "-1" on failure. PyObject* PyFunction_GetClosure(PyObject *op) *Return value: Borrowed reference.* Return the closure associated with the function object *op*. This can be "NULL" or a tuple of cell objects. int PyFunction_SetClosure(PyObject *op, PyObject *closure) Set the closure associated with the function object *op*. *closure* must be "Py_None" or a tuple of cell objects. Raises "SystemError" and returns "-1" on failure. PyObject *PyFunction_GetAnnotations(PyObject *op) *Return value: Borrowed reference.* Return the annotations of the function object *op*. This can be a mutable dictionary or "NULL". int PyFunction_SetAnnotations(PyObject *op, PyObject *annotations) Set the annotations for the function object *op*. *annotations* must be a dictionary or "Py_None". Raises "SystemError" and returns "-1" on failure. Supporting Cyclic Garbage Collection ************************************ Python’s support for detecting and collecting garbage which involves circular references requires support from object types which are “containers” for other objects which may also be containers. Types which do not store references to other objects, or which only store references to atomic types (such as numbers or strings), do not need to provide any explicit support for garbage collection. To create a container type, the "tp_flags" field of the type object must include the "Py_TPFLAGS_HAVE_GC" and provide an implementation of the "tp_traverse" handler. If instances of the type are mutable, a "tp_clear" implementation must also be provided. Py_TPFLAGS_HAVE_GC Objects with a type with this flag set must conform with the rules documented here. For convenience these objects will be referred to as container objects. Constructors for container types must conform to two rules: 1. The memory for the object must be allocated using "PyObject_GC_New()" or "PyObject_GC_NewVar()". 2. Once all the fields which may contain references to other containers are initialized, it must call "PyObject_GC_Track()". Similarly, the deallocator for the object must conform to a similar pair of rules: 1. Before fields which refer to other containers are invalidated, "PyObject_GC_UnTrack()" must be called. 2. The object’s memory must be deallocated using "PyObject_GC_Del()". Warning: If a type adds the Py_TPFLAGS_HAVE_GC, then it *must* implement at least a "tp_traverse" handler or explicitly use one from its subclass or subclasses.When calling "PyType_Ready()" or some of the APIs that indirectly call it like "PyType_FromSpecWithBases()" or "PyType_FromSpec()" the interpreter will automatically populate the "tp_flags", "tp_traverse" and "tp_clear" fields if the type inherits from a class that implements the garbage collector protocol and the child class does *not* include the "Py_TPFLAGS_HAVE_GC" flag. TYPE* PyObject_GC_New(TYPE, PyTypeObject *type) Analogous to "PyObject_New()" but for container objects with the "Py_TPFLAGS_HAVE_GC" flag set. TYPE* PyObject_GC_NewVar(TYPE, PyTypeObject *type, Py_ssize_t size) Analogous to "PyObject_NewVar()" but for container objects with the "Py_TPFLAGS_HAVE_GC" flag set. TYPE* PyObject_GC_Resize(TYPE, PyVarObject *op, Py_ssize_t newsize) Resize an object allocated by "PyObject_NewVar()". Returns the resized object or "NULL" on failure. *op* must not be tracked by the collector yet. void PyObject_GC_Track(PyObject *op) Adds the object *op* to the set of container objects tracked by the collector. The collector can run at unexpected times so objects must be valid while being tracked. This should be called once all the fields followed by the "tp_traverse" handler become valid, usually near the end of the constructor. int PyObject_IS_GC(PyObject *obj) Returns non-zero if the object implements the garbage collector protocol, otherwise returns 0. The object cannot be tracked by the garbage collector if this function returns 0. int PyObject_GC_IsTracked(PyObject *op) Returns 1 if the object type of *op* implements the GC protocol and *op* is being currently tracked by the garbage collector and 0 otherwise. This is analogous to the Python function "gc.is_tracked()". New in version 3.9. int PyObject_GC_IsFinalized(PyObject *op) Returns 1 if the object type of *op* implements the GC protocol and *op* has been already finalized by the garbage collector and 0 otherwise. This is analogous to the Python function "gc.is_finalized()". New in version 3.9. void PyObject_GC_Del(void *op) Releases memory allocated to an object using "PyObject_GC_New()" or "PyObject_GC_NewVar()". void PyObject_GC_UnTrack(void *op) Remove the object *op* from the set of container objects tracked by the collector. Note that "PyObject_GC_Track()" can be called again on this object to add it back to the set of tracked objects. The deallocator ("tp_dealloc" handler) should call this for the object before any of the fields used by the "tp_traverse" handler become invalid. Changed in version 3.8: The "_PyObject_GC_TRACK()" and "_PyObject_GC_UNTRACK()" macros have been removed from the public C API. The "tp_traverse" handler accepts a function parameter of this type: int (*visitproc)(PyObject *object, void *arg) Type of the visitor function passed to the "tp_traverse" handler. The function should be called with an object to traverse as *object* and the third parameter to the "tp_traverse" handler as *arg*. The Python core uses several visitor functions to implement cyclic garbage detection; it’s not expected that users will need to write their own visitor functions. The "tp_traverse" handler must have the following type: int (*traverseproc)(PyObject *self, visitproc visit, void *arg) Traversal function for a container object. Implementations must call the *visit* function for each object directly contained by *self*, with the parameters to *visit* being the contained object and the *arg* value passed to the handler. The *visit* function must not be called with a "NULL" object argument. If *visit* returns a non-zero value that value should be returned immediately. To simplify writing "tp_traverse" handlers, a "Py_VISIT()" macro is provided. In order to use this macro, the "tp_traverse" implementation must name its arguments exactly *visit* and *arg*: void Py_VISIT(PyObject *o) If *o* is not "NULL", call the *visit* callback, with arguments *o* and *arg*. If *visit* returns a non-zero value, then return it. Using this macro, "tp_traverse" handlers look like: static int my_traverse(Noddy *self, visitproc visit, void *arg) { Py_VISIT(self->foo); Py_VISIT(self->bar); return 0; } The "tp_clear" handler must be of the "inquiry" type, or "NULL" if the object is immutable. int (*inquiry)(PyObject *self) Drop references that may have created reference cycles. Immutable objects do not have to define this method since they can never directly create reference cycles. Note that the object must still be valid after calling this method (don’t just call "Py_DECREF()" on a reference). The collector will call this method if it detects that this object is involved in a reference cycle. Generator Objects ***************** Generator objects are what Python uses to implement generator iterators. They are normally created by iterating over a function that yields values, rather than explicitly calling "PyGen_New()" or "PyGen_NewWithQualName()". PyGenObject The C structure used for generator objects. PyTypeObject PyGen_Type The type object corresponding to generator objects. int PyGen_Check(PyObject *ob) Return true if *ob* is a generator object; *ob* must not be "NULL". This function always succeeds. int PyGen_CheckExact(PyObject *ob) Return true if *ob*’s type is "PyGen_Type"; *ob* must not be "NULL". This function always succeeds. PyObject* PyGen_New(PyFrameObject *frame) *Return value: New reference.* Create and return a new generator object based on the *frame* object. A reference to *frame* is stolen by this function. The argument must not be "NULL". PyObject* PyGen_NewWithQualName(PyFrameObject *frame, PyObject *name, PyObject *qualname) *Return value: New reference.* Create and return a new generator object based on the *frame* object, with "__name__" and "__qualname__" set to *name* and *qualname*. A reference to *frame* is stolen by this function. The *frame* argument must not be "NULL". Importing Modules ***************** PyObject* PyImport_ImportModule(const char *name) *Return value: New reference.* This is a simplified interface to "PyImport_ImportModuleEx()" below, leaving the *globals* and *locals* arguments set to "NULL" and *level* set to 0. When the *name* argument contains a dot (when it specifies a submodule of a package), the *fromlist* argument is set to the list "['*']" so that the return value is the named module rather than the top-level package containing it as would otherwise be the case. (Unfortunately, this has an additional side effect when *name* in fact specifies a subpackage instead of a submodule: the submodules specified in the package’s "__all__" variable are loaded.) Return a new reference to the imported module, or "NULL" with an exception set on failure. A failing import of a module doesn’t leave the module in "sys.modules". This function always uses absolute imports. PyObject* PyImport_ImportModuleNoBlock(const char *name) *Return value: New reference.* This function is a deprecated alias of "PyImport_ImportModule()". Changed in version 3.3: This function used to fail immediately when the import lock was held by another thread. In Python 3.3 though, the locking scheme switched to per-module locks for most purposes, so this function’s special behaviour isn’t needed anymore. PyObject* PyImport_ImportModuleEx(const char *name, PyObject *globals, PyObject *locals, PyObject *fromlist) *Return value: New reference.* Import a module. This is best described by referring to the built- in Python function "__import__()". The return value is a new reference to the imported module or top- level package, or "NULL" with an exception set on failure. Like for "__import__()", the return value when a submodule of a package was requested is normally the top-level package, unless a non-empty *fromlist* was given. Failing imports remove incomplete module objects, like with "PyImport_ImportModule()". PyObject* PyImport_ImportModuleLevelObject(PyObject *name, PyObject *globals, PyObject *locals, PyObject *fromlist, int level) *Return value: New reference.* Import a module. This is best described by referring to the built- in Python function "__import__()", as the standard "__import__()" function calls this function directly. The return value is a new reference to the imported module or top- level package, or "NULL" with an exception set on failure. Like for "__import__()", the return value when a submodule of a package was requested is normally the top-level package, unless a non-empty *fromlist* was given. New in version 3.3. PyObject* PyImport_ImportModuleLevel(const char *name, PyObject *globals, PyObject *locals, PyObject *fromlist, int level) *Return value: New reference.* Similar to "PyImport_ImportModuleLevelObject()", but the name is a UTF-8 encoded string instead of a Unicode object. Changed in version 3.3: Negative values for *level* are no longer accepted. PyObject* PyImport_Import(PyObject *name) *Return value: New reference.* This is a higher-level interface that calls the current “import hook function” (with an explicit *level* of 0, meaning absolute import). It invokes the "__import__()" function from the "__builtins__" of the current globals. This means that the import is done using whatever import hooks are installed in the current environment. This function always uses absolute imports. PyObject* PyImport_ReloadModule(PyObject *m) *Return value: New reference.* Reload a module. Return a new reference to the reloaded module, or "NULL" with an exception set on failure (the module still exists in this case). PyObject* PyImport_AddModuleObject(PyObject *name) *Return value: Borrowed reference.* Return the module object corresponding to a module name. The *name* argument may be of the form "package.module". First check the modules dictionary if there’s one there, and if not, create a new one and insert it in the modules dictionary. Return "NULL" with an exception set on failure. Note: This function does not load or import the module; if the module wasn’t already loaded, you will get an empty module object. Use "PyImport_ImportModule()" or one of its variants to import a module. Package structures implied by a dotted name for *name* are not created if not already present. New in version 3.3. PyObject* PyImport_AddModule(const char *name) *Return value: Borrowed reference.* Similar to "PyImport_AddModuleObject()", but the name is a UTF-8 encoded string instead of a Unicode object. PyObject* PyImport_ExecCodeModule(const char *name, PyObject *co) *Return value: New reference.* Given a module name (possibly of the form "package.module") and a code object read from a Python bytecode file or obtained from the built-in function "compile()", load the module. Return a new reference to the module object, or "NULL" with an exception set if an error occurred. *name* is removed from "sys.modules" in error cases, even if *name* was already in "sys.modules" on entry to "PyImport_ExecCodeModule()". Leaving incompletely initialized modules in "sys.modules" is dangerous, as imports of such modules have no way to know that the module object is an unknown (and probably damaged with respect to the module author’s intents) state. The module’s "__spec__" and "__loader__" will be set, if not set already, with the appropriate values. The spec’s loader will be set to the module’s "__loader__" (if set) and to an instance of "SourceFileLoader" otherwise. The module’s "__file__" attribute will be set to the code object’s "co_filename". If applicable, "__cached__" will also be set. This function will reload the module if it was already imported. See "PyImport_ReloadModule()" for the intended way to reload a module. If *name* points to a dotted name of the form "package.module", any package structures not already created will still not be created. See also "PyImport_ExecCodeModuleEx()" and "PyImport_ExecCodeModuleWithPathnames()". PyObject* PyImport_ExecCodeModuleEx(const char *name, PyObject *co, const char *pathname) *Return value: New reference.* Like "PyImport_ExecCodeModule()", but the "__file__" attribute of the module object is set to *pathname* if it is non-"NULL". See also "PyImport_ExecCodeModuleWithPathnames()". PyObject* PyImport_ExecCodeModuleObject(PyObject *name, PyObject *co, PyObject *pathname, PyObject *cpathname) *Return value: New reference.* Like "PyImport_ExecCodeModuleEx()", but the "__cached__" attribute of the module object is set to *cpathname* if it is non-"NULL". Of the three functions, this is the preferred one to use. New in version 3.3. PyObject* PyImport_ExecCodeModuleWithPathnames(const char *name, PyObject *co, const char *pathname, const char *cpathname) *Return value: New reference.* Like "PyImport_ExecCodeModuleObject()", but *name*, *pathname* and *cpathname* are UTF-8 encoded strings. Attempts are also made to figure out what the value for *pathname* should be from *cpathname* if the former is set to "NULL". New in version 3.2. Changed in version 3.3: Uses "imp.source_from_cache()" in calculating the source path if only the bytecode path is provided. long PyImport_GetMagicNumber() Return the magic number for Python bytecode files (a.k.a. ".pyc" file). The magic number should be present in the first four bytes of the bytecode file, in little-endian byte order. Returns "-1" on error. Changed in version 3.3: Return value of "-1" upon failure. const char * PyImport_GetMagicTag() Return the magic tag string for **PEP 3147** format Python bytecode file names. Keep in mind that the value at "sys.implementation.cache_tag" is authoritative and should be used instead of this function. New in version 3.2. PyObject* PyImport_GetModuleDict() *Return value: Borrowed reference.* Return the dictionary used for the module administration (a.k.a. "sys.modules"). Note that this is a per-interpreter variable. PyObject* PyImport_GetModule(PyObject *name) *Return value: New reference.* Return the already imported module with the given name. If the module has not been imported yet then returns "NULL" but does not set an error. Returns "NULL" and sets an error if the lookup failed. New in version 3.7. PyObject* PyImport_GetImporter(PyObject *path) *Return value: New reference.* Return a finder object for a "sys.path"/"pkg.__path__" item *path*, possibly by fetching it from the "sys.path_importer_cache" dict. If it wasn’t yet cached, traverse "sys.path_hooks" until a hook is found that can handle the path item. Return "None" if no hook could; this tells our caller that the *path based finder* could not find a finder for this path item. Cache the result in "sys.path_importer_cache". Return a new reference to the finder object. int PyImport_ImportFrozenModuleObject(PyObject *name) *Return value: New reference.* Load a frozen module named *name*. Return "1" for success, "0" if the module is not found, and "-1" with an exception set if the initialization failed. To access the imported module on a successful load, use "PyImport_ImportModule()". (Note the misnomer — this function would reload the module if it was already imported.) New in version 3.3. Changed in version 3.4: The "__file__" attribute is no longer set on the module. int PyImport_ImportFrozenModule(const char *name) Similar to "PyImport_ImportFrozenModuleObject()", but the name is a UTF-8 encoded string instead of a Unicode object. struct _frozen This is the structure type definition for frozen module descriptors, as generated by the **freeze** utility (see "Tools/freeze/" in the Python source distribution). Its definition, found in "Include/import.h", is: struct _frozen { const char *name; const unsigned char *code; int size; }; const struct _frozen* PyImport_FrozenModules This pointer is initialized to point to an array of "struct _frozen" records, terminated by one whose members are all "NULL" or zero. When a frozen module is imported, it is searched in this table. Third-party code could play tricks with this to provide a dynamically created collection of frozen modules. int PyImport_AppendInittab(const char *name, PyObject* (*initfunc)(void)) Add a single module to the existing table of built-in modules. This is a convenience wrapper around "PyImport_ExtendInittab()", returning "-1" if the table could not be extended. The new module can be imported by the name *name*, and uses the function *initfunc* as the initialization function called on the first attempted import. This should be called before "Py_Initialize()". struct _inittab Structure describing a single entry in the list of built-in modules. Each of these structures gives the name and initialization function for a module built into the interpreter. The name is an ASCII encoded string. Programs which embed Python may use an array of these structures in conjunction with "PyImport_ExtendInittab()" to provide additional built-in modules. The structure is defined in "Include/import.h" as: struct _inittab { const char *name; /* ASCII encoded string */ PyObject* (*initfunc)(void); }; int PyImport_ExtendInittab(struct _inittab *newtab) Add a collection of modules to the table of built-in modules. The *newtab* array must end with a sentinel entry which contains "NULL" for the "name" field; failure to provide the sentinel value can result in a memory fault. Returns "0" on success or "-1" if insufficient memory could be allocated to extend the internal table. In the event of failure, no modules are added to the internal table. This must be called before "Py_Initialize()". If Python is initialized multiple times, "PyImport_AppendInittab()" or "PyImport_ExtendInittab()" must be called before each Python initialization. Python/C API Reference Manual ***************************** This manual documents the API used by C and C++ programmers who want to write extension modules or embed Python. It is a companion to Extending and Embedding the Python Interpreter, which describes the general principles of extension writing but does not document the API functions in detail. * Introduction * Coding standards * Include Files * Useful macros * Objects, Types and Reference Counts * Exceptions * Embedding Python * Debugging Builds * Stable Application Binary Interface * The Very High Level Layer * Reference Counting * Exception Handling * Printing and clearing * Raising exceptions * Issuing warnings * Querying the error indicator * Signal Handling * Exception Classes * Exception Objects * Unicode Exception Objects * Recursion Control * Standard Exceptions * Standard Warning Categories * Utilities * Operating System Utilities * System Functions * Process Control * Importing Modules * Data marshalling support * Parsing arguments and building values * String conversion and formatting * Reflection * Codec registry and support functions * Abstract Objects Layer * Object Protocol * Call Protocol * Number Protocol * Sequence Protocol * Mapping Protocol * Iterator Protocol * Buffer Protocol * Old Buffer Protocol * Concrete Objects Layer * Fundamental Objects * Numeric Objects * Sequence Objects * Container Objects * Function Objects * Other Objects * Initialization, Finalization, and Threads * Before Python Initialization * Global configuration variables * Initializing and finalizing the interpreter * Process-wide parameters * Thread State and the Global Interpreter Lock * Sub-interpreter support * Asynchronous Notifications * Profiling and Tracing * Advanced Debugger Support * Thread Local Storage Support * Python Initialization Configuration * PyWideStringList * PyStatus * PyPreConfig * Preinitialization with PyPreConfig * PyConfig * Initialization with PyConfig * Isolated Configuration * Python Configuration * Path Configuration * Py_RunMain() * Py_GetArgcArgv() * Multi-Phase Initialization Private Provisional API * Memory Management * Overview * Raw Memory Interface * Memory Interface * Object allocators * Default Memory Allocators * Customize Memory Allocators * The pymalloc allocator * tracemalloc C API * Examples * Object Implementation Support * Allocating Objects on the Heap * Common Object Structures * Type Objects * Number Object Structures * Mapping Object Structures * Sequence Object Structures * Buffer Object Structures * Async Object Structures * Slot Type typedefs * Examples * Supporting Cyclic Garbage Collection * API and ABI Versioning Initialization, Finalization, and Threads ***************************************** See also Python Initialization Configuration. Before Python Initialization ============================ In an application embedding Python, the "Py_Initialize()" function must be called before using any other Python/C API functions; with the exception of a few functions and the global configuration variables. The following functions can be safely called before Python is initialized: * Configuration functions: * "PyImport_AppendInittab()" * "PyImport_ExtendInittab()" * "PyInitFrozenExtensions()" * "PyMem_SetAllocator()" * "PyMem_SetupDebugHooks()" * "PyObject_SetArenaAllocator()" * "Py_SetPath()" * "Py_SetProgramName()" * "Py_SetPythonHome()" * "Py_SetStandardStreamEncoding()" * "PySys_AddWarnOption()" * "PySys_AddXOption()" * "PySys_ResetWarnOptions()" * Informative functions: * "Py_IsInitialized()" * "PyMem_GetAllocator()" * "PyObject_GetArenaAllocator()" * "Py_GetBuildInfo()" * "Py_GetCompiler()" * "Py_GetCopyright()" * "Py_GetPlatform()" * "Py_GetVersion()" * Utilities: * "Py_DecodeLocale()" * Memory allocators: * "PyMem_RawMalloc()" * "PyMem_RawRealloc()" * "PyMem_RawCalloc()" * "PyMem_RawFree()" Note: The following functions **should not be called** before "Py_Initialize()": "Py_EncodeLocale()", "Py_GetPath()", "Py_GetPrefix()", "Py_GetExecPrefix()", "Py_GetProgramFullPath()", "Py_GetPythonHome()", "Py_GetProgramName()" and "PyEval_InitThreads()". Global configuration variables ============================== Python has variables for the global configuration to control different features and options. By default, these flags are controlled by command line options. When a flag is set by an option, the value of the flag is the number of times that the option was set. For example, "-b" sets "Py_BytesWarningFlag" to 1 and "-bb" sets "Py_BytesWarningFlag" to 2. int Py_BytesWarningFlag Issue a warning when comparing "bytes" or "bytearray" with "str" or "bytes" with "int". Issue an error if greater or equal to "2". Set by the "-b" option. int Py_DebugFlag Turn on parser debugging output (for expert only, depending on compilation options). Set by the "-d" option and the "PYTHONDEBUG" environment variable. int Py_DontWriteBytecodeFlag If set to non-zero, Python won’t try to write ".pyc" files on the import of source modules. Set by the "-B" option and the "PYTHONDONTWRITEBYTECODE" environment variable. int Py_FrozenFlag Suppress error messages when calculating the module search path in "Py_GetPath()". Private flag used by "_freeze_importlib" and "frozenmain" programs. int Py_HashRandomizationFlag Set to "1" if the "PYTHONHASHSEED" environment variable is set to a non-empty string. If the flag is non-zero, read the "PYTHONHASHSEED" environment variable to initialize the secret hash seed. int Py_IgnoreEnvironmentFlag Ignore all "PYTHON*" environment variables, e.g. "PYTHONPATH" and "PYTHONHOME", that might be set. Set by the "-E" and "-I" options. int Py_InspectFlag When a script is passed as first argument or the "-c" option is used, enter interactive mode after executing the script or the command, even when "sys.stdin" does not appear to be a terminal. Set by the "-i" option and the "PYTHONINSPECT" environment variable. int Py_InteractiveFlag Set by the "-i" option. int Py_IsolatedFlag Run Python in isolated mode. In isolated mode "sys.path" contains neither the script’s directory nor the user’s site-packages directory. Set by the "-I" option. New in version 3.4. int Py_LegacyWindowsFSEncodingFlag If the flag is non-zero, use the "mbcs" encoding instead of the UTF-8 encoding for the filesystem encoding. Set to "1" if the "PYTHONLEGACYWINDOWSFSENCODING" environment variable is set to a non-empty string. See **PEP 529** for more details. Availability: Windows. int Py_LegacyWindowsStdioFlag If the flag is non-zero, use "io.FileIO" instead of "WindowsConsoleIO" for "sys" standard streams. Set to "1" if the "PYTHONLEGACYWINDOWSSTDIO" environment variable is set to a non-empty string. See **PEP 528** for more details. Availability: Windows. int Py_NoSiteFlag Disable the import of the module "site" and the site-dependent manipulations of "sys.path" that it entails. Also disable these manipulations if "site" is explicitly imported later (call "site.main()" if you want them to be triggered). Set by the "-S" option. int Py_NoUserSiteDirectory Don’t add the "user site-packages directory" to "sys.path". Set by the "-s" and "-I" options, and the "PYTHONNOUSERSITE" environment variable. int Py_OptimizeFlag Set by the "-O" option and the "PYTHONOPTIMIZE" environment variable. int Py_QuietFlag Don’t display the copyright and version messages even in interactive mode. Set by the "-q" option. New in version 3.2. int Py_UnbufferedStdioFlag Force the stdout and stderr streams to be unbuffered. Set by the "-u" option and the "PYTHONUNBUFFERED" environment variable. int Py_VerboseFlag Print a message each time a module is initialized, showing the place (filename or built-in module) from which it is loaded. If greater or equal to "2", print a message for each file that is checked for when searching for a module. Also provides information on module cleanup at exit. Set by the "-v" option and the "PYTHONVERBOSE" environment variable. Initializing and finalizing the interpreter =========================================== void Py_Initialize() Initialize the Python interpreter. In an application embedding Python, this should be called before using any other Python/C API functions; see Before Python Initialization for the few exceptions. This initializes the table of loaded modules ("sys.modules"), and creates the fundamental modules "builtins", "__main__" and "sys". It also initializes the module search path ("sys.path"). It does not set "sys.argv"; use "PySys_SetArgvEx()" for that. This is a no-op when called for a second time (without calling "Py_FinalizeEx()" first). There is no return value; it is a fatal error if the initialization fails. Note: On Windows, changes the console mode from "O_TEXT" to "O_BINARY", which will also affect non-Python uses of the console using the C Runtime. void Py_InitializeEx(int initsigs) This function works like "Py_Initialize()" if *initsigs* is "1". If *initsigs* is "0", it skips initialization registration of signal handlers, which might be useful when Python is embedded. int Py_IsInitialized() Return true (nonzero) when the Python interpreter has been initialized, false (zero) if not. After "Py_FinalizeEx()" is called, this returns false until "Py_Initialize()" is called again. int Py_FinalizeEx() Undo all initializations made by "Py_Initialize()" and subsequent use of Python/C API functions, and destroy all sub-interpreters (see "Py_NewInterpreter()" below) that were created and not yet destroyed since the last call to "Py_Initialize()". Ideally, this frees all memory allocated by the Python interpreter. This is a no-op when called for a second time (without calling "Py_Initialize()" again first). Normally the return value is "0". If there were errors during finalization (flushing buffered data), "-1" is returned. This function is provided for a number of reasons. An embedding application might want to restart Python without having to restart the application itself. An application that has loaded the Python interpreter from a dynamically loadable library (or DLL) might want to free all memory allocated by Python before unloading the DLL. During a hunt for memory leaks in an application a developer might want to free all memory allocated by Python before exiting from the application. **Bugs and caveats:** The destruction of modules and objects in modules is done in random order; this may cause destructors ("__del__()" methods) to fail when they depend on other objects (even functions) or modules. Dynamically loaded extension modules loaded by Python are not unloaded. Small amounts of memory allocated by the Python interpreter may not be freed (if you find a leak, please report it). Memory tied up in circular references between objects is not freed. Some memory allocated by extension modules may not be freed. Some extensions may not work properly if their initialization routine is called more than once; this can happen if an application calls "Py_Initialize()" and "Py_FinalizeEx()" more than once. Raises an auditing event "cpython._PySys_ClearAuditHooks" with no arguments. New in version 3.6. void Py_Finalize() This is a backwards-compatible version of "Py_FinalizeEx()" that disregards the return value. Process-wide parameters ======================= int Py_SetStandardStreamEncoding(const char *encoding, const char *errors) This function should be called before "Py_Initialize()", if it is called at all. It specifies which encoding and error handling to use with standard IO, with the same meanings as in "str.encode()". It overrides "PYTHONIOENCODING" values, and allows embedding code to control IO encoding when the environment variable does not work. *encoding* and/or *errors* may be "NULL" to use "PYTHONIOENCODING" and/or default values (depending on other settings). Note that "sys.stderr" always uses the “backslashreplace” error handler, regardless of this (or any other) setting. If "Py_FinalizeEx()" is called, this function will need to be called again in order to affect subsequent calls to "Py_Initialize()". Returns "0" if successful, a nonzero value on error (e.g. calling after the interpreter has already been initialized). New in version 3.4. void Py_SetProgramName(const wchar_t *name) This function should be called before "Py_Initialize()" is called for the first time, if it is called at all. It tells the interpreter the value of the "argv[0]" argument to the "main()" function of the program (converted to wide characters). This is used by "Py_GetPath()" and some other functions below to find the Python run-time libraries relative to the interpreter executable. The default value is "'python'". The argument should point to a zero-terminated wide character string in static storage whose contents will not change for the duration of the program’s execution. No code in the Python interpreter will change the contents of this storage. Use "Py_DecodeLocale()" to decode a bytes string to get a "wchar_*" string. wchar* Py_GetProgramName() Return the program name set with "Py_SetProgramName()", or the default. The returned string points into static storage; the caller should not modify its value. wchar_t* Py_GetPrefix() Return the *prefix* for installed platform-independent files. This is derived through a number of complicated rules from the program name set with "Py_SetProgramName()" and some environment variables; for example, if the program name is "'/usr/local/bin/python'", the prefix is "'/usr/local'". The returned string points into static storage; the caller should not modify its value. This corresponds to the **prefix** variable in the top-level "Makefile" and the "-- prefix" argument to the **configure** script at build time. The value is available to Python code as "sys.prefix". It is only useful on Unix. See also the next function. wchar_t* Py_GetExecPrefix() Return the *exec-prefix* for installed platform-*dependent* files. This is derived through a number of complicated rules from the program name set with "Py_SetProgramName()" and some environment variables; for example, if the program name is "'/usr/local/bin/python'", the exec-prefix is "'/usr/local'". The returned string points into static storage; the caller should not modify its value. This corresponds to the **exec_prefix** variable in the top-level "Makefile" and the "--exec-prefix" argument to the **configure** script at build time. The value is available to Python code as "sys.exec_prefix". It is only useful on Unix. Background: The exec-prefix differs from the prefix when platform dependent files (such as executables and shared libraries) are installed in a different directory tree. In a typical installation, platform dependent files may be installed in the "/usr/local/plat" subtree while platform independent may be installed in "/usr/local". Generally speaking, a platform is a combination of hardware and software families, e.g. Sparc machines running the Solaris 2.x operating system are considered the same platform, but Intel machines running Solaris 2.x are another platform, and Intel machines running Linux are yet another platform. Different major revisions of the same operating system generally also form different platforms. Non-Unix operating systems are a different story; the installation strategies on those systems are so different that the prefix and exec-prefix are meaningless, and set to the empty string. Note that compiled Python bytecode files are platform independent (but not independent from the Python version by which they were compiled!). System administrators will know how to configure the **mount** or **automount** programs to share "/usr/local" between platforms while having "/usr/local/plat" be a different filesystem for each platform. wchar_t* Py_GetProgramFullPath() Return the full program name of the Python executable; this is computed as a side-effect of deriving the default module search path from the program name (set by "Py_SetProgramName()" above). The returned string points into static storage; the caller should not modify its value. The value is available to Python code as "sys.executable". wchar_t* Py_GetPath() Return the default module search path; this is computed from the program name (set by "Py_SetProgramName()" above) and some environment variables. The returned string consists of a series of directory names separated by a platform dependent delimiter character. The delimiter character is "':'" on Unix and macOS, "';'" on Windows. The returned string points into static storage; the caller should not modify its value. The list "sys.path" is initialized with this value on interpreter startup; it can be (and usually is) modified later to change the search path for loading modules. void Py_SetPath(const wchar_t *) Set the default module search path. If this function is called before "Py_Initialize()", then "Py_GetPath()" won’t attempt to compute a default search path but uses the one provided instead. This is useful if Python is embedded by an application that has full knowledge of the location of all modules. The path components should be separated by the platform dependent delimiter character, which is "':'" on Unix and macOS, "';'" on Windows. This also causes "sys.executable" to be set to the program full path (see "Py_GetProgramFullPath()") and for "sys.prefix" and "sys.exec_prefix" to be empty. It is up to the caller to modify these if required after calling "Py_Initialize()". Use "Py_DecodeLocale()" to decode a bytes string to get a "wchar_*" string. The path argument is copied internally, so the caller may free it after the call completes. Changed in version 3.8: The program full path is now used for "sys.executable", instead of the program name. const char* Py_GetVersion() Return the version of this Python interpreter. This is a string that looks something like "3.0a5+ (py3k:63103M, May 12 2008, 00:53:55) \n[GCC 4.2.3]" The first word (up to the first space character) is the current Python version; the first characters are the major and minor version separated by a period. The returned string points into static storage; the caller should not modify its value. The value is available to Python code as "sys.version". const char* Py_GetPlatform() Return the platform identifier for the current platform. On Unix, this is formed from the “official” name of the operating system, converted to lower case, followed by the major revision number; e.g., for Solaris 2.x, which is also known as SunOS 5.x, the value is "'sunos5'". On macOS, it is "'darwin'". On Windows, it is "'win'". The returned string points into static storage; the caller should not modify its value. The value is available to Python code as "sys.platform". const char* Py_GetCopyright() Return the official copyright string for the current Python version, for example "'Copyright 1991-1995 Stichting Mathematisch Centrum, Amsterdam'" The returned string points into static storage; the caller should not modify its value. The value is available to Python code as "sys.copyright". const char* Py_GetCompiler() Return an indication of the compiler used to build the current Python version, in square brackets, for example: "[GCC 2.7.2.2]" The returned string points into static storage; the caller should not modify its value. The value is available to Python code as part of the variable "sys.version". const char* Py_GetBuildInfo() Return information about the sequence number and build date and time of the current Python interpreter instance, for example "#67, Aug 1 1997, 22:34:28" The returned string points into static storage; the caller should not modify its value. The value is available to Python code as part of the variable "sys.version". void PySys_SetArgvEx(int argc, wchar_t **argv, int updatepath) Set "sys.argv" based on *argc* and *argv*. These parameters are similar to those passed to the program’s "main()" function with the difference that the first entry should refer to the script file to be executed rather than the executable hosting the Python interpreter. If there isn’t a script that will be run, the first entry in *argv* can be an empty string. If this function fails to initialize "sys.argv", a fatal condition is signalled using "Py_FatalError()". If *updatepath* is zero, this is all the function does. If *updatepath* is non-zero, the function also modifies "sys.path" according to the following algorithm: * If the name of an existing script is passed in "argv[0]", the absolute path of the directory where the script is located is prepended to "sys.path". * Otherwise (that is, if *argc* is "0" or "argv[0]" doesn’t point to an existing file name), an empty string is prepended to "sys.path", which is the same as prepending the current working directory ("".""). Use "Py_DecodeLocale()" to decode a bytes string to get a "wchar_*" string. Note: It is recommended that applications embedding the Python interpreter for purposes other than executing a single script pass "0" as *updatepath*, and update "sys.path" themselves if desired. See CVE-2008-5983.On versions before 3.1.3, you can achieve the same effect by manually popping the first "sys.path" element after having called "PySys_SetArgv()", for example using: PyRun_SimpleString("import sys; sys.path.pop(0)\n"); New in version 3.1.3. void PySys_SetArgv(int argc, wchar_t **argv) This function works like "PySys_SetArgvEx()" with *updatepath* set to "1" unless the **python** interpreter was started with the "-I". Use "Py_DecodeLocale()" to decode a bytes string to get a "wchar_*" string. Changed in version 3.4: The *updatepath* value depends on "-I". void Py_SetPythonHome(const wchar_t *home) Set the default “home” directory, that is, the location of the standard Python libraries. See "PYTHONHOME" for the meaning of the argument string. The argument should point to a zero-terminated character string in static storage whose contents will not change for the duration of the program’s execution. No code in the Python interpreter will change the contents of this storage. Use "Py_DecodeLocale()" to decode a bytes string to get a "wchar_*" string. w_char* Py_GetPythonHome() Return the default “home”, that is, the value set by a previous call to "Py_SetPythonHome()", or the value of the "PYTHONHOME" environment variable if it is set. Thread State and the Global Interpreter Lock ============================================ The Python interpreter is not fully thread-safe. In order to support multi-threaded Python programs, there’s a global lock, called the *global interpreter lock* or *GIL*, that must be held by the current thread before it can safely access Python objects. Without the lock, even the simplest operations could cause problems in a multi-threaded program: for example, when two threads simultaneously increment the reference count of the same object, the reference count could end up being incremented only once instead of twice. Therefore, the rule exists that only the thread that has acquired the *GIL* may operate on Python objects or call Python/C API functions. In order to emulate concurrency of execution, the interpreter regularly tries to switch threads (see "sys.setswitchinterval()"). The lock is also released around potentially blocking I/O operations like reading or writing a file, so that other Python threads can run in the meantime. The Python interpreter keeps some thread-specific bookkeeping information inside a data structure called "PyThreadState". There’s also one global variable pointing to the current "PyThreadState": it can be retrieved using "PyThreadState_Get()". Releasing the GIL from extension code ------------------------------------- Most extension code manipulating the *GIL* has the following simple structure: Save the thread state in a local variable. Release the global interpreter lock. ... Do some blocking I/O operation ... Reacquire the global interpreter lock. Restore the thread state from the local variable. This is so common that a pair of macros exists to simplify it: Py_BEGIN_ALLOW_THREADS ... Do some blocking I/O operation ... Py_END_ALLOW_THREADS The "Py_BEGIN_ALLOW_THREADS" macro opens a new block and declares a hidden local variable; the "Py_END_ALLOW_THREADS" macro closes the block. The block above expands to the following code: PyThreadState *_save; _save = PyEval_SaveThread(); ... Do some blocking I/O operation ... PyEval_RestoreThread(_save); Here is how these functions work: the global interpreter lock is used to protect the pointer to the current thread state. When releasing the lock and saving the thread state, the current thread state pointer must be retrieved before the lock is released (since another thread could immediately acquire the lock and store its own thread state in the global variable). Conversely, when acquiring the lock and restoring the thread state, the lock must be acquired before storing the thread state pointer. Note: Calling system I/O functions is the most common use case for releasing the GIL, but it can also be useful before calling long- running computations which don’t need access to Python objects, such as compression or cryptographic functions operating over memory buffers. For example, the standard "zlib" and "hashlib" modules release the GIL when compressing or hashing data. Non-Python created threads -------------------------- When threads are created using the dedicated Python APIs (such as the "threading" module), a thread state is automatically associated to them and the code showed above is therefore correct. However, when threads are created from C (for example by a third-party library with its own thread management), they don’t hold the GIL, nor is there a thread state structure for them. If you need to call Python code from these threads (often this will be part of a callback API provided by the aforementioned third-party library), you must first register these threads with the interpreter by creating a thread state data structure, then acquiring the GIL, and finally storing their thread state pointer, before you can start using the Python/C API. When you are done, you should reset the thread state pointer, release the GIL, and finally free the thread state data structure. The "PyGILState_Ensure()" and "PyGILState_Release()" functions do all of the above automatically. The typical idiom for calling into Python from a C thread is: PyGILState_STATE gstate; gstate = PyGILState_Ensure(); /* Perform Python actions here. */ result = CallSomeFunction(); /* evaluate result or handle exception */ /* Release the thread. No Python API allowed beyond this point. */ PyGILState_Release(gstate); Note that the "PyGILState_*()" functions assume there is only one global interpreter (created automatically by "Py_Initialize()"). Python supports the creation of additional interpreters (using "Py_NewInterpreter()"), but mixing multiple interpreters and the "PyGILState_*()" API is unsupported. Cautions about fork() --------------------- Another important thing to note about threads is their behaviour in the face of the C "fork()" call. On most systems with "fork()", after a process forks only the thread that issued the fork will exist. This has a concrete impact both on how locks must be handled and on all stored state in CPython’s runtime. The fact that only the “current” thread remains means any locks held by other threads will never be released. Python solves this for "os.fork()" by acquiring the locks it uses internally before the fork, and releasing them afterwards. In addition, it resets any Lock Objects in the child. When extending or embedding Python, there is no way to inform Python of additional (non-Python) locks that need to be acquired before or reset after a fork. OS facilities such as "pthread_atfork()" would need to be used to accomplish the same thing. Additionally, when extending or embedding Python, calling "fork()" directly rather than through "os.fork()" (and returning to or calling into Python) may result in a deadlock by one of Python’s internal locks being held by a thread that is defunct after the fork. "PyOS_AfterFork_Child()" tries to reset the necessary locks, but is not always able to. The fact that all other threads go away also means that CPython’s runtime state there must be cleaned up properly, which "os.fork()" does. This means finalizing all other "PyThreadState" objects belonging to the current interpreter and all other "PyInterpreterState" objects. Due to this and the special nature of the “main” interpreter, "fork()" should only be called in that interpreter’s “main” thread, where the CPython global runtime was originally initialized. The only exception is if "exec()" will be called immediately after. High-level API -------------- These are the most commonly used types and functions when writing C extension code, or when embedding the Python interpreter: PyInterpreterState This data structure represents the state shared by a number of cooperating threads. Threads belonging to the same interpreter share their module administration and a few other internal items. There are no public members in this structure. Threads belonging to different interpreters initially share nothing, except process state like available memory, open file descriptors and such. The global interpreter lock is also shared by all threads, regardless of to which interpreter they belong. PyThreadState This data structure represents the state of a single thread. The only public data member is "interp" ("PyInterpreterState *"), which points to this thread’s interpreter state. void PyEval_InitThreads() Deprecated function which does nothing. In Python 3.6 and older, this function created the GIL if it didn’t exist. Changed in version 3.9: The function now does nothing. Changed in version 3.7: This function is now called by "Py_Initialize()", so you don’t have to call it yourself anymore. Changed in version 3.2: This function cannot be called before "Py_Initialize()" anymore. Deprecated since version 3.9, will be removed in version 3.11. int PyEval_ThreadsInitialized() Returns a non-zero value if "PyEval_InitThreads()" has been called. This function can be called without holding the GIL, and therefore can be used to avoid calls to the locking API when running single- threaded. Changed in version 3.7: The *GIL* is now initialized by "Py_Initialize()". Deprecated since version 3.9, will be removed in version 3.11. PyThreadState* PyEval_SaveThread() Release the global interpreter lock (if it has been created) and reset the thread state to "NULL", returning the previous thread state (which is not "NULL"). If the lock has been created, the current thread must have acquired it. void PyEval_RestoreThread(PyThreadState *tstate) Acquire the global interpreter lock (if it has been created) and set the thread state to *tstate*, which must not be "NULL". If the lock has been created, the current thread must not have acquired it, otherwise deadlock ensues. Note: Calling this function from a thread when the runtime is finalizing will terminate the thread, even if the thread was not created by Python. You can use "_Py_IsFinalizing()" or "sys.is_finalizing()" to check if the interpreter is in process of being finalized before calling this function to avoid unwanted termination. PyThreadState* PyThreadState_Get() Return the current thread state. The global interpreter lock must be held. When the current thread state is "NULL", this issues a fatal error (so that the caller needn’t check for "NULL"). PyThreadState* PyThreadState_Swap(PyThreadState *tstate) Swap the current thread state with the thread state given by the argument *tstate*, which may be "NULL". The global interpreter lock must be held and is not released. The following functions use thread-local storage, and are not compatible with sub-interpreters: PyGILState_STATE PyGILState_Ensure() Ensure that the current thread is ready to call the Python C API regardless of the current state of Python, or of the global interpreter lock. This may be called as many times as desired by a thread as long as each call is matched with a call to "PyGILState_Release()". In general, other thread-related APIs may be used between "PyGILState_Ensure()" and "PyGILState_Release()" calls as long as the thread state is restored to its previous state before the Release(). For example, normal usage of the "Py_BEGIN_ALLOW_THREADS" and "Py_END_ALLOW_THREADS" macros is acceptable. The return value is an opaque “handle” to the thread state when "PyGILState_Ensure()" was called, and must be passed to "PyGILState_Release()" to ensure Python is left in the same state. Even though recursive calls are allowed, these handles *cannot* be shared - each unique call to "PyGILState_Ensure()" must save the handle for its call to "PyGILState_Release()". When the function returns, the current thread will hold the GIL and be able to call arbitrary Python code. Failure is a fatal error. Note: Calling this function from a thread when the runtime is finalizing will terminate the thread, even if the thread was not created by Python. You can use "_Py_IsFinalizing()" or "sys.is_finalizing()" to check if the interpreter is in process of being finalized before calling this function to avoid unwanted termination. void PyGILState_Release(PyGILState_STATE) Release any resources previously acquired. After this call, Python’s state will be the same as it was prior to the corresponding "PyGILState_Ensure()" call (but generally this state will be unknown to the caller, hence the use of the GILState API). Every call to "PyGILState_Ensure()" must be matched by a call to "PyGILState_Release()" on the same thread. PyThreadState* PyGILState_GetThisThreadState() Get the current thread state for this thread. May return "NULL" if no GILState API has been used on the current thread. Note that the main thread always has such a thread-state, even if no auto-thread- state call has been made on the main thread. This is mainly a helper/diagnostic function. int PyGILState_Check() Return "1" if the current thread is holding the GIL and "0" otherwise. This function can be called from any thread at any time. Only if it has had its Python thread state initialized and currently is holding the GIL will it return "1". This is mainly a helper/diagnostic function. It can be useful for example in callback contexts or memory allocation functions when knowing that the GIL is locked can allow the caller to perform sensitive actions or otherwise behave differently. New in version 3.4. The following macros are normally used without a trailing semicolon; look for example usage in the Python source distribution. Py_BEGIN_ALLOW_THREADS This macro expands to "{ PyThreadState *_save; _save = PyEval_SaveThread();". Note that it contains an opening brace; it must be matched with a following "Py_END_ALLOW_THREADS" macro. See above for further discussion of this macro. Py_END_ALLOW_THREADS This macro expands to "PyEval_RestoreThread(_save); }". Note that it contains a closing brace; it must be matched with an earlier "Py_BEGIN_ALLOW_THREADS" macro. See above for further discussion of this macro. Py_BLOCK_THREADS This macro expands to "PyEval_RestoreThread(_save);": it is equivalent to "Py_END_ALLOW_THREADS" without the closing brace. Py_UNBLOCK_THREADS This macro expands to "_save = PyEval_SaveThread();": it is equivalent to "Py_BEGIN_ALLOW_THREADS" without the opening brace and variable declaration. Low-level API ------------- All of the following functions must be called after "Py_Initialize()". Changed in version 3.7: "Py_Initialize()" now initializes the *GIL*. PyInterpreterState* PyInterpreterState_New() Create a new interpreter state object. The global interpreter lock need not be held, but may be held if it is necessary to serialize calls to this function. Raises an auditing event "cpython.PyInterpreterState_New" with no arguments. void PyInterpreterState_Clear(PyInterpreterState *interp) Reset all information in an interpreter state object. The global interpreter lock must be held. Raises an auditing event "cpython.PyInterpreterState_Clear" with no arguments. void PyInterpreterState_Delete(PyInterpreterState *interp) Destroy an interpreter state object. The global interpreter lock need not be held. The interpreter state must have been reset with a previous call to "PyInterpreterState_Clear()". PyThreadState* PyThreadState_New(PyInterpreterState *interp) Create a new thread state object belonging to the given interpreter object. The global interpreter lock need not be held, but may be held if it is necessary to serialize calls to this function. void PyThreadState_Clear(PyThreadState *tstate) Reset all information in a thread state object. The global interpreter lock must be held. Changed in version 3.9: This function now calls the "PyThreadState.on_delete" callback. Previously, that happened in "PyThreadState_Delete()". void PyThreadState_Delete(PyThreadState *tstate) Destroy a thread state object. The global interpreter lock need not be held. The thread state must have been reset with a previous call to "PyThreadState_Clear()". void PyThreadState_DeleteCurrent(void) Destroy the current thread state and release the global interpreter lock. Like "PyThreadState_Delete()", the global interpreter lock need not be held. The thread state must have been reset with a previous call to "PyThreadState_Clear()". PyFrameObject* PyThreadState_GetFrame(PyThreadState *tstate) Get the current frame of the Python thread state *tstate*. Return a strong reference. Return "NULL" if no frame is currently executing. See also "PyEval_GetFrame()". *tstate* must not be "NULL". New in version 3.9. uint64_t PyThreadState_GetID(PyThreadState *tstate) Get the unique thread state identifier of the Python thread state *tstate*. *tstate* must not be "NULL". New in version 3.9. PyInterpreterState* PyThreadState_GetInterpreter(PyThreadState *tstate) Get the interpreter of the Python thread state *tstate*. *tstate* must not be "NULL". New in version 3.9. PyInterpreterState* PyInterpreterState_Get(void) Get the current interpreter. Issue a fatal error if there no current Python thread state or no current interpreter. It cannot return NULL. The caller must hold the GIL. New in version 3.9. int64_t PyInterpreterState_GetID(PyInterpreterState *interp) Return the interpreter’s unique ID. If there was any error in doing so then "-1" is returned and an error is set. The caller must hold the GIL. New in version 3.7. PyObject* PyInterpreterState_GetDict(PyInterpreterState *interp) Return a dictionary in which interpreter-specific data may be stored. If this function returns "NULL" then no exception has been raised and the caller should assume no interpreter-specific dict is available. This is not a replacement for "PyModule_GetState()", which extensions should use to store interpreter-specific state information. New in version 3.8. PyObject* (*_PyFrameEvalFunction)(PyThreadState *tstate, PyFrameObject *frame, int throwflag) Type of a frame evaluation function. The *throwflag* parameter is used by the "throw()" method of generators: if non-zero, handle the current exception. Changed in version 3.9: The function now takes a *tstate* parameter. _PyFrameEvalFunction _PyInterpreterState_GetEvalFrameFunc(PyInterpreterState *interp) Get the frame evaluation function. See the **PEP 523** “Adding a frame evaluation API to CPython”. New in version 3.9. void _PyInterpreterState_SetEvalFrameFunc(PyInterpreterState *interp, _PyFrameEvalFunction eval_frame) Set the frame evaluation function. See the **PEP 523** “Adding a frame evaluation API to CPython”. New in version 3.9. PyObject* PyThreadState_GetDict() *Return value: Borrowed reference.* Return a dictionary in which extensions can store thread-specific state information. Each extension should use a unique key to use to store state in the dictionary. It is okay to call this function when no current thread state is available. If this function returns "NULL", no exception has been raised and the caller should assume no current thread state is available. int PyThreadState_SetAsyncExc(unsigned long id, PyObject *exc) Asynchronously raise an exception in a thread. The *id* argument is the thread id of the target thread; *exc* is the exception object to be raised. This function does not steal any references to *exc*. To prevent naive misuse, you must write your own C extension to call this. Must be called with the GIL held. Returns the number of thread states modified; this is normally one, but will be zero if the thread id isn’t found. If *exc* is "NULL", the pending exception (if any) for the thread is cleared. This raises no exceptions. Changed in version 3.7: The type of the *id* parameter changed from "long" to "unsigned long". void PyEval_AcquireThread(PyThreadState *tstate) Acquire the global interpreter lock and set the current thread state to *tstate*, which must not be "NULL". The lock must have been created earlier. If this thread already has the lock, deadlock ensues. Note: Calling this function from a thread when the runtime is finalizing will terminate the thread, even if the thread was not created by Python. You can use "_Py_IsFinalizing()" or "sys.is_finalizing()" to check if the interpreter is in process of being finalized before calling this function to avoid unwanted termination. Changed in version 3.8: Updated to be consistent with "PyEval_RestoreThread()", "Py_END_ALLOW_THREADS()", and "PyGILState_Ensure()", and terminate the current thread if called while the interpreter is finalizing. "PyEval_RestoreThread()" is a higher-level function which is always available (even when threads have not been initialized). void PyEval_ReleaseThread(PyThreadState *tstate) Reset the current thread state to "NULL" and release the global interpreter lock. The lock must have been created earlier and must be held by the current thread. The *tstate* argument, which must not be "NULL", is only used to check that it represents the current thread state — if it isn’t, a fatal error is reported. "PyEval_SaveThread()" is a higher-level function which is always available (even when threads have not been initialized). void PyEval_AcquireLock() Acquire the global interpreter lock. The lock must have been created earlier. If this thread already has the lock, a deadlock ensues. Deprecated since version 3.2: This function does not update the current thread state. Please use "PyEval_RestoreThread()" or "PyEval_AcquireThread()" instead. Note: Calling this function from a thread when the runtime is finalizing will terminate the thread, even if the thread was not created by Python. You can use "_Py_IsFinalizing()" or "sys.is_finalizing()" to check if the interpreter is in process of being finalized before calling this function to avoid unwanted termination. Changed in version 3.8: Updated to be consistent with "PyEval_RestoreThread()", "Py_END_ALLOW_THREADS()", and "PyGILState_Ensure()", and terminate the current thread if called while the interpreter is finalizing. void PyEval_ReleaseLock() Release the global interpreter lock. The lock must have been created earlier. Deprecated since version 3.2: This function does not update the current thread state. Please use "PyEval_SaveThread()" or "PyEval_ReleaseThread()" instead. Sub-interpreter support ======================= While in most uses, you will only embed a single Python interpreter, there are cases where you need to create several independent interpreters in the same process and perhaps even in the same thread. Sub-interpreters allow you to do that. The “main” interpreter is the first one created when the runtime initializes. It is usually the only Python interpreter in a process. Unlike sub-interpreters, the main interpreter has unique process- global responsibilities like signal handling. It is also responsible for execution during runtime initialization and is usually the active interpreter during runtime finalization. The "PyInterpreterState_Main()" function returns a pointer to its state. You can switch between sub-interpreters using the "PyThreadState_Swap()" function. You can create and destroy them using the following functions: PyThreadState* Py_NewInterpreter() Create a new sub-interpreter. This is an (almost) totally separate environment for the execution of Python code. In particular, the new interpreter has separate, independent versions of all imported modules, including the fundamental modules "builtins", "__main__" and "sys". The table of loaded modules ("sys.modules") and the module search path ("sys.path") are also separate. The new environment has no "sys.argv" variable. It has new standard I/O stream file objects "sys.stdin", "sys.stdout" and "sys.stderr" (however these refer to the same underlying file descriptors). The return value points to the first thread state created in the new sub-interpreter. This thread state is made in the current thread state. Note that no actual thread is created; see the discussion of thread states below. If creation of the new interpreter is unsuccessful, "NULL" is returned; no exception is set since the exception state is stored in the current thread state and there may not be a current thread state. (Like all other Python/C API functions, the global interpreter lock must be held before calling this function and is still held when it returns; however, unlike most other Python/C API functions, there needn’t be a current thread state on entry.) Extension modules are shared between (sub-)interpreters as follows: * For modules using multi-phase initialization, e.g. "PyModule_FromDefAndSpec()", a separate module object is created and initialized for each interpreter. Only C-level static and global variables are shared between these module objects. * For modules using single-phase initialization, e.g. "PyModule_Create()", the first time a particular extension is imported, it is initialized normally, and a (shallow) copy of its module’s dictionary is squirreled away. When the same extension is imported by another (sub-)interpreter, a new module is initialized and filled with the contents of this copy; the extension’s "init" function is not called. Objects in the module’s dictionary thus end up shared across (sub-)interpreters, which might cause unwanted behavior (see Bugs and caveats below). Note that this is different from what happens when an extension is imported after the interpreter has been completely re- initialized by calling "Py_FinalizeEx()" and "Py_Initialize()"; in that case, the extension’s "initmodule" function *is* called again. As with multi-phase initialization, this means that only C-level static and global variables are shared between these modules. void Py_EndInterpreter(PyThreadState *tstate) Destroy the (sub-)interpreter represented by the given thread state. The given thread state must be the current thread state. See the discussion of thread states below. When the call returns, the current thread state is "NULL". All thread states associated with this interpreter are destroyed. (The global interpreter lock must be held before calling this function and is still held when it returns.) "Py_FinalizeEx()" will destroy all sub-interpreters that haven’t been explicitly destroyed at that point. Bugs and caveats ---------------- Because sub-interpreters (and the main interpreter) are part of the same process, the insulation between them isn’t perfect — for example, using low-level file operations like "os.close()" they can (accidentally or maliciously) affect each other’s open files. Because of the way extensions are shared between (sub-)interpreters, some extensions may not work properly; this is especially likely when using single-phase initialization or (static) global variables. It is possible to insert objects created in one sub-interpreter into a namespace of another (sub-)interpreter; this should be avoided if possible. Special care should be taken to avoid sharing user-defined functions, methods, instances or classes between sub-interpreters, since import operations executed by such objects may affect the wrong (sub-)interpreter’s dictionary of loaded modules. It is equally important to avoid sharing objects from which the above are reachable. Also note that combining this functionality with "PyGILState_*()" APIs is delicate, because these APIs assume a bijection between Python thread states and OS-level threads, an assumption broken by the presence of sub-interpreters. It is highly recommended that you don’t switch sub-interpreters between a pair of matching "PyGILState_Ensure()" and "PyGILState_Release()" calls. Furthermore, extensions (such as "ctypes") using these APIs to allow calling of Python code from non-Python created threads will probably be broken when using sub-interpreters. Asynchronous Notifications ========================== A mechanism is provided to make asynchronous notifications to the main interpreter thread. These notifications take the form of a function pointer and a void pointer argument. int Py_AddPendingCall(int (*func)(void *), void *arg) Schedule a function to be called from the main interpreter thread. On success, "0" is returned and *func* is queued for being called in the main thread. On failure, "-1" is returned without setting any exception. When successfully queued, *func* will be *eventually* called from the main interpreter thread with the argument *arg*. It will be called asynchronously with respect to normally running Python code, but with both these conditions met: * on a *bytecode* boundary; * with the main thread holding the *global interpreter lock* (*func* can therefore use the full C API). *func* must return "0" on success, or "-1" on failure with an exception set. *func* won’t be interrupted to perform another asynchronous notification recursively, but it can still be interrupted to switch threads if the global interpreter lock is released. This function doesn’t need a current thread state to run, and it doesn’t need the global interpreter lock. To call this function in a subinterpreter, the caller must hold the GIL. Otherwise, the function *func* can be scheduled to be called from the wrong interpreter. Warning: This is a low-level function, only useful for very special cases. There is no guarantee that *func* will be called as quick as possible. If the main thread is busy executing a system call, *func* won’t be called before the system call returns. This function is generally **not** suitable for calling Python code from arbitrary C threads. Instead, use the PyGILState API. Changed in version 3.9: If this function is called in a subinterpreter, the function *func* is now scheduled to be called from the subinterpreter, rather than being called from the main interpreter. Each subinterpreter now has its own list of scheduled calls. New in version 3.1. Profiling and Tracing ===================== The Python interpreter provides some low-level support for attaching profiling and execution tracing facilities. These are used for profiling, debugging, and coverage analysis tools. This C interface allows the profiling or tracing code to avoid the overhead of calling through Python-level callable objects, making a direct C function call instead. The essential attributes of the facility have not changed; the interface allows trace functions to be installed per-thread, and the basic events reported to the trace function are the same as had been reported to the Python-level trace functions in previous versions. int (*Py_tracefunc)(PyObject *obj, PyFrameObject *frame, int what, PyObject *arg) The type of the trace function registered using "PyEval_SetProfile()" and "PyEval_SetTrace()". The first parameter is the object passed to the registration function as *obj*, *frame* is the frame object to which the event pertains, *what* is one of the constants "PyTrace_CALL", "PyTrace_EXCEPTION", "PyTrace_LINE", "PyTrace_RETURN", "PyTrace_C_CALL", "PyTrace_C_EXCEPTION", "PyTrace_C_RETURN", or "PyTrace_OPCODE", and *arg* depends on the value of *what*: +--------------------------------+------------------------------------------+ | Value of *what* | Meaning of *arg* | |================================|==========================================| | "PyTrace_CALL" | Always "Py_None". | +--------------------------------+------------------------------------------+ | "PyTrace_EXCEPTION" | Exception information as returned by | | | "sys.exc_info()". | +--------------------------------+------------------------------------------+ | "PyTrace_LINE" | Always "Py_None". | +--------------------------------+------------------------------------------+ | "PyTrace_RETURN" | Value being returned to the caller, or | | | "NULL" if caused by an exception. | +--------------------------------+------------------------------------------+ | "PyTrace_C_CALL" | Function object being called. | +--------------------------------+------------------------------------------+ | "PyTrace_C_EXCEPTION" | Function object being called. | +--------------------------------+------------------------------------------+ | "PyTrace_C_RETURN" | Function object being called. | +--------------------------------+------------------------------------------+ | "PyTrace_OPCODE" | Always "Py_None". | +--------------------------------+------------------------------------------+ int PyTrace_CALL The value of the *what* parameter to a "Py_tracefunc" function when a new call to a function or method is being reported, or a new entry into a generator. Note that the creation of the iterator for a generator function is not reported as there is no control transfer to the Python bytecode in the corresponding frame. int PyTrace_EXCEPTION The value of the *what* parameter to a "Py_tracefunc" function when an exception has been raised. The callback function is called with this value for *what* when after any bytecode is processed after which the exception becomes set within the frame being executed. The effect of this is that as exception propagation causes the Python stack to unwind, the callback is called upon return to each frame as the exception propagates. Only trace functions receives these events; they are not needed by the profiler. int PyTrace_LINE The value passed as the *what* parameter to a "Py_tracefunc" function (but not a profiling function) when a line-number event is being reported. It may be disabled for a frame by setting "f_trace_lines" to *0* on that frame. int PyTrace_RETURN The value for the *what* parameter to "Py_tracefunc" functions when a call is about to return. int PyTrace_C_CALL The value for the *what* parameter to "Py_tracefunc" functions when a C function is about to be called. int PyTrace_C_EXCEPTION The value for the *what* parameter to "Py_tracefunc" functions when a C function has raised an exception. int PyTrace_C_RETURN The value for the *what* parameter to "Py_tracefunc" functions when a C function has returned. int PyTrace_OPCODE The value for the *what* parameter to "Py_tracefunc" functions (but not profiling functions) when a new opcode is about to be executed. This event is not emitted by default: it must be explicitly requested by setting "f_trace_opcodes" to *1* on the frame. void PyEval_SetProfile(Py_tracefunc func, PyObject *obj) Set the profiler function to *func*. The *obj* parameter is passed to the function as its first parameter, and may be any Python object, or "NULL". If the profile function needs to maintain state, using a different value for *obj* for each thread provides a convenient and thread-safe place to store it. The profile function is called for all monitored events except "PyTrace_LINE" "PyTrace_OPCODE" and "PyTrace_EXCEPTION". The caller must hold the *GIL*. void PyEval_SetTrace(Py_tracefunc func, PyObject *obj) Set the tracing function to *func*. This is similar to "PyEval_SetProfile()", except the tracing function does receive line-number events and per-opcode events, but does not receive any event related to C function objects being called. Any trace function registered using "PyEval_SetTrace()" will not receive "PyTrace_C_CALL", "PyTrace_C_EXCEPTION" or "PyTrace_C_RETURN" as a value for the *what* parameter. The caller must hold the *GIL*. Advanced Debugger Support ========================= These functions are only intended to be used by advanced debugging tools. PyInterpreterState* PyInterpreterState_Head() Return the interpreter state object at the head of the list of all such objects. PyInterpreterState* PyInterpreterState_Main() Return the main interpreter state object. PyInterpreterState* PyInterpreterState_Next(PyInterpreterState *interp) Return the next interpreter state object after *interp* from the list of all such objects. PyThreadState * PyInterpreterState_ThreadHead(PyInterpreterState *interp) Return the pointer to the first "PyThreadState" object in the list of threads associated with the interpreter *interp*. PyThreadState* PyThreadState_Next(PyThreadState *tstate) Return the next thread state object after *tstate* from the list of all such objects belonging to the same "PyInterpreterState" object. Thread Local Storage Support ============================ The Python interpreter provides low-level support for thread-local storage (TLS) which wraps the underlying native TLS implementation to support the Python-level thread local storage API ("threading.local"). The CPython C level APIs are similar to those offered by pthreads and Windows: use a thread key and functions to associate a "void*" value per thread. The GIL does *not* need to be held when calling these functions; they supply their own locking. Note that "Python.h" does not include the declaration of the TLS APIs, you need to include "pythread.h" to use thread-local storage. Note: None of these API functions handle memory management on behalf of the "void*" values. You need to allocate and deallocate them yourself. If the "void*" values happen to be "PyObject*", these functions don’t do refcount operations on them either. Thread Specific Storage (TSS) API --------------------------------- TSS API is introduced to supersede the use of the existing TLS API within the CPython interpreter. This API uses a new type "Py_tss_t" instead of "int" to represent thread keys. New in version 3.7. See also: “A New C-API for Thread-Local Storage in CPython” (**PEP 539**) Py_tss_t This data structure represents the state of a thread key, the definition of which may depend on the underlying TLS implementation, and it has an internal field representing the key’s initialization state. There are no public members in this structure. When Py_LIMITED_API is not defined, static allocation of this type by "Py_tss_NEEDS_INIT" is allowed. Py_tss_NEEDS_INIT This macro expands to the initializer for "Py_tss_t" variables. Note that this macro won’t be defined with Py_LIMITED_API. Dynamic Allocation ~~~~~~~~~~~~~~~~~~ Dynamic allocation of the "Py_tss_t", required in extension modules built with Py_LIMITED_API, where static allocation of this type is not possible due to its implementation being opaque at build time. Py_tss_t* PyThread_tss_alloc() Return a value which is the same state as a value initialized with "Py_tss_NEEDS_INIT", or "NULL" in the case of dynamic allocation failure. void PyThread_tss_free(Py_tss_t *key) Free the given *key* allocated by "PyThread_tss_alloc()", after first calling "PyThread_tss_delete()" to ensure any associated thread locals have been unassigned. This is a no-op if the *key* argument is *NULL*. Note: A freed key becomes a dangling pointer. You should reset the key to *NULL*. Methods ~~~~~~~ The parameter *key* of these functions must not be "NULL". Moreover, the behaviors of "PyThread_tss_set()" and "PyThread_tss_get()" are undefined if the given "Py_tss_t" has not been initialized by "PyThread_tss_create()". int PyThread_tss_is_created(Py_tss_t *key) Return a non-zero value if the given "Py_tss_t" has been initialized by "PyThread_tss_create()". int PyThread_tss_create(Py_tss_t *key) Return a zero value on successful initialization of a TSS key. The behavior is undefined if the value pointed to by the *key* argument is not initialized by "Py_tss_NEEDS_INIT". This function can be called repeatedly on the same key – calling it on an already initialized key is a no-op and immediately returns success. void PyThread_tss_delete(Py_tss_t *key) Destroy a TSS key to forget the values associated with the key across all threads, and change the key’s initialization state to uninitialized. A destroyed key is able to be initialized again by "PyThread_tss_create()". This function can be called repeatedly on the same key – calling it on an already destroyed key is a no-op. int PyThread_tss_set(Py_tss_t *key, void *value) Return a zero value to indicate successfully associating a "void*" value with a TSS key in the current thread. Each thread has a distinct mapping of the key to a "void*" value. void* PyThread_tss_get(Py_tss_t *key) Return the "void*" value associated with a TSS key in the current thread. This returns "NULL" if no value is associated with the key in the current thread. Thread Local Storage (TLS) API ------------------------------ Deprecated since version 3.7: This API is superseded by Thread Specific Storage (TSS) API. Note: This version of the API does not support platforms where the native TLS key is defined in a way that cannot be safely cast to "int". On such platforms, "PyThread_create_key()" will return immediately with a failure status, and the other TLS functions will all be no-ops on such platforms. Due to the compatibility problem noted above, this version of the API should not be used in new code. int PyThread_create_key() void PyThread_delete_key(int key) int PyThread_set_key_value(int key, void *value) void* PyThread_get_key_value(int key) void PyThread_delete_key_value(int key) void PyThread_ReInitTLS() Python Initialization Configuration *********************************** New in version 3.8. Structures: * "PyConfig" * "PyPreConfig" * "PyStatus" * "PyWideStringList" Functions: * "PyConfig_Clear()" * "PyConfig_InitIsolatedConfig()" * "PyConfig_InitPythonConfig()" * "PyConfig_Read()" * "PyConfig_SetArgv()" * "PyConfig_SetBytesArgv()" * "PyConfig_SetBytesString()" * "PyConfig_SetString()" * "PyConfig_SetWideStringList()" * "PyPreConfig_InitIsolatedConfig()" * "PyPreConfig_InitPythonConfig()" * "PyStatus_Error()" * "PyStatus_Exception()" * "PyStatus_Exit()" * "PyStatus_IsError()" * "PyStatus_IsExit()" * "PyStatus_NoMemory()" * "PyStatus_Ok()" * "PyWideStringList_Append()" * "PyWideStringList_Insert()" * "Py_ExitStatusException()" * "Py_InitializeFromConfig()" * "Py_PreInitialize()" * "Py_PreInitializeFromArgs()" * "Py_PreInitializeFromBytesArgs()" * "Py_RunMain()" * "Py_GetArgcArgv()" The preconfiguration ("PyPreConfig" type) is stored in "_PyRuntime.preconfig" and the configuration ("PyConfig" type) is stored in "PyInterpreterState.config". See also Initialization, Finalization, and Threads. See also: **PEP 587** “Python Initialization Configuration”. PyWideStringList ================ PyWideStringList List of "wchar_t*" strings. If *length* is non-zero, *items* must be non-"NULL" and all strings must be non-"NULL". Methods: PyStatus PyWideStringList_Append(PyWideStringList *list, const wchar_t *item) Append *item* to *list*. Python must be preinitialized to call this function. PyStatus PyWideStringList_Insert(PyWideStringList *list, Py_ssize_t index, const wchar_t *item) Insert *item* into *list* at *index*. If *index* is greater than or equal to *list* length, append *item* to *list*. *index* must be greater than or equal to 0. Python must be preinitialized to call this function. Structure fields: Py_ssize_t length List length. wchar_t** items List items. PyStatus ======== PyStatus Structure to store an initialization function status: success, error or exit. For an error, it can store the C function name which created the error. Structure fields: int exitcode Exit code. Argument passed to "exit()". const char *err_msg Error message. const char *func Name of the function which created an error, can be "NULL". Functions to create a status: PyStatus PyStatus_Ok(void) Success. PyStatus PyStatus_Error(const char *err_msg) Initialization error with a message. PyStatus PyStatus_NoMemory(void) Memory allocation failure (out of memory). PyStatus PyStatus_Exit(int exitcode) Exit Python with the specified exit code. Functions to handle a status: int PyStatus_Exception(PyStatus status) Is the status an error or an exit? If true, the exception must be handled; by calling "Py_ExitStatusException()" for example. int PyStatus_IsError(PyStatus status) Is the result an error? int PyStatus_IsExit(PyStatus status) Is the result an exit? void Py_ExitStatusException(PyStatus status) Call "exit(exitcode)" if *status* is an exit. Print the error message and exit with a non-zero exit code if *status* is an error. Must only be called if "PyStatus_Exception(status)" is non-zero. Note: Internally, Python uses macros which set "PyStatus.func", whereas functions to create a status set "func" to "NULL". Example: PyStatus alloc(void **ptr, size_t size) { *ptr = PyMem_RawMalloc(size); if (*ptr == NULL) { return PyStatus_NoMemory(); } return PyStatus_Ok(); } int main(int argc, char **argv) { void *ptr; PyStatus status = alloc(&ptr, 16); if (PyStatus_Exception(status)) { Py_ExitStatusException(status); } PyMem_Free(ptr); return 0; } PyPreConfig =========== PyPreConfig Structure used to preinitialize Python: * Set the Python memory allocator * Configure the LC_CTYPE locale * Set the UTF-8 mode Function to initialize a preconfiguration: void PyPreConfig_InitPythonConfig(PyPreConfig *preconfig) Initialize the preconfiguration with Python Configuration. void PyPreConfig_InitIsolatedConfig(PyPreConfig *preconfig) Initialize the preconfiguration with Isolated Configuration. Structure fields: int allocator Name of the memory allocator: * "PYMEM_ALLOCATOR_NOT_SET" ("0"): don’t change memory allocators (use defaults) * "PYMEM_ALLOCATOR_DEFAULT" ("1"): default memory allocators * "PYMEM_ALLOCATOR_DEBUG" ("2"): default memory allocators with debug hooks * "PYMEM_ALLOCATOR_MALLOC" ("3"): force usage of "malloc()" * "PYMEM_ALLOCATOR_MALLOC_DEBUG" ("4"): force usage of "malloc()" with debug hooks * "PYMEM_ALLOCATOR_PYMALLOC" ("5"): Python pymalloc memory allocator * "PYMEM_ALLOCATOR_PYMALLOC_DEBUG" ("6"): Python pymalloc memory allocator with debug hooks "PYMEM_ALLOCATOR_PYMALLOC" and "PYMEM_ALLOCATOR_PYMALLOC_DEBUG" are not supported if Python is configured using "--without- pymalloc" See Memory Management. int configure_locale Set the LC_CTYPE locale to the user preferred locale? If equals to 0, set "coerce_c_locale" and "coerce_c_locale_warn" to 0. int coerce_c_locale If equals to 2, coerce the C locale; if equals to 1, read the LC_CTYPE locale to decide if it should be coerced. int coerce_c_locale_warn If non-zero, emit a warning if the C locale is coerced. int dev_mode See "PyConfig.dev_mode". int isolated See "PyConfig.isolated". int legacy_windows_fs_encoding(Windows only) If non-zero, disable UTF-8 Mode, set the Python filesystem encoding to "mbcs", set the filesystem error handler to "replace". Only available on Windows. "#ifdef MS_WINDOWS" macro can be used for Windows specific code. int parse_argv If non-zero, "Py_PreInitializeFromArgs()" and "Py_PreInitializeFromBytesArgs()" parse their "argv" argument the same way the regular Python parses command line arguments: see Command Line Arguments. int use_environment See "PyConfig.use_environment". int utf8_mode If non-zero, enable the UTF-8 mode. Preinitialization with PyPreConfig ================================== Functions to preinitialize Python: PyStatus Py_PreInitialize(const PyPreConfig *preconfig) Preinitialize Python from *preconfig* preconfiguration. PyStatus Py_PreInitializeFromBytesArgs(const PyPreConfig *preconfig, int argc, char * const *argv) Preinitialize Python from *preconfig* preconfiguration and command line arguments (bytes strings). PyStatus Py_PreInitializeFromArgs(const PyPreConfig *preconfig, int argc, wchar_t * const * argv) Preinitialize Python from *preconfig* preconfiguration and command line arguments (wide strings). The caller is responsible to handle exceptions (error or exit) using "PyStatus_Exception()" and "Py_ExitStatusException()". For Python Configuration ("PyPreConfig_InitPythonConfig()"), if Python is initialized with command line arguments, the command line arguments must also be passed to preinitialize Python, since they have an effect on the pre-configuration like encodings. For example, the "-X utf8" command line option enables the UTF-8 Mode. "PyMem_SetAllocator()" can be called after "Py_PreInitialize()" and before "Py_InitializeFromConfig()" to install a custom memory allocator. It can be called before "Py_PreInitialize()" if "PyPreConfig.allocator" is set to "PYMEM_ALLOCATOR_NOT_SET". Python memory allocation functions like "PyMem_RawMalloc()" must not be used before Python preinitialization, whereas calling directly "malloc()" and "free()" is always safe. "Py_DecodeLocale()" must not be called before the preinitialization. Example using the preinitialization to enable the UTF-8 Mode: PyStatus status; PyPreConfig preconfig; PyPreConfig_InitPythonConfig(&preconfig); preconfig.utf8_mode = 1; status = Py_PreInitialize(&preconfig); if (PyStatus_Exception(status)) { Py_ExitStatusException(status); } /* at this point, Python will speak UTF-8 */ Py_Initialize(); /* ... use Python API here ... */ Py_Finalize(); PyConfig ======== PyConfig Structure containing most parameters to configure Python. Structure methods: void PyConfig_InitPythonConfig(PyConfig *config) Initialize configuration with Python Configuration. void PyConfig_InitIsolatedConfig(PyConfig *config) Initialize configuration with Isolated Configuration. PyStatus PyConfig_SetString(PyConfig *config, wchar_t * const *config_str, const wchar_t *str) Copy the wide character string *str* into "*config_str". Preinitialize Python if needed. PyStatus PyConfig_SetBytesString(PyConfig *config, wchar_t * const *config_str, const char *str) Decode *str* using "Py_DecodeLocale()" and set the result into "*config_str". Preinitialize Python if needed. PyStatus PyConfig_SetArgv(PyConfig *config, int argc, wchar_t * const *argv) Set command line arguments from wide character strings. Preinitialize Python if needed. PyStatus PyConfig_SetBytesArgv(PyConfig *config, int argc, char * const *argv) Set command line arguments: decode bytes using "Py_DecodeLocale()". Preinitialize Python if needed. PyStatus PyConfig_SetWideStringList(PyConfig *config, PyWideStringList *list, Py_ssize_t length, wchar_t **items) Set the list of wide strings *list* to *length* and *items*. Preinitialize Python if needed. PyStatus PyConfig_Read(PyConfig *config) Read all Python configuration. Fields which are already initialized are left unchanged. Preinitialize Python if needed. void PyConfig_Clear(PyConfig *config) Release configuration memory. Most "PyConfig" methods preinitialize Python if needed. In that case, the Python preinitialization configuration in based on the "PyConfig". If configuration fields which are in common with "PyPreConfig" are tuned, they must be set before calling a "PyConfig" method: * "dev_mode" * "isolated" * "parse_argv" * "use_environment" Moreover, if "PyConfig_SetArgv()" or "PyConfig_SetBytesArgv()" is used, this method must be called first, before other methods, since the preinitialization configuration depends on command line arguments (if "parse_argv" is non-zero). The caller of these methods is responsible to handle exceptions (error or exit) using "PyStatus_Exception()" and "Py_ExitStatusException()". Structure fields: PyWideStringList argv Command line arguments, "sys.argv". See "parse_argv" to parse "argv" the same way the regular Python parses Python command line arguments. If "argv" is empty, an empty string is added to ensure that "sys.argv" always exists and is never empty. wchar_t* base_exec_prefix "sys.base_exec_prefix". wchar_t* base_executable "sys._base_executable": "__PYVENV_LAUNCHER__" environment variable value, or copy of "PyConfig.executable". wchar_t* base_prefix "sys.base_prefix". wchar_t* platlibdir "sys.platlibdir": platform library directory name, set at configure time by "--with-platlibdir", overrideable by the "PYTHONPLATLIBDIR" environment variable. New in version 3.9. int buffered_stdio If equals to 0, enable unbuffered mode, making the stdout and stderr streams unbuffered. stdin is always opened in buffered mode. int bytes_warning If equals to 1, issue a warning when comparing "bytes" or "bytearray" with "str", or comparing "bytes" with "int". If equal or greater to 2, raise a "BytesWarning" exception. wchar_t* check_hash_pycs_mode Control the validation behavior of hash-based ".pyc" files (see **PEP 552**): "--check-hash-based-pycs" command line option value. Valid values: "always", "never" and "default". The default value is: "default". int configure_c_stdio If non-zero, configure C standard streams ("stdio", "stdout", "stdout"). For example, set their mode to "O_BINARY" on Windows. int dev_mode If non-zero, enable the Python Development Mode. int dump_refs If non-zero, dump all objects which are still alive at exit. "Py_TRACE_REFS" macro must be defined in build. wchar_t* exec_prefix "sys.exec_prefix". wchar_t* executable "sys.executable". int faulthandler If non-zero, call "faulthandler.enable()" at startup. wchar_t* filesystem_encoding Filesystem encoding, "sys.getfilesystemencoding()". wchar_t* filesystem_errors Filesystem encoding errors, "sys.getfilesystemencodeerrors()". unsigned long hash_seed int use_hash_seed Randomized hash function seed. If "use_hash_seed" is zero, a seed is chosen randomly at Pythonstartup, and "hash_seed" is ignored. wchar_t* home Python home directory. Initialized from "PYTHONHOME" environment variable value by default. int import_time If non-zero, profile import time. int inspect Enter interactive mode after executing a script or a command. int install_signal_handlers Install signal handlers? int interactive Interactive mode. int isolated If greater than 0, enable isolated mode: * "sys.path" contains neither the script’s directory (computed from "argv[0]" or the current directory) nor the user’s site- packages directory. * Python REPL doesn’t import "readline" nor enable default readline configuration on interactive prompts. * Set "use_environment" and "user_site_directory" to 0. int legacy_windows_stdio If non-zero, use "io.FileIO" instead of "io.WindowsConsoleIO" for "sys.stdin", "sys.stdout" and "sys.stderr". Only available on Windows. "#ifdef MS_WINDOWS" macro can be used for Windows specific code. int malloc_stats If non-zero, dump statistics on Python pymalloc memory allocator at exit. The option is ignored if Python is built using "--without- pymalloc". wchar_t* pythonpath_env Module search paths as a string separated by "DELIM" ("os.path.pathsep"). Initialized from "PYTHONPATH" environment variable value by default. PyWideStringList module_search_paths int module_search_paths_set "sys.path". If "module_search_paths_set" is equal to 0, the "module_search_paths" is overridden by the function calculating the Path Configuration. int optimization_level Compilation optimization level: * 0: Peephole optimizer (and "__debug__" is set to "True") * 1: Remove assertions, set "__debug__" to "False" * 2: Strip docstrings int parse_argv If non-zero, parse "argv" the same way the regular Python command line arguments, and strip Python arguments from "argv": see Command Line Arguments. int parser_debug If non-zero, turn on parser debugging output (for expert only, depending on compilation options). int pathconfig_warnings If equal to 0, suppress warnings when calculating the Path Configuration (Unix only, Windows does not log any warning). Otherwise, warnings are written into "stderr". wchar_t* prefix "sys.prefix". wchar_t* program_name Program name. Used to initialize "executable", and in early error messages. wchar_t* pycache_prefix "sys.pycache_prefix": ".pyc" cache prefix. If "NULL", "sys.pycache_prefix" is set to "None". int quiet Quiet mode. For example, don’t display the copyright and version messages in interactive mode. wchar_t* run_command "python3 -c COMMAND" argument. Used by "Py_RunMain()". wchar_t* run_filename "python3 FILENAME" argument. Used by "Py_RunMain()". wchar_t* run_module "python3 -m MODULE" argument. Used by "Py_RunMain()". int show_ref_count Show total reference count at exit? Set to 1 by "-X showrefcount" command line option. Need a debug build of Python ("Py_REF_DEBUG" macro must be defined). int site_import Import the "site" module at startup? int skip_source_first_line Skip the first line of the source? wchar_t* stdio_encoding wchar_t* stdio_errors Encoding and encoding errors of "sys.stdin", "sys.stdout" and "sys.stderr". int tracemalloc If non-zero, call "tracemalloc.start()" at startup. int use_environment If greater than 0, use environment variables. int user_site_directory If non-zero, add user site directory to "sys.path". int verbose If non-zero, enable verbose mode. PyWideStringList warnoptions "sys.warnoptions": options of the "warnings" module to build warnings filters: lowest to highest priority. The "warnings" module adds "sys.warnoptions" in the reverse order: the last "PyConfig.warnoptions" item becomes the first item of "warnings.filters" which is checked first (highest priority). int write_bytecode If non-zero, write ".pyc" files. "sys.dont_write_bytecode" is initialized to the inverted value of "write_bytecode". PyWideStringList xoptions "sys._xoptions". int _use_peg_parser Enable PEG parser? Default: 1. Set to 0 by "-X oldparser" and "PYTHONOLDPARSER". See also **PEP 617**. Deprecated since version 3.9, will be removed in version 3.10. If "parse_argv" is non-zero, "argv" arguments are parsed the same way the regular Python parses command line arguments, and Python arguments are stripped from "argv": see Command Line Arguments. The "xoptions" options are parsed to set other options: see "-X" option. Changed in version 3.9: The "show_alloc_count" field has been removed. Initialization with PyConfig ============================ Function to initialize Python: PyStatus Py_InitializeFromConfig(const PyConfig *config) Initialize Python from *config* configuration. The caller is responsible to handle exceptions (error or exit) using "PyStatus_Exception()" and "Py_ExitStatusException()". If "PyImport_FrozenModules()", "PyImport_AppendInittab()" or "PyImport_ExtendInittab()" are used, they must be set or called after Python preinitialization and before the Python initialization. If Python is initialized multiple times, "PyImport_AppendInittab()" or "PyImport_ExtendInittab()" must be called before each Python initialization. Example setting the program name: void init_python(void) { PyStatus status; PyConfig config; PyConfig_InitPythonConfig(&config); /* Set the program name. Implicitly preinitialize Python. */ status = PyConfig_SetString(&config, &config.program_name, L"/path/to/my_program"); if (PyStatus_Exception(status)) { goto fail; } status = Py_InitializeFromConfig(&config); if (PyStatus_Exception(status)) { goto fail; } PyConfig_Clear(&config); return; fail: PyConfig_Clear(&config); Py_ExitStatusException(status); } More complete example modifying the default configuration, read the configuration, and then override some parameters: PyStatus init_python(const char *program_name) { PyStatus status; PyConfig config; PyConfig_InitPythonConfig(&config); /* Set the program name before reading the configuration (decode byte string from the locale encoding). Implicitly preinitialize Python. */ status = PyConfig_SetBytesString(&config, &config.program_name, program_name); if (PyStatus_Exception(status)) { goto done; } /* Read all configuration at once */ status = PyConfig_Read(&config); if (PyStatus_Exception(status)) { goto done; } /* Append our custom search path to sys.path */ status = PyWideStringList_Append(&config.module_search_paths, L"/path/to/more/modules"); if (PyStatus_Exception(status)) { goto done; } /* Override executable computed by PyConfig_Read() */ status = PyConfig_SetString(&config, &config.executable, L"/path/to/my_executable"); if (PyStatus_Exception(status)) { goto done; } status = Py_InitializeFromConfig(&config); done: PyConfig_Clear(&config); return status; } Isolated Configuration ====================== "PyPreConfig_InitIsolatedConfig()" and "PyConfig_InitIsolatedConfig()" functions create a configuration to isolate Python from the system. For example, to embed Python into an application. This configuration ignores global configuration variables, environment variables, command line arguments ("PyConfig.argv" is not parsed) and user site directory. The C standard streams (ex: "stdout") and the LC_CTYPE locale are left unchanged. Signal handlers are not installed. Configuration files are still used with this configuration. Set the Path Configuration (“output fields”) to ignore these configuration files and avoid the function computing the default path configuration. Python Configuration ==================== "PyPreConfig_InitPythonConfig()" and "PyConfig_InitPythonConfig()" functions create a configuration to build a customized Python which behaves as the regular Python. Environments variables and command line arguments are used to configure Python, whereas global configuration variables are ignored. This function enables C locale coercion (**PEP 538**) and UTF-8 Mode (**PEP 540**) depending on the LC_CTYPE locale, "PYTHONUTF8" and "PYTHONCOERCECLOCALE" environment variables. Example of customized Python always running in isolated mode: int main(int argc, char **argv) { PyStatus status; PyConfig config; PyConfig_InitPythonConfig(&config); config.isolated = 1; /* Decode command line arguments. Implicitly preinitialize Python (in isolated mode). */ status = PyConfig_SetBytesArgv(&config, argc, argv); if (PyStatus_Exception(status)) { goto fail; } status = Py_InitializeFromConfig(&config); if (PyStatus_Exception(status)) { goto fail; } PyConfig_Clear(&config); return Py_RunMain(); fail: PyConfig_Clear(&config); if (PyStatus_IsExit(status)) { return status.exitcode; } /* Display the error message and exit the process with non-zero exit code */ Py_ExitStatusException(status); } Path Configuration ================== "PyConfig" contains multiple fields for the path configuration: * Path configuration inputs: * "PyConfig.home" * "PyConfig.platlibdir" * "PyConfig.pathconfig_warnings" * "PyConfig.program_name" * "PyConfig.pythonpath_env" * current working directory: to get absolute paths * "PATH" environment variable to get the program full path (from "PyConfig.program_name") * "__PYVENV_LAUNCHER__" environment variable * (Windows only) Application paths in the registry under “SoftwarePythonPythonCoreX.YPythonPath” of HKEY_CURRENT_USER and HKEY_LOCAL_MACHINE (where X.Y is the Python version). * Path configuration output fields: * "PyConfig.base_exec_prefix" * "PyConfig.base_executable" * "PyConfig.base_prefix" * "PyConfig.exec_prefix" * "PyConfig.executable" * "PyConfig.module_search_paths_set", "PyConfig.module_search_paths" * "PyConfig.prefix" If at least one “output field” is not set, Python calculates the path configuration to fill unset fields. If "module_search_paths_set" is equal to 0, "module_search_paths" is overridden and "module_search_paths_set" is set to 1. It is possible to completely ignore the function calculating the default path configuration by setting explicitly all path configuration output fields listed above. A string is considered as set even if it is non-empty. "module_search_paths" is considered as set if "module_search_paths_set" is set to 1. In this case, path configuration input fields are ignored as well. Set "pathconfig_warnings" to 0 to suppress warnings when calculating the path configuration (Unix only, Windows does not log any warning). If "base_prefix" or "base_exec_prefix" fields are not set, they inherit their value from "prefix" and "exec_prefix" respectively. "Py_RunMain()" and "Py_Main()" modify "sys.path": * If "run_filename" is set and is a directory which contains a "__main__.py" script, prepend "run_filename" to "sys.path". * If "isolated" is zero: * If "run_module" is set, prepend the current directory to "sys.path". Do nothing if the current directory cannot be read. * If "run_filename" is set, prepend the directory of the filename to "sys.path". * Otherwise, prepend an empty string to "sys.path". If "site_import" is non-zero, "sys.path" can be modified by the "site" module. If "user_site_directory" is non-zero and the user’s site- package directory exists, the "site" module appends the user’s site- package directory to "sys.path". The following configuration files are used by the path configuration: * "pyvenv.cfg" * "python._pth" (Windows only) * "pybuilddir.txt" (Unix only) The "__PYVENV_LAUNCHER__" environment variable is used to set "PyConfig.base_executable" Py_RunMain() ============ int Py_RunMain(void) Execute the command ("PyConfig.run_command"), the script ("PyConfig.run_filename") or the module ("PyConfig.run_module") specified on the command line or in the configuration. By default and when if "-i" option is used, run the REPL. Finally, finalizes Python and returns an exit status that can be passed to the "exit()" function. See Python Configuration for an example of customized Python always running in isolated mode using "Py_RunMain()". Py_GetArgcArgv() ================ void Py_GetArgcArgv(int *argc, wchar_t ***argv) Get the original command line arguments, before Python modified them. Multi-Phase Initialization Private Provisional API ================================================== This section is a private provisional API introducing multi-phase initialization, the core feature of **PEP 432**: * “Core” initialization phase, “bare minimum Python”: * Builtin types; * Builtin exceptions; * Builtin and frozen modules; * The "sys" module is only partially initialized (ex: "sys.path" doesn’t exist yet). * “Main” initialization phase, Python is fully initialized: * Install and configure "importlib"; * Apply the Path Configuration; * Install signal handlers; * Finish "sys" module initialization (ex: create "sys.stdout" and "sys.path"); * Enable optional features like "faulthandler" and "tracemalloc"; * Import the "site" module; * etc. Private provisional API: * "PyConfig._init_main": if set to 0, "Py_InitializeFromConfig()" stops at the “Core” initialization phase. * "PyConfig._isolated_interpreter": if non-zero, disallow threads, subprocesses and fork. PyStatus _Py_InitializeMain(void) Move to the “Main” initialization phase, finish the Python initialization. No module is imported during the “Core” phase and the "importlib" module is not configured: the Path Configuration is only applied during the “Main” phase. It may allow to customize Python in Python to override or tune the Path Configuration, maybe install a custom "sys.meta_path" importer or an import hook, etc. It may become possible to calculatin the Path Configuration in Python, after the Core phase and before the Main phase, which is one of the **PEP 432** motivation. The “Core” phase is not properly defined: what should be and what should not be available at this phase is not specified yet. The API is marked as private and provisional: the API can be modified or even be removed anytime until a proper public API is designed. Example running Python code between “Core” and “Main” initialization phases: void init_python(void) { PyStatus status; PyConfig config; PyConfig_InitPythonConfig(&config); config._init_main = 0; /* ... customize 'config' configuration ... */ status = Py_InitializeFromConfig(&config); PyConfig_Clear(&config); if (PyStatus_Exception(status)) { Py_ExitStatusException(status); } /* Use sys.stderr because sys.stdout is only created by _Py_InitializeMain() */ int res = PyRun_SimpleString( "import sys; " "print('Run Python code before _Py_InitializeMain', " "file=sys.stderr)"); if (res < 0) { exit(1); } /* ... put more configuration code here ... */ status = _Py_InitializeMain(); if (PyStatus_Exception(status)) { Py_ExitStatusException(status); } } Introduction ************ The Application Programmer’s Interface to Python gives C and C++ programmers access to the Python interpreter at a variety of levels. The API is equally usable from C++, but for brevity it is generally referred to as the Python/C API. There are two fundamentally different reasons for using the Python/C API. The first reason is to write *extension modules* for specific purposes; these are C modules that extend the Python interpreter. This is probably the most common use. The second reason is to use Python as a component in a larger application; this technique is generally referred to as *embedding* Python in an application. Writing an extension module is a relatively well-understood process, where a “cookbook” approach works well. There are several tools that automate the process to some extent. While people have embedded Python in other applications since its early existence, the process of embedding Python is less straightforward than writing an extension. Many API functions are useful independent of whether you’re embedding or extending Python; moreover, most applications that embed Python will need to provide a custom extension as well, so it’s probably a good idea to become familiar with writing an extension before attempting to embed Python in a real application. Coding standards ================ If you’re writing C code for inclusion in CPython, you **must** follow the guidelines and standards defined in **PEP 7**. These guidelines apply regardless of the version of Python you are contributing to. Following these conventions is not necessary for your own third party extension modules, unless you eventually expect to contribute them to Python. Include Files ============= All function, type and macro definitions needed to use the Python/C API are included in your code by the following line: #define PY_SSIZE_T_CLEAN #include This implies inclusion of the following standard headers: "", "", "", "", "" and "" (if available). Note: Since Python may define some pre-processor definitions which affect the standard headers on some systems, you *must* include "Python.h" before any standard headers are included.It is recommended to always define "PY_SSIZE_T_CLEAN" before including "Python.h". See Parsing arguments and building values for a description of this macro. All user visible names defined by Python.h (except those defined by the included standard headers) have one of the prefixes "Py" or "_Py". Names beginning with "_Py" are for internal use by the Python implementation and should not be used by extension writers. Structure member names do not have a reserved prefix. Note: User code should never define names that begin with "Py" or "_Py". This confuses the reader, and jeopardizes the portability of the user code to future Python versions, which may define additional names beginning with one of these prefixes. The header files are typically installed with Python. On Unix, these are located in the directories "*prefix*/include/pythonversion/" and "*exec_prefix*/include/pythonversion/", where "prefix" and "exec_prefix" are defined by the corresponding parameters to Python’s **configure** script and *version* is "'%d.%d' % sys.version_info[:2]". On Windows, the headers are installed in "*prefix*/include", where "prefix" is the installation directory specified to the installer. To include the headers, place both directories (if different) on your compiler’s search path for includes. Do *not* place the parent directories on the search path and then use "#include "; this will break on multi-platform builds since the platform independent headers under "prefix" include the platform specific headers from "exec_prefix". C++ users should note that although the API is defined entirely using C, the header files properly declare the entry points to be "extern "C"". As a result, there is no need to do anything special to use the API from C++. Useful macros ============= Several useful macros are defined in the Python header files. Many are defined closer to where they are useful (e.g. "Py_RETURN_NONE"). Others of a more general utility are defined here. This is not necessarily a complete listing. Py_UNREACHABLE() Use this when you have a code path that cannot be reached by design. For example, in the "default:" clause in a "switch" statement for which all possible values are covered in "case" statements. Use this in places where you might be tempted to put an "assert(0)" or "abort()" call. In release mode, the macro helps the compiler to optimize the code, and avoids a warning about unreachable code. For example, the macro is implemented with "__builtin_unreachable()" on GCC in release mode. A use for "Py_UNREACHABLE()" is following a call a function that never returns but that is not declared "_Py_NO_RETURN". If a code path is very unlikely code but can be reached under exceptional case, this macro must not be used. For example, under low memory condition or if a system call returns a value out of the expected range. In this case, it’s better to report the error to the caller. If the error cannot be reported to caller, "Py_FatalError()" can be used. New in version 3.7. Py_ABS(x) Return the absolute value of "x". New in version 3.3. Py_MIN(x, y) Return the minimum value between "x" and "y". New in version 3.3. Py_MAX(x, y) Return the maximum value between "x" and "y". New in version 3.3. Py_STRINGIFY(x) Convert "x" to a C string. E.g. "Py_STRINGIFY(123)" returns ""123"". New in version 3.4. Py_MEMBER_SIZE(type, member) Return the size of a structure ("type") "member" in bytes. New in version 3.6. Py_CHARMASK(c) Argument must be a character or an integer in the range [-128, 127] or [0, 255]. This macro returns "c" cast to an "unsigned char". Py_GETENV(s) Like "getenv(s)", but returns "NULL" if "-E" was passed on the command line (i.e. if "Py_IgnoreEnvironmentFlag" is set). Py_UNUSED(arg) Use this for unused arguments in a function definition to silence compiler warnings. Example: "int func(int a, int Py_UNUSED(b)) { return a; }". New in version 3.4. Py_DEPRECATED(version) Use this for deprecated declarations. The macro must be placed before the symbol name. Example: Py_DEPRECATED(3.8) PyAPI_FUNC(int) Py_OldFunction(void); Changed in version 3.8: MSVC support was added. PyDoc_STRVAR(name, str) Creates a variable with name "name" that can be used in docstrings. If Python is built without docstrings, the value will be empty. Use "PyDoc_STRVAR" for docstrings to support building Python without docstrings, as specified in **PEP 7**. Example: PyDoc_STRVAR(pop_doc, "Remove and return the rightmost element."); static PyMethodDef deque_methods[] = { // ... {"pop", (PyCFunction)deque_pop, METH_NOARGS, pop_doc}, // ... } PyDoc_STR(str) Creates a docstring for the given input string or an empty string if docstrings are disabled. Use "PyDoc_STR" in specifying docstrings to support building Python without docstrings, as specified in **PEP 7**. Example: static PyMethodDef pysqlite_row_methods[] = { {"keys", (PyCFunction)pysqlite_row_keys, METH_NOARGS, PyDoc_STR("Returns the keys of the row.")}, {NULL, NULL} }; Objects, Types and Reference Counts =================================== Most Python/C API functions have one or more arguments as well as a return value of type "PyObject*". This type is a pointer to an opaque data type representing an arbitrary Python object. Since all Python object types are treated the same way by the Python language in most situations (e.g., assignments, scope rules, and argument passing), it is only fitting that they should be represented by a single C type. Almost all Python objects live on the heap: you never declare an automatic or static variable of type "PyObject", only pointer variables of type "PyObject*" can be declared. The sole exception are the type objects; since these must never be deallocated, they are typically static "PyTypeObject" objects. All Python objects (even Python integers) have a *type* and a *reference count*. An object’s type determines what kind of object it is (e.g., an integer, a list, or a user-defined function; there are many more as explained in The standard type hierarchy). For each of the well-known types there is a macro to check whether an object is of that type; for instance, "PyList_Check(a)" is true if (and only if) the object pointed to by *a* is a Python list. Reference Counts ---------------- The reference count is important because today’s computers have a finite (and often severely limited) memory size; it counts how many different places there are that have a reference to an object. Such a place could be another object, or a global (or static) C variable, or a local variable in some C function. When an object’s reference count becomes zero, the object is deallocated. If it contains references to other objects, their reference count is decremented. Those other objects may be deallocated in turn, if this decrement makes their reference count become zero, and so on. (There’s an obvious problem with objects that reference each other here; for now, the solution is “don’t do that.”) Reference counts are always manipulated explicitly. The normal way is to use the macro "Py_INCREF()" to increment an object’s reference count by one, and "Py_DECREF()" to decrement it by one. The "Py_DECREF()" macro is considerably more complex than the incref one, since it must check whether the reference count becomes zero and then cause the object’s deallocator to be called. The deallocator is a function pointer contained in the object’s type structure. The type- specific deallocator takes care of decrementing the reference counts for other objects contained in the object if this is a compound object type, such as a list, as well as performing any additional finalization that’s needed. There’s no chance that the reference count can overflow; at least as many bits are used to hold the reference count as there are distinct memory locations in virtual memory (assuming "sizeof(Py_ssize_t) >= sizeof(void*)"). Thus, the reference count increment is a simple operation. It is not necessary to increment an object’s reference count for every local variable that contains a pointer to an object. In theory, the object’s reference count goes up by one when the variable is made to point to it and it goes down by one when the variable goes out of scope. However, these two cancel each other out, so at the end the reference count hasn’t changed. The only real reason to use the reference count is to prevent the object from being deallocated as long as our variable is pointing to it. If we know that there is at least one other reference to the object that lives at least as long as our variable, there is no need to increment the reference count temporarily. An important situation where this arises is in objects that are passed as arguments to C functions in an extension module that are called from Python; the call mechanism guarantees to hold a reference to every argument for the duration of the call. However, a common pitfall is to extract an object from a list and hold on to it for a while without incrementing its reference count. Some other operation might conceivably remove the object from the list, decrementing its reference count and possibly deallocating it. The real danger is that innocent-looking operations may invoke arbitrary Python code which could do this; there is a code path which allows control to flow back to the user from a "Py_DECREF()", so almost any operation is potentially dangerous. A safe approach is to always use the generic operations (functions whose name begins with "PyObject_", "PyNumber_", "PySequence_" or "PyMapping_"). These operations always increment the reference count of the object they return. This leaves the caller with the responsibility to call "Py_DECREF()" when they are done with the result; this soon becomes second nature. Reference Count Details ~~~~~~~~~~~~~~~~~~~~~~~ The reference count behavior of functions in the Python/C API is best explained in terms of *ownership of references*. Ownership pertains to references, never to objects (objects are not owned: they are always shared). “Owning a reference” means being responsible for calling Py_DECREF on it when the reference is no longer needed. Ownership can also be transferred, meaning that the code that receives ownership of the reference then becomes responsible for eventually decref’ing it by calling "Py_DECREF()" or "Py_XDECREF()" when it’s no longer needed—or passing on this responsibility (usually to its caller). When a function passes ownership of a reference on to its caller, the caller is said to receive a *new* reference. When no ownership is transferred, the caller is said to *borrow* the reference. Nothing needs to be done for a borrowed reference. Conversely, when a calling function passes in a reference to an object, there are two possibilities: the function *steals* a reference to the object, or it does not. *Stealing a reference* means that when you pass a reference to a function, that function assumes that it now owns that reference, and you are not responsible for it any longer. Few functions steal references; the two notable exceptions are "PyList_SetItem()" and "PyTuple_SetItem()", which steal a reference to the item (but not to the tuple or list into which the item is put!). These functions were designed to steal a reference because of a common idiom for populating a tuple or list with newly created objects; for example, the code to create the tuple "(1, 2, "three")" could look like this (forgetting about error handling for the moment; a better way to code this is shown below): PyObject *t; t = PyTuple_New(3); PyTuple_SetItem(t, 0, PyLong_FromLong(1L)); PyTuple_SetItem(t, 1, PyLong_FromLong(2L)); PyTuple_SetItem(t, 2, PyUnicode_FromString("three")); Here, "PyLong_FromLong()" returns a new reference which is immediately stolen by "PyTuple_SetItem()". When you want to keep using an object although the reference to it will be stolen, use "Py_INCREF()" to grab another reference before calling the reference-stealing function. Incidentally, "PyTuple_SetItem()" is the *only* way to set tuple items; "PySequence_SetItem()" and "PyObject_SetItem()" refuse to do this since tuples are an immutable data type. You should only use "PyTuple_SetItem()" for tuples that you are creating yourself. Equivalent code for populating a list can be written using "PyList_New()" and "PyList_SetItem()". However, in practice, you will rarely use these ways of creating and populating a tuple or list. There’s a generic function, "Py_BuildValue()", that can create most common objects from C values, directed by a *format string*. For example, the above two blocks of code could be replaced by the following (which also takes care of the error checking): PyObject *tuple, *list; tuple = Py_BuildValue("(iis)", 1, 2, "three"); list = Py_BuildValue("[iis]", 1, 2, "three"); It is much more common to use "PyObject_SetItem()" and friends with items whose references you are only borrowing, like arguments that were passed in to the function you are writing. In that case, their behaviour regarding reference counts is much saner, since you don’t have to increment a reference count so you can give a reference away (“have it be stolen”). For example, this function sets all items of a list (actually, any mutable sequence) to a given item: int set_all(PyObject *target, PyObject *item) { Py_ssize_t i, n; n = PyObject_Length(target); if (n < 0) return -1; for (i = 0; i < n; i++) { PyObject *index = PyLong_FromSsize_t(i); if (!index) return -1; if (PyObject_SetItem(target, index, item) < 0) { Py_DECREF(index); return -1; } Py_DECREF(index); } return 0; } The situation is slightly different for function return values. While passing a reference to most functions does not change your ownership responsibilities for that reference, many functions that return a reference to an object give you ownership of the reference. The reason is simple: in many cases, the returned object is created on the fly, and the reference you get is the only reference to the object. Therefore, the generic functions that return object references, like "PyObject_GetItem()" and "PySequence_GetItem()", always return a new reference (the caller becomes the owner of the reference). It is important to realize that whether you own a reference returned by a function depends on which function you call only — *the plumage* (the type of the object passed as an argument to the function) *doesn’t enter into it!* Thus, if you extract an item from a list using "PyList_GetItem()", you don’t own the reference — but if you obtain the same item from the same list using "PySequence_GetItem()" (which happens to take exactly the same arguments), you do own a reference to the returned object. Here is an example of how you could write a function that computes the sum of the items in a list of integers; once using "PyList_GetItem()", and once using "PySequence_GetItem()". long sum_list(PyObject *list) { Py_ssize_t i, n; long total = 0, value; PyObject *item; n = PyList_Size(list); if (n < 0) return -1; /* Not a list */ for (i = 0; i < n; i++) { item = PyList_GetItem(list, i); /* Can't fail */ if (!PyLong_Check(item)) continue; /* Skip non-integers */ value = PyLong_AsLong(item); if (value == -1 && PyErr_Occurred()) /* Integer too big to fit in a C long, bail out */ return -1; total += value; } return total; } long sum_sequence(PyObject *sequence) { Py_ssize_t i, n; long total = 0, value; PyObject *item; n = PySequence_Length(sequence); if (n < 0) return -1; /* Has no length */ for (i = 0; i < n; i++) { item = PySequence_GetItem(sequence, i); if (item == NULL) return -1; /* Not a sequence, or other failure */ if (PyLong_Check(item)) { value = PyLong_AsLong(item); Py_DECREF(item); if (value == -1 && PyErr_Occurred()) /* Integer too big to fit in a C long, bail out */ return -1; total += value; } else { Py_DECREF(item); /* Discard reference ownership */ } } return total; } Types ----- There are few other data types that play a significant role in the Python/C API; most are simple C types such as "int", "long", "double" and "char*". A few structure types are used to describe static tables used to list the functions exported by a module or the data attributes of a new object type, and another is used to describe the value of a complex number. These will be discussed together with the functions that use them. Py_ssize_t A signed integral type such that "sizeof(Py_ssize_t) == sizeof(size_t)". C99 doesn’t define such a thing directly (size_t is an unsigned integral type). See **PEP 353** for details. "PY_SSIZE_T_MAX" is the largest positive value of type "Py_ssize_t". Exceptions ========== The Python programmer only needs to deal with exceptions if specific error handling is required; unhandled exceptions are automatically propagated to the caller, then to the caller’s caller, and so on, until they reach the top-level interpreter, where they are reported to the user accompanied by a stack traceback. For C programmers, however, error checking always has to be explicit. All functions in the Python/C API can raise exceptions, unless an explicit claim is made otherwise in a function’s documentation. In general, when a function encounters an error, it sets an exception, discards any object references that it owns, and returns an error indicator. If not documented otherwise, this indicator is either "NULL" or "-1", depending on the function’s return type. A few functions return a Boolean true/false result, with false indicating an error. Very few functions return no explicit error indicator or have an ambiguous return value, and require explicit testing for errors with "PyErr_Occurred()". These exceptions are always explicitly documented. Exception state is maintained in per-thread storage (this is equivalent to using global storage in an unthreaded application). A thread can be in one of two states: an exception has occurred, or not. The function "PyErr_Occurred()" can be used to check for this: it returns a borrowed reference to the exception type object when an exception has occurred, and "NULL" otherwise. There are a number of functions to set the exception state: "PyErr_SetString()" is the most common (though not the most general) function to set the exception state, and "PyErr_Clear()" clears the exception state. The full exception state consists of three objects (all of which can be "NULL"): the exception type, the corresponding exception value, and the traceback. These have the same meanings as the Python result of "sys.exc_info()"; however, they are not the same: the Python objects represent the last exception being handled by a Python "try" … "except" statement, while the C level exception state only exists while an exception is being passed on between C functions until it reaches the Python bytecode interpreter’s main loop, which takes care of transferring it to "sys.exc_info()" and friends. Note that starting with Python 1.5, the preferred, thread-safe way to access the exception state from Python code is to call the function "sys.exc_info()", which returns the per-thread exception state for Python code. Also, the semantics of both ways to access the exception state have changed so that a function which catches an exception will save and restore its thread’s exception state so as to preserve the exception state of its caller. This prevents common bugs in exception handling code caused by an innocent-looking function overwriting the exception being handled; it also reduces the often unwanted lifetime extension for objects that are referenced by the stack frames in the traceback. As a general principle, a function that calls another function to perform some task should check whether the called function raised an exception, and if so, pass the exception state on to its caller. It should discard any object references that it owns, and return an error indicator, but it should *not* set another exception — that would overwrite the exception that was just raised, and lose important information about the exact cause of the error. A simple example of detecting exceptions and passing them on is shown in the "sum_sequence()" example above. It so happens that this example doesn’t need to clean up any owned references when it detects an error. The following example function shows some error cleanup. First, to remind you why you like Python, we show the equivalent Python code: def incr_item(dict, key): try: item = dict[key] except KeyError: item = 0 dict[key] = item + 1 Here is the corresponding C code, in all its glory: int incr_item(PyObject *dict, PyObject *key) { /* Objects all initialized to NULL for Py_XDECREF */ PyObject *item = NULL, *const_one = NULL, *incremented_item = NULL; int rv = -1; /* Return value initialized to -1 (failure) */ item = PyObject_GetItem(dict, key); if (item == NULL) { /* Handle KeyError only: */ if (!PyErr_ExceptionMatches(PyExc_KeyError)) goto error; /* Clear the error and use zero: */ PyErr_Clear(); item = PyLong_FromLong(0L); if (item == NULL) goto error; } const_one = PyLong_FromLong(1L); if (const_one == NULL) goto error; incremented_item = PyNumber_Add(item, const_one); if (incremented_item == NULL) goto error; if (PyObject_SetItem(dict, key, incremented_item) < 0) goto error; rv = 0; /* Success */ /* Continue with cleanup code */ error: /* Cleanup code, shared by success and failure path */ /* Use Py_XDECREF() to ignore NULL references */ Py_XDECREF(item); Py_XDECREF(const_one); Py_XDECREF(incremented_item); return rv; /* -1 for error, 0 for success */ } This example represents an endorsed use of the "goto" statement in C! It illustrates the use of "PyErr_ExceptionMatches()" and "PyErr_Clear()" to handle specific exceptions, and the use of "Py_XDECREF()" to dispose of owned references that may be "NULL" (note the "'X'" in the name; "Py_DECREF()" would crash when confronted with a "NULL" reference). It is important that the variables used to hold owned references are initialized to "NULL" for this to work; likewise, the proposed return value is initialized to "-1" (failure) and only set to success after the final call made is successful. Embedding Python ================ The one important task that only embedders (as opposed to extension writers) of the Python interpreter have to worry about is the initialization, and possibly the finalization, of the Python interpreter. Most functionality of the interpreter can only be used after the interpreter has been initialized. The basic initialization function is "Py_Initialize()". This initializes the table of loaded modules, and creates the fundamental modules "builtins", "__main__", and "sys". It also initializes the module search path ("sys.path"). "Py_Initialize()" does not set the “script argument list” ("sys.argv"). If this variable is needed by Python code that will be executed later, it must be set explicitly with a call to "PySys_SetArgvEx(argc, argv, updatepath)" after the call to "Py_Initialize()". On most systems (in particular, on Unix and Windows, although the details are slightly different), "Py_Initialize()" calculates the module search path based upon its best guess for the location of the standard Python interpreter executable, assuming that the Python library is found in a fixed location relative to the Python interpreter executable. In particular, it looks for a directory named "lib/python*X.Y*" relative to the parent directory where the executable named "python" is found on the shell command search path (the environment variable "PATH"). For instance, if the Python executable is found in "/usr/local/bin/python", it will assume that the libraries are in "/usr/local/lib/python*X.Y*". (In fact, this particular path is also the “fallback” location, used when no executable file named "python" is found along "PATH".) The user can override this behavior by setting the environment variable "PYTHONHOME", or insert additional directories in front of the standard path by setting "PYTHONPATH". The embedding application can steer the search by calling "Py_SetProgramName(file)" *before* calling "Py_Initialize()". Note that "PYTHONHOME" still overrides this and "PYTHONPATH" is still inserted in front of the standard path. An application that requires total control has to provide its own implementation of "Py_GetPath()", "Py_GetPrefix()", "Py_GetExecPrefix()", and "Py_GetProgramFullPath()" (all defined in "Modules/getpath.c"). Sometimes, it is desirable to “uninitialize” Python. For instance, the application may want to start over (make another call to "Py_Initialize()") or the application is simply done with its use of Python and wants to free memory allocated by Python. This can be accomplished by calling "Py_FinalizeEx()". The function "Py_IsInitialized()" returns true if Python is currently in the initialized state. More information about these functions is given in a later chapter. Notice that "Py_FinalizeEx()" does *not* free all memory allocated by the Python interpreter, e.g. memory allocated by extension modules currently cannot be released. Debugging Builds ================ Python can be built with several macros to enable extra checks of the interpreter and extension modules. These checks tend to add a large amount of overhead to the runtime so they are not enabled by default. A full list of the various types of debugging builds is in the file "Misc/SpecialBuilds.txt" in the Python source distribution. Builds are available that support tracing of reference counts, debugging the memory allocator, or low-level profiling of the main interpreter loop. Only the most frequently-used builds will be described in the remainder of this section. Compiling the interpreter with the "Py_DEBUG" macro defined produces what is generally meant by “a debug build” of Python. "Py_DEBUG" is enabled in the Unix build by adding "--with-pydebug" to the "./configure" command. It is also implied by the presence of the not- Python-specific "_DEBUG" macro. When "Py_DEBUG" is enabled in the Unix build, compiler optimization is disabled. In addition to the reference count debugging described below, the following extra checks are performed: * Extra checks are added to the object allocator. * Extra checks are added to the parser and compiler. * Downcasts from wide types to narrow types are checked for loss of information. * A number of assertions are added to the dictionary and set implementations. In addition, the set object acquires a "test_c_api()" method. * Sanity checks of the input arguments are added to frame creation. * The storage for ints is initialized with a known invalid pattern to catch reference to uninitialized digits. * Low-level tracing and extra exception checking are added to the runtime virtual machine. * Extra checks are added to the memory arena implementation. * Extra debugging is added to the thread module. There may be additional checks not mentioned here. Defining "Py_TRACE_REFS" enables reference tracing. When defined, a circular doubly linked list of active objects is maintained by adding two extra fields to every "PyObject". Total allocations are tracked as well. Upon exit, all existing references are printed. (In interactive mode this happens after every statement run by the interpreter.) Implied by "Py_DEBUG". Please refer to "Misc/SpecialBuilds.txt" in the Python source distribution for more detailed information. Iterator Protocol ***************** There are two functions specifically for working with iterators. int PyIter_Check(PyObject *o) Return true if the object *o* supports the iterator protocol. This function always succeeds. PyObject* PyIter_Next(PyObject *o) *Return value: New reference.* Return the next value from the iteration *o*. The object must be an iterator (it is up to the caller to check this). If there are no remaining values, returns "NULL" with no exception set. If an error occurs while retrieving the item, returns "NULL" and passes along the exception. To write a loop which iterates over an iterator, the C code should look something like this: PyObject *iterator = PyObject_GetIter(obj); PyObject *item; if (iterator == NULL) { /* propagate error */ } while ((item = PyIter_Next(iterator))) { /* do something with item */ ... /* release reference when done */ Py_DECREF(item); } Py_DECREF(iterator); if (PyErr_Occurred()) { /* propagate error */ } else { /* continue doing useful work */ } Iterator Objects **************** Python provides two general-purpose iterator objects. The first, a sequence iterator, works with an arbitrary sequence supporting the "__getitem__()" method. The second works with a callable object and a sentinel value, calling the callable for each item in the sequence, and ending the iteration when the sentinel value is returned. PyTypeObject PySeqIter_Type Type object for iterator objects returned by "PySeqIter_New()" and the one-argument form of the "iter()" built-in function for built- in sequence types. int PySeqIter_Check(op) Return true if the type of *op* is "PySeqIter_Type". This function always succeeds. PyObject* PySeqIter_New(PyObject *seq) *Return value: New reference.* Return an iterator that works with a general sequence object, *seq*. The iteration ends when the sequence raises "IndexError" for the subscripting operation. PyTypeObject PyCallIter_Type Type object for iterator objects returned by "PyCallIter_New()" and the two-argument form of the "iter()" built-in function. int PyCallIter_Check(op) Return true if the type of *op* is "PyCallIter_Type". This function always succeeds. PyObject* PyCallIter_New(PyObject *callable, PyObject *sentinel) *Return value: New reference.* Return a new iterator. The first parameter, *callable*, can be any Python callable object that can be called with no parameters; each call to it should return the next item in the iteration. When *callable* returns a value equal to *sentinel*, the iteration will be terminated. List Objects ************ PyListObject This subtype of "PyObject" represents a Python list object. PyTypeObject PyList_Type This instance of "PyTypeObject" represents the Python list type. This is the same object as "list" in the Python layer. int PyList_Check(PyObject *p) Return true if *p* is a list object or an instance of a subtype of the list type. This function always succeeds. int PyList_CheckExact(PyObject *p) Return true if *p* is a list object, but not an instance of a subtype of the list type. This function always succeeds. PyObject* PyList_New(Py_ssize_t len) *Return value: New reference.* Return a new list of length *len* on success, or "NULL" on failure. Note: If *len* is greater than zero, the returned list object’s items are set to "NULL". Thus you cannot use abstract API functions such as "PySequence_SetItem()" or expose the object to Python code before setting all items to a real object with "PyList_SetItem()". Py_ssize_t PyList_Size(PyObject *list) Return the length of the list object in *list*; this is equivalent to "len(list)" on a list object. Py_ssize_t PyList_GET_SIZE(PyObject *list) Macro form of "PyList_Size()" without error checking. PyObject* PyList_GetItem(PyObject *list, Py_ssize_t index) *Return value: Borrowed reference.* Return the object at position *index* in the list pointed to by *list*. The position must be non-negative; indexing from the end of the list is not supported. If *index* is out of bounds (<0 or >=len(list)), return "NULL" and set an "IndexError" exception. PyObject* PyList_GET_ITEM(PyObject *list, Py_ssize_t i) *Return value: Borrowed reference.* Macro form of "PyList_GetItem()" without error checking. int PyList_SetItem(PyObject *list, Py_ssize_t index, PyObject *item) Set the item at index *index* in list to *item*. Return "0" on success. If *index* is out of bounds, return "-1" and set an "IndexError" exception. Note: This function “steals” a reference to *item* and discards a reference to an item already in the list at the affected position. void PyList_SET_ITEM(PyObject *list, Py_ssize_t i, PyObject *o) Macro form of "PyList_SetItem()" without error checking. This is normally only used to fill in new lists where there is no previous content. Note: This macro “steals” a reference to *item*, and, unlike "PyList_SetItem()", does *not* discard a reference to any item that is being replaced; any reference in *list* at position *i* will be leaked. int PyList_Insert(PyObject *list, Py_ssize_t index, PyObject *item) Insert the item *item* into list *list* in front of index *index*. Return "0" if successful; return "-1" and set an exception if unsuccessful. Analogous to "list.insert(index, item)". int PyList_Append(PyObject *list, PyObject *item) Append the object *item* at the end of list *list*. Return "0" if successful; return "-1" and set an exception if unsuccessful. Analogous to "list.append(item)". PyObject* PyList_GetSlice(PyObject *list, Py_ssize_t low, Py_ssize_t high) *Return value: New reference.* Return a list of the objects in *list* containing the objects *between* *low* and *high*. Return "NULL" and set an exception if unsuccessful. Analogous to "list[low:high]". Indexing from the end of the list is not supported. int PyList_SetSlice(PyObject *list, Py_ssize_t low, Py_ssize_t high, PyObject *itemlist) Set the slice of *list* between *low* and *high* to the contents of *itemlist*. Analogous to "list[low:high] = itemlist". The *itemlist* may be "NULL", indicating the assignment of an empty list (slice deletion). Return "0" on success, "-1" on failure. Indexing from the end of the list is not supported. int PyList_Sort(PyObject *list) Sort the items of *list* in place. Return "0" on success, "-1" on failure. This is equivalent to "list.sort()". int PyList_Reverse(PyObject *list) Reverse the items of *list* in place. Return "0" on success, "-1" on failure. This is the equivalent of "list.reverse()". PyObject* PyList_AsTuple(PyObject *list) *Return value: New reference.* Return a new tuple object containing the contents of *list*; equivalent to "tuple(list)". Integer Objects *************** All integers are implemented as “long” integer objects of arbitrary size. On error, most "PyLong_As*" APIs return "(return type)-1" which cannot be distinguished from a number. Use "PyErr_Occurred()" to disambiguate. PyLongObject This subtype of "PyObject" represents a Python integer object. PyTypeObject PyLong_Type This instance of "PyTypeObject" represents the Python integer type. This is the same object as "int" in the Python layer. int PyLong_Check(PyObject *p) Return true if its argument is a "PyLongObject" or a subtype of "PyLongObject". This function always succeeds. int PyLong_CheckExact(PyObject *p) Return true if its argument is a "PyLongObject", but not a subtype of "PyLongObject". This function always succeeds. PyObject* PyLong_FromLong(long v) *Return value: New reference.* Return a new "PyLongObject" object from *v*, or "NULL" on failure. The current implementation keeps an array of integer objects for all integers between "-5" and "256". When you create an int in that range you actually just get back a reference to the existing object. PyObject* PyLong_FromUnsignedLong(unsigned long v) *Return value: New reference.* Return a new "PyLongObject" object from a C "unsigned long", or "NULL" on failure. PyObject* PyLong_FromSsize_t(Py_ssize_t v) *Return value: New reference.* Return a new "PyLongObject" object from a C "Py_ssize_t", or "NULL" on failure. PyObject* PyLong_FromSize_t(size_t v) *Return value: New reference.* Return a new "PyLongObject" object from a C "size_t", or "NULL" on failure. PyObject* PyLong_FromLongLong(long long v) *Return value: New reference.* Return a new "PyLongObject" object from a C "long long", or "NULL" on failure. PyObject* PyLong_FromUnsignedLongLong(unsigned long long v) *Return value: New reference.* Return a new "PyLongObject" object from a C "unsigned long long", or "NULL" on failure. PyObject* PyLong_FromDouble(double v) *Return value: New reference.* Return a new "PyLongObject" object from the integer part of *v*, or "NULL" on failure. PyObject* PyLong_FromString(const char *str, char **pend, int base) *Return value: New reference.* Return a new "PyLongObject" based on the string value in *str*, which is interpreted according to the radix in *base*. If *pend* is non-"NULL", **pend* will point to the first character in *str* which follows the representation of the number. If *base* is "0", *str* is interpreted using the Integer literals definition; in this case, leading zeros in a non-zero decimal number raises a "ValueError". If *base* is not "0", it must be between "2" and "36", inclusive. Leading spaces and single underscores after a base specifier and between digits are ignored. If there are no digits, "ValueError" will be raised. PyObject* PyLong_FromUnicode(Py_UNICODE *u, Py_ssize_t length, int base) *Return value: New reference.* Convert a sequence of Unicode digits to a Python integer value. Deprecated since version 3.3, will be removed in version 3.10: Part of the old-style "Py_UNICODE" API; please migrate to using "PyLong_FromUnicodeObject()". PyObject* PyLong_FromUnicodeObject(PyObject *u, int base) *Return value: New reference.* Convert a sequence of Unicode digits in the string *u* to a Python integer value. New in version 3.3. PyObject* PyLong_FromVoidPtr(void *p) *Return value: New reference.* Create a Python integer from the pointer *p*. The pointer value can be retrieved from the resulting value using "PyLong_AsVoidPtr()". long PyLong_AsLong(PyObject *obj) Return a C "long" representation of *obj*. If *obj* is not an instance of "PyLongObject", first call its "__index__()" or "__int__()" method (if present) to convert it to a "PyLongObject". Raise "OverflowError" if the value of *obj* is out of range for a "long". Returns "-1" on error. Use "PyErr_Occurred()" to disambiguate. Changed in version 3.8: Use "__index__()" if available. Deprecated since version 3.8: Using "__int__()" is deprecated. long PyLong_AsLongAndOverflow(PyObject *obj, int *overflow) Return a C "long" representation of *obj*. If *obj* is not an instance of "PyLongObject", first call its "__index__()" or "__int__()" method (if present) to convert it to a "PyLongObject". If the value of *obj* is greater than "LONG_MAX" or less than "LONG_MIN", set **overflow* to "1" or "-1", respectively, and return "-1"; otherwise, set **overflow* to "0". If any other exception occurs set **overflow* to "0" and return "-1" as usual. Returns "-1" on error. Use "PyErr_Occurred()" to disambiguate. Changed in version 3.8: Use "__index__()" if available. Deprecated since version 3.8: Using "__int__()" is deprecated. long long PyLong_AsLongLong(PyObject *obj) Return a C "long long" representation of *obj*. If *obj* is not an instance of "PyLongObject", first call its "__index__()" or "__int__()" method (if present) to convert it to a "PyLongObject". Raise "OverflowError" if the value of *obj* is out of range for a "long long". Returns "-1" on error. Use "PyErr_Occurred()" to disambiguate. Changed in version 3.8: Use "__index__()" if available. Deprecated since version 3.8: Using "__int__()" is deprecated. long long PyLong_AsLongLongAndOverflow(PyObject *obj, int *overflow) Return a C "long long" representation of *obj*. If *obj* is not an instance of "PyLongObject", first call its "__index__()" or "__int__()" method (if present) to convert it to a "PyLongObject". If the value of *obj* is greater than "LLONG_MAX" or less than "LLONG_MIN", set **overflow* to "1" or "-1", respectively, and return "-1"; otherwise, set **overflow* to "0". If any other exception occurs set **overflow* to "0" and return "-1" as usual. Returns "-1" on error. Use "PyErr_Occurred()" to disambiguate. New in version 3.2. Changed in version 3.8: Use "__index__()" if available. Deprecated since version 3.8: Using "__int__()" is deprecated. Py_ssize_t PyLong_AsSsize_t(PyObject *pylong) Return a C "Py_ssize_t" representation of *pylong*. *pylong* must be an instance of "PyLongObject". Raise "OverflowError" if the value of *pylong* is out of range for a "Py_ssize_t". Returns "-1" on error. Use "PyErr_Occurred()" to disambiguate. unsigned long PyLong_AsUnsignedLong(PyObject *pylong) Return a C "unsigned long" representation of *pylong*. *pylong* must be an instance of "PyLongObject". Raise "OverflowError" if the value of *pylong* is out of range for a "unsigned long". Returns "(unsigned long)-1" on error. Use "PyErr_Occurred()" to disambiguate. size_t PyLong_AsSize_t(PyObject *pylong) Return a C "size_t" representation of *pylong*. *pylong* must be an instance of "PyLongObject". Raise "OverflowError" if the value of *pylong* is out of range for a "size_t". Returns "(size_t)-1" on error. Use "PyErr_Occurred()" to disambiguate. unsigned long long PyLong_AsUnsignedLongLong(PyObject *pylong) Return a C "unsigned long long" representation of *pylong*. *pylong* must be an instance of "PyLongObject". Raise "OverflowError" if the value of *pylong* is out of range for an "unsigned long long". Returns "(unsigned long long)-1" on error. Use "PyErr_Occurred()" to disambiguate. Changed in version 3.1: A negative *pylong* now raises "OverflowError", not "TypeError". unsigned long PyLong_AsUnsignedLongMask(PyObject *obj) Return a C "unsigned long" representation of *obj*. If *obj* is not an instance of "PyLongObject", first call its "__index__()" or "__int__()" method (if present) to convert it to a "PyLongObject". If the value of *obj* is out of range for an "unsigned long", return the reduction of that value modulo "ULONG_MAX + 1". Returns "(unsigned long)-1" on error. Use "PyErr_Occurred()" to disambiguate. Changed in version 3.8: Use "__index__()" if available. Deprecated since version 3.8: Using "__int__()" is deprecated. unsigned long long PyLong_AsUnsignedLongLongMask(PyObject *obj) Return a C "unsigned long long" representation of *obj*. If *obj* is not an instance of "PyLongObject", first call its "__index__()" or "__int__()" method (if present) to convert it to a "PyLongObject". If the value of *obj* is out of range for an "unsigned long long", return the reduction of that value modulo "ULLONG_MAX + 1". Returns "(unsigned long long)-1" on error. Use "PyErr_Occurred()" to disambiguate. Changed in version 3.8: Use "__index__()" if available. Deprecated since version 3.8: Using "__int__()" is deprecated. double PyLong_AsDouble(PyObject *pylong) Return a C "double" representation of *pylong*. *pylong* must be an instance of "PyLongObject". Raise "OverflowError" if the value of *pylong* is out of range for a "double". Returns "-1.0" on error. Use "PyErr_Occurred()" to disambiguate. void* PyLong_AsVoidPtr(PyObject *pylong) Convert a Python integer *pylong* to a C "void" pointer. If *pylong* cannot be converted, an "OverflowError" will be raised. This is only assured to produce a usable "void" pointer for values created with "PyLong_FromVoidPtr()". Returns "NULL" on error. Use "PyErr_Occurred()" to disambiguate. Mapping Protocol **************** See also "PyObject_GetItem()", "PyObject_SetItem()" and "PyObject_DelItem()". int PyMapping_Check(PyObject *o) Return "1" if the object provides the mapping protocol or supports slicing, and "0" otherwise. Note that it returns "1" for Python classes with a "__getitem__()" method, since in general it is impossible to determine what type of keys the class supports. This function always succeeds. Py_ssize_t PyMapping_Size(PyObject *o) Py_ssize_t PyMapping_Length(PyObject *o) Returns the number of keys in object *o* on success, and "-1" on failure. This is equivalent to the Python expression "len(o)". PyObject* PyMapping_GetItemString(PyObject *o, const char *key) *Return value: New reference.* Return element of *o* corresponding to the string *key* or "NULL" on failure. This is the equivalent of the Python expression "o[key]". See also "PyObject_GetItem()". int PyMapping_SetItemString(PyObject *o, const char *key, PyObject *v) Map the string *key* to the value *v* in object *o*. Returns "-1" on failure. This is the equivalent of the Python statement "o[key] = v". See also "PyObject_SetItem()". This function *does not* steal a reference to *v*. int PyMapping_DelItem(PyObject *o, PyObject *key) Remove the mapping for the object *key* from the object *o*. Return "-1" on failure. This is equivalent to the Python statement "del o[key]". This is an alias of "PyObject_DelItem()". int PyMapping_DelItemString(PyObject *o, const char *key) Remove the mapping for the string *key* from the object *o*. Return "-1" on failure. This is equivalent to the Python statement "del o[key]". int PyMapping_HasKey(PyObject *o, PyObject *key) Return "1" if the mapping object has the key *key* and "0" otherwise. This is equivalent to the Python expression "key in o". This function always succeeds. Note that exceptions which occur while calling the "__getitem__()" method will get suppressed. To get error reporting use "PyObject_GetItem()" instead. int PyMapping_HasKeyString(PyObject *o, const char *key) Return "1" if the mapping object has the key *key* and "0" otherwise. This is equivalent to the Python expression "key in o". This function always succeeds. Note that exceptions which occur while calling the "__getitem__()" method and creating a temporary string object will get suppressed. To get error reporting use "PyMapping_GetItemString()" instead. PyObject* PyMapping_Keys(PyObject *o) *Return value: New reference.* On success, return a list of the keys in object *o*. On failure, return "NULL". Changed in version 3.7: Previously, the function returned a list or a tuple. PyObject* PyMapping_Values(PyObject *o) *Return value: New reference.* On success, return a list of the values in object *o*. On failure, return "NULL". Changed in version 3.7: Previously, the function returned a list or a tuple. PyObject* PyMapping_Items(PyObject *o) *Return value: New reference.* On success, return a list of the items in object *o*, where each item is a tuple containing a key-value pair. On failure, return "NULL". Changed in version 3.7: Previously, the function returned a list or a tuple. Data marshalling support ************************ These routines allow C code to work with serialized objects using the same data format as the "marshal" module. There are functions to write data into the serialization format, and additional functions that can be used to read the data back. Files used to store marshalled data must be opened in binary mode. Numeric values are stored with the least significant byte first. The module supports two versions of the data format: version 0 is the historical version, version 1 shares interned strings in the file, and upon unmarshalling. Version 2 uses a binary format for floating point numbers. "Py_MARSHAL_VERSION" indicates the current file format (currently 2). void PyMarshal_WriteLongToFile(long value, FILE *file, int version) Marshal a "long" integer, *value*, to *file*. This will only write the least-significant 32 bits of *value*; regardless of the size of the native "long" type. *version* indicates the file format. This function can fail, in which case it sets the error indicator. Use "PyErr_Occurred()" to check for that. void PyMarshal_WriteObjectToFile(PyObject *value, FILE *file, int version) Marshal a Python object, *value*, to *file*. *version* indicates the file format. This function can fail, in which case it sets the error indicator. Use "PyErr_Occurred()" to check for that. PyObject* PyMarshal_WriteObjectToString(PyObject *value, int version) *Return value: New reference.* Return a bytes object containing the marshalled representation of *value*. *version* indicates the file format. The following functions allow marshalled values to be read back in. long PyMarshal_ReadLongFromFile(FILE *file) Return a C "long" from the data stream in a "FILE*" opened for reading. Only a 32-bit value can be read in using this function, regardless of the native size of "long". On error, sets the appropriate exception ("EOFError") and returns "-1". int PyMarshal_ReadShortFromFile(FILE *file) Return a C "short" from the data stream in a "FILE*" opened for reading. Only a 16-bit value can be read in using this function, regardless of the native size of "short". On error, sets the appropriate exception ("EOFError") and returns "-1". PyObject* PyMarshal_ReadObjectFromFile(FILE *file) *Return value: New reference.* Return a Python object from the data stream in a "FILE*" opened for reading. On error, sets the appropriate exception ("EOFError", "ValueError" or "TypeError") and returns "NULL". PyObject* PyMarshal_ReadLastObjectFromFile(FILE *file) *Return value: New reference.* Return a Python object from the data stream in a "FILE*" opened for reading. Unlike "PyMarshal_ReadObjectFromFile()", this function assumes that no further objects will be read from the file, allowing it to aggressively load file data into memory so that the de-serialization can operate from data in memory rather than reading a byte at a time from the file. Only use these variant if you are certain that you won’t be reading anything else from the file. On error, sets the appropriate exception ("EOFError", "ValueError" or "TypeError") and returns "NULL". PyObject* PyMarshal_ReadObjectFromString(const char *data, Py_ssize_t len) *Return value: New reference.* Return a Python object from the data stream in a byte buffer containing *len* bytes pointed to by *data*. On error, sets the appropriate exception ("EOFError", "ValueError" or "TypeError") and returns "NULL". Memory Management ***************** Overview ======== Memory management in Python involves a private heap containing all Python objects and data structures. The management of this private heap is ensured internally by the *Python memory manager*. The Python memory manager has different components which deal with various dynamic storage management aspects, like sharing, segmentation, preallocation or caching. At the lowest level, a raw memory allocator ensures that there is enough room in the private heap for storing all Python-related data by interacting with the memory manager of the operating system. On top of the raw memory allocator, several object-specific allocators operate on the same heap and implement distinct memory management policies adapted to the peculiarities of every object type. For example, integer objects are managed differently within the heap than strings, tuples or dictionaries because integers imply different storage requirements and speed/space tradeoffs. The Python memory manager thus delegates some of the work to the object-specific allocators, but ensures that the latter operate within the bounds of the private heap. It is important to understand that the management of the Python heap is performed by the interpreter itself and that the user has no control over it, even if they regularly manipulate object pointers to memory blocks inside that heap. The allocation of heap space for Python objects and other internal buffers is performed on demand by the Python memory manager through the Python/C API functions listed in this document. To avoid memory corruption, extension writers should never try to operate on Python objects with the functions exported by the C library: "malloc()", "calloc()", "realloc()" and "free()". This will result in mixed calls between the C allocator and the Python memory manager with fatal consequences, because they implement different algorithms and operate on different heaps. However, one may safely allocate and release memory blocks with the C library allocator for individual purposes, as shown in the following example: PyObject *res; char *buf = (char *) malloc(BUFSIZ); /* for I/O */ if (buf == NULL) return PyErr_NoMemory(); ...Do some I/O operation involving buf... res = PyBytes_FromString(buf); free(buf); /* malloc'ed */ return res; In this example, the memory request for the I/O buffer is handled by the C library allocator. The Python memory manager is involved only in the allocation of the bytes object returned as a result. In most situations, however, it is recommended to allocate memory from the Python heap specifically because the latter is under control of the Python memory manager. For example, this is required when the interpreter is extended with new object types written in C. Another reason for using the Python heap is the desire to *inform* the Python memory manager about the memory needs of the extension module. Even when the requested memory is used exclusively for internal, highly- specific purposes, delegating all memory requests to the Python memory manager causes the interpreter to have a more accurate image of its memory footprint as a whole. Consequently, under certain circumstances, the Python memory manager may or may not trigger appropriate actions, like garbage collection, memory compaction or other preventive procedures. Note that by using the C library allocator as shown in the previous example, the allocated memory for the I/O buffer escapes completely the Python memory manager. See also: The "PYTHONMALLOC" environment variable can be used to configure the memory allocators used by Python. The "PYTHONMALLOCSTATS" environment variable can be used to print statistics of the pymalloc memory allocator every time a new pymalloc object arena is created, and on shutdown. Raw Memory Interface ==================== The following function sets are wrappers to the system allocator. These functions are thread-safe, the *GIL* does not need to be held. The default raw memory allocator uses the following functions: "malloc()", "calloc()", "realloc()" and "free()"; call "malloc(1)" (or "calloc(1, 1)") when requesting zero bytes. New in version 3.4. void* PyMem_RawMalloc(size_t n) Allocates *n* bytes and returns a pointer of type "void*" to the allocated memory, or "NULL" if the request fails. Requesting zero bytes returns a distinct non-"NULL" pointer if possible, as if "PyMem_RawMalloc(1)" had been called instead. The memory will not have been initialized in any way. void* PyMem_RawCalloc(size_t nelem, size_t elsize) Allocates *nelem* elements each whose size in bytes is *elsize* and returns a pointer of type "void*" to the allocated memory, or "NULL" if the request fails. The memory is initialized to zeros. Requesting zero elements or elements of size zero bytes returns a distinct non-"NULL" pointer if possible, as if "PyMem_RawCalloc(1, 1)" had been called instead. New in version 3.5. void* PyMem_RawRealloc(void *p, size_t n) Resizes the memory block pointed to by *p* to *n* bytes. The contents will be unchanged to the minimum of the old and the new sizes. If *p* is "NULL", the call is equivalent to "PyMem_RawMalloc(n)"; else if *n* is equal to zero, the memory block is resized but is not freed, and the returned pointer is non-"NULL". Unless *p* is "NULL", it must have been returned by a previous call to "PyMem_RawMalloc()", "PyMem_RawRealloc()" or "PyMem_RawCalloc()". If the request fails, "PyMem_RawRealloc()" returns "NULL" and *p* remains a valid pointer to the previous memory area. void PyMem_RawFree(void *p) Frees the memory block pointed to by *p*, which must have been returned by a previous call to "PyMem_RawMalloc()", "PyMem_RawRealloc()" or "PyMem_RawCalloc()". Otherwise, or if "PyMem_RawFree(p)" has been called before, undefined behavior occurs. If *p* is "NULL", no operation is performed. Memory Interface ================ The following function sets, modeled after the ANSI C standard, but specifying behavior when requesting zero bytes, are available for allocating and releasing memory from the Python heap. The default memory allocator uses the pymalloc memory allocator. Warning: The *GIL* must be held when using these functions. Changed in version 3.6: The default allocator is now pymalloc instead of system "malloc()". void* PyMem_Malloc(size_t n) Allocates *n* bytes and returns a pointer of type "void*" to the allocated memory, or "NULL" if the request fails. Requesting zero bytes returns a distinct non-"NULL" pointer if possible, as if "PyMem_Malloc(1)" had been called instead. The memory will not have been initialized in any way. void* PyMem_Calloc(size_t nelem, size_t elsize) Allocates *nelem* elements each whose size in bytes is *elsize* and returns a pointer of type "void*" to the allocated memory, or "NULL" if the request fails. The memory is initialized to zeros. Requesting zero elements or elements of size zero bytes returns a distinct non-"NULL" pointer if possible, as if "PyMem_Calloc(1, 1)" had been called instead. New in version 3.5. void* PyMem_Realloc(void *p, size_t n) Resizes the memory block pointed to by *p* to *n* bytes. The contents will be unchanged to the minimum of the old and the new sizes. If *p* is "NULL", the call is equivalent to "PyMem_Malloc(n)"; else if *n* is equal to zero, the memory block is resized but is not freed, and the returned pointer is non-"NULL". Unless *p* is "NULL", it must have been returned by a previous call to "PyMem_Malloc()", "PyMem_Realloc()" or "PyMem_Calloc()". If the request fails, "PyMem_Realloc()" returns "NULL" and *p* remains a valid pointer to the previous memory area. void PyMem_Free(void *p) Frees the memory block pointed to by *p*, which must have been returned by a previous call to "PyMem_Malloc()", "PyMem_Realloc()" or "PyMem_Calloc()". Otherwise, or if "PyMem_Free(p)" has been called before, undefined behavior occurs. If *p* is "NULL", no operation is performed. The following type-oriented macros are provided for convenience. Note that *TYPE* refers to any C type. TYPE* PyMem_New(TYPE, size_t n) Same as "PyMem_Malloc()", but allocates "(n * sizeof(TYPE))" bytes of memory. Returns a pointer cast to "TYPE*". The memory will not have been initialized in any way. TYPE* PyMem_Resize(void *p, TYPE, size_t n) Same as "PyMem_Realloc()", but the memory block is resized to "(n * sizeof(TYPE))" bytes. Returns a pointer cast to "TYPE*". On return, *p* will be a pointer to the new memory area, or "NULL" in the event of failure. This is a C preprocessor macro; *p* is always reassigned. Save the original value of *p* to avoid losing memory when handling errors. void PyMem_Del(void *p) Same as "PyMem_Free()". In addition, the following macro sets are provided for calling the Python memory allocator directly, without involving the C API functions listed above. However, note that their use does not preserve binary compatibility across Python versions and is therefore deprecated in extension modules. * "PyMem_MALLOC(size)" * "PyMem_NEW(type, size)" * "PyMem_REALLOC(ptr, size)" * "PyMem_RESIZE(ptr, type, size)" * "PyMem_FREE(ptr)" * "PyMem_DEL(ptr)" Object allocators ================= The following function sets, modeled after the ANSI C standard, but specifying behavior when requesting zero bytes, are available for allocating and releasing memory from the Python heap. The default object allocator uses the pymalloc memory allocator. Warning: The *GIL* must be held when using these functions. void* PyObject_Malloc(size_t n) Allocates *n* bytes and returns a pointer of type "void*" to the allocated memory, or "NULL" if the request fails. Requesting zero bytes returns a distinct non-"NULL" pointer if possible, as if "PyObject_Malloc(1)" had been called instead. The memory will not have been initialized in any way. void* PyObject_Calloc(size_t nelem, size_t elsize) Allocates *nelem* elements each whose size in bytes is *elsize* and returns a pointer of type "void*" to the allocated memory, or "NULL" if the request fails. The memory is initialized to zeros. Requesting zero elements or elements of size zero bytes returns a distinct non-"NULL" pointer if possible, as if "PyObject_Calloc(1, 1)" had been called instead. New in version 3.5. void* PyObject_Realloc(void *p, size_t n) Resizes the memory block pointed to by *p* to *n* bytes. The contents will be unchanged to the minimum of the old and the new sizes. If *p* is "NULL", the call is equivalent to "PyObject_Malloc(n)"; else if *n* is equal to zero, the memory block is resized but is not freed, and the returned pointer is non-"NULL". Unless *p* is "NULL", it must have been returned by a previous call to "PyObject_Malloc()", "PyObject_Realloc()" or "PyObject_Calloc()". If the request fails, "PyObject_Realloc()" returns "NULL" and *p* remains a valid pointer to the previous memory area. void PyObject_Free(void *p) Frees the memory block pointed to by *p*, which must have been returned by a previous call to "PyObject_Malloc()", "PyObject_Realloc()" or "PyObject_Calloc()". Otherwise, or if "PyObject_Free(p)" has been called before, undefined behavior occurs. If *p* is "NULL", no operation is performed. Default Memory Allocators ========================= Default memory allocators: +---------------------------------+----------------------+--------------------+-----------------------+----------------------+ | Configuration | Name | PyMem_RawMalloc | PyMem_Malloc | PyObject_Malloc | |=================================|======================|====================|=======================|======================| | Release build | ""pymalloc"" | "malloc" | "pymalloc" | "pymalloc" | +---------------------------------+----------------------+--------------------+-----------------------+----------------------+ | Debug build | ""pymalloc_debug"" | "malloc" + debug | "pymalloc" + debug | "pymalloc" + debug | +---------------------------------+----------------------+--------------------+-----------------------+----------------------+ | Release build, without pymalloc | ""malloc"" | "malloc" | "malloc" | "malloc" | +---------------------------------+----------------------+--------------------+-----------------------+----------------------+ | Debug build, without pymalloc | ""malloc_debug"" | "malloc" + debug | "malloc" + debug | "malloc" + debug | +---------------------------------+----------------------+--------------------+-----------------------+----------------------+ Legend: * Name: value for "PYTHONMALLOC" environment variable * "malloc": system allocators from the standard C library, C functions: "malloc()", "calloc()", "realloc()" and "free()" * "pymalloc": pymalloc memory allocator * “+ debug”: with debug hooks installed by "PyMem_SetupDebugHooks()" Customize Memory Allocators =========================== New in version 3.4. PyMemAllocatorEx Structure used to describe a memory block allocator. The structure has the following fields: +------------------------------------------------------------+-----------------------------------------+ | Field | Meaning | |============================================================|=========================================| | "void *ctx" | user context passed as first argument | +------------------------------------------------------------+-----------------------------------------+ | "void* malloc(void *ctx, size_t size)" | allocate a memory block | +------------------------------------------------------------+-----------------------------------------+ | "void* calloc(void *ctx, size_t nelem, size_t elsize)" | allocate a memory block initialized | | | with zeros | +------------------------------------------------------------+-----------------------------------------+ | "void* realloc(void *ctx, void *ptr, size_t new_size)" | allocate or resize a memory block | +------------------------------------------------------------+-----------------------------------------+ | "void free(void *ctx, void *ptr)" | free a memory block | +------------------------------------------------------------+-----------------------------------------+ Changed in version 3.5: The "PyMemAllocator" structure was renamed to "PyMemAllocatorEx" and a new "calloc" field was added. PyMemAllocatorDomain Enum used to identify an allocator domain. Domains: PYMEM_DOMAIN_RAW Functions: * "PyMem_RawMalloc()" * "PyMem_RawRealloc()" * "PyMem_RawCalloc()" * "PyMem_RawFree()" PYMEM_DOMAIN_MEM Functions: * "PyMem_Malloc()", * "PyMem_Realloc()" * "PyMem_Calloc()" * "PyMem_Free()" PYMEM_DOMAIN_OBJ Functions: * "PyObject_Malloc()" * "PyObject_Realloc()" * "PyObject_Calloc()" * "PyObject_Free()" void PyMem_GetAllocator(PyMemAllocatorDomain domain, PyMemAllocatorEx *allocator) Get the memory block allocator of the specified domain. void PyMem_SetAllocator(PyMemAllocatorDomain domain, PyMemAllocatorEx *allocator) Set the memory block allocator of the specified domain. The new allocator must return a distinct non-"NULL" pointer when requesting zero bytes. For the "PYMEM_DOMAIN_RAW" domain, the allocator must be thread- safe: the *GIL* is not held when the allocator is called. If the new allocator is not a hook (does not call the previous allocator), the "PyMem_SetupDebugHooks()" function must be called to reinstall the debug hooks on top on the new allocator. void PyMem_SetupDebugHooks(void) Setup hooks to detect bugs in the Python memory allocator functions. Newly allocated memory is filled with the byte "0xCD" ("CLEANBYTE"), freed memory is filled with the byte "0xDD" ("DEADBYTE"). Memory blocks are surrounded by “forbidden bytes” ("FORBIDDENBYTE": byte "0xFD"). Runtime checks: * Detect API violations, ex: "PyObject_Free()" called on a buffer allocated by "PyMem_Malloc()" * Detect write before the start of the buffer (buffer underflow) * Detect write after the end of the buffer (buffer overflow) * Check that the *GIL* is held when allocator functions of "PYMEM_DOMAIN_OBJ" (ex: "PyObject_Malloc()") and "PYMEM_DOMAIN_MEM" (ex: "PyMem_Malloc()") domains are called On error, the debug hooks use the "tracemalloc" module to get the traceback where a memory block was allocated. The traceback is only displayed if "tracemalloc" is tracing Python memory allocations and the memory block was traced. These hooks are installed by default if Python is compiled in debug mode. The "PYTHONMALLOC" environment variable can be used to install debug hooks on a Python compiled in release mode. Changed in version 3.6: This function now also works on Python compiled in release mode. On error, the debug hooks now use "tracemalloc" to get the traceback where a memory block was allocated. The debug hooks now also check if the GIL is held when functions of "PYMEM_DOMAIN_OBJ" and "PYMEM_DOMAIN_MEM" domains are called. Changed in version 3.8: Byte patterns "0xCB" ("CLEANBYTE"), "0xDB" ("DEADBYTE") and "0xFB" ("FORBIDDENBYTE") have been replaced with "0xCD", "0xDD" and "0xFD" to use the same values than Windows CRT debug "malloc()" and "free()". The pymalloc allocator ====================== Python has a *pymalloc* allocator optimized for small objects (smaller or equal to 512 bytes) with a short lifetime. It uses memory mappings called “arenas” with a fixed size of 256 KiB. It falls back to "PyMem_RawMalloc()" and "PyMem_RawRealloc()" for allocations larger than 512 bytes. *pymalloc* is the default allocator of the "PYMEM_DOMAIN_MEM" (ex: "PyMem_Malloc()") and "PYMEM_DOMAIN_OBJ" (ex: "PyObject_Malloc()") domains. The arena allocator uses the following functions: * "VirtualAlloc()" and "VirtualFree()" on Windows, * "mmap()" and "munmap()" if available, * "malloc()" and "free()" otherwise. Customize pymalloc Arena Allocator ---------------------------------- New in version 3.4. PyObjectArenaAllocator Structure used to describe an arena allocator. The structure has three fields: +----------------------------------------------------+-----------------------------------------+ | Field | Meaning | |====================================================|=========================================| | "void *ctx" | user context passed as first argument | +----------------------------------------------------+-----------------------------------------+ | "void* alloc(void *ctx, size_t size)" | allocate an arena of size bytes | +----------------------------------------------------+-----------------------------------------+ | "void free(void *ctx, void *ptr, size_t size)" | free an arena | +----------------------------------------------------+-----------------------------------------+ void PyObject_GetArenaAllocator(PyObjectArenaAllocator *allocator) Get the arena allocator. void PyObject_SetArenaAllocator(PyObjectArenaAllocator *allocator) Set the arena allocator. tracemalloc C API ================= New in version 3.7. int PyTraceMalloc_Track(unsigned int domain, uintptr_t ptr, size_t size) Track an allocated memory block in the "tracemalloc" module. Return "0" on success, return "-1" on error (failed to allocate memory to store the trace). Return "-2" if tracemalloc is disabled. If memory block is already tracked, update the existing trace. int PyTraceMalloc_Untrack(unsigned int domain, uintptr_t ptr) Untrack an allocated memory block in the "tracemalloc" module. Do nothing if the block was not tracked. Return "-2" if tracemalloc is disabled, otherwise return "0". Examples ======== Here is the example from section Overview, rewritten so that the I/O buffer is allocated from the Python heap by using the first function set: PyObject *res; char *buf = (char *) PyMem_Malloc(BUFSIZ); /* for I/O */ if (buf == NULL) return PyErr_NoMemory(); /* ...Do some I/O operation involving buf... */ res = PyBytes_FromString(buf); PyMem_Free(buf); /* allocated with PyMem_Malloc */ return res; The same code using the type-oriented function set: PyObject *res; char *buf = PyMem_New(char, BUFSIZ); /* for I/O */ if (buf == NULL) return PyErr_NoMemory(); /* ...Do some I/O operation involving buf... */ res = PyBytes_FromString(buf); PyMem_Del(buf); /* allocated with PyMem_New */ return res; Note that in the two examples above, the buffer is always manipulated via functions belonging to the same set. Indeed, it is required to use the same memory API family for a given memory block, so that the risk of mixing different allocators is reduced to a minimum. The following code sequence contains two errors, one of which is labeled as *fatal* because it mixes two different allocators operating on different heaps. char *buf1 = PyMem_New(char, BUFSIZ); char *buf2 = (char *) malloc(BUFSIZ); char *buf3 = (char *) PyMem_Malloc(BUFSIZ); ... PyMem_Del(buf3); /* Wrong -- should be PyMem_Free() */ free(buf2); /* Right -- allocated via malloc() */ free(buf1); /* Fatal -- should be PyMem_Del() */ In addition to the functions aimed at handling raw memory blocks from the Python heap, objects in Python are allocated and released with "PyObject_New()", "PyObject_NewVar()" and "PyObject_Del()". These will be explained in the next chapter on defining and implementing new object types in C. MemoryView objects ****************** A "memoryview" object exposes the C level buffer interface as a Python object which can then be passed around like any other object. PyObject *PyMemoryView_FromObject(PyObject *obj) *Return value: New reference.* Create a memoryview object from an object that provides the buffer interface. If *obj* supports writable buffer exports, the memoryview object will be read/write, otherwise it may be either read-only or read/write at the discretion of the exporter. PyObject *PyMemoryView_FromMemory(char *mem, Py_ssize_t size, int flags) *Return value: New reference.* Create a memoryview object using *mem* as the underlying buffer. *flags* can be one of "PyBUF_READ" or "PyBUF_WRITE". New in version 3.3. PyObject *PyMemoryView_FromBuffer(Py_buffer *view) *Return value: New reference.* Create a memoryview object wrapping the given buffer structure *view*. For simple byte buffers, "PyMemoryView_FromMemory()" is the preferred function. PyObject *PyMemoryView_GetContiguous(PyObject *obj, int buffertype, char order) *Return value: New reference.* Create a memoryview object to a *contiguous* chunk of memory (in either ‘C’ or ‘F’ortran *order*) from an object that defines the buffer interface. If memory is contiguous, the memoryview object points to the original memory. Otherwise, a copy is made and the memoryview points to a new bytes object. int PyMemoryView_Check(PyObject *obj) Return true if the object *obj* is a memoryview object. It is not currently allowed to create subclasses of "memoryview". This function always succeeds. Py_buffer *PyMemoryView_GET_BUFFER(PyObject *mview) Return a pointer to the memoryview’s private copy of the exporter’s buffer. *mview* **must** be a memoryview instance; this macro doesn’t check its type, you must do it yourself or you will risk crashes. Py_buffer *PyMemoryView_GET_BASE(PyObject *mview) Return either a pointer to the exporting object that the memoryview is based on or "NULL" if the memoryview has been created by one of the functions "PyMemoryView_FromMemory()" or "PyMemoryView_FromBuffer()". *mview* **must** be a memoryview instance. Instance Method Objects *********************** An instance method is a wrapper for a "PyCFunction" and the new way to bind a "PyCFunction" to a class object. It replaces the former call "PyMethod_New(func, NULL, class)". PyTypeObject PyInstanceMethod_Type This instance of "PyTypeObject" represents the Python instance method type. It is not exposed to Python programs. int PyInstanceMethod_Check(PyObject *o) Return true if *o* is an instance method object (has type "PyInstanceMethod_Type"). The parameter must not be "NULL". This function always succeeds. PyObject* PyInstanceMethod_New(PyObject *func) *Return value: New reference.* Return a new instance method object, with *func* being any callable object. *func* is the function that will be called when the instance method is called. PyObject* PyInstanceMethod_Function(PyObject *im) *Return value: Borrowed reference.* Return the function object associated with the instance method *im*. PyObject* PyInstanceMethod_GET_FUNCTION(PyObject *im) *Return value: Borrowed reference.* Macro version of "PyInstanceMethod_Function()" which avoids error checking. Method Objects ************** Methods are bound function objects. Methods are always bound to an instance of a user-defined class. Unbound methods (methods bound to a class object) are no longer available. PyTypeObject PyMethod_Type This instance of "PyTypeObject" represents the Python method type. This is exposed to Python programs as "types.MethodType". int PyMethod_Check(PyObject *o) Return true if *o* is a method object (has type "PyMethod_Type"). The parameter must not be "NULL". This function always succeeds. PyObject* PyMethod_New(PyObject *func, PyObject *self) *Return value: New reference.* Return a new method object, with *func* being any callable object and *self* the instance the method should be bound. *func* is the function that will be called when the method is called. *self* must not be "NULL". PyObject* PyMethod_Function(PyObject *meth) *Return value: Borrowed reference.* Return the function object associated with the method *meth*. PyObject* PyMethod_GET_FUNCTION(PyObject *meth) *Return value: Borrowed reference.* Macro version of "PyMethod_Function()" which avoids error checking. PyObject* PyMethod_Self(PyObject *meth) *Return value: Borrowed reference.* Return the instance associated with the method *meth*. PyObject* PyMethod_GET_SELF(PyObject *meth) *Return value: Borrowed reference.* Macro version of "PyMethod_Self()" which avoids error checking. Module Objects ************** PyTypeObject PyModule_Type This instance of "PyTypeObject" represents the Python module type. This is exposed to Python programs as "types.ModuleType". int PyModule_Check(PyObject *p) Return true if *p* is a module object, or a subtype of a module object. This function always succeeds. int PyModule_CheckExact(PyObject *p) Return true if *p* is a module object, but not a subtype of "PyModule_Type". This function always succeeds. PyObject* PyModule_NewObject(PyObject *name) *Return value: New reference.* Return a new module object with the "__name__" attribute set to *name*. The module’s "__name__", "__doc__", "__package__", and "__loader__" attributes are filled in (all but "__name__" are set to "None"); the caller is responsible for providing a "__file__" attribute. New in version 3.3. Changed in version 3.4: "__package__" and "__loader__" are set to "None". PyObject* PyModule_New(const char *name) *Return value: New reference.* Similar to "PyModule_NewObject()", but the name is a UTF-8 encoded string instead of a Unicode object. PyObject* PyModule_GetDict(PyObject *module) *Return value: Borrowed reference.* Return the dictionary object that implements *module*’s namespace; this object is the same as the "__dict__" attribute of the module object. If *module* is not a module object (or a subtype of a module object), "SystemError" is raised and "NULL" is returned. It is recommended extensions use other "PyModule_*()" and "PyObject_*()" functions rather than directly manipulate a module’s "__dict__". PyObject* PyModule_GetNameObject(PyObject *module) *Return value: New reference.* Return *module*’s "__name__" value. If the module does not provide one, or if it is not a string, "SystemError" is raised and "NULL" is returned. New in version 3.3. const char* PyModule_GetName(PyObject *module) Similar to "PyModule_GetNameObject()" but return the name encoded to "'utf-8'". void* PyModule_GetState(PyObject *module) Return the “state” of the module, that is, a pointer to the block of memory allocated at module creation time, or "NULL". See "PyModuleDef.m_size". PyModuleDef* PyModule_GetDef(PyObject *module) Return a pointer to the "PyModuleDef" struct from which the module was created, or "NULL" if the module wasn’t created from a definition. PyObject* PyModule_GetFilenameObject(PyObject *module) *Return value: New reference.* Return the name of the file from which *module* was loaded using *module*’s "__file__" attribute. If this is not defined, or if it is not a unicode string, raise "SystemError" and return "NULL"; otherwise return a reference to a Unicode object. New in version 3.2. const char* PyModule_GetFilename(PyObject *module) Similar to "PyModule_GetFilenameObject()" but return the filename encoded to ‘utf-8’. Deprecated since version 3.2: "PyModule_GetFilename()" raises "UnicodeEncodeError" on unencodable filenames, use "PyModule_GetFilenameObject()" instead. Initializing C modules ====================== Modules objects are usually created from extension modules (shared libraries which export an initialization function), or compiled-in modules (where the initialization function is added using "PyImport_AppendInittab()"). See Building C and C++ Extensions or Extending Embedded Python for details. The initialization function can either pass a module definition instance to "PyModule_Create()", and return the resulting module object, or request “multi-phase initialization” by returning the definition struct itself. PyModuleDef The module definition struct, which holds all information needed to create a module object. There is usually only one statically initialized variable of this type for each module. PyModuleDef_Base m_base Always initialize this member to "PyModuleDef_HEAD_INIT". const char *m_name Name for the new module. const char *m_doc Docstring for the module; usually a docstring variable created with "PyDoc_STRVAR" is used. Py_ssize_t m_size Module state may be kept in a per-module memory area that can be retrieved with "PyModule_GetState()", rather than in static globals. This makes modules safe for use in multiple sub- interpreters. This memory area is allocated based on *m_size* on module creation, and freed when the module object is deallocated, after the "m_free" function has been called, if present. Setting "m_size" to "-1" means that the module does not support sub-interpreters, because it has global state. Setting it to a non-negative value means that the module can be re-initialized and specifies the additional amount of memory it requires for its state. Non-negative "m_size" is required for multi-phase initialization. See **PEP 3121** for more details. PyMethodDef* m_methods A pointer to a table of module-level functions, described by "PyMethodDef" values. Can be "NULL" if no functions are present. PyModuleDef_Slot* m_slots An array of slot definitions for multi-phase initialization, terminated by a "{0, NULL}" entry. When using single-phase initialization, *m_slots* must be "NULL". Changed in version 3.5: Prior to version 3.5, this member was always set to "NULL", and was defined as: inquiry m_reload traverseproc m_traverse A traversal function to call during GC traversal of the module object, or "NULL" if not needed. This function is not called if the module state was requested but is not allocated yet. This is the case immediately after the module is created and before the module is executed ("Py_mod_exec" function). More precisely, this function is not called if "m_size" is greater than 0 and the module state (as returned by "PyModule_GetState()") is "NULL". Changed in version 3.9: No longer called before the module state is allocated. inquiry m_clear A clear function to call during GC clearing of the module object, or "NULL" if not needed. This function is not called if the module state was requested but is not allocated yet. This is the case immediately after the module is created and before the module is executed ("Py_mod_exec" function). More precisely, this function is not called if "m_size" is greater than 0 and the module state (as returned by "PyModule_GetState()") is "NULL". Like "PyTypeObject.tp_clear", this function is not *always* called before a module is deallocated. For example, when reference counting is enough to determine that an object is no longer used, the cyclic garbage collector is not involved and "m_free" is called directly. Changed in version 3.9: No longer called before the module state is allocated. freefunc m_free A function to call during deallocation of the module object, or "NULL" if not needed. This function is not called if the module state was requested but is not allocated yet. This is the case immediately after the module is created and before the module is executed ("Py_mod_exec" function). More precisely, this function is not called if "m_size" is greater than 0 and the module state (as returned by "PyModule_GetState()") is "NULL". Changed in version 3.9: No longer called before the module state is allocated. Single-phase initialization --------------------------- The module initialization function may create and return the module object directly. This is referred to as “single-phase initialization”, and uses one of the following two module creation functions: PyObject* PyModule_Create(PyModuleDef *def) *Return value: New reference.* Create a new module object, given the definition in *def*. This behaves like "PyModule_Create2()" with *module_api_version* set to "PYTHON_API_VERSION". PyObject* PyModule_Create2(PyModuleDef *def, int module_api_version) *Return value: New reference.* Create a new module object, given the definition in *def*, assuming the API version *module_api_version*. If that version does not match the version of the running interpreter, a "RuntimeWarning" is emitted. Note: Most uses of this function should be using "PyModule_Create()" instead; only use this if you are sure you need it. Before it is returned from in the initialization function, the resulting module object is typically populated using functions like "PyModule_AddObject()". Multi-phase initialization -------------------------- An alternate way to specify extensions is to request “multi-phase initialization”. Extension modules created this way behave more like Python modules: the initialization is split between the *creation phase*, when the module object is created, and the *execution phase*, when it is populated. The distinction is similar to the "__new__()" and "__init__()" methods of classes. Unlike modules created using single-phase initialization, these modules are not singletons: if the *sys.modules* entry is removed and the module is re-imported, a new module object is created, and the old module is subject to normal garbage collection – as with Python modules. By default, multiple modules created from the same definition should be independent: changes to one should not affect the others. This means that all state should be specific to the module object (using e.g. using "PyModule_GetState()"), or its contents (such as the module’s "__dict__" or individual classes created with "PyType_FromSpec()"). All modules created using multi-phase initialization are expected to support sub-interpreters. Making sure multiple modules are independent is typically enough to achieve this. To request multi-phase initialization, the initialization function (PyInit_modulename) returns a "PyModuleDef" instance with non-empty "m_slots". Before it is returned, the "PyModuleDef" instance must be initialized with the following function: PyObject* PyModuleDef_Init(PyModuleDef *def) *Return value: Borrowed reference.* Ensures a module definition is a properly initialized Python object that correctly reports its type and reference count. Returns *def* cast to "PyObject*", or "NULL" if an error occurred. New in version 3.5. The *m_slots* member of the module definition must point to an array of "PyModuleDef_Slot" structures: PyModuleDef_Slot int slot A slot ID, chosen from the available values explained below. void* value Value of the slot, whose meaning depends on the slot ID. New in version 3.5. The *m_slots* array must be terminated by a slot with id 0. The available slot types are: Py_mod_create Specifies a function that is called to create the module object itself. The *value* pointer of this slot must point to a function of the signature: PyObject* create_module(PyObject *spec, PyModuleDef *def) The function receives a "ModuleSpec" instance, as defined in **PEP 451**, and the module definition. It should return a new module object, or set an error and return "NULL". This function should be kept minimal. In particular, it should not call arbitrary Python code, as trying to import the same module again may result in an infinite loop. Multiple "Py_mod_create" slots may not be specified in one module definition. If "Py_mod_create" is not specified, the import machinery will create a normal module object using "PyModule_New()". The name is taken from *spec*, not the definition, to allow extension modules to dynamically adjust to their place in the module hierarchy and be imported under different names through symlinks, all while sharing a single module definition. There is no requirement for the returned object to be an instance of "PyModule_Type". Any type can be used, as long as it supports setting and getting import-related attributes. However, only "PyModule_Type" instances may be returned if the "PyModuleDef" has non-"NULL" "m_traverse", "m_clear", "m_free"; non-zero "m_size"; or slots other than "Py_mod_create". Py_mod_exec Specifies a function that is called to *execute* the module. This is equivalent to executing the code of a Python module: typically, this function adds classes and constants to the module. The signature of the function is: int exec_module(PyObject* module) If multiple "Py_mod_exec" slots are specified, they are processed in the order they appear in the *m_slots* array. See **PEP 489** for more details on multi-phase initialization. Low-level module creation functions ----------------------------------- The following functions are called under the hood when using multi- phase initialization. They can be used directly, for example when creating module objects dynamically. Note that both "PyModule_FromDefAndSpec" and "PyModule_ExecDef" must be called to fully initialize a module. PyObject * PyModule_FromDefAndSpec(PyModuleDef *def, PyObject *spec) *Return value: New reference.* Create a new module object, given the definition in *module* and the ModuleSpec *spec*. This behaves like "PyModule_FromDefAndSpec2()" with *module_api_version* set to "PYTHON_API_VERSION". New in version 3.5. PyObject * PyModule_FromDefAndSpec2(PyModuleDef *def, PyObject *spec, int module_api_version) *Return value: New reference.* Create a new module object, given the definition in *module* and the ModuleSpec *spec*, assuming the API version *module_api_version*. If that version does not match the version of the running interpreter, a "RuntimeWarning" is emitted. Note: Most uses of this function should be using "PyModule_FromDefAndSpec()" instead; only use this if you are sure you need it. New in version 3.5. int PyModule_ExecDef(PyObject *module, PyModuleDef *def) Process any execution slots ("Py_mod_exec") given in *def*. New in version 3.5. int PyModule_SetDocString(PyObject *module, const char *docstring) Set the docstring for *module* to *docstring*. This function is called automatically when creating a module from "PyModuleDef", using either "PyModule_Create" or "PyModule_FromDefAndSpec". New in version 3.5. int PyModule_AddFunctions(PyObject *module, PyMethodDef *functions) Add the functions from the "NULL" terminated *functions* array to *module*. Refer to the "PyMethodDef" documentation for details on individual entries (due to the lack of a shared module namespace, module level “functions” implemented in C typically receive the module as their first parameter, making them similar to instance methods on Python classes). This function is called automatically when creating a module from "PyModuleDef", using either "PyModule_Create" or "PyModule_FromDefAndSpec". New in version 3.5. Support functions ----------------- The module initialization function (if using single phase initialization) or a function called from a module execution slot (if using multi-phase initialization), can use the following functions to help initialize the module state: int PyModule_AddObject(PyObject *module, const char *name, PyObject *value) Add an object to *module* as *name*. This is a convenience function which can be used from the module’s initialization function. This steals a reference to *value* on success. Return "-1" on error, "0" on success. Note: Unlike other functions that steal references, "PyModule_AddObject()" only decrements the reference count of *value* **on success**.This means that its return value must be checked, and calling code must "Py_DECREF()" *value* manually on error. Example usage: Py_INCREF(spam); if (PyModule_AddObject(module, "spam", spam) < 0) { Py_DECREF(module); Py_DECREF(spam); return NULL; } int PyModule_AddIntConstant(PyObject *module, const char *name, long value) Add an integer constant to *module* as *name*. This convenience function can be used from the module’s initialization function. Return "-1" on error, "0" on success. int PyModule_AddStringConstant(PyObject *module, const char *name, const char *value) Add a string constant to *module* as *name*. This convenience function can be used from the module’s initialization function. The string *value* must be "NULL"-terminated. Return "-1" on error, "0" on success. int PyModule_AddIntMacro(PyObject *module, macro) Add an int constant to *module*. The name and the value are taken from *macro*. For example "PyModule_AddIntMacro(module, AF_INET)" adds the int constant *AF_INET* with the value of *AF_INET* to *module*. Return "-1" on error, "0" on success. int PyModule_AddStringMacro(PyObject *module, macro) Add a string constant to *module*. int PyModule_AddType(PyObject *module, PyTypeObject *type) Add a type object to *module*. The type object is finalized by calling internally "PyType_Ready()". The name of the type object is taken from the last component of "tp_name" after dot. Return "-1" on error, "0" on success. New in version 3.9. Module lookup ============= Single-phase initialization creates singleton modules that can be looked up in the context of the current interpreter. This allows the module object to be retrieved later with only a reference to the module definition. These functions will not work on modules created using multi-phase initialization, since multiple such modules can be created from a single definition. PyObject* PyState_FindModule(PyModuleDef *def) *Return value: Borrowed reference.* Returns the module object that was created from *def* for the current interpreter. This method requires that the module object has been attached to the interpreter state with "PyState_AddModule()" beforehand. In case the corresponding module object is not found or has not been attached to the interpreter state yet, it returns "NULL". int PyState_AddModule(PyObject *module, PyModuleDef *def) Attaches the module object passed to the function to the interpreter state. This allows the module object to be accessible via "PyState_FindModule()". Only effective on modules created using single-phase initialization. Python calls "PyState_AddModule" automatically after importing a module, so it is unnecessary (but harmless) to call it from module initialization code. An explicit call is needed only if the module’s own init code subsequently calls "PyState_FindModule". The function is mainly intended for implementing alternative import mechanisms (either by calling it directly, or by referring to its implementation for details of the required state updates). The caller must hold the GIL. Return 0 on success or -1 on failure. New in version 3.3. int PyState_RemoveModule(PyModuleDef *def) Removes the module object created from *def* from the interpreter state. Return 0 on success or -1 on failure. The caller must hold the GIL. New in version 3.3. The "None" Object ***************** Note that the "PyTypeObject" for "None" is not directly exposed in the Python/C API. Since "None" is a singleton, testing for object identity (using "==" in C) is sufficient. There is no "PyNone_Check()" function for the same reason. PyObject* Py_None The Python "None" object, denoting lack of value. This object has no methods. It needs to be treated just like any other object with respect to reference counts. Py_RETURN_NONE Properly handle returning "Py_None" from within a C function (that is, increment the reference count of "None" and return it.) Number Protocol *************** int PyNumber_Check(PyObject *o) Returns "1" if the object *o* provides numeric protocols, and false otherwise. This function always succeeds. Changed in version 3.8: Returns "1" if *o* is an index integer. PyObject* PyNumber_Add(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the result of adding *o1* and *o2*, or "NULL" on failure. This is the equivalent of the Python expression "o1 + o2". PyObject* PyNumber_Subtract(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the result of subtracting *o2* from *o1*, or "NULL" on failure. This is the equivalent of the Python expression "o1 - o2". PyObject* PyNumber_Multiply(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the result of multiplying *o1* and *o2*, or "NULL" on failure. This is the equivalent of the Python expression "o1 * o2". PyObject* PyNumber_MatrixMultiply(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the result of matrix multiplication on *o1* and *o2*, or "NULL" on failure. This is the equivalent of the Python expression "o1 @ o2". New in version 3.5. PyObject* PyNumber_FloorDivide(PyObject *o1, PyObject *o2) *Return value: New reference.* Return the floor of *o1* divided by *o2*, or "NULL" on failure. This is the equivalent of the Python expression "o1 // o2". PyObject* PyNumber_TrueDivide(PyObject *o1, PyObject *o2) *Return value: New reference.* Return a reasonable approximation for the mathematical value of *o1* divided by *o2*, or "NULL" on failure. The return value is “approximate” because binary floating point numbers are approximate; it is not possible to represent all real numbers in base two. This function can return a floating point value when passed two integers. This is the equivalent of the Python expression "o1 / o2". PyObject* PyNumber_Remainder(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the remainder of dividing *o1* by *o2*, or "NULL" on failure. This is the equivalent of the Python expression "o1 % o2". PyObject* PyNumber_Divmod(PyObject *o1, PyObject *o2) *Return value: New reference.* See the built-in function "divmod()". Returns "NULL" on failure. This is the equivalent of the Python expression "divmod(o1, o2)". PyObject* PyNumber_Power(PyObject *o1, PyObject *o2, PyObject *o3) *Return value: New reference.* See the built-in function "pow()". Returns "NULL" on failure. This is the equivalent of the Python expression "pow(o1, o2, o3)", where *o3* is optional. If *o3* is to be ignored, pass "Py_None" in its place (passing "NULL" for *o3* would cause an illegal memory access). PyObject* PyNumber_Negative(PyObject *o) *Return value: New reference.* Returns the negation of *o* on success, or "NULL" on failure. This is the equivalent of the Python expression "-o". PyObject* PyNumber_Positive(PyObject *o) *Return value: New reference.* Returns *o* on success, or "NULL" on failure. This is the equivalent of the Python expression "+o". PyObject* PyNumber_Absolute(PyObject *o) *Return value: New reference.* Returns the absolute value of *o*, or "NULL" on failure. This is the equivalent of the Python expression "abs(o)". PyObject* PyNumber_Invert(PyObject *o) *Return value: New reference.* Returns the bitwise negation of *o* on success, or "NULL" on failure. This is the equivalent of the Python expression "~o". PyObject* PyNumber_Lshift(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the result of left shifting *o1* by *o2* on success, or "NULL" on failure. This is the equivalent of the Python expression "o1 << o2". PyObject* PyNumber_Rshift(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the result of right shifting *o1* by *o2* on success, or "NULL" on failure. This is the equivalent of the Python expression "o1 >> o2". PyObject* PyNumber_And(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the “bitwise and” of *o1* and *o2* on success and "NULL" on failure. This is the equivalent of the Python expression "o1 & o2". PyObject* PyNumber_Xor(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the “bitwise exclusive or” of *o1* by *o2* on success, or "NULL" on failure. This is the equivalent of the Python expression "o1 ^ o2". PyObject* PyNumber_Or(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the “bitwise or” of *o1* and *o2* on success, or "NULL" on failure. This is the equivalent of the Python expression "o1 | o2". PyObject* PyNumber_InPlaceAdd(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the result of adding *o1* and *o2*, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 += o2". PyObject* PyNumber_InPlaceSubtract(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the result of subtracting *o2* from *o1*, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 -= o2". PyObject* PyNumber_InPlaceMultiply(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the result of multiplying *o1* and *o2*, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 *= o2". PyObject* PyNumber_InPlaceMatrixMultiply(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the result of matrix multiplication on *o1* and *o2*, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 @= o2". New in version 3.5. PyObject* PyNumber_InPlaceFloorDivide(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the mathematical floor of dividing *o1* by *o2*, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 //= o2". PyObject* PyNumber_InPlaceTrueDivide(PyObject *o1, PyObject *o2) *Return value: New reference.* Return a reasonable approximation for the mathematical value of *o1* divided by *o2*, or "NULL" on failure. The return value is “approximate” because binary floating point numbers are approximate; it is not possible to represent all real numbers in base two. This function can return a floating point value when passed two integers. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 /= o2". PyObject* PyNumber_InPlaceRemainder(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the remainder of dividing *o1* by *o2*, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 %= o2". PyObject* PyNumber_InPlacePower(PyObject *o1, PyObject *o2, PyObject *o3) *Return value: New reference.* See the built-in function "pow()". Returns "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 **= o2" when o3 is "Py_None", or an in-place variant of "pow(o1, o2, o3)" otherwise. If *o3* is to be ignored, pass "Py_None" in its place (passing "NULL" for *o3* would cause an illegal memory access). PyObject* PyNumber_InPlaceLshift(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the result of left shifting *o1* by *o2* on success, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 <<= o2". PyObject* PyNumber_InPlaceRshift(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the result of right shifting *o1* by *o2* on success, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 >>= o2". PyObject* PyNumber_InPlaceAnd(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the “bitwise and” of *o1* and *o2* on success and "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 &= o2". PyObject* PyNumber_InPlaceXor(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the “bitwise exclusive or” of *o1* by *o2* on success, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 ^= o2". PyObject* PyNumber_InPlaceOr(PyObject *o1, PyObject *o2) *Return value: New reference.* Returns the “bitwise or” of *o1* and *o2* on success, or "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python statement "o1 |= o2". PyObject* PyNumber_Long(PyObject *o) *Return value: New reference.* Returns the *o* converted to an integer object on success, or "NULL" on failure. This is the equivalent of the Python expression "int(o)". PyObject* PyNumber_Float(PyObject *o) *Return value: New reference.* Returns the *o* converted to a float object on success, or "NULL" on failure. This is the equivalent of the Python expression "float(o)". PyObject* PyNumber_Index(PyObject *o) *Return value: New reference.* Returns the *o* converted to a Python int on success or "NULL" with a "TypeError" exception raised on failure. PyObject* PyNumber_ToBase(PyObject *n, int base) *Return value: New reference.* Returns the integer *n* converted to base *base* as a string. The *base* argument must be one of 2, 8, 10, or 16. For base 2, 8, or 16, the returned string is prefixed with a base marker of "'0b'", "'0o'", or "'0x'", respectively. If *n* is not a Python int, it is converted with "PyNumber_Index()" first. Py_ssize_t PyNumber_AsSsize_t(PyObject *o, PyObject *exc) Returns *o* converted to a "Py_ssize_t" value if *o* can be interpreted as an integer. If the call fails, an exception is raised and "-1" is returned. If *o* can be converted to a Python int but the attempt to convert to a "Py_ssize_t" value would raise an "OverflowError", then the *exc* argument is the type of exception that will be raised (usually "IndexError" or "OverflowError"). If *exc* is "NULL", then the exception is cleared and the value is clipped to "PY_SSIZE_T_MIN" for a negative integer or "PY_SSIZE_T_MAX" for a positive integer. int PyIndex_Check(PyObject *o) Returns "1" if *o* is an index integer (has the "nb_index" slot of the "tp_as_number" structure filled in), and "0" otherwise. This function always succeeds. Old Buffer Protocol ******************* Deprecated since version 3.0. These functions were part of the “old buffer protocol” API in Python 2. In Python 3, this protocol doesn’t exist anymore but the functions are still exposed to ease porting 2.x code. They act as a compatibility wrapper around the new buffer protocol, but they don’t give you control over the lifetime of the resources acquired when a buffer is exported. Therefore, it is recommended that you call "PyObject_GetBuffer()" (or the "y*" or "w*" format codes with the "PyArg_ParseTuple()" family of functions) to get a buffer view over an object, and "PyBuffer_Release()" when the buffer view can be released. int PyObject_AsCharBuffer(PyObject *obj, const char **buffer, Py_ssize_t *buffer_len) Returns a pointer to a read-only memory location usable as character-based input. The *obj* argument must support the single- segment character buffer interface. On success, returns "0", sets *buffer* to the memory location and *buffer_len* to the buffer length. Returns "-1" and sets a "TypeError" on error. int PyObject_AsReadBuffer(PyObject *obj, const void **buffer, Py_ssize_t *buffer_len) Returns a pointer to a read-only memory location containing arbitrary data. The *obj* argument must support the single-segment readable buffer interface. On success, returns "0", sets *buffer* to the memory location and *buffer_len* to the buffer length. Returns "-1" and sets a "TypeError" on error. int PyObject_CheckReadBuffer(PyObject *o) Returns "1" if *o* supports the single-segment readable buffer interface. Otherwise returns "0". This function always succeeds. Note that this function tries to get and release a buffer, and exceptions which occur while calling corresponding functions will get suppressed. To get error reporting use "PyObject_GetBuffer()" instead. int PyObject_AsWriteBuffer(PyObject *obj, void **buffer, Py_ssize_t *buffer_len) Returns a pointer to a writable memory location. The *obj* argument must support the single-segment, character buffer interface. On success, returns "0", sets *buffer* to the memory location and *buffer_len* to the buffer length. Returns "-1" and sets a "TypeError" on error. Object Protocol *************** PyObject* Py_NotImplemented The "NotImplemented" singleton, used to signal that an operation is not implemented for the given type combination. Py_RETURN_NOTIMPLEMENTED Properly handle returning "Py_NotImplemented" from within a C function (that is, increment the reference count of NotImplemented and return it). int PyObject_Print(PyObject *o, FILE *fp, int flags) Print an object *o*, on file *fp*. Returns "-1" on error. The flags argument is used to enable certain printing options. The only option currently supported is "Py_PRINT_RAW"; if given, the "str()" of the object is written instead of the "repr()". int PyObject_HasAttr(PyObject *o, PyObject *attr_name) Returns "1" if *o* has the attribute *attr_name*, and "0" otherwise. This is equivalent to the Python expression "hasattr(o, attr_name)". This function always succeeds. Note that exceptions which occur while calling "__getattr__()" and "__getattribute__()" methods will get suppressed. To get error reporting use "PyObject_GetAttr()" instead. int PyObject_HasAttrString(PyObject *o, const char *attr_name) Returns "1" if *o* has the attribute *attr_name*, and "0" otherwise. This is equivalent to the Python expression "hasattr(o, attr_name)". This function always succeeds. Note that exceptions which occur while calling "__getattr__()" and "__getattribute__()" methods and creating a temporary string object will get suppressed. To get error reporting use "PyObject_GetAttrString()" instead. PyObject* PyObject_GetAttr(PyObject *o, PyObject *attr_name) *Return value: New reference.* Retrieve an attribute named *attr_name* from object *o*. Returns the attribute value on success, or "NULL" on failure. This is the equivalent of the Python expression "o.attr_name". PyObject* PyObject_GetAttrString(PyObject *o, const char *attr_name) *Return value: New reference.* Retrieve an attribute named *attr_name* from object *o*. Returns the attribute value on success, or "NULL" on failure. This is the equivalent of the Python expression "o.attr_name". PyObject* PyObject_GenericGetAttr(PyObject *o, PyObject *name) *Return value: New reference.* Generic attribute getter function that is meant to be put into a type object’s "tp_getattro" slot. It looks for a descriptor in the dictionary of classes in the object’s MRO as well as an attribute in the object’s "__dict__" (if present). As outlined in Implementing Descriptors, data descriptors take preference over instance attributes, while non-data descriptors don’t. Otherwise, an "AttributeError" is raised. int PyObject_SetAttr(PyObject *o, PyObject *attr_name, PyObject *v) Set the value of the attribute named *attr_name*, for object *o*, to the value *v*. Raise an exception and return "-1" on failure; return "0" on success. This is the equivalent of the Python statement "o.attr_name = v". If *v* is "NULL", the attribute is deleted. This behaviour is deprecated in favour of using "PyObject_DelAttr()", but there are currently no plans to remove it. int PyObject_SetAttrString(PyObject *o, const char *attr_name, PyObject *v) Set the value of the attribute named *attr_name*, for object *o*, to the value *v*. Raise an exception and return "-1" on failure; return "0" on success. This is the equivalent of the Python statement "o.attr_name = v". If *v* is "NULL", the attribute is deleted, but this feature is deprecated in favour of using "PyObject_DelAttrString()". int PyObject_GenericSetAttr(PyObject *o, PyObject *name, PyObject *value) Generic attribute setter and deleter function that is meant to be put into a type object’s "tp_setattro" slot. It looks for a data descriptor in the dictionary of classes in the object’s MRO, and if found it takes preference over setting or deleting the attribute in the instance dictionary. Otherwise, the attribute is set or deleted in the object’s "__dict__" (if present). On success, "0" is returned, otherwise an "AttributeError" is raised and "-1" is returned. int PyObject_DelAttr(PyObject *o, PyObject *attr_name) Delete attribute named *attr_name*, for object *o*. Returns "-1" on failure. This is the equivalent of the Python statement "del o.attr_name". int PyObject_DelAttrString(PyObject *o, const char *attr_name) Delete attribute named *attr_name*, for object *o*. Returns "-1" on failure. This is the equivalent of the Python statement "del o.attr_name". PyObject* PyObject_GenericGetDict(PyObject *o, void *context) *Return value: New reference.* A generic implementation for the getter of a "__dict__" descriptor. It creates the dictionary if necessary. New in version 3.3. int PyObject_GenericSetDict(PyObject *o, PyObject *value, void *context) A generic implementation for the setter of a "__dict__" descriptor. This implementation does not allow the dictionary to be deleted. New in version 3.3. PyObject* PyObject_RichCompare(PyObject *o1, PyObject *o2, int opid) *Return value: New reference.* Compare the values of *o1* and *o2* using the operation specified by *opid*, which must be one of "Py_LT", "Py_LE", "Py_EQ", "Py_NE", "Py_GT", or "Py_GE", corresponding to "<", "<=", "==", "!=", ">", or ">=" respectively. This is the equivalent of the Python expression "o1 op o2", where "op" is the operator corresponding to *opid*. Returns the value of the comparison on success, or "NULL" on failure. int PyObject_RichCompareBool(PyObject *o1, PyObject *o2, int opid) Compare the values of *o1* and *o2* using the operation specified by *opid*, which must be one of "Py_LT", "Py_LE", "Py_EQ", "Py_NE", "Py_GT", or "Py_GE", corresponding to "<", "<=", "==", "!=", ">", or ">=" respectively. Returns "-1" on error, "0" if the result is false, "1" otherwise. This is the equivalent of the Python expression "o1 op o2", where "op" is the operator corresponding to *opid*. Note: If *o1* and *o2* are the same object, "PyObject_RichCompareBool()" will always return "1" for "Py_EQ" and "0" for "Py_NE". PyObject* PyObject_Repr(PyObject *o) *Return value: New reference.* Compute a string representation of object *o*. Returns the string representation on success, "NULL" on failure. This is the equivalent of the Python expression "repr(o)". Called by the "repr()" built-in function. Changed in version 3.4: This function now includes a debug assertion to help ensure that it does not silently discard an active exception. PyObject* PyObject_ASCII(PyObject *o) *Return value: New reference.* As "PyObject_Repr()", compute a string representation of object *o*, but escape the non-ASCII characters in the string returned by "PyObject_Repr()" with "\x", "\u" or "\U" escapes. This generates a string similar to that returned by "PyObject_Repr()" in Python 2. Called by the "ascii()" built-in function. PyObject* PyObject_Str(PyObject *o) *Return value: New reference.* Compute a string representation of object *o*. Returns the string representation on success, "NULL" on failure. This is the equivalent of the Python expression "str(o)". Called by the "str()" built-in function and, therefore, by the "print()" function. Changed in version 3.4: This function now includes a debug assertion to help ensure that it does not silently discard an active exception. PyObject* PyObject_Bytes(PyObject *o) *Return value: New reference.* Compute a bytes representation of object *o*. "NULL" is returned on failure and a bytes object on success. This is equivalent to the Python expression "bytes(o)", when *o* is not an integer. Unlike "bytes(o)", a TypeError is raised when *o* is an integer instead of a zero-initialized bytes object. int PyObject_IsSubclass(PyObject *derived, PyObject *cls) Return "1" if the class *derived* is identical to or derived from the class *cls*, otherwise return "0". In case of an error, return "-1". If *cls* is a tuple, the check will be done against every entry in *cls*. The result will be "1" when at least one of the checks returns "1", otherwise it will be "0". If *cls* has a "__subclasscheck__()" method, it will be called to determine the subclass status as described in **PEP 3119**. Otherwise, *derived* is a subclass of *cls* if it is a direct or indirect subclass, i.e. contained in "cls.__mro__". Normally only class objects, i.e. instances of "type" or a derived class, are considered classes. However, objects can override this by having a "__bases__" attribute (which must be a tuple of base classes). int PyObject_IsInstance(PyObject *inst, PyObject *cls) Return "1" if *inst* is an instance of the class *cls* or a subclass of *cls*, or "0" if not. On error, returns "-1" and sets an exception. If *cls* is a tuple, the check will be done against every entry in *cls*. The result will be "1" when at least one of the checks returns "1", otherwise it will be "0". If *cls* has a "__instancecheck__()" method, it will be called to determine the subclass status as described in **PEP 3119**. Otherwise, *inst* is an instance of *cls* if its class is a subclass of *cls*. An instance *inst* can override what is considered its class by having a "__class__" attribute. An object *cls* can override if it is considered a class, and what its base classes are, by having a "__bases__" attribute (which must be a tuple of base classes). Py_hash_t PyObject_Hash(PyObject *o) Compute and return the hash value of an object *o*. On failure, return "-1". This is the equivalent of the Python expression "hash(o)". Changed in version 3.2: The return type is now Py_hash_t. This is a signed integer the same size as "Py_ssize_t". Py_hash_t PyObject_HashNotImplemented(PyObject *o) Set a "TypeError" indicating that "type(o)" is not hashable and return "-1". This function receives special treatment when stored in a "tp_hash" slot, allowing a type to explicitly indicate to the interpreter that it is not hashable. int PyObject_IsTrue(PyObject *o) Returns "1" if the object *o* is considered to be true, and "0" otherwise. This is equivalent to the Python expression "not not o". On failure, return "-1". int PyObject_Not(PyObject *o) Returns "0" if the object *o* is considered to be true, and "1" otherwise. This is equivalent to the Python expression "not o". On failure, return "-1". PyObject* PyObject_Type(PyObject *o) *Return value: New reference.* When *o* is non-"NULL", returns a type object corresponding to the object type of object *o*. On failure, raises "SystemError" and returns "NULL". This is equivalent to the Python expression "type(o)". This function increments the reference count of the return value. There’s really no reason to use this function instead of the "Py_TYPE()" function, which returns a pointer of type "PyTypeObject*", except when the incremented reference count is needed. int PyObject_TypeCheck(PyObject *o, PyTypeObject *type) Return true if the object *o* is of type *type* or a subtype of *type*. Both parameters must be non-"NULL". Py_ssize_t PyObject_Size(PyObject *o) Py_ssize_t PyObject_Length(PyObject *o) Return the length of object *o*. If the object *o* provides either the sequence and mapping protocols, the sequence length is returned. On error, "-1" is returned. This is the equivalent to the Python expression "len(o)". Py_ssize_t PyObject_LengthHint(PyObject *o, Py_ssize_t defaultvalue) Return an estimated length for the object *o*. First try to return its actual length, then an estimate using "__length_hint__()", and finally return the default value. On error return "-1". This is the equivalent to the Python expression "operator.length_hint(o, defaultvalue)". New in version 3.4. PyObject* PyObject_GetItem(PyObject *o, PyObject *key) *Return value: New reference.* Return element of *o* corresponding to the object *key* or "NULL" on failure. This is the equivalent of the Python expression "o[key]". int PyObject_SetItem(PyObject *o, PyObject *key, PyObject *v) Map the object *key* to the value *v*. Raise an exception and return "-1" on failure; return "0" on success. This is the equivalent of the Python statement "o[key] = v". This function *does not* steal a reference to *v*. int PyObject_DelItem(PyObject *o, PyObject *key) Remove the mapping for the object *key* from the object *o*. Return "-1" on failure. This is equivalent to the Python statement "del o[key]". PyObject* PyObject_Dir(PyObject *o) *Return value: New reference.* This is equivalent to the Python expression "dir(o)", returning a (possibly empty) list of strings appropriate for the object argument, or "NULL" if there was an error. If the argument is "NULL", this is like the Python "dir()", returning the names of the current locals; in this case, if no execution frame is active then "NULL" is returned but "PyErr_Occurred()" will return false. PyObject* PyObject_GetIter(PyObject *o) *Return value: New reference.* This is equivalent to the Python expression "iter(o)". It returns a new iterator for the object argument, or the object itself if the object is already an iterator. Raises "TypeError" and returns "NULL" if the object cannot be iterated. Object Implementation Support ***************************** This chapter describes the functions, types, and macros used when defining new object types. * Allocating Objects on the Heap * Common Object Structures * Base object types and macros * Implementing functions and methods * Accessing attributes of extension types * Type Objects * Quick Reference * “tp slots” * sub-slots * slot typedefs * PyTypeObject Definition * PyObject Slots * PyVarObject Slots * PyTypeObject Slots * Heap Types * Number Object Structures * Mapping Object Structures * Sequence Object Structures * Buffer Object Structures * Async Object Structures * Slot Type typedefs * Examples * Supporting Cyclic Garbage Collection Reference Counting ****************** The macros in this section are used for managing reference counts of Python objects. void Py_INCREF(PyObject *o) Increment the reference count for object *o*. The object must not be "NULL"; if you aren’t sure that it isn’t "NULL", use "Py_XINCREF()". void Py_XINCREF(PyObject *o) Increment the reference count for object *o*. The object may be "NULL", in which case the macro has no effect. void Py_DECREF(PyObject *o) Decrement the reference count for object *o*. The object must not be "NULL"; if you aren’t sure that it isn’t "NULL", use "Py_XDECREF()". If the reference count reaches zero, the object’s type’s deallocation function (which must not be "NULL") is invoked. Warning: The deallocation function can cause arbitrary Python code to be invoked (e.g. when a class instance with a "__del__()" method is deallocated). While exceptions in such code are not propagated, the executed code has free access to all Python global variables. This means that any object that is reachable from a global variable should be in a consistent state before "Py_DECREF()" is invoked. For example, code to delete an object from a list should copy a reference to the deleted object in a temporary variable, update the list data structure, and then call "Py_DECREF()" for the temporary variable. void Py_XDECREF(PyObject *o) Decrement the reference count for object *o*. The object may be "NULL", in which case the macro has no effect; otherwise the effect is the same as for "Py_DECREF()", and the same warning applies. void Py_CLEAR(PyObject *o) Decrement the reference count for object *o*. The object may be "NULL", in which case the macro has no effect; otherwise the effect is the same as for "Py_DECREF()", except that the argument is also set to "NULL". The warning for "Py_DECREF()" does not apply with respect to the object passed because the macro carefully uses a temporary variable and sets the argument to "NULL" before decrementing its reference count. It is a good idea to use this macro whenever decrementing the reference count of an object that might be traversed during garbage collection. The following functions are for runtime dynamic embedding of Python: "Py_IncRef(PyObject *o)", "Py_DecRef(PyObject *o)". They are simply exported function versions of "Py_XINCREF()" and "Py_XDECREF()", respectively. The following functions or macros are only for use within the interpreter core: "_Py_Dealloc()", "_Py_ForgetReference()", "_Py_NewReference()", as well as the global variable "_Py_RefTotal". Reflection ********** PyObject* PyEval_GetBuiltins(void) *Return value: Borrowed reference.* Return a dictionary of the builtins in the current execution frame, or the interpreter of the thread state if no frame is currently executing. PyObject* PyEval_GetLocals(void) *Return value: Borrowed reference.* Return a dictionary of the local variables in the current execution frame, or "NULL" if no frame is currently executing. PyObject* PyEval_GetGlobals(void) *Return value: Borrowed reference.* Return a dictionary of the global variables in the current execution frame, or "NULL" if no frame is currently executing. PyFrameObject* PyEval_GetFrame(void) *Return value: Borrowed reference.* Return the current thread state’s frame, which is "NULL" if no frame is currently executing. See also "PyThreadState_GetFrame()". PyFrameObject* PyFrame_GetBack(PyFrameObject *frame) Get the *frame* next outer frame. Return a strong reference, or "NULL" if *frame* has no outer frame. *frame* must not be "NULL". New in version 3.9. PyCodeObject* PyFrame_GetCode(PyFrameObject *frame) Get the *frame* code. Return a strong reference. *frame* must not be "NULL". The result (frame code) cannot be "NULL". New in version 3.9. int PyFrame_GetLineNumber(PyFrameObject *frame) Return the line number that *frame* is currently executing. *frame* must not be "NULL". const char* PyEval_GetFuncName(PyObject *func) Return the name of *func* if it is a function, class or instance object, else the name of *func*s type. const char* PyEval_GetFuncDesc(PyObject *func) Return a description string, depending on the type of *func*. Return values include “()” for functions and methods, ” constructor”, ” instance”, and ” object”. Concatenated with the result of "PyEval_GetFuncName()", the result will be a description of *func*. Sequence Protocol ***************** int PySequence_Check(PyObject *o) Return "1" if the object provides the sequence protocol, and "0" otherwise. Note that it returns "1" for Python classes with a "__getitem__()" method, unless they are "dict" subclasses, since in general it is impossible to determine what type of keys the class supports. This function always succeeds. Py_ssize_t PySequence_Size(PyObject *o) Py_ssize_t PySequence_Length(PyObject *o) Returns the number of objects in sequence *o* on success, and "-1" on failure. This is equivalent to the Python expression "len(o)". PyObject* PySequence_Concat(PyObject *o1, PyObject *o2) *Return value: New reference.* Return the concatenation of *o1* and *o2* on success, and "NULL" on failure. This is the equivalent of the Python expression "o1 + o2". PyObject* PySequence_Repeat(PyObject *o, Py_ssize_t count) *Return value: New reference.* Return the result of repeating sequence object *o* *count* times, or "NULL" on failure. This is the equivalent of the Python expression "o * count". PyObject* PySequence_InPlaceConcat(PyObject *o1, PyObject *o2) *Return value: New reference.* Return the concatenation of *o1* and *o2* on success, and "NULL" on failure. The operation is done *in-place* when *o1* supports it. This is the equivalent of the Python expression "o1 += o2". PyObject* PySequence_InPlaceRepeat(PyObject *o, Py_ssize_t count) *Return value: New reference.* Return the result of repeating sequence object *o* *count* times, or "NULL" on failure. The operation is done *in-place* when *o* supports it. This is the equivalent of the Python expression "o *= count". PyObject* PySequence_GetItem(PyObject *o, Py_ssize_t i) *Return value: New reference.* Return the *i*th element of *o*, or "NULL" on failure. This is the equivalent of the Python expression "o[i]". PyObject* PySequence_GetSlice(PyObject *o, Py_ssize_t i1, Py_ssize_t i2) *Return value: New reference.* Return the slice of sequence object *o* between *i1* and *i2*, or "NULL" on failure. This is the equivalent of the Python expression "o[i1:i2]". int PySequence_SetItem(PyObject *o, Py_ssize_t i, PyObject *v) Assign object *v* to the *i*th element of *o*. Raise an exception and return "-1" on failure; return "0" on success. This is the equivalent of the Python statement "o[i] = v". This function *does not* steal a reference to *v*. If *v* is "NULL", the element is deleted, but this feature is deprecated in favour of using "PySequence_DelItem()". int PySequence_DelItem(PyObject *o, Py_ssize_t i) Delete the *i*th element of object *o*. Returns "-1" on failure. This is the equivalent of the Python statement "del o[i]". int PySequence_SetSlice(PyObject *o, Py_ssize_t i1, Py_ssize_t i2, PyObject *v) Assign the sequence object *v* to the slice in sequence object *o* from *i1* to *i2*. This is the equivalent of the Python statement "o[i1:i2] = v". int PySequence_DelSlice(PyObject *o, Py_ssize_t i1, Py_ssize_t i2) Delete the slice in sequence object *o* from *i1* to *i2*. Returns "-1" on failure. This is the equivalent of the Python statement "del o[i1:i2]". Py_ssize_t PySequence_Count(PyObject *o, PyObject *value) Return the number of occurrences of *value* in *o*, that is, return the number of keys for which "o[key] == value". On failure, return "-1". This is equivalent to the Python expression "o.count(value)". int PySequence_Contains(PyObject *o, PyObject *value) Determine if *o* contains *value*. If an item in *o* is equal to *value*, return "1", otherwise return "0". On error, return "-1". This is equivalent to the Python expression "value in o". Py_ssize_t PySequence_Index(PyObject *o, PyObject *value) Return the first index *i* for which "o[i] == value". On error, return "-1". This is equivalent to the Python expression "o.index(value)". PyObject* PySequence_List(PyObject *o) *Return value: New reference.* Return a list object with the same contents as the sequence or iterable *o*, or "NULL" on failure. The returned list is guaranteed to be new. This is equivalent to the Python expression "list(o)". PyObject* PySequence_Tuple(PyObject *o) *Return value: New reference.* Return a tuple object with the same contents as the sequence or iterable *o*, or "NULL" on failure. If *o* is a tuple, a new reference will be returned, otherwise a tuple will be constructed with the appropriate contents. This is equivalent to the Python expression "tuple(o)". PyObject* PySequence_Fast(PyObject *o, const char *m) *Return value: New reference.* Return the sequence or iterable *o* as an object usable by the other "PySequence_Fast*" family of functions. If the object is not a sequence or iterable, raises "TypeError" with *m* as the message text. Returns "NULL" on failure. The "PySequence_Fast*" functions are thus named because they assume *o* is a "PyTupleObject" or a "PyListObject" and access the data fields of *o* directly. As a CPython implementation detail, if *o* is already a sequence or list, it will be returned. Py_ssize_t PySequence_Fast_GET_SIZE(PyObject *o) Returns the length of *o*, assuming that *o* was returned by "PySequence_Fast()" and that *o* is not "NULL". The size can also be retrieved by calling "PySequence_Size()" on *o*, but "PySequence_Fast_GET_SIZE()" is faster because it can assume *o* is a list or tuple. PyObject* PySequence_Fast_GET_ITEM(PyObject *o, Py_ssize_t i) *Return value: Borrowed reference.* Return the *i*th element of *o*, assuming that *o* was returned by "PySequence_Fast()", *o* is not "NULL", and that *i* is within bounds. PyObject** PySequence_Fast_ITEMS(PyObject *o) Return the underlying array of PyObject pointers. Assumes that *o* was returned by "PySequence_Fast()" and *o* is not "NULL". Note, if a list gets resized, the reallocation may relocate the items array. So, only use the underlying array pointer in contexts where the sequence cannot change. PyObject* PySequence_ITEM(PyObject *o, Py_ssize_t i) *Return value: New reference.* Return the *i*th element of *o* or "NULL" on failure. Faster form of "PySequence_GetItem()" but without checking that "PySequence_Check()" on *o* is true and without adjustment for negative indices. Set Objects *********** This section details the public API for "set" and "frozenset" objects. Any functionality not listed below is best accessed using either the abstract object protocol (including "PyObject_CallMethod()", "PyObject_RichCompareBool()", "PyObject_Hash()", "PyObject_Repr()", "PyObject_IsTrue()", "PyObject_Print()", and "PyObject_GetIter()") or the abstract number protocol (including "PyNumber_And()", "PyNumber_Subtract()", "PyNumber_Or()", "PyNumber_Xor()", "PyNumber_InPlaceAnd()", "PyNumber_InPlaceSubtract()", "PyNumber_InPlaceOr()", and "PyNumber_InPlaceXor()"). PySetObject This subtype of "PyObject" is used to hold the internal data for both "set" and "frozenset" objects. It is like a "PyDictObject" in that it is a fixed size for small sets (much like tuple storage) and will point to a separate, variable sized block of memory for medium and large sized sets (much like list storage). None of the fields of this structure should be considered public and all are subject to change. All access should be done through the documented API rather than by manipulating the values in the structure. PyTypeObject PySet_Type This is an instance of "PyTypeObject" representing the Python "set" type. PyTypeObject PyFrozenSet_Type This is an instance of "PyTypeObject" representing the Python "frozenset" type. The following type check macros work on pointers to any Python object. Likewise, the constructor functions work with any iterable Python object. int PySet_Check(PyObject *p) Return true if *p* is a "set" object or an instance of a subtype. This function always succeeds. int PyFrozenSet_Check(PyObject *p) Return true if *p* is a "frozenset" object or an instance of a subtype. This function always succeeds. int PyAnySet_Check(PyObject *p) Return true if *p* is a "set" object, a "frozenset" object, or an instance of a subtype. This function always succeeds. int PyAnySet_CheckExact(PyObject *p) Return true if *p* is a "set" object or a "frozenset" object but not an instance of a subtype. This function always succeeds. int PyFrozenSet_CheckExact(PyObject *p) Return true if *p* is a "frozenset" object but not an instance of a subtype. This function always succeeds. PyObject* PySet_New(PyObject *iterable) *Return value: New reference.* Return a new "set" containing objects returned by the *iterable*. The *iterable* may be "NULL" to create a new empty set. Return the new set on success or "NULL" on failure. Raise "TypeError" if *iterable* is not actually iterable. The constructor is also useful for copying a set ("c=set(s)"). PyObject* PyFrozenSet_New(PyObject *iterable) *Return value: New reference.* Return a new "frozenset" containing objects returned by the *iterable*. The *iterable* may be "NULL" to create a new empty frozenset. Return the new set on success or "NULL" on failure. Raise "TypeError" if *iterable* is not actually iterable. The following functions and macros are available for instances of "set" or "frozenset" or instances of their subtypes. Py_ssize_t PySet_Size(PyObject *anyset) Return the length of a "set" or "frozenset" object. Equivalent to "len(anyset)". Raises a "PyExc_SystemError" if *anyset* is not a "set", "frozenset", or an instance of a subtype. Py_ssize_t PySet_GET_SIZE(PyObject *anyset) Macro form of "PySet_Size()" without error checking. int PySet_Contains(PyObject *anyset, PyObject *key) Return "1" if found, "0" if not found, and "-1" if an error is encountered. Unlike the Python "__contains__()" method, this function does not automatically convert unhashable sets into temporary frozensets. Raise a "TypeError" if the *key* is unhashable. Raise "PyExc_SystemError" if *anyset* is not a "set", "frozenset", or an instance of a subtype. int PySet_Add(PyObject *set, PyObject *key) Add *key* to a "set" instance. Also works with "frozenset" instances (like "PyTuple_SetItem()" it can be used to fill in the values of brand new frozensets before they are exposed to other code). Return "0" on success or "-1" on failure. Raise a "TypeError" if the *key* is unhashable. Raise a "MemoryError" if there is no room to grow. Raise a "SystemError" if *set* is not an instance of "set" or its subtype. The following functions are available for instances of "set" or its subtypes but not for instances of "frozenset" or its subtypes. int PySet_Discard(PyObject *set, PyObject *key) Return "1" if found and removed, "0" if not found (no action taken), and "-1" if an error is encountered. Does not raise "KeyError" for missing keys. Raise a "TypeError" if the *key* is unhashable. Unlike the Python "discard()" method, this function does not automatically convert unhashable sets into temporary frozensets. Raise "PyExc_SystemError" if *set* is not an instance of "set" or its subtype. PyObject* PySet_Pop(PyObject *set) *Return value: New reference.* Return a new reference to an arbitrary object in the *set*, and removes the object from the *set*. Return "NULL" on failure. Raise "KeyError" if the set is empty. Raise a "SystemError" if *set* is not an instance of "set" or its subtype. int PySet_Clear(PyObject *set) Empty an existing set of all elements. Slice Objects ************* PyTypeObject PySlice_Type The type object for slice objects. This is the same as "slice" in the Python layer. int PySlice_Check(PyObject *ob) Return true if *ob* is a slice object; *ob* must not be "NULL". This function always succeeds. PyObject* PySlice_New(PyObject *start, PyObject *stop, PyObject *step) *Return value: New reference.* Return a new slice object with the given values. The *start*, *stop*, and *step* parameters are used as the values of the slice object attributes of the same names. Any of the values may be "NULL", in which case the "None" will be used for the corresponding attribute. Return "NULL" if the new object could not be allocated. int PySlice_GetIndices(PyObject *slice, Py_ssize_t length, Py_ssize_t *start, Py_ssize_t *stop, Py_ssize_t *step) Retrieve the start, stop and step indices from the slice object *slice*, assuming a sequence of length *length*. Treats indices greater than *length* as errors. Returns "0" on success and "-1" on error with no exception set (unless one of the indices was not "None" and failed to be converted to an integer, in which case "-1" is returned with an exception set). You probably do not want to use this function. Changed in version 3.2: The parameter type for the *slice* parameter was "PySliceObject*" before. int PySlice_GetIndicesEx(PyObject *slice, Py_ssize_t length, Py_ssize_t *start, Py_ssize_t *stop, Py_ssize_t *step, Py_ssize_t *slicelength) Usable replacement for "PySlice_GetIndices()". Retrieve the start, stop, and step indices from the slice object *slice* assuming a sequence of length *length*, and store the length of the slice in *slicelength*. Out of bounds indices are clipped in a manner consistent with the handling of normal slices. Returns "0" on success and "-1" on error with exception set. Note: This function is considered not safe for resizable sequences. Its invocation should be replaced by a combination of "PySlice_Unpack()" and "PySlice_AdjustIndices()" where if (PySlice_GetIndicesEx(slice, length, &start, &stop, &step, &slicelength) < 0) { // return error } is replaced by if (PySlice_Unpack(slice, &start, &stop, &step) < 0) { // return error } slicelength = PySlice_AdjustIndices(length, &start, &stop, step); Changed in version 3.2: The parameter type for the *slice* parameter was "PySliceObject*" before. Changed in version 3.6.1: If "Py_LIMITED_API" is not set or set to the value between "0x03050400" and "0x03060000" (not including) or "0x03060100" or higher "PySlice_GetIndicesEx()" is implemented as a macro using "PySlice_Unpack()" and "PySlice_AdjustIndices()". Arguments *start*, *stop* and *step* are evaluated more than once. Deprecated since version 3.6.1: If "Py_LIMITED_API" is set to the value less than "0x03050400" or between "0x03060000" and "0x03060100" (not including) "PySlice_GetIndicesEx()" is a deprecated function. int PySlice_Unpack(PyObject *slice, Py_ssize_t *start, Py_ssize_t *stop, Py_ssize_t *step) Extract the start, stop and step data members from a slice object as C integers. Silently reduce values larger than "PY_SSIZE_T_MAX" to "PY_SSIZE_T_MAX", silently boost the start and stop values less than "PY_SSIZE_T_MIN" to "PY_SSIZE_T_MIN", and silently boost the step values less than "-PY_SSIZE_T_MAX" to "-PY_SSIZE_T_MAX". Return "-1" on error, "0" on success. New in version 3.6.1. Py_ssize_t PySlice_AdjustIndices(Py_ssize_t length, Py_ssize_t *start, Py_ssize_t *stop, Py_ssize_t step) Adjust start/end slice indices assuming a sequence of the specified length. Out of bounds indices are clipped in a manner consistent with the handling of normal slices. Return the length of the slice. Always successful. Doesn’t call Python code. New in version 3.6.1. Ellipsis Object *************** PyObject *Py_Ellipsis The Python "Ellipsis" object. This object has no methods. It needs to be treated just like any other object with respect to reference counts. Like "Py_None" it is a singleton object. Stable Application Binary Interface *********************************** Traditionally, the C API of Python will change with every release. Most changes will be source-compatible, typically by only adding API, rather than changing existing API or removing API (although some interfaces do get removed after being deprecated first). Unfortunately, the API compatibility does not extend to binary compatibility (the ABI). The reason is primarily the evolution of struct definitions, where addition of a new field, or changing the type of a field, might not break the API, but can break the ABI. As a consequence, extension modules need to be recompiled for every Python release (although an exception is possible on Unix when none of the affected interfaces are used). In addition, on Windows, extension modules link with a specific pythonXY.dll and need to be recompiled to link with a newer one. Since Python 3.2, a subset of the API has been declared to guarantee a stable ABI. Extension modules wishing to use this API (called “limited API”) need to define "Py_LIMITED_API". A number of interpreter details then become hidden from the extension module; in return, a module is built that works on any 3.x version (x>=2) without recompilation. In some cases, the stable ABI needs to be extended with new functions. Extension modules wishing to use these new APIs need to set "Py_LIMITED_API" to the "PY_VERSION_HEX" value (see API and ABI Versioning) of the minimum Python version they want to support (e.g. "0x03030000" for Python 3.3). Such modules will work on all subsequent Python releases, but fail to load (because of missing symbols) on the older releases. As of Python 3.2, the set of functions available to the limited API is documented in **PEP 384**. In the C API documentation, API elements that are not part of the limited API are marked as “Not part of the limited API.” Common Object Structures ************************ There are a large number of structures which are used in the definition of object types for Python. This section describes these structures and how they are used. Base object types and macros ============================ All Python objects ultimately share a small number of fields at the beginning of the object’s representation in memory. These are represented by the "PyObject" and "PyVarObject" types, which are defined, in turn, by the expansions of some macros also used, whether directly or indirectly, in the definition of all other Python objects. PyObject All object types are extensions of this type. This is a type which contains the information Python needs to treat a pointer to an object as an object. In a normal “release” build, it contains only the object’s reference count and a pointer to the corresponding type object. Nothing is actually declared to be a "PyObject", but every pointer to a Python object can be cast to a "PyObject*". Access to the members must be done by using the macros "Py_REFCNT" and "Py_TYPE". PyVarObject This is an extension of "PyObject" that adds the "ob_size" field. This is only used for objects that have some notion of *length*. This type does not often appear in the Python/C API. Access to the members must be done by using the macros "Py_REFCNT", "Py_TYPE", and "Py_SIZE". PyObject_HEAD This is a macro used when declaring new types which represent objects without a varying length. The PyObject_HEAD macro expands to: PyObject ob_base; See documentation of "PyObject" above. PyObject_VAR_HEAD This is a macro used when declaring new types which represent objects with a length that varies from instance to instance. The PyObject_VAR_HEAD macro expands to: PyVarObject ob_base; See documentation of "PyVarObject" above. Py_TYPE(o) This macro is used to access the "ob_type" member of a Python object. It expands to: (((PyObject*)(o))->ob_type) int Py_IS_TYPE(PyObject *o, PyTypeObject *type) Return non-zero if the object *o* type is *type*. Return zero otherwise. Equivalent to: "Py_TYPE(o) == type". New in version 3.9. void Py_SET_TYPE(PyObject *o, PyTypeObject *type) Set the object *o* type to *type*. New in version 3.9. Py_REFCNT(o) This macro is used to access the "ob_refcnt" member of a Python object. It expands to: (((PyObject*)(o))->ob_refcnt) void Py_SET_REFCNT(PyObject *o, Py_ssize_t refcnt) Set the object *o* reference counter to *refcnt*. New in version 3.9. Py_SIZE(o) This macro is used to access the "ob_size" member of a Python object. It expands to: (((PyVarObject*)(o))->ob_size) void Py_SET_SIZE(PyVarObject *o, Py_ssize_t size) Set the object *o* size to *size*. New in version 3.9. PyObject_HEAD_INIT(type) This is a macro which expands to initialization values for a new "PyObject" type. This macro expands to: _PyObject_EXTRA_INIT 1, type, PyVarObject_HEAD_INIT(type, size) This is a macro which expands to initialization values for a new "PyVarObject" type, including the "ob_size" field. This macro expands to: _PyObject_EXTRA_INIT 1, type, size, Implementing functions and methods ================================== PyCFunction Type of the functions used to implement most Python callables in C. Functions of this type take two "PyObject*" parameters and return one such value. If the return value is "NULL", an exception shall have been set. If not "NULL", the return value is interpreted as the return value of the function as exposed in Python. The function must return a new reference. The function signature is: PyObject *PyCFunction(PyObject *self, PyObject *args); PyCFunctionWithKeywords Type of the functions used to implement Python callables in C with signature "METH_VARARGS | METH_KEYWORDS". The function signature is: PyObject *PyCFunctionWithKeywords(PyObject *self, PyObject *args, PyObject *kwargs); _PyCFunctionFast Type of the functions used to implement Python callables in C with signature "METH_FASTCALL". The function signature is: PyObject *_PyCFunctionFast(PyObject *self, PyObject *const *args, Py_ssize_t nargs); _PyCFunctionFastWithKeywords Type of the functions used to implement Python callables in C with signature "METH_FASTCALL | METH_KEYWORDS". The function signature is: PyObject *_PyCFunctionFastWithKeywords(PyObject *self, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames); PyCMethod Type of the functions used to implement Python callables in C with signature "METH_METHOD | METH_FASTCALL | METH_KEYWORDS". The function signature is: PyObject *PyCMethod(PyObject *self, PyTypeObject *defining_class, PyObject *const *args, Py_ssize_t nargs, PyObject *kwnames) New in version 3.9. PyMethodDef Structure used to describe a method of an extension type. This structure has four fields: +--------------------+-----------------+---------------------------------+ | Field | C Type | Meaning | |====================|=================|=================================| | "ml_name" | const char * | name of the method | +--------------------+-----------------+---------------------------------+ | "ml_meth" | PyCFunction | pointer to the C implementation | +--------------------+-----------------+---------------------------------+ | "ml_flags" | int | flag bits indicating how the | | | | call should be constructed | +--------------------+-----------------+---------------------------------+ | "ml_doc" | const char * | points to the contents of the | | | | docstring | +--------------------+-----------------+---------------------------------+ The "ml_meth" is a C function pointer. The functions may be of different types, but they always return "PyObject*". If the function is not of the "PyCFunction", the compiler will require a cast in the method table. Even though "PyCFunction" defines the first parameter as "PyObject*", it is common that the method implementation uses the specific C type of the *self* object. The "ml_flags" field is a bitfield which can include the following flags. The individual flags indicate either a calling convention or a binding convention. There are these calling conventions: METH_VARARGS This is the typical calling convention, where the methods have the type "PyCFunction". The function expects two "PyObject*" values. The first one is the *self* object for methods; for module functions, it is the module object. The second parameter (often called *args*) is a tuple object representing all arguments. This parameter is typically processed using "PyArg_ParseTuple()" or "PyArg_UnpackTuple()". METH_VARARGS | METH_KEYWORDS Methods with these flags must be of type "PyCFunctionWithKeywords". The function expects three parameters: *self*, *args*, *kwargs* where *kwargs* is a dictionary of all the keyword arguments or possibly "NULL" if there are no keyword arguments. The parameters are typically processed using "PyArg_ParseTupleAndKeywords()". METH_FASTCALL Fast calling convention supporting only positional arguments. The methods have the type "_PyCFunctionFast". The first parameter is *self*, the second parameter is a C array of "PyObject*" values indicating the arguments and the third parameter is the number of arguments (the length of the array). This is not part of the limited API. New in version 3.7. METH_FASTCALL | METH_KEYWORDS Extension of "METH_FASTCALL" supporting also keyword arguments, with methods of type "_PyCFunctionFastWithKeywords". Keyword arguments are passed the same way as in the vectorcall protocol: there is an additional fourth "PyObject*" parameter which is a tuple representing the names of the keyword arguments (which are guaranteed to be strings) or possibly "NULL" if there are no keywords. The values of the keyword arguments are stored in the *args* array, after the positional arguments. This is not part of the limited API. New in version 3.7. METH_METHOD | METH_FASTCALL | METH_KEYWORDS Extension of "METH_FASTCALL | METH_KEYWORDS" supporting the *defining class*, that is, the class that contains the method in question. The defining class might be a superclass of "Py_TYPE(self)". The method needs to be of type "PyCMethod", the same as for "METH_FASTCALL | METH_KEYWORDS" with "defining_class" argument added after "self". New in version 3.9. METH_NOARGS Methods without parameters don’t need to check whether arguments are given if they are listed with the "METH_NOARGS" flag. They need to be of type "PyCFunction". The first parameter is typically named *self* and will hold a reference to the module or object instance. In all cases the second parameter will be "NULL". METH_O Methods with a single object argument can be listed with the "METH_O" flag, instead of invoking "PyArg_ParseTuple()" with a ""O"" argument. They have the type "PyCFunction", with the *self* parameter, and a "PyObject*" parameter representing the single argument. These two constants are not used to indicate the calling convention but the binding when use with methods of classes. These may not be used for functions defined for modules. At most one of these flags may be set for any given method. METH_CLASS The method will be passed the type object as the first parameter rather than an instance of the type. This is used to create *class methods*, similar to what is created when using the "classmethod()" built-in function. METH_STATIC The method will be passed "NULL" as the first parameter rather than an instance of the type. This is used to create *static methods*, similar to what is created when using the "staticmethod()" built-in function. One other constant controls whether a method is loaded in place of another definition with the same method name. METH_COEXIST The method will be loaded in place of existing definitions. Without *METH_COEXIST*, the default is to skip repeated definitions. Since slot wrappers are loaded before the method table, the existence of a *sq_contains* slot, for example, would generate a wrapped method named "__contains__()" and preclude the loading of a corresponding PyCFunction with the same name. With the flag defined, the PyCFunction will be loaded in place of the wrapper object and will co-exist with the slot. This is helpful because calls to PyCFunctions are optimized more than wrapper object calls. Accessing attributes of extension types ======================================= PyMemberDef Structure which describes an attribute of a type which corresponds to a C struct member. Its fields are: +--------------------+-----------------+---------------------------------+ | Field | C Type | Meaning | |====================|=================|=================================| | "name" | const char * | name of the member | +--------------------+-----------------+---------------------------------+ | "type" | int | the type of the member in the C | | | | struct | +--------------------+-----------------+---------------------------------+ | "offset" | Py_ssize_t | the offset in bytes that the | | | | member is located on the type’s | | | | object struct | +--------------------+-----------------+---------------------------------+ | "flags" | int | flag bits indicating if the | | | | field should be read-only or | | | | writable | +--------------------+-----------------+---------------------------------+ | "doc" | const char * | points to the contents of the | | | | docstring | +--------------------+-----------------+---------------------------------+ "type" can be one of many "T_" macros corresponding to various C types. When the member is accessed in Python, it will be converted to the equivalent Python type. +-----------------+--------------------+ | Macro name | C type | |=================|====================| | T_SHORT | short | +-----------------+--------------------+ | T_INT | int | +-----------------+--------------------+ | T_LONG | long | +-----------------+--------------------+ | T_FLOAT | float | +-----------------+--------------------+ | T_DOUBLE | double | +-----------------+--------------------+ | T_STRING | const char * | +-----------------+--------------------+ | T_OBJECT | PyObject * | +-----------------+--------------------+ | T_OBJECT_EX | PyObject * | +-----------------+--------------------+ | T_CHAR | char | +-----------------+--------------------+ | T_BYTE | char | +-----------------+--------------------+ | T_UBYTE | unsigned char | +-----------------+--------------------+ | T_UINT | unsigned int | +-----------------+--------------------+ | T_USHORT | unsigned short | +-----------------+--------------------+ | T_ULONG | unsigned long | +-----------------+--------------------+ | T_BOOL | char | +-----------------+--------------------+ | T_LONGLONG | long long | +-----------------+--------------------+ | T_ULONGLONG | unsigned long long | +-----------------+--------------------+ | T_PYSSIZET | Py_ssize_t | +-----------------+--------------------+ "T_OBJECT" and "T_OBJECT_EX" differ in that "T_OBJECT" returns "None" if the member is "NULL" and "T_OBJECT_EX" raises an "AttributeError". Try to use "T_OBJECT_EX" over "T_OBJECT" because "T_OBJECT_EX" handles use of the "del" statement on that attribute more correctly than "T_OBJECT". "flags" can be "0" for write and read access or "READONLY" for read-only access. Using "T_STRING" for "type" implies "READONLY". "T_STRING" data is interpreted as UTF-8. Only "T_OBJECT" and "T_OBJECT_EX" members can be deleted. (They are set to "NULL"). Heap allocated types (created using "PyType_FromSpec()" or similar), "PyMemberDef" may contain definitions for the special members "__dictoffset__", "__weaklistoffset__" and "__vectorcalloffset__", corresponding to "tp_dictoffset", "tp_weaklistoffset" and "tp_vectorcall_offset" in type objects. These must be defined with "T_PYSSIZET" and "READONLY", for example: static PyMemberDef spam_type_members[] = { {"__dictoffset__", T_PYSSIZET, offsetof(Spam_object, dict), READONLY}, {NULL} /* Sentinel */ }; PyObject* PyMember_GetOne(const char *obj_addr, struct PyMemberDef *m) Get an attribute belonging to the object at address *obj_addr*. The attribute is described by "PyMemberDef" *m*. Returns "NULL" on error. int PyMember_SetOne(char *obj_addr, struct PyMemberDef *m, PyObject *o) Set an attribute belonging to the object at address *obj_addr* to object *o*. The attribute to set is described by "PyMemberDef" *m*. Returns "0" if successful and a negative value on failure. PyGetSetDef Structure to define property-like access for a type. See also description of the "PyTypeObject.tp_getset" slot. +---------------+--------------------+-------------------------------------+ | Field | C Type | Meaning | |===============|====================|=====================================| | name | const char * | attribute name | +---------------+--------------------+-------------------------------------+ | get | getter | C function to get the attribute | +---------------+--------------------+-------------------------------------+ | set | setter | optional C function to set or | | | | delete the attribute, if omitted | | | | the attribute is readonly | +---------------+--------------------+-------------------------------------+ | doc | const char * | optional docstring | +---------------+--------------------+-------------------------------------+ | closure | void * | optional function pointer, | | | | providing additional data for | | | | getter and setter | +---------------+--------------------+-------------------------------------+ The "get" function takes one "PyObject*" parameter (the instance) and a function pointer (the associated "closure"): typedef PyObject *(*getter)(PyObject *, void *); It should return a new reference on success or "NULL" with a set exception on failure. "set" functions take two "PyObject*" parameters (the instance and the value to be set) and a function pointer (the associated "closure"): typedef int (*setter)(PyObject *, PyObject *, void *); In case the attribute should be deleted the second parameter is "NULL". Should return "0" on success or "-1" with a set exception on failure. Operating System Utilities ************************** PyObject* PyOS_FSPath(PyObject *path) *Return value: New reference.* Return the file system representation for *path*. If the object is a "str" or "bytes" object, then its reference count is incremented. If the object implements the "os.PathLike" interface, then "__fspath__()" is returned as long as it is a "str" or "bytes" object. Otherwise "TypeError" is raised and "NULL" is returned. New in version 3.6. int Py_FdIsInteractive(FILE *fp, const char *filename) Return true (nonzero) if the standard I/O file *fp* with name *filename* is deemed interactive. This is the case for files for which "isatty(fileno(fp))" is true. If the global flag "Py_InteractiveFlag" is true, this function also returns true if the *filename* pointer is "NULL" or if the name is equal to one of the strings "''" or "'???'". void PyOS_BeforeFork() Function to prepare some internal state before a process fork. This should be called before calling "fork()" or any similar function that clones the current process. Only available on systems where "fork()" is defined. Warning: The C "fork()" call should only be made from the “main” thread (of the “main” interpreter). The same is true for "PyOS_BeforeFork()". New in version 3.7. void PyOS_AfterFork_Parent() Function to update some internal state after a process fork. This should be called from the parent process after calling "fork()" or any similar function that clones the current process, regardless of whether process cloning was successful. Only available on systems where "fork()" is defined. Warning: The C "fork()" call should only be made from the “main” thread (of the “main” interpreter). The same is true for "PyOS_AfterFork_Parent()". New in version 3.7. void PyOS_AfterFork_Child() Function to update internal interpreter state after a process fork. This must be called from the child process after calling "fork()", or any similar function that clones the current process, if there is any chance the process will call back into the Python interpreter. Only available on systems where "fork()" is defined. Warning: The C "fork()" call should only be made from the “main” thread (of the “main” interpreter). The same is true for "PyOS_AfterFork_Child()". New in version 3.7. See also: "os.register_at_fork()" allows registering custom Python functions to be called by "PyOS_BeforeFork()", "PyOS_AfterFork_Parent()" and "PyOS_AfterFork_Child()". void PyOS_AfterFork() Function to update some internal state after a process fork; this should be called in the new process if the Python interpreter will continue to be used. If a new executable is loaded into the new process, this function does not need to be called. Deprecated since version 3.7: This function is superseded by "PyOS_AfterFork_Child()". int PyOS_CheckStack() Return true when the interpreter runs out of stack space. This is a reliable check, but is only available when "USE_STACKCHECK" is defined (currently on Windows using the Microsoft Visual C++ compiler). "USE_STACKCHECK" will be defined automatically; you should never change the definition in your own code. PyOS_sighandler_t PyOS_getsig(int i) Return the current signal handler for signal *i*. This is a thin wrapper around either "sigaction()" or "signal()". Do not call those functions directly! "PyOS_sighandler_t" is a typedef alias for "void (*)(int)". PyOS_sighandler_t PyOS_setsig(int i, PyOS_sighandler_t h) Set the signal handler for signal *i* to be *h*; return the old signal handler. This is a thin wrapper around either "sigaction()" or "signal()". Do not call those functions directly! "PyOS_sighandler_t" is a typedef alias for "void (*)(int)". wchar_t* Py_DecodeLocale(const char* arg, size_t *size) Decode a byte string from the locale encoding with the surrogateescape error handler: undecodable bytes are decoded as characters in range U+DC80..U+DCFF. If a byte sequence can be decoded as a surrogate character, escape the bytes using the surrogateescape error handler instead of decoding them. Encoding, highest priority to lowest priority: * "UTF-8" on macOS, Android, and VxWorks; * "UTF-8" on Windows if "Py_LegacyWindowsFSEncodingFlag" is zero; * "UTF-8" if the Python UTF-8 mode is enabled; * "ASCII" if the "LC_CTYPE" locale is ""C"", "nl_langinfo(CODESET)" returns the "ASCII" encoding (or an alias), and "mbstowcs()" and "wcstombs()" functions uses the "ISO-8859-1" encoding. * the current locale encoding. Return a pointer to a newly allocated wide character string, use "PyMem_RawFree()" to free the memory. If size is not "NULL", write the number of wide characters excluding the null character into "*size" Return "NULL" on decoding error or memory allocation error. If *size* is not "NULL", "*size" is set to "(size_t)-1" on memory error or set to "(size_t)-2" on decoding error. Decoding errors should never happen, unless there is a bug in the C library. Use the "Py_EncodeLocale()" function to encode the character string back to a byte string. See also: The "PyUnicode_DecodeFSDefaultAndSize()" and "PyUnicode_DecodeLocaleAndSize()" functions. New in version 3.5. Changed in version 3.7: The function now uses the UTF-8 encoding in the UTF-8 mode. Changed in version 3.8: The function now uses the UTF-8 encoding on Windows if "Py_LegacyWindowsFSEncodingFlag" is zero; char* Py_EncodeLocale(const wchar_t *text, size_t *error_pos) Encode a wide character string to the locale encoding with the surrogateescape error handler: surrogate characters in the range U+DC80..U+DCFF are converted to bytes 0x80..0xFF. Encoding, highest priority to lowest priority: * "UTF-8" on macOS, Android, and VxWorks; * "UTF-8" on Windows if "Py_LegacyWindowsFSEncodingFlag" is zero; * "UTF-8" if the Python UTF-8 mode is enabled; * "ASCII" if the "LC_CTYPE" locale is ""C"", "nl_langinfo(CODESET)" returns the "ASCII" encoding (or an alias), and "mbstowcs()" and "wcstombs()" functions uses the "ISO-8859-1" encoding. * the current locale encoding. The function uses the UTF-8 encoding in the Python UTF-8 mode. Return a pointer to a newly allocated byte string, use "PyMem_Free()" to free the memory. Return "NULL" on encoding error or memory allocation error. If error_pos is not "NULL", "*error_pos" is set to "(size_t)-1" on success, or set to the index of the invalid character on encoding error. Use the "Py_DecodeLocale()" function to decode the bytes string back to a wide character string. See also: The "PyUnicode_EncodeFSDefault()" and "PyUnicode_EncodeLocale()" functions. New in version 3.5. Changed in version 3.7: The function now uses the UTF-8 encoding in the UTF-8 mode. Changed in version 3.8: The function now uses the UTF-8 encoding on Windows if "Py_LegacyWindowsFSEncodingFlag" is zero. System Functions **************** These are utility functions that make functionality from the "sys" module accessible to C code. They all work with the current interpreter thread’s "sys" module’s dict, which is contained in the internal thread state structure. PyObject *PySys_GetObject(const char *name) *Return value: Borrowed reference.* Return the object *name* from the "sys" module or "NULL" if it does not exist, without setting an exception. int PySys_SetObject(const char *name, PyObject *v) Set *name* in the "sys" module to *v* unless *v* is "NULL", in which case *name* is deleted from the sys module. Returns "0" on success, "-1" on error. void PySys_ResetWarnOptions() Reset "sys.warnoptions" to an empty list. This function may be called prior to "Py_Initialize()". void PySys_AddWarnOption(const wchar_t *s) Append *s* to "sys.warnoptions". This function must be called prior to "Py_Initialize()" in order to affect the warnings filter list. void PySys_AddWarnOptionUnicode(PyObject *unicode) Append *unicode* to "sys.warnoptions". Note: this function is not currently usable from outside the CPython implementation, as it must be called prior to the implicit import of "warnings" in "Py_Initialize()" to be effective, but can’t be called until enough of the runtime has been initialized to permit the creation of Unicode objects. void PySys_SetPath(const wchar_t *path) Set "sys.path" to a list object of paths found in *path* which should be a list of paths separated with the platform’s search path delimiter (":" on Unix, ";" on Windows). void PySys_WriteStdout(const char *format, ...) Write the output string described by *format* to "sys.stdout". No exceptions are raised, even if truncation occurs (see below). *format* should limit the total size of the formatted output string to 1000 bytes or less – after 1000 bytes, the output string is truncated. In particular, this means that no unrestricted “%s” formats should occur; these should be limited using “%.s” where is a decimal number calculated so that plus the maximum size of other formatted text does not exceed 1000 bytes. Also watch out for “%f”, which can print hundreds of digits for very large numbers. If a problem occurs, or "sys.stdout" is unset, the formatted message is written to the real (C level) *stdout*. void PySys_WriteStderr(const char *format, ...) As "PySys_WriteStdout()", but write to "sys.stderr" or *stderr* instead. void PySys_FormatStdout(const char *format, ...) Function similar to PySys_WriteStdout() but format the message using "PyUnicode_FromFormatV()" and don’t truncate the message to an arbitrary length. New in version 3.2. void PySys_FormatStderr(const char *format, ...) As "PySys_FormatStdout()", but write to "sys.stderr" or *stderr* instead. New in version 3.2. void PySys_AddXOption(const wchar_t *s) Parse *s* as a set of "-X" options and add them to the current options mapping as returned by "PySys_GetXOptions()". This function may be called prior to "Py_Initialize()". New in version 3.2. PyObject *PySys_GetXOptions() *Return value: Borrowed reference.* Return the current dictionary of "-X" options, similarly to "sys._xoptions". On error, "NULL" is returned and an exception is set. New in version 3.2. int PySys_Audit(const char *event, const char *format, ...) Raise an auditing event with any active hooks. Return zero for success and non-zero with an exception set on failure. If any hooks have been added, *format* and other arguments will be used to construct a tuple to pass. Apart from "N", the same format characters as used in "Py_BuildValue()" are available. If the built value is not a tuple, it will be added into a single-element tuple. (The "N" format option consumes a reference, but since there is no way to know whether arguments to this function will be consumed, using it may cause reference leaks.) Note that "#" format characters should always be treated as "Py_ssize_t", regardless of whether "PY_SSIZE_T_CLEAN" was defined. "sys.audit()" performs the same function from Python code. New in version 3.8. Changed in version 3.8.2: Require "Py_ssize_t" for "#" format characters. Previously, an unavoidable deprecation warning was raised. int PySys_AddAuditHook(Py_AuditHookFunction hook, void *userData) Append the callable *hook* to the list of active auditing hooks. Return zero on success and non-zero on failure. If the runtime has been initialized, also set an error on failure. Hooks added through this API are called for all interpreters created by the runtime. The *userData* pointer is passed into the hook function. Since hook functions may be called from different runtimes, this pointer should not refer directly to Python state. This function is safe to call before "Py_Initialize()". When called after runtime initialization, existing audit hooks are notified and may silently abort the operation by raising an error subclassed from "Exception" (other errors will not be silenced). The hook function is of type "int (*)(const char *event, PyObject *args, void *userData)", where *args* is guaranteed to be a "PyTupleObject". The hook function is always called with the GIL held by the Python interpreter that raised the event. See **PEP 578** for a detailed description of auditing. Functions in the runtime and standard library that raise events are listed in the audit events table. Details are in each function’s documentation. If the interpreter is initialized, this function raises a auditing event "sys.addaudithook" with no arguments. If any existing hooks raise an exception derived from "Exception", the new hook will not be added and the exception is cleared. As a result, callers cannot assume that their hook has been added unless they control all existing hooks. New in version 3.8. Process Control *************** void Py_FatalError(const char *message) Print a fatal error message and kill the process. No cleanup is performed. This function should only be invoked when a condition is detected that would make it dangerous to continue using the Python interpreter; e.g., when the object administration appears to be corrupted. On Unix, the standard C library function "abort()" is called which will attempt to produce a "core" file. The "Py_FatalError()" function is replaced with a macro which logs automatically the name of the current function, unless the "Py_LIMITED_API" macro is defined. Changed in version 3.9: Log the function name automatically. void Py_Exit(int status) Exit the current process. This calls "Py_FinalizeEx()" and then calls the standard C library function "exit(status)". If "Py_FinalizeEx()" indicates an error, the exit status is set to 120. Changed in version 3.6: Errors from finalization no longer ignored. int Py_AtExit(void (*func)()) Register a cleanup function to be called by "Py_FinalizeEx()". The cleanup function will be called with no arguments and should return no value. At most 32 cleanup functions can be registered. When the registration is successful, "Py_AtExit()" returns "0"; on failure, it returns "-1". The cleanup function registered last is called first. Each cleanup function will be called at most once. Since Python’s internal finalization will have completed before the cleanup function, no Python APIs should be called by *func*. Tuple Objects ************* PyTupleObject This subtype of "PyObject" represents a Python tuple object. PyTypeObject PyTuple_Type This instance of "PyTypeObject" represents the Python tuple type; it is the same object as "tuple" in the Python layer. int PyTuple_Check(PyObject *p) Return true if *p* is a tuple object or an instance of a subtype of the tuple type. This function always succeeds. int PyTuple_CheckExact(PyObject *p) Return true if *p* is a tuple object, but not an instance of a subtype of the tuple type. This function always succeeds. PyObject* PyTuple_New(Py_ssize_t len) *Return value: New reference.* Return a new tuple object of size *len*, or "NULL" on failure. PyObject* PyTuple_Pack(Py_ssize_t n, ...) *Return value: New reference.* Return a new tuple object of size *n*, or "NULL" on failure. The tuple values are initialized to the subsequent *n* C arguments pointing to Python objects. "PyTuple_Pack(2, a, b)" is equivalent to "Py_BuildValue("(OO)", a, b)". Py_ssize_t PyTuple_Size(PyObject *p) Take a pointer to a tuple object, and return the size of that tuple. Py_ssize_t PyTuple_GET_SIZE(PyObject *p) Return the size of the tuple *p*, which must be non-"NULL" and point to a tuple; no error checking is performed. PyObject* PyTuple_GetItem(PyObject *p, Py_ssize_t pos) *Return value: Borrowed reference.* Return the object at position *pos* in the tuple pointed to by *p*. If *pos* is out of bounds, return "NULL" and set an "IndexError" exception. PyObject* PyTuple_GET_ITEM(PyObject *p, Py_ssize_t pos) *Return value: Borrowed reference.* Like "PyTuple_GetItem()", but does no checking of its arguments. PyObject* PyTuple_GetSlice(PyObject *p, Py_ssize_t low, Py_ssize_t high) *Return value: New reference.* Return the slice of the tuple pointed to by *p* between *low* and *high*, or "NULL" on failure. This is the equivalent of the Python expression "p[low:high]". Indexing from the end of the list is not supported. int PyTuple_SetItem(PyObject *p, Py_ssize_t pos, PyObject *o) Insert a reference to object *o* at position *pos* of the tuple pointed to by *p*. Return "0" on success. If *pos* is out of bounds, return "-1" and set an "IndexError" exception. Note: This function “steals” a reference to *o* and discards a reference to an item already in the tuple at the affected position. void PyTuple_SET_ITEM(PyObject *p, Py_ssize_t pos, PyObject *o) Like "PyTuple_SetItem()", but does no error checking, and should *only* be used to fill in brand new tuples. Note: This macro “steals” a reference to *o*, and, unlike "PyTuple_SetItem()", does *not* discard a reference to any item that is being replaced; any reference in the tuple at position *pos* will be leaked. int _PyTuple_Resize(PyObject **p, Py_ssize_t newsize) Can be used to resize a tuple. *newsize* will be the new length of the tuple. Because tuples are *supposed* to be immutable, this should only be used if there is only one reference to the object. Do *not* use this if the tuple may already be known to some other part of the code. The tuple will always grow or shrink at the end. Think of this as destroying the old tuple and creating a new one, only more efficiently. Returns "0" on success. Client code should never assume that the resulting value of "*p" will be the same as before calling this function. If the object referenced by "*p" is replaced, the original "*p" is destroyed. On failure, returns "-1" and sets "*p" to "NULL", and raises "MemoryError" or "SystemError". Struct Sequence Objects *********************** Struct sequence objects are the C equivalent of "namedtuple()" objects, i.e. a sequence whose items can also be accessed through attributes. To create a struct sequence, you first have to create a specific struct sequence type. PyTypeObject* PyStructSequence_NewType(PyStructSequence_Desc *desc) *Return value: New reference.* Create a new struct sequence type from the data in *desc*, described below. Instances of the resulting type can be created with "PyStructSequence_New()". void PyStructSequence_InitType(PyTypeObject *type, PyStructSequence_Desc *desc) Initializes a struct sequence type *type* from *desc* in place. int PyStructSequence_InitType2(PyTypeObject *type, PyStructSequence_Desc *desc) The same as "PyStructSequence_InitType", but returns "0" on success and "-1" on failure. New in version 3.4. PyStructSequence_Desc Contains the meta information of a struct sequence type to create. +---------------------+--------------------------------+----------------------------------------+ | Field | C Type | Meaning | |=====================|================================|========================================| | "name" | "const char *" | name of the struct sequence type | +---------------------+--------------------------------+----------------------------------------+ | "doc" | "const char *" | pointer to docstring for the type or | | | | "NULL" to omit | +---------------------+--------------------------------+----------------------------------------+ | "fields" | "PyStructSequence_Field *" | pointer to "NULL"-terminated array | | | | with field names of the new type | +---------------------+--------------------------------+----------------------------------------+ | "n_in_sequence" | "int" | number of fields visible to the Python | | | | side (if used as tuple) | +---------------------+--------------------------------+----------------------------------------+ PyStructSequence_Field Describes a field of a struct sequence. As a struct sequence is modeled as a tuple, all fields are typed as "PyObject*". The index in the "fields" array of the "PyStructSequence_Desc" determines which field of the struct sequence is described. +-------------+--------------------+-------------------------------------------+ | Field | C Type | Meaning | |=============|====================|===========================================| | "name" | "const char *" | name for the field or "NULL" to end the | | | | list of named fields, set to | | | | "PyStructSequence_UnnamedField" to leave | | | | unnamed | +-------------+--------------------+-------------------------------------------+ | "doc" | "const char *" | field docstring or "NULL" to omit | +-------------+--------------------+-------------------------------------------+ const char * const PyStructSequence_UnnamedField Special value for a field name to leave it unnamed. Changed in version 3.9: The type was changed from "char *". PyObject* PyStructSequence_New(PyTypeObject *type) *Return value: New reference.* Creates an instance of *type*, which must have been created with "PyStructSequence_NewType()". PyObject* PyStructSequence_GetItem(PyObject *p, Py_ssize_t pos) *Return value: Borrowed reference.* Return the object at position *pos* in the struct sequence pointed to by *p*. No bounds checking is performed. PyObject* PyStructSequence_GET_ITEM(PyObject *p, Py_ssize_t pos) *Return value: Borrowed reference.* Macro equivalent of "PyStructSequence_GetItem()". void PyStructSequence_SetItem(PyObject *p, Py_ssize_t pos, PyObject *o) Sets the field at index *pos* of the struct sequence *p* to value *o*. Like "PyTuple_SET_ITEM()", this should only be used to fill in brand new instances. Note: This function “steals” a reference to *o*. void PyStructSequence_SET_ITEM(PyObject *p, Py_ssize_t *pos, PyObject *o) Macro equivalent of "PyStructSequence_SetItem()". Note: This function “steals” a reference to *o*. Type Objects ************ PyTypeObject The C structure of the objects used to describe built-in types. PyTypeObject PyType_Type This is the type object for type objects; it is the same object as "type" in the Python layer. int PyType_Check(PyObject *o) Return non-zero if the object *o* is a type object, including instances of types derived from the standard type object. Return 0 in all other cases. This function always succeeds. int PyType_CheckExact(PyObject *o) Return non-zero if the object *o* is a type object, but not a subtype of the standard type object. Return 0 in all other cases. This function always succeeds. unsigned int PyType_ClearCache() Clear the internal lookup cache. Return the current version tag. unsigned long PyType_GetFlags(PyTypeObject* type) Return the "tp_flags" member of *type*. This function is primarily meant for use with *Py_LIMITED_API*; the individual flag bits are guaranteed to be stable across Python releases, but access to "tp_flags" itself is not part of the limited API. New in version 3.2. Changed in version 3.4: The return type is now "unsigned long" rather than "long". void PyType_Modified(PyTypeObject *type) Invalidate the internal lookup cache for the type and all of its subtypes. This function must be called after any manual modification of the attributes or base classes of the type. int PyType_HasFeature(PyTypeObject *o, int feature) Return non-zero if the type object *o* sets the feature *feature*. Type features are denoted by single bit flags. int PyType_IS_GC(PyTypeObject *o) Return true if the type object includes support for the cycle detector; this tests the type flag "Py_TPFLAGS_HAVE_GC". int PyType_IsSubtype(PyTypeObject *a, PyTypeObject *b) Return true if *a* is a subtype of *b*. This function only checks for actual subtypes, which means that "__subclasscheck__()" is not called on *b*. Call "PyObject_IsSubclass()" to do the same check that "issubclass()" would do. PyObject* PyType_GenericAlloc(PyTypeObject *type, Py_ssize_t nitems) *Return value: New reference.* Generic handler for the "tp_alloc" slot of a type object. Use Python’s default memory allocation mechanism to allocate a new instance and initialize all its contents to "NULL". PyObject* PyType_GenericNew(PyTypeObject *type, PyObject *args, PyObject *kwds) *Return value: New reference.* Generic handler for the "tp_new" slot of a type object. Create a new instance using the type’s "tp_alloc" slot. int PyType_Ready(PyTypeObject *type) Finalize a type object. This should be called on all type objects to finish their initialization. This function is responsible for adding inherited slots from a type’s base class. Return "0" on success, or return "-1" and sets an exception on error. Note: If some of the base classes implements the GC protocol and the provided type does not include the "Py_TPFLAGS_HAVE_GC" in its flags, then the GC protocol will be automatically implemented from its parents. On the contrary, if the type being created does include "Py_TPFLAGS_HAVE_GC" in its flags then it **must** implement the GC protocol itself by at least implementing the "tp_traverse" handle. void* PyType_GetSlot(PyTypeObject *type, int slot) Return the function pointer stored in the given slot. If the result is "NULL", this indicates that either the slot is "NULL", or that the function was called with invalid parameters. Callers will typically cast the result pointer into the appropriate function type. See "PyType_Slot.slot" for possible values of the *slot* argument. An exception is raised if *type* is not a heap type. New in version 3.4. PyObject* PyType_GetModule(PyTypeObject *type) Return the module object associated with the given type when the type was created using "PyType_FromModuleAndSpec()". If no module is associated with the given type, sets "TypeError" and returns "NULL". This function is usually used to get the module in which a method is defined. Note that in such a method, "PyType_GetModule(Py_TYPE(self))" may not return the intended result. "Py_TYPE(self)" may be a *subclass* of the intended class, and subclasses are not necessarily defined in the same module as their superclass. See "PyCMethod" to get the class that defines the method. New in version 3.9. void* PyType_GetModuleState(PyTypeObject *type) Return the state of the module object associated with the given type. This is a shortcut for calling "PyModule_GetState()" on the result of "PyType_GetModule()". If no module is associated with the given type, sets "TypeError" and returns "NULL". If the *type* has an associated module but its state is "NULL", returns "NULL" without setting an exception. New in version 3.9. Creating Heap-Allocated Types ============================= The following functions and structs are used to create heap types. PyObject* PyType_FromModuleAndSpec(PyObject *module, PyType_Spec *spec, PyObject *bases) *Return value: New reference.* Creates and returns a heap type object from the *spec* ("Py_TPFLAGS_HEAPTYPE"). If *bases* is a tuple, the created heap type contains all types contained in it as base types. If *bases* is "NULL", the *Py_tp_bases* slot is used instead. If that also is "NULL", the *Py_tp_base* slot is used instead. If that also is "NULL", the new type derives from "object". The *module* argument can be used to record the module in which the new class is defined. It must be a module object or "NULL". If not "NULL", the module is associated with the new type and can later be retrieved with "PyType_GetModule()". The associated module is not inherited by subclasses; it must be specified for each class individually. This function calls "PyType_Ready()" on the new type. New in version 3.9. PyObject* PyType_FromSpecWithBases(PyType_Spec *spec, PyObject *bases) *Return value: New reference.* Equivalent to "PyType_FromModuleAndSpec(NULL, spec, bases)". New in version 3.3. PyObject* PyType_FromSpec(PyType_Spec *spec) *Return value: New reference.* Equivalent to "PyType_FromSpecWithBases(spec, NULL)". PyType_Spec Structure defining a type’s behavior. const char* PyType_Spec.name Name of the type, used to set "PyTypeObject.tp_name". int PyType_Spec.basicsize int PyType_Spec.itemsize Size of the instance in bytes, used to set "PyTypeObject.tp_basicsize" and "PyTypeObject.tp_itemsize". int PyType_Spec.flags Type flags, used to set "PyTypeObject.tp_flags". If the "Py_TPFLAGS_HEAPTYPE" flag is not set, "PyType_FromSpecWithBases()" sets it automatically. PyType_Slot *PyType_Spec.slots Array of "PyType_Slot" structures. Terminated by the special slot value "{0, NULL}". PyType_Slot Structure defining optional functionality of a type, containing a slot ID and a value pointer. int PyType_Slot.slot A slot ID. Slot IDs are named like the field names of the structures "PyTypeObject", "PyNumberMethods", "PySequenceMethods", "PyMappingMethods" and "PyAsyncMethods" with an added "Py_" prefix. For example, use: * "Py_tp_dealloc" to set "PyTypeObject.tp_dealloc" * "Py_nb_add" to set "PyNumberMethods.nb_add" * "Py_sq_length" to set "PySequenceMethods.sq_length" The following fields cannot be set at all using "PyType_Spec" and "PyType_Slot": * "tp_dict" * "tp_mro" * "tp_cache" * "tp_subclasses" * "tp_weaklist" * "tp_vectorcall" * "tp_weaklistoffset" (see PyMemberDef) * "tp_dictoffset" (see PyMemberDef) * "tp_vectorcall_offset" (see PyMemberDef) The following fields cannot be set using "PyType_Spec" and "PyType_Slot" under the limited API: * "bf_getbuffer" * "bf_releasebuffer" Setting "Py_tp_bases" or "Py_tp_base" may be problematic on some platforms. To avoid issues, use the *bases* argument of "PyType_FromSpecWithBases()" instead. Changed in version 3.9: Slots in "PyBufferProcs" may be set in the unlimited API. void *PyType_Slot.pfunc The desired value of the slot. In most cases, this is a pointer to a function. May not be "NULL". Objects for Type Hinting ************************ Various built-in types for type hinting are provided. Only GenericAlias is exposed to C. PyObject* Py_GenericAlias(PyObject *origin, PyObject *args) Create a GenericAlias object. Equivalent to calling the Python class "types.GenericAlias". The *origin* and *args* arguments set the "GenericAlias"’s "__origin__" and "__args__" attributes respectively. *origin* should be a "PyTypeObject*", and *args* can be a "PyTupleObject*" or any "PyObject*". If *args* passed is not a tuple, a 1-tuple is automatically constructed and "__args__" is set to "(args,)". Minimal checking is done for the arguments, so the function will succeed even if *origin* is not a type. The "GenericAlias"’s "__parameters__" attribute is constructed lazily from "__args__". On failure, an exception is raised and "NULL" is returned. Here’s an example of how to make an extension type generic: ... static PyMethodDef my_obj_methods[] = { // Other methods. ... {"__class_getitem__", (PyCFunction)Py_GenericAlias, METH_O|METH_CLASS, "See PEP 585"} ... } See also: The data model method "__class_getitem__()". New in version 3.9. PyTypeObject Py_GenericAliasType The C type of the object returned by "Py_GenericAlias()". Equivalent to "types.GenericAlias" in Python. New in version 3.9. Type Objects ************ Perhaps one of the most important structures of the Python object system is the structure that defines a new type: the "PyTypeObject" structure. Type objects can be handled using any of the "PyObject_*()" or "PyType_*()" functions, but do not offer much that’s interesting to most Python applications. These objects are fundamental to how objects behave, so they are very important to the interpreter itself and to any extension module that implements new types. Type objects are fairly large compared to most of the standard types. The reason for the size is that each type object stores a large number of values, mostly C function pointers, each of which implements a small part of the type’s functionality. The fields of the type object are examined in detail in this section. The fields will be described in the order in which they occur in the structure. In addition to the following quick reference, the Examples section provides at-a-glance insight into the meaning and use of "PyTypeObject". Quick Reference =============== “tp slots” ---------- +--------------------+--------------------+--------------------+----+----+----+----+ | PyTypeObject Slot | Type | special | Info [2] | | [1] | | methods/attrs | | | | | +----+----+----+----+ | | | | O | T | D | I | | | | | | | | | |====================|====================|====================|====|====|====|====| | "tp_name" | const char * | __name__ | X | X | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_basicsize" | "Py_ssize_t" | | X | X | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_itemsize" | "Py_ssize_t" | | | X | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_dealloc" | "destructor" | | X | X | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_vectorcall_of | "Py_ssize_t" | | | X | | X | | fset" | | | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | ("tp_getattr") | "getattrfunc" | __getattribute__, | | | | G | | | | __getattr__ | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | ("tp_setattr") | "setattrfunc" | __setattr__, | | | | G | | | | __delattr__ | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_as_async" | "PyAsyncMethods" * | sub-slots | | | | % | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_repr" | "reprfunc" | __repr__ | X | X | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_as_number" | "PyNumberMethods" | sub-slots | | | | % | | | * | | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_as_sequence" | "PySequenceMethod | sub-slots | | | | % | | | s" * | | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_as_mapping" | "PyMappingMethods" | sub-slots | | | | % | | | * | | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_hash" | "hashfunc" | __hash__ | X | | | G | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_call" | "ternaryfunc" | __call__ | | X | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_str" | "reprfunc" | __str__ | X | | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_getattro" | "getattrofunc" | __getattribute__, | X | X | | G | | | | __getattr__ | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_setattro" | "setattrofunc" | __setattr__, | X | X | | G | | | | __delattr__ | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_as_buffer" | "PyBufferProcs" * | | | | | % | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_flags" | unsigned long | | X | X | | ? | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_doc" | const char * | __doc__ | X | X | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_traverse" | "traverseproc" | | | X | | G | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_clear" | "inquiry" | | | X | | G | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_richcompare" | "richcmpfunc" | __lt__, __le__, | X | | | G | | | | __eq__, __ne__, | | | | | | | | __gt__, __ge__ | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_weaklistoffse | "Py_ssize_t" | | | X | | ? | | t" | | | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_iter" | "getiterfunc" | __iter__ | | | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_iternext" | "iternextfunc" | __next__ | | | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_methods" | "PyMethodDef" [] | | X | X | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_members" | "PyMemberDef" [] | | | X | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_getset" | "PyGetSetDef" [] | | X | X | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_base" | "PyTypeObject" * | __base__ | | | X | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_dict" | "PyObject" * | __dict__ | | | ? | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_descr_get" | "descrgetfunc" | __get__ | | | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_descr_set" | "descrsetfunc" | __set__, | | | | X | | | | __delete__ | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_dictoffset" | "Py_ssize_t" | | | X | | ? | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_init" | "initproc" | __init__ | X | X | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_alloc" | "allocfunc" | | X | | ? | ? | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_new" | "newfunc" | __new__ | X | X | ? | ? | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_free" | "freefunc" | | X | X | ? | ? | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_is_gc" | "inquiry" | | | X | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | <"tp_bases"> | "PyObject" * | __bases__ | | | ~ | | +--------------------+--------------------+--------------------+----+----+----+----+ | <"tp_mro"> | "PyObject" * | __mro__ | | | ~ | | +--------------------+--------------------+--------------------+----+----+----+----+ | ["tp_cache"] | "PyObject" * | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | ["tp_subclasses"] | "PyObject" * | __subclasses__ | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | ["tp_weaklist"] | "PyObject" * | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | ("tp_del") | "destructor" | | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | ["tp_version_tag"] | unsigned int | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_finalize" | "destructor" | __del__ | | | | X | +--------------------+--------------------+--------------------+----+----+----+----+ | "tp_vectorcall" | "vectorcallfunc" | | | | | | +--------------------+--------------------+--------------------+----+----+----+----+ [1] A slot name in parentheses indicates it is (effectively) deprecated. Names in angle brackets should be treated as read- only. Names in square brackets are for internal use only. “” (as a prefix) means the field is required (must be non-"NULL"). [2] Columns: **“O”**: set on "PyBaseObject_Type" **“T”**: set on "PyType_Type" **“D”**: default (if slot is set to "NULL") X - PyType_Ready sets this value if it is NULL ~ - PyType_Ready always sets this value (it should be NULL) ? - PyType_Ready may set this value depending on other slots Also see the inheritance column ("I"). **“I”**: inheritance X - type slot is inherited via *PyType_Ready* if defined with a *NULL* value % - the slots of the sub-struct are inherited individually G - inherited, but only in combination with other slots; see the slot's description ? - it's complicated; see the slot's description Note that some slots are effectively inherited through the normal attribute lookup chain. sub-slots --------- +----------------------------+-------------------+--------------+ | Slot | Type | special | | | | methods | |============================|===================|==============| | "am_await" | "unaryfunc" | __await__ | +----------------------------+-------------------+--------------+ | "am_aiter" | "unaryfunc" | __aiter__ | +----------------------------+-------------------+--------------+ | "am_anext" | "unaryfunc" | __anext__ | +----------------------------+-------------------+--------------+ | | +----------------------------+-------------------+--------------+ | "nb_add" | "binaryfunc" | __add__ | | | | __radd__ | +----------------------------+-------------------+--------------+ | "nb_inplace_add" | "binaryfunc" | __iadd__ | +----------------------------+-------------------+--------------+ | "nb_subtract" | "binaryfunc" | __sub__ | | | | __rsub__ | +----------------------------+-------------------+--------------+ | "nb_inplace_subtract" | "binaryfunc" | __isub__ | +----------------------------+-------------------+--------------+ | "nb_multiply" | "binaryfunc" | __mul__ | | | | __rmul__ | +----------------------------+-------------------+--------------+ | "nb_inplace_multiply" | "binaryfunc" | __imul__ | +----------------------------+-------------------+--------------+ | "nb_remainder" | "binaryfunc" | __mod__ | | | | __rmod__ | +----------------------------+-------------------+--------------+ | "nb_inplace_remainder" | "binaryfunc" | __imod__ | +----------------------------+-------------------+--------------+ | "nb_divmod" | "binaryfunc" | __divmod__ | | | | __rdivmod__ | +----------------------------+-------------------+--------------+ | "nb_power" | "ternaryfunc" | __pow__ | | | | __rpow__ | +----------------------------+-------------------+--------------+ | "nb_inplace_power" | "ternaryfunc" | __ipow__ | +----------------------------+-------------------+--------------+ | "nb_negative" | "unaryfunc" | __neg__ | +----------------------------+-------------------+--------------+ | "nb_positive" | "unaryfunc" | __pos__ | +----------------------------+-------------------+--------------+ | "nb_absolute" | "unaryfunc" | __abs__ | +----------------------------+-------------------+--------------+ | "nb_bool" | "inquiry" | __bool__ | +----------------------------+-------------------+--------------+ | "nb_invert" | "unaryfunc" | __invert__ | +----------------------------+-------------------+--------------+ | "nb_lshift" | "binaryfunc" | __lshift__ | | | | __rlshift__ | +----------------------------+-------------------+--------------+ | "nb_inplace_lshift" | "binaryfunc" | __ilshift__ | +----------------------------+-------------------+--------------+ | "nb_rshift" | "binaryfunc" | __rshift__ | | | | __rrshift__ | +----------------------------+-------------------+--------------+ | "nb_inplace_rshift" | "binaryfunc" | __irshift__ | +----------------------------+-------------------+--------------+ | "nb_and" | "binaryfunc" | __and__ | | | | __rand__ | +----------------------------+-------------------+--------------+ | "nb_inplace_and" | "binaryfunc" | __iand__ | +----------------------------+-------------------+--------------+ | "nb_xor" | "binaryfunc" | __xor__ | | | | __rxor__ | +----------------------------+-------------------+--------------+ | "nb_inplace_xor" | "binaryfunc" | __ixor__ | +----------------------------+-------------------+--------------+ | "nb_or" | "binaryfunc" | __or__ | | | | __ror__ | +----------------------------+-------------------+--------------+ | "nb_inplace_or" | "binaryfunc" | __ior__ | +----------------------------+-------------------+--------------+ | "nb_int" | "unaryfunc" | __int__ | +----------------------------+-------------------+--------------+ | "nb_reserved" | void * | | +----------------------------+-------------------+--------------+ | "nb_float" | "unaryfunc" | __float__ | +----------------------------+-------------------+--------------+ | "nb_floor_divide" | "binaryfunc" | __floordiv__ | +----------------------------+-------------------+--------------+ | "nb_inplace_floor_divide" | "binaryfunc" | __ifloordiv | | | | __ | +----------------------------+-------------------+--------------+ | "nb_true_divide" | "binaryfunc" | __truediv__ | +----------------------------+-------------------+--------------+ | "nb_inplace_true_divide" | "binaryfunc" | __itruediv__ | +----------------------------+-------------------+--------------+ | "nb_index" | "unaryfunc" | __index__ | +----------------------------+-------------------+--------------+ | "nb_matrix_multiply" | "binaryfunc" | __matmul__ | | | | __rmatmul__ | +----------------------------+-------------------+--------------+ | "nb_inplace_matrix_multip | "binaryfunc" | __imatmul__ | | ly" | | | +----------------------------+-------------------+--------------+ | | +----------------------------+-------------------+--------------+ | "mp_length" | "lenfunc" | __len__ | +----------------------------+-------------------+--------------+ | "mp_subscript" | "binaryfunc" | __getitem__ | +----------------------------+-------------------+--------------+ | "mp_ass_subscript" | "objobjargproc" | __setitem__, | | | | __delitem__ | +----------------------------+-------------------+--------------+ | | +----------------------------+-------------------+--------------+ | "sq_length" | "lenfunc" | __len__ | +----------------------------+-------------------+--------------+ | "sq_concat" | "binaryfunc" | __add__ | +----------------------------+-------------------+--------------+ | "sq_repeat" | "ssizeargfunc" | __mul__ | +----------------------------+-------------------+--------------+ | "sq_item" | "ssizeargfunc" | __getitem__ | +----------------------------+-------------------+--------------+ | "sq_ass_item" | "ssizeobjargproc" | __setitem__ | | | | __delitem__ | +----------------------------+-------------------+--------------+ | "sq_contains" | "objobjproc" | __contains__ | +----------------------------+-------------------+--------------+ | "sq_inplace_concat" | "binaryfunc" | __iadd__ | +----------------------------+-------------------+--------------+ | "sq_inplace_repeat" | "ssizeargfunc" | __imul__ | +----------------------------+-------------------+--------------+ | | +----------------------------+-------------------+--------------+ | "bf_getbuffer" | "getbufferproc()" | | +----------------------------+-------------------+--------------+ | "bf_releasebuffer" | "releasebufferpr | | | | oc()" | | +----------------------------+-------------------+--------------+ slot typedefs ------------- +-------------------------------+-------------------------------+------------------------+ | typedef | Parameter Types | Return Type | |===============================|===============================|========================| | "allocfunc" | "PyTypeObject" * "Py_ssize_t" | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "destructor" | void * | void | +-------------------------------+-------------------------------+------------------------+ | "freefunc" | void * | void | +-------------------------------+-------------------------------+------------------------+ | "traverseproc" | void * "visitproc" void * | int | +-------------------------------+-------------------------------+------------------------+ | "newfunc" | "PyObject" * "PyObject" * | "PyObject" * | | | "PyObject" * | | +-------------------------------+-------------------------------+------------------------+ | "initproc" | "PyObject" * "PyObject" * | int | | | "PyObject" * | | +-------------------------------+-------------------------------+------------------------+ | "reprfunc" | "PyObject" * | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "getattrfunc" | "PyObject" * const char * | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "setattrfunc" | "PyObject" * const char * | int | | | "PyObject" * | | +-------------------------------+-------------------------------+------------------------+ | "getattrofunc" | "PyObject" * "PyObject" * | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "setattrofunc" | "PyObject" * "PyObject" * | int | | | "PyObject" * | | +-------------------------------+-------------------------------+------------------------+ | "descrgetfunc" | "PyObject" * "PyObject" * | "PyObject" * | | | "PyObject" * | | +-------------------------------+-------------------------------+------------------------+ | "descrsetfunc" | "PyObject" * "PyObject" * | int | | | "PyObject" * | | +-------------------------------+-------------------------------+------------------------+ | "hashfunc" | "PyObject" * | Py_hash_t | +-------------------------------+-------------------------------+------------------------+ | "richcmpfunc" | "PyObject" * "PyObject" * int | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "getiterfunc" | "PyObject" * | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "iternextfunc" | "PyObject" * | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "lenfunc" | "PyObject" * | "Py_ssize_t" | +-------------------------------+-------------------------------+------------------------+ | "getbufferproc" | "PyObject" * "Py_buffer" * | int | | | int | | +-------------------------------+-------------------------------+------------------------+ | "releasebufferproc" | "PyObject" * "Py_buffer" * | void | +-------------------------------+-------------------------------+------------------------+ | "inquiry" | void * | int | +-------------------------------+-------------------------------+------------------------+ | "unaryfunc" | "PyObject" * | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "binaryfunc" | "PyObject" * "PyObject" * | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "ternaryfunc" | "PyObject" * "PyObject" * | "PyObject" * | | | "PyObject" * | | +-------------------------------+-------------------------------+------------------------+ | "ssizeargfunc" | "PyObject" * "Py_ssize_t" | "PyObject" * | +-------------------------------+-------------------------------+------------------------+ | "ssizeobjargproc" | "PyObject" * "Py_ssize_t" | int | +-------------------------------+-------------------------------+------------------------+ | "objobjproc" | "PyObject" * "PyObject" * | int | +-------------------------------+-------------------------------+------------------------+ | "objobjargproc" | "PyObject" * "PyObject" * | int | | | "PyObject" * | | +-------------------------------+-------------------------------+------------------------+ See Slot Type typedefs below for more detail. PyTypeObject Definition ======================= The structure definition for "PyTypeObject" can be found in "Include/object.h". For convenience of reference, this repeats the definition found there: typedef struct _typeobject { PyObject_VAR_HEAD const char *tp_name; /* For printing, in format "." */ Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */ /* Methods to implement standard operations */ destructor tp_dealloc; Py_ssize_t tp_vectorcall_offset; getattrfunc tp_getattr; setattrfunc tp_setattr; PyAsyncMethods *tp_as_async; /* formerly known as tp_compare (Python 2) or tp_reserved (Python 3) */ reprfunc tp_repr; /* Method suites for standard classes */ PyNumberMethods *tp_as_number; PySequenceMethods *tp_as_sequence; PyMappingMethods *tp_as_mapping; /* More standard operations (here for binary compatibility) */ hashfunc tp_hash; ternaryfunc tp_call; reprfunc tp_str; getattrofunc tp_getattro; setattrofunc tp_setattro; /* Functions to access object as input/output buffer */ PyBufferProcs *tp_as_buffer; /* Flags to define presence of optional/expanded features */ unsigned long tp_flags; const char *tp_doc; /* Documentation string */ /* call function for all accessible objects */ traverseproc tp_traverse; /* delete references to contained objects */ inquiry tp_clear; /* rich comparisons */ richcmpfunc tp_richcompare; /* weak reference enabler */ Py_ssize_t tp_weaklistoffset; /* Iterators */ getiterfunc tp_iter; iternextfunc tp_iternext; /* Attribute descriptor and subclassing stuff */ struct PyMethodDef *tp_methods; struct PyMemberDef *tp_members; struct PyGetSetDef *tp_getset; struct _typeobject *tp_base; PyObject *tp_dict; descrgetfunc tp_descr_get; descrsetfunc tp_descr_set; Py_ssize_t tp_dictoffset; initproc tp_init; allocfunc tp_alloc; newfunc tp_new; freefunc tp_free; /* Low-level free-memory routine */ inquiry tp_is_gc; /* For PyObject_IS_GC */ PyObject *tp_bases; PyObject *tp_mro; /* method resolution order */ PyObject *tp_cache; PyObject *tp_subclasses; PyObject *tp_weaklist; destructor tp_del; /* Type attribute cache version tag. Added in version 2.6 */ unsigned int tp_version_tag; destructor tp_finalize; } PyTypeObject; PyObject Slots ============== The type object structure extends the "PyVarObject" structure. The "ob_size" field is used for dynamic types (created by "type_new()", usually called from a class statement). Note that "PyType_Type" (the metatype) initializes "tp_itemsize", which means that its instances (i.e. type objects) *must* have the "ob_size" field. PyObject* PyObject._ob_next PyObject* PyObject._ob_prev These fields are only present when the macro "Py_TRACE_REFS" is defined. Their initialization to "NULL" is taken care of by the "PyObject_HEAD_INIT" macro. For statically allocated objects, these fields always remain "NULL". For dynamically allocated objects, these two fields are used to link the object into a doubly-linked list of *all* live objects on the heap. This could be used for various debugging purposes; currently the only use is to print the objects that are still alive at the end of a run when the environment variable "PYTHONDUMPREFS" is set. **Inheritance:** These fields are not inherited by subtypes. Py_ssize_t PyObject.ob_refcnt This is the type object’s reference count, initialized to "1" by the "PyObject_HEAD_INIT" macro. Note that for statically allocated type objects, the type’s instances (objects whose "ob_type" points back to the type) do *not* count as references. But for dynamically allocated type objects, the instances *do* count as references. **Inheritance:** This field is not inherited by subtypes. PyTypeObject* PyObject.ob_type This is the type’s type, in other words its metatype. It is initialized by the argument to the "PyObject_HEAD_INIT" macro, and its value should normally be "&PyType_Type". However, for dynamically loadable extension modules that must be usable on Windows (at least), the compiler complains that this is not a valid initializer. Therefore, the convention is to pass "NULL" to the "PyObject_HEAD_INIT" macro and to initialize this field explicitly at the start of the module’s initialization function, before doing anything else. This is typically done like this: Foo_Type.ob_type = &PyType_Type; This should be done before any instances of the type are created. "PyType_Ready()" checks if "ob_type" is "NULL", and if so, initializes it to the "ob_type" field of the base class. "PyType_Ready()" will not change this field if it is non-zero. **Inheritance:** This field is inherited by subtypes. PyVarObject Slots ================= Py_ssize_t PyVarObject.ob_size For statically allocated type objects, this should be initialized to zero. For dynamically allocated type objects, this field has a special internal meaning. **Inheritance:** This field is not inherited by subtypes. PyTypeObject Slots ================== Each slot has a section describing inheritance. If "PyType_Ready()" may set a value when the field is set to "NULL" then there will also be a “Default” section. (Note that many fields set on "PyBaseObject_Type" and "PyType_Type" effectively act as defaults.) const char* PyTypeObject.tp_name Pointer to a NUL-terminated string containing the name of the type. For types that are accessible as module globals, the string should be the full module name, followed by a dot, followed by the type name; for built-in types, it should be just the type name. If the module is a submodule of a package, the full package name is part of the full module name. For example, a type named "T" defined in module "M" in subpackage "Q" in package "P" should have the "tp_name" initializer ""P.Q.M.T"". For dynamically allocated type objects, this should just be the type name, and the module name explicitly stored in the type dict as the value for key "'__module__'". For statically allocated type objects, the tp_name field should contain a dot. Everything before the last dot is made accessible as the "__module__" attribute, and everything after the last dot is made accessible as the "__name__" attribute. If no dot is present, the entire "tp_name" field is made accessible as the "__name__" attribute, and the "__module__" attribute is undefined (unless explicitly set in the dictionary, as explained above). This means your type will be impossible to pickle. Additionally, it will not be listed in module documentations created with pydoc. This field must not be "NULL". It is the only required field in "PyTypeObject()" (other than potentially "tp_itemsize"). **Inheritance:** This field is not inherited by subtypes. Py_ssize_t PyTypeObject.tp_basicsize Py_ssize_t PyTypeObject.tp_itemsize These fields allow calculating the size in bytes of instances of the type. There are two kinds of types: types with fixed-length instances have a zero "tp_itemsize" field, types with variable-length instances have a non-zero "tp_itemsize" field. For a type with fixed-length instances, all instances have the same size, given in "tp_basicsize". For a type with variable-length instances, the instances must have an "ob_size" field, and the instance size is "tp_basicsize" plus N times "tp_itemsize", where N is the “length” of the object. The value of N is typically stored in the instance’s "ob_size" field. There are exceptions: for example, ints use a negative "ob_size" to indicate a negative number, and N is "abs(ob_size)" there. Also, the presence of an "ob_size" field in the instance layout doesn’t mean that the instance structure is variable-length (for example, the structure for the list type has fixed-length instances, yet those instances have a meaningful "ob_size" field). The basic size includes the fields in the instance declared by the macro "PyObject_HEAD" or "PyObject_VAR_HEAD" (whichever is used to declare the instance struct) and this in turn includes the "_ob_prev" and "_ob_next" fields if they are present. This means that the only correct way to get an initializer for the "tp_basicsize" is to use the "sizeof" operator on the struct used to declare the instance layout. The basic size does not include the GC header size. A note about alignment: if the variable items require a particular alignment, this should be taken care of by the value of "tp_basicsize". Example: suppose a type implements an array of "double". "tp_itemsize" is "sizeof(double)". It is the programmer’s responsibility that "tp_basicsize" is a multiple of "sizeof(double)" (assuming this is the alignment requirement for "double"). For any type with variable-length instances, this field must not be "NULL". **Inheritance:** These fields are inherited separately by subtypes. If the base type has a non-zero "tp_itemsize", it is generally not safe to set "tp_itemsize" to a different non-zero value in a subtype (though this depends on the implementation of the base type). destructor PyTypeObject.tp_dealloc A pointer to the instance destructor function. This function must be defined unless the type guarantees that its instances will never be deallocated (as is the case for the singletons "None" and "Ellipsis"). The function signature is: void tp_dealloc(PyObject *self); The destructor function is called by the "Py_DECREF()" and "Py_XDECREF()" macros when the new reference count is zero. At this point, the instance is still in existence, but there are no references to it. The destructor function should free all references which the instance owns, free all memory buffers owned by the instance (using the freeing function corresponding to the allocation function used to allocate the buffer), and call the type’s "tp_free" function. If the type is not subtypable (doesn’t have the "Py_TPFLAGS_BASETYPE" flag bit set), it is permissible to call the object deallocator directly instead of via "tp_free". The object deallocator should be the one used to allocate the instance; this is normally "PyObject_Del()" if the instance was allocated using "PyObject_New()" or "PyObject_VarNew()", or "PyObject_GC_Del()" if the instance was allocated using "PyObject_GC_New()" or "PyObject_GC_NewVar()". If the type supports garbage collection (has the "Py_TPFLAGS_HAVE_GC" flag bit set), the destructor should call "PyObject_GC_UnTrack()" before clearing any member fields. static void foo_dealloc(foo_object *self) { PyObject_GC_UnTrack(self); Py_CLEAR(self->ref); Py_TYPE(self)->tp_free((PyObject *)self); } Finally, if the type is heap allocated ("Py_TPFLAGS_HEAPTYPE"), the deallocator should decrement the reference count for its type object after calling the type deallocator. In order to avoid dangling pointers, the recommended way to achieve this is: static void foo_dealloc(foo_object *self) { PyTypeObject *tp = Py_TYPE(self); // free references and buffers here tp->tp_free(self); Py_DECREF(tp); } **Inheritance:** This field is inherited by subtypes. Py_ssize_t PyTypeObject.tp_vectorcall_offset An optional offset to a per-instance function that implements calling the object using the vectorcall protocol, a more efficient alternative of the simpler "tp_call". This field is only used if the flag "Py_TPFLAGS_HAVE_VECTORCALL" is set. If so, this must be a positive integer containing the offset in the instance of a "vectorcallfunc" pointer. The *vectorcallfunc* pointer may be "NULL", in which case the instance behaves as if "Py_TPFLAGS_HAVE_VECTORCALL" was not set: calling the instance falls back to "tp_call". Any class that sets "Py_TPFLAGS_HAVE_VECTORCALL" must also set "tp_call" and make sure its behaviour is consistent with the *vectorcallfunc* function. This can be done by setting *tp_call* to "PyVectorcall_Call()". Warning: It is not recommended for heap types to implement the vectorcall protocol. When a user sets "__call__" in Python code, only *tp_call* is updated, likely making it inconsistent with the vectorcall function. Note: The semantics of the "tp_vectorcall_offset" slot are provisional and expected to be finalized in Python 3.9. If you use vectorcall, plan for updating your code for Python 3.9. Changed in version 3.8: Before version 3.8, this slot was named "tp_print". In Python 2.x, it was used for printing to a file. In Python 3.0 to 3.7, it was unused. **Inheritance:** This field is always inherited. However, the "Py_TPFLAGS_HAVE_VECTORCALL" flag is not always inherited. If it’s not, then the subclass won’t use vectorcall, except when "PyVectorcall_Call()" is explicitly called. This is in particular the case for heap types (including subclasses defined in Python). getattrfunc PyTypeObject.tp_getattr An optional pointer to the get-attribute-string function. This field is deprecated. When it is defined, it should point to a function that acts the same as the "tp_getattro" function, but taking a C string instead of a Python string object to give the attribute name. **Inheritance:** Group: "tp_getattr", "tp_getattro" This field is inherited by subtypes together with "tp_getattro": a subtype inherits both "tp_getattr" and "tp_getattro" from its base type when the subtype’s "tp_getattr" and "tp_getattro" are both "NULL". setattrfunc PyTypeObject.tp_setattr An optional pointer to the function for setting and deleting attributes. This field is deprecated. When it is defined, it should point to a function that acts the same as the "tp_setattro" function, but taking a C string instead of a Python string object to give the attribute name. **Inheritance:** Group: "tp_setattr", "tp_setattro" This field is inherited by subtypes together with "tp_setattro": a subtype inherits both "tp_setattr" and "tp_setattro" from its base type when the subtype’s "tp_setattr" and "tp_setattro" are both "NULL". PyAsyncMethods* PyTypeObject.tp_as_async Pointer to an additional structure that contains fields relevant only to objects which implement *awaitable* and *asynchronous iterator* protocols at the C-level. See Async Object Structures for details. New in version 3.5: Formerly known as "tp_compare" and "tp_reserved". **Inheritance:** The "tp_as_async" field is not inherited, but the contained fields are inherited individually. reprfunc PyTypeObject.tp_repr An optional pointer to a function that implements the built-in function "repr()". The signature is the same as for "PyObject_Repr()": PyObject *tp_repr(PyObject *self); The function must return a string or a Unicode object. Ideally, this function should return a string that, when passed to "eval()", given a suitable environment, returns an object with the same value. If this is not feasible, it should return a string starting with "'<'" and ending with "'>'" from which both the type and the value of the object can be deduced. **Inheritance:** This field is inherited by subtypes. **Default:** When this field is not set, a string of the form "<%s object at %p>" is returned, where "%s" is replaced by the type name, and "%p" by the object’s memory address. PyNumberMethods* PyTypeObject.tp_as_number Pointer to an additional structure that contains fields relevant only to objects which implement the number protocol. These fields are documented in Number Object Structures. **Inheritance:** The "tp_as_number" field is not inherited, but the contained fields are inherited individually. PySequenceMethods* PyTypeObject.tp_as_sequence Pointer to an additional structure that contains fields relevant only to objects which implement the sequence protocol. These fields are documented in Sequence Object Structures. **Inheritance:** The "tp_as_sequence" field is not inherited, but the contained fields are inherited individually. PyMappingMethods* PyTypeObject.tp_as_mapping Pointer to an additional structure that contains fields relevant only to objects which implement the mapping protocol. These fields are documented in Mapping Object Structures. **Inheritance:** The "tp_as_mapping" field is not inherited, but the contained fields are inherited individually. hashfunc PyTypeObject.tp_hash An optional pointer to a function that implements the built-in function "hash()". The signature is the same as for "PyObject_Hash()": Py_hash_t tp_hash(PyObject *); The value "-1" should not be returned as a normal return value; when an error occurs during the computation of the hash value, the function should set an exception and return "-1". When this field is not set (*and* "tp_richcompare" is not set), an attempt to take the hash of the object raises "TypeError". This is the same as setting it to "PyObject_HashNotImplemented()". This field can be set explicitly to "PyObject_HashNotImplemented()" to block inheritance of the hash method from a parent type. This is interpreted as the equivalent of "__hash__ = None" at the Python level, causing "isinstance(o, collections.Hashable)" to correctly return "False". Note that the converse is also true - setting "__hash__ = None" on a class at the Python level will result in the "tp_hash" slot being set to "PyObject_HashNotImplemented()". **Inheritance:** Group: "tp_hash", "tp_richcompare" This field is inherited by subtypes together with "tp_richcompare": a subtype inherits both of "tp_richcompare" and "tp_hash", when the subtype’s "tp_richcompare" and "tp_hash" are both "NULL". ternaryfunc PyTypeObject.tp_call An optional pointer to a function that implements calling the object. This should be "NULL" if the object is not callable. The signature is the same as for "PyObject_Call()": PyObject *tp_call(PyObject *self, PyObject *args, PyObject *kwargs); **Inheritance:** This field is inherited by subtypes. reprfunc PyTypeObject.tp_str An optional pointer to a function that implements the built-in operation "str()". (Note that "str" is a type now, and "str()" calls the constructor for that type. This constructor calls "PyObject_Str()" to do the actual work, and "PyObject_Str()" will call this handler.) The signature is the same as for "PyObject_Str()": PyObject *tp_str(PyObject *self); The function must return a string or a Unicode object. It should be a “friendly” string representation of the object, as this is the representation that will be used, among other things, by the "print()" function. **Inheritance:** This field is inherited by subtypes. **Default:** When this field is not set, "PyObject_Repr()" is called to return a string representation. getattrofunc PyTypeObject.tp_getattro An optional pointer to the get-attribute function. The signature is the same as for "PyObject_GetAttr()": PyObject *tp_getattro(PyObject *self, PyObject *attr); It is usually convenient to set this field to "PyObject_GenericGetAttr()", which implements the normal way of looking for object attributes. **Inheritance:** Group: "tp_getattr", "tp_getattro" This field is inherited by subtypes together with "tp_getattr": a subtype inherits both "tp_getattr" and "tp_getattro" from its base type when the subtype’s "tp_getattr" and "tp_getattro" are both "NULL". **Default:** "PyBaseObject_Type" uses "PyObject_GenericGetAttr()". setattrofunc PyTypeObject.tp_setattro An optional pointer to the function for setting and deleting attributes. The signature is the same as for "PyObject_SetAttr()": int tp_setattro(PyObject *self, PyObject *attr, PyObject *value); In addition, setting *value* to "NULL" to delete an attribute must be supported. It is usually convenient to set this field to "PyObject_GenericSetAttr()", which implements the normal way of setting object attributes. **Inheritance:** Group: "tp_setattr", "tp_setattro" This field is inherited by subtypes together with "tp_setattr": a subtype inherits both "tp_setattr" and "tp_setattro" from its base type when the subtype’s "tp_setattr" and "tp_setattro" are both "NULL". **Default:** "PyBaseObject_Type" uses "PyObject_GenericSetAttr()". PyBufferProcs* PyTypeObject.tp_as_buffer Pointer to an additional structure that contains fields relevant only to objects which implement the buffer interface. These fields are documented in Buffer Object Structures. **Inheritance:** The "tp_as_buffer" field is not inherited, but the contained fields are inherited individually. unsigned long PyTypeObject.tp_flags This field is a bit mask of various flags. Some flags indicate variant semantics for certain situations; others are used to indicate that certain fields in the type object (or in the extension structures referenced via "tp_as_number", "tp_as_sequence", "tp_as_mapping", and "tp_as_buffer") that were historically not always present are valid; if such a flag bit is clear, the type fields it guards must not be accessed and must be considered to have a zero or "NULL" value instead. **Inheritance:** Inheritance of this field is complicated. Most flag bits are inherited individually, i.e. if the base type has a flag bit set, the subtype inherits this flag bit. The flag bits that pertain to extension structures are strictly inherited if the extension structure is inherited, i.e. the base type’s value of the flag bit is copied into the subtype together with a pointer to the extension structure. The "Py_TPFLAGS_HAVE_GC" flag bit is inherited together with the "tp_traverse" and "tp_clear" fields, i.e. if the "Py_TPFLAGS_HAVE_GC" flag bit is clear in the subtype and the "tp_traverse" and "tp_clear" fields in the subtype exist and have "NULL" values. **Default:** "PyBaseObject_Type" uses "Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE". **Bit Masks:** The following bit masks are currently defined; these can be ORed together using the "|" operator to form the value of the "tp_flags" field. The macro "PyType_HasFeature()" takes a type and a flags value, *tp* and *f*, and checks whether "tp->tp_flags & f" is non- zero. Py_TPFLAGS_HEAPTYPE This bit is set when the type object itself is allocated on the heap, for example, types created dynamically using "PyType_FromSpec()". In this case, the "ob_type" field of its instances is considered a reference to the type, and the type object is INCREF’ed when a new instance is created, and DECREF’ed when an instance is destroyed (this does not apply to instances of subtypes; only the type referenced by the instance’s ob_type gets INCREF’ed or DECREF’ed). **Inheritance:** ??? Py_TPFLAGS_BASETYPE This bit is set when the type can be used as the base type of another type. If this bit is clear, the type cannot be subtyped (similar to a “final” class in Java). **Inheritance:** ??? Py_TPFLAGS_READY This bit is set when the type object has been fully initialized by "PyType_Ready()". **Inheritance:** ??? Py_TPFLAGS_READYING This bit is set while "PyType_Ready()" is in the process of initializing the type object. **Inheritance:** ??? Py_TPFLAGS_HAVE_GC This bit is set when the object supports garbage collection. If this bit is set, instances must be created using "PyObject_GC_New()" and destroyed using "PyObject_GC_Del()". More information in section Supporting Cyclic Garbage Collection. This bit also implies that the GC-related fields "tp_traverse" and "tp_clear" are present in the type object. **Inheritance:** Group: "Py_TPFLAGS_HAVE_GC", "tp_traverse", "tp_clear" The "Py_TPFLAGS_HAVE_GC" flag bit is inherited together with the "tp_traverse" and "tp_clear" fields, i.e. if the "Py_TPFLAGS_HAVE_GC" flag bit is clear in the subtype and the "tp_traverse" and "tp_clear" fields in the subtype exist and have "NULL" values. Py_TPFLAGS_DEFAULT This is a bitmask of all the bits that pertain to the existence of certain fields in the type object and its extension structures. Currently, it includes the following bits: "Py_TPFLAGS_HAVE_STACKLESS_EXTENSION", "Py_TPFLAGS_HAVE_VERSION_TAG". **Inheritance:** ??? Py_TPFLAGS_METHOD_DESCRIPTOR This bit indicates that objects behave like unbound methods. If this flag is set for "type(meth)", then: * "meth.__get__(obj, cls)(*args, **kwds)" (with "obj" not None) must be equivalent to "meth(obj, *args, **kwds)". * "meth.__get__(None, cls)(*args, **kwds)" must be equivalent to "meth(*args, **kwds)". This flag enables an optimization for typical method calls like "obj.meth()": it avoids creating a temporary “bound method” object for "obj.meth". New in version 3.8. **Inheritance:** This flag is never inherited by heap types. For extension types, it is inherited whenever "tp_descr_get" is inherited. Py_TPFLAGS_LONG_SUBCLASS Py_TPFLAGS_LIST_SUBCLASS Py_TPFLAGS_TUPLE_SUBCLASS Py_TPFLAGS_BYTES_SUBCLASS Py_TPFLAGS_UNICODE_SUBCLASS Py_TPFLAGS_DICT_SUBCLASS Py_TPFLAGS_BASE_EXC_SUBCLASS Py_TPFLAGS_TYPE_SUBCLASS These flags are used by functions such as "PyLong_Check()" to quickly determine if a type is a subclass of a built-in type; such specific checks are faster than a generic check, like "PyObject_IsInstance()". Custom types that inherit from built- ins should have their "tp_flags" set appropriately, or the code that interacts with such types will behave differently depending on what kind of check is used. Py_TPFLAGS_HAVE_FINALIZE This bit is set when the "tp_finalize" slot is present in the type structure. New in version 3.4. Deprecated since version 3.8: This flag isn’t necessary anymore, as the interpreter assumes the "tp_finalize" slot is always present in the type structure. Py_TPFLAGS_HAVE_VECTORCALL This bit is set when the class implements the vectorcall protocol. See "tp_vectorcall_offset" for details. **Inheritance:** This bit is inherited for *static* subtypes if "tp_call" is also inherited. Heap types do not inherit "Py_TPFLAGS_HAVE_VECTORCALL". New in version 3.9. const char* PyTypeObject.tp_doc An optional pointer to a NUL-terminated C string giving the docstring for this type object. This is exposed as the "__doc__" attribute on the type and instances of the type. **Inheritance:** This field is *not* inherited by subtypes. traverseproc PyTypeObject.tp_traverse An optional pointer to a traversal function for the garbage collector. This is only used if the "Py_TPFLAGS_HAVE_GC" flag bit is set. The signature is: int tp_traverse(PyObject *self, visitproc visit, void *arg); More information about Python’s garbage collection scheme can be found in section Supporting Cyclic Garbage Collection. The "tp_traverse" pointer is used by the garbage collector to detect reference cycles. A typical implementation of a "tp_traverse" function simply calls "Py_VISIT()" on each of the instance’s members that are Python objects that the instance owns. For example, this is function "local_traverse()" from the "_thread" extension module: static int local_traverse(localobject *self, visitproc visit, void *arg) { Py_VISIT(self->args); Py_VISIT(self->kw); Py_VISIT(self->dict); return 0; } Note that "Py_VISIT()" is called only on those members that can participate in reference cycles. Although there is also a "self->key" member, it can only be "NULL" or a Python string and therefore cannot be part of a reference cycle. On the other hand, even if you know a member can never be part of a cycle, as a debugging aid you may want to visit it anyway just so the "gc" module’s "get_referents()" function will include it. Warning: When implementing "tp_traverse", only the members that the instance *owns* (by having strong references to them) must be visited. For instance, if an object supports weak references via the "tp_weaklist" slot, the pointer supporting the linked list (what *tp_weaklist* points to) must **not** be visited as the instance does not directly own the weak references to itself (the weakreference list is there to support the weak reference machinery, but the instance has no strong reference to the elements inside it, as they are allowed to be removed even if the instance is still alive). Note that "Py_VISIT()" requires the *visit* and *arg* parameters to "local_traverse()" to have these specific names; don’t name them just anything. Heap-allocated types ("Py_TPFLAGS_HEAPTYPE", such as those created with "PyType_FromSpec()" and similar APIs) hold a reference to their type. Their traversal function must therefore either visit "Py_TYPE(self)", or delegate this responsibility by calling "tp_traverse" of another heap-allocated type (such as a heap- allocated superclass). If they do not, the type object may not be garbage-collected. Changed in version 3.9: Heap-allocated types are expected to visit "Py_TYPE(self)" in "tp_traverse". In earlier versions of Python, due to bug 40217, doing this may lead to crashes in subclasses. **Inheritance:** Group: "Py_TPFLAGS_HAVE_GC", "tp_traverse", "tp_clear" This field is inherited by subtypes together with "tp_clear" and the "Py_TPFLAGS_HAVE_GC" flag bit: the flag bit, "tp_traverse", and "tp_clear" are all inherited from the base type if they are all zero in the subtype. inquiry PyTypeObject.tp_clear An optional pointer to a clear function for the garbage collector. This is only used if the "Py_TPFLAGS_HAVE_GC" flag bit is set. The signature is: int tp_clear(PyObject *); The "tp_clear" member function is used to break reference cycles in cyclic garbage detected by the garbage collector. Taken together, all "tp_clear" functions in the system must combine to break all reference cycles. This is subtle, and if in any doubt supply a "tp_clear" function. For example, the tuple type does not implement a "tp_clear" function, because it’s possible to prove that no reference cycle can be composed entirely of tuples. Therefore the "tp_clear" functions of other types must be sufficient to break any cycle containing a tuple. This isn’t immediately obvious, and there’s rarely a good reason to avoid implementing "tp_clear". Implementations of "tp_clear" should drop the instance’s references to those of its members that may be Python objects, and set its pointers to those members to "NULL", as in the following example: static int local_clear(localobject *self) { Py_CLEAR(self->key); Py_CLEAR(self->args); Py_CLEAR(self->kw); Py_CLEAR(self->dict); return 0; } The "Py_CLEAR()" macro should be used, because clearing references is delicate: the reference to the contained object must not be decremented until after the pointer to the contained object is set to "NULL". This is because decrementing the reference count may cause the contained object to become trash, triggering a chain of reclamation activity that may include invoking arbitrary Python code (due to finalizers, or weakref callbacks, associated with the contained object). If it’s possible for such code to reference *self* again, it’s important that the pointer to the contained object be "NULL" at that time, so that *self* knows the contained object can no longer be used. The "Py_CLEAR()" macro performs the operations in a safe order. Note that "tp_clear" is not *always* called before an instance is deallocated. For example, when reference counting is enough to determine that an object is no longer used, the cyclic garbage collector is not involved and "tp_dealloc" is called directly. Because the goal of "tp_clear" functions is to break reference cycles, it’s not necessary to clear contained objects like Python strings or Python integers, which can’t participate in reference cycles. On the other hand, it may be convenient to clear all contained Python objects, and write the type’s "tp_dealloc" function to invoke "tp_clear". More information about Python’s garbage collection scheme can be found in section Supporting Cyclic Garbage Collection. **Inheritance:** Group: "Py_TPFLAGS_HAVE_GC", "tp_traverse", "tp_clear" This field is inherited by subtypes together with "tp_traverse" and the "Py_TPFLAGS_HAVE_GC" flag bit: the flag bit, "tp_traverse", and "tp_clear" are all inherited from the base type if they are all zero in the subtype. richcmpfunc PyTypeObject.tp_richcompare An optional pointer to the rich comparison function, whose signature is: PyObject *tp_richcompare(PyObject *self, PyObject *other, int op); The first parameter is guaranteed to be an instance of the type that is defined by "PyTypeObject". The function should return the result of the comparison (usually "Py_True" or "Py_False"). If the comparison is undefined, it must return "Py_NotImplemented", if another error occurred it must return "NULL" and set an exception condition. The following constants are defined to be used as the third argument for "tp_richcompare" and for "PyObject_RichCompare()": +------------------+--------------+ | Constant | Comparison | |==================|==============| | "Py_LT" | "<" | +------------------+--------------+ | "Py_LE" | "<=" | +------------------+--------------+ | "Py_EQ" | "==" | +------------------+--------------+ | "Py_NE" | "!=" | +------------------+--------------+ | "Py_GT" | ">" | +------------------+--------------+ | "Py_GE" | ">=" | +------------------+--------------+ The following macro is defined to ease writing rich comparison functions: Py_RETURN_RICHCOMPARE(VAL_A, VAL_B, op) Return "Py_True" or "Py_False" from the function, depending on the result of a comparison. VAL_A and VAL_B must be orderable by C comparison operators (for example, they may be C ints or floats). The third argument specifies the requested operation, as for "PyObject_RichCompare()". The return value’s reference count is properly incremented. On error, sets an exception and returns "NULL" from the function. New in version 3.7. **Inheritance:** Group: "tp_hash", "tp_richcompare" This field is inherited by subtypes together with "tp_hash": a subtype inherits "tp_richcompare" and "tp_hash" when the subtype’s "tp_richcompare" and "tp_hash" are both "NULL". **Default:** "PyBaseObject_Type" provides a "tp_richcompare" implementation, which may be inherited. However, if only "tp_hash" is defined, not even the inherited function is used and instances of the type will not be able to participate in any comparisons. Py_ssize_t PyTypeObject.tp_weaklistoffset If the instances of this type are weakly referenceable, this field is greater than zero and contains the offset in the instance structure of the weak reference list head (ignoring the GC header, if present); this offset is used by "PyObject_ClearWeakRefs()" and the "PyWeakref_*()" functions. The instance structure needs to include a field of type "PyObject*" which is initialized to "NULL". Do not confuse this field with "tp_weaklist"; that is the list head for weak references to the type object itself. **Inheritance:** This field is inherited by subtypes, but see the rules listed below. A subtype may override this offset; this means that the subtype uses a different weak reference list head than the base type. Since the list head is always found via "tp_weaklistoffset", this should not be a problem. When a type defined by a class statement has no "__slots__" declaration, and none of its base types are weakly referenceable, the type is made weakly referenceable by adding a weak reference list head slot to the instance layout and setting the "tp_weaklistoffset" of that slot’s offset. When a type’s "__slots__" declaration contains a slot named "__weakref__", that slot becomes the weak reference list head for instances of the type, and the slot’s offset is stored in the type’s "tp_weaklistoffset". When a type’s "__slots__" declaration does not contain a slot named "__weakref__", the type inherits its "tp_weaklistoffset" from its base type. getiterfunc PyTypeObject.tp_iter An optional pointer to a function that returns an iterator for the object. Its presence normally signals that the instances of this type are iterable (although sequences may be iterable without this function). This function has the same signature as "PyObject_GetIter()": PyObject *tp_iter(PyObject *self); **Inheritance:** This field is inherited by subtypes. iternextfunc PyTypeObject.tp_iternext An optional pointer to a function that returns the next item in an iterator. The signature is: PyObject *tp_iternext(PyObject *self); When the iterator is exhausted, it must return "NULL"; a "StopIteration" exception may or may not be set. When another error occurs, it must return "NULL" too. Its presence signals that the instances of this type are iterators. Iterator types should also define the "tp_iter" function, and that function should return the iterator instance itself (not a new iterator instance). This function has the same signature as "PyIter_Next()". **Inheritance:** This field is inherited by subtypes. struct PyMethodDef* PyTypeObject.tp_methods An optional pointer to a static "NULL"-terminated array of "PyMethodDef" structures, declaring regular methods of this type. For each entry in the array, an entry is added to the type’s dictionary (see "tp_dict" below) containing a method descriptor. **Inheritance:** This field is not inherited by subtypes (methods are inherited through a different mechanism). struct PyMemberDef* PyTypeObject.tp_members An optional pointer to a static "NULL"-terminated array of "PyMemberDef" structures, declaring regular data members (fields or slots) of instances of this type. For each entry in the array, an entry is added to the type’s dictionary (see "tp_dict" below) containing a member descriptor. **Inheritance:** This field is not inherited by subtypes (members are inherited through a different mechanism). struct PyGetSetDef* PyTypeObject.tp_getset An optional pointer to a static "NULL"-terminated array of "PyGetSetDef" structures, declaring computed attributes of instances of this type. For each entry in the array, an entry is added to the type’s dictionary (see "tp_dict" below) containing a getset descriptor. **Inheritance:** This field is not inherited by subtypes (computed attributes are inherited through a different mechanism). PyTypeObject* PyTypeObject.tp_base An optional pointer to a base type from which type properties are inherited. At this level, only single inheritance is supported; multiple inheritance require dynamically creating a type object by calling the metatype. Note: Slot initialization is subject to the rules of initializing globals. C99 requires the initializers to be “address constants”. Function designators like "PyType_GenericNew()", with implicit conversion to a pointer, are valid C99 address constants.However, the unary ‘&’ operator applied to a non-static variable like "PyBaseObject_Type()" is not required to produce an address constant. Compilers may support this (gcc does), MSVC does not. Both compilers are strictly standard conforming in this particular behavior.Consequently, "tp_base" should be set in the extension module’s init function. **Inheritance:** This field is not inherited by subtypes (obviously). **Default:** This field defaults to "&PyBaseObject_Type" (which to Python programmers is known as the type "object"). PyObject* PyTypeObject.tp_dict The type’s dictionary is stored here by "PyType_Ready()". This field should normally be initialized to "NULL" before PyType_Ready is called; it may also be initialized to a dictionary containing initial attributes for the type. Once "PyType_Ready()" has initialized the type, extra attributes for the type may be added to this dictionary only if they don’t correspond to overloaded operations (like "__add__()"). **Inheritance:** This field is not inherited by subtypes (though the attributes defined in here are inherited through a different mechanism). **Default:** If this field is "NULL", "PyType_Ready()" will assign a new dictionary to it. Warning: It is not safe to use "PyDict_SetItem()" on or otherwise modify "tp_dict" with the dictionary C-API. descrgetfunc PyTypeObject.tp_descr_get An optional pointer to a “descriptor get” function. The function signature is: PyObject * tp_descr_get(PyObject *self, PyObject *obj, PyObject *type); **Inheritance:** This field is inherited by subtypes. descrsetfunc PyTypeObject.tp_descr_set An optional pointer to a function for setting and deleting a descriptor’s value. The function signature is: int tp_descr_set(PyObject *self, PyObject *obj, PyObject *value); The *value* argument is set to "NULL" to delete the value. **Inheritance:** This field is inherited by subtypes. Py_ssize_t PyTypeObject.tp_dictoffset If the instances of this type have a dictionary containing instance variables, this field is non-zero and contains the offset in the instances of the type of the instance variable dictionary; this offset is used by "PyObject_GenericGetAttr()". Do not confuse this field with "tp_dict"; that is the dictionary for attributes of the type object itself. If the value of this field is greater than zero, it specifies the offset from the start of the instance structure. If the value is less than zero, it specifies the offset from the *end* of the instance structure. A negative offset is more expensive to use, and should only be used when the instance structure contains a variable-length part. This is used for example to add an instance variable dictionary to subtypes of "str" or "tuple". Note that the "tp_basicsize" field should account for the dictionary added to the end in that case, even though the dictionary is not included in the basic object layout. On a system with a pointer size of 4 bytes, "tp_dictoffset" should be set to "-4" to indicate that the dictionary is at the very end of the structure. The real dictionary offset in an instance can be computed from a negative "tp_dictoffset" as follows: dictoffset = tp_basicsize + abs(ob_size)*tp_itemsize + tp_dictoffset if dictoffset is not aligned on sizeof(void*): round up to sizeof(void*) where "tp_basicsize", "tp_itemsize" and "tp_dictoffset" are taken from the type object, and "ob_size" is taken from the instance. The absolute value is taken because ints use the sign of "ob_size" to store the sign of the number. (There’s never a need to do this calculation yourself; it is done for you by "_PyObject_GetDictPtr()".) **Inheritance:** This field is inherited by subtypes, but see the rules listed below. A subtype may override this offset; this means that the subtype instances store the dictionary at a difference offset than the base type. Since the dictionary is always found via "tp_dictoffset", this should not be a problem. When a type defined by a class statement has no "__slots__" declaration, and none of its base types has an instance variable dictionary, a dictionary slot is added to the instance layout and the "tp_dictoffset" is set to that slot’s offset. When a type defined by a class statement has a "__slots__" declaration, the type inherits its "tp_dictoffset" from its base type. (Adding a slot named "__dict__" to the "__slots__" declaration does not have the expected effect, it just causes confusion. Maybe this should be added as a feature just like "__weakref__" though.) **Default:** This slot has no default. For static types, if the field is "NULL" then no "__dict__" gets created for instances. initproc PyTypeObject.tp_init An optional pointer to an instance initialization function. This function corresponds to the "__init__()" method of classes. Like "__init__()", it is possible to create an instance without calling "__init__()", and it is possible to reinitialize an instance by calling its "__init__()" method again. The function signature is: int tp_init(PyObject *self, PyObject *args, PyObject *kwds); The self argument is the instance to be initialized; the *args* and *kwds* arguments represent positional and keyword arguments of the call to "__init__()". The "tp_init" function, if not "NULL", is called when an instance is created normally by calling its type, after the type’s "tp_new" function has returned an instance of the type. If the "tp_new" function returns an instance of some other type that is not a subtype of the original type, no "tp_init" function is called; if "tp_new" returns an instance of a subtype of the original type, the subtype’s "tp_init" is called. Returns "0" on success, "-1" and sets an exception on error. **Inheritance:** This field is inherited by subtypes. **Default:** For static types this field does not have a default. allocfunc PyTypeObject.tp_alloc An optional pointer to an instance allocation function. The function signature is: PyObject *tp_alloc(PyTypeObject *self, Py_ssize_t nitems); **Inheritance:** This field is inherited by static subtypes, but not by dynamic subtypes (subtypes created by a class statement). **Default:** For dynamic subtypes, this field is always set to "PyType_GenericAlloc()", to force a standard heap allocation strategy. For static subtypes, "PyBaseObject_Type" uses "PyType_GenericAlloc()". That is the recommended value for all statically defined types. newfunc PyTypeObject.tp_new An optional pointer to an instance creation function. The function signature is: PyObject *tp_new(PyTypeObject *subtype, PyObject *args, PyObject *kwds); The *subtype* argument is the type of the object being created; the *args* and *kwds* arguments represent positional and keyword arguments of the call to the type. Note that *subtype* doesn’t have to equal the type whose "tp_new" function is called; it may be a subtype of that type (but not an unrelated type). The "tp_new" function should call "subtype->tp_alloc(subtype, nitems)" to allocate space for the object, and then do only as much further initialization as is absolutely necessary. Initialization that can safely be ignored or repeated should be placed in the "tp_init" handler. A good rule of thumb is that for immutable types, all initialization should take place in "tp_new", while for mutable types, most initialization should be deferred to "tp_init". **Inheritance:** This field is inherited by subtypes, except it is not inherited by static types whose "tp_base" is "NULL" or "&PyBaseObject_Type". **Default:** For static types this field has no default. This means if the slot is defined as "NULL", the type cannot be called to create new instances; presumably there is some other way to create instances, like a factory function. freefunc PyTypeObject.tp_free An optional pointer to an instance deallocation function. Its signature is: void tp_free(void *self); An initializer that is compatible with this signature is "PyObject_Free()". **Inheritance:** This field is inherited by static subtypes, but not by dynamic subtypes (subtypes created by a class statement) **Default:** In dynamic subtypes, this field is set to a deallocator suitable to match "PyType_GenericAlloc()" and the value of the "Py_TPFLAGS_HAVE_GC" flag bit. For static subtypes, "PyBaseObject_Type" uses PyObject_Del. inquiry PyTypeObject.tp_is_gc An optional pointer to a function called by the garbage collector. The garbage collector needs to know whether a particular object is collectible or not. Normally, it is sufficient to look at the object’s type’s "tp_flags" field, and check the "Py_TPFLAGS_HAVE_GC" flag bit. But some types have a mixture of statically and dynamically allocated instances, and the statically allocated instances are not collectible. Such types should define this function; it should return "1" for a collectible instance, and "0" for a non-collectible instance. The signature is: int tp_is_gc(PyObject *self); (The only example of this are types themselves. The metatype, "PyType_Type", defines this function to distinguish between statically and dynamically allocated types.) **Inheritance:** This field is inherited by subtypes. **Default:** This slot has no default. If this field is "NULL", "Py_TPFLAGS_HAVE_GC" is used as the functional equivalent. PyObject* PyTypeObject.tp_bases Tuple of base types. This is set for types created by a class statement. It should be "NULL" for statically defined types. **Inheritance:** This field is not inherited. PyObject* PyTypeObject.tp_mro Tuple containing the expanded set of base types, starting with the type itself and ending with "object", in Method Resolution Order. **Inheritance:** This field is not inherited; it is calculated fresh by "PyType_Ready()". PyObject* PyTypeObject.tp_cache Unused. Internal use only. **Inheritance:** This field is not inherited. PyObject* PyTypeObject.tp_subclasses List of weak references to subclasses. Internal use only. **Inheritance:** This field is not inherited. PyObject* PyTypeObject.tp_weaklist Weak reference list head, for weak references to this type object. Not inherited. Internal use only. **Inheritance:** This field is not inherited. destructor PyTypeObject.tp_del This field is deprecated. Use "tp_finalize" instead. unsigned int PyTypeObject.tp_version_tag Used to index into the method cache. Internal use only. **Inheritance:** This field is not inherited. destructor PyTypeObject.tp_finalize An optional pointer to an instance finalization function. Its signature is: void tp_finalize(PyObject *self); If "tp_finalize" is set, the interpreter calls it once when finalizing an instance. It is called either from the garbage collector (if the instance is part of an isolated reference cycle) or just before the object is deallocated. Either way, it is guaranteed to be called before attempting to break reference cycles, ensuring that it finds the object in a sane state. "tp_finalize" should not mutate the current exception status; therefore, a recommended way to write a non-trivial finalizer is: static void local_finalize(PyObject *self) { PyObject *error_type, *error_value, *error_traceback; /* Save the current exception, if any. */ PyErr_Fetch(&error_type, &error_value, &error_traceback); /* ... */ /* Restore the saved exception. */ PyErr_Restore(error_type, error_value, error_traceback); } For this field to be taken into account (even through inheritance), you must also set the "Py_TPFLAGS_HAVE_FINALIZE" flags bit. Also, note that, in a garbage collected Python, "tp_dealloc" may be called from any Python thread, not just the thread which created the object (if the object becomes part of a refcount cycle, that cycle might be collected by a garbage collection on any thread). This is not a problem for Python API calls, since the thread on which tp_dealloc is called will own the Global Interpreter Lock (GIL). However, if the object being destroyed in turn destroys objects from some other C or C++ library, care should be taken to ensure that destroying those objects on the thread which called tp_dealloc will not violate any assumptions of the library. **Inheritance:** This field is inherited by subtypes. New in version 3.4. See also: “Safe object finalization” (**PEP 442**) vectorcallfunc PyTypeObject.tp_vectorcall Vectorcall function to use for calls of this type object. In other words, it is used to implement vectorcall for "type.__call__". If "tp_vectorcall" is "NULL", the default call implementation using "__new__" and "__init__" is used. **Inheritance:** This field is never inherited. New in version 3.9: (the field exists since 3.8 but it’s only used since 3.9) Heap Types ========== Traditionally, types defined in C code are *static*, that is, a static "PyTypeObject" structure is defined directly in code and initialized using "PyType_Ready()". This results in types that are limited relative to types defined in Python: * Static types are limited to one base, i.e. they cannot use multiple inheritance. * Static type objects (but not necessarily their instances) are immutable. It is not possible to add or modify the type object’s attributes from Python. * Static type objects are shared across sub-interpreters, so they should not include any subinterpreter-specific state. Also, since "PyTypeObject" is not part of the stable ABI, any extension modules using static types must be compiled for a specific Python minor version. An alternative to static types is *heap-allocated types*, or *heap types* for short, which correspond closely to classes created by Python’s "class" statement. This is done by filling a "PyType_Spec" structure and calling "PyType_FromSpecWithBases()". Number Object Structures ************************ PyNumberMethods This structure holds pointers to the functions which an object uses to implement the number protocol. Each function is used by the function of similar name documented in the Number Protocol section. Here is the structure definition: typedef struct { binaryfunc nb_add; binaryfunc nb_subtract; binaryfunc nb_multiply; binaryfunc nb_remainder; binaryfunc nb_divmod; ternaryfunc nb_power; unaryfunc nb_negative; unaryfunc nb_positive; unaryfunc nb_absolute; inquiry nb_bool; unaryfunc nb_invert; binaryfunc nb_lshift; binaryfunc nb_rshift; binaryfunc nb_and; binaryfunc nb_xor; binaryfunc nb_or; unaryfunc nb_int; void *nb_reserved; unaryfunc nb_float; binaryfunc nb_inplace_add; binaryfunc nb_inplace_subtract; binaryfunc nb_inplace_multiply; binaryfunc nb_inplace_remainder; ternaryfunc nb_inplace_power; binaryfunc nb_inplace_lshift; binaryfunc nb_inplace_rshift; binaryfunc nb_inplace_and; binaryfunc nb_inplace_xor; binaryfunc nb_inplace_or; binaryfunc nb_floor_divide; binaryfunc nb_true_divide; binaryfunc nb_inplace_floor_divide; binaryfunc nb_inplace_true_divide; unaryfunc nb_index; binaryfunc nb_matrix_multiply; binaryfunc nb_inplace_matrix_multiply; } PyNumberMethods; Note: Binary and ternary functions must check the type of all their operands, and implement the necessary conversions (at least one of the operands is an instance of the defined type). If the operation is not defined for the given operands, binary and ternary functions must return "Py_NotImplemented", if another error occurred they must return "NULL" and set an exception. Note: The "nb_reserved" field should always be "NULL". It was previously called "nb_long", and was renamed in Python 3.0.1. binaryfunc PyNumberMethods.nb_add binaryfunc PyNumberMethods.nb_subtract binaryfunc PyNumberMethods.nb_multiply binaryfunc PyNumberMethods.nb_remainder binaryfunc PyNumberMethods.nb_divmod ternaryfunc PyNumberMethods.nb_power unaryfunc PyNumberMethods.nb_negative unaryfunc PyNumberMethods.nb_positive unaryfunc PyNumberMethods.nb_absolute inquiry PyNumberMethods.nb_bool unaryfunc PyNumberMethods.nb_invert binaryfunc PyNumberMethods.nb_lshift binaryfunc PyNumberMethods.nb_rshift binaryfunc PyNumberMethods.nb_and binaryfunc PyNumberMethods.nb_xor binaryfunc PyNumberMethods.nb_or unaryfunc PyNumberMethods.nb_int void *PyNumberMethods.nb_reserved unaryfunc PyNumberMethods.nb_float binaryfunc PyNumberMethods.nb_inplace_add binaryfunc PyNumberMethods.nb_inplace_subtract binaryfunc PyNumberMethods.nb_inplace_multiply binaryfunc PyNumberMethods.nb_inplace_remainder ternaryfunc PyNumberMethods.nb_inplace_power binaryfunc PyNumberMethods.nb_inplace_lshift binaryfunc PyNumberMethods.nb_inplace_rshift binaryfunc PyNumberMethods.nb_inplace_and binaryfunc PyNumberMethods.nb_inplace_xor binaryfunc PyNumberMethods.nb_inplace_or binaryfunc PyNumberMethods.nb_floor_divide binaryfunc PyNumberMethods.nb_true_divide binaryfunc PyNumberMethods.nb_inplace_floor_divide binaryfunc PyNumberMethods.nb_inplace_true_divide unaryfunc PyNumberMethods.nb_index binaryfunc PyNumberMethods.nb_matrix_multiply binaryfunc PyNumberMethods.nb_inplace_matrix_multiply Mapping Object Structures ************************* PyMappingMethods This structure holds pointers to the functions which an object uses to implement the mapping protocol. It has three members: lenfunc PyMappingMethods.mp_length This function is used by "PyMapping_Size()" and "PyObject_Size()", and has the same signature. This slot may be set to "NULL" if the object has no defined length. binaryfunc PyMappingMethods.mp_subscript This function is used by "PyObject_GetItem()" and "PySequence_GetSlice()", and has the same signature as "PyObject_GetItem()". This slot must be filled for the "PyMapping_Check()" function to return "1", it can be "NULL" otherwise. objobjargproc PyMappingMethods.mp_ass_subscript This function is used by "PyObject_SetItem()", "PyObject_DelItem()", "PyObject_SetSlice()" and "PyObject_DelSlice()". It has the same signature as "PyObject_SetItem()", but *v* can also be set to "NULL" to delete an item. If this slot is "NULL", the object does not support item assignment and deletion. Sequence Object Structures ************************** PySequenceMethods This structure holds pointers to the functions which an object uses to implement the sequence protocol. lenfunc PySequenceMethods.sq_length This function is used by "PySequence_Size()" and "PyObject_Size()", and has the same signature. It is also used for handling negative indices via the "sq_item" and the "sq_ass_item" slots. binaryfunc PySequenceMethods.sq_concat This function is used by "PySequence_Concat()" and has the same signature. It is also used by the "+" operator, after trying the numeric addition via the "nb_add" slot. ssizeargfunc PySequenceMethods.sq_repeat This function is used by "PySequence_Repeat()" and has the same signature. It is also used by the "*" operator, after trying numeric multiplication via the "nb_multiply" slot. ssizeargfunc PySequenceMethods.sq_item This function is used by "PySequence_GetItem()" and has the same signature. It is also used by "PyObject_GetItem()", after trying the subscription via the "mp_subscript" slot. This slot must be filled for the "PySequence_Check()" function to return "1", it can be "NULL" otherwise. Negative indexes are handled as follows: if the "sq_length" slot is filled, it is called and the sequence length is used to compute a positive index which is passed to "sq_item". If "sq_length" is "NULL", the index is passed as is to the function. ssizeobjargproc PySequenceMethods.sq_ass_item This function is used by "PySequence_SetItem()" and has the same signature. It is also used by "PyObject_SetItem()" and "PyObject_DelItem()", after trying the item assignment and deletion via the "mp_ass_subscript" slot. This slot may be left to "NULL" if the object does not support item assignment and deletion. objobjproc PySequenceMethods.sq_contains This function may be used by "PySequence_Contains()" and has the same signature. This slot may be left to "NULL", in this case "PySequence_Contains()" simply traverses the sequence until it finds a match. binaryfunc PySequenceMethods.sq_inplace_concat This function is used by "PySequence_InPlaceConcat()" and has the same signature. It should modify its first operand, and return it. This slot may be left to "NULL", in this case "PySequence_InPlaceConcat()" will fall back to "PySequence_Concat()". It is also used by the augmented assignment "+=", after trying numeric in-place addition via the "nb_inplace_add" slot. ssizeargfunc PySequenceMethods.sq_inplace_repeat This function is used by "PySequence_InPlaceRepeat()" and has the same signature. It should modify its first operand, and return it. This slot may be left to "NULL", in this case "PySequence_InPlaceRepeat()" will fall back to "PySequence_Repeat()". It is also used by the augmented assignment "*=", after trying numeric in-place multiplication via the "nb_inplace_multiply" slot. Buffer Object Structures ************************ PyBufferProcs This structure holds pointers to the functions required by the Buffer protocol. The protocol defines how an exporter object can expose its internal data to consumer objects. getbufferproc PyBufferProcs.bf_getbuffer The signature of this function is: int (PyObject *exporter, Py_buffer *view, int flags); Handle a request to *exporter* to fill in *view* as specified by *flags*. Except for point (3), an implementation of this function MUST take these steps: 1. Check if the request can be met. If not, raise "PyExc_BufferError", set "view->obj" to "NULL" and return "-1". 2. Fill in the requested fields. 3. Increment an internal counter for the number of exports. 4. Set "view->obj" to *exporter* and increment "view->obj". 5. Return "0". If *exporter* is part of a chain or tree of buffer providers, two main schemes can be used: * Re-export: Each member of the tree acts as the exporting object and sets "view->obj" to a new reference to itself. * Redirect: The buffer request is redirected to the root object of the tree. Here, "view->obj" will be a new reference to the root object. The individual fields of *view* are described in section Buffer structure, the rules how an exporter must react to specific requests are in section Buffer request types. All memory pointed to in the "Py_buffer" structure belongs to the exporter and must remain valid until there are no consumers left. "format", "shape", "strides", "suboffsets" and "internal" are read- only for the consumer. "PyBuffer_FillInfo()" provides an easy way of exposing a simple bytes buffer while dealing correctly with all request types. "PyObject_GetBuffer()" is the interface for the consumer that wraps this function. releasebufferproc PyBufferProcs.bf_releasebuffer The signature of this function is: void (PyObject *exporter, Py_buffer *view); Handle a request to release the resources of the buffer. If no resources need to be released, "PyBufferProcs.bf_releasebuffer" may be "NULL". Otherwise, a standard implementation of this function will take these optional steps: 1. Decrement an internal counter for the number of exports. 2. If the counter is "0", free all memory associated with *view*. The exporter MUST use the "internal" field to keep track of buffer- specific resources. This field is guaranteed to remain constant, while a consumer MAY pass a copy of the original buffer as the *view* argument. This function MUST NOT decrement "view->obj", since that is done automatically in "PyBuffer_Release()" (this scheme is useful for breaking reference cycles). "PyBuffer_Release()" is the interface for the consumer that wraps this function. Async Object Structures *********************** New in version 3.5. PyAsyncMethods This structure holds pointers to the functions required to implement *awaitable* and *asynchronous iterator* objects. Here is the structure definition: typedef struct { unaryfunc am_await; unaryfunc am_aiter; unaryfunc am_anext; } PyAsyncMethods; unaryfunc PyAsyncMethods.am_await The signature of this function is: PyObject *am_await(PyObject *self); The returned object must be an iterator, i.e. "PyIter_Check()" must return "1" for it. This slot may be set to "NULL" if an object is not an *awaitable*. unaryfunc PyAsyncMethods.am_aiter The signature of this function is: PyObject *am_aiter(PyObject *self); Must return an *asynchronous iterator* object. See "__anext__()" for details. This slot may be set to "NULL" if an object does not implement asynchronous iteration protocol. unaryfunc PyAsyncMethods.am_anext The signature of this function is: PyObject *am_anext(PyObject *self); Must return an *awaitable* object. See "__anext__()" for details. This slot may be set to "NULL". Slot Type typedefs ****************** PyObject *(*allocfunc)(PyTypeObject *cls, Py_ssize_t nitems) The purpose of this function is to separate memory allocation from memory initialization. It should return a pointer to a block of memory of adequate length for the instance, suitably aligned, and initialized to zeros, but with "ob_refcnt" set to "1" and "ob_type" set to the type argument. If the type’s "tp_itemsize" is non-zero, the object’s "ob_size" field should be initialized to *nitems* and the length of the allocated memory block should be "tp_basicsize + nitems*tp_itemsize", rounded up to a multiple of "sizeof(void*)"; otherwise, *nitems* is not used and the length of the block should be "tp_basicsize". This function should not do any other instance initialization, not even to allocate additional memory; that should be done by "tp_new". void (*destructor)(PyObject *) void (*freefunc)(void *) See "tp_free". PyObject *(*newfunc)(PyObject *, PyObject *, PyObject *) See "tp_new". int (*initproc)(PyObject *, PyObject *, PyObject *) See "tp_init". PyObject *(*reprfunc)(PyObject *) See "tp_repr". PyObject *(*getattrfunc)(PyObject *self, char *attr) Return the value of the named attribute for the object. int (*setattrfunc)(PyObject *self, char *attr, PyObject *value) Set the value of the named attribute for the object. The value argument is set to "NULL" to delete the attribute. PyObject *(*getattrofunc)(PyObject *self, PyObject *attr) Return the value of the named attribute for the object. See "tp_getattro". int (*setattrofunc)(PyObject *self, PyObject *attr, PyObject *value) Set the value of the named attribute for the object. The value argument is set to "NULL" to delete the attribute. See "tp_setattro". PyObject *(*descrgetfunc)(PyObject *, PyObject *, PyObject *) See "tp_descrget". int (*descrsetfunc)(PyObject *, PyObject *, PyObject *) See "tp_descrset". Py_hash_t (*hashfunc)(PyObject *) See "tp_hash". PyObject *(*richcmpfunc)(PyObject *, PyObject *, int) See "tp_richcompare". PyObject *(*getiterfunc)(PyObject *) See "tp_iter". PyObject *(*iternextfunc)(PyObject *) See "tp_iternext". Py_ssize_t (*lenfunc)(PyObject *) int (*getbufferproc)(PyObject *, Py_buffer *, int) void (*releasebufferproc)(PyObject *, Py_buffer *) PyObject *(*unaryfunc)(PyObject *) PyObject *(*binaryfunc)(PyObject *, PyObject *) PyObject *(*ternaryfunc)(PyObject *, PyObject *, PyObject *) PyObject *(*ssizeargfunc)(PyObject *, Py_ssize_t) int (*ssizeobjargproc)(PyObject *, Py_ssize_t) int (*objobjproc)(PyObject *, PyObject *) int (*objobjargproc)(PyObject *, PyObject *, PyObject *) Examples ******** The following are simple examples of Python type definitions. They include common usage you may encounter. Some demonstrate tricky corner cases. For more examples, practical info, and a tutorial, see Defining Extension Types: Tutorial and Defining Extension Types: Assorted Topics. A basic static type: typedef struct { PyObject_HEAD const char *data; } MyObject; static PyTypeObject MyObject_Type = { PyVarObject_HEAD_INIT(NULL, 0) .tp_name = "mymod.MyObject", .tp_basicsize = sizeof(MyObject), .tp_doc = PyDoc_STR("My objects"), .tp_new = myobj_new, .tp_dealloc = (destructor)myobj_dealloc, .tp_repr = (reprfunc)myobj_repr, }; You may also find older code (especially in the CPython code base) with a more verbose initializer: static PyTypeObject MyObject_Type = { PyVarObject_HEAD_INIT(NULL, 0) "mymod.MyObject", /* tp_name */ sizeof(MyObject), /* tp_basicsize */ 0, /* tp_itemsize */ (destructor)myobj_dealloc, /* tp_dealloc */ 0, /* tp_vectorcall_offset */ 0, /* tp_getattr */ 0, /* tp_setattr */ 0, /* tp_as_async */ (reprfunc)myobj_repr, /* tp_repr */ 0, /* tp_as_number */ 0, /* tp_as_sequence */ 0, /* tp_as_mapping */ 0, /* tp_hash */ 0, /* tp_call */ 0, /* tp_str */ 0, /* tp_getattro */ 0, /* tp_setattro */ 0, /* tp_as_buffer */ 0, /* tp_flags */ PyDoc_STR("My objects"), /* tp_doc */ 0, /* tp_traverse */ 0, /* tp_clear */ 0, /* tp_richcompare */ 0, /* tp_weaklistoffset */ 0, /* tp_iter */ 0, /* tp_iternext */ 0, /* tp_methods */ 0, /* tp_members */ 0, /* tp_getset */ 0, /* tp_base */ 0, /* tp_dict */ 0, /* tp_descr_get */ 0, /* tp_descr_set */ 0, /* tp_dictoffset */ 0, /* tp_init */ 0, /* tp_alloc */ myobj_new, /* tp_new */ }; A type that supports weakrefs, instance dicts, and hashing: typedef struct { PyObject_HEAD const char *data; PyObject *inst_dict; PyObject *weakreflist; } MyObject; static PyTypeObject MyObject_Type = { PyVarObject_HEAD_INIT(NULL, 0) .tp_name = "mymod.MyObject", .tp_basicsize = sizeof(MyObject), .tp_doc = PyDoc_STR("My objects"), .tp_weaklistoffset = offsetof(MyObject, weakreflist), .tp_dictoffset = offsetof(MyObject, inst_dict), .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC, .tp_new = myobj_new, .tp_traverse = (traverseproc)myobj_traverse, .tp_clear = (inquiry)myobj_clear, .tp_alloc = PyType_GenericNew, .tp_dealloc = (destructor)myobj_dealloc, .tp_repr = (reprfunc)myobj_repr, .tp_hash = (hashfunc)myobj_hash, .tp_richcompare = PyBaseObject_Type.tp_richcompare, }; A str subclass that cannot be subclassed and cannot be called to create instances (e.g. uses a separate factory func): typedef struct { PyUnicodeObject raw; char *extra; } MyStr; static PyTypeObject MyStr_Type = { PyVarObject_HEAD_INIT(NULL, 0) .tp_name = "mymod.MyStr", .tp_basicsize = sizeof(MyStr), .tp_base = NULL, // set to &PyUnicode_Type in module init .tp_doc = PyDoc_STR("my custom str"), .tp_flags = Py_TPFLAGS_DEFAULT, .tp_new = NULL, .tp_repr = (reprfunc)myobj_repr, }; The simplest static type (with fixed-length instances): typedef struct { PyObject_HEAD } MyObject; static PyTypeObject MyObject_Type = { PyVarObject_HEAD_INIT(NULL, 0) .tp_name = "mymod.MyObject", }; The simplest static type (with variable-length instances): typedef struct { PyObject_VAR_HEAD const char *data[1]; } MyObject; static PyTypeObject MyObject_Type = { PyVarObject_HEAD_INIT(NULL, 0) .tp_name = "mymod.MyObject", .tp_basicsize = sizeof(MyObject) - sizeof(char *), .tp_itemsize = sizeof(char *), }; Unicode Objects and Codecs ************************** Unicode Objects =============== Since the implementation of **PEP 393** in Python 3.3, Unicode objects internally use a variety of representations, in order to allow handling the complete range of Unicode characters while staying memory efficient. There are special cases for strings where all code points are below 128, 256, or 65536; otherwise, code points must be below 1114112 (which is the full Unicode range). "Py_UNICODE*" and UTF-8 representations are created on demand and cached in the Unicode object. The "Py_UNICODE*" representation is deprecated and inefficient. Due to the transition between the old APIs and the new APIs, Unicode objects can internally be in two states depending on how they were created: * “canonical” Unicode objects are all objects created by a non- deprecated Unicode API. They use the most efficient representation allowed by the implementation. * “legacy” Unicode objects have been created through one of the deprecated APIs (typically "PyUnicode_FromUnicode()") and only bear the "Py_UNICODE*" representation; you will have to call "PyUnicode_READY()" on them before calling any other API. Note: The “legacy” Unicode object will be removed in Python 3.12 with deprecated APIs. All Unicode objects will be “canonical” since then. See **PEP 623** for more information. Unicode Type ------------ These are the basic Unicode object types used for the Unicode implementation in Python: Py_UCS4 Py_UCS2 Py_UCS1 These types are typedefs for unsigned integer types wide enough to contain characters of 32 bits, 16 bits and 8 bits, respectively. When dealing with single Unicode characters, use "Py_UCS4". New in version 3.3. Py_UNICODE This is a typedef of "wchar_t", which is a 16-bit type or 32-bit type depending on the platform. Changed in version 3.3: In previous versions, this was a 16-bit type or a 32-bit type depending on whether you selected a “narrow” or “wide” Unicode version of Python at build time. PyASCIIObject PyCompactUnicodeObject PyUnicodeObject These subtypes of "PyObject" represent a Python Unicode object. In almost all cases, they shouldn’t be used directly, since all API functions that deal with Unicode objects take and return "PyObject" pointers. New in version 3.3. PyTypeObject PyUnicode_Type This instance of "PyTypeObject" represents the Python Unicode type. It is exposed to Python code as "str". The following APIs are really C macros and can be used to do fast checks and to access internal read-only data of Unicode objects: int PyUnicode_Check(PyObject *o) Return true if the object *o* is a Unicode object or an instance of a Unicode subtype. This function always succeeds. int PyUnicode_CheckExact(PyObject *o) Return true if the object *o* is a Unicode object, but not an instance of a subtype. This function always succeeds. int PyUnicode_READY(PyObject *o) Ensure the string object *o* is in the “canonical” representation. This is required before using any of the access macros described below. Returns "0" on success and "-1" with an exception set on failure, which in particular happens if memory allocation fails. New in version 3.3. Deprecated since version 3.10, will be removed in version 3.12: This API will be removed with "PyUnicode_FromUnicode()". Py_ssize_t PyUnicode_GET_LENGTH(PyObject *o) Return the length of the Unicode string, in code points. *o* has to be a Unicode object in the “canonical” representation (not checked). New in version 3.3. Py_UCS1* PyUnicode_1BYTE_DATA(PyObject *o) Py_UCS2* PyUnicode_2BYTE_DATA(PyObject *o) Py_UCS4* PyUnicode_4BYTE_DATA(PyObject *o) Return a pointer to the canonical representation cast to UCS1, UCS2 or UCS4 integer types for direct character access. No checks are performed if the canonical representation has the correct character size; use "PyUnicode_KIND()" to select the right macro. Make sure "PyUnicode_READY()" has been called before accessing this. New in version 3.3. PyUnicode_WCHAR_KIND PyUnicode_1BYTE_KIND PyUnicode_2BYTE_KIND PyUnicode_4BYTE_KIND Return values of the "PyUnicode_KIND()" macro. New in version 3.3. Deprecated since version 3.10, will be removed in version 3.12: "PyUnicode_WCHAR_KIND" is deprecated. unsigned int PyUnicode_KIND(PyObject *o) Return one of the PyUnicode kind constants (see above) that indicate how many bytes per character this Unicode object uses to store its data. *o* has to be a Unicode object in the “canonical” representation (not checked). New in version 3.3. void* PyUnicode_DATA(PyObject *o) Return a void pointer to the raw Unicode buffer. *o* has to be a Unicode object in the “canonical” representation (not checked). New in version 3.3. void PyUnicode_WRITE(int kind, void *data, Py_ssize_t index, Py_UCS4 value) Write into a canonical representation *data* (as obtained with "PyUnicode_DATA()"). This macro does not do any sanity checks and is intended for usage in loops. The caller should cache the *kind* value and *data* pointer as obtained from other macro calls. *index* is the index in the string (starts at 0) and *value* is the new code point value which should be written to that location. New in version 3.3. Py_UCS4 PyUnicode_READ(int kind, void *data, Py_ssize_t index) Read a code point from a canonical representation *data* (as obtained with "PyUnicode_DATA()"). No checks or ready calls are performed. New in version 3.3. Py_UCS4 PyUnicode_READ_CHAR(PyObject *o, Py_ssize_t index) Read a character from a Unicode object *o*, which must be in the “canonical” representation. This is less efficient than "PyUnicode_READ()" if you do multiple consecutive reads. New in version 3.3. PyUnicode_MAX_CHAR_VALUE(o) Return the maximum code point that is suitable for creating another string based on *o*, which must be in the “canonical” representation. This is always an approximation but more efficient than iterating over the string. New in version 3.3. Py_ssize_t PyUnicode_GET_SIZE(PyObject *o) Return the size of the deprecated "Py_UNICODE" representation, in code units (this includes surrogate pairs as 2 units). *o* has to be a Unicode object (not checked). Deprecated since version 3.3, will be removed in version 3.12: Part of the old-style Unicode API, please migrate to using "PyUnicode_GET_LENGTH()". Py_ssize_t PyUnicode_GET_DATA_SIZE(PyObject *o) Return the size of the deprecated "Py_UNICODE" representation in bytes. *o* has to be a Unicode object (not checked). Deprecated since version 3.3, will be removed in version 3.12: Part of the old-style Unicode API, please migrate to using "PyUnicode_GET_LENGTH()". Py_UNICODE* PyUnicode_AS_UNICODE(PyObject *o) const char* PyUnicode_AS_DATA(PyObject *o) Return a pointer to a "Py_UNICODE" representation of the object. The returned buffer is always terminated with an extra null code point. It may also contain embedded null code points, which would cause the string to be truncated when used in most C functions. The "AS_DATA" form casts the pointer to "const char *". The *o* argument has to be a Unicode object (not checked). Changed in version 3.3: This macro is now inefficient – because in many cases the "Py_UNICODE" representation does not exist and needs to be created – and can fail (return "NULL" with an exception set). Try to port the code to use the new "PyUnicode_nBYTE_DATA()" macros or use "PyUnicode_WRITE()" or "PyUnicode_READ()". Deprecated since version 3.3, will be removed in version 3.12: Part of the old-style Unicode API, please migrate to using the "PyUnicode_nBYTE_DATA()" family of macros. int PyUnicode_IsIdentifier(PyObject *o) Return "1" if the string is a valid identifier according to the language definition, section Identifiers and keywords. Return "0" otherwise. Changed in version 3.9: The function does not call "Py_FatalError()" anymore if the string is not ready. Unicode Character Properties ---------------------------- Unicode provides many different character properties. The most often needed ones are available through these macros which are mapped to C functions depending on the Python configuration. int Py_UNICODE_ISSPACE(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is a whitespace character. int Py_UNICODE_ISLOWER(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is a lowercase character. int Py_UNICODE_ISUPPER(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is an uppercase character. int Py_UNICODE_ISTITLE(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is a titlecase character. int Py_UNICODE_ISLINEBREAK(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is a linebreak character. int Py_UNICODE_ISDECIMAL(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is a decimal character. int Py_UNICODE_ISDIGIT(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is a digit character. int Py_UNICODE_ISNUMERIC(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is a numeric character. int Py_UNICODE_ISALPHA(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is an alphabetic character. int Py_UNICODE_ISALNUM(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is an alphanumeric character. int Py_UNICODE_ISPRINTABLE(Py_UCS4 ch) Return "1" or "0" depending on whether *ch* is a printable character. Nonprintable characters are those characters defined in the Unicode character database as “Other” or “Separator”, excepting the ASCII space (0x20) which is considered printable. (Note that printable characters in this context are those which should not be escaped when "repr()" is invoked on a string. It has no bearing on the handling of strings written to "sys.stdout" or "sys.stderr".) These APIs can be used for fast direct character conversions: Py_UCS4 Py_UNICODE_TOLOWER(Py_UCS4 ch) Return the character *ch* converted to lower case. Deprecated since version 3.3: This function uses simple case mappings. Py_UCS4 Py_UNICODE_TOUPPER(Py_UCS4 ch) Return the character *ch* converted to upper case. Deprecated since version 3.3: This function uses simple case mappings. Py_UCS4 Py_UNICODE_TOTITLE(Py_UCS4 ch) Return the character *ch* converted to title case. Deprecated since version 3.3: This function uses simple case mappings. int Py_UNICODE_TODECIMAL(Py_UCS4 ch) Return the character *ch* converted to a decimal positive integer. Return "-1" if this is not possible. This macro does not raise exceptions. int Py_UNICODE_TODIGIT(Py_UCS4 ch) Return the character *ch* converted to a single digit integer. Return "-1" if this is not possible. This macro does not raise exceptions. double Py_UNICODE_TONUMERIC(Py_UCS4 ch) Return the character *ch* converted to a double. Return "-1.0" if this is not possible. This macro does not raise exceptions. These APIs can be used to work with surrogates: Py_UNICODE_IS_SURROGATE(ch) Check if *ch* is a surrogate ("0xD800 <= ch <= 0xDFFF"). Py_UNICODE_IS_HIGH_SURROGATE(ch) Check if *ch* is a high surrogate ("0xD800 <= ch <= 0xDBFF"). Py_UNICODE_IS_LOW_SURROGATE(ch) Check if *ch* is a low surrogate ("0xDC00 <= ch <= 0xDFFF"). Py_UNICODE_JOIN_SURROGATES(high, low) Join two surrogate characters and return a single Py_UCS4 value. *high* and *low* are respectively the leading and trailing surrogates in a surrogate pair. Creating and accessing Unicode strings -------------------------------------- To create Unicode objects and access their basic sequence properties, use these APIs: PyObject* PyUnicode_New(Py_ssize_t size, Py_UCS4 maxchar) *Return value: New reference.* Create a new Unicode object. *maxchar* should be the true maximum code point to be placed in the string. As an approximation, it can be rounded up to the nearest value in the sequence 127, 255, 65535, 1114111. This is the recommended way to allocate a new Unicode object. Objects created using this function are not resizable. New in version 3.3. PyObject* PyUnicode_FromKindAndData(int kind, const void *buffer, Py_ssize_t size) *Return value: New reference.* Create a new Unicode object with the given *kind* (possible values are "PyUnicode_1BYTE_KIND" etc., as returned by "PyUnicode_KIND()"). The *buffer* must point to an array of *size* units of 1, 2 or 4 bytes per character, as given by the kind. New in version 3.3. PyObject* PyUnicode_FromStringAndSize(const char *u, Py_ssize_t size) *Return value: New reference.* Create a Unicode object from the char buffer *u*. The bytes will be interpreted as being UTF-8 encoded. The buffer is copied into the new object. If the buffer is not "NULL", the return value might be a shared object, i.e. modification of the data is not allowed. If *u* is "NULL", this function behaves like "PyUnicode_FromUnicode()" with the buffer set to "NULL". This usage is deprecated in favor of "PyUnicode_New()", and will be removed in Python 3.12. PyObject *PyUnicode_FromString(const char *u) *Return value: New reference.* Create a Unicode object from a UTF-8 encoded null-terminated char buffer *u*. PyObject* PyUnicode_FromFormat(const char *format, ...) *Return value: New reference.* Take a C "printf()"-style *format* string and a variable number of arguments, calculate the size of the resulting Python Unicode string and return a string with the values formatted into it. The variable arguments must be C types and must correspond exactly to the format characters in the *format* ASCII-encoded string. The following format characters are allowed: +---------------------+-----------------------+------------------------------------+ | Format Characters | Type | Comment | |=====================|=======================|====================================| | "%%" | *n/a* | The literal % character. | +---------------------+-----------------------+------------------------------------+ | "%c" | int | A single character, represented as | | | | a C int. | +---------------------+-----------------------+------------------------------------+ | "%d" | int | Equivalent to "printf("%d")". [1] | +---------------------+-----------------------+------------------------------------+ | "%u" | unsigned int | Equivalent to "printf("%u")". [1] | +---------------------+-----------------------+------------------------------------+ | "%ld" | long | Equivalent to "printf("%ld")". [1] | +---------------------+-----------------------+------------------------------------+ | "%li" | long | Equivalent to "printf("%li")". [1] | +---------------------+-----------------------+------------------------------------+ | "%lu" | unsigned long | Equivalent to "printf("%lu")". [1] | +---------------------+-----------------------+------------------------------------+ | "%lld" | long long | Equivalent to "printf("%lld")". | | | | [1] | +---------------------+-----------------------+------------------------------------+ | "%lli" | long long | Equivalent to "printf("%lli")". | | | | [1] | +---------------------+-----------------------+------------------------------------+ | "%llu" | unsigned long long | Equivalent to "printf("%llu")". | | | | [1] | +---------------------+-----------------------+------------------------------------+ | "%zd" | "Py_ssize_t" | Equivalent to "printf("%zd")". [1] | +---------------------+-----------------------+------------------------------------+ | "%zi" | "Py_ssize_t" | Equivalent to "printf("%zi")". [1] | +---------------------+-----------------------+------------------------------------+ | "%zu" | size_t | Equivalent to "printf("%zu")". [1] | +---------------------+-----------------------+------------------------------------+ | "%i" | int | Equivalent to "printf("%i")". [1] | +---------------------+-----------------------+------------------------------------+ | "%x" | int | Equivalent to "printf("%x")". [1] | +---------------------+-----------------------+------------------------------------+ | "%s" | const char* | A null-terminated C character | | | | array. | +---------------------+-----------------------+------------------------------------+ | "%p" | const void* | The hex representation of a C | | | | pointer. Mostly equivalent to | | | | "printf("%p")" except that it is | | | | guaranteed to start with the | | | | literal "0x" regardless of what | | | | the platform’s "printf" yields. | +---------------------+-----------------------+------------------------------------+ | "%A" | PyObject* | The result of calling "ascii()". | +---------------------+-----------------------+------------------------------------+ | "%U" | PyObject* | A Unicode object. | +---------------------+-----------------------+------------------------------------+ | "%V" | PyObject*, const | A Unicode object (which may be | | | char* | "NULL") and a null-terminated C | | | | character array as a second | | | | parameter (which will be used, if | | | | the first parameter is "NULL"). | +---------------------+-----------------------+------------------------------------+ | "%S" | PyObject* | The result of calling | | | | "PyObject_Str()". | +---------------------+-----------------------+------------------------------------+ | "%R" | PyObject* | The result of calling | | | | "PyObject_Repr()". | +---------------------+-----------------------+------------------------------------+ An unrecognized format character causes all the rest of the format string to be copied as-is to the result string, and any extra arguments discarded. Note: The width formatter unit is number of characters rather than bytes. The precision formatter unit is number of bytes for ""%s"" and ""%V"" (if the "PyObject*" argument is "NULL"), and a number of characters for ""%A"", ""%U"", ""%S"", ""%R"" and ""%V"" (if the "PyObject*" argument is not "NULL"). [1] For integer specifiers (d, u, ld, li, lu, lld, lli, llu, zd, zi, zu, i, x): the 0-conversion flag has effect even when a precision is given. Changed in version 3.2: Support for ""%lld"" and ""%llu"" added. Changed in version 3.3: Support for ""%li"", ""%lli"" and ""%zi"" added. Changed in version 3.4: Support width and precision formatter for ""%s"", ""%A"", ""%U"", ""%V"", ""%S"", ""%R"" added. PyObject* PyUnicode_FromFormatV(const char *format, va_list vargs) *Return value: New reference.* Identical to "PyUnicode_FromFormat()" except that it takes exactly two arguments. PyObject* PyUnicode_FromEncodedObject(PyObject *obj, const char *encoding, const char *errors) *Return value: New reference.* Decode an encoded object *obj* to a Unicode object. "bytes", "bytearray" and other *bytes-like objects* are decoded according to the given *encoding* and using the error handling defined by *errors*. Both can be "NULL" to have the interface use the default values (see Built-in Codecs for details). All other objects, including Unicode objects, cause a "TypeError" to be set. The API returns "NULL" if there was an error. The caller is responsible for decref’ing the returned objects. Py_ssize_t PyUnicode_GetLength(PyObject *unicode) Return the length of the Unicode object, in code points. New in version 3.3. Py_ssize_t PyUnicode_CopyCharacters(PyObject *to, Py_ssize_t to_start, PyObject *from, Py_ssize_t from_start, Py_ssize_t how_many) Copy characters from one Unicode object into another. This function performs character conversion when necessary and falls back to "memcpy()" if possible. Returns "-1" and sets an exception on error, otherwise returns the number of copied characters. New in version 3.3. Py_ssize_t PyUnicode_Fill(PyObject *unicode, Py_ssize_t start, Py_ssize_t length, Py_UCS4 fill_char) Fill a string with a character: write *fill_char* into "unicode[start:start+length]". Fail if *fill_char* is bigger than the string maximum character, or if the string has more than 1 reference. Return the number of written character, or return "-1" and raise an exception on error. New in version 3.3. int PyUnicode_WriteChar(PyObject *unicode, Py_ssize_t index, Py_UCS4 character) Write a character to a string. The string must have been created through "PyUnicode_New()". Since Unicode strings are supposed to be immutable, the string must not be shared, or have been hashed yet. This function checks that *unicode* is a Unicode object, that the index is not out of bounds, and that the object can be modified safely (i.e. that it its reference count is one). New in version 3.3. Py_UCS4 PyUnicode_ReadChar(PyObject *unicode, Py_ssize_t index) Read a character from a string. This function checks that *unicode* is a Unicode object and the index is not out of bounds, in contrast to the macro version "PyUnicode_READ_CHAR()". New in version 3.3. PyObject* PyUnicode_Substring(PyObject *str, Py_ssize_t start, Py_ssize_t end) *Return value: New reference.* Return a substring of *str*, from character index *start* (included) to character index *end* (excluded). Negative indices are not supported. New in version 3.3. Py_UCS4* PyUnicode_AsUCS4(PyObject *u, Py_UCS4 *buffer, Py_ssize_t buflen, int copy_null) Copy the string *u* into a UCS4 buffer, including a null character, if *copy_null* is set. Returns "NULL" and sets an exception on error (in particular, a "SystemError" if *buflen* is smaller than the length of *u*). *buffer* is returned on success. New in version 3.3. Py_UCS4* PyUnicode_AsUCS4Copy(PyObject *u) Copy the string *u* into a new UCS4 buffer that is allocated using "PyMem_Malloc()". If this fails, "NULL" is returned with a "MemoryError" set. The returned buffer always has an extra null code point appended. New in version 3.3. Deprecated Py_UNICODE APIs -------------------------- Deprecated since version 3.3, will be removed in version 3.12. These API functions are deprecated with the implementation of **PEP 393**. Extension modules can continue using them, as they will not be removed in Python 3.x, but need to be aware that their use can now cause performance and memory hits. PyObject* PyUnicode_FromUnicode(const Py_UNICODE *u, Py_ssize_t size) *Return value: New reference.* Create a Unicode object from the Py_UNICODE buffer *u* of the given size. *u* may be "NULL" which causes the contents to be undefined. It is the user’s responsibility to fill in the needed data. The buffer is copied into the new object. If the buffer is not "NULL", the return value might be a shared object. Therefore, modification of the resulting Unicode object is only allowed when *u* is "NULL". If the buffer is "NULL", "PyUnicode_READY()" must be called once the string content has been filled before using any of the access macros such as "PyUnicode_KIND()". Deprecated since version 3.3, will be removed in version 3.12: Part of the old-style Unicode API, please migrate to using "PyUnicode_FromKindAndData()", "PyUnicode_FromWideChar()", or "PyUnicode_New()". Py_UNICODE* PyUnicode_AsUnicode(PyObject *unicode) Return a read-only pointer to the Unicode object’s internal "Py_UNICODE" buffer, or "NULL" on error. This will create the "Py_UNICODE*" representation of the object if it is not yet available. The buffer is always terminated with an extra null code point. Note that the resulting "Py_UNICODE" string may also contain embedded null code points, which would cause the string to be truncated when used in most C functions. Deprecated since version 3.3, will be removed in version 3.12: Part of the old-style Unicode API, please migrate to using "PyUnicode_AsUCS4()", "PyUnicode_AsWideChar()", "PyUnicode_ReadChar()" or similar new APIs. Deprecated since version 3.3, will be removed in version 3.10. PyObject* PyUnicode_TransformDecimalToASCII(Py_UNICODE *s, Py_ssize_t size) *Return value: New reference.* Create a Unicode object by replacing all decimal digits in "Py_UNICODE" buffer of the given *size* by ASCII digits 0–9 according to their decimal value. Return "NULL" if an exception occurs. Deprecated since version 3.3, will be removed in version 3.11: Part of the old-style "Py_UNICODE" API; please migrate to using "Py_UNICODE_TODECIMAL()". Py_UNICODE* PyUnicode_AsUnicodeAndSize(PyObject *unicode, Py_ssize_t *size) Like "PyUnicode_AsUnicode()", but also saves the "Py_UNICODE()" array length (excluding the extra null terminator) in *size*. Note that the resulting "Py_UNICODE*" string may contain embedded null code points, which would cause the string to be truncated when used in most C functions. New in version 3.3. Deprecated since version 3.3, will be removed in version 3.12: Part of the old-style Unicode API, please migrate to using "PyUnicode_AsUCS4()", "PyUnicode_AsWideChar()", "PyUnicode_ReadChar()" or similar new APIs. Py_UNICODE* PyUnicode_AsUnicodeCopy(PyObject *unicode) Create a copy of a Unicode string ending with a null code point. Return "NULL" and raise a "MemoryError" exception on memory allocation failure, otherwise return a new allocated buffer (use "PyMem_Free()" to free the buffer). Note that the resulting "Py_UNICODE*" string may contain embedded null code points, which would cause the string to be truncated when used in most C functions. New in version 3.2. Please migrate to using "PyUnicode_AsUCS4Copy()" or similar new APIs. Py_ssize_t PyUnicode_GetSize(PyObject *unicode) Return the size of the deprecated "Py_UNICODE" representation, in code units (this includes surrogate pairs as 2 units). Deprecated since version 3.3, will be removed in version 3.12: Part of the old-style Unicode API, please migrate to using "PyUnicode_GET_LENGTH()". PyObject* PyUnicode_FromObject(PyObject *obj) *Return value: New reference.* Copy an instance of a Unicode subtype to a new true Unicode object if necessary. If *obj* is already a true Unicode object (not a subtype), return the reference with incremented refcount. Objects other than Unicode or its subtypes will cause a "TypeError". Locale Encoding --------------- The current locale encoding can be used to decode text from the operating system. PyObject* PyUnicode_DecodeLocaleAndSize(const char *str, Py_ssize_t len, const char *errors) *Return value: New reference.* Decode a string from UTF-8 on Android and VxWorks, or from the current locale encoding on other platforms. The supported error handlers are ""strict"" and ""surrogateescape"" (**PEP 383**). The decoder uses ""strict"" error handler if *errors* is "NULL". *str* must end with a null character but cannot contain embedded null characters. Use "PyUnicode_DecodeFSDefaultAndSize()" to decode a string from "Py_FileSystemDefaultEncoding" (the locale encoding read at Python startup). This function ignores the Python UTF-8 mode. See also: The "Py_DecodeLocale()" function. New in version 3.3. Changed in version 3.7: The function now also uses the current locale encoding for the "surrogateescape" error handler, except on Android. Previously, "Py_DecodeLocale()" was used for the "surrogateescape", and the current locale encoding was used for "strict". PyObject* PyUnicode_DecodeLocale(const char *str, const char *errors) *Return value: New reference.* Similar to "PyUnicode_DecodeLocaleAndSize()", but compute the string length using "strlen()". New in version 3.3. PyObject* PyUnicode_EncodeLocale(PyObject *unicode, const char *errors) *Return value: New reference.* Encode a Unicode object to UTF-8 on Android and VxWorks, or to the current locale encoding on other platforms. The supported error handlers are ""strict"" and ""surrogateescape"" (**PEP 383**). The encoder uses ""strict"" error handler if *errors* is "NULL". Return a "bytes" object. *unicode* cannot contain embedded null characters. Use "PyUnicode_EncodeFSDefault()" to encode a string to "Py_FileSystemDefaultEncoding" (the locale encoding read at Python startup). This function ignores the Python UTF-8 mode. See also: The "Py_EncodeLocale()" function. New in version 3.3. Changed in version 3.7: The function now also uses the current locale encoding for the "surrogateescape" error handler, except on Android. Previously, "Py_EncodeLocale()" was used for the "surrogateescape", and the current locale encoding was used for "strict". File System Encoding -------------------- To encode and decode file names and other environment strings, "Py_FileSystemDefaultEncoding" should be used as the encoding, and "Py_FileSystemDefaultEncodeErrors" should be used as the error handler (**PEP 383** and **PEP 529**). To encode file names to "bytes" during argument parsing, the ""O&"" converter should be used, passing "PyUnicode_FSConverter()" as the conversion function: int PyUnicode_FSConverter(PyObject* obj, void* result) ParseTuple converter: encode "str" objects – obtained directly or through the "os.PathLike" interface – to "bytes" using "PyUnicode_EncodeFSDefault()"; "bytes" objects are output as-is. *result* must be a "PyBytesObject*" which must be released when it is no longer used. New in version 3.1. Changed in version 3.6: Accepts a *path-like object*. To decode file names to "str" during argument parsing, the ""O&"" converter should be used, passing "PyUnicode_FSDecoder()" as the conversion function: int PyUnicode_FSDecoder(PyObject* obj, void* result) ParseTuple converter: decode "bytes" objects – obtained either directly or indirectly through the "os.PathLike" interface – to "str" using "PyUnicode_DecodeFSDefaultAndSize()"; "str" objects are output as-is. *result* must be a "PyUnicodeObject*" which must be released when it is no longer used. New in version 3.2. Changed in version 3.6: Accepts a *path-like object*. PyObject* PyUnicode_DecodeFSDefaultAndSize(const char *s, Py_ssize_t size) *Return value: New reference.* Decode a string using "Py_FileSystemDefaultEncoding" and the "Py_FileSystemDefaultEncodeErrors" error handler. If "Py_FileSystemDefaultEncoding" is not set, fall back to the locale encoding. "Py_FileSystemDefaultEncoding" is initialized at startup from the locale encoding and cannot be modified later. If you need to decode a string from the current locale encoding, use "PyUnicode_DecodeLocaleAndSize()". See also: The "Py_DecodeLocale()" function. Changed in version 3.6: Use "Py_FileSystemDefaultEncodeErrors" error handler. PyObject* PyUnicode_DecodeFSDefault(const char *s) *Return value: New reference.* Decode a null-terminated string using "Py_FileSystemDefaultEncoding" and the "Py_FileSystemDefaultEncodeErrors" error handler. If "Py_FileSystemDefaultEncoding" is not set, fall back to the locale encoding. Use "PyUnicode_DecodeFSDefaultAndSize()" if you know the string length. Changed in version 3.6: Use "Py_FileSystemDefaultEncodeErrors" error handler. PyObject* PyUnicode_EncodeFSDefault(PyObject *unicode) *Return value: New reference.* Encode a Unicode object to "Py_FileSystemDefaultEncoding" with the "Py_FileSystemDefaultEncodeErrors" error handler, and return "bytes". Note that the resulting "bytes" object may contain null bytes. If "Py_FileSystemDefaultEncoding" is not set, fall back to the locale encoding. "Py_FileSystemDefaultEncoding" is initialized at startup from the locale encoding and cannot be modified later. If you need to encode a string to the current locale encoding, use "PyUnicode_EncodeLocale()". See also: The "Py_EncodeLocale()" function. New in version 3.2. Changed in version 3.6: Use "Py_FileSystemDefaultEncodeErrors" error handler. wchar_t Support --------------- "wchar_t" support for platforms which support it: PyObject* PyUnicode_FromWideChar(const wchar_t *w, Py_ssize_t size) *Return value: New reference.* Create a Unicode object from the "wchar_t" buffer *w* of the given *size*. Passing "-1" as the *size* indicates that the function must itself compute the length, using wcslen. Return "NULL" on failure. Py_ssize_t PyUnicode_AsWideChar(PyObject *unicode, wchar_t *w, Py_ssize_t size) Copy the Unicode object contents into the "wchar_t" buffer *w*. At most *size* "wchar_t" characters are copied (excluding a possibly trailing null termination character). Return the number of "wchar_t" characters copied or "-1" in case of an error. Note that the resulting "wchar_t*" string may or may not be null-terminated. It is the responsibility of the caller to make sure that the "wchar_t*" string is null-terminated in case this is required by the application. Also, note that the "wchar_t*" string might contain null characters, which would cause the string to be truncated when used with most C functions. wchar_t* PyUnicode_AsWideCharString(PyObject *unicode, Py_ssize_t *size) Convert the Unicode object to a wide character string. The output string always ends with a null character. If *size* is not "NULL", write the number of wide characters (excluding the trailing null termination character) into **size*. Note that the resulting "wchar_t" string might contain null characters, which would cause the string to be truncated when used with most C functions. If *size* is "NULL" and the "wchar_t*" string contains null characters a "ValueError" is raised. Returns a buffer allocated by "PyMem_Alloc()" (use "PyMem_Free()" to free it) on success. On error, returns "NULL" and **size* is undefined. Raises a "MemoryError" if memory allocation is failed. New in version 3.2. Changed in version 3.7: Raises a "ValueError" if *size* is "NULL" and the "wchar_t*" string contains null characters. Built-in Codecs =============== Python provides a set of built-in codecs which are written in C for speed. All of these codecs are directly usable via the following functions. Many of the following APIs take two arguments encoding and errors, and they have the same semantics as the ones of the built-in "str()" string object constructor. Setting encoding to "NULL" causes the default encoding to be used which is UTF-8. The file system calls should use "PyUnicode_FSConverter()" for encoding file names. This uses the variable "Py_FileSystemDefaultEncoding" internally. This variable should be treated as read-only: on some systems, it will be a pointer to a static string, on others, it will change at run-time (such as when the application invokes setlocale). Error handling is set by errors which may also be set to "NULL" meaning to use the default handling defined for the codec. Default error handling for all built-in codecs is “strict” ("ValueError" is raised). The codecs all use a similar interface. Only deviations from the following generic ones are documented for simplicity. Generic Codecs -------------- These are the generic codec APIs: PyObject* PyUnicode_Decode(const char *s, Py_ssize_t size, const char *encoding, const char *errors) *Return value: New reference.* Create a Unicode object by decoding *size* bytes of the encoded string *s*. *encoding* and *errors* have the same meaning as the parameters of the same name in the "str()" built-in function. The codec to be used is looked up using the Python codec registry. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_AsEncodedString(PyObject *unicode, const char *encoding, const char *errors) *Return value: New reference.* Encode a Unicode object and return the result as Python bytes object. *encoding* and *errors* have the same meaning as the parameters of the same name in the Unicode "encode()" method. The codec to be used is looked up using the Python codec registry. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_Encode(const Py_UNICODE *s, Py_ssize_t size, const char *encoding, const char *errors) *Return value: New reference.* Encode the "Py_UNICODE" buffer *s* of the given *size* and return a Python bytes object. *encoding* and *errors* have the same meaning as the parameters of the same name in the Unicode "encode()" method. The codec to be used is looked up using the Python codec registry. Return "NULL" if an exception was raised by the codec. Deprecated since version 3.3, will be removed in version 3.11: Part of the old-style "Py_UNICODE" API; please migrate to using "PyUnicode_AsEncodedString()". UTF-8 Codecs ------------ These are the UTF-8 codec APIs: PyObject* PyUnicode_DecodeUTF8(const char *s, Py_ssize_t size, const char *errors) *Return value: New reference.* Create a Unicode object by decoding *size* bytes of the UTF-8 encoded string *s*. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_DecodeUTF8Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed) *Return value: New reference.* If *consumed* is "NULL", behave like "PyUnicode_DecodeUTF8()". If *consumed* is not "NULL", trailing incomplete UTF-8 byte sequences will not be treated as an error. Those bytes will not be decoded and the number of bytes that have been decoded will be stored in *consumed*. PyObject* PyUnicode_AsUTF8String(PyObject *unicode) *Return value: New reference.* Encode a Unicode object using UTF-8 and return the result as Python bytes object. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. const char* PyUnicode_AsUTF8AndSize(PyObject *unicode, Py_ssize_t *size) Return a pointer to the UTF-8 encoding of the Unicode object, and store the size of the encoded representation (in bytes) in *size*. The *size* argument can be "NULL"; in this case no size will be stored. The returned buffer always has an extra null byte appended (not included in *size*), regardless of whether there are any other null code points. In the case of an error, "NULL" is returned with an exception set and no *size* is stored. This caches the UTF-8 representation of the string in the Unicode object, and subsequent calls will return a pointer to the same buffer. The caller is not responsible for deallocating the buffer. The buffer is deallocated and pointers to it become invalid when the Unicode object is garbage collected. New in version 3.3. Changed in version 3.7: The return type is now "const char *" rather of "char *". const char* PyUnicode_AsUTF8(PyObject *unicode) As "PyUnicode_AsUTF8AndSize()", but does not store the size. New in version 3.3. Changed in version 3.7: The return type is now "const char *" rather of "char *". PyObject* PyUnicode_EncodeUTF8(const Py_UNICODE *s, Py_ssize_t size, const char *errors) *Return value: New reference.* Encode the "Py_UNICODE" buffer *s* of the given *size* using UTF-8 and return a Python bytes object. Return "NULL" if an exception was raised by the codec. Deprecated since version 3.3, will be removed in version 3.11: Part of the old-style "Py_UNICODE" API; please migrate to using "PyUnicode_AsUTF8String()", "PyUnicode_AsUTF8AndSize()" or "PyUnicode_AsEncodedString()". UTF-32 Codecs ------------- These are the UTF-32 codec APIs: PyObject* PyUnicode_DecodeUTF32(const char *s, Py_ssize_t size, const char *errors, int *byteorder) *Return value: New reference.* Decode *size* bytes from a UTF-32 encoded buffer string and return the corresponding Unicode object. *errors* (if non-"NULL") defines the error handling. It defaults to “strict”. If *byteorder* is non-"NULL", the decoder starts decoding using the given byte order: *byteorder == -1: little endian *byteorder == 0: native order *byteorder == 1: big endian If "*byteorder" is zero, and the first four bytes of the input data are a byte order mark (BOM), the decoder switches to this byte order and the BOM is not copied into the resulting Unicode string. If "*byteorder" is "-1" or "1", any byte order mark is copied to the output. After completion, **byteorder* is set to the current byte order at the end of input data. If *byteorder* is "NULL", the codec starts in native order mode. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_DecodeUTF32Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed) *Return value: New reference.* If *consumed* is "NULL", behave like "PyUnicode_DecodeUTF32()". If *consumed* is not "NULL", "PyUnicode_DecodeUTF32Stateful()" will not treat trailing incomplete UTF-32 byte sequences (such as a number of bytes not divisible by four) as an error. Those bytes will not be decoded and the number of bytes that have been decoded will be stored in *consumed*. PyObject* PyUnicode_AsUTF32String(PyObject *unicode) *Return value: New reference.* Return a Python byte string using the UTF-32 encoding in native byte order. The string always starts with a BOM mark. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_EncodeUTF32(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder) *Return value: New reference.* Return a Python bytes object holding the UTF-32 encoded value of the Unicode data in *s*. Output is written according to the following byte order: byteorder == -1: little endian byteorder == 0: native byte order (writes a BOM mark) byteorder == 1: big endian If byteorder is "0", the output string will always start with the Unicode BOM mark (U+FEFF). In the other two modes, no BOM mark is prepended. If "Py_UNICODE_WIDE" is not defined, surrogate pairs will be output as a single code point. Return "NULL" if an exception was raised by the codec. Deprecated since version 3.3, will be removed in version 3.11: Part of the old-style "Py_UNICODE" API; please migrate to using "PyUnicode_AsUTF32String()" or "PyUnicode_AsEncodedString()". UTF-16 Codecs ------------- These are the UTF-16 codec APIs: PyObject* PyUnicode_DecodeUTF16(const char *s, Py_ssize_t size, const char *errors, int *byteorder) *Return value: New reference.* Decode *size* bytes from a UTF-16 encoded buffer string and return the corresponding Unicode object. *errors* (if non-"NULL") defines the error handling. It defaults to “strict”. If *byteorder* is non-"NULL", the decoder starts decoding using the given byte order: *byteorder == -1: little endian *byteorder == 0: native order *byteorder == 1: big endian If "*byteorder" is zero, and the first two bytes of the input data are a byte order mark (BOM), the decoder switches to this byte order and the BOM is not copied into the resulting Unicode string. If "*byteorder" is "-1" or "1", any byte order mark is copied to the output (where it will result in either a "\ufeff" or a "\ufffe" character). After completion, "*byteorder" is set to the current byte order at the end of input data. If *byteorder* is "NULL", the codec starts in native order mode. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_DecodeUTF16Stateful(const char *s, Py_ssize_t size, const char *errors, int *byteorder, Py_ssize_t *consumed) *Return value: New reference.* If *consumed* is "NULL", behave like "PyUnicode_DecodeUTF16()". If *consumed* is not "NULL", "PyUnicode_DecodeUTF16Stateful()" will not treat trailing incomplete UTF-16 byte sequences (such as an odd number of bytes or a split surrogate pair) as an error. Those bytes will not be decoded and the number of bytes that have been decoded will be stored in *consumed*. PyObject* PyUnicode_AsUTF16String(PyObject *unicode) *Return value: New reference.* Return a Python byte string using the UTF-16 encoding in native byte order. The string always starts with a BOM mark. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_EncodeUTF16(const Py_UNICODE *s, Py_ssize_t size, const char *errors, int byteorder) *Return value: New reference.* Return a Python bytes object holding the UTF-16 encoded value of the Unicode data in *s*. Output is written according to the following byte order: byteorder == -1: little endian byteorder == 0: native byte order (writes a BOM mark) byteorder == 1: big endian If byteorder is "0", the output string will always start with the Unicode BOM mark (U+FEFF). In the other two modes, no BOM mark is prepended. If "Py_UNICODE_WIDE" is defined, a single "Py_UNICODE" value may get represented as a surrogate pair. If it is not defined, each "Py_UNICODE" values is interpreted as a UCS-2 character. Return "NULL" if an exception was raised by the codec. Deprecated since version 3.3, will be removed in version 3.11: Part of the old-style "Py_UNICODE" API; please migrate to using "PyUnicode_AsUTF16String()" or "PyUnicode_AsEncodedString()". UTF-7 Codecs ------------ These are the UTF-7 codec APIs: PyObject* PyUnicode_DecodeUTF7(const char *s, Py_ssize_t size, const char *errors) *Return value: New reference.* Create a Unicode object by decoding *size* bytes of the UTF-7 encoded string *s*. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_DecodeUTF7Stateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed) *Return value: New reference.* If *consumed* is "NULL", behave like "PyUnicode_DecodeUTF7()". If *consumed* is not "NULL", trailing incomplete UTF-7 base-64 sections will not be treated as an error. Those bytes will not be decoded and the number of bytes that have been decoded will be stored in *consumed*. PyObject* PyUnicode_EncodeUTF7(const Py_UNICODE *s, Py_ssize_t size, int base64SetO, int base64WhiteSpace, const char *errors) *Return value: New reference.* Encode the "Py_UNICODE" buffer of the given size using UTF-7 and return a Python bytes object. Return "NULL" if an exception was raised by the codec. If *base64SetO* is nonzero, “Set O” (punctuation that has no otherwise special meaning) will be encoded in base-64. If *base64WhiteSpace* is nonzero, whitespace will be encoded in base-64. Both are set to zero for the Python “utf-7” codec. Deprecated since version 3.3, will be removed in version 3.11: Part of the old-style "Py_UNICODE" API; please migrate to using "PyUnicode_AsEncodedString()". Unicode-Escape Codecs --------------------- These are the “Unicode Escape” codec APIs: PyObject* PyUnicode_DecodeUnicodeEscape(const char *s, Py_ssize_t size, const char *errors) *Return value: New reference.* Create a Unicode object by decoding *size* bytes of the Unicode- Escape encoded string *s*. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_AsUnicodeEscapeString(PyObject *unicode) *Return value: New reference.* Encode a Unicode object using Unicode-Escape and return the result as a bytes object. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_EncodeUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size) *Return value: New reference.* Encode the "Py_UNICODE" buffer of the given *size* using Unicode- Escape and return a bytes object. Return "NULL" if an exception was raised by the codec. Deprecated since version 3.3, will be removed in version 3.11: Part of the old-style "Py_UNICODE" API; please migrate to using "PyUnicode_AsUnicodeEscapeString()". Raw-Unicode-Escape Codecs ------------------------- These are the “Raw Unicode Escape” codec APIs: PyObject* PyUnicode_DecodeRawUnicodeEscape(const char *s, Py_ssize_t size, const char *errors) *Return value: New reference.* Create a Unicode object by decoding *size* bytes of the Raw- Unicode-Escape encoded string *s*. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_AsRawUnicodeEscapeString(PyObject *unicode) *Return value: New reference.* Encode a Unicode object using Raw-Unicode-Escape and return the result as a bytes object. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_EncodeRawUnicodeEscape(const Py_UNICODE *s, Py_ssize_t size) *Return value: New reference.* Encode the "Py_UNICODE" buffer of the given *size* using Raw- Unicode-Escape and return a bytes object. Return "NULL" if an exception was raised by the codec. Deprecated since version 3.3, will be removed in version 3.11: Part of the old-style "Py_UNICODE" API; please migrate to using "PyUnicode_AsRawUnicodeEscapeString()" or "PyUnicode_AsEncodedString()". Latin-1 Codecs -------------- These are the Latin-1 codec APIs: Latin-1 corresponds to the first 256 Unicode ordinals and only these are accepted by the codecs during encoding. PyObject* PyUnicode_DecodeLatin1(const char *s, Py_ssize_t size, const char *errors) *Return value: New reference.* Create a Unicode object by decoding *size* bytes of the Latin-1 encoded string *s*. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_AsLatin1String(PyObject *unicode) *Return value: New reference.* Encode a Unicode object using Latin-1 and return the result as Python bytes object. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_EncodeLatin1(const Py_UNICODE *s, Py_ssize_t size, const char *errors) *Return value: New reference.* Encode the "Py_UNICODE" buffer of the given *size* using Latin-1 and return a Python bytes object. Return "NULL" if an exception was raised by the codec. Deprecated since version 3.3, will be removed in version 3.11: Part of the old-style "Py_UNICODE" API; please migrate to using "PyUnicode_AsLatin1String()" or "PyUnicode_AsEncodedString()". ASCII Codecs ------------ These are the ASCII codec APIs. Only 7-bit ASCII data is accepted. All other codes generate errors. PyObject* PyUnicode_DecodeASCII(const char *s, Py_ssize_t size, const char *errors) *Return value: New reference.* Create a Unicode object by decoding *size* bytes of the ASCII encoded string *s*. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_AsASCIIString(PyObject *unicode) *Return value: New reference.* Encode a Unicode object using ASCII and return the result as Python bytes object. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_EncodeASCII(const Py_UNICODE *s, Py_ssize_t size, const char *errors) *Return value: New reference.* Encode the "Py_UNICODE" buffer of the given *size* using ASCII and return a Python bytes object. Return "NULL" if an exception was raised by the codec. Deprecated since version 3.3, will be removed in version 3.11: Part of the old-style "Py_UNICODE" API; please migrate to using "PyUnicode_AsASCIIString()" or "PyUnicode_AsEncodedString()". Character Map Codecs -------------------- This codec is special in that it can be used to implement many different codecs (and this is in fact what was done to obtain most of the standard codecs included in the "encodings" package). The codec uses mappings to encode and decode characters. The mapping objects provided must support the "__getitem__()" mapping interface; dictionaries and sequences work well. These are the mapping codec APIs: PyObject* PyUnicode_DecodeCharmap(const char *data, Py_ssize_t size, PyObject *mapping, const char *errors) *Return value: New reference.* Create a Unicode object by decoding *size* bytes of the encoded string *s* using the given *mapping* object. Return "NULL" if an exception was raised by the codec. If *mapping* is "NULL", Latin-1 decoding will be applied. Else *mapping* must map bytes ordinals (integers in the range from 0 to 255) to Unicode strings, integers (which are then interpreted as Unicode ordinals) or "None". Unmapped data bytes – ones which cause a "LookupError", as well as ones which get mapped to "None", "0xFFFE" or "'\ufffe'", are treated as undefined mappings and cause an error. PyObject* PyUnicode_AsCharmapString(PyObject *unicode, PyObject *mapping) *Return value: New reference.* Encode a Unicode object using the given *mapping* object and return the result as a bytes object. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. The *mapping* object must map Unicode ordinal integers to bytes objects, integers in the range from 0 to 255 or "None". Unmapped character ordinals (ones which cause a "LookupError") as well as mapped to "None" are treated as “undefined mapping” and cause an error. PyObject* PyUnicode_EncodeCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors) *Return value: New reference.* Encode the "Py_UNICODE" buffer of the given *size* using the given *mapping* object and return the result as a bytes object. Return "NULL" if an exception was raised by the codec. Deprecated since version 3.3, will be removed in version 3.11: Part of the old-style "Py_UNICODE" API; please migrate to using "PyUnicode_AsCharmapString()" or "PyUnicode_AsEncodedString()". The following codec API is special in that maps Unicode to Unicode. PyObject* PyUnicode_Translate(PyObject *str, PyObject *table, const char *errors) *Return value: New reference.* Translate a string by applying a character mapping table to it and return the resulting Unicode object. Return "NULL" if an exception was raised by the codec. The mapping table must map Unicode ordinal integers to Unicode ordinal integers or "None" (causing deletion of the character). Mapping tables need only provide the "__getitem__()" interface; dictionaries and sequences work well. Unmapped character ordinals (ones which cause a "LookupError") are left untouched and are copied as-is. *errors* has the usual meaning for codecs. It may be "NULL" which indicates to use the default error handling. PyObject* PyUnicode_TranslateCharmap(const Py_UNICODE *s, Py_ssize_t size, PyObject *mapping, const char *errors) *Return value: New reference.* Translate a "Py_UNICODE" buffer of the given *size* by applying a character *mapping* table to it and return the resulting Unicode object. Return "NULL" when an exception was raised by the codec. Deprecated since version 3.3, will be removed in version 3.11: Part of the old-style "Py_UNICODE" API; please migrate to using "PyUnicode_Translate()". or generic codec based API MBCS codecs for Windows ----------------------- These are the MBCS codec APIs. They are currently only available on Windows and use the Win32 MBCS converters to implement the conversions. Note that MBCS (or DBCS) is a class of encodings, not just one. The target encoding is defined by the user settings on the machine running the codec. PyObject* PyUnicode_DecodeMBCS(const char *s, Py_ssize_t size, const char *errors) *Return value: New reference.* Create a Unicode object by decoding *size* bytes of the MBCS encoded string *s*. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_DecodeMBCSStateful(const char *s, Py_ssize_t size, const char *errors, Py_ssize_t *consumed) *Return value: New reference.* If *consumed* is "NULL", behave like "PyUnicode_DecodeMBCS()". If *consumed* is not "NULL", "PyUnicode_DecodeMBCSStateful()" will not decode trailing lead byte and the number of bytes that have been decoded will be stored in *consumed*. PyObject* PyUnicode_AsMBCSString(PyObject *unicode) *Return value: New reference.* Encode a Unicode object using MBCS and return the result as Python bytes object. Error handling is “strict”. Return "NULL" if an exception was raised by the codec. PyObject* PyUnicode_EncodeCodePage(int code_page, PyObject *unicode, const char *errors) *Return value: New reference.* Encode the Unicode object using the specified code page and return a Python bytes object. Return "NULL" if an exception was raised by the codec. Use "CP_ACP" code page to get the MBCS encoder. New in version 3.3. PyObject* PyUnicode_EncodeMBCS(const Py_UNICODE *s, Py_ssize_t size, const char *errors) *Return value: New reference.* Encode the "Py_UNICODE" buffer of the given *size* using MBCS and return a Python bytes object. Return "NULL" if an exception was raised by the codec. Deprecated since version 3.3, will be removed in version 4.0: Part of the old-style "Py_UNICODE" API; please migrate to using "PyUnicode_AsMBCSString()", "PyUnicode_EncodeCodePage()" or "PyUnicode_AsEncodedString()". Methods & Slots --------------- Methods and Slot Functions ========================== The following APIs are capable of handling Unicode objects and strings on input (we refer to them as strings in the descriptions) and return Unicode objects or integers as appropriate. They all return "NULL" or "-1" if an exception occurs. PyObject* PyUnicode_Concat(PyObject *left, PyObject *right) *Return value: New reference.* Concat two strings giving a new Unicode string. PyObject* PyUnicode_Split(PyObject *s, PyObject *sep, Py_ssize_t maxsplit) *Return value: New reference.* Split a string giving a list of Unicode strings. If *sep* is "NULL", splitting will be done at all whitespace substrings. Otherwise, splits occur at the given separator. At most *maxsplit* splits will be done. If negative, no limit is set. Separators are not included in the resulting list. PyObject* PyUnicode_Splitlines(PyObject *s, int keepend) *Return value: New reference.* Split a Unicode string at line breaks, returning a list of Unicode strings. CRLF is considered to be one line break. If *keepend* is "0", the line break characters are not included in the resulting strings. PyObject* PyUnicode_Join(PyObject *separator, PyObject *seq) *Return value: New reference.* Join a sequence of strings using the given *separator* and return the resulting Unicode string. Py_ssize_t PyUnicode_Tailmatch(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction) Return "1" if *substr* matches "str[start:end]" at the given tail end (*direction* == "-1" means to do a prefix match, *direction* == "1" a suffix match), "0" otherwise. Return "-1" if an error occurred. Py_ssize_t PyUnicode_Find(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end, int direction) Return the first position of *substr* in "str[start:end]" using the given *direction* (*direction* == "1" means to do a forward search, *direction* == "-1" a backward search). The return value is the index of the first match; a value of "-1" indicates that no match was found, and "-2" indicates that an error occurred and an exception has been set. Py_ssize_t PyUnicode_FindChar(PyObject *str, Py_UCS4 ch, Py_ssize_t start, Py_ssize_t end, int direction) Return the first position of the character *ch* in "str[start:end]" using the given *direction* (*direction* == "1" means to do a forward search, *direction* == "-1" a backward search). The return value is the index of the first match; a value of "-1" indicates that no match was found, and "-2" indicates that an error occurred and an exception has been set. New in version 3.3. Changed in version 3.7: *start* and *end* are now adjusted to behave like "str[start:end]". Py_ssize_t PyUnicode_Count(PyObject *str, PyObject *substr, Py_ssize_t start, Py_ssize_t end) Return the number of non-overlapping occurrences of *substr* in "str[start:end]". Return "-1" if an error occurred. PyObject* PyUnicode_Replace(PyObject *str, PyObject *substr, PyObject *replstr, Py_ssize_t maxcount) *Return value: New reference.* Replace at most *maxcount* occurrences of *substr* in *str* with *replstr* and return the resulting Unicode object. *maxcount* == "-1" means replace all occurrences. int PyUnicode_Compare(PyObject *left, PyObject *right) Compare two strings and return "-1", "0", "1" for less than, equal, and greater than, respectively. This function returns "-1" upon failure, so one should call "PyErr_Occurred()" to check for errors. int PyUnicode_CompareWithASCIIString(PyObject *uni, const char *string) Compare a Unicode object, *uni*, with *string* and return "-1", "0", "1" for less than, equal, and greater than, respectively. It is best to pass only ASCII-encoded strings, but the function interprets the input string as ISO-8859-1 if it contains non-ASCII characters. This function does not raise exceptions. PyObject* PyUnicode_RichCompare(PyObject *left, PyObject *right, int op) *Return value: New reference.* Rich compare two Unicode strings and return one of the following: * "NULL" in case an exception was raised * "Py_True" or "Py_False" for successful comparisons * "Py_NotImplemented" in case the type combination is unknown Possible values for *op* are "Py_GT", "Py_GE", "Py_EQ", "Py_NE", "Py_LT", and "Py_LE". PyObject* PyUnicode_Format(PyObject *format, PyObject *args) *Return value: New reference.* Return a new string object from *format* and *args*; this is analogous to "format % args". int PyUnicode_Contains(PyObject *container, PyObject *element) Check whether *element* is contained in *container* and return true or false accordingly. *element* has to coerce to a one element Unicode string. "-1" is returned if there was an error. void PyUnicode_InternInPlace(PyObject **string) Intern the argument **string* in place. The argument must be the address of a pointer variable pointing to a Python Unicode string object. If there is an existing interned string that is the same as **string*, it sets **string* to it (decrementing the reference count of the old string object and incrementing the reference count of the interned string object), otherwise it leaves **string* alone and interns it (incrementing its reference count). (Clarification: even though there is a lot of talk about reference counts, think of this function as reference-count-neutral; you own the object after the call if and only if you owned it before the call.) PyObject* PyUnicode_InternFromString(const char *v) *Return value: New reference.* A combination of "PyUnicode_FromString()" and "PyUnicode_InternInPlace()", returning either a new Unicode string object that has been interned, or a new (“owned”) reference to an earlier interned string object with the same value. Utilities ********* The functions in this chapter perform various utility tasks, ranging from helping C code be more portable across platforms, using Python modules from C, and parsing function arguments and constructing Python values from C values. * Operating System Utilities * System Functions * Process Control * Importing Modules * Data marshalling support * Parsing arguments and building values * Parsing arguments * Strings and buffers * Numbers * Other objects * API Functions * Building values * String conversion and formatting * Reflection * Codec registry and support functions * Codec lookup API * Registry API for Unicode encoding error handlers The Very High Level Layer ************************* The functions in this chapter will let you execute Python source code given in a file or a buffer, but they will not let you interact in a more detailed way with the interpreter. Several of these functions accept a start symbol from the grammar as a parameter. The available start symbols are "Py_eval_input", "Py_file_input", and "Py_single_input". These are described following the functions which accept them as parameters. Note also that several of these functions take "FILE*" parameters. One particular issue which needs to be handled carefully is that the "FILE" structure for different C libraries can be different and incompatible. Under Windows (at least), it is possible for dynamically linked extensions to actually use different libraries, so care should be taken that "FILE*" parameters are only passed to these functions if it is certain that they were created by the same library that the Python runtime is using. int Py_Main(int argc, wchar_t **argv) The main program for the standard interpreter. This is made available for programs which embed Python. The *argc* and *argv* parameters should be prepared exactly as those which are passed to a C program’s "main()" function (converted to wchar_t according to the user’s locale). It is important to note that the argument list may be modified (but the contents of the strings pointed to by the argument list are not). The return value will be "0" if the interpreter exits normally (i.e., without an exception), "1" if the interpreter exits due to an exception, or "2" if the parameter list does not represent a valid Python command line. Note that if an otherwise unhandled "SystemExit" is raised, this function will not return "1", but exit the process, as long as "Py_InspectFlag" is not set. int Py_BytesMain(int argc, char **argv) Similar to "Py_Main()" but *argv* is an array of bytes strings. New in version 3.8. int PyRun_AnyFile(FILE *fp, const char *filename) This is a simplified interface to "PyRun_AnyFileExFlags()" below, leaving *closeit* set to "0" and *flags* set to "NULL". int PyRun_AnyFileFlags(FILE *fp, const char *filename, PyCompilerFlags *flags) This is a simplified interface to "PyRun_AnyFileExFlags()" below, leaving the *closeit* argument set to "0". int PyRun_AnyFileEx(FILE *fp, const char *filename, int closeit) This is a simplified interface to "PyRun_AnyFileExFlags()" below, leaving the *flags* argument set to "NULL". int PyRun_AnyFileExFlags(FILE *fp, const char *filename, int closeit, PyCompilerFlags *flags) If *fp* refers to a file associated with an interactive device (console or terminal input or Unix pseudo-terminal), return the value of "PyRun_InteractiveLoop()", otherwise return the result of "PyRun_SimpleFile()". *filename* is decoded from the filesystem encoding ("sys.getfilesystemencoding()"). If *filename* is "NULL", this function uses ""???"" as the filename. If *closeit* is true, the file is closed before "PyRun_SimpleFileExFlags()" returns. int PyRun_SimpleString(const char *command) This is a simplified interface to "PyRun_SimpleStringFlags()" below, leaving the "PyCompilerFlags"* argument set to "NULL". int PyRun_SimpleStringFlags(const char *command, PyCompilerFlags *flags) Executes the Python source code from *command* in the "__main__" module according to the *flags* argument. If "__main__" does not already exist, it is created. Returns "0" on success or "-1" if an exception was raised. If there was an error, there is no way to get the exception information. For the meaning of *flags*, see below. Note that if an otherwise unhandled "SystemExit" is raised, this function will not return "-1", but exit the process, as long as "Py_InspectFlag" is not set. int PyRun_SimpleFile(FILE *fp, const char *filename) This is a simplified interface to "PyRun_SimpleFileExFlags()" below, leaving *closeit* set to "0" and *flags* set to "NULL". int PyRun_SimpleFileEx(FILE *fp, const char *filename, int closeit) This is a simplified interface to "PyRun_SimpleFileExFlags()" below, leaving *flags* set to "NULL". int PyRun_SimpleFileExFlags(FILE *fp, const char *filename, int closeit, PyCompilerFlags *flags) Similar to "PyRun_SimpleStringFlags()", but the Python source code is read from *fp* instead of an in-memory string. *filename* should be the name of the file, it is decoded from the filesystem encoding ("sys.getfilesystemencoding()"). If *closeit* is true, the file is closed before PyRun_SimpleFileExFlags returns. Note: On Windows, *fp* should be opened as binary mode (e.g. "fopen(filename, "rb")"). Otherwise, Python may not handle script file with LF line ending correctly. int PyRun_InteractiveOne(FILE *fp, const char *filename) This is a simplified interface to "PyRun_InteractiveOneFlags()" below, leaving *flags* set to "NULL". int PyRun_InteractiveOneFlags(FILE *fp, const char *filename, PyCompilerFlags *flags) Read and execute a single statement from a file associated with an interactive device according to the *flags* argument. The user will be prompted using "sys.ps1" and "sys.ps2". *filename* is decoded from the filesystem encoding ("sys.getfilesystemencoding()"). Returns "0" when the input was executed successfully, "-1" if there was an exception, or an error code from the "errcode.h" include file distributed as part of Python if there was a parse error. (Note that "errcode.h" is not included by "Python.h", so must be included specifically if needed.) int PyRun_InteractiveLoop(FILE *fp, const char *filename) This is a simplified interface to "PyRun_InteractiveLoopFlags()" below, leaving *flags* set to "NULL". int PyRun_InteractiveLoopFlags(FILE *fp, const char *filename, PyCompilerFlags *flags) Read and execute statements from a file associated with an interactive device until EOF is reached. The user will be prompted using "sys.ps1" and "sys.ps2". *filename* is decoded from the filesystem encoding ("sys.getfilesystemencoding()"). Returns "0" at EOF or a negative number upon failure. int (*PyOS_InputHook)(void) Can be set to point to a function with the prototype "int func(void)". The function will be called when Python’s interpreter prompt is about to become idle and wait for user input from the terminal. The return value is ignored. Overriding this hook can be used to integrate the interpreter’s prompt with other event loops, as done in the "Modules/_tkinter.c" in the Python source code. char* (*PyOS_ReadlineFunctionPointer)(FILE *, FILE *, const char *) Can be set to point to a function with the prototype "char *func(FILE *stdin, FILE *stdout, char *prompt)", overriding the default function used to read a single line of input at the interpreter’s prompt. The function is expected to output the string *prompt* if it’s not "NULL", and then read a line of input from the provided standard input file, returning the resulting string. For example, The "readline" module sets this hook to provide line-editing and tab-completion features. The result must be a string allocated by "PyMem_RawMalloc()" or "PyMem_RawRealloc()", or "NULL" if an error occurred. Changed in version 3.4: The result must be allocated by "PyMem_RawMalloc()" or "PyMem_RawRealloc()", instead of being allocated by "PyMem_Malloc()" or "PyMem_Realloc()". struct _node* PyParser_SimpleParseString(const char *str, int start) This is a simplified interface to "PyParser_SimpleParseStringFlagsFilename()" below, leaving *filename* set to "NULL" and *flags* set to "0". Deprecated since version 3.9, will be removed in version 3.10. struct _node* PyParser_SimpleParseStringFlags(const char *str, int start, int flags) This is a simplified interface to "PyParser_SimpleParseStringFlagsFilename()" below, leaving *filename* set to "NULL". Deprecated since version 3.9, will be removed in version 3.10. struct _node* PyParser_SimpleParseStringFlagsFilename(const char *str, const char *filename, int start, int flags) Parse Python source code from *str* using the start token *start* according to the *flags* argument. The result can be used to create a code object which can be evaluated efficiently. This is useful if a code fragment must be evaluated many times. *filename* is decoded from the filesystem encoding ("sys.getfilesystemencoding()"). Deprecated since version 3.9, will be removed in version 3.10. struct _node* PyParser_SimpleParseFile(FILE *fp, const char *filename, int start) This is a simplified interface to "PyParser_SimpleParseFileFlags()" below, leaving *flags* set to "0". Deprecated since version 3.9, will be removed in version 3.10. struct _node* PyParser_SimpleParseFileFlags(FILE *fp, const char *filename, int start, int flags) Similar to "PyParser_SimpleParseStringFlagsFilename()", but the Python source code is read from *fp* instead of an in-memory string. Deprecated since version 3.9, will be removed in version 3.10. PyObject* PyRun_String(const char *str, int start, PyObject *globals, PyObject *locals) *Return value: New reference.* This is a simplified interface to "PyRun_StringFlags()" below, leaving *flags* set to "NULL". PyObject* PyRun_StringFlags(const char *str, int start, PyObject *globals, PyObject *locals, PyCompilerFlags *flags) *Return value: New reference.* Execute Python source code from *str* in the context specified by the objects *globals* and *locals* with the compiler flags specified by *flags*. *globals* must be a dictionary; *locals* can be any object that implements the mapping protocol. The parameter *start* specifies the start token that should be used to parse the source code. Returns the result of executing the code as a Python object, or "NULL" if an exception was raised. PyObject* PyRun_File(FILE *fp, const char *filename, int start, PyObject *globals, PyObject *locals) *Return value: New reference.* This is a simplified interface to "PyRun_FileExFlags()" below, leaving *closeit* set to "0" and *flags* set to "NULL". PyObject* PyRun_FileEx(FILE *fp, const char *filename, int start, PyObject *globals, PyObject *locals, int closeit) *Return value: New reference.* This is a simplified interface to "PyRun_FileExFlags()" below, leaving *flags* set to "NULL". PyObject* PyRun_FileFlags(FILE *fp, const char *filename, int start, PyObject *globals, PyObject *locals, PyCompilerFlags *flags) *Return value: New reference.* This is a simplified interface to "PyRun_FileExFlags()" below, leaving *closeit* set to "0". PyObject* PyRun_FileExFlags(FILE *fp, const char *filename, int start, PyObject *globals, PyObject *locals, int closeit, PyCompilerFlags *flags) *Return value: New reference.* Similar to "PyRun_StringFlags()", but the Python source code is read from *fp* instead of an in-memory string. *filename* should be the name of the file, it is decoded from the filesystem encoding ("sys.getfilesystemencoding()"). If *closeit* is true, the file is closed before "PyRun_FileExFlags()" returns. PyObject* Py_CompileString(const char *str, const char *filename, int start) *Return value: New reference.* This is a simplified interface to "Py_CompileStringFlags()" below, leaving *flags* set to "NULL". PyObject* Py_CompileStringFlags(const char *str, const char *filename, int start, PyCompilerFlags *flags) *Return value: New reference.* This is a simplified interface to "Py_CompileStringExFlags()" below, with *optimize* set to "-1". PyObject* Py_CompileStringObject(const char *str, PyObject *filename, int start, PyCompilerFlags *flags, int optimize) *Return value: New reference.* Parse and compile the Python source code in *str*, returning the resulting code object. The start token is given by *start*; this can be used to constrain the code which can be compiled and should be "Py_eval_input", "Py_file_input", or "Py_single_input". The filename specified by *filename* is used to construct the code object and may appear in tracebacks or "SyntaxError" exception messages. This returns "NULL" if the code cannot be parsed or compiled. The integer *optimize* specifies the optimization level of the compiler; a value of "-1" selects the optimization level of the interpreter as given by "-O" options. Explicit levels are "0" (no optimization; "__debug__" is true), "1" (asserts are removed, "__debug__" is false) or "2" (docstrings are removed too). New in version 3.4. PyObject* Py_CompileStringExFlags(const char *str, const char *filename, int start, PyCompilerFlags *flags, int optimize) *Return value: New reference.* Like "Py_CompileStringObject()", but *filename* is a byte string decoded from the filesystem encoding ("os.fsdecode()"). New in version 3.2. PyObject* PyEval_EvalCode(PyObject *co, PyObject *globals, PyObject *locals) *Return value: New reference.* This is a simplified interface to "PyEval_EvalCodeEx()", with just the code object, and global and local variables. The other arguments are set to "NULL". PyObject* PyEval_EvalCodeEx(PyObject *co, PyObject *globals, PyObject *locals, PyObject *const *args, int argcount, PyObject *const *kws, int kwcount, PyObject *const *defs, int defcount, PyObject *kwdefs, PyObject *closure) *Return value: New reference.* Evaluate a precompiled code object, given a particular environment for its evaluation. This environment consists of a dictionary of global variables, a mapping object of local variables, arrays of arguments, keywords and defaults, a dictionary of default values for keyword-only arguments and a closure tuple of cells. PyFrameObject The C structure of the objects used to describe frame objects. The fields of this type are subject to change at any time. PyObject* PyEval_EvalFrame(PyFrameObject *f) *Return value: New reference.* Evaluate an execution frame. This is a simplified interface to "PyEval_EvalFrameEx()", for backward compatibility. PyObject* PyEval_EvalFrameEx(PyFrameObject *f, int throwflag) *Return value: New reference.* This is the main, unvarnished function of Python interpretation. The code object associated with the execution frame *f* is executed, interpreting bytecode and executing calls as needed. The additional *throwflag* parameter can mostly be ignored - if true, then it causes an exception to immediately be thrown; this is used for the "throw()" methods of generator objects. Changed in version 3.4: This function now includes a debug assertion to help ensure that it does not silently discard an active exception. int PyEval_MergeCompilerFlags(PyCompilerFlags *cf) This function changes the flags of the current evaluation frame, and returns true on success, false on failure. int Py_eval_input The start symbol from the Python grammar for isolated expressions; for use with "Py_CompileString()". int Py_file_input The start symbol from the Python grammar for sequences of statements as read from a file or other source; for use with "Py_CompileString()". This is the symbol to use when compiling arbitrarily long Python source code. int Py_single_input The start symbol from the Python grammar for a single statement; for use with "Py_CompileString()". This is the symbol used for the interactive interpreter loop. struct PyCompilerFlags This is the structure used to hold compiler flags. In cases where code is only being compiled, it is passed as "int flags", and in cases where code is being executed, it is passed as "PyCompilerFlags *flags". In this case, "from __future__ import" can modify *flags*. Whenever "PyCompilerFlags *flags" is "NULL", "cf_flags" is treated as equal to "0", and any modification due to "from __future__ import" is discarded. int cf_flags Compiler flags. int cf_feature_version *cf_feature_version* is the minor Python version. It should be initialized to "PY_MINOR_VERSION". The field is ignored by default, it is used if and only if "PyCF_ONLY_AST" flag is set in *cf_flags*. Changed in version 3.8: Added *cf_feature_version* field. int CO_FUTURE_DIVISION This bit can be set in *flags* to cause division operator "/" to be interpreted as “true division” according to **PEP 238**. Weak Reference Objects ********************** Python supports *weak references* as first-class objects. There are two specific object types which directly implement weak references. The first is a simple reference object, and the second acts as a proxy for the original object as much as it can. int PyWeakref_Check(ob) Return true if *ob* is either a reference or proxy object. This function always succeeds. int PyWeakref_CheckRef(ob) Return true if *ob* is a reference object. This function always succeeds. int PyWeakref_CheckProxy(ob) Return true if *ob* is a proxy object. This function always succeeds. PyObject* PyWeakref_NewRef(PyObject *ob, PyObject *callback) *Return value: New reference.* Return a weak reference object for the object *ob*. This will always return a new reference, but is not guaranteed to create a new object; an existing reference object may be returned. The second parameter, *callback*, can be a callable object that receives notification when *ob* is garbage collected; it should accept a single parameter, which will be the weak reference object itself. *callback* may also be "None" or "NULL". If *ob* is not a weakly-referencable object, or if *callback* is not callable, "None", or "NULL", this will return "NULL" and raise "TypeError". PyObject* PyWeakref_NewProxy(PyObject *ob, PyObject *callback) *Return value: New reference.* Return a weak reference proxy object for the object *ob*. This will always return a new reference, but is not guaranteed to create a new object; an existing proxy object may be returned. The second parameter, *callback*, can be a callable object that receives notification when *ob* is garbage collected; it should accept a single parameter, which will be the weak reference object itself. *callback* may also be "None" or "NULL". If *ob* is not a weakly- referencable object, or if *callback* is not callable, "None", or "NULL", this will return "NULL" and raise "TypeError". PyObject* PyWeakref_GetObject(PyObject *ref) *Return value: Borrowed reference.* Return the referenced object from a weak reference, *ref*. If the referent is no longer live, returns "Py_None". Note: This function returns a **borrowed reference** to the referenced object. This means that you should always call "Py_INCREF()" on the object except if you know that it cannot be destroyed while you are still using it. PyObject* PyWeakref_GET_OBJECT(PyObject *ref) *Return value: Borrowed reference.* Similar to "PyWeakref_GetObject()", but implemented as a macro that does no error checking. Python Documentation contents ***************************** * What’s New in Python * What’s New In Python 3.9 * Summary – Release highlights * You should check for DeprecationWarning in your code * New Features * Dictionary Merge & Update Operators * New String Methods to Remove Prefixes and Suffixes * Type Hinting Generics in Standard Collections * New Parser * Other Language Changes * New Modules * zoneinfo * graphlib * Improved Modules * ast * asyncio * compileall * concurrent.futures * curses * datetime * distutils * fcntl * ftplib * gc * hashlib * http * IDLE and idlelib * imaplib * importlib * inspect * ipaddress * math * multiprocessing * nntplib * os * pathlib * pdb * poplib * pprint * pydoc * random * signal * smtplib * socket * time * sys * tempfile * tracemalloc * typing * unicodedata * venv * xml * Optimizations * Deprecated * Removed * Porting to Python 3.9 * Changes in the Python API * Changes in the C API * CPython bytecode changes * Build Changes * C API Changes * New Features * Porting to Python 3.9 * Removed * Notable changes in Python 3.9.1 * typing * macOS 11.0 (Big Sur) and Apple Silicon Mac support * Notable changes in Python 3.9.2 * collections.abc * urllib.parse * Notable changes in Python 3.9.3 * Notable changes in Python 3.9.5 * urllib.parse * Notable security feature in 3.9.14 * Notable Changes in 3.9.17 * tarfile * Notable changes in 3.9.20 * ipaddress * email * Notable changes in 3.9.23 * os.path * tarfile * What’s New In Python 3.8 * Summary – Release highlights * New Features * Assignment expressions * Positional-only parameters * Parallel filesystem cache for compiled bytecode files * Debug build uses the same ABI as release build * f-strings support "=" for self-documenting expressions and debugging * PEP 578: Python Runtime Audit Hooks * PEP 587: Python Initialization Configuration * PEP 590: Vectorcall: a fast calling protocol for CPython * Pickle protocol 5 with out-of-band data buffers * Other Language Changes * New Modules * Improved Modules * ast * asyncio * builtins * collections * cProfile * csv * curses * ctypes * datetime * functools * gc * gettext * gzip * IDLE and idlelib * inspect * io * itertools * json.tool * logging * math * mmap * multiprocessing * os * os.path * pathlib * pickle * plistlib * pprint * py_compile * shlex * shutil * socket * ssl * statistics * sys * tarfile * threading * tokenize * tkinter * time * typing * unicodedata * unittest * venv * weakref * xml * xmlrpc * Optimizations * Build and C API Changes * Deprecated * API and Feature Removals * Porting to Python 3.8 * Changes in Python behavior * Changes in the Python API * Changes in the C API * CPython bytecode changes * Demos and Tools * Notable changes in Python 3.8.1 * Notable changes in Python 3.8.8 * Notable changes in Python 3.8.12 * What’s New In Python 3.7 * Summary – Release Highlights * New Features * PEP 563: Postponed Evaluation of Annotations * PEP 538: Legacy C Locale Coercion * PEP 540: Forced UTF-8 Runtime Mode * PEP 553: Built-in "breakpoint()" * PEP 539: New C API for Thread-Local Storage * PEP 562: Customization of Access to Module Attributes * PEP 564: New Time Functions With Nanosecond Resolution * PEP 565: Show DeprecationWarning in "__main__" * PEP 560: Core Support for "typing" module and Generic Types * PEP 552: Hash-based .pyc Files * PEP 545: Python Documentation Translations * Python Development Mode (-X dev) * Other Language Changes * New Modules * contextvars * dataclasses * importlib.resources * Improved Modules * argparse * asyncio * binascii * calendar * collections * compileall * concurrent.futures * contextlib * cProfile * crypt * datetime * dbm * decimal * dis * distutils * enum * functools * gc * hmac * http.client * http.server * idlelib and IDLE * importlib * io * ipaddress * itertools * locale * logging * math * mimetypes * msilib * multiprocessing * os * pathlib * pdb * py_compile * pydoc * queue * re * signal * socket * socketserver * sqlite3 * ssl * string * subprocess * sys * time * tkinter * tracemalloc * types * unicodedata * unittest * unittest.mock * urllib.parse * uu * uuid * warnings * xml.etree * xmlrpc.server * zipapp * zipfile * C API Changes * Build Changes * Optimizations * Other CPython Implementation Changes * Deprecated Python Behavior * Deprecated Python modules, functions and methods * aifc * asyncio * collections * dbm * enum * gettext * importlib * locale * macpath * threading * socket * ssl * sunau * sys * wave * Deprecated functions and types of the C API * Platform Support Removals * API and Feature Removals * Module Removals * Windows-only Changes * Porting to Python 3.7 * Changes in Python Behavior * Changes in the Python API * Changes in the C API * CPython bytecode changes * Windows-only Changes * Other CPython implementation changes * Notable changes in Python 3.7.1 * Notable changes in Python 3.7.2 * Notable changes in Python 3.7.6 * Notable changes in Python 3.7.10 * What’s New In Python 3.6 * Summary – Release highlights * New Features * PEP 498: Formatted string literals * PEP 526: Syntax for variable annotations * PEP 515: Underscores in Numeric Literals * PEP 525: Asynchronous Generators * PEP 530: Asynchronous Comprehensions * PEP 487: Simpler customization of class creation * PEP 487: Descriptor Protocol Enhancements * PEP 519: Adding a file system path protocol * PEP 495: Local Time Disambiguation * PEP 529: Change Windows filesystem encoding to UTF-8 * PEP 528: Change Windows console encoding to UTF-8 * PEP 520: Preserving Class Attribute Definition Order * PEP 468: Preserving Keyword Argument Order * New *dict* implementation * PEP 523: Adding a frame evaluation API to CPython * PYTHONMALLOC environment variable * DTrace and SystemTap probing support * Other Language Changes * New Modules * secrets * Improved Modules * array * ast * asyncio * binascii * cmath * collections * concurrent.futures * contextlib * datetime * decimal * distutils * email * encodings * enum * faulthandler * fileinput * hashlib * http.client * idlelib and IDLE * importlib * inspect * json * logging * math * multiprocessing * os * pathlib * pdb * pickle * pickletools * pydoc * random * re * readline * rlcompleter * shlex * site * sqlite3 * socket * socketserver * ssl * statistics * struct * subprocess * sys * telnetlib * time * timeit * tkinter * traceback * tracemalloc * typing * unicodedata * unittest.mock * urllib.request * urllib.robotparser * venv * warnings * winreg * winsound * xmlrpc.client * zipfile * zlib * Optimizations * Build and C API Changes * Other Improvements * Deprecated * New Keywords * Deprecated Python behavior * Deprecated Python modules, functions and methods * asynchat * asyncore * dbm * distutils * grp * importlib * os * re * ssl * tkinter * venv * Deprecated functions and types of the C API * Deprecated Build Options * Removed * API and Feature Removals * Porting to Python 3.6 * Changes in ‘python’ Command Behavior * Changes in the Python API * Changes in the C API * CPython bytecode changes * Notable changes in Python 3.6.2 * New "make regen-all" build target * Removal of "make touch" build target * Notable changes in Python 3.6.4 * Notable changes in Python 3.6.5 * Notable changes in Python 3.6.7 * Notable changes in Python 3.6.10 * Notable changes in Python 3.6.13 * What’s New In Python 3.5 * Summary – Release highlights * New Features * PEP 492 - Coroutines with async and await syntax * PEP 465 - A dedicated infix operator for matrix multiplication * PEP 448 - Additional Unpacking Generalizations * PEP 461 - percent formatting support for bytes and bytearray * PEP 484 - Type Hints * PEP 471 - os.scandir() function – a better and faster directory iterator * PEP 475: Retry system calls failing with EINTR * PEP 479: Change StopIteration handling inside generators * PEP 485: A function for testing approximate equality * PEP 486: Make the Python Launcher aware of virtual environments * PEP 488: Elimination of PYO files * PEP 489: Multi-phase extension module initialization * Other Language Changes * New Modules * typing * zipapp * Improved Modules * argparse * asyncio * bz2 * cgi * cmath * code * collections * collections.abc * compileall * concurrent.futures * configparser * contextlib * csv * curses * dbm * difflib * distutils * doctest * email * enum * faulthandler * functools * glob * gzip * heapq * http * http.client * idlelib and IDLE * imaplib * imghdr * importlib * inspect * io * ipaddress * json * linecache * locale * logging * lzma * math * multiprocessing * operator * os * pathlib * pickle * poplib * re * readline * selectors * shutil * signal * smtpd * smtplib * sndhdr * socket * ssl * Memory BIO Support * Application-Layer Protocol Negotiation Support * Other Changes * sqlite3 * subprocess * sys * sysconfig * tarfile * threading * time * timeit * tkinter * traceback * types * unicodedata * unittest * unittest.mock * urllib * wsgiref * xmlrpc * xml.sax * zipfile * Other module-level changes * Optimizations * Build and C API Changes * Deprecated * New Keywords * Deprecated Python Behavior * Unsupported Operating Systems * Deprecated Python modules, functions and methods * Removed * API and Feature Removals * Porting to Python 3.5 * Changes in Python behavior * Changes in the Python API * Changes in the C API * Notable changes in Python 3.5.4 * New "make regen-all" build target * Removal of "make touch" build target * What’s New In Python 3.4 * Summary – Release Highlights * New Features * PEP 453: Explicit Bootstrapping of PIP in Python Installations * Bootstrapping pip By Default * Documentation Changes * PEP 446: Newly Created File Descriptors Are Non-Inheritable * Improvements to Codec Handling * PEP 451: A ModuleSpec Type for the Import System * Other Language Changes * New Modules * asyncio * ensurepip * enum * pathlib * selectors * statistics * tracemalloc * Improved Modules * abc * aifc * argparse * audioop * base64 * collections * colorsys * contextlib * dbm * dis * doctest * email * filecmp * functools * gc * glob * hashlib * hmac * html * http * idlelib and IDLE * importlib * inspect * ipaddress * logging * marshal * mmap * multiprocessing * operator * os * pdb * pickle * plistlib * poplib * pprint * pty * pydoc * re * resource * select * shelve * shutil * smtpd * smtplib * socket * sqlite3 * ssl * stat * struct * subprocess * sunau * sys * tarfile * textwrap * threading * traceback * types * urllib * unittest * venv * wave * weakref * xml.etree * zipfile * CPython Implementation Changes * PEP 445: Customization of CPython Memory Allocators * PEP 442: Safe Object Finalization * PEP 456: Secure and Interchangeable Hash Algorithm * PEP 436: Argument Clinic * Other Build and C API Changes * Other Improvements * Significant Optimizations * Deprecated * Deprecations in the Python API * Deprecated Features * Removed * Operating Systems No Longer Supported * API and Feature Removals * Code Cleanups * Porting to Python 3.4 * Changes in ‘python’ Command Behavior * Changes in the Python API * Changes in the C API * Changed in 3.4.3 * PEP 476: Enabling certificate verification by default for stdlib http clients * What’s New In Python 3.3 * Summary – Release highlights * PEP 405: Virtual Environments * PEP 420: Implicit Namespace Packages * PEP 3118: New memoryview implementation and buffer protocol documentation * Features * API changes * PEP 393: Flexible String Representation * Functionality * Performance and resource usage * PEP 397: Python Launcher for Windows * PEP 3151: Reworking the OS and IO exception hierarchy * PEP 380: Syntax for Delegating to a Subgenerator * PEP 409: Suppressing exception context * PEP 414: Explicit Unicode literals * PEP 3155: Qualified name for classes and functions * PEP 412: Key-Sharing Dictionary * PEP 362: Function Signature Object * PEP 421: Adding sys.implementation * SimpleNamespace * Using importlib as the Implementation of Import * New APIs * Visible Changes * Other Language Changes * A Finer-Grained Import Lock * Builtin functions and types * New Modules * faulthandler * ipaddress * lzma * Improved Modules * abc * array * base64 * binascii * bz2 * codecs * collections * contextlib * crypt * curses * datetime * decimal * Features * API changes * email * Policy Framework * Provisional Policy with New Header API * Other API Changes * ftplib * functools * gc * hmac * http * html * imaplib * inspect * io * itertools * logging * math * mmap * multiprocessing * nntplib * os * pdb * pickle * pydoc * re * sched * select * shlex * shutil * signal * smtpd * smtplib * socket * socketserver * sqlite3 * ssl * stat * struct * subprocess * sys * tarfile * tempfile * textwrap * threading * time * types * unittest * urllib * webbrowser * xml.etree.ElementTree * zlib * Optimizations * Build and C API Changes * Deprecated * Unsupported Operating Systems * Deprecated Python modules, functions and methods * Deprecated functions and types of the C API * Deprecated features * Porting to Python 3.3 * Porting Python code * Porting C code * Building C extensions * Command Line Switch Changes * What’s New In Python 3.2 * PEP 384: Defining a Stable ABI * PEP 389: Argparse Command Line Parsing Module * PEP 391: Dictionary Based Configuration for Logging * PEP 3148: The "concurrent.futures" module * PEP 3147: PYC Repository Directories * PEP 3149: ABI Version Tagged .so Files * PEP 3333: Python Web Server Gateway Interface v1.0.1 * Other Language Changes * New, Improved, and Deprecated Modules * email * elementtree * functools * itertools * collections * threading * datetime and time * math * abc * io * reprlib * logging * csv * contextlib * decimal and fractions * ftp * popen * select * gzip and zipfile * tarfile * hashlib * ast * os * shutil * sqlite3 * html * socket * ssl * nntp * certificates * imaplib * http.client * unittest * random * poplib * asyncore * tempfile * inspect * pydoc * dis * dbm * ctypes * site * sysconfig * pdb * configparser * urllib.parse * mailbox * turtledemo * Multi-threading * Optimizations * Unicode * Codecs * Documentation * IDLE * Code Repository * Build and C API Changes * Porting to Python 3.2 * What’s New In Python 3.1 * PEP 372: Ordered Dictionaries * PEP 378: Format Specifier for Thousands Separator * Other Language Changes * New, Improved, and Deprecated Modules * Optimizations * IDLE * Build and C API Changes * Porting to Python 3.1 * What’s New In Python 3.0 * Common Stumbling Blocks * Print Is A Function * Views And Iterators Instead Of Lists * Ordering Comparisons * Integers * Text Vs. Data Instead Of Unicode Vs. 8-bit * Overview Of Syntax Changes * New Syntax * Changed Syntax * Removed Syntax * Changes Already Present In Python 2.6 * Library Changes * **PEP 3101**: A New Approach To String Formatting * Changes To Exceptions * Miscellaneous Other Changes * Operators And Special Methods * Builtins * Build and C API Changes * Performance * Porting To Python 3.0 * What’s New in Python 2.7 * The Future for Python 2.x * Changes to the Handling of Deprecation Warnings * Python 3.1 Features * PEP 372: Adding an Ordered Dictionary to collections * PEP 378: Format Specifier for Thousands Separator * PEP 389: The argparse Module for Parsing Command Lines * PEP 391: Dictionary-Based Configuration For Logging * PEP 3106: Dictionary Views * PEP 3137: The memoryview Object * Other Language Changes * Interpreter Changes * Optimizations * New and Improved Modules * New module: importlib * New module: sysconfig * ttk: Themed Widgets for Tk * Updated module: unittest * Updated module: ElementTree 1.3 * Build and C API Changes * Capsules * Port-Specific Changes: Windows * Port-Specific Changes: Mac OS X * Port-Specific Changes: FreeBSD * Other Changes and Fixes * Porting to Python 2.7 * New Features Added to Python 2.7 Maintenance Releases * Two new environment variables for debug mode * PEP 434: IDLE Enhancement Exception for All Branches * PEP 466: Network Security Enhancements for Python 2.7 * PEP 477: Backport ensurepip (PEP 453) to Python 2.7 * Bootstrapping pip By Default * Documentation Changes * PEP 476: Enabling certificate verification by default for stdlib http clients * PEP 493: HTTPS verification migration tools for Python 2.7 * New "make regen-all" build target * Removal of "make touch" build target * Acknowledgements * What’s New in Python 2.6 * Python 3.0 * Changes to the Development Process * New Issue Tracker: Roundup * New Documentation Format: reStructuredText Using Sphinx * PEP 343: The ‘with’ statement * Writing Context Managers * The contextlib module * PEP 366: Explicit Relative Imports From a Main Module * PEP 370: Per-user "site-packages" Directory * PEP 371: The "multiprocessing" Package * PEP 3101: Advanced String Formatting * PEP 3105: "print" As a Function * PEP 3110: Exception-Handling Changes * PEP 3112: Byte Literals * PEP 3116: New I/O Library * PEP 3118: Revised Buffer Protocol * PEP 3119: Abstract Base Classes * PEP 3127: Integer Literal Support and Syntax * PEP 3129: Class Decorators * PEP 3141: A Type Hierarchy for Numbers * The "fractions" Module * Other Language Changes * Optimizations * Interpreter Changes * New and Improved Modules * The "ast" module * The "future_builtins" module * The "json" module: JavaScript Object Notation * The "plistlib" module: A Property-List Parser * ctypes Enhancements * Improved SSL Support * Deprecations and Removals * Build and C API Changes * Port-Specific Changes: Windows * Port-Specific Changes: Mac OS X * Port-Specific Changes: IRIX * Porting to Python 2.6 * Acknowledgements * What’s New in Python 2.5 * PEP 308: Conditional Expressions * PEP 309: Partial Function Application * PEP 314: Metadata for Python Software Packages v1.1 * PEP 328: Absolute and Relative Imports * PEP 338: Executing Modules as Scripts * PEP 341: Unified try/except/finally * PEP 342: New Generator Features * PEP 343: The ‘with’ statement * Writing Context Managers * The contextlib module * PEP 352: Exceptions as New-Style Classes * PEP 353: Using ssize_t as the index type * PEP 357: The ‘__index__’ method * Other Language Changes * Interactive Interpreter Changes * Optimizations * New, Improved, and Removed Modules * The ctypes package * The ElementTree package * The hashlib package * The sqlite3 package * The wsgiref package * Build and C API Changes * Port-Specific Changes * Porting to Python 2.5 * Acknowledgements * What’s New in Python 2.4 * PEP 218: Built-In Set Objects * PEP 237: Unifying Long Integers and Integers * PEP 289: Generator Expressions * PEP 292: Simpler String Substitutions * PEP 318: Decorators for Functions and Methods * PEP 322: Reverse Iteration * PEP 324: New subprocess Module * PEP 327: Decimal Data Type * Why is Decimal needed? * The "Decimal" type * The "Context" type * PEP 328: Multi-line Imports * PEP 331: Locale-Independent Float/String Conversions * Other Language Changes * Optimizations * New, Improved, and Deprecated Modules * cookielib * doctest * Build and C API Changes * Port-Specific Changes * Porting to Python 2.4 * Acknowledgements * What’s New in Python 2.3 * PEP 218: A Standard Set Datatype * PEP 255: Simple Generators * PEP 263: Source Code Encodings * PEP 273: Importing Modules from ZIP Archives * PEP 277: Unicode file name support for Windows NT * PEP 278: Universal Newline Support * PEP 279: enumerate() * PEP 282: The logging Package * PEP 285: A Boolean Type * PEP 293: Codec Error Handling Callbacks * PEP 301: Package Index and Metadata for Distutils * PEP 302: New Import Hooks * PEP 305: Comma-separated Files * PEP 307: Pickle Enhancements * Extended Slices * Other Language Changes * String Changes * Optimizations * New, Improved, and Deprecated Modules * Date/Time Type * The optparse Module * Pymalloc: A Specialized Object Allocator * Build and C API Changes * Port-Specific Changes * Other Changes and Fixes * Porting to Python 2.3 * Acknowledgements * What’s New in Python 2.2 * Introduction * PEPs 252 and 253: Type and Class Changes * Old and New Classes * Descriptors * Multiple Inheritance: The Diamond Rule * Attribute Access * Related Links * PEP 234: Iterators * PEP 255: Simple Generators * PEP 237: Unifying Long Integers and Integers * PEP 238: Changing the Division Operator * Unicode Changes * PEP 227: Nested Scopes * New and Improved Modules * Interpreter Changes and Fixes * Other Changes and Fixes * Acknowledgements * What’s New in Python 2.1 * Introduction * PEP 227: Nested Scopes * PEP 236: __future__ Directives * PEP 207: Rich Comparisons * PEP 230: Warning Framework * PEP 229: New Build System * PEP 205: Weak References * PEP 232: Function Attributes * PEP 235: Importing Modules on Case-Insensitive Platforms * PEP 217: Interactive Display Hook * PEP 208: New Coercion Model * PEP 241: Metadata in Python Packages * New and Improved Modules * Other Changes and Fixes * Acknowledgements * What’s New in Python 2.0 * Introduction * What About Python 1.6? * New Development Process * Unicode * List Comprehensions * Augmented Assignment * String Methods * Garbage Collection of Cycles * Other Core Changes * Minor Language Changes * Changes to Built-in Functions * Porting to 2.0 * Extending/Embedding Changes * Distutils: Making Modules Easy to Install * XML Modules * SAX2 Support * DOM Support * Relationship to PyXML * Module changes * New modules * IDLE Improvements * Deleted and Deprecated Modules * Acknowledgements * Changelog * Python 3.9.23 final * Security * Library * Python 3.9.22 final * Security * Documentation * Python 3.9.21 final * Tests * Security * Library * Documentation * Python 3.9.20 final * Windows * Tests * Security * Library * Core and Builtins * Python 3.9.19 final * Security * Core and Builtins * Library * Documentation * Windows * Tools/Demos * Python 3.9.18 final * Security * Library * Tools/Demos * C API * Python 3.9.17 final * Security * Core and Builtins * Library * Windows * macOS * Python 3.9.16 final * Security * Python 3.9.15 final * Security * Core and Builtins * Library * Windows * macOS * Python 3.9.14 final * Security * Core and Builtins * Library * Tests * Python 3.9.13 final * Core and Builtins * Library * Documentation * Tests * Build * Windows * Tools/Demos * Python 3.9.12 final * Core and Builtins * Library * Python 3.9.11 final * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Python 3.9.10 final * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * Tools/Demos * C API * Python 3.9.9 final * Core and Builtins * Library * Documentation * Windows * Python 3.9.8 final * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * C API * Python 3.9.7 final * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * Python 3.9.6 final * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.9.5 final * Security * Core and Builtins * Library * Documentation * Tests * Windows * macOS * IDLE * Python 3.9.4 final * Core and Builtins * Library * Python 3.9.3 final * Security * Core and Builtins * Library * Documentation * Tests * Build * macOS * IDLE * Python 3.9.2 final * Windows * Python 3.9.2 release candidate 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.9.1 final * Core and Builtins * Library * Tests * macOS * IDLE * Python 3.9.1 release candidate 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * C API * Python 3.9.0 final * Library * Tests * Build * Python 3.9.0 release candidate 2 * Core and Builtins * Library * Documentation * Tests * Build * Windows * C API * Python 3.9.0 release candidate 1 * Core and Builtins * Library * Documentation * Windows * IDLE * Python 3.9.0 beta 5 * Security * Core and Builtins * Library * Documentation * macOS * IDLE * C API * Python 3.9.0 beta 4 * Security * Core and Builtins * Library * Tests * Build * Windows * macOS * IDLE * C API * Python 3.9.0 beta 3 * Library * Build * Python 3.9.0 beta 2 * Core and Builtins * Library * Tests * Build * Windows * macOS * IDLE * C API * Python 3.9.0 beta 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * Tools/Demos * C API * Python 3.9.0 alpha 6 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.9.0 alpha 5 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.9.0 alpha 4 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * IDLE * C API * Python 3.9.0 alpha 3 * Core and Builtins * Library * Documentation * Build * IDLE * C API * Python 3.9.0 alpha 2 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * C API * Python 3.9.0 alpha 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.8.0 beta 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.8.0 alpha 4 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.8.0 alpha 3 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * IDLE * Tools/Demos * C API * Python 3.8.0 alpha 2 * Core and Builtins * Library * Documentation * Tests * Windows * IDLE * Python 3.8.0 alpha 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.7.0 final * Library * C API * Python 3.7.0 release candidate 1 * Core and Builtins * Library * Documentation * Build * Windows * IDLE * Python 3.7.0 beta 5 * Core and Builtins * Library * Documentation * Tests * Build * macOS * IDLE * Python 3.7.0 beta 4 * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * Python 3.7.0 beta 3 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.7.0 beta 2 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * Python 3.7.0 beta 1 * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * C API * Python 3.7.0 alpha 4 * Core and Builtins * Library * Documentation * Tests * Windows * Tools/Demos * C API * Python 3.7.0 alpha 3 * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.7.0 alpha 2 * Core and Builtins * Library * Documentation * Build * IDLE * C API * Python 3.7.0 alpha 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * IDLE * Tools/Demos * C API * Python 3.6.6 final * Python 3.6.6 release candidate 1 * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.6.5 final * Tests * Build * Python 3.6.5 release candidate 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.6.4 final * Python 3.6.4 release candidate 1 * Core and Builtins * Library * Documentation * Tests * Build * Windows * macOS * IDLE * Tools/Demos * C API * Python 3.6.3 final * Library * Build * Python 3.6.3 release candidate 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * IDLE * Tools/Demos * Python 3.6.2 final * Python 3.6.2 release candidate 2 * Security * Python 3.6.2 release candidate 1 * Security * Core and Builtins * Library * IDLE * C API * Build * Documentation * Tools/Demos * Tests * Windows * Python 3.6.1 final * Core and Builtins * Build * Python 3.6.1 release candidate 1 * Core and Builtins * Library * IDLE * Windows * C API * Documentation * Tests * Build * Python 3.6.0 final * Python 3.6.0 release candidate 2 * Core and Builtins * Tools/Demos * Windows * Build * Python 3.6.0 release candidate 1 * Core and Builtins * Library * C API * Documentation * Tools/Demos * Python 3.6.0 beta 4 * Core and Builtins * Library * Documentation * Tests * Build * Python 3.6.0 beta 3 * Core and Builtins * Library * Windows * Build * Tests * Python 3.6.0 beta 2 * Core and Builtins * Library * Windows * C API * Build * Tests * Python 3.6.0 beta 1 * Core and Builtins * Library * IDLE * C API * Tests * Build * Tools/Demos * Windows * Python 3.6.0 alpha 4 * Core and Builtins * Library * IDLE * Tests * Windows * Build * Python 3.6.0 alpha 3 * Security * Core and Builtins * Library * IDLE * C API * Build * Tools/Demos * Documentation * Tests * Python 3.6.0 alpha 2 * Security * Core and Builtins * Library * IDLE * Documentation * Tests * Windows * Build * C API * Tools/Demos * Python 3.6.0 alpha 1 * Security * Core and Builtins * Library * IDLE * Documentation * Tests * Build * Windows * Tools/Demos * C API * Python 3.5.5 final * Python 3.5.5 release candidate 1 * Security * Core and Builtins * Library * Python 3.5.4 final * Library * Python 3.5.4 release candidate 1 * Security * Core and Builtins * Library * Documentation * Tests * Build * Windows * C API * Python 3.5.3 final * Python 3.5.3 release candidate 1 * Security * Core and Builtins * Library * IDLE * C API * Documentation * Tests * Tools/Demos * Windows * Build * Python 3.5.2 final * Core and Builtins * Tests * IDLE * Python 3.5.2 release candidate 1 * Security * Core and Builtins * Library * IDLE * Documentation * Tests * Build * Windows * Tools/Demos * Python 3.5.1 final * Core and Builtins * Windows * Python 3.5.1 release candidate 1 * Core and Builtins * Library * IDLE * Documentation * Tests * Build * Windows * Tools/Demos * Python 3.5.0 final * Build * Python 3.5.0 release candidate 4 * Library * Build * Python 3.5.0 release candidate 3 * Core and Builtins * Library * Python 3.5.0 release candidate 2 * Core and Builtins * Library * Python 3.5.0 release candidate 1 * Core and Builtins * Library * IDLE * Documentation * Tests * Python 3.5.0 beta 4 * Core and Builtins * Library * Build * Python 3.5.0 beta 3 * Core and Builtins * Library * Tests * Documentation * Build * Python 3.5.0 beta 2 * Core and Builtins * Library * Python 3.5.0 beta 1 * Core and Builtins * Library * IDLE * Tests * Documentation * Tools/Demos * Python 3.5.0 alpha 4 * Core and Builtins * Library * Build * Tests * Tools/Demos * C API * Python 3.5.0 alpha 3 * Core and Builtins * Library * Build * Tests * Tools/Demos * Python 3.5.0 alpha 2 * Core and Builtins * Library * Build * C API * Windows * Python 3.5.0 alpha 1 * Core and Builtins * Library * IDLE * Build * C API * Documentation * Tests * Tools/Demos * Windows * The Python Tutorial * 1. Whetting Your Appetite * 2. Using the Python Interpreter * 2.1. Invoking the Interpreter * 2.1.1. Argument Passing * 2.1.2. Interactive Mode * 2.2. The Interpreter and Its Environment * 2.2.1. Source Code Encoding * 3. An Informal Introduction to Python * 3.1. Using Python as a Calculator * 3.1.1. Numbers * 3.1.2. Strings * 3.1.3. Lists * 3.2. First Steps Towards Programming * 4. More Control Flow Tools * 4.1. "if" Statements * 4.2. "for" Statements * 4.3. The "range()" Function * 4.4. "break" and "continue" Statements, and "else" Clauses on Loops * 4.5. "pass" Statements * 4.6. Defining Functions * 4.7. More on Defining Functions * 4.7.1. Default Argument Values * 4.7.2. Keyword Arguments * 4.7.3. Special parameters * 4.7.3.1. Positional-or-Keyword Arguments * 4.7.3.2. Positional-Only Parameters * 4.7.3.3. Keyword-Only Arguments * 4.7.3.4. Function Examples * 4.7.3.5. Recap * 4.7.4. Arbitrary Argument Lists * 4.7.5. Unpacking Argument Lists * 4.7.6. Lambda Expressions * 4.7.7. Documentation Strings * 4.7.8. Function Annotations * 4.8. Intermezzo: Coding Style * 5. Data Structures * 5.1. More on Lists * 5.1.1. Using Lists as Stacks * 5.1.2. Using Lists as Queues * 5.1.3. List Comprehensions * 5.1.4. Nested List Comprehensions * 5.2. The "del" statement * 5.3. Tuples and Sequences * 5.4. Sets * 5.5. Dictionaries * 5.6. Looping Techniques * 5.7. More on Conditions * 5.8. Comparing Sequences and Other Types * 6. Modules * 6.1. More on Modules * 6.1.1. Executing modules as scripts * 6.1.2. The Module Search Path * 6.1.3. “Compiled” Python files * 6.2. Standard Modules * 6.3. The "dir()" Function * 6.4. Packages * 6.4.1. Importing * From a Package * 6.4.2. Intra-package References * 6.4.3. Packages in Multiple Directories * 7. Input and Output * 7.1. Fancier Output Formatting * 7.1.1. Formatted String Literals * 7.1.2. The String format() Method * 7.1.3. Manual String Formatting * 7.1.4. Old string formatting * 7.2. Reading and Writing Files * 7.2.1. Methods of File Objects * 7.2.2. Saving structured data with "json" * 8. Errors and Exceptions * 8.1. Syntax Errors * 8.2. Exceptions * 8.3. Handling Exceptions * 8.4. Raising Exceptions * 8.5. Exception Chaining * 8.6. User-defined Exceptions * 8.7. Defining Clean-up Actions * 8.8. Predefined Clean-up Actions * 9. Classes * 9.1. A Word About Names and Objects * 9.2. Python Scopes and Namespaces * 9.2.1. Scopes and Namespaces Example * 9.3. A First Look at Classes * 9.3.1. Class Definition Syntax * 9.3.2. Class Objects * 9.3.3. Instance Objects * 9.3.4. Method Objects * 9.3.5. Class and Instance Variables * 9.4. Random Remarks * 9.5. Inheritance * 9.5.1. Multiple Inheritance * 9.6. Private Variables * 9.7. Odds and Ends * 9.8. Iterators * 9.9. Generators * 9.10. Generator Expressions * 10. Brief Tour of the Standard Library * 10.1. Operating System Interface * 10.2. File Wildcards * 10.3. Command Line Arguments * 10.4. Error Output Redirection and Program Termination * 10.5. String Pattern Matching * 10.6. Mathematics * 10.7. Internet Access * 10.8. Dates and Times * 10.9. Data Compression * 10.10. Performance Measurement * 10.11. Quality Control * 10.12. Batteries Included * 11. Brief Tour of the Standard Library — Part II * 11.1. Output Formatting * 11.2. Templating * 11.3. Working with Binary Data Record Layouts * 11.4. Multi-threading * 11.5. Logging * 11.6. Weak References * 11.7. Tools for Working with Lists * 11.8. Decimal Floating Point Arithmetic * 12. Virtual Environments and Packages * 12.1. Introduction * 12.2. Creating Virtual Environments * 12.3. Managing Packages with pip * 13. What Now? * 14. Interactive Input Editing and History Substitution * 14.1. Tab Completion and History Editing * 14.2. Alternatives to the Interactive Interpreter * 15. Floating Point Arithmetic: Issues and Limitations * 15.1. Representation Error * 16. Appendix * 16.1. Interactive Mode * 16.1.1. Error Handling * 16.1.2. Executable Python Scripts * 16.1.3. The Interactive Startup File * 16.1.4. The Customization Modules * Python Setup and Usage * 1. Command line and environment * 1.1. Command line * 1.1.1. Interface options * 1.1.2. Generic options * 1.1.3. Miscellaneous options * 1.1.4. Options you shouldn’t use * 1.2. Environment variables * 1.2.1. Debug-mode variables * 2. Using Python on Unix platforms * 2.1. Getting and installing the latest version of Python * 2.1.1. On Linux * 2.1.2. On FreeBSD and OpenBSD * 2.1.3. On OpenSolaris * 2.2. Building Python * 2.3. Python-related paths and files * 2.4. Miscellaneous * 3. Using Python on Windows * 3.1. The full installer * 3.1.1. Installation steps * 3.1.2. Removing the MAX_PATH Limitation * 3.1.3. Installing Without UI * 3.1.4. Installing Without Downloading * 3.1.5. Modifying an install * 3.2. The Microsoft Store package * 3.2.1. Known Issues * 3.3. The nuget.org packages * 3.4. The embeddable package * 3.4.1. Python Application * 3.4.2. Embedding Python * 3.5. Alternative bundles * 3.6. Configuring Python * 3.6.1. Excursus: Setting environment variables * 3.6.2. Finding the Python executable * 3.7. UTF-8 mode * 3.8. Python Launcher for Windows * 3.8.1. Getting started * 3.8.1.1. From the command-line * 3.8.1.2. Virtual environments * 3.8.1.3. From a script * 3.8.1.4. From file associations * 3.8.2. Shebang Lines * 3.8.3. Arguments in shebang lines * 3.8.4. Customization * 3.8.4.1. Customization via INI files * 3.8.4.2. Customizing default Python versions * 3.8.5. Diagnostics * 3.9. Finding modules * 3.10. Additional modules * 3.10.1. PyWin32 * 3.10.2. cx_Freeze * 3.11. Compiling Python on Windows * 3.12. Other Platforms * 4. Using Python on a Mac * 4.1. Getting and Installing MacPython * 4.1.1. How to run a Python script * 4.1.2. Running scripts with a GUI * 4.1.3. Configuration * 4.2. The IDE * 4.3. Installing Additional Python Packages * 4.4. GUI Programming on the Mac * 4.5. Distributing Python Applications on the Mac * 4.6. Other Resources * 5. Editors and IDEs * The Python Language Reference * 1. Introduction * 1.1. Alternate Implementations * 1.2. Notation * 2. Lexical analysis * 2.1. Line structure * 2.1.1. Logical lines * 2.1.2. Physical lines * 2.1.3. Comments * 2.1.4. Encoding declarations * 2.1.5. Explicit line joining * 2.1.6. Implicit line joining * 2.1.7. Blank lines * 2.1.8. Indentation * 2.1.9. Whitespace between tokens * 2.2. Other tokens * 2.3. Identifiers and keywords * 2.3.1. Keywords * 2.3.2. Reserved classes of identifiers * 2.4. Literals * 2.4.1. String and Bytes literals * 2.4.2. String literal concatenation * 2.4.3. Formatted string literals * 2.4.4. Numeric literals * 2.4.5. Integer literals * 2.4.6. Floating point literals * 2.4.7. Imaginary literals * 2.5. Operators * 2.6. Delimiters * 3. Data model * 3.1. Objects, values and types * 3.2. The standard type hierarchy * 3.3. Special method names * 3.3.1. Basic customization * 3.3.2. Customizing attribute access * 3.3.2.1. Customizing module attribute access * 3.3.2.2. Implementing Descriptors * 3.3.2.3. Invoking Descriptors * 3.3.2.4. __slots__ * 3.3.2.4.1. Notes on using *__slots__* * 3.3.3. Customizing class creation * 3.3.3.1. Metaclasses * 3.3.3.2. Resolving MRO entries * 3.3.3.3. Determining the appropriate metaclass * 3.3.3.4. Preparing the class namespace * 3.3.3.5. Executing the class body * 3.3.3.6. Creating the class object * 3.3.3.7. Uses for metaclasses * 3.3.4. Customizing instance and subclass checks * 3.3.5. Emulating generic types * 3.3.5.1. The purpose of *__class_getitem__* * 3.3.5.2. *__class_getitem__* versus *__getitem__* * 3.3.6. Emulating callable objects * 3.3.7. Emulating container types * 3.3.8. Emulating numeric types * 3.3.9. With Statement Context Managers * 3.3.10. Special method lookup * 3.4. Coroutines * 3.4.1. Awaitable Objects * 3.4.2. Coroutine Objects * 3.4.3. Asynchronous Iterators * 3.4.4. Asynchronous Context Managers * 4. Execution model * 4.1. Structure of a program * 4.2. Naming and binding * 4.2.1. Binding of names * 4.2.2. Resolution of names * 4.2.3. Builtins and restricted execution * 4.2.4. Interaction with dynamic features * 4.3. Exceptions * 5. The import system * 5.1. "importlib" * 5.2. Packages * 5.2.1. Regular packages * 5.2.2. Namespace packages * 5.3. Searching * 5.3.1. The module cache * 5.3.2. Finders and loaders * 5.3.3. Import hooks * 5.3.4. The meta path * 5.4. Loading * 5.4.1. Loaders * 5.4.2. Submodules * 5.4.3. Module spec * 5.4.4. Import-related module attributes * 5.4.5. module.__path__ * 5.4.6. Module reprs * 5.4.7. Cached bytecode invalidation * 5.5. The Path Based Finder * 5.5.1. Path entry finders * 5.5.2. Path entry finder protocol * 5.6. Replacing the standard import system * 5.7. Package Relative Imports * 5.8. Special considerations for __main__ * 5.8.1. __main__.__spec__ * 5.9. Open issues * 5.10. References * 6. Expressions * 6.1. Arithmetic conversions * 6.2. Atoms * 6.2.1. Identifiers (Names) * 6.2.2. Literals * 6.2.3. Parenthesized forms * 6.2.4. Displays for lists, sets and dictionaries * 6.2.5. List displays * 6.2.6. Set displays * 6.2.7. Dictionary displays * 6.2.8. Generator expressions * 6.2.9. Yield expressions * 6.2.9.1. Generator-iterator methods * 6.2.9.2. Examples * 6.2.9.3. Asynchronous generator functions * 6.2.9.4. Asynchronous generator-iterator methods * 6.3. Primaries * 6.3.1. Attribute references * 6.3.2. Subscriptions * 6.3.3. Slicings * 6.3.4. Calls * 6.4. Await expression * 6.5. The power operator * 6.6. Unary arithmetic and bitwise operations * 6.7. Binary arithmetic operations * 6.8. Shifting operations * 6.9. Binary bitwise operations * 6.10. Comparisons * 6.10.1. Value comparisons * 6.10.2. Membership test operations * 6.10.3. Identity comparisons * 6.11. Boolean operations * 6.12. Assignment expressions * 6.13. Conditional expressions * 6.14. Lambdas * 6.15. Expression lists * 6.16. Evaluation order * 6.17. Operator precedence * 7. Simple statements * 7.1. Expression statements * 7.2. Assignment statements * 7.2.1. Augmented assignment statements * 7.2.2. Annotated assignment statements * 7.3. The "assert" statement * 7.4. The "pass" statement * 7.5. The "del" statement * 7.6. The "return" statement * 7.7. The "yield" statement * 7.8. The "raise" statement * 7.9. The "break" statement * 7.10. The "continue" statement * 7.11. The "import" statement * 7.11.1. Future statements * 7.12. The "global" statement * 7.13. The "nonlocal" statement * 8. Compound statements * 8.1. The "if" statement * 8.2. The "while" statement * 8.3. The "for" statement * 8.4. The "try" statement * 8.5. The "with" statement * 8.6. Function definitions * 8.7. Class definitions * 8.8. Coroutines * 8.8.1. Coroutine function definition * 8.8.2. The "async for" statement * 8.8.3. The "async with" statement * 9. Top-level components * 9.1. Complete Python programs * 9.2. File input * 9.3. Interactive input * 9.4. Expression input * 10. Full Grammar specification * The Python Standard Library * Introduction * Notes on availability * Built-in Functions * Built-in Constants * Constants added by the "site" module * Built-in Types * Truth Value Testing * Boolean Operations — "and", "or", "not" * Comparisons * Numeric Types — "int", "float", "complex" * Bitwise Operations on Integer Types * Additional Methods on Integer Types * Additional Methods on Float * Hashing of numeric types * Iterator Types * Generator Types * Sequence Types — "list", "tuple", "range" * Common Sequence Operations * Immutable Sequence Types * Mutable Sequence Types * Lists * Tuples * Ranges * Text Sequence Type — "str" * String Methods * "printf"-style String Formatting * Binary Sequence Types — "bytes", "bytearray", "memoryview" * Bytes Objects * Bytearray Objects * Bytes and Bytearray Operations * "printf"-style Bytes Formatting * Memory Views * Set Types — "set", "frozenset" * Mapping Types — "dict" * Dictionary view objects * Context Manager Types * Generic Alias Type * Standard Generic Classes * Special Attributes of "GenericAlias" objects * Other Built-in Types * Modules * Classes and Class Instances * Functions * Methods * Code Objects * Type Objects * The Null Object * The Ellipsis Object * The NotImplemented Object * Boolean Values * Internal Objects * Special Attributes * Integer string conversion length limitation * Affected APIs * Configuring the limit * Recommended configuration * Built-in Exceptions * Exception context * Inheriting from built-in exceptions * Base classes * Concrete exceptions * OS exceptions * Warnings * Exception hierarchy * Text Processing Services * "string" — Common string operations * String constants * Custom String Formatting * Format String Syntax * Format Specification Mini-Language * Format examples * Template strings * Helper functions * "re" — Regular expression operations * Regular Expression Syntax * Module Contents * Regular Expression Objects * Match Objects * Regular Expression Examples * Checking for a Pair * Simulating scanf() * search() vs. match() * Making a Phonebook * Text Munging * Finding all Adverbs * Finding all Adverbs and their Positions * Raw String Notation * Writing a Tokenizer * "difflib" — Helpers for computing deltas * SequenceMatcher Objects * SequenceMatcher Examples * Differ Objects * Differ Example * A command-line interface to difflib * "textwrap" — Text wrapping and filling * "unicodedata" — Unicode Database * "stringprep" — Internet String Preparation * "readline" — GNU readline interface * Init file * Line buffer * History file * History list * Startup hooks * Completion * Example * "rlcompleter" — Completion function for GNU readline * Completer Objects * Binary Data Services * "struct" — Interpret bytes as packed binary data * Functions and Exceptions * Format Strings * Byte Order, Size, and Alignment * Format Characters * Examples * Classes * "codecs" — Codec registry and base classes * Codec Base Classes * Error Handlers * Stateless Encoding and Decoding * Incremental Encoding and Decoding * IncrementalEncoder Objects * IncrementalDecoder Objects * Stream Encoding and Decoding * StreamWriter Objects * StreamReader Objects * StreamReaderWriter Objects * StreamRecoder Objects * Encodings and Unicode * Standard Encodings * Python Specific Encodings * Text Encodings * Binary Transforms * Text Transforms * "encodings.idna" — Internationalized Domain Names in Applications * "encodings.mbcs" — Windows ANSI codepage * "encodings.utf_8_sig" — UTF-8 codec with BOM signature * Data Types * "datetime" — Basic date and time types * Aware and Naive Objects * Constants * Available Types * Common Properties * Determining if an Object is Aware or Naive * "timedelta" Objects * Examples of usage: "timedelta" * "date" Objects * Examples of Usage: "date" * "datetime" Objects * Examples of Usage: "datetime" * "time" Objects * Examples of Usage: "time" * "tzinfo" Objects * "timezone" Objects * "strftime()" and "strptime()" Behavior * "strftime()" and "strptime()" Format Codes * Technical Detail * "zoneinfo" — IANA time zone support * Using "ZoneInfo" * Data sources * Configuring the data sources * Compile-time configuration * Environment configuration * Runtime configuration * The "ZoneInfo" class * String representations * Pickle serialization * Functions * Globals * Exceptions and warnings * "calendar" — General calendar-related functions * "collections" — Container datatypes * "ChainMap" objects * "ChainMap" Examples and Recipes * "Counter" objects * "deque" objects * "deque" Recipes * "defaultdict" objects * "defaultdict" Examples * "namedtuple()" Factory Function for Tuples with Named Fields * "OrderedDict" objects * "OrderedDict" Examples and Recipes * "UserDict" objects * "UserList" objects * "UserString" objects * "collections.abc" — Abstract Base Classes for Containers * Collections Abstract Base Classes * "heapq" — Heap queue algorithm * Basic Examples * Priority Queue Implementation Notes * Theory * "bisect" — Array bisection algorithm * Searching Sorted Lists * Other Examples * "array" — Efficient arrays of numeric values * "weakref" — Weak references * Weak Reference Objects * Example * Finalizer Objects * Comparing finalizers with "__del__()" methods * "types" — Dynamic type creation and names for built-in types * Dynamic Type Creation * Standard Interpreter Types * Additional Utility Classes and Functions * Coroutine Utility Functions * "copy" — Shallow and deep copy operations * "pprint" — Data pretty printer * PrettyPrinter Objects * Example * "reprlib" — Alternate "repr()" implementation * Repr Objects * Subclassing Repr Objects * "enum" — Support for enumerations * Module Contents * Creating an Enum * Programmatic access to enumeration members and their attributes * Duplicating enum members and values * Ensuring unique enumeration values * Using automatic values * Iteration * Comparisons * Allowed members and attributes of enumerations * Restricted Enum subclassing * Pickling * Functional API * Derived Enumerations * IntEnum * IntFlag * Flag * Others * When to use "__new__()" vs. "__init__()" * Interesting examples * Omitting values * Using "auto" * Using "object" * Using a descriptive string * Using a custom "__new__()" * OrderedEnum * DuplicateFreeEnum * Planet * TimePeriod * How are Enums different? * Enum Classes * Enum Members (aka instances) * Finer Points * Supported "__dunder__" names * Supported "_sunder_" names * _Private__names * "Enum" member type * Boolean value of "Enum" classes and members * "Enum" classes with methods * Combining members of "Flag" * "graphlib" — Functionality to operate with graph-like structures * Exceptions * Numeric and Mathematical Modules * "numbers" — Numeric abstract base classes * The numeric tower * Notes for type implementors * Adding More Numeric ABCs * Implementing the arithmetic operations * "math" — Mathematical functions * Number-theoretic and representation functions * Power and logarithmic functions * Trigonometric functions * Angular conversion * Hyperbolic functions * Special functions * Constants * "cmath" — Mathematical functions for complex numbers * Conversions to and from polar coordinates * Power and logarithmic functions * Trigonometric functions * Hyperbolic functions * Classification functions * Constants * "decimal" — Decimal fixed point and floating point arithmetic * Quick-start Tutorial * Decimal objects * Logical operands * Context objects * Constants * Rounding modes * Signals * Floating Point Notes * Mitigating round-off error with increased precision * Special values * Working with threads * Recipes * Decimal FAQ * "fractions" — Rational numbers * "random" — Generate pseudo-random numbers * Bookkeeping functions * Functions for bytes * Functions for integers * Functions for sequences * Real-valued distributions * Alternative Generator * Notes on Reproducibility * Examples * Recipes * "statistics" — Mathematical statistics functions * Averages and measures of central location * Measures of spread * Function details * Exceptions * "NormalDist" objects * "NormalDist" Examples and Recipes * Functional Programming Modules * "itertools" — Functions creating iterators for efficient looping * Itertool functions * Itertools Recipes * "functools" — Higher-order functions and operations on callable objects * "partial" Objects * "operator" — Standard operators as functions * Mapping Operators to Functions * In-place Operators * File and Directory Access * "pathlib" — Object-oriented filesystem paths * Basic use * Pure paths * General properties * Operators * Accessing individual parts * Methods and properties * Concrete paths * Methods * Correspondence to tools in the "os" module * "os.path" — Common pathname manipulations * "fileinput" — Iterate over lines from multiple input streams * "stat" — Interpreting "stat()" results * "filecmp" — File and Directory Comparisons * The "dircmp" class * "tempfile" — Generate temporary files and directories * Examples * Deprecated functions and variables * "glob" — Unix style pathname pattern expansion * "fnmatch" — Unix filename pattern matching * "linecache" — Random access to text lines * "shutil" — High-level file operations * Directory and files operations * Platform-dependent efficient copy operations * copytree example * rmtree example * Archiving operations * Archiving example * Archiving example with *base_dir* * Querying the size of the output terminal * Data Persistence * "pickle" — Python object serialization * Relationship to other Python modules * Comparison with "marshal" * Comparison with "json" * Data stream format * Module Interface * What can be pickled and unpickled? * Pickling Class Instances * Persistence of External Objects * Dispatch Tables * Handling Stateful Objects * Custom Reduction for Types, Functions, and Other Objects * Out-of-band Buffers * Provider API * Consumer API * Example * Restricting Globals * Performance * Examples * "copyreg" — Register "pickle" support functions * Example * "shelve" — Python object persistence * Restrictions * Example * "marshal" — Internal Python object serialization * "dbm" — Interfaces to Unix “databases” * "dbm.gnu" — GNU’s reinterpretation of dbm * "dbm.ndbm" — Interface based on ndbm * "dbm.dumb" — Portable DBM implementation * "sqlite3" — DB-API 2.0 interface for SQLite databases * Module functions and constants * Connection Objects * Cursor Objects * Row Objects * Exceptions * SQLite and Python types * Introduction * Using adapters to store additional Python types in SQLite databases * Letting your object adapt itself * Registering an adapter callable * Converting SQLite values to custom Python types * Default adapters and converters * Controlling Transactions * Using "sqlite3" efficiently * Using shortcut methods * Accessing columns by name instead of by index * Using the connection as a context manager * Data Compression and Archiving * "zlib" — Compression compatible with **gzip** * "gzip" — Support for **gzip** files * Examples of usage * Command Line Interface * Command line options * "bz2" — Support for **bzip2** compression * (De)compression of files * Incremental (de)compression * One-shot (de)compression * Examples of usage * "lzma" — Compression using the LZMA algorithm * Reading and writing compressed files * Compressing and decompressing data in memory * Miscellaneous * Specifying custom filter chains * Examples * "zipfile" — Work with ZIP archives * ZipFile Objects * Path Objects * PyZipFile Objects * ZipInfo Objects * Command-Line Interface * Command-line options * Decompression pitfalls * From file itself * File System limitations * Resources limitations * Interruption * Default behaviors of extraction * "tarfile" — Read and write tar archive files * TarFile Objects * TarInfo Objects * Extraction filters * Default named filters * Filter errors * Hints for further verification * Supporting older Python versions * Stateful extraction filter example * Command-Line Interface * Command-line options * Examples * Supported tar formats * Unicode issues * File Formats * "csv" — CSV File Reading and Writing * Module Contents * Dialects and Formatting Parameters * Reader Objects * Writer Objects * Examples * "configparser" — Configuration file parser * Quick Start * Supported Datatypes * Fallback Values * Supported INI File Structure * Interpolation of values * Mapping Protocol Access * Customizing Parser Behaviour * Legacy API Examples * ConfigParser Objects * RawConfigParser Objects * Exceptions * "netrc" — netrc file processing * netrc Objects * "plistlib" — Generate and parse Apple ".plist" files * Examples * Cryptographic Services * "hashlib" — Secure hashes and message digests * Hash algorithms * SHAKE variable length digests * Key derivation * BLAKE2 * Creating hash objects * Constants * Examples * Simple hashing * Using different digest sizes * Keyed hashing * Randomized hashing * Personalization * Tree mode * Credits * "hmac" — Keyed-Hashing for Message Authentication * "secrets" — Generate secure random numbers for managing secrets * Random numbers * Generating tokens * How many bytes should tokens use? * Other functions * Recipes and best practices * Generic Operating System Services * "os" — Miscellaneous operating system interfaces * File Names, Command Line Arguments, and Environment Variables * Process Parameters * File Object Creation * File Descriptor Operations * Querying the size of a terminal * Inheritance of File Descriptors * Files and Directories * Linux extended attributes * Process Management * Interface to the scheduler * Miscellaneous System Information * Random numbers * "io" — Core tools for working with streams * Overview * Text I/O * Binary I/O * Raw I/O * High-level Module Interface * Class hierarchy * I/O Base Classes * Raw File I/O * Buffered Streams * Text I/O * Performance * Binary I/O * Text I/O * Multi-threading * Reentrancy * "time" — Time access and conversions * Functions * Clock ID Constants * Timezone Constants * "argparse" — Parser for command-line options, arguments and sub- commands * Example * Creating a parser * Adding arguments * Parsing arguments * ArgumentParser objects * prog * usage * description * epilog * parents * formatter_class * prefix_chars * fromfile_prefix_chars * argument_default * allow_abbrev * conflict_handler * add_help * exit_on_error * The add_argument() method * name or flags * action * nargs * const * default * type * choices * required * help * metavar * dest * Action classes * The parse_args() method * Option value syntax * Invalid arguments * Arguments containing "-" * Argument abbreviations (prefix matching) * Beyond "sys.argv" * The Namespace object * Other utilities * Sub-commands * FileType objects * Argument groups * Mutual exclusion * Parser defaults * Printing help * Partial parsing * Customizing file parsing * Exiting methods * Intermixed parsing * Upgrading optparse code * "getopt" — C-style parser for command line options * "logging" — Logging facility for Python * Logger Objects * Logging Levels * Handler Objects * Formatter Objects * Filter Objects * LogRecord Objects * LogRecord attributes * LoggerAdapter Objects * Thread Safety * Module-Level Functions * Module-Level Attributes * Integration with the warnings module * "logging.config" — Logging configuration * Configuration functions * Security considerations * Configuration dictionary schema * Dictionary Schema Details * Incremental Configuration * Object connections * User-defined objects * Access to external objects * Access to internal objects * Import resolution and custom importers * Configuration file format * "logging.handlers" — Logging handlers * StreamHandler * FileHandler * NullHandler * WatchedFileHandler * BaseRotatingHandler * RotatingFileHandler * TimedRotatingFileHandler * SocketHandler * DatagramHandler * SysLogHandler * NTEventLogHandler * SMTPHandler * MemoryHandler * HTTPHandler * QueueHandler * QueueListener * "getpass" — Portable password input * "curses" — Terminal handling for character-cell displays * Functions * Window Objects * Constants * "curses.textpad" — Text input widget for curses programs * Textbox objects * "curses.ascii" — Utilities for ASCII characters * "curses.panel" — A panel stack extension for curses * Functions * Panel Objects * "platform" — Access to underlying platform’s identifying data * Cross Platform * Java Platform * Windows Platform * macOS Platform * Unix Platforms * "errno" — Standard errno system symbols * "ctypes" — A foreign function library for Python * ctypes tutorial * Loading dynamic link libraries * Accessing functions from loaded dlls * Calling functions * Fundamental data types * Calling functions, continued * Calling functions with your own custom data types * Specifying the required argument types (function prototypes) * Return types * Passing pointers (or: passing parameters by reference) * Structures and unions * Structure/union alignment and byte order * Bit fields in structures and unions * Arrays * Pointers * Type conversions * Incomplete Types * Callback functions * Accessing values exported from dlls * Surprises * Variable-sized data types * ctypes reference * Finding shared libraries * Loading shared libraries * Foreign functions * Function prototypes * Utility functions * Data types * Fundamental data types * Structured data types * Arrays and pointers * Concurrent Execution * "threading" — Thread-based parallelism * Thread-Local Data * Thread Objects * Lock Objects * RLock Objects * Condition Objects * Semaphore Objects * "Semaphore" Example * Event Objects * Timer Objects * Barrier Objects * Using locks, conditions, and semaphores in the "with" statement * "multiprocessing" — Process-based parallelism * Introduction * The "Process" class * Contexts and start methods * Exchanging objects between processes * Synchronization between processes * Sharing state between processes * Using a pool of workers * Reference * "Process" and exceptions * Pipes and Queues * Miscellaneous * Connection Objects * Synchronization primitives * Shared "ctypes" Objects * The "multiprocessing.sharedctypes" module * Managers * Customized managers * Using a remote manager * Proxy Objects * Cleanup * Process Pools * Listeners and Clients * Address Formats * Authentication keys * Logging * The "multiprocessing.dummy" module * Programming guidelines * All start methods * The *spawn* and *forkserver* start methods * Examples * "multiprocessing.shared_memory" — Provides shared memory for direct access across processes * The "concurrent" package * "concurrent.futures" — Launching parallel tasks * Executor Objects * ThreadPoolExecutor * ThreadPoolExecutor Example * ProcessPoolExecutor * ProcessPoolExecutor Example * Future Objects * Module Functions * Exception classes * "subprocess" — Subprocess management * Using the "subprocess" Module * Frequently Used Arguments * Popen Constructor * Exceptions * Security Considerations * Popen Objects * Windows Popen Helpers * Windows Constants * Older high-level API * Replacing Older Functions with the "subprocess" Module * Replacing **/bin/sh** shell command substitution * Replacing shell pipeline * Replacing "os.system()" * Replacing the "os.spawn" family * Replacing "os.popen()", "os.popen2()", "os.popen3()" * Replacing functions from the "popen2" module * Legacy Shell Invocation Functions * Notes * Converting an argument sequence to a string on Windows * "sched" — Event scheduler * Scheduler Objects * "queue" — A synchronized queue class * Queue Objects * SimpleQueue Objects * "contextvars" — Context Variables * Context Variables * Manual Context Management * asyncio support * "_thread" — Low-level threading API * Networking and Interprocess Communication * "asyncio" — Asynchronous I/O * Coroutines and Tasks * Coroutines * Awaitables * Running an asyncio Program * Creating Tasks * Sleeping * Running Tasks Concurrently * Shielding From Cancellation * Timeouts * Waiting Primitives * Running in Threads * Scheduling From Other Threads * Introspection * Task Object * Generator-based Coroutines * Streams * StreamReader * StreamWriter * Examples * TCP echo client using streams * TCP echo server using streams * Get HTTP headers * Register an open socket to wait for data using streams * Synchronization Primitives * Lock * Event * Condition * Semaphore * BoundedSemaphore * Subprocesses * Creating Subprocesses * Constants * Interacting with Subprocesses * Subprocess and Threads * Examples * Queues * Queue * Priority Queue * LIFO Queue * Exceptions * Examples * Exceptions * Event Loop * Event Loop Methods * Running and stopping the loop * Scheduling callbacks * Scheduling delayed callbacks * Creating Futures and Tasks * Opening network connections * Creating network servers * Transferring files * TLS Upgrade * Watching file descriptors * Working with socket objects directly * DNS * Working with pipes * Unix signals * Executing code in thread or process pools * Error Handling API * Enabling debug mode * Running Subprocesses * Callback Handles * Server Objects * Event Loop Implementations * Examples * Hello World with call_soon() * Display the current date with call_later() * Watch a file descriptor for read events * Set signal handlers for SIGINT and SIGTERM * Futures * Future Functions * Future Object * Transports and Protocols * Transports * Transports Hierarchy * Base Transport * Read-only Transports * Write-only Transports * Datagram Transports * Subprocess Transports * Protocols * Base Protocols * Base Protocol * Streaming Protocols * Buffered Streaming Protocols * Datagram Protocols * Subprocess Protocols * Examples * TCP Echo Server * TCP Echo Client * UDP Echo Server * UDP Echo Client * Connecting Existing Sockets * loop.subprocess_exec() and SubprocessProtocol * Policies * Getting and Setting the Policy * Policy Objects * Process Watchers * Custom Policies * Platform Support * All Platforms * Windows * Subprocess Support on Windows * macOS * High-level API Index * Tasks * Queues * Subprocesses * Streams * Synchronization * Exceptions * Low-level API Index * Obtaining the Event Loop * Event Loop Methods * Transports * Protocols * Event Loop Policies * Developing with asyncio * Debug Mode * Concurrency and Multithreading * Running Blocking Code * Logging * Detect never-awaited coroutines * Detect never-retrieved exceptions * "socket" — Low-level networking interface * Socket families * Module contents * Exceptions * Constants * Functions * Creating sockets * Other functions * Socket Objects * Notes on socket timeouts * Timeouts and the "connect" method * Timeouts and the "accept" method * Example * "ssl" — TLS/SSL wrapper for socket objects * Functions, Constants, and Exceptions * Socket creation * Context creation * Exceptions * Random generation * Certificate handling * Constants * SSL Sockets * SSL Contexts * Certificates * Certificate chains * CA certificates * Combined key and certificate * Self-signed certificates * Examples * Testing for SSL support * Client-side operation * Server-side operation * Notes on non-blocking sockets * Memory BIO Support * SSL session * Security considerations * Best defaults * Manual settings * Verifying certificates * Protocol versions * Cipher selection * Multi-processing * TLS 1.3 * LibreSSL support * "select" — Waiting for I/O completion * "/dev/poll" Polling Objects * Edge and Level Trigger Polling (epoll) Objects * Polling Objects * Kqueue Objects * Kevent Objects * "selectors" — High-level I/O multiplexing * Introduction * Classes * Examples * "signal" — Set handlers for asynchronous events * General rules * Execution of Python signal handlers * Signals and threads * Module contents * Example * Note on SIGPIPE * Note on Signal Handlers and Exceptions * "mmap" — Memory-mapped file support * MADV_* Constants * Internet Data Handling * "email" — An email and MIME handling package * "email.message": Representing an email message * "email.parser": Parsing email messages * FeedParser API * Parser API * Additional notes * "email.generator": Generating MIME documents * "email.policy": Policy Objects * "email.errors": Exception and Defect classes * "email.headerregistry": Custom Header Objects * "email.contentmanager": Managing MIME Content * Content Manager Instances * "email": Examples * "email.message.Message": Representing an email message using the "compat32" API * "email.mime": Creating email and MIME objects from scratch * "email.header": Internationalized headers * "email.charset": Representing character sets * "email.encoders": Encoders * "email.utils": Miscellaneous utilities * "email.iterators": Iterators * "json" — JSON encoder and decoder * Basic Usage * Encoders and Decoders * Exceptions * Standard Compliance and Interoperability * Character Encodings * Infinite and NaN Number Values * Repeated Names Within an Object * Top-level Non-Object, Non-Array Values * Implementation Limitations * Command Line Interface * Command line options * "mailbox" — Manipulate mailboxes in various formats * "Mailbox" objects * "Maildir" * "mbox" * "MH" * "Babyl" * "MMDF" * "Message" objects * "MaildirMessage" * "mboxMessage" * "MHMessage" * "BabylMessage" * "MMDFMessage" * Exceptions * Examples * "mimetypes" — Map filenames to MIME types * MimeTypes Objects * "base64" — Base16, Base32, Base64, Base85 Data Encodings * "binhex" — Encode and decode binhex4 files * Notes * "binascii" — Convert between binary and ASCII * "quopri" — Encode and decode MIME quoted-printable data * Structured Markup Processing Tools * "html" — HyperText Markup Language support * "html.parser" — Simple HTML and XHTML parser * Example HTML Parser Application * "HTMLParser" Methods * Examples * "html.entities" — Definitions of HTML general entities * XML Processing Modules * XML vulnerabilities * The "defusedxml" Package * "xml.etree.ElementTree" — The ElementTree XML API * Tutorial * XML tree and elements * Parsing XML * Pull API for non-blocking parsing * Finding interesting elements * Modifying an XML File * Building XML documents * Parsing XML with Namespaces * XPath support * Example * Supported XPath syntax * Reference * Functions * XInclude support * Example * Reference * Functions * Element Objects * ElementTree Objects * QName Objects * TreeBuilder Objects * XMLParser Objects * XMLPullParser Objects * Exceptions * "xml.dom" — The Document Object Model API * Module Contents * Objects in the DOM * DOMImplementation Objects * Node Objects * NodeList Objects * DocumentType Objects * Document Objects * Element Objects * Attr Objects * NamedNodeMap Objects * Comment Objects * Text and CDATASection Objects * ProcessingInstruction Objects * Exceptions * Conformance * Type Mapping * Accessor Methods * "xml.dom.minidom" — Minimal DOM implementation * DOM Objects * DOM Example * minidom and the DOM standard * "xml.dom.pulldom" — Support for building partial DOM trees * DOMEventStream Objects * "xml.sax" — Support for SAX2 parsers * SAXException Objects * "xml.sax.handler" — Base classes for SAX handlers * ContentHandler Objects * DTDHandler Objects * EntityResolver Objects * ErrorHandler Objects * "xml.sax.saxutils" — SAX Utilities * "xml.sax.xmlreader" — Interface for XML parsers * XMLReader Objects * IncrementalParser Objects * Locator Objects * InputSource Objects * The "Attributes" Interface * The "AttributesNS" Interface * "xml.parsers.expat" — Fast XML parsing using Expat * XMLParser Objects * ExpatError Exceptions * Example * Content Model Descriptions * Expat error constants * Internet Protocols and Support * "webbrowser" — Convenient Web-browser controller * Browser Controller Objects * "wsgiref" — WSGI Utilities and Reference Implementation * "wsgiref.util" – WSGI environment utilities * "wsgiref.headers" – WSGI response header tools * "wsgiref.simple_server" – a simple WSGI HTTP server * "wsgiref.validate" — WSGI conformance checker * "wsgiref.handlers" – server/gateway base classes * Examples * "urllib" — URL handling modules * "urllib.request" — Extensible library for opening URLs * Request Objects * OpenerDirector Objects * BaseHandler Objects * HTTPRedirectHandler Objects * HTTPCookieProcessor Objects * ProxyHandler Objects * HTTPPasswordMgr Objects * HTTPPasswordMgrWithPriorAuth Objects * AbstractBasicAuthHandler Objects * HTTPBasicAuthHandler Objects * ProxyBasicAuthHandler Objects * AbstractDigestAuthHandler Objects * HTTPDigestAuthHandler Objects * ProxyDigestAuthHandler Objects * HTTPHandler Objects * HTTPSHandler Objects * FileHandler Objects * DataHandler Objects * FTPHandler Objects * CacheFTPHandler Objects * UnknownHandler Objects * HTTPErrorProcessor Objects * Examples * Legacy interface * "urllib.request" Restrictions * "urllib.response" — Response classes used by urllib * "urllib.parse" — Parse URLs into components * URL Parsing * URL parsing security * Parsing ASCII Encoded Bytes * Structured Parse Results * URL Quoting * "urllib.error" — Exception classes raised by urllib.request * "urllib.robotparser" — Parser for robots.txt * "http" — HTTP modules * HTTP status codes * "http.client" — HTTP protocol client * HTTPConnection Objects * HTTPResponse Objects * Examples * HTTPMessage Objects * "ftplib" — FTP protocol client * FTP Objects * FTP_TLS Objects * "poplib" — POP3 protocol client * POP3 Objects * POP3 Example * "imaplib" — IMAP4 protocol client * IMAP4 Objects * IMAP4 Example * "smtplib" — SMTP protocol client * SMTP Objects * SMTP Example * "uuid" — UUID objects according to **RFC 4122** * Example * "socketserver" — A framework for network servers * Server Creation Notes * Server Objects * Request Handler Objects * Examples * "socketserver.TCPServer" Example * "socketserver.UDPServer" Example * Asynchronous Mixins * "http.server" — HTTP servers * Security Considerations * "http.cookies" — HTTP state management * Cookie Objects * Morsel Objects * Example * "http.cookiejar" — Cookie handling for HTTP clients * CookieJar and FileCookieJar Objects * FileCookieJar subclasses and co-operation with web browsers * CookiePolicy Objects * DefaultCookiePolicy Objects * Cookie Objects * Examples * "xmlrpc" — XMLRPC server and client modules * "xmlrpc.client" — XML-RPC client access * ServerProxy Objects * DateTime Objects * Binary Objects * Fault Objects * ProtocolError Objects * MultiCall Objects * Convenience Functions * Example of Client Usage * Example of Client and Server Usage * "xmlrpc.server" — Basic XML-RPC servers * SimpleXMLRPCServer Objects * SimpleXMLRPCServer Example * CGIXMLRPCRequestHandler * Documenting XMLRPC server * DocXMLRPCServer Objects * DocCGIXMLRPCRequestHandler * "ipaddress" — IPv4/IPv6 manipulation library * Convenience factory functions * IP Addresses * Address objects * Conversion to Strings and Integers * Operators * Comparison operators * Arithmetic operators * IP Network definitions * Prefix, net mask and host mask * Network objects * Operators * Logical operators * Iteration * Networks as containers of addresses * Interface objects * Operators * Logical operators * Other Module Level Functions * Custom Exceptions * Multimedia Services * "wave" — Read and write WAV files * Wave_read Objects * Wave_write Objects * "colorsys" — Conversions between color systems * Internationalization * "gettext" — Multilingual internationalization services * GNU **gettext** API * Class-based API * The "NullTranslations" class * The "GNUTranslations" class * Solaris message catalog support * The Catalog constructor * Internationalizing your programs and modules * Localizing your module * Localizing your application * Changing languages on the fly * Deferred translations * Acknowledgements * "locale" — Internationalization services * Background, details, hints, tips and caveats * For extension writers and programs that embed Python * Access to message catalogs * Program Frameworks * "turtle" — Turtle graphics * Introduction * Overview of available Turtle and Screen methods * Turtle methods * Methods of TurtleScreen/Screen * Methods of RawTurtle/Turtle and corresponding functions * Turtle motion * Tell Turtle’s state * Settings for measurement * Pen control * Drawing state * Color control * Filling * More drawing control * Turtle state * Visibility * Appearance * Using events * Special Turtle methods * Compound shapes * Methods of TurtleScreen/Screen and corresponding functions * Window control * Animation control * Using screen events * Input methods * Settings and special methods * Methods specific to Screen, not inherited from TurtleScreen * Public classes * Help and configuration * How to use help * Translation of docstrings into different languages * How to configure Screen and Turtles * "turtledemo" — Demo scripts * Changes since Python 2.6 * Changes since Python 3.0 * "cmd" — Support for line-oriented command interpreters * Cmd Objects * Cmd Example * "shlex" — Simple lexical analysis * shlex Objects * Parsing Rules * Improved Compatibility with Shells * Graphical User Interfaces with Tk * "tkinter" — Python interface to Tcl/Tk * Tkinter Modules * Tkinter Life Preserver * How To Use This Section * A Simple Hello World Program * A (Very) Quick Look at Tcl/Tk * Mapping Basic Tk into Tkinter * How Tk and Tkinter are Related * Handy Reference * Setting Options * The Packer * Packer Options * Coupling Widget Variables * The Window Manager * Tk Option Data Types * Bindings and Events * The index Parameter * Images * File Handlers * "tkinter.colorchooser" — Color choosing dialog * "tkinter.font" — Tkinter font wrapper * Tkinter Dialogs * "tkinter.simpledialog" — Standard Tkinter input dialogs * "tkinter.filedialog" — File selection dialogs * Native Load/Save Dialogs * "tkinter.commondialog" — Dialog window templates * "tkinter.messagebox" — Tkinter message prompts * "tkinter.scrolledtext" — Scrolled Text Widget * "tkinter.dnd" — Drag and drop support * "tkinter.ttk" — Tk themed widgets * Using Ttk * Ttk Widgets * Widget * Standard Options * Scrollable Widget Options * Label Options * Compatibility Options * Widget States * ttk.Widget * Combobox * Options * Virtual events * ttk.Combobox * Spinbox * Options * Virtual events * ttk.Spinbox * Notebook * Options * Tab Options * Tab Identifiers * Virtual Events * ttk.Notebook * Progressbar * Options * ttk.Progressbar * Separator * Options * Sizegrip * Platform-specific notes * Bugs * Treeview * Options * Item Options * Tag Options * Column Identifiers * Virtual Events * ttk.Treeview * Ttk Styling * Layouts * "tkinter.tix" — Extension widgets for Tk * Using Tix * Tix Widgets * Basic Widgets * File Selectors * Hierarchical ListBox * Tabular ListBox * Manager Widgets * Image Types * Miscellaneous Widgets * Form Geometry Manager * Tix Commands * IDLE * Menus * File menu (Shell and Editor) * Edit menu (Shell and Editor) * Format menu (Editor window only) * Run menu (Editor window only) * Shell menu (Shell window only) * Debug menu (Shell window only) * Options menu (Shell and Editor) * Window menu (Shell and Editor) * Help menu (Shell and Editor) * Context Menus * Editing and navigation * Editor windows * Key bindings * Automatic indentation * Completions * Calltips * Code Context * Python Shell window * Text colors * Startup and code execution * Command line usage * Startup failure * Running user code * User output in Shell * Developing tkinter applications * Running without a subprocess * Help and preferences * Help sources * Setting preferences * IDLE on macOS * Extensions * Development Tools * "typing" — Support for type hints * Relevant PEPs * Type aliases * NewType * Callable * Generics * User-defined generic types * The "Any" type * Nominal vs structural subtyping * Module contents * Special typing primitives * Special types * Special forms * Building generic types * Other special directives * Generic concrete collections * Corresponding to built-in types * Corresponding to types in "collections" * Other concrete types * Abstract Base Classes * Corresponding to collections in "collections.abc" * Corresponding to other types in "collections.abc" * Asynchronous programming * Context manager types * Protocols * Functions and decorators * Introspection helpers * Constant * "pydoc" — Documentation generator and online help system * Python Development Mode * Effects of the Python Development Mode * ResourceWarning Example * Bad file descriptor error example * "doctest" — Test interactive Python examples * Simple Usage: Checking Examples in Docstrings * Simple Usage: Checking Examples in a Text File * How It Works * Which Docstrings Are Examined? * How are Docstring Examples Recognized? * What’s the Execution Context? * What About Exceptions? * Option Flags * Directives * Warnings * Basic API * Unittest API * Advanced API * DocTest Objects * Example Objects * DocTestFinder objects * DocTestParser objects * DocTestRunner objects * OutputChecker objects * Debugging * Soapbox * "unittest" — Unit testing framework * Basic example * Command-Line Interface * Command-line options * Test Discovery * Organizing test code * Re-using old test code * Skipping tests and expected failures * Distinguishing test iterations using subtests * Classes and functions * Test cases * Deprecated aliases * Grouping tests * Loading and running tests * load_tests Protocol * Class and Module Fixtures * setUpClass and tearDownClass * setUpModule and tearDownModule * Signal Handling * "unittest.mock" — mock object library * Quick Guide * The Mock Class * Calling * Deleting Attributes * Mock names and the name attribute * Attaching Mocks as Attributes * The patchers * patch * patch.object * patch.dict * patch.multiple * patch methods: start and stop * patch builtins * TEST_PREFIX * Nesting Patch Decorators * Where to patch * Patching Descriptors and Proxy Objects * MagicMock and magic method support * Mocking Magic Methods * Magic Mock * Helpers * sentinel * DEFAULT * call * create_autospec * ANY * FILTER_DIR * mock_open * Autospeccing * Sealing mocks * "unittest.mock" — getting started * Using Mock * Mock Patching Methods * Mock for Method Calls on an Object * Mocking Classes * Naming your mocks * Tracking all Calls * Setting Return Values and Attributes * Raising exceptions with mocks * Side effect functions and iterables * Mocking asynchronous iterators * Mocking asynchronous context manager * Creating a Mock from an Existing Object * Patch Decorators * Further Examples * Mocking chained calls * Partial mocking * Mocking a Generator Method * Applying the same patch to every test method * Mocking Unbound Methods * Checking multiple calls with mock * Coping with mutable arguments * Nesting Patches * Mocking a dictionary with MagicMock * Mock subclasses and their attributes * Mocking imports with patch.dict * Tracking order of calls and less verbose call assertions * More complex argument matching * 2to3 - Automated Python 2 to 3 code translation * Using 2to3 * Fixers * "lib2to3" - 2to3’s library * "test" — Regression tests package for Python * Writing Unit Tests for the "test" package * Running tests using the command-line interface * "test.support" — Utilities for the Python test suite * "test.support.socket_helper" — Utilities for socket tests * "test.support.script_helper" — Utilities for the Python execution tests * "test.support.bytecode_helper" — Support tools for testing correct bytecode generation * Debugging and Profiling * Audit events table * "bdb" — Debugger framework * "faulthandler" — Dump the Python traceback * Dumping the traceback * Fault handler state * Dumping the tracebacks after a timeout * Dumping the traceback on a user signal * Issue with file descriptors * Example * "pdb" — The Python Debugger * Debugger Commands * The Python Profilers * Introduction to the profilers * Instant User’s Manual * "profile" and "cProfile" Module Reference * The "Stats" Class * What Is Deterministic Profiling? * Limitations * Calibration * Using a custom timer * "timeit" — Measure execution time of small code snippets * Basic Examples * Python Interface * Command-Line Interface * Examples * "trace" — Trace or track Python statement execution * Command-Line Usage * Main options * Modifiers * Filters * Programmatic Interface * "tracemalloc" — Trace memory allocations * Examples * Display the top 10 * Compute differences * Get the traceback of a memory block * Pretty top * Record the current and peak size of all traced memory blocks * API * Functions * DomainFilter * Filter * Frame * Snapshot * Statistic * StatisticDiff * Trace * Traceback * Software Packaging and Distribution * "distutils" — Building and installing Python modules * "ensurepip" — Bootstrapping the "pip" installer * Command line interface * Module API * "venv" — Creation of virtual environments * Creating virtual environments * API * An example of extending "EnvBuilder" * "zipapp" — Manage executable Python zip archives * Basic Example * Command-Line Interface * Python API * Examples * Specifying the Interpreter * Creating Standalone Applications with zipapp * Making a Windows executable * Caveats * The Python Zip Application Archive Format * Python Runtime Services * "sys" — System-specific parameters and functions * "sysconfig" — Provide access to Python’s configuration information * Configuration variables * Installation paths * Other functions * Using "sysconfig" as a script * "builtins" — Built-in objects * "__main__" — Top-level script environment * "warnings" — Warning control * Warning Categories * The Warnings Filter * Describing Warning Filters * Default Warning Filter * Overriding the default filter * Temporarily Suppressing Warnings * Testing Warnings * Updating Code For New Versions of Dependencies * Available Functions * Available Context Managers * "dataclasses" — Data Classes * Module-level decorators, classes, and functions * Post-init processing * Class variables * Init-only variables * Frozen instances * Inheritance * Default factory functions * Mutable default values * Exceptions * "contextlib" — Utilities for "with"-statement contexts * Utilities * Examples and Recipes * Supporting a variable number of context managers * Catching exceptions from "__enter__" methods * Cleaning up in an "__enter__" implementation * Replacing any use of "try-finally" and flag variables * Using a context manager as a function decorator * Single use, reusable and reentrant context managers * Reentrant context managers * Reusable context managers * "abc" — Abstract Base Classes * "atexit" — Exit handlers * "atexit" Example * "traceback" — Print or retrieve a stack traceback * "TracebackException" Objects * "StackSummary" Objects * "FrameSummary" Objects * Traceback Examples * "__future__" — Future statement definitions * "gc" — Garbage Collector interface * "inspect" — Inspect live objects * Types and members * Retrieving source code * Introspecting callables with the Signature object * Classes and functions * The interpreter stack * Fetching attributes statically * Current State of Generators and Coroutines * Code Objects Bit Flags * Command Line Interface * "site" — Site-specific configuration hook * Readline configuration * Module contents * Command Line Interface * Custom Python Interpreters * "code" — Interpreter base classes * Interactive Interpreter Objects * Interactive Console Objects * "codeop" — Compile Python code * Importing Modules * "zipimport" — Import modules from Zip archives * zipimporter Objects * Examples * "pkgutil" — Package extension utility * "modulefinder" — Find modules used by a script * Example usage of "ModuleFinder" * "runpy" — Locating and executing Python modules * "importlib" — The implementation of "import" * Introduction * Functions * "importlib.abc" – Abstract base classes related to import * "importlib.resources" – Resources * "importlib.machinery" – Importers and path hooks * "importlib.util" – Utility code for importers * Examples * Importing programmatically * Checking if a module can be imported * Importing a source file directly * Setting up an importer * Approximating "importlib.import_module()" * Using "importlib.metadata" * Overview * Functional API * Entry points * Distribution metadata * Distribution versions * Distribution files * Distribution requirements * Distributions * Extending the search algorithm * Python Language Services * "parser" — Access Python parse trees * Creating ST Objects * Converting ST Objects * Queries on ST Objects * Exceptions and Error Handling * ST Objects * Example: Emulation of "compile()" * "ast" — Abstract Syntax Trees * Abstract Grammar * Node classes * Literals * Variables * Expressions * Subscripting * Comprehensions * Statements * Imports * Control flow * Function and class definitions * Async and await * "ast" Helpers * Compiler Flags * Command-Line Usage * "symtable" — Access to the compiler’s symbol tables * Generating Symbol Tables * Examining Symbol Tables * "symbol" — Constants used with Python parse trees * "token" — Constants used with Python parse trees * "keyword" — Testing for Python keywords * "tokenize" — Tokenizer for Python source * Tokenizing Input * Command-Line Usage * Examples * "tabnanny" — Detection of ambiguous indentation * "pyclbr" — Python module browser support * Function Objects * Class Objects * "py_compile" — Compile Python source files * "compileall" — Byte-compile Python libraries * Command-line use * Public functions * "dis" — Disassembler for Python bytecode * Bytecode analysis * Analysis functions * Python Bytecode Instructions * Opcode collections * "pickletools" — Tools for pickle developers * Command line usage * Command line options * Programmatic Interface * Miscellaneous Services * "formatter" — Generic output formatting * The Formatter Interface * Formatter Implementations * The Writer Interface * Writer Implementations * MS Windows Specific Services * "msvcrt" — Useful routines from the MS VC++ runtime * File Operations * Console I/O * Other Functions * "winreg" — Windows registry access * Functions * Constants * HKEY_* Constants * Access Rights * 64-bit Specific * Value Types * Registry Handle Objects * "winsound" — Sound-playing interface for Windows * Unix Specific Services * "posix" — The most common POSIX system calls * Large File Support * Notable Module Contents * "pwd" — The password database * "grp" — The group database * "termios" — POSIX style tty control * Example * "tty" — Terminal control functions * "pty" — Pseudo-terminal utilities * Example * "fcntl" — The "fcntl" and "ioctl" system calls * "resource" — Resource usage information * Resource Limits * Resource Usage * "syslog" — Unix syslog library routines * Examples * Simple example * Superseded Modules * "aifc" — Read and write AIFF and AIFC files * "asynchat" — Asynchronous socket command/response handler * asynchat Example * "asyncore" — Asynchronous socket handler * asyncore Example basic HTTP client * asyncore Example basic echo server * "audioop" — Manipulate raw audio data * "cgi" — Common Gateway Interface support * Introduction * Using the cgi module * Higher Level Interface * Functions * Caring about security * Installing your CGI script on a Unix system * Testing your CGI script * Debugging CGI scripts * Common problems and solutions * "cgitb" — Traceback manager for CGI scripts * "chunk" — Read IFF chunked data * "crypt" — Function to check Unix passwords * Hashing Methods * Module Attributes * Module Functions * Examples * "imghdr" — Determine the type of an image * "imp" — Access the *import* internals * Examples * "mailcap" — Mailcap file handling * "msilib" — Read and write Microsoft Installer files * Database Objects * View Objects * Summary Information Objects * Record Objects * Errors * CAB Objects * Directory Objects * Features * GUI classes * Precomputed tables * "nis" — Interface to Sun’s NIS (Yellow Pages) * "nntplib" — NNTP protocol client * NNTP Objects * Attributes * Methods * Utility functions * "optparse" — Parser for command line options * Background * Terminology * What are options for? * What are positional arguments for? * Tutorial * Understanding option actions * The store action * Handling boolean (flag) options * Other actions * Default values * Generating help * Grouping Options * Printing a version string * How "optparse" handles errors * Putting it all together * Reference Guide * Creating the parser * Populating the parser * Defining options * Option attributes * Standard option actions * Standard option types * Parsing arguments * Querying and manipulating your option parser * Conflicts between options * Cleanup * Other methods * Option Callbacks * Defining a callback option * How callbacks are called * Raising errors in a callback * Callback example 1: trivial callback * Callback example 2: check option order * Callback example 3: check option order (generalized) * Callback example 4: check arbitrary condition * Callback example 5: fixed arguments * Callback example 6: variable arguments * Extending "optparse" * Adding new types * Adding new actions * "ossaudiodev" — Access to OSS-compatible audio devices * Audio Device Objects * Mixer Device Objects * "pipes" — Interface to shell pipelines * Template Objects * "smtpd" — SMTP Server * SMTPServer Objects * DebuggingServer Objects * PureProxy Objects * MailmanProxy Objects * SMTPChannel Objects * "sndhdr" — Determine type of sound file * "spwd" — The shadow password database * "sunau" — Read and write Sun AU files * AU_read Objects * AU_write Objects * "telnetlib" — Telnet client * Telnet Objects * Telnet Example * "uu" — Encode and decode uuencode files * "xdrlib" — Encode and decode XDR data * Packer Objects * Unpacker Objects * Exceptions * Security Considerations * Extending and Embedding the Python Interpreter * Recommended third party tools * Creating extensions without third party tools * 1. Extending Python with C or C++ * 1.1. A Simple Example * 1.2. Intermezzo: Errors and Exceptions * 1.3. Back to the Example * 1.4. The Module’s Method Table and Initialization Function * 1.5. Compilation and Linkage * 1.6. Calling Python Functions from C * 1.7. Extracting Parameters in Extension Functions * 1.8. Keyword Parameters for Extension Functions * 1.9. Building Arbitrary Values * 1.10. Reference Counts * 1.10.1. Reference Counting in Python * 1.10.2. Ownership Rules * 1.10.3. Thin Ice * 1.10.4. NULL Pointers * 1.11. Writing Extensions in C++ * 1.12. Providing a C API for an Extension Module * 2. Defining Extension Types: Tutorial * 2.1. The Basics * 2.2. Adding data and methods to the Basic example * 2.3. Providing finer control over data attributes * 2.4. Supporting cyclic garbage collection * 2.5. Subclassing other types * 3. Defining Extension Types: Assorted Topics * 3.1. Finalization and De-allocation * 3.2. Object Presentation * 3.3. Attribute Management * 3.3.1. Generic Attribute Management * 3.3.2. Type-specific Attribute Management * 3.4. Object Comparison * 3.5. Abstract Protocol Support * 3.6. Weak Reference Support * 3.7. More Suggestions * 4. Building C and C++ Extensions * 4.1. Building C and C++ Extensions with distutils * 4.2. Distributing your extension modules * 5. Building C and C++ Extensions on Windows * 5.1. A Cookbook Approach * 5.2. Differences Between Unix and Windows * 5.3. Using DLLs in Practice * Embedding the CPython runtime in a larger application * 1. Embedding Python in Another Application * 1.1. Very High Level Embedding * 1.2. Beyond Very High Level Embedding: An overview * 1.3. Pure Embedding * 1.4. Extending Embedded Python * 1.5. Embedding Python in C++ * 1.6. Compiling and Linking under Unix-like systems * Python/C API Reference Manual * Introduction * Coding standards * Include Files * Useful macros * Objects, Types and Reference Counts * Reference Counts * Reference Count Details * Types * Exceptions * Embedding Python * Debugging Builds * Stable Application Binary Interface * The Very High Level Layer * Reference Counting * Exception Handling * Printing and clearing * Raising exceptions * Issuing warnings * Querying the error indicator * Signal Handling * Exception Classes * Exception Objects * Unicode Exception Objects * Recursion Control * Standard Exceptions * Standard Warning Categories * Utilities * Operating System Utilities * System Functions * Process Control * Importing Modules * Data marshalling support * Parsing arguments and building values * Parsing arguments * Strings and buffers * Numbers * Other objects * API Functions * Building values * String conversion and formatting * Reflection * Codec registry and support functions * Codec lookup API * Registry API for Unicode encoding error handlers * Abstract Objects Layer * Object Protocol * Call Protocol * The *tp_call* Protocol * The Vectorcall Protocol * Recursion Control * Vectorcall Support API * Object Calling API * Call Support API * Number Protocol * Sequence Protocol * Mapping Protocol * Iterator Protocol * Buffer Protocol * Buffer structure * Buffer request types * request-independent fields * readonly, format * shape, strides, suboffsets * contiguity requests * compound requests * Complex arrays * NumPy-style: shape and strides * PIL-style: shape, strides and suboffsets * Buffer-related functions * Old Buffer Protocol * Concrete Objects Layer * Fundamental Objects * Type Objects * Creating Heap-Allocated Types * The "None" Object * Numeric Objects * Integer Objects * Boolean Objects * Floating Point Objects * Complex Number Objects * Complex Numbers as C Structures * Complex Numbers as Python Objects * Sequence Objects * Bytes Objects * Byte Array Objects * Type check macros * Direct API functions * Macros * Unicode Objects and Codecs * Unicode Objects * Unicode Type * Unicode Character Properties * Creating and accessing Unicode strings * Deprecated Py_UNICODE APIs * Locale Encoding * File System Encoding * wchar_t Support * Built-in Codecs * Generic Codecs * UTF-8 Codecs * UTF-32 Codecs * UTF-16 Codecs * UTF-7 Codecs * Unicode-Escape Codecs * Raw-Unicode-Escape Codecs * Latin-1 Codecs * ASCII Codecs * Character Map Codecs * MBCS codecs for Windows * Methods & Slots * Methods and Slot Functions * Tuple Objects * Struct Sequence Objects * List Objects * Container Objects * Dictionary Objects * Set Objects * Function Objects * Function Objects * Instance Method Objects * Method Objects * Cell Objects * Code Objects * Other Objects * File Objects * Module Objects * Initializing C modules * Single-phase initialization * Multi-phase initialization * Low-level module creation functions * Support functions * Module lookup * Iterator Objects * Descriptor Objects * Slice Objects * Ellipsis Object * MemoryView objects * Weak Reference Objects * Capsules * Generator Objects * Coroutine Objects * Context Variables Objects * DateTime Objects * Objects for Type Hinting * Initialization, Finalization, and Threads * Before Python Initialization * Global configuration variables * Initializing and finalizing the interpreter * Process-wide parameters * Thread State and the Global Interpreter Lock * Releasing the GIL from extension code * Non-Python created threads * Cautions about fork() * High-level API * Low-level API * Sub-interpreter support * Bugs and caveats * Asynchronous Notifications * Profiling and Tracing * Advanced Debugger Support * Thread Local Storage Support * Thread Specific Storage (TSS) API * Dynamic Allocation * Methods * Thread Local Storage (TLS) API * Python Initialization Configuration * PyWideStringList * PyStatus * PyPreConfig * Preinitialization with PyPreConfig * PyConfig * Initialization with PyConfig * Isolated Configuration * Python Configuration * Path Configuration * Py_RunMain() * Py_GetArgcArgv() * Multi-Phase Initialization Private Provisional API * Memory Management * Overview * Raw Memory Interface * Memory Interface * Object allocators * Default Memory Allocators * Customize Memory Allocators * The pymalloc allocator * Customize pymalloc Arena Allocator * tracemalloc C API * Examples * Object Implementation Support * Allocating Objects on the Heap * Common Object Structures * Base object types and macros * Implementing functions and methods * Accessing attributes of extension types * Type Objects * Quick Reference * “tp slots” * sub-slots * slot typedefs * PyTypeObject Definition * PyObject Slots * PyVarObject Slots * PyTypeObject Slots * Heap Types * Number Object Structures * Mapping Object Structures * Sequence Object Structures * Buffer Object Structures * Async Object Structures * Slot Type typedefs * Examples * Supporting Cyclic Garbage Collection * API and ABI Versioning * Distributing Python Modules * Key terms * Open source licensing and collaboration * Installing the tools * Reading the Python Packaging User Guide * How do I…? * … choose a name for my project? * … create and distribute binary extensions? * Installing Python Modules * Key terms * Basic usage * How do I …? * … install "pip" in versions of Python prior to Python 3.4? * … install packages just for the current user? * … install scientific Python packages? * … work with multiple versions of Python installed in parallel? * Common installation issues * Installing into the system Python on Linux * Pip not installed * Installing binary extensions * Python HOWTOs * Porting Python 2 Code to Python 3 * The Short Explanation * Details * Drop support for Python 2.6 and older * Make sure you specify the proper version support in your "setup.py" file * Have good test coverage * Learn the differences between Python 2 & 3 * Update your code * Division * Text versus binary data * Use feature detection instead of version detection * Prevent compatibility regressions * Check which dependencies block your transition * Update your "setup.py" file to denote Python 3 compatibility * Use continuous integration to stay compatible * Consider using optional static type checking * Porting Extension Modules to Python 3 * Curses Programming with Python * What is curses? * The Python curses module * Starting and ending a curses application * Windows and Pads * Displaying Text * Attributes and Color * User Input * For More Information * Descriptor HowTo Guide * Primer * Simple example: A descriptor that returns a constant * Dynamic lookups * Managed attributes * Customized names * Closing thoughts * Complete Practical Example * Validator class * Custom validators * Practical application * Technical Tutorial * Abstract * Definition and introduction * Descriptor protocol * Overview of descriptor invocation * Invocation from an instance * Invocation from a class * Invocation from super * Summary of invocation logic * Automatic name notification * ORM example * Pure Python Equivalents * Properties * Functions and methods * Kinds of methods * Static methods * Class methods * Member objects and __slots__ * Functional Programming HOWTO * Introduction * Formal provability * Modularity * Ease of debugging and testing * Composability * Iterators * Data Types That Support Iterators * Generator expressions and list comprehensions * Generators * Passing values into a generator * Built-in functions * The itertools module * Creating new iterators * Calling functions on elements * Selecting elements * Combinatoric functions * Grouping elements * The functools module * The operator module * Small functions and the lambda expression * Revision History and Acknowledgements * References * General * Python-specific * Python documentation * Logging HOWTO * Basic Logging Tutorial * When to use logging * A simple example * Logging to a file * Logging from multiple modules * Logging variable data * Changing the format of displayed messages * Displaying the date/time in messages * Next Steps * Advanced Logging Tutorial * Logging Flow * Loggers * Handlers * Formatters * Configuring Logging * What happens if no configuration is provided * Configuring Logging for a Library * Logging Levels * Custom Levels * Useful Handlers * Exceptions raised during logging * Using arbitrary objects as messages * Optimization * Logging Cookbook * Using logging in multiple modules * Logging from multiple threads * Multiple handlers and formatters * Logging to multiple destinations * Configuration server example * Dealing with handlers that block * Sending and receiving logging events across a network * Running a logging socket listener in production * Adding contextual information to your logging output * Using LoggerAdapters to impart contextual information * Using objects other than dicts to pass contextual information * Using Filters to impart contextual information * Logging to a single file from multiple processes * Using concurrent.futures.ProcessPoolExecutor * Deploying Web applications using Gunicorn and uWSGI * Using file rotation * Use of alternative formatting styles * Customizing "LogRecord" * Subclassing QueueHandler - a ZeroMQ example * Subclassing QueueListener - a ZeroMQ example * An example dictionary-based configuration * Using a rotator and namer to customize log rotation processing * A more elaborate multiprocessing example * Inserting a BOM into messages sent to a SysLogHandler * Implementing structured logging * Customizing handlers with "dictConfig()" * Using particular formatting styles throughout your application * Using LogRecord factories * Using custom message objects * Configuring filters with "dictConfig()" * Customized exception formatting * Speaking logging messages * Buffering logging messages and outputting them conditionally * Formatting times using UTC (GMT) via configuration * Using a context manager for selective logging * A CLI application starter template * A Qt GUI for logging * Patterns to avoid * Opening the same log file multiple times * Using loggers as attributes in a class or passing them as parameters * Adding handlers other than "NullHandler" to a logger in a library * Creating a lot of loggers * Regular Expression HOWTO * Introduction * Simple Patterns * Matching Characters * Repeating Things * Using Regular Expressions * Compiling Regular Expressions * The Backslash Plague * Performing Matches * Module-Level Functions * Compilation Flags * More Pattern Power * More Metacharacters * Grouping * Non-capturing and Named Groups * Lookahead Assertions * Modifying Strings * Splitting Strings * Search and Replace * Common Problems * Use String Methods * match() versus search() * Greedy versus Non-Greedy * Using re.VERBOSE * Feedback * Socket Programming HOWTO * Sockets * History * Creating a Socket * IPC * Using a Socket * Binary Data * Disconnecting * When Sockets Die * Non-blocking Sockets * Sorting HOW TO * Sorting Basics * Key Functions * Operator Module Functions * Ascending and Descending * Sort Stability and Complex Sorts * The Old Way Using Decorate-Sort-Undecorate * The Old Way Using the *cmp* Parameter * Odd and Ends * Unicode HOWTO * Introduction to Unicode * Definitions * Encodings * References * Python’s Unicode Support * The String Type * Converting to Bytes * Unicode Literals in Python Source Code * Unicode Properties * Comparing Strings * Unicode Regular Expressions * References * Reading and Writing Unicode Data * Unicode filenames * Tips for Writing Unicode-aware Programs * Converting Between File Encodings * Files in an Unknown Encoding * References * Acknowledgements * HOWTO Fetch Internet Resources Using The urllib Package * Introduction * Fetching URLs * Data * Headers * Handling Exceptions * URLError * HTTPError * Error Codes * Wrapping it Up * Number 1 * Number 2 * info and geturl * Openers and Handlers * Basic Authentication * Proxies * Sockets and Layers * Footnotes * Argparse Tutorial * Concepts * The basics * Introducing Positional arguments * Introducing Optional arguments * Short options * Combining Positional and Optional arguments * Getting a little more advanced * Conflicting options * Conclusion * An introduction to the ipaddress module * Creating Address/Network/Interface objects * A Note on IP Versions * IP Host Addresses * Defining Networks * Host Interfaces * Inspecting Address/Network/Interface Objects * Networks as lists of Addresses * Comparisons * Using IP Addresses with other modules * Getting more detail when instance creation fails * Argument Clinic How-To * The Goals Of Argument Clinic * Basic Concepts And Usage * Converting Your First Function * Advanced Topics * Symbolic default values * Renaming the C functions and variables generated by Argument Clinic * Converting functions using PyArg_UnpackTuple * Optional Groups * Using real Argument Clinic converters, instead of “legacy converters” * Py_buffer * Advanced converters * Parameter default values * The "NULL" default value * Expressions specified as default values * Using a return converter * Cloning existing functions * Calling Python code * Using a “self converter” * Writing a custom converter * Writing a custom return converter * METH_O and METH_NOARGS * tp_new and tp_init functions * Changing and redirecting Clinic’s output * The #ifdef trick * Using Argument Clinic in Python files * Instrumenting CPython with DTrace and SystemTap * Enabling the static markers * Static DTrace probes * Static SystemTap markers * Available static markers * SystemTap Tapsets * Examples * Python Frequently Asked Questions * General Python FAQ * General Information * Python in the real world * Programming FAQ * General Questions * Core Language * Numbers and strings * Performance * Sequences (Tuples/Lists) * Objects * Modules * Design and History FAQ * Why does Python use indentation for grouping of statements? * Why am I getting strange results with simple arithmetic operations? * Why are floating-point calculations so inaccurate? * Why are Python strings immutable? * Why must ‘self’ be used explicitly in method definitions and calls? * Why can’t I use an assignment in an expression? * Why does Python use methods for some functionality (e.g. list.index()) but functions for other (e.g. len(list))? * Why is join() a string method instead of a list or tuple method? * How fast are exceptions? * Why isn’t there a switch or case statement in Python? * Can’t you emulate threads in the interpreter instead of relying on an OS-specific thread implementation? * Why can’t lambda expressions contain statements? * Can Python be compiled to machine code, C or some other language? * How does Python manage memory? * Why doesn’t CPython use a more traditional garbage collection scheme? * Why isn’t all memory freed when CPython exits? * Why are there separate tuple and list data types? * How are lists implemented in CPython? * How are dictionaries implemented in CPython? * Why must dictionary keys be immutable? * Why doesn’t list.sort() return the sorted list? * How do you specify and enforce an interface spec in Python? * Why is there no goto? * Why can’t raw strings (r-strings) end with a backslash? * Why doesn’t Python have a “with” statement for attribute assignments? * Why don’t generators support the with statement? * Why are colons required for the if/while/def/class statements? * Why does Python allow commas at the end of lists and tuples? * Library and Extension FAQ * General Library Questions * Common tasks * Threads * Input and Output * Network/Internet Programming * Databases * Mathematics and Numerics * Extending/Embedding FAQ * Can I create my own functions in C? * Can I create my own functions in C++? * Writing C is hard; are there any alternatives? * How can I execute arbitrary Python statements from C? * How can I evaluate an arbitrary Python expression from C? * How do I extract C values from a Python object? * How do I use Py_BuildValue() to create a tuple of arbitrary length? * How do I call an object’s method from C? * How do I catch the output from PyErr_Print() (or anything that prints to stdout/stderr)? * How do I access a module written in Python from C? * How do I interface to C++ objects from Python? * I added a module using the Setup file and the make fails; why? * How do I debug an extension? * I want to compile a Python module on my Linux system, but some files are missing. Why? * How do I tell “incomplete input” from “invalid input”? * How do I find undefined g++ symbols __builtin_new or __pure_virtual? * Can I create an object class with some methods implemented in C and others in Python (e.g. through inheritance)? * Python on Windows FAQ * How do I run a Python program under Windows? * How do I make Python scripts executable? * Why does Python sometimes take so long to start? * How do I make an executable from a Python script? * Is a "*.pyd" file the same as a DLL? * How can I embed Python into a Windows application? * How do I keep editors from inserting tabs into my Python source? * How do I check for a keypress without blocking? * Graphic User Interface FAQ * General GUI Questions * What GUI toolkits exist for Python? * Tkinter questions * “Why is Python Installed on my Computer?” FAQ * What is Python? * Why is Python installed on my machine? * Can I delete Python? * Glossary * About these documents * Contributors to the Python Documentation * Dealing with Bugs * Documentation bugs * Using the Python issue tracker * Getting started contributing to Python yourself * Copyright * History and License * History of the software * Terms and conditions for accessing or otherwise using Python * PSF LICENSE AGREEMENT FOR PYTHON 3.9.23 * BEOPEN.COM LICENSE AGREEMENT FOR PYTHON 2.0 * CNRI LICENSE AGREEMENT FOR PYTHON 1.6.1 * CWI LICENSE AGREEMENT FOR PYTHON 0.9.0 THROUGH 1.2 * ZERO-CLAUSE BSD LICENSE FOR CODE IN THE PYTHON 3.9.23 DOCUMENTATION * Licenses and Acknowledgements for Incorporated Software * Mersenne Twister * Sockets * Asynchronous socket services * Cookie management * Execution tracing * UUencode and UUdecode functions * XML Remote Procedure Calls * test_epoll * Select kqueue * SipHash24 * strtod and dtoa * OpenSSL * expat * libffi * zlib * cfuhash * libmpdec * W3C C14N test suite Copyright ********* Python and this documentation is: Copyright © 2001-2023 Python Software Foundation. All rights reserved. Copyright © 2000 BeOpen.com. All rights reserved. Copyright © 1995-2000 Corporation for National Research Initiatives. All rights reserved. Copyright © 1991-1995 Stichting Mathematisch Centrum. All rights reserved. ====================================================================== See History and License for complete license and permissions information. Distributing Python Modules *************************** Email: distutils-sig@python.org As a popular open source development project, Python has an active supporting community of contributors and users that also make their software available for other Python developers to use under open source license terms. This allows Python users to share and collaborate effectively, benefiting from the solutions others have already created to common (and sometimes even rare!) problems, as well as potentially contributing their own solutions to the common pool. This guide covers the distribution part of the process. For a guide to installing other Python projects, refer to the installation guide. Note: For corporate and other institutional users, be aware that many organisations have their own policies around using and contributing to open source software. Please take such policies into account when making use of the distribution and installation tools provided with Python. Key terms ========= * the Python Package Index is a public repository of open source licensed packages made available for use by other Python users * the Python Packaging Authority are the group of developers and documentation authors responsible for the maintenance and evolution of the standard packaging tools and the associated metadata and file format standards. They maintain a variety of tools, documentation and issue trackers on both GitHub and Bitbucket. * "distutils" is the original build and distribution system first added to the Python standard library in 1998. While direct use of "distutils" is being phased out, it still laid the foundation for the current packaging and distribution infrastructure, and it not only remains part of the standard library, but its name lives on in other ways (such as the name of the mailing list used to coordinate Python packaging standards development). * setuptools is a (largely) drop-in replacement for "distutils" first published in 2004. Its most notable addition over the unmodified "distutils" tools was the ability to declare dependencies on other packages. It is currently recommended as a more regularly updated alternative to "distutils" that offers consistent support for more recent packaging standards across a wide range of Python versions. * wheel (in this context) is a project that adds the "bdist_wheel" command to "distutils"/setuptools. This produces a cross platform binary packaging format (called “wheels” or “wheel files” and defined in **PEP 427**) that allows Python libraries, even those including binary extensions, to be installed on a system without needing to be built locally. Open source licensing and collaboration ======================================= In most parts of the world, software is automatically covered by copyright. This means that other developers require explicit permission to copy, use, modify and redistribute the software. Open source licensing is a way of explicitly granting such permission in a relatively consistent way, allowing developers to share and collaborate efficiently by making common solutions to various problems freely available. This leaves many developers free to spend more time focusing on the problems that are relatively unique to their specific situation. The distribution tools provided with Python are designed to make it reasonably straightforward for developers to make their own contributions back to that common pool of software if they choose to do so. The same distribution tools can also be used to distribute software within an organisation, regardless of whether that software is published as open source software or not. Installing the tools ==================== The standard library does not include build tools that support modern Python packaging standards, as the core development team has found that it is important to have standard tools that work consistently, even on older versions of Python. The currently recommended build and distribution tools can be installed by invoking the "pip" module at the command line: python -m pip install setuptools wheel twine Note: For POSIX users (including macOS and Linux users), these instructions assume the use of a *virtual environment*.For Windows users, these instructions assume that the option to adjust the system PATH environment variable was selected when installing Python. The Python Packaging User Guide includes more details on the currently recommended tools. Reading the Python Packaging User Guide ======================================= The Python Packaging User Guide covers the various key steps and elements involved in creating and publishing a project: * Project structure * Building and packaging the project * Uploading the project to the Python Package Index * The .pypirc file How do I…? ========== These are quick answers or links for some common tasks. … choose a name for my project? ------------------------------- This isn’t an easy topic, but here are a few tips: * check the Python Package Index to see if the name is already in use * check popular hosting sites like GitHub, Bitbucket, etc to see if there is already a project with that name * check what comes up in a web search for the name you’re considering * avoid particularly common words, especially ones with multiple meanings, as they can make it difficult for users to find your software when searching for it … create and distribute binary extensions? ------------------------------------------ This is actually quite a complex topic, with a variety of alternatives available depending on exactly what you’re aiming to achieve. See the Python Packaging User Guide for more information and recommendations. See also: Python Packaging User Guide: Binary Extensions Note: This document is being retained solely until the "setuptools" documentation at https://setuptools.readthedocs.io/en/latest/setuptools.html independently covers all of the relevant information currently included here. 9. API Reference **************** See also: New and changed setup.py arguments in setuptools The "setuptools" project adds new capabilities to the "setup" function and other APIs, makes the API consistent across different Python versions, and is hence recommended over using "distutils" directly. Note: This document is being retained solely until the "setuptools" documentation at https://setuptools.readthedocs.io/en/latest/setuptools.html independently covers all of the relevant information currently included here. 9.1. "distutils.core" — Core Distutils functionality ==================================================== The "distutils.core" module is the only module that needs to be installed to use the Distutils. It provides the "setup()" (which is called from the setup script). Indirectly provides the "distutils.dist.Distribution" and "distutils.cmd.Command" class. distutils.core.setup(arguments) The basic do-everything function that does most everything you could ever ask for from a Distutils method. The setup function takes a large number of arguments. These are laid out in the following table. +----------------------+----------------------------------+---------------------------------------------------------------+ | argument name | value | type | |======================|==================================|===============================================================| | *name* | The name of the package | a string | +----------------------+----------------------------------+---------------------------------------------------------------+ | *version* | The version number of the | a string | | | package; see "distutils.version" | | +----------------------+----------------------------------+---------------------------------------------------------------+ | *description* | A single line describing the | a string | | | package | | +----------------------+----------------------------------+---------------------------------------------------------------+ | *long_description* | Longer description of the | a string | | | package | | +----------------------+----------------------------------+---------------------------------------------------------------+ | *author* | The name of the package author | a string | +----------------------+----------------------------------+---------------------------------------------------------------+ | *author_email* | The email address of the package | a string | | | author | | +----------------------+----------------------------------+---------------------------------------------------------------+ | *maintainer* | The name of the current | a string | | | maintainer, if different from | | | | the author. Note that if the | | | | maintainer is provided, | | | | distutils will use it as the | | | | author in "PKG-INFO" | | +----------------------+----------------------------------+---------------------------------------------------------------+ | *maintainer_email* | The email address of the current | a string | | | maintainer, if different from | | | | the author | | +----------------------+----------------------------------+---------------------------------------------------------------+ | *url* | A URL for the package (homepage) | a string | +----------------------+----------------------------------+---------------------------------------------------------------+ | *download_url* | A URL to download the package | a string | +----------------------+----------------------------------+---------------------------------------------------------------+ | *packages* | A list of Python packages that | a list of strings | | | distutils will manipulate | | +----------------------+----------------------------------+---------------------------------------------------------------+ | *py_modules* | A list of Python modules that | a list of strings | | | distutils will manipulate | | +----------------------+----------------------------------+---------------------------------------------------------------+ | *scripts* | A list of standalone script | a list of strings | | | files to be built and installed | | +----------------------+----------------------------------+---------------------------------------------------------------+ | *ext_modules* | A list of Python extensions to | a list of instances of "distutils.core.Extension" | | | be built | | +----------------------+----------------------------------+---------------------------------------------------------------+ | *classifiers* | A list of categories for the | a list of strings; valid classifiers are listed on PyPI. | | | package | | +----------------------+----------------------------------+---------------------------------------------------------------+ | *distclass* | the "Distribution" class to use | a subclass of "distutils.core.Distribution" | +----------------------+----------------------------------+---------------------------------------------------------------+ | *script_name* | The name of the setup.py script | a string | | | - defaults to "sys.argv[0]" | | +----------------------+----------------------------------+---------------------------------------------------------------+ | *script_args* | Arguments to supply to the setup | a list of strings | | | script | | +----------------------+----------------------------------+---------------------------------------------------------------+ | *options* | default options for the setup | a dictionary | | | script | | +----------------------+----------------------------------+---------------------------------------------------------------+ | *license* | The license for the package | a string | +----------------------+----------------------------------+---------------------------------------------------------------+ | *keywords* | Descriptive meta-data, see **PEP | a list of strings or a comma-separated string | | | 314** | | +----------------------+----------------------------------+---------------------------------------------------------------+ | *platforms* | | a list of strings or a comma-separated string | +----------------------+----------------------------------+---------------------------------------------------------------+ | *cmdclass* | A mapping of command names to | a dictionary | | | "Command" subclasses | | +----------------------+----------------------------------+---------------------------------------------------------------+ | *data_files* | A list of data files to install | a list | +----------------------+----------------------------------+---------------------------------------------------------------+ | *package_dir* | A mapping of package to | a dictionary | | | directory names | | +----------------------+----------------------------------+---------------------------------------------------------------+ distutils.core.run_setup(script_name[, script_args=None, stop_after='run']) Run a setup script in a somewhat controlled environment, and return the "distutils.dist.Distribution" instance that drives things. This is useful if you need to find out the distribution meta-data (passed as keyword args from *script* to "setup()"), or the contents of the config files or command-line. *script_name* is a file that will be read and run with "exec()". "sys.argv[0]" will be replaced with *script* for the duration of the call. *script_args* is a list of strings; if supplied, "sys.argv[1:]" will be replaced by *script_args* for the duration of the call. *stop_after* tells "setup()" when to stop processing; possible values: +-----------------+-----------------------------------------------+ | value | description | |=================|===============================================| | *init* | Stop after the "Distribution" instance has | | | been created and populated with the keyword | | | arguments to "setup()" | +-----------------+-----------------------------------------------+ | *config* | Stop after config files have been parsed (and | | | their data stored in the "Distribution" | | | instance) | +-----------------+-----------------------------------------------+ | *commandline* | Stop after the command-line ("sys.argv[1:]" | | | or *script_args*) have been parsed (and the | | | data stored in the "Distribution" instance.) | +-----------------+-----------------------------------------------+ | *run* | Stop after all commands have been run (the | | | same as if "setup()" had been called in the | | | usual way). This is the default value. | +-----------------+-----------------------------------------------+ In addition, the "distutils.core" module exposed a number of classes that live elsewhere. * "Extension" from "distutils.extension" * "Command" from "distutils.cmd" * "Distribution" from "distutils.dist" A short description of each of these follows, but see the relevant module for the full reference. class distutils.core.Extension The Extension class describes a single C or C++ extension module in a setup script. It accepts the following keyword arguments in its constructor: +--------------------------+----------------------------------+-----------------------------+ | argument name | value | type | |==========================|==================================|=============================| | *name* | the full name of the extension, | a string | | | including any packages — ie. | | | | *not* a filename or pathname, | | | | but Python dotted name | | +--------------------------+----------------------------------+-----------------------------+ | *sources* | list of source filenames, | a list of strings | | | relative to the distribution | | | | root (where the setup script | | | | lives), in Unix form (slash- | | | | separated) for portability. | | | | Source files may be C, C++, SWIG | | | | (.i), platform- specific | | | | resource files, or whatever else | | | | is recognized by the | | | | **build_ext** command as source | | | | for a Python extension. | | +--------------------------+----------------------------------+-----------------------------+ | *include_dirs* | list of directories to search | a list of strings | | | for C/C++ header files (in Unix | | | | form for portability) | | +--------------------------+----------------------------------+-----------------------------+ | *define_macros* | list of macros to define; each | a list of tuples | | | macro is defined using a 2-tuple | | | | "(name, value)", where *value* | | | | is either the string to define | | | | it to or "None" to define it | | | | without a particular value | | | | (equivalent of "#define FOO" in | | | | source or "-DFOO" on Unix C | | | | compiler command line) | | +--------------------------+----------------------------------+-----------------------------+ | *undef_macros* | list of macros to undefine | a list of strings | | | explicitly | | +--------------------------+----------------------------------+-----------------------------+ | *library_dirs* | list of directories to search | a list of strings | | | for C/C++ libraries at link time | | +--------------------------+----------------------------------+-----------------------------+ | *libraries* | list of library names (not | a list of strings | | | filenames or paths) to link | | | | against | | +--------------------------+----------------------------------+-----------------------------+ | *runtime_library_dirs* | list of directories to search | a list of strings | | | for C/C++ libraries at run time | | | | (for shared extensions, this is | | | | when the extension is loaded) | | +--------------------------+----------------------------------+-----------------------------+ | *extra_objects* | list of extra files to link with | a list of strings | | | (eg. object files not implied by | | | | ‘sources’, static library that | | | | must be explicitly specified, | | | | binary resource files, etc.) | | +--------------------------+----------------------------------+-----------------------------+ | *extra_compile_args* | any extra platform- and | a list of strings | | | compiler-specific information to | | | | use when compiling the source | | | | files in ‘sources’. For | | | | platforms and compilers where a | | | | command line makes sense, this | | | | is typically a list of command- | | | | line arguments, but for other | | | | platforms it could be anything. | | +--------------------------+----------------------------------+-----------------------------+ | *extra_link_args* | any extra platform- and | a list of strings | | | compiler-specific information to | | | | use when linking object files | | | | together to create the extension | | | | (or to create a new static | | | | Python interpreter). Similar | | | | interpretation as for | | | | ‘extra_compile_args’. | | +--------------------------+----------------------------------+-----------------------------+ | *export_symbols* | list of symbols to be exported | a list of strings | | | from a shared extension. Not | | | | used on all platforms, and not | | | | generally necessary for Python | | | | extensions, which typically | | | | export exactly one symbol: | | | | "init" + extension_name. | | +--------------------------+----------------------------------+-----------------------------+ | *depends* | list of files that the extension | a list of strings | | | depends on | | +--------------------------+----------------------------------+-----------------------------+ | *language* | extension language (i.e. "'c'", | a string | | | "'c++'", "'objc'"). Will be | | | | detected from the source | | | | extensions if not provided. | | +--------------------------+----------------------------------+-----------------------------+ | *optional* | specifies that a build failure | a boolean | | | in the extension should not | | | | abort the build process, but | | | | simply skip the extension. | | +--------------------------+----------------------------------+-----------------------------+ Changed in version 3.8: On Unix, C extensions are no longer linked to libpython except on Android and Cygwin. class distutils.core.Distribution A "Distribution" describes how to build, install and package up a Python software package. See the "setup()" function for a list of keyword arguments accepted by the Distribution constructor. "setup()" creates a Distribution instance. Changed in version 3.7: "Distribution" now warns if "classifiers", "keywords" and "platforms" fields are not specified as a list or a string. class distutils.core.Command A "Command" class (or rather, an instance of one of its subclasses) implement a single distutils command. 9.2. "distutils.ccompiler" — CCompiler base class ================================================= This module provides the abstract base class for the "CCompiler" classes. A "CCompiler" instance can be used for all the compile and link steps needed to build a single project. Methods are provided to set options for the compiler — macro definitions, include directories, link path, libraries and the like. This module provides the following functions. distutils.ccompiler.gen_lib_options(compiler, library_dirs, runtime_library_dirs, libraries) Generate linker options for searching library directories and linking with specific libraries. *libraries* and *library_dirs* are, respectively, lists of library names (not filenames!) and search directories. Returns a list of command-line options suitable for use with some compiler (depending on the two format strings passed in). distutils.ccompiler.gen_preprocess_options(macros, include_dirs) Generate C pre-processor options ("-D", "-U", "-I") as used by at least two types of compilers: the typical Unix compiler and Visual C++. *macros* is the usual thing, a list of 1- or 2-tuples, where "(name,)" means undefine ("-U") macro *name*, and "(name, value)" means define ("-D") macro *name* to *value*. *include_dirs* is just a list of directory names to be added to the header file search path ("-I"). Returns a list of command-line options suitable for either Unix compilers or Visual C++. distutils.ccompiler.get_default_compiler(osname, platform) Determine the default compiler to use for the given platform. *osname* should be one of the standard Python OS names (i.e. the ones returned by "os.name") and *platform* the common value returned by "sys.platform" for the platform in question. The default values are "os.name" and "sys.platform" in case the parameters are not given. distutils.ccompiler.new_compiler(plat=None, compiler=None, verbose=0, dry_run=0, force=0) Factory function to generate an instance of some CCompiler subclass for the supplied platform/compiler combination. *plat* defaults to "os.name" (eg. "'posix'", "'nt'"), and *compiler* defaults to the default compiler for that platform. Currently only "'posix'" and "'nt'" are supported, and the default compilers are “traditional Unix interface” ("UnixCCompiler" class) and Visual C++ ("MSVCCompiler" class). Note that it’s perfectly possible to ask for a Unix compiler object under Windows, and a Microsoft compiler object under Unix—if you supply a value for *compiler*, *plat* is ignored. distutils.ccompiler.show_compilers() Print list of available compilers (used by the "--help-compiler" options to **build**, **build_ext**, **build_clib**). class distutils.ccompiler.CCompiler([verbose=0, dry_run=0, force=0]) The abstract base class "CCompiler" defines the interface that must be implemented by real compiler classes. The class also has some utility methods used by several compiler classes. The basic idea behind a compiler abstraction class is that each instance can be used for all the compile/link steps in building a single project. Thus, attributes common to all of those compile and link steps — include directories, macros to define, libraries to link against, etc. — are attributes of the compiler instance. To allow for variability in how individual files are treated, most of those attributes may be varied on a per-compilation or per-link basis. The constructor for each subclass creates an instance of the Compiler object. Flags are *verbose* (show verbose output), *dry_run* (don’t actually execute the steps) and *force* (rebuild everything, regardless of dependencies). All of these flags default to "0" (off). Note that you probably don’t want to instantiate "CCompiler" or one of its subclasses directly - use the "distutils.CCompiler.new_compiler()" factory function instead. The following methods allow you to manually alter compiler options for the instance of the Compiler class. add_include_dir(dir) Add *dir* to the list of directories that will be searched for header files. The compiler is instructed to search directories in the order in which they are supplied by successive calls to "add_include_dir()". set_include_dirs(dirs) Set the list of directories that will be searched to *dirs* (a list of strings). Overrides any preceding calls to "add_include_dir()"; subsequent calls to "add_include_dir()" add to the list passed to "set_include_dirs()". This does not affect any list of standard include directories that the compiler may search by default. add_library(libname) Add *libname* to the list of libraries that will be included in all links driven by this compiler object. Note that *libname* should *not* be the name of a file containing a library, but the name of the library itself: the actual filename will be inferred by the linker, the compiler, or the compiler class (depending on the platform). The linker will be instructed to link against libraries in the order they were supplied to "add_library()" and/or "set_libraries()". It is perfectly valid to duplicate library names; the linker will be instructed to link against libraries as many times as they are mentioned. set_libraries(libnames) Set the list of libraries to be included in all links driven by this compiler object to *libnames* (a list of strings). This does not affect any standard system libraries that the linker may include by default. add_library_dir(dir) Add *dir* to the list of directories that will be searched for libraries specified to "add_library()" and "set_libraries()". The linker will be instructed to search for libraries in the order they are supplied to "add_library_dir()" and/or "set_library_dirs()". set_library_dirs(dirs) Set the list of library search directories to *dirs* (a list of strings). This does not affect any standard library search path that the linker may search by default. add_runtime_library_dir(dir) Add *dir* to the list of directories that will be searched for shared libraries at runtime. set_runtime_library_dirs(dirs) Set the list of directories to search for shared libraries at runtime to *dirs* (a list of strings). This does not affect any standard search path that the runtime linker may search by default. define_macro(name[, value=None]) Define a preprocessor macro for all compilations driven by this compiler object. The optional parameter *value* should be a string; if it is not supplied, then the macro will be defined without an explicit value and the exact outcome depends on the compiler used. undefine_macro(name) Undefine a preprocessor macro for all compilations driven by this compiler object. If the same macro is defined by "define_macro()" and undefined by "undefine_macro()" the last call takes precedence (including multiple redefinitions or undefinitions). If the macro is redefined/undefined on a per- compilation basis (ie. in the call to "compile()"), then that takes precedence. add_link_object(object) Add *object* to the list of object files (or analogues, such as explicitly named library files or the output of “resource compilers”) to be included in every link driven by this compiler object. set_link_objects(objects) Set the list of object files (or analogues) to be included in every link to *objects*. This does not affect any standard object files that the linker may include by default (such as system libraries). The following methods implement methods for autodetection of compiler options, providing some functionality similar to GNU **autoconf**. detect_language(sources) Detect the language of a given file, or list of files. Uses the instance attributes "language_map" (a dictionary), and "language_order" (a list) to do the job. find_library_file(dirs, lib[, debug=0]) Search the specified list of directories for a static or shared library file *lib* and return the full path to that file. If *debug* is true, look for a debugging version (if that makes sense on the current platform). Return "None" if *lib* wasn’t found in any of the specified directories. has_function(funcname[, includes=None, include_dirs=None, libraries=None, library_dirs=None]) Return a boolean indicating whether *funcname* is supported on the current platform. The optional arguments can be used to augment the compilation environment by providing additional include files and paths and libraries and paths. library_dir_option(dir) Return the compiler option to add *dir* to the list of directories searched for libraries. library_option(lib) Return the compiler option to add *lib* to the list of libraries linked into the shared library or executable. runtime_library_dir_option(dir) Return the compiler option to add *dir* to the list of directories searched for runtime libraries. set_executables(**args) Define the executables (and options for them) that will be run to perform the various stages of compilation. The exact set of executables that may be specified here depends on the compiler class (via the ‘executables’ class attribute), but most will have: +----------------+--------------------------------------------+ | attribute | description | |================|============================================| | *compiler* | the C/C++ compiler | +----------------+--------------------------------------------+ | *linker_so* | linker used to create shared objects and | | | libraries | +----------------+--------------------------------------------+ | *linker_exe* | linker used to create binary executables | +----------------+--------------------------------------------+ | *archiver* | static library creator | +----------------+--------------------------------------------+ On platforms with a command-line (Unix, DOS/Windows), each of these is a string that will be split into executable name and (optional) list of arguments. (Splitting the string is done similarly to how Unix shells operate: words are delimited by spaces, but quotes and backslashes can override this. See "distutils.util.split_quoted()".) The following methods invoke stages in the build process. compile(sources[, output_dir=None, macros=None, include_dirs=None, debug=0, extra_preargs=None, extra_postargs=None, depends=None]) Compile one or more source files. Generates object files (e.g. transforms a ".c" file to a ".o" file.) *sources* must be a list of filenames, most likely C/C++ files, but in reality anything that can be handled by a particular compiler and compiler class (eg. "MSVCCompiler" can handle resource files in *sources*). Return a list of object filenames, one per source filename in *sources*. Depending on the implementation, not all source files will necessarily be compiled, but all corresponding object filenames will be returned. If *output_dir* is given, object files will be put under it, while retaining their original path component. That is, "foo/bar.c" normally compiles to "foo/bar.o" (for a Unix implementation); if *output_dir* is *build*, then it would compile to "build/foo/bar.o". *macros*, if given, must be a list of macro definitions. A macro definition is either a "(name, value)" 2-tuple or a "(name,)" 1-tuple. The former defines a macro; if the value is "None", the macro is defined without an explicit value. The 1-tuple case undefines a macro. Later definitions/redefinitions/undefinitions take precedence. *include_dirs*, if given, must be a list of strings, the directories to add to the default include file search path for this compilation only. *debug* is a boolean; if true, the compiler will be instructed to output debug symbols in (or alongside) the object file(s). *extra_preargs* and *extra_postargs* are implementation- dependent. On platforms that have the notion of a command-line (e.g. Unix, DOS/Windows), they are most likely lists of strings: extra command-line arguments to prepend/append to the compiler command line. On other platforms, consult the implementation class documentation. In any event, they are intended as an escape hatch for those occasions when the abstract compiler framework doesn’t cut the mustard. *depends*, if given, is a list of filenames that all targets depend on. If a source file is older than any file in depends, then the source file will be recompiled. This supports dependency tracking, but only at a coarse granularity. Raises "CompileError" on failure. create_static_lib(objects, output_libname[, output_dir=None, debug=0, target_lang=None]) Link a bunch of stuff together to create a static library file. The “bunch of stuff” consists of the list of object files supplied as *objects*, the extra object files supplied to "add_link_object()" and/or "set_link_objects()", the libraries supplied to "add_library()" and/or "set_libraries()", and the libraries supplied as *libraries* (if any). *output_libname* should be a library name, not a filename; the filename will be inferred from the library name. *output_dir* is the directory where the library file will be put. *debug* is a boolean; if true, debugging information will be included in the library (note that on most platforms, it is the compile step where this matters: the *debug* flag is included here just for consistency). *target_lang* is the target language for which the given objects are being compiled. This allows specific linkage time treatment of certain languages. Raises "LibError" on failure. link(target_desc, objects, output_filename[, output_dir=None, libraries=None, library_dirs=None, runtime_library_dirs=None, export_symbols=None, debug=0, extra_preargs=None, extra_postargs=None, build_temp=None, target_lang=None]) Link a bunch of stuff together to create an executable or shared library file. The “bunch of stuff” consists of the list of object files supplied as *objects*. *output_filename* should be a filename. If *output_dir* is supplied, *output_filename* is relative to it (i.e. *output_filename* can provide directory components if needed). *libraries* is a list of libraries to link against. These are library names, not filenames, since they’re translated into filenames in a platform-specific way (eg. *foo* becomes "libfoo.a" on Unix and "foo.lib" on DOS/Windows). However, they can include a directory component, which means the linker will look in that specific directory rather than searching all the normal locations. *library_dirs*, if supplied, should be a list of directories to search for libraries that were specified as bare library names (ie. no directory component). These are on top of the system default and those supplied to "add_library_dir()" and/or "set_library_dirs()". *runtime_library_dirs* is a list of directories that will be embedded into the shared library and used to search for other shared libraries that *it* depends on at run-time. (This may only be relevant on Unix.) *export_symbols* is a list of symbols that the shared library will export. (This appears to be relevant only on Windows.) *debug* is as for "compile()" and "create_static_lib()", with the slight distinction that it actually matters on most platforms (as opposed to "create_static_lib()", which includes a *debug* flag mostly for form’s sake). *extra_preargs* and *extra_postargs* are as for "compile()" (except of course that they supply command-line arguments for the particular linker being used). *target_lang* is the target language for which the given objects are being compiled. This allows specific linkage time treatment of certain languages. Raises "LinkError" on failure. link_executable(objects, output_progname[, output_dir=None, libraries=None, library_dirs=None, runtime_library_dirs=None, debug=0, extra_preargs=None, extra_postargs=None, target_lang=None]) Link an executable. *output_progname* is the name of the file executable, while *objects* are a list of object filenames to link in. Other arguments are as for the "link()" method. link_shared_lib(objects, output_libname[, output_dir=None, libraries=None, library_dirs=None, runtime_library_dirs=None, export_symbols=None, debug=0, extra_preargs=None, extra_postargs=None, build_temp=None, target_lang=None]) Link a shared library. *output_libname* is the name of the output library, while *objects* is a list of object filenames to link in. Other arguments are as for the "link()" method. link_shared_object(objects, output_filename[, output_dir=None, libraries=None, library_dirs=None, runtime_library_dirs=None, export_symbols=None, debug=0, extra_preargs=None, extra_postargs=None, build_temp=None, target_lang=None]) Link a shared object. *output_filename* is the name of the shared object that will be created, while *objects* is a list of object filenames to link in. Other arguments are as for the "link()" method. preprocess(source[, output_file=None, macros=None, include_dirs=None, extra_preargs=None, extra_postargs=None]) Preprocess a single C/C++ source file, named in *source*. Output will be written to file named *output_file*, or *stdout* if *output_file* not supplied. *macros* is a list of macro definitions as for "compile()", which will augment the macros set with "define_macro()" and "undefine_macro()". *include_dirs* is a list of directory names that will be added to the default list, in the same way as "add_include_dir()". Raises "PreprocessError" on failure. The following utility methods are defined by the "CCompiler" class, for use by the various concrete subclasses. executable_filename(basename[, strip_dir=0, output_dir='']) Returns the filename of the executable for the given *basename*. Typically for non-Windows platforms this is the same as the basename, while Windows will get a ".exe" added. library_filename(libname[, lib_type='static', strip_dir=0, output_dir='']) Returns the filename for the given library name on the current platform. On Unix a library with *lib_type* of "'static'" will typically be of the form "liblibname.a", while a *lib_type* of "'dynamic'" will be of the form "liblibname.so". object_filenames(source_filenames[, strip_dir=0, output_dir='']) Returns the name of the object files for the given source files. *source_filenames* should be a list of filenames. shared_object_filename(basename[, strip_dir=0, output_dir='']) Returns the name of a shared object file for the given file name *basename*. execute(func, args[, msg=None, level=1]) Invokes "distutils.util.execute()". This method invokes a Python function *func* with the given arguments *args*, after logging and taking into account the *dry_run* flag. spawn(cmd) Invokes "distutils.util.spawn()". This invokes an external process to run the given command. mkpath(name[, mode=511]) Invokes "distutils.dir_util.mkpath()". This creates a directory and any missing ancestor directories. move_file(src, dst) Invokes "distutils.file_util.move_file()". Renames *src* to *dst*. announce(msg[, level=1]) Write a message using "distutils.log.debug()". warn(msg) Write a warning message *msg* to standard error. debug_print(msg) If the *debug* flag is set on this "CCompiler" instance, print *msg* to standard output, otherwise do nothing. 9.3. "distutils.unixccompiler" — Unix C Compiler ================================================ This module provides the "UnixCCompiler" class, a subclass of "CCompiler" that handles the typical Unix-style command-line C compiler: * macros defined with "-Dname[=value]" * macros undefined with "-Uname" * include search directories specified with "-Idir" * libraries specified with "-llib" * library search directories specified with "-Ldir" * compile handled by **cc** (or similar) executable with "-c" option: compiles ".c" to ".o" * link static library handled by **ar** command (possibly with **ranlib**) * link shared library handled by **cc** "-shared" 9.4. "distutils.msvccompiler" — Microsoft Compiler ================================================== This module provides "MSVCCompiler", an implementation of the abstract "CCompiler" class for Microsoft Visual Studio. Typically, extension modules need to be compiled with the same compiler that was used to compile Python. For Python 2.3 and earlier, the compiler was Visual Studio 6. For Python 2.4 and 2.5, the compiler is Visual Studio .NET 2003. "MSVCCompiler" will normally choose the right compiler, linker etc. on its own. To override this choice, the environment variables *DISTUTILS_USE_SDK* and *MSSdk* must be both set. *MSSdk* indicates that the current environment has been setup by the SDK’s "SetEnv.Cmd" script, or that the environment variables had been registered when the SDK was installed; *DISTUTILS_USE_SDK* indicates that the distutils user has made an explicit choice to override the compiler selection by "MSVCCompiler". 9.5. "distutils.bcppcompiler" — Borland Compiler ================================================ This module provides "BorlandCCompiler", a subclass of the abstract "CCompiler" class for the Borland C++ compiler. 9.6. "distutils.cygwincompiler" — Cygwin Compiler ================================================= This module provides the "CygwinCCompiler" class, a subclass of "UnixCCompiler" that handles the Cygwin port of the GNU C compiler to Windows. It also contains the Mingw32CCompiler class which handles the mingw32 port of GCC (same as cygwin in no-cygwin mode). 9.7. "distutils.archive_util" — Archiving utilities ==================================================== This module provides a few functions for creating archive files, such as tarballs or zipfiles. distutils.archive_util.make_archive(base_name, format[, root_dir=None, base_dir=None, verbose=0, dry_run=0]) Create an archive file (eg. "zip" or "tar"). *base_name* is the name of the file to create, minus any format-specific extension; *format* is the archive format: one of "zip", "tar", "gztar", "bztar", "xztar", or "ztar". *root_dir* is a directory that will be the root directory of the archive; ie. we typically "chdir" into *root_dir* before creating the archive. *base_dir* is the directory where we start archiving from; ie. *base_dir* will be the common prefix of all files and directories in the archive. *root_dir* and *base_dir* both default to the current directory. Returns the name of the archive file. Changed in version 3.5: Added support for the "xztar" format. distutils.archive_util.make_tarball(base_name, base_dir[, compress='gzip', verbose=0, dry_run=0]) ‘Create an (optional compressed) archive as a tar file from all files in and under *base_dir*. *compress* must be "'gzip'" (the default), "'bzip2'", "'xz'", "'compress'", or "None". For the "'compress'" method the compression utility named by **compress** must be on the default program search path, so this is probably Unix-specific. The output tar file will be named "base_dir.tar", possibly plus the appropriate compression extension (".gz", ".bz2", ".xz" or ".Z"). Return the output filename. Changed in version 3.5: Added support for the "xz" compression. distutils.archive_util.make_zipfile(base_name, base_dir[, verbose=0, dry_run=0]) Create a zip file from all files in and under *base_dir*. The output zip file will be named *base_name* + ".zip". Uses either the "zipfile" Python module (if available) or the InfoZIP "zip" utility (if installed and found on the default search path). If neither tool is available, raises "DistutilsExecError". Returns the name of the output zip file. 9.8. "distutils.dep_util" — Dependency checking =============================================== This module provides functions for performing simple, timestamp-based dependency of files and groups of files; also, functions based entirely on such timestamp dependency analysis. distutils.dep_util.newer(source, target) Return true if *source* exists and is more recently modified than *target*, or if *source* exists and *target* doesn’t. Return false if both exist and *target* is the same age or newer than *source*. Raise "DistutilsFileError" if *source* does not exist. distutils.dep_util.newer_pairwise(sources, targets) Walk two filename lists in parallel, testing if each source is newer than its corresponding target. Return a pair of lists (*sources*, *targets*) where source is newer than target, according to the semantics of "newer()". distutils.dep_util.newer_group(sources, target[, missing='error']) Return true if *target* is out-of-date with respect to any file listed in *sources*. In other words, if *target* exists and is newer than every file in *sources*, return false; otherwise return true. *missing* controls what we do when a source file is missing; the default ("'error'") is to blow up with an "OSError" from inside "os.stat()"; if it is "'ignore'", we silently drop any missing source files; if it is "'newer'", any missing source files make us assume that *target* is out-of-date (this is handy in “dry- run” mode: it’ll make you pretend to carry out commands that wouldn’t work because inputs are missing, but that doesn’t matter because you’re not actually going to run the commands). 9.9. "distutils.dir_util" — Directory tree operations ===================================================== This module provides functions for operating on directories and trees of directories. distutils.dir_util.mkpath(name[, mode=0o777, verbose=0, dry_run=0]) Create a directory and any missing ancestor directories. If the directory already exists (or if *name* is the empty string, which means the current directory, which of course exists), then do nothing. Raise "DistutilsFileError" if unable to create some directory along the way (eg. some sub-path exists, but is a file rather than a directory). If *verbose* is true, print a one-line summary of each mkdir to stdout. Return the list of directories actually created. distutils.dir_util.create_tree(base_dir, files[, mode=0o777, verbose=0, dry_run=0]) Create all the empty directories under *base_dir* needed to put *files* there. *base_dir* is just the name of a directory which doesn’t necessarily exist yet; *files* is a list of filenames to be interpreted relative to *base_dir*. *base_dir* + the directory portion of every file in *files* will be created if it doesn’t already exist. *mode*, *verbose* and *dry_run* flags are as for "mkpath()". distutils.dir_util.copy_tree(src, dst[, preserve_mode=1, preserve_times=1, preserve_symlinks=0, update=0, verbose=0, dry_run=0]) Copy an entire directory tree *src* to a new location *dst*. Both *src* and *dst* must be directory names. If *src* is not a directory, raise "DistutilsFileError". If *dst* does not exist, it is created with "mkpath()". The end result of the copy is that every file in *src* is copied to *dst*, and directories under *src* are recursively copied to *dst*. Return the list of files that were copied or might have been copied, using their output name. The return value is unaffected by *update* or *dry_run*: it is simply the list of all files under *src*, with the names changed to be under *dst*. *preserve_mode* and *preserve_times* are the same as for "distutils.file_util.copy_file()"; note that they only apply to regular files, not to directories. If *preserve_symlinks* is true, symlinks will be copied as symlinks (on platforms that support them!); otherwise (the default), the destination of the symlink will be copied. *update* and *verbose* are the same as for "copy_file()". Files in *src* that begin with ".nfs" are skipped (more information on these files is available in answer D2 of the NFS FAQ page). Changed in version 3.3.1: NFS files are ignored. distutils.dir_util.remove_tree(directory[, verbose=0, dry_run=0]) Recursively remove *directory* and all files and directories underneath it. Any errors are ignored (apart from being reported to "sys.stdout" if *verbose* is true). 9.10. "distutils.file_util" — Single file operations ==================================================== This module contains some utility functions for operating on individual files. distutils.file_util.copy_file(src, dst[, preserve_mode=1, preserve_times=1, update=0, link=None, verbose=0, dry_run=0]) Copy file *src* to *dst*. If *dst* is a directory, then *src* is copied there with the same name; otherwise, it must be a filename. (If the file exists, it will be ruthlessly clobbered.) If *preserve_mode* is true (the default), the file’s mode (type and permission bits, or whatever is analogous on the current platform) is copied. If *preserve_times* is true (the default), the last- modified and last-access times are copied as well. If *update* is true, *src* will only be copied if *dst* does not exist, or if *dst* does exist but is older than *src*. *link* allows you to make hard links (using "os.link()") or symbolic links (using "os.symlink()") instead of copying: set it to "'hard'" or "'sym'"; if it is "None" (the default), files are copied. Don’t set *link* on systems that don’t support it: "copy_file()" doesn’t check if hard or symbolic linking is available. It uses "_copy_file_contents()" to copy file contents. Return a tuple "(dest_name, copied)": *dest_name* is the actual name of the output file, and *copied* is true if the file was copied (or would have been copied, if *dry_run* true). distutils.file_util.move_file(src, dst[, verbose, dry_run]) Move file *src* to *dst*. If *dst* is a directory, the file will be moved into it with the same name; otherwise, *src* is just renamed to *dst*. Returns the new full name of the file. Warning: Handles cross-device moves on Unix using "copy_file()". What about other systems? distutils.file_util.write_file(filename, contents) Create a file called *filename* and write *contents* (a sequence of strings without line terminators) to it. 9.11. "distutils.util" — Miscellaneous other utility functions ============================================================== This module contains other assorted bits and pieces that don’t fit into any other utility module. distutils.util.get_platform() Return a string that identifies the current platform. This is used mainly to distinguish platform-specific build directories and platform-specific built distributions. Typically includes the OS name and version and the architecture (as supplied by ‘os.uname()’), although the exact information included depends on the OS; e.g., on Linux, the kernel version isn’t particularly important. Examples of returned values: * "linux-i586" * "linux-alpha" * "solaris-2.6-sun4u" For non-POSIX platforms, currently just returns "sys.platform". For macOS systems the OS version reflects the minimal version on which binaries will run (that is, the value of "MACOSX_DEPLOYMENT_TARGET" during the build of Python), not the OS version of the current system. For universal binary builds on macOS the architecture value reflects the universal binary status instead of the architecture of the current processor. For 32-bit universal binaries the architecture is "fat", for 64-bit universal binaries the architecture is "fat64", and for 4-way universal binaries the architecture is "universal". Starting from Python 2.7 and Python 3.2 the architecture "fat3" is used for a 3-way universal build (ppc, i386, x86_64) and "intel" is used for a universal build with the i386 and x86_64 architectures Examples of returned values on macOS: * "macosx-10.3-ppc" * "macosx-10.3-fat" * "macosx-10.5-universal" * "macosx-10.6-intel" For AIX, Python 3.9 and later return a string starting with “aix”, followed by additional fields (separated by "'-'") that represent the combined values of AIX Version, Release and Technology Level (first field), Build Date (second field), and bit-size (third field). Python 3.8 and earlier returned only a single additional field with the AIX Version and Release. Examples of returned values on AIX: * "aix-5307-0747-32" # 32-bit build on AIX "oslevel -s": 5300-07-00-0000 * "aix-7105-1731-64" # 64-bit build on AIX "oslevel -s": 7100-05-01-1731 * "aix-7.2" # Legacy form reported in Python 3.8 and earlier Changed in version 3.9: The AIX platform string format now also includes the technology level, build date, and ABI bit-size. distutils.util.convert_path(pathname) Return ‘pathname’ as a name that will work on the native filesystem, i.e. split it on ‘/’ and put it back together again using the current directory separator. Needed because filenames in the setup script are always supplied in Unix style, and have to be converted to the local convention before we can actually use them in the filesystem. Raises "ValueError" on non-Unix-ish systems if *pathname* either starts or ends with a slash. distutils.util.change_root(new_root, pathname) Return *pathname* with *new_root* prepended. If *pathname* is relative, this is equivalent to "os.path.join(new_root,pathname)" Otherwise, it requires making *pathname* relative and then joining the two, which is tricky on DOS/Windows. distutils.util.check_environ() Ensure that ‘os.environ’ has all the environment variables we guarantee that users can use in config files, command-line options, etc. Currently this includes: * "HOME" - user’s home directory (Unix only) * "PLAT" - description of the current platform, including hardware and OS (see "get_platform()") distutils.util.subst_vars(s, local_vars) Perform shell/Perl-style variable substitution on *s*. Every occurrence of "$" followed by a name is considered a variable, and variable is substituted by the value found in the *local_vars* dictionary, or in "os.environ" if it’s not in *local_vars*. *os.environ* is first checked/augmented to guarantee that it contains certain values: see "check_environ()". Raise "ValueError" for any variables not found in either *local_vars* or "os.environ". Note that this is not a fully-fledged string interpolation function. A valid "$variable" can consist only of upper and lower case letters, numbers and an underscore. No { } or ( ) style quoting is available. distutils.util.split_quoted(s) Split a string up according to Unix shell-like rules for quotes and backslashes. In short: words are delimited by spaces, as long as those spaces are not escaped by a backslash, or inside a quoted string. Single and double quotes are equivalent, and the quote characters can be backslash-escaped. The backslash is stripped from any two-character escape sequence, leaving only the escaped character. The quote characters are stripped from any quoted string. Returns a list of words. distutils.util.execute(func, args[, msg=None, verbose=0, dry_run=0]) Perform some action that affects the outside world (for instance, writing to the filesystem). Such actions are special because they are disabled by the *dry_run* flag. This method takes care of all that bureaucracy for you; all you have to do is supply the function to call and an argument tuple for it (to embody the “external action” being performed), and an optional message to print. distutils.util.strtobool(val) Convert a string representation of truth to true (1) or false (0). True values are "y", "yes", "t", "true", "on" and "1"; false values are "n", "no", "f", "false", "off" and "0". Raises "ValueError" if *val* is anything else. distutils.util.byte_compile(py_files[, optimize=0, force=0, prefix=None, base_dir=None, verbose=1, dry_run=0, direct=None]) Byte-compile a collection of Python source files to ".pyc" files in a "__pycache__" subdirectory (see **PEP 3147** and **PEP 488**). *py_files* is a list of files to compile; any files that don’t end in ".py" are silently skipped. *optimize* must be one of the following: * "0" - don’t optimize * "1" - normal optimization (like "python -O") * "2" - extra optimization (like "python -OO") If *force* is true, all files are recompiled regardless of timestamps. The source filename encoded in each *bytecode* file defaults to the filenames listed in *py_files*; you can modify these with *prefix* and *basedir*. *prefix* is a string that will be stripped off of each source filename, and *base_dir* is a directory name that will be prepended (after *prefix* is stripped). You can supply either or both (or neither) of *prefix* and *base_dir*, as you wish. If *dry_run* is true, doesn’t actually do anything that would affect the filesystem. Byte-compilation is either done directly in this interpreter process with the standard "py_compile" module, or indirectly by writing a temporary script and executing it. Normally, you should let "byte_compile()" figure out to use direct compilation or not (see the source for details). The *direct* flag is used by the script generated in indirect mode; unless you know what you’re doing, leave it set to "None". Changed in version 3.2.3: Create ".pyc" files with an "import magic tag" in their name, in a "__pycache__" subdirectory instead of files without tag in the current directory. Changed in version 3.5: Create ".pyc" files according to **PEP 488**. distutils.util.rfc822_escape(header) Return a version of *header* escaped for inclusion in an **RFC 822** header, by ensuring there are 8 spaces space after each newline. Note that it does no other modification of the string. 9.12. "distutils.dist" — The Distribution class =============================================== This module provides the "Distribution" class, which represents the module distribution being built/installed/distributed. 9.13. "distutils.extension" — The Extension class ================================================= This module provides the "Extension" class, used to describe C/C++ extension modules in setup scripts. 9.14. "distutils.debug" — Distutils debug mode ============================================== This module provides the DEBUG flag. 9.15. "distutils.errors" — Distutils exceptions =============================================== Provides exceptions used by the Distutils modules. Note that Distutils modules may raise standard exceptions; in particular, SystemExit is usually raised for errors that are obviously the end- user’s fault (eg. bad command-line arguments). This module is safe to use in "from ... import *" mode; it only exports symbols whose names start with "Distutils" and end with "Error". 9.16. "distutils.fancy_getopt" — Wrapper around the standard getopt module ========================================================================== This module provides a wrapper around the standard "getopt" module that provides the following additional features: * short and long options are tied together * options have help strings, so "fancy_getopt()" could potentially create a complete usage summary * options set attributes of a passed-in object * boolean options can have “negative aliases” — eg. if "--quiet" is the “negative alias” of "--verbose", then "--quiet" on the command line sets *verbose* to false. distutils.fancy_getopt.fancy_getopt(options, negative_opt, object, args) Wrapper function. *options* is a list of "(long_option, short_option, help_string)" 3-tuples as described in the constructor for "FancyGetopt". *negative_opt* should be a dictionary mapping option names to option names, both the key and value should be in the *options* list. *object* is an object which will be used to store values (see the "getopt()" method of the "FancyGetopt" class). *args* is the argument list. Will use "sys.argv[1:]" if you pass "None" as *args*. distutils.fancy_getopt.wrap_text(text, width) Wraps *text* to less than *width* wide. class distutils.fancy_getopt.FancyGetopt([option_table=None]) The option_table is a list of 3-tuples: "(long_option, short_option, help_string)" If an option takes an argument, its *long_option* should have "'='" appended; *short_option* should just be a single character, no "':'" in any case. *short_option* should be "None" if a *long_option* doesn’t have a corresponding *short_option*. All option tuples must have long options. The "FancyGetopt" class provides the following methods: FancyGetopt.getopt([args=None, object=None]) Parse command-line options in args. Store as attributes on *object*. If *args* is "None" or not supplied, uses "sys.argv[1:]". If *object* is "None" or not supplied, creates a new "OptionDummy" instance, stores option values there, and returns a tuple "(args, object)". If *object* is supplied, it is modified in place and "getopt()" just returns *args*; in both cases, the returned *args* is a modified copy of the passed-in *args* list, which is left untouched. FancyGetopt.get_option_order() Returns the list of "(option, value)" tuples processed by the previous run of "getopt()" Raises "RuntimeError" if "getopt()" hasn’t been called yet. FancyGetopt.generate_help([header=None]) Generate help text (a list of strings, one per suggested line of output) from the option table for this "FancyGetopt" object. If supplied, prints the supplied *header* at the top of the help. 9.17. "distutils.filelist" — The FileList class =============================================== This module provides the "FileList" class, used for poking about the filesystem and building lists of files. 9.18. "distutils.log" — Simple **PEP 282**-style logging ======================================================== 9.19. "distutils.spawn" — Spawn a sub-process ============================================= This module provides the "spawn()" function, a front-end to various platform-specific functions for launching another program in a sub- process. Also provides "find_executable()" to search the path for a given executable name. 9.20. "distutils.sysconfig" — System configuration information ============================================================== The "distutils.sysconfig" module provides access to Python’s low-level configuration information. The specific configuration variables available depend heavily on the platform and configuration. The specific variables depend on the build process for the specific version of Python being run; the variables are those found in the "Makefile" and configuration header that are installed with Python on Unix systems. The configuration header is called "pyconfig.h" for Python versions starting with 2.2, and "config.h" for earlier versions of Python. Some additional functions are provided which perform some useful manipulations for other parts of the "distutils" package. distutils.sysconfig.PREFIX The result of "os.path.normpath(sys.prefix)". distutils.sysconfig.EXEC_PREFIX The result of "os.path.normpath(sys.exec_prefix)". distutils.sysconfig.get_config_var(name) Return the value of a single variable. This is equivalent to "get_config_vars().get(name)". distutils.sysconfig.get_config_vars(...) Return a set of variable definitions. If there are no arguments, this returns a dictionary mapping names of configuration variables to values. If arguments are provided, they should be strings, and the return value will be a sequence giving the associated values. If a given name does not have a corresponding value, "None" will be included for that variable. distutils.sysconfig.get_config_h_filename() Return the full path name of the configuration header. For Unix, this will be the header generated by the **configure** script; for other platforms the header will have been supplied directly by the Python source distribution. The file is a platform-specific text file. distutils.sysconfig.get_makefile_filename() Return the full path name of the "Makefile" used to build Python. For Unix, this will be a file generated by the **configure** script; the meaning for other platforms will vary. The file is a platform-specific text file, if it exists. This function is only useful on POSIX platforms. distutils.sysconfig.get_python_inc([plat_specific[, prefix]]) Return the directory for either the general or platform-dependent C include files. If *plat_specific* is true, the platform-dependent include directory is returned; if false or omitted, the platform- independent directory is returned. If *prefix* is given, it is used as either the prefix instead of "PREFIX", or as the exec-prefix instead of "EXEC_PREFIX" if *plat_specific* is true. distutils.sysconfig.get_python_lib([plat_specific[, standard_lib[, prefix]]]) Return the directory for either the general or platform-dependent library installation. If *plat_specific* is true, the platform- dependent include directory is returned; if false or omitted, the platform-independent directory is returned. If *prefix* is given, it is used as either the prefix instead of "PREFIX", or as the exec-prefix instead of "EXEC_PREFIX" if *plat_specific* is true. If *standard_lib* is true, the directory for the standard library is returned rather than the directory for the installation of third-party extensions. The following function is only intended for use within the "distutils" package. distutils.sysconfig.customize_compiler(compiler) Do any platform-specific customization of a "distutils.ccompiler.CCompiler" instance. This function is only needed on Unix at this time, but should be called consistently to support forward-compatibility. It inserts the information that varies across Unix flavors and is stored in Python’s "Makefile". This information includes the selected compiler, compiler and linker options, and the extension used by the linker for shared objects. This function is even more special-purpose, and should only be used from Python’s own build procedures. distutils.sysconfig.set_python_build() Inform the "distutils.sysconfig" module that it is being used as part of the build process for Python. This changes a lot of relative locations for files, allowing them to be located in the build area rather than in an installed Python. 9.21. "distutils.text_file" — The TextFile class ================================================ This module provides the "TextFile" class, which gives an interface to text files that (optionally) takes care of stripping comments, ignoring blank lines, and joining lines with backslashes. class distutils.text_file.TextFile([filename=None, file=None, **options]) This class provides a file-like object that takes care of all the things you commonly want to do when processing a text file that has some line-by-line syntax: strip comments (as long as "#" is your comment character), skip blank lines, join adjacent lines by escaping the newline (ie. backslash at end of line), strip leading and/or trailing whitespace. All of these are optional and independently controllable. The class provides a "warn()" method so you can generate warning messages that report physical line number, even if the logical line in question spans multiple physical lines. Also provides "unreadline()" for implementing line-at-a-time lookahead. "TextFile" instances are create with either *filename*, *file*, or both. "RuntimeError" is raised if both are "None". *filename* should be a string, and *file* a file object (or something that provides "readline()" and "close()" methods). It is recommended that you supply at least *filename*, so that "TextFile" can include it in warning messages. If *file* is not supplied, "TextFile" creates its own using the "open()" built-in function. The options are all boolean, and affect the values returned by "readline()" +--------------------+----------------------------------+-----------+ | option name | description | default | |====================|==================================|===========| | *strip_comments* | strip from "'#'" to end-of-line, | true | | | as well as any whitespace | | | | leading up to the "'#'"—unless | | | | it is escaped by a backslash | | +--------------------+----------------------------------+-----------+ | *lstrip_ws* | strip leading whitespace from | false | | | each line before returning it | | +--------------------+----------------------------------+-----------+ | *rstrip_ws* | strip trailing whitespace | true | | | (including line terminator!) | | | | from each line before returning | | | | it. | | +--------------------+----------------------------------+-----------+ | *skip_blanks* | skip lines that are empty | true | | | *after* stripping comments and | | | | whitespace. (If both lstrip_ws | | | | and rstrip_ws are false, then | | | | some lines may consist of solely | | | | whitespace: these will *not* be | | | | skipped, even if *skip_blanks* | | | | is true.) | | +--------------------+----------------------------------+-----------+ | *join_lines* | if a backslash is the last non- | false | | | newline character on a line | | | | after stripping comments and | | | | whitespace, join the following | | | | line to it to form one logical | | | | line; if N consecutive lines end | | | | with a backslash, then N+1 | | | | physical lines will be joined to | | | | form one logical line. | | +--------------------+----------------------------------+-----------+ | *collapse_join* | strip leading whitespace from | false | | | lines that are joined to their | | | | predecessor; only matters if | | | | "(join_lines and not lstrip_ws)" | | +--------------------+----------------------------------+-----------+ Note that since *rstrip_ws* can strip the trailing newline, the semantics of "readline()" must differ from those of the built-in file object’s "readline()" method! In particular, "readline()" returns "None" for end-of-file: an empty string might just be a blank line (or an all-whitespace line), if *rstrip_ws* is true but *skip_blanks* is not. open(filename) Open a new file *filename*. This overrides any *file* or *filename* constructor arguments. close() Close the current file and forget everything we know about it (including the filename and the current line number). warn(msg[, line=None]) Print (to stderr) a warning message tied to the current logical line in the current file. If the current logical line in the file spans multiple physical lines, the warning refers to the whole range, such as ""lines 3-5"". If *line* is supplied, it overrides the current line number; it may be a list or tuple to indicate a range of physical lines, or an integer for a single physical line. readline() Read and return a single logical line from the current file (or from an internal buffer if lines have previously been “unread” with "unreadline()"). If the *join_lines* option is true, this may involve reading multiple physical lines concatenated into a single string. Updates the current line number, so calling "warn()" after "readline()" emits a warning about the physical line(s) just read. Returns "None" on end-of-file, since the empty string can occur if *rstrip_ws* is true but *strip_blanks* is not. readlines() Read and return the list of all logical lines remaining in the current file. This updates the current line number to the last line of the file. unreadline(line) Push *line* (a string) onto an internal buffer that will be checked by future "readline()" calls. Handy for implementing a parser with line-at-a-time lookahead. Note that lines that are “unread” with "unreadline()" are not subsequently re-cleansed (whitespace stripped, or whatever) when read with "readline()". If multiple calls are made to "unreadline()" before a call to "readline()", the lines will be returned most in most recent first order. 9.22. "distutils.version" — Version number classes ================================================== 9.23. "distutils.cmd" — Abstract base class for Distutils commands ================================================================== This module supplies the abstract base class "Command". class distutils.cmd.Command(dist) Abstract base class for defining command classes, the “worker bees” of the Distutils. A useful analogy for command classes is to think of them as subroutines with local variables called *options*. The options are declared in "initialize_options()" and defined (given their final values) in "finalize_options()", both of which must be defined by every command class. The distinction between the two is necessary because option values might come from the outside world (command line, config file, …), and any options dependent on other options must be computed after these outside influences have been processed — hence "finalize_options()". The body of the subroutine, where it does all its work based on the values of its options, is the "run()" method, which must also be implemented by every command class. The class constructor takes a single argument *dist*, a "Distribution" instance. 9.24. Creating a new Distutils command ====================================== This section outlines the steps to create a new Distutils command. A new command lives in a module in the "distutils.command" package. There is a sample template in that directory called "command_template". Copy this file to a new module with the same name as the new command you’re implementing. This module should implement a class with the same name as the module (and the command). So, for instance, to create the command "peel_banana" (so that users can run "setup.py peel_banana"), you’d copy "command_template" to "distutils/command/peel_banana.py", then edit it so that it’s implementing the class "peel_banana", a subclass of "distutils.cmd.Command". Subclasses of "Command" must define the following methods. Command.initialize_options() Set default values for all the options that this command supports. Note that these defaults may be overridden by other commands, by the setup script, by config files, or by the command-line. Thus, this is not the place to code dependencies between options; generally, "initialize_options()" implementations are just a bunch of "self.foo = None" assignments. Command.finalize_options() Set final values for all the options that this command supports. This is always called as late as possible, ie. after any option assignments from the command-line or from other commands have been done. Thus, this is the place to code option dependencies: if *foo* depends on *bar*, then it is safe to set *foo* from *bar* as long as *foo* still has the same value it was assigned in "initialize_options()". Command.run() A command’s raison d’etre: carry out the action it exists to perform, controlled by the options initialized in "initialize_options()", customized by other commands, the setup script, the command-line, and config files, and finalized in "finalize_options()". All terminal output and filesystem interaction should be done by "run()". Command.sub_commands *sub_commands* formalizes the notion of a “family” of commands, e.g. "install" as the parent with sub-commands "install_lib", "install_headers", etc. The parent of a family of commands defines *sub_commands* as a class attribute; it’s a list of 2-tuples "(command_name, predicate)", with *command_name* a string and *predicate* a function, a string or "None". *predicate* is a method of the parent command that determines whether the corresponding command is applicable in the current situation. (E.g. "install_headers" is only applicable if we have any C header files to install.) If *predicate* is "None", that command is always applicable. *sub_commands* is usually defined at the *end* of a class, because predicates can be methods of the class, so they must already have been defined. The canonical example is the **install** command. 9.25. "distutils.command" — Individual Distutils commands ========================================================= 9.26. "distutils.command.bdist" — Build a binary installer ========================================================== 9.27. "distutils.command.bdist_packager" — Abstract base class for packagers ============================================================================ 9.28. "distutils.command.bdist_dumb" — Build a “dumb” installer =============================================================== 9.29. "distutils.command.bdist_msi" — Build a Microsoft Installer binary package ================================================================================ class distutils.command.bdist_msi.bdist_msi Deprecated since version 3.9: Use bdist_wheel (wheel packages) instead.Builds a Windows Installer (.msi) binary package.In most cases, the "bdist_msi" installer is a better choice than the "bdist_wininst" installer, because it provides better support for Win64 platforms, allows administrators to perform non-interactive installations, and allows installation through group policies. 9.30. "distutils.command.bdist_rpm" — Build a binary distribution as a Redhat RPM and SRPM ========================================================================================== 9.31. "distutils.command.bdist_wininst" — Build a Windows installer =================================================================== Deprecated since version 3.8: Use bdist_wheel (wheel packages) instead. 9.32. "distutils.command.sdist" — Build a source distribution ============================================================= 9.33. "distutils.command.build" — Build all files of a package ============================================================== 9.34. "distutils.command.build_clib" — Build any C libraries in a package ========================================================================= 9.35. "distutils.command.build_ext" — Build any extensions in a package ======================================================================= 9.36. "distutils.command.build_py" — Build the .py/.pyc files of a package ========================================================================== class distutils.command.build_py.build_py class distutils.command.build_py.build_py_2to3 Alternative implementation of build_py which also runs the 2to3 conversion library on each .py file that is going to be installed. To use this in a setup.py file for a distribution that is designed to run with both Python 2.x and 3.x, add: try: from distutils.command.build_py import build_py_2to3 as build_py except ImportError: from distutils.command.build_py import build_py to your setup.py, and later: cmdclass = {'build_py': build_py} to the invocation of setup(). 9.37. "distutils.command.build_scripts" — Build the scripts of a package ======================================================================== 9.38. "distutils.command.clean" — Clean a package build area ============================================================ This command removes the temporary files created by **build** and its subcommands, like intermediary compiled object files. With the "-- all" option, the complete build directory will be removed. Extension modules built in place will not be cleaned, as they are not in the build directory. 9.39. "distutils.command.config" — Perform package configuration ================================================================ 9.40. "distutils.command.install" — Install a package ===================================================== 9.41. "distutils.command.install_data" — Install data files from a package ========================================================================== 9.42. "distutils.command.install_headers" — Install C/C++ header files from a package ===================================================================================== 9.43. "distutils.command.install_lib" — Install library files from a package ============================================================================ 9.44. "distutils.command.install_scripts" — Install script files from a package =============================================================================== 9.45. "distutils.command.register" — Register a module with the Python Package Index ==================================================================================== The "register" command registers the package with the Python Package Index. This is described in more detail in **PEP 301**. 9.46. "distutils.command.check" — Check the meta-data of a package ================================================================== The "check" command performs some tests on the meta-data of a package. For example, it verifies that all required meta-data are provided as the arguments passed to the "setup()" function. 5. Creating Built Distributions ******************************* Note: This document is being retained solely until the "setuptools" documentation at https://setuptools.readthedocs.io/en/latest/setuptools.html independently covers all of the relevant information currently included here. A “built distribution” is what you’re probably used to thinking of either as a “binary package” or an “installer” (depending on your background). It’s not necessarily binary, though, because it might contain only Python source code and/or byte-code; and we don’t call it a package, because that word is already spoken for in Python. (And “installer” is a term specific to the world of mainstream desktop systems.) A built distribution is how you make life as easy as possible for installers of your module distribution: for users of RPM-based Linux systems, it’s a binary RPM; for Windows users, it’s an executable installer; for Debian-based Linux users, it’s a Debian package; and so forth. Obviously, no one person will be able to create built distributions for every platform under the sun, so the Distutils are designed to enable module developers to concentrate on their specialty—writing code and creating source distributions—while an intermediary species called *packagers* springs up to turn source distributions into built distributions for as many platforms as there are packagers. Of course, the module developer could be their own packager; or the packager could be a volunteer “out there” somewhere who has access to a platform which the original developer does not; or it could be software periodically grabbing new source distributions and turning them into built distributions for as many platforms as the software has access to. Regardless of who they are, a packager uses the setup script and the **bdist** command family to generate built distributions. As a simple example, if I run the following command in the Distutils source tree: python setup.py bdist then the Distutils builds my module distribution (the Distutils itself in this case), does a “fake” installation (also in the "build" directory), and creates the default type of built distribution for my platform. The default format for built distributions is a “dumb” tar file on Unix, and a simple executable installer on Windows. (That tar file is considered “dumb” because it has to be unpacked in a specific location to work.) Thus, the above command on a Unix system creates "Distutils-1.0.*plat*.tar.gz"; unpacking this tarball from the right place installs the Distutils just as though you had downloaded the source distribution and run "python setup.py install". (The “right place” is either the root of the filesystem or Python’s "*prefix*" directory, depending on the options given to the **bdist_dumb** command; the default is to make dumb distributions relative to "*prefix*".) Obviously, for pure Python distributions, this isn’t any simpler than just running "python setup.py install"—but for non-pure distributions, which include extensions that would need to be compiled, it can mean the difference between someone being able to use your extensions or not. And creating “smart” built distributions, such as an RPM package or an executable installer for Windows, is far more convenient for users even if your distribution doesn’t include any extensions. The **bdist** command has a "--formats" option, similar to the **sdist** command, which you can use to select the types of built distribution to generate: for example, python setup.py bdist --format=zip would, when run on a Unix system, create "Distutils-1.0.*plat*.zip"—again, this archive would be unpacked from the root directory to install the Distutils. The available formats for built distributions are: +---------------+--------------------------------+-----------+ | Format | Description | Notes | |===============|================================|===========| | "gztar" | gzipped tar file (".tar.gz") | (1) | +---------------+--------------------------------+-----------+ | "bztar" | bzipped tar file (".tar.bz2") | | +---------------+--------------------------------+-----------+ | "xztar" | xzipped tar file (".tar.xz") | | +---------------+--------------------------------+-----------+ | "ztar" | compressed tar file (".tar.Z") | (3) | +---------------+--------------------------------+-----------+ | "tar" | tar file (".tar") | | +---------------+--------------------------------+-----------+ | "zip" | zip file (".zip") | (2),(4) | +---------------+--------------------------------+-----------+ | "rpm" | RPM | (5) | +---------------+--------------------------------+-----------+ | "pkgtool" | Solaris **pkgtool** | | +---------------+--------------------------------+-----------+ | "sdux" | HP-UX **swinstall** | | +---------------+--------------------------------+-----------+ | "wininst" | self-extracting ZIP file for | (4) | | | Windows | | +---------------+--------------------------------+-----------+ | "msi" | Microsoft Installer. | | +---------------+--------------------------------+-----------+ Changed in version 3.5: Added support for the "xztar" format. Notes: 1. default on Unix 2. default on Windows 3. requires external **compress** utility. 4. requires either external **zip** utility or "zipfile" module (part of the standard Python library since Python 1.6) 5. requires external **rpm** utility, version 3.0.4 or better (use "rpm --version" to find out which version you have) You don’t have to use the **bdist** command with the "--formats" option; you can also use the command that directly implements the format you’re interested in. Some of these **bdist** “sub-commands” actually generate several similar formats; for instance, the **bdist_dumb** command generates all the “dumb” archive formats ("tar", "gztar", "bztar", "xztar", "ztar", and "zip"), and **bdist_rpm** generates both binary and source RPMs. The **bdist** sub-commands, and the formats generated by each, are: +----------------------------+---------------------------------------+ | Command | Formats | |============================|=======================================| | **bdist_dumb** | tar, gztar, bztar, xztar, ztar, zip | +----------------------------+---------------------------------------+ | **bdist_rpm** | rpm, srpm | +----------------------------+---------------------------------------+ | **bdist_wininst** | wininst | +----------------------------+---------------------------------------+ | **bdist_msi** | msi | +----------------------------+---------------------------------------+ Note: bdist_wininst is deprecated since Python 3.8. Note: bdist_msi is deprecated since Python 3.9. The following sections give details on the individual **bdist_*** commands. 5.1. Creating RPM packages ========================== The RPM format is used by many popular Linux distributions, including Red Hat, SuSE, and Mandrake. If one of these (or any of the other RPM-based Linux distributions) is your usual environment, creating RPM packages for other users of that same distribution is trivial. Depending on the complexity of your module distribution and differences between Linux distributions, you may also be able to create RPMs that work on different RPM-based distributions. The usual way to create an RPM of your module distribution is to run the **bdist_rpm** command: python setup.py bdist_rpm or the **bdist** command with the "--format" option: python setup.py bdist --formats=rpm The former allows you to specify RPM-specific options; the latter allows you to easily specify multiple formats in one run. If you need to do both, you can explicitly specify multiple **bdist_*** commands and their options: python setup.py bdist_rpm --packager="John Doe " \ bdist_wininst --target-version="2.0" Creating RPM packages is driven by a ".spec" file, much as using the Distutils is driven by the setup script. To make your life easier, the **bdist_rpm** command normally creates a ".spec" file based on the information you supply in the setup script, on the command line, and in any Distutils configuration files. Various options and sections in the ".spec" file are derived from options in the setup script as follows: +--------------------------------------------+------------------------------------------------+ | RPM ".spec" file option or section | Distutils setup script option | |============================================|================================================| | Name | "name" | +--------------------------------------------+------------------------------------------------+ | Summary (in preamble) | "description" | +--------------------------------------------+------------------------------------------------+ | Version | "version" | +--------------------------------------------+------------------------------------------------+ | Vendor | "author" and "author_email", or — & | | | "maintainer" and "maintainer_email" | +--------------------------------------------+------------------------------------------------+ | Copyright | "license" | +--------------------------------------------+------------------------------------------------+ | Url | "url" | +--------------------------------------------+------------------------------------------------+ | %description (section) | "long_description" | +--------------------------------------------+------------------------------------------------+ Additionally, there are many options in ".spec" files that don’t have corresponding options in the setup script. Most of these are handled through options to the **bdist_rpm** command as follows: +---------------------------------+-------------------------------+---------------------------+ | RPM ".spec" file option or | **bdist_rpm** option | default value | | section | | | |=================================|===============================|===========================| | Release | "release" | “1” | +---------------------------------+-------------------------------+---------------------------+ | Group | "group" | “Development/Libraries” | +---------------------------------+-------------------------------+---------------------------+ | Vendor | "vendor" | (see above) | +---------------------------------+-------------------------------+---------------------------+ | Packager | "packager" | (none) | +---------------------------------+-------------------------------+---------------------------+ | Provides | "provides" | (none) | +---------------------------------+-------------------------------+---------------------------+ | Requires | "requires" | (none) | +---------------------------------+-------------------------------+---------------------------+ | Conflicts | "conflicts" | (none) | +---------------------------------+-------------------------------+---------------------------+ | Obsoletes | "obsoletes" | (none) | +---------------------------------+-------------------------------+---------------------------+ | Distribution | "distribution_name" | (none) | +---------------------------------+-------------------------------+---------------------------+ | BuildRequires | "build_requires" | (none) | +---------------------------------+-------------------------------+---------------------------+ | Icon | "icon" | (none) | +---------------------------------+-------------------------------+---------------------------+ Obviously, supplying even a few of these options on the command-line would be tedious and error-prone, so it’s usually best to put them in the setup configuration file, "setup.cfg"—see section Writing the Setup Configuration File. If you distribute or package many Python module distributions, you might want to put options that apply to all of them in your personal Distutils configuration file ("~/.pydistutils.cfg"). If you want to temporarily disable this file, you can pass the "--no-user-cfg" option to "setup.py". There are three steps to building a binary RPM package, all of which are handled automatically by the Distutils: 1. create a ".spec" file, which describes the package (analogous to the Distutils setup script; in fact, much of the information in the setup script winds up in the ".spec" file) 2. create the source RPM 3. create the “binary” RPM (which may or may not contain binary code, depending on whether your module distribution contains Python extensions) Normally, RPM bundles the last two steps together; when you use the Distutils, all three steps are typically bundled together. If you wish, you can separate these three steps. You can use the "-- spec-only" option to make **bdist_rpm** just create the ".spec" file and exit; in this case, the ".spec" file will be written to the “distribution directory”—normally "dist/", but customizable with the " --dist-dir" option. (Normally, the ".spec" file winds up deep in the “build tree,” in a temporary directory created by **bdist_rpm**.) 5.2. Creating Windows Installers ================================ Warning: bdist_wininst is deprecated since Python 3.8. Warning: bdist_msi is deprecated since Python 3.9. Executable installers are the natural format for binary distributions on Windows. They display a nice graphical user interface, display some information about the module distribution to be installed taken from the metadata in the setup script, let the user select a few options, and start or cancel the installation. Since the metadata is taken from the setup script, creating Windows installers is usually as easy as running: python setup.py bdist_wininst or the **bdist** command with the "--formats" option: python setup.py bdist --formats=wininst If you have a pure module distribution (only containing pure Python modules and packages), the resulting installer will be version independent and have a name like "foo-1.0.win32.exe". Note that creating "wininst" binary distributions in only supported on Windows systems. If you have a non-pure distribution, the extensions can only be created on a Windows platform, and will be Python version dependent. The installer filename will reflect this and now has the form "foo-1.0.win32-py2.0.exe". You have to create a separate installer for every Python version you want to support. The installer will try to compile pure modules into *bytecode* after installation on the target system in normal and optimizing mode. If you don’t want this to happen for some reason, you can run the **bdist_wininst** command with the "--no-target-compile" and/or the " --no-target-optimize" option. By default the installer will display the cool “Python Powered” logo when it is run, but you can also supply your own 152x261 bitmap which must be a Windows ".bmp" file with the "--bitmap" option. The installer will also display a large title on the desktop background window when it is run, which is constructed from the name of your distribution and the version number. This can be changed to another text by using the "--title" option. The installer file will be written to the “distribution directory” — normally "dist/", but customizable with the "--dist-dir" option. 5.3. Cross-compiling on Windows =============================== Starting with Python 2.6, distutils is capable of cross-compiling between Windows platforms. In practice, this means that with the correct tools installed, you can use a 32bit version of Windows to create 64bit extensions and vice-versa. To build for an alternate platform, specify the "--plat-name" option to the build command. Valid values are currently ‘win32’, and ‘win- amd64’. For example, on a 32bit version of Windows, you could execute: python setup.py build --plat-name=win-amd64 to build a 64bit version of your extension. The Windows Installers also support this option, so the command: python setup.py build --plat-name=win-amd64 bdist_wininst would create a 64bit installation executable on your 32bit version of Windows. To cross-compile, you must download the Python source code and cross- compile Python itself for the platform you are targeting - it is not possible from a binary installation of Python (as the .lib etc file for other platforms are not included.) In practice, this means the user of a 32 bit operating system will need to use Visual Studio 2008 to open the "PCbuild/PCbuild.sln" solution in the Python source tree and build the “x64” configuration of the ‘pythoncore’ project before cross-compiling extensions is possible. Note that by default, Visual Studio 2008 does not install 64bit compilers or tools. You may need to reexecute the Visual Studio setup process and select these tools (using Control Panel->[Add/Remove] Programs is a convenient way to check or modify your existing install.) 5.3.1. The Postinstallation script ---------------------------------- Starting with Python 2.3, a postinstallation script can be specified with the "--install-script" option. The basename of the script must be specified, and the script filename must also be listed in the scripts argument to the setup function. This script will be run at installation time on the target system after all the files have been copied, with "argv[1]" set to "-install", and again at uninstallation time before the files are removed with "argv[1]" set to "-remove". The installation script runs embedded in the windows installer, every output ("sys.stdout", "sys.stderr") is redirected into a buffer and will be displayed in the GUI after the script has finished. Some functions especially useful in this context are available as additional built-in functions in the installation script. directory_created(path) file_created(path) These functions should be called when a directory or file is created by the postinstall script at installation time. It will register *path* with the uninstaller, so that it will be removed when the distribution is uninstalled. To be safe, directories are only removed if they are empty. get_special_folder_path(csidl_string) This function can be used to retrieve special folder locations on Windows like the Start Menu or the Desktop. It returns the full path to the folder. *csidl_string* must be one of the following strings: "CSIDL_APPDATA" "CSIDL_COMMON_STARTMENU" "CSIDL_STARTMENU" "CSIDL_COMMON_DESKTOPDIRECTORY" "CSIDL_DESKTOPDIRECTORY" "CSIDL_COMMON_STARTUP" "CSIDL_STARTUP" "CSIDL_COMMON_PROGRAMS" "CSIDL_PROGRAMS" "CSIDL_FONTS" If the folder cannot be retrieved, "OSError" is raised. Which folders are available depends on the exact Windows version, and probably also the configuration. For details refer to Microsoft’s documentation of the "SHGetSpecialFolderPath()" function. create_shortcut(target, description, filename[, arguments[, workdir[, iconpath[, iconindex]]]]) This function creates a shortcut. *target* is the path to the program to be started by the shortcut. *description* is the description of the shortcut. *filename* is the title of the shortcut that the user will see. *arguments* specifies the command line arguments, if any. *workdir* is the working directory for the program. *iconpath* is the file containing the icon for the shortcut, and *iconindex* is the index of the icon in the file *iconpath*. Again, for details consult the Microsoft documentation for the "IShellLink" interface. 5.4. Vista User Access Control (UAC) ==================================== Starting with Python 2.6, bdist_wininst supports a "--user-access- control" option. The default is ‘none’ (meaning no UAC handling is done), and other valid values are ‘auto’ (meaning prompt for UAC elevation if Python was installed for all users) and ‘force’ (meaning always prompt for elevation). Note: bdist_wininst is deprecated since Python 3.8. Note: bdist_msi is deprecated since Python 3.9. 8. Command Reference ******************** Note: This document is being retained solely until the "setuptools" documentation at https://setuptools.readthedocs.io/en/latest/setuptools.html independently covers all of the relevant information currently included here. 8.1. Installing modules: the **install** command family ======================================================= The install command ensures that the build commands have been run and then runs the subcommands **install_lib**, **install_data** and **install_scripts**. 8.1.1. **install_data** ----------------------- This command installs all data files provided with the distribution. 8.1.2. **install_scripts** -------------------------- This command installs all (Python) scripts in the distribution. 8.2. Creating a source distribution: the **sdist** command ========================================================== The manifest template commands are: +---------------------------------------------+-------------------------------------------------+ | Command | Description | |=============================================|=================================================| | **include pat1 pat2 ...** | include all files matching any of the listed | | | patterns | +---------------------------------------------+-------------------------------------------------+ | **exclude pat1 pat2 ...** | exclude all files matching any of the listed | | | patterns | +---------------------------------------------+-------------------------------------------------+ | **recursive-include dir pat1 pat2 ...** | include all files under *dir* matching any of | | | the listed patterns | +---------------------------------------------+-------------------------------------------------+ | **recursive-exclude dir pat1 pat2 ...** | exclude all files under *dir* matching any of | | | the listed patterns | +---------------------------------------------+-------------------------------------------------+ | **global-include pat1 pat2 ...** | include all files anywhere in the source tree | | | matching — & any of the listed patterns | +---------------------------------------------+-------------------------------------------------+ | **global-exclude pat1 pat2 ...** | exclude all files anywhere in the source tree | | | matching — & any of the listed patterns | +---------------------------------------------+-------------------------------------------------+ | **prune dir** | exclude all files under *dir* | +---------------------------------------------+-------------------------------------------------+ | **graft dir** | include all files under *dir* | +---------------------------------------------+-------------------------------------------------+ The patterns here are Unix-style “glob” patterns: "*" matches any sequence of regular filename characters, "?" matches any single regular filename character, and "[range]" matches any of the characters in *range* (e.g., "a-z", "a-zA-Z", "a-f0-9_."). The definition of “regular filename character” is platform-specific: on Unix it is anything except slash; on Windows anything except backslash or colon. 3. Writing the Setup Configuration File *************************************** Note: This document is being retained solely until the "setuptools" documentation at https://setuptools.readthedocs.io/en/latest/setuptools.html independently covers all of the relevant information currently included here. Often, it’s not possible to write down everything needed to build a distribution *a priori*: you may need to get some information from the user, or from the user’s system, in order to proceed. As long as that information is fairly simple—a list of directories to search for C header files or libraries, for example—then providing a configuration file, "setup.cfg", for users to edit is a cheap and easy way to solicit it. Configuration files also let you provide default values for any command option, which the installer can then override either on the command-line or by editing the config file. The setup configuration file is a useful middle-ground between the setup script—which, ideally, would be opaque to installers [1]—and the command-line to the setup script, which is outside of your control and entirely up to the installer. In fact, "setup.cfg" (and any other Distutils configuration files present on the target system) are processed after the contents of the setup script, but before the command-line. This has several useful consequences: * installers can override some of what you put in "setup.py" by editing "setup.cfg" * you can provide non-standard defaults for options that are not easily set in "setup.py" * installers can override anything in "setup.cfg" using the command- line options to "setup.py" The basic syntax of the configuration file is simple: [command] option=value ... where *command* is one of the Distutils commands (e.g. **build_py**, **install**), and *option* is one of the options that command supports. Any number of options can be supplied for each command, and any number of command sections can be included in the file. Blank lines are ignored, as are comments, which run from a "'#'" character until the end of the line. Long option values can be split across multiple lines simply by indenting the continuation lines. You can find out the list of options supported by a particular command with the universal "--help" option, e.g. $ python setup.py --help build_ext [...] Options for 'build_ext' command: --build-lib (-b) directory for compiled extension modules --build-temp (-t) directory for temporary files (build by-products) --inplace (-i) ignore build-lib and put compiled extensions into the source directory alongside your pure Python modules --include-dirs (-I) list of directories to search for header files --define (-D) C preprocessor macros to define --undef (-U) C preprocessor macros to undefine --swig-opts list of SWIG command line options [...] Note that an option spelled "--foo-bar" on the command-line is spelled "foo_bar" in configuration files. For example, say you want your extensions to be built “in-place”—that is, you have an extension "pkg.ext", and you want the compiled extension file ("ext.so" on Unix, say) to be put in the same source directory as your pure Python modules "pkg.mod1" and "pkg.mod2". You can always use the "--inplace" option on the command-line to ensure this: python setup.py build_ext --inplace But this requires that you always specify the **build_ext** command explicitly, and remember to provide "--inplace". An easier way is to “set and forget” this option, by encoding it in "setup.cfg", the configuration file for this distribution: [build_ext] inplace=1 This will affect all builds of this module distribution, whether or not you explicitly specify **build_ext**. If you include "setup.cfg" in your source distribution, it will also affect end-user builds—which is probably a bad idea for this option, since always building extensions in-place would break installation of the module distribution. In certain peculiar cases, though, modules are built right in their installation directory, so this is conceivably a useful ability. (Distributing extensions that expect to be built in their installation directory is almost always a bad idea, though.) Another example: certain commands take a lot of options that don’t change from run to run; for example, **bdist_rpm** needs to know everything required to generate a “spec” file for creating an RPM distribution. Some of this information comes from the setup script, and some is automatically generated by the Distutils (such as the list of files installed). But some of it has to be supplied as options to **bdist_rpm**, which would be very tedious to do on the command-line for every run. Hence, here is a snippet from the Distutils’ own "setup.cfg": [bdist_rpm] release = 1 packager = Greg Ward doc_files = CHANGES.txt README.txt USAGE.txt doc/ examples/ Note that the "doc_files" option is simply a whitespace-separated string split across multiple lines for readability. See also: Syntax of config files in “Installing Python Modules” More information on the configuration files is available in the manual for system administrators. -[ Footnotes ]- [1] This ideal probably won’t be achieved until auto-configuration is fully supported by the Distutils. 6. Distutils Examples ********************* Note: This document is being retained solely until the "setuptools" documentation at https://setuptools.readthedocs.io/en/latest/setuptools.html independently covers all of the relevant information currently included here. This chapter provides a number of basic examples to help get started with distutils. Additional information about using distutils can be found in the Distutils Cookbook. See also: Distutils Cookbook Collection of recipes showing how to achieve more control over distutils. 6.1. Pure Python distribution (by module) ========================================= If you’re just distributing a couple of modules, especially if they don’t live in a particular package, you can specify them individually using the "py_modules" option in the setup script. In the simplest case, you’ll have two files to worry about: a setup script and the single module you’re distributing, "foo.py" in this example: / setup.py foo.py (In all diagrams in this section, ** will refer to the distribution root directory.) A minimal setup script to describe this situation would be: from distutils.core import setup setup(name='foo', version='1.0', py_modules=['foo'], ) Note that the name of the distribution is specified independently with the "name" option, and there’s no rule that says it has to be the same as the name of the sole module in the distribution (although that’s probably a good convention to follow). However, the distribution name is used to generate filenames, so you should stick to letters, digits, underscores, and hyphens. Since "py_modules" is a list, you can of course specify multiple modules, eg. if you’re distributing modules "foo" and "bar", your setup might look like this: / setup.py foo.py bar.py and the setup script might be from distutils.core import setup setup(name='foobar', version='1.0', py_modules=['foo', 'bar'], ) You can put module source files into another directory, but if you have enough modules to do that, it’s probably easier to specify modules by package rather than listing them individually. 6.2. Pure Python distribution (by package) ========================================== If you have more than a couple of modules to distribute, especially if they are in multiple packages, it’s probably easier to specify whole packages rather than individual modules. This works even if your modules are not in a package; you can just tell the Distutils to process modules from the root package, and that works the same as any other package (except that you don’t have to have an "__init__.py" file). The setup script from the last example could also be written as from distutils.core import setup setup(name='foobar', version='1.0', packages=[''], ) (The empty string stands for the root package.) If those two files are moved into a subdirectory, but remain in the root package, e.g.: / setup.py src/ foo.py bar.py then you would still specify the root package, but you have to tell the Distutils where source files in the root package live: from distutils.core import setup setup(name='foobar', version='1.0', package_dir={'': 'src'}, packages=[''], ) More typically, though, you will want to distribute multiple modules in the same package (or in sub-packages). For example, if the "foo" and "bar" modules belong in package "foobar", one way to layout your source tree is / setup.py foobar/ __init__.py foo.py bar.py This is in fact the default layout expected by the Distutils, and the one that requires the least work to describe in your setup script: from distutils.core import setup setup(name='foobar', version='1.0', packages=['foobar'], ) If you want to put modules in directories not named for their package, then you need to use the "package_dir" option again. For example, if the "src" directory holds modules in the "foobar" package: / setup.py src/ __init__.py foo.py bar.py an appropriate setup script would be from distutils.core import setup setup(name='foobar', version='1.0', package_dir={'foobar': 'src'}, packages=['foobar'], ) Or, you might put modules from your main package right in the distribution root: / setup.py __init__.py foo.py bar.py in which case your setup script would be from distutils.core import setup setup(name='foobar', version='1.0', package_dir={'foobar': ''}, packages=['foobar'], ) (The empty string also stands for the current directory.) If you have sub-packages, they must be explicitly listed in "packages", but any entries in "package_dir" automatically extend to sub-packages. (In other words, the Distutils does *not* scan your source tree, trying to figure out which directories correspond to Python packages by looking for "__init__.py" files.) Thus, if the default layout grows a sub-package: / setup.py foobar/ __init__.py foo.py bar.py subfoo/ __init__.py blah.py then the corresponding setup script would be from distutils.core import setup setup(name='foobar', version='1.0', packages=['foobar', 'foobar.subfoo'], ) 6.3. Single extension module ============================ Extension modules are specified using the "ext_modules" option. "package_dir" has no effect on where extension source files are found; it only affects the source for pure Python modules. The simplest case, a single extension module in a single C source file, is: / setup.py foo.c If the "foo" extension belongs in the root package, the setup script for this could be from distutils.core import setup from distutils.extension import Extension setup(name='foobar', version='1.0', ext_modules=[Extension('foo', ['foo.c'])], ) If the extension actually belongs in a package, say "foopkg", then With exactly the same source tree layout, this extension can be put in the "foopkg" package simply by changing the name of the extension: from distutils.core import setup from distutils.extension import Extension setup(name='foobar', version='1.0', ext_modules=[Extension('foopkg.foo', ['foo.c'])], ) 6.4. Checking a package ======================= The "check" command allows you to verify if your package meta-data meet the minimum requirements to build a distribution. To run it, just call it using your "setup.py" script. If something is missing, "check" will display a warning. Let’s take an example with a simple script: from distutils.core import setup setup(name='foobar') Running the "check" command will display some warnings: $ python setup.py check running check warning: check: missing required meta-data: version, url warning: check: missing meta-data: either (author and author_email) or (maintainer and maintainer_email) should be supplied If you use the reStructuredText syntax in the "long_description" field and docutils is installed you can check if the syntax is fine with the "check" command, using the "restructuredtext" option. For example, if the "setup.py" script is changed like this: from distutils.core import setup desc = """\ My description ============== This is the description of the ``foobar`` package. """ setup(name='foobar', version='1', author='tarek', author_email='tarek@ziade.org', url='http://example.com', long_description=desc) Where the long description is broken, "check" will be able to detect it by using the "docutils" parser: $ python setup.py check --restructuredtext running check warning: check: Title underline too short. (line 2) warning: check: Could not finish the parsing. 6.5. Reading the metadata ========================= The "distutils.core.setup()" function provides a command-line interface that allows you to query the metadata fields of a project through the "setup.py" script of a given project: $ python setup.py --name distribute This call reads the "name" metadata by running the "distutils.core.setup()" function. Although, when a source or binary distribution is created with Distutils, the metadata fields are written in a static file called "PKG-INFO". When a Distutils-based project is installed in Python, the "PKG-INFO" file is copied alongside the modules and packages of the distribution under "NAME- VERSION-pyX.X.egg-info", where "NAME" is the name of the project, "VERSION" its version as defined in the Metadata, and "pyX.X" the major and minor version of Python like "2.7" or "3.2". You can read back this static file, by using the "distutils.dist.DistributionMetadata" class and its "read_pkg_file()" method: >>> from distutils.dist import DistributionMetadata >>> metadata = DistributionMetadata() >>> metadata.read_pkg_file(open('distribute-0.6.8-py2.7.egg-info')) >>> metadata.name 'distribute' >>> metadata.version '0.6.8' >>> metadata.description 'Easily download, build, install, upgrade, and uninstall Python packages' Notice that the class can also be instantiated with a metadata file path to loads its values: >>> pkg_info_path = 'distribute-0.6.8-py2.7.egg-info' >>> DistributionMetadata(pkg_info_path).name 'distribute' 7. Extending Distutils ********************** Note: This document is being retained solely until the "setuptools" documentation at https://setuptools.readthedocs.io/en/latest/setuptools.html independently covers all of the relevant information currently included here. Distutils can be extended in various ways. Most extensions take the form of new commands or replacements for existing commands. New commands may be written to support new types of platform-specific packaging, for example, while replacements for existing commands may be made to modify details of how the command operates on a package. Most extensions of the distutils are made within "setup.py" scripts that want to modify existing commands; many simply add a few file extensions that should be copied into packages in addition to ".py" files as a convenience. Most distutils command implementations are subclasses of the "distutils.cmd.Command" class. New commands may directly inherit from "Command", while replacements often derive from "Command" indirectly, directly subclassing the command they are replacing. Commands are required to derive from "Command". 7.1. Integrating new commands ============================= There are different ways to integrate new command implementations into distutils. The most difficult is to lobby for the inclusion of the new features in distutils itself, and wait for (and require) a version of Python that provides that support. This is really hard for many reasons. The most common, and possibly the most reasonable for most needs, is to include the new implementations with your "setup.py" script, and cause the "distutils.core.setup()" function use them: from distutils.command.build_py import build_py as _build_py from distutils.core import setup class build_py(_build_py): """Specialized Python source builder.""" # implement whatever needs to be different... setup(cmdclass={'build_py': build_py}, ...) This approach is most valuable if the new implementations must be used to use a particular package, as everyone interested in the package will need to have the new command implementation. Beginning with Python 2.4, a third option is available, intended to allow new commands to be added which can support existing "setup.py" scripts without requiring modifications to the Python installation. This is expected to allow third-party extensions to provide support for additional packaging systems, but the commands can be used for anything distutils commands can be used for. A new configuration option, "command_packages" (command-line option "--command-packages"), can be used to specify additional packages to be searched for modules implementing commands. Like all distutils options, this can be specified on the command line or in a configuration file. This option can only be set in the "[global]" section of a configuration file, or before any commands on the command line. If set in a configuration file, it can be overridden from the command line; setting it to an empty string on the command line causes the default to be used. This should never be set in a configuration file provided with a package. This new option can be used to add any number of packages to the list of packages searched for command implementations; multiple package names should be separated by commas. When not specified, the search is only performed in the "distutils.command" package. When "setup.py" is run with the option "--command-packages distcmds,buildcmds", however, the packages "distutils.command", "distcmds", and "buildcmds" will be searched in that order. New commands are expected to be implemented in modules of the same name as the command by classes sharing the same name. Given the example command line option above, the command **bdist_openpkg** could be implemented by the class "distcmds.bdist_openpkg.bdist_openpkg" or "buildcmds.bdist_openpkg.bdist_openpkg". 7.2. Adding new distribution types ================================== Commands that create distributions (files in the "dist/" directory) need to add "(command, filename)" pairs to "self.distribution.dist_files" so that **upload** can upload it to PyPI. The *filename* in the pair contains no path information, only the name of the file itself. In dry-run mode, pairs should still be added to represent what would have been created. Distributing Python Modules (Legacy version) ******************************************** Authors: Greg Ward, Anthony Baxter Email: distutils-sig@python.org See also: Distributing Python Modules The up to date module distribution documentations Note: This document is being retained solely until the "setuptools" documentation at https://setuptools.readthedocs.io/en/latest/setuptools.html independently covers all of the relevant information currently included here. Note: This guide only covers the basic tools for building and distributing extensions that are provided as part of this version of Python. Third party tools offer easier to use and more secure alternatives. Refer to the quick recommendations section in the Python Packaging User Guide for more information. This document describes the Python Distribution Utilities (“Distutils”) from the module developer’s point of view, describing the underlying capabilities that "setuptools" builds on to allow Python developers to make Python modules and extensions readily available to a wider audience. * 1. An Introduction to Distutils * 1.1. Concepts & Terminology * 1.2. A Simple Example * 1.3. General Python terminology * 1.4. Distutils-specific terminology * 2. Writing the Setup Script * 2.1. Listing whole packages * 2.2. Listing individual modules * 2.3. Describing extension modules * 2.4. Relationships between Distributions and Packages * 2.5. Installing Scripts * 2.6. Installing Package Data * 2.7. Installing Additional Files * 2.8. Additional meta-data * 2.9. Debugging the setup script * 3. Writing the Setup Configuration File * 4. Creating a Source Distribution * 4.1. Specifying the files to distribute * 4.2. Manifest-related options * 5. Creating Built Distributions * 5.1. Creating RPM packages * 5.2. Creating Windows Installers * 5.3. Cross-compiling on Windows * 5.4. Vista User Access Control (UAC) * 6. Distutils Examples * 6.1. Pure Python distribution (by module) * 6.2. Pure Python distribution (by package) * 6.3. Single extension module * 6.4. Checking a package * 6.5. Reading the metadata * 7. Extending Distutils * 7.1. Integrating new commands * 7.2. Adding new distribution types * 8. Command Reference * 8.1. Installing modules: the **install** command family * 8.2. Creating a source distribution: the **sdist** command * 9. API Reference * 9.1. "distutils.core" — Core Distutils functionality * 9.2. "distutils.ccompiler" — CCompiler base class * 9.3. "distutils.unixccompiler" — Unix C Compiler * 9.4. "distutils.msvccompiler" — Microsoft Compiler * 9.5. "distutils.bcppcompiler" — Borland Compiler * 9.6. "distutils.cygwincompiler" — Cygwin Compiler * 9.7. "distutils.archive_util" — Archiving utilities * 9.8. "distutils.dep_util" — Dependency checking * 9.9. "distutils.dir_util" — Directory tree operations * 9.10. "distutils.file_util" — Single file operations * 9.11. "distutils.util" — Miscellaneous other utility functions * 9.12. "distutils.dist" — The Distribution class * 9.13. "distutils.extension" — The Extension class * 9.14. "distutils.debug" — Distutils debug mode * 9.15. "distutils.errors" — Distutils exceptions * 9.16. "distutils.fancy_getopt" — Wrapper around the standard getopt module * 9.17. "distutils.filelist" — The FileList class * 9.18. "distutils.log" — Simple **PEP 282**-style logging * 9.19. "distutils.spawn" — Spawn a sub-process * 9.20. "distutils.sysconfig" — System configuration information * 9.21. "distutils.text_file" — The TextFile class * 9.22. "distutils.version" — Version number classes * 9.23. "distutils.cmd" — Abstract base class for Distutils commands * 9.24. Creating a new Distutils command * 9.25. "distutils.command" — Individual Distutils commands * 9.26. "distutils.command.bdist" — Build a binary installer * 9.27. "distutils.command.bdist_packager" — Abstract base class for packagers * 9.28. "distutils.command.bdist_dumb" — Build a “dumb” installer * 9.29. "distutils.command.bdist_msi" — Build a Microsoft Installer binary package * 9.30. "distutils.command.bdist_rpm" — Build a binary distribution as a Redhat RPM and SRPM * 9.31. "distutils.command.bdist_wininst" — Build a Windows installer * 9.32. "distutils.command.sdist" — Build a source distribution * 9.33. "distutils.command.build" — Build all files of a package * 9.34. "distutils.command.build_clib" — Build any C libraries in a package * 9.35. "distutils.command.build_ext" — Build any extensions in a package * 9.36. "distutils.command.build_py" — Build the .py/.pyc files of a package * 9.37. "distutils.command.build_scripts" — Build the scripts of a package * 9.38. "distutils.command.clean" — Clean a package build area * 9.39. "distutils.command.config" — Perform package configuration * 9.40. "distutils.command.install" — Install a package * 9.41. "distutils.command.install_data" — Install data files from a package * 9.42. "distutils.command.install_headers" — Install C/C++ header files from a package * 9.43. "distutils.command.install_lib" — Install library files from a package * 9.44. "distutils.command.install_scripts" — Install script files from a package * 9.45. "distutils.command.register" — Register a module with the Python Package Index * 9.46. "distutils.command.check" — Check the meta-data of a package 1. An Introduction to Distutils ******************************* Note: This document is being retained solely until the "setuptools" documentation at https://setuptools.readthedocs.io/en/latest/setuptools.html independently covers all of the relevant information currently included here. This document covers using the Distutils to distribute your Python modules, concentrating on the role of developer/distributor: if you’re looking for information on installing Python modules, you should refer to the Installing Python Modules (Legacy version) chapter. 1.1. Concepts & Terminology =========================== Using the Distutils is quite simple, both for module developers and for users/administrators installing third-party modules. As a developer, your responsibilities (apart from writing solid, well- documented and well-tested code, of course!) are: * write a setup script ("setup.py" by convention) * (optional) write a setup configuration file * create a source distribution * (optional) create one or more built (binary) distributions Each of these tasks is covered in this document. Not all module developers have access to a multitude of platforms, so it’s not always feasible to expect them to create a multitude of built distributions. It is hoped that a class of intermediaries, called *packagers*, will arise to address this need. Packagers will take source distributions released by module developers, build them on one or more platforms, and release the resulting built distributions. Thus, users on the most popular platforms will be able to install most popular Python module distributions in the most natural way for their platform, without having to run a single setup script or compile a line of code. 1.2. A Simple Example ===================== The setup script is usually quite simple, although since it’s written in Python, there are no arbitrary limits to what you can do with it, though you should be careful about putting arbitrarily expensive operations in your setup script. Unlike, say, Autoconf-style configure scripts, the setup script may be run multiple times in the course of building and installing your module distribution. If all you want to do is distribute a module called "foo", contained in a file "foo.py", then your setup script can be as simple as this: from distutils.core import setup setup(name='foo', version='1.0', py_modules=['foo'], ) Some observations: * most information that you supply to the Distutils is supplied as keyword arguments to the "setup()" function * those keyword arguments fall into two categories: package metadata (name, version number) and information about what’s in the package (a list of pure Python modules, in this case) * modules are specified by module name, not filename (the same will hold true for packages and extensions) * it’s recommended that you supply a little more metadata, in particular your name, email address and a URL for the project (see section Writing the Setup Script for an example) To create a source distribution for this module, you would create a setup script, "setup.py", containing the above code, and run this command from a terminal: python setup.py sdist For Windows, open a command prompt window (Start ‣ Accessories) and change the command to: setup.py sdist **sdist** will create an archive file (e.g., tarball on Unix, ZIP file on Windows) containing your setup script "setup.py", and your module "foo.py". The archive file will be named "foo-1.0.tar.gz" (or ".zip"), and will unpack into a directory "foo-1.0". If an end-user wishes to install your "foo" module, all they have to do is download "foo-1.0.tar.gz" (or ".zip"), unpack it, and—from the "foo-1.0" directory—run python setup.py install which will ultimately copy "foo.py" to the appropriate directory for third-party modules in their Python installation. This simple example demonstrates some fundamental concepts of the Distutils. First, both developers and installers have the same basic user interface, i.e. the setup script. The difference is which Distutils *commands* they use: the **sdist** command is almost exclusively for module developers, while **install** is more often for installers (although most developers will want to install their own code occasionally). If you want to make things really easy for your users, you can create one or more built distributions for them. For instance, if you are running on a Windows machine, and want to make things easy for other Windows users, you can create an executable installer (the most appropriate type of built distribution for this platform) with the **bdist_wininst** command. For example: python setup.py bdist_wininst will create an executable installer, "foo-1.0.win32.exe", in the current directory. Other useful built distribution formats are RPM, implemented by the **bdist_rpm** command, Solaris **pkgtool** (**bdist_pkgtool**), and HP-UX **swinstall** (**bdist_sdux**). For example, the following command will create an RPM file called "foo-1.0.noarch.rpm": python setup.py bdist_rpm (The **bdist_rpm** command uses the **rpm** executable, therefore this has to be run on an RPM-based system such as Red Hat Linux, SuSE Linux, or Mandrake Linux.) You can find out what distribution formats are available at any time by running python setup.py bdist --help-formats 1.3. General Python terminology =============================== If you’re reading this document, you probably have a good idea of what modules, extensions, and so forth are. Nevertheless, just to be sure that everyone is operating from a common starting point, we offer the following glossary of common Python terms: module the basic unit of code reusability in Python: a block of code imported by some other code. Three types of modules concern us here: pure Python modules, extension modules, and packages. pure Python module a module written in Python and contained in a single ".py" file (and possibly associated ".pyc" files). Sometimes referred to as a “pure module.” extension module a module written in the low-level language of the Python implementation: C/C++ for Python, Java for Jython. Typically contained in a single dynamically loadable pre-compiled file, e.g. a shared object (".so") file for Python extensions on Unix, a DLL (given the ".pyd" extension) for Python extensions on Windows, or a Java class file for Jython extensions. (Note that currently, the Distutils only handles C/C++ extensions for Python.) package a module that contains other modules; typically contained in a directory in the filesystem and distinguished from other directories by the presence of a file "__init__.py". root package the root of the hierarchy of packages. (This isn’t really a package, since it doesn’t have an "__init__.py" file. But we have to call it something.) The vast majority of the standard library is in the root package, as are many small, standalone third-party modules that don’t belong to a larger module collection. Unlike regular packages, modules in the root package can be found in many directories: in fact, every directory listed in "sys.path" contributes modules to the root package. 1.4. Distutils-specific terminology =================================== The following terms apply more specifically to the domain of distributing Python modules using the Distutils: module distribution a collection of Python modules distributed together as a single downloadable resource and meant to be installed *en masse*. Examples of some well-known module distributions are NumPy, SciPy, Pillow, or mxBase. (This would be called a *package*, except that term is already taken in the Python context: a single module distribution may contain zero, one, or many Python packages.) pure module distribution a module distribution that contains only pure Python modules and packages. Sometimes referred to as a “pure distribution.” non-pure module distribution a module distribution that contains at least one extension module. Sometimes referred to as a “non-pure distribution.” distribution root the top-level directory of your source tree (or source distribution); the directory where "setup.py" exists. Generally "setup.py" will be run from this directory. The Python Package Index (PyPI) ******************************* The Python Package Index (PyPI) stores metadata describing distributions packaged with distutils and other publishing tools, as well the distribution archives themselves. References to up to date PyPI documentation can be found at Reading the Python Packaging User Guide. 2. Writing the Setup Script *************************** Note: This document is being retained solely until the "setuptools" documentation at https://setuptools.readthedocs.io/en/latest/setuptools.html independently covers all of the relevant information currently included here. The setup script is the centre of all activity in building, distributing, and installing modules using the Distutils. The main purpose of the setup script is to describe your module distribution to the Distutils, so that the various commands that operate on your modules do the right thing. As we saw in section A Simple Example above, the setup script consists mainly of a call to "setup()", and most information supplied to the Distutils by the module developer is supplied as keyword arguments to "setup()". Here’s a slightly more involved example, which we’ll follow for the next couple of sections: the Distutils’ own setup script. (Keep in mind that although the Distutils are included with Python 1.6 and later, they also have an independent existence so that Python 1.5.2 users can use them to install other module distributions. The Distutils’ own setup script, shown here, is used to install the package into Python 1.5.2.) #!/usr/bin/env python from distutils.core import setup setup(name='Distutils', version='1.0', description='Python Distribution Utilities', author='Greg Ward', author_email='gward@python.net', url='https://www.python.org/sigs/distutils-sig/', packages=['distutils', 'distutils.command'], ) There are only two differences between this and the trivial one-file distribution presented in section A Simple Example: more metadata, and the specification of pure Python modules by package, rather than by module. This is important since the Distutils consist of a couple of dozen modules split into (so far) two packages; an explicit list of every module would be tedious to generate and difficult to maintain. For more information on the additional meta-data, see section Additional meta-data. Note that any pathnames (files or directories) supplied in the setup script should be written using the Unix convention, i.e. slash- separated. The Distutils will take care of converting this platform- neutral representation into whatever is appropriate on your current platform before actually using the pathname. This makes your setup script portable across operating systems, which of course is one of the major goals of the Distutils. In this spirit, all pathnames in this document are slash-separated. This, of course, only applies to pathnames given to Distutils functions. If you, for example, use standard Python functions such as "glob.glob()" or "os.listdir()" to specify files, you should be careful to write portable code instead of hardcoding path separators: glob.glob(os.path.join('mydir', 'subdir', '*.html')) os.listdir(os.path.join('mydir', 'subdir')) 2.1. Listing whole packages =========================== The "packages" option tells the Distutils to process (build, distribute, install, etc.) all pure Python modules found in each package mentioned in the "packages" list. In order to do this, of course, there has to be a correspondence between package names and directories in the filesystem. The default correspondence is the most obvious one, i.e. package "distutils" is found in the directory "distutils" relative to the distribution root. Thus, when you say "packages = ['foo']" in your setup script, you are promising that the Distutils will find a file "foo/__init__.py" (which might be spelled differently on your system, but you get the idea) relative to the directory where your setup script lives. If you break this promise, the Distutils will issue a warning but still process the broken package anyway. If you use a different convention to lay out your source directory, that’s no problem: you just have to supply the "package_dir" option to tell the Distutils about your convention. For example, say you keep all Python source under "lib", so that modules in the “root package” (i.e., not in any package at all) are in "lib", modules in the "foo" package are in "lib/foo", and so forth. Then you would put package_dir = {'': 'lib'} in your setup script. The keys to this dictionary are package names, and an empty package name stands for the root package. The values are directory names relative to your distribution root. In this case, when you say "packages = ['foo']", you are promising that the file "lib/foo/__init__.py" exists. Another possible convention is to put the "foo" package right in "lib", the "foo.bar" package in "lib/bar", etc. This would be written in the setup script as package_dir = {'foo': 'lib'} A "package: dir" entry in the "package_dir" dictionary implicitly applies to all packages below *package*, so the "foo.bar" case is automatically handled here. In this example, having "packages = ['foo', 'foo.bar']" tells the Distutils to look for "lib/__init__.py" and "lib/bar/__init__.py". (Keep in mind that although "package_dir" applies recursively, you must explicitly list all packages in "packages": the Distutils will *not* recursively scan your source tree looking for any directory with an "__init__.py" file.) 2.2. Listing individual modules =============================== For a small module distribution, you might prefer to list all modules rather than listing packages—especially the case of a single module that goes in the “root package” (i.e., no package at all). This simplest case was shown in section A Simple Example; here is a slightly more involved example: py_modules = ['mod1', 'pkg.mod2'] This describes two modules, one of them in the “root” package, the other in the "pkg" package. Again, the default package/directory layout implies that these two modules can be found in "mod1.py" and "pkg/mod2.py", and that "pkg/__init__.py" exists as well. And again, you can override the package/directory correspondence using the "package_dir" option. 2.3. Describing extension modules ================================= Just as writing Python extension modules is a bit more complicated than writing pure Python modules, describing them to the Distutils is a bit more complicated. Unlike pure modules, it’s not enough just to list modules or packages and expect the Distutils to go out and find the right files; you have to specify the extension name, source file(s), and any compile/link requirements (include directories, libraries to link with, etc.). All of this is done through another keyword argument to "setup()", the "ext_modules" option. "ext_modules" is just a list of "Extension" instances, each of which describes a single extension module. Suppose your distribution includes a single extension, called "foo" and implemented by "foo.c". If no additional instructions to the compiler/linker are needed, describing this extension is quite simple: Extension('foo', ['foo.c']) The "Extension" class can be imported from "distutils.core" along with "setup()". Thus, the setup script for a module distribution that contains only this one extension and nothing else might be: from distutils.core import setup, Extension setup(name='foo', version='1.0', ext_modules=[Extension('foo', ['foo.c'])], ) The "Extension" class (actually, the underlying extension-building machinery implemented by the **build_ext** command) supports a great deal of flexibility in describing Python extensions, which is explained in the following sections. 2.3.1. Extension names and packages ----------------------------------- The first argument to the "Extension" constructor is always the name of the extension, including any package names. For example, Extension('foo', ['src/foo1.c', 'src/foo2.c']) describes an extension that lives in the root package, while Extension('pkg.foo', ['src/foo1.c', 'src/foo2.c']) describes the same extension in the "pkg" package. The source files and resulting object code are identical in both cases; the only difference is where in the filesystem (and therefore where in Python’s namespace hierarchy) the resulting extension lives. If you have a number of extensions all in the same package (or all under the same base package), use the "ext_package" keyword argument to "setup()". For example, setup(..., ext_package='pkg', ext_modules=[Extension('foo', ['foo.c']), Extension('subpkg.bar', ['bar.c'])], ) will compile "foo.c" to the extension "pkg.foo", and "bar.c" to "pkg.subpkg.bar". 2.3.2. Extension source files ----------------------------- The second argument to the "Extension" constructor is a list of source files. Since the Distutils currently only support C, C++, and Objective-C extensions, these are normally C/C++/Objective-C source files. (Be sure to use appropriate extensions to distinguish C++ source files: ".cc" and ".cpp" seem to be recognized by both Unix and Windows compilers.) However, you can also include SWIG interface (".i") files in the list; the **build_ext** command knows how to deal with SWIG extensions: it will run SWIG on the interface file and compile the resulting C/C++ file into your extension. This warning notwithstanding, options to SWIG can be currently passed like this: setup(..., ext_modules=[Extension('_foo', ['foo.i'], swig_opts=['-modern', '-I../include'])], py_modules=['foo'], ) Or on the commandline like this: > python setup.py build_ext --swig-opts="-modern -I../include" On some platforms, you can include non-source files that are processed by the compiler and included in your extension. Currently, this just means Windows message text (".mc") files and resource definition (".rc") files for Visual C++. These will be compiled to binary resource (".res") files and linked into the executable. 2.3.3. Preprocessor options --------------------------- Three optional arguments to "Extension" will help if you need to specify include directories to search or preprocessor macros to define/undefine: "include_dirs", "define_macros", and "undef_macros". For example, if your extension requires header files in the "include" directory under your distribution root, use the "include_dirs" option: Extension('foo', ['foo.c'], include_dirs=['include']) You can specify absolute directories there; if you know that your extension will only be built on Unix systems with X11R6 installed to "/usr", you can get away with Extension('foo', ['foo.c'], include_dirs=['/usr/include/X11']) You should avoid this sort of non-portable usage if you plan to distribute your code: it’s probably better to write C code like #include If you need to include header files from some other Python extension, you can take advantage of the fact that header files are installed in a consistent way by the Distutils **install_headers** command. For example, the Numerical Python header files are installed (on a standard Unix installation) to "/usr/local/include/python1.5/Numerical". (The exact location will differ according to your platform and Python installation.) Since the Python include directory—"/usr/local/include/python1.5" in this case—is always included in the search path when building Python extensions, the best approach is to write C code like #include If you must put the "Numerical" include directory right into your header search path, though, you can find that directory using the Distutils "distutils.sysconfig" module: from distutils.sysconfig import get_python_inc incdir = os.path.join(get_python_inc(plat_specific=1), 'Numerical') setup(..., Extension(..., include_dirs=[incdir]), ) Even though this is quite portable—it will work on any Python installation, regardless of platform—it’s probably easier to just write your C code in the sensible way. You can define and undefine pre-processor macros with the "define_macros" and "undef_macros" options. "define_macros" takes a list of "(name, value)" tuples, where "name" is the name of the macro to define (a string) and "value" is its value: either a string or "None". (Defining a macro "FOO" to "None" is the equivalent of a bare "#define FOO" in your C source: with most compilers, this sets "FOO" to the string "1".) "undef_macros" is just a list of macros to undefine. For example: Extension(..., define_macros=[('NDEBUG', '1'), ('HAVE_STRFTIME', None)], undef_macros=['HAVE_FOO', 'HAVE_BAR']) is the equivalent of having this at the top of every C source file: #define NDEBUG 1 #define HAVE_STRFTIME #undef HAVE_FOO #undef HAVE_BAR 2.3.4. Library options ---------------------- You can also specify the libraries to link against when building your extension, and the directories to search for those libraries. The "libraries" option is a list of libraries to link against, "library_dirs" is a list of directories to search for libraries at link-time, and "runtime_library_dirs" is a list of directories to search for shared (dynamically loaded) libraries at run-time. For example, if you need to link against libraries known to be in the standard library search path on target systems Extension(..., libraries=['gdbm', 'readline']) If you need to link with libraries in a non-standard location, you’ll have to include the location in "library_dirs": Extension(..., library_dirs=['/usr/X11R6/lib'], libraries=['X11', 'Xt']) (Again, this sort of non-portable construct should be avoided if you intend to distribute your code.) 2.3.5. Other options -------------------- There are still some other options which can be used to handle special cases. The "optional" option is a boolean; if it is true, a build failure in the extension will not abort the build process, but instead simply not install the failing extension. The "extra_objects" option is a list of object files to be passed to the linker. These files must not have extensions, as the default extension for the compiler is used. "extra_compile_args" and "extra_link_args" can be used to specify additional command line options for the respective compiler and linker command lines. "export_symbols" is only useful on Windows. It can contain a list of symbols (functions or variables) to be exported. This option is not needed when building compiled extensions: Distutils will automatically add "initmodule" to the list of exported symbols. The "depends" option is a list of files that the extension depends on (for example header files). The build command will call the compiler on the sources to rebuild extension if any on this files has been modified since the previous build. 2.4. Relationships between Distributions and Packages ===================================================== A distribution may relate to packages in three specific ways: 1. It can require packages or modules. 2. It can provide packages or modules. 3. It can obsolete packages or modules. These relationships can be specified using keyword arguments to the "distutils.core.setup()" function. Dependencies on other Python modules and packages can be specified by supplying the *requires* keyword argument to "setup()". The value must be a list of strings. Each string specifies a package that is required, and optionally what versions are sufficient. To specify that any version of a module or package is required, the string should consist entirely of the module or package name. Examples include "'mymodule'" and "'xml.parsers.expat'". If specific versions are required, a sequence of qualifiers can be supplied in parentheses. Each qualifier may consist of a comparison operator and a version number. The accepted comparison operators are: < > == <= >= != These can be combined by using multiple qualifiers separated by commas (and optional whitespace). In this case, all of the qualifiers must be matched; a logical AND is used to combine the evaluations. Let’s look at a bunch of examples: +---------------------------+------------------------------------------------+ | Requires Expression | Explanation | |===========================|================================================| | "==1.0" | Only version "1.0" is compatible | +---------------------------+------------------------------------------------+ | ">1.0, !=1.5.1, <2.0" | Any version after "1.0" and before "2.0" is | | | compatible, except "1.5.1" | +---------------------------+------------------------------------------------+ Now that we can specify dependencies, we also need to be able to specify what we provide that other distributions can require. This is done using the *provides* keyword argument to "setup()". The value for this keyword is a list of strings, each of which names a Python module or package, and optionally identifies the version. If the version is not specified, it is assumed to match that of the distribution. Some examples: +-----------------------+------------------------------------------------+ | Provides Expression | Explanation | |=======================|================================================| | "mypkg" | Provide "mypkg", using the distribution | | | version | +-----------------------+------------------------------------------------+ | "mypkg (1.1)" | Provide "mypkg" version 1.1, regardless of the | | | distribution version | +-----------------------+------------------------------------------------+ A package can declare that it obsoletes other packages using the *obsoletes* keyword argument. The value for this is similar to that of the *requires* keyword: a list of strings giving module or package specifiers. Each specifier consists of a module or package name optionally followed by one or more version qualifiers. Version qualifiers are given in parentheses after the module or package name. The versions identified by the qualifiers are those that are obsoleted by the distribution being described. If no qualifiers are given, all versions of the named module or package are understood to be obsoleted. 2.5. Installing Scripts ======================= So far we have been dealing with pure and non-pure Python modules, which are usually not run by themselves but imported by scripts. Scripts are files containing Python source code, intended to be started from the command line. Scripts don’t require Distutils to do anything very complicated. The only clever feature is that if the first line of the script starts with "#!" and contains the word “python”, the Distutils will adjust the first line to refer to the current interpreter location. By default, it is replaced with the current interpreter location. The "--executable" (or "-e") option will allow the interpreter path to be explicitly overridden. The "scripts" option simply is a list of files to be handled in this way. From the PyXML setup script: setup(..., scripts=['scripts/xmlproc_parse', 'scripts/xmlproc_val'] ) Changed in version 3.1: All the scripts will also be added to the "MANIFEST" file if no template is provided. See Specifying the files to distribute. 2.6. Installing Package Data ============================ Often, additional files need to be installed into a package. These files are often data that’s closely related to the package’s implementation, or text files containing documentation that might be of interest to programmers using the package. These files are called *package data*. Package data can be added to packages using the "package_data" keyword argument to the "setup()" function. The value must be a mapping from package name to a list of relative path names that should be copied into the package. The paths are interpreted as relative to the directory containing the package (information from the "package_dir" mapping is used if appropriate); that is, the files are expected to be part of the package in the source directories. They may contain glob patterns as well. The path names may contain directory portions; any necessary directories will be created in the installation. For example, if a package should contain a subdirectory with several data files, the files can be arranged like this in the source tree: setup.py src/ mypkg/ __init__.py module.py data/ tables.dat spoons.dat forks.dat The corresponding call to "setup()" might be: setup(..., packages=['mypkg'], package_dir={'mypkg': 'src/mypkg'}, package_data={'mypkg': ['data/*.dat']}, ) Changed in version 3.1: All the files that match "package_data" will be added to the "MANIFEST" file if no template is provided. See Specifying the files to distribute. 2.7. Installing Additional Files ================================ The "data_files" option can be used to specify additional files needed by the module distribution: configuration files, message catalogs, data files, anything which doesn’t fit in the previous categories. "data_files" specifies a sequence of (*directory*, *files*) pairs in the following way: setup(..., data_files=[('bitmaps', ['bm/b1.gif', 'bm/b2.gif']), ('config', ['cfg/data.cfg'])], ) Each (*directory*, *files*) pair in the sequence specifies the installation directory and the files to install there. Each file name in *files* is interpreted relative to the "setup.py" script at the top of the package source distribution. Note that you can specify the directory where the data files will be installed, but you cannot rename the data files themselves. The *directory* should be a relative path. It is interpreted relative to the installation prefix (Python’s "sys.prefix" for system installations; "site.USER_BASE" for user installations). Distutils allows *directory* to be an absolute installation path, but this is discouraged since it is incompatible with the wheel packaging format. No directory information from *files* is used to determine the final location of the installed file; only the name of the file is used. You can specify the "data_files" options as a simple sequence of files without specifying a target directory, but this is not recommended, and the **install** command will print a warning in this case. To install data files directly in the target directory, an empty string should be given as the directory. Changed in version 3.1: All the files that match "data_files" will be added to the "MANIFEST" file if no template is provided. See Specifying the files to distribute. 2.8. Additional meta-data ========================= The setup script may include additional meta-data beyond the name and version. This information includes: +------------------------+-----------------------------+-------------------+----------+ | Meta-Data | Description | Value | Notes | |========================|=============================|===================|==========| | "name" | name of the package | short string | (1) | +------------------------+-----------------------------+-------------------+----------+ | "version" | version of this release | short string | (1)(2) | +------------------------+-----------------------------+-------------------+----------+ | "author" | package author’s name | short string | (3) | +------------------------+-----------------------------+-------------------+----------+ | "author_email" | email address of the | email address | (3) | | | package author | | | +------------------------+-----------------------------+-------------------+----------+ | "maintainer" | package maintainer’s name | short string | (3) | +------------------------+-----------------------------+-------------------+----------+ | "maintainer_email" | email address of the | email address | (3) | | | package maintainer | | | +------------------------+-----------------------------+-------------------+----------+ | "url" | home page for the package | URL | (1) | +------------------------+-----------------------------+-------------------+----------+ | "description" | short, summary description | short string | | | | of the package | | | +------------------------+-----------------------------+-------------------+----------+ | "long_description" | longer description of the | long string | (4) | | | package | | | +------------------------+-----------------------------+-------------------+----------+ | "download_url" | location where the package | URL | | | | may be downloaded | | | +------------------------+-----------------------------+-------------------+----------+ | "classifiers" | a list of classifiers | list of strings | (6)(7) | +------------------------+-----------------------------+-------------------+----------+ | "platforms" | a list of platforms | list of strings | (6)(8) | +------------------------+-----------------------------+-------------------+----------+ | "keywords" | a list of keywords | list of strings | (6)(8) | +------------------------+-----------------------------+-------------------+----------+ | "license" | license for the package | short string | (5) | +------------------------+-----------------------------+-------------------+----------+ Notes: 1. These fields are required. 2. It is recommended that versions take the form *major.minor[.patch[.sub]]*. 3. Either the author or the maintainer must be identified. If maintainer is provided, distutils lists it as the author in "PKG- INFO". 4. The "long_description" field is used by PyPI when you publish a package, to build its project page. 5. The "license" field is a text indicating the license covering the package where the license is not a selection from the “License” Trove classifiers. See the "Classifier" field. Notice that there’s a "licence" distribution option which is deprecated but still acts as an alias for "license". 6. This field must be a list. 7. The valid classifiers are listed on PyPI. 8. To preserve backward compatibility, this field also accepts a string. If you pass a comma-separated string "'foo, bar'", it will be converted to "['foo', 'bar']", Otherwise, it will be converted to a list of one string. ‘short string’ A single line of text, not more than 200 characters. ‘long string’ Multiple lines of plain text in reStructuredText format (see http://docutils.sourceforge.net/). ‘list of strings’ See below. Encoding the version information is an art in itself. Python packages generally adhere to the version format *major.minor[.patch][sub]*. The major number is 0 for initial, experimental releases of software. It is incremented for releases that represent major milestones in a package. The minor number is incremented when important new features are added to the package. The patch number increments when bug-fix releases are made. Additional trailing version information is sometimes used to indicate sub-releases. These are “a1,a2,…,aN” (for alpha releases, where functionality and API may change), “b1,b2,…,bN” (for beta releases, which only fix bugs) and “pr1,pr2,…,prN” (for final pre-release release testing). Some examples: 0.1.0 the first, experimental release of a package 1.0.1a2 the second alpha release of the first patch version of 1.0 "classifiers" must be specified in a list: setup(..., classifiers=[ 'Development Status :: 4 - Beta', 'Environment :: Console', 'Environment :: Web Environment', 'Intended Audience :: End Users/Desktop', 'Intended Audience :: Developers', 'Intended Audience :: System Administrators', 'License :: OSI Approved :: Python Software Foundation License', 'Operating System :: MacOS :: MacOS X', 'Operating System :: Microsoft :: Windows', 'Operating System :: POSIX', 'Programming Language :: Python', 'Topic :: Communications :: Email', 'Topic :: Office/Business', 'Topic :: Software Development :: Bug Tracking', ], ) Changed in version 3.7: "setup" now warns when "classifiers", "keywords" or "platforms" fields are not specified as a list or a string. 2.9. Debugging the setup script =============================== Sometimes things go wrong, and the setup script doesn’t do what the developer wants. Distutils catches any exceptions when running the setup script, and print a simple error message before the script is terminated. The motivation for this behaviour is to not confuse administrators who don’t know much about Python and are trying to install a package. If they get a big long traceback from deep inside the guts of Distutils, they may think the package or the Python installation is broken because they don’t read all the way down to the bottom and see that it’s a permission problem. On the other hand, this doesn’t help the developer to find the cause of the failure. For this purpose, the "DISTUTILS_DEBUG" environment variable can be set to anything except an empty string, and distutils will now print detailed information about what it is doing, dump the full traceback when an exception occurs, and print the whole command line when an external program (like a C compiler) fails. 4. Creating a Source Distribution ********************************* Note: This document is being retained solely until the "setuptools" documentation at https://setuptools.readthedocs.io/en/latest/setuptools.html independently covers all of the relevant information currently included here. As shown in section A Simple Example, you use the **sdist** command to create a source distribution. In the simplest case, python setup.py sdist (assuming you haven’t specified any **sdist** options in the setup script or config file), **sdist** creates the archive of the default format for the current platform. The default format is a gzip’ed tar file (".tar.gz") on Unix, and ZIP file on Windows. You can specify as many formats as you like using the "--formats" option, for example: python setup.py sdist --formats=gztar,zip to create a gzipped tarball and a zip file. The available formats are: +-------------+---------------------------+---------------+ | Format | Description | Notes | |=============|===========================|===============| | "zip" | zip file (".zip") | (1),(3) | +-------------+---------------------------+---------------+ | "gztar" | gzip’ed tar file | (2) | | | (".tar.gz") | | +-------------+---------------------------+---------------+ | "bztar" | bzip2’ed tar file | (5) | | | (".tar.bz2") | | +-------------+---------------------------+---------------+ | "xztar" | xz’ed tar file | (5) | | | (".tar.xz") | | +-------------+---------------------------+---------------+ | "ztar" | compressed tar file | (4),(5) | | | (".tar.Z") | | +-------------+---------------------------+---------------+ | "tar" | tar file (".tar") | (5) | +-------------+---------------------------+---------------+ Changed in version 3.5: Added support for the "xztar" format. Notes: 1. default on Windows 2. default on Unix 3. requires either external **zip** utility or "zipfile" module (part of the standard Python library since Python 1.6) 4. requires the **compress** program. Notice that this format is now pending for deprecation and will be removed in the future versions of Python. 5. deprecated by PEP 527; PyPI only accepts ".zip" and ".tar.gz" files. When using any "tar" format ("gztar", "bztar", "xztar", "ztar" or "tar"), under Unix you can specify the "owner" and "group" names that will be set for each member of the archive. For example, if you want all files of the archive to be owned by root: python setup.py sdist --owner=root --group=root 4.1. Specifying the files to distribute ======================================= If you don’t supply an explicit list of files (or instructions on how to generate one), the **sdist** command puts a minimal default set into the source distribution: * all Python source files implied by the "py_modules" and "packages" options * all C source files mentioned in the "ext_modules" or "libraries" options * scripts identified by the "scripts" option See Installing Scripts. * anything that looks like a test script: "test/test*.py" (currently, the Distutils don’t do anything with test scripts except include them in source distributions, but in the future there will be a standard for testing Python module distributions) * Any of the standard README files ("README", "README.txt", or "README.rst"), "setup.py" (or whatever you called your setup script), and "setup.cfg". * all files that matches the "package_data" metadata. See Installing Package Data. * all files that matches the "data_files" metadata. See Installing Additional Files. Sometimes this is enough, but usually you will want to specify additional files to distribute. The typical way to do this is to write a *manifest template*, called "MANIFEST.in" by default. The manifest template is just a list of instructions for how to generate your manifest file, "MANIFEST", which is the exact list of files to include in your source distribution. The **sdist** command processes this template and generates a manifest based on its instructions and what it finds in the filesystem. If you prefer to roll your own manifest file, the format is simple: one filename per line, regular files (or symlinks to them) only. If you do supply your own "MANIFEST", you must specify everything: the default set of files described above does not apply in this case. Changed in version 3.1: An existing generated "MANIFEST" will be regenerated without **sdist** comparing its modification time to the one of "MANIFEST.in" or "setup.py". Changed in version 3.1.3: "MANIFEST" files start with a comment indicating they are generated. Files without this comment are not overwritten or removed. Changed in version 3.2.2: **sdist** will read a "MANIFEST" file if no "MANIFEST.in" exists, like it used to do. Changed in version 3.7: "README.rst" is now included in the list of distutils standard READMEs. The manifest template has one command per line, where each command specifies a set of files to include or exclude from the source distribution. For an example, again we turn to the Distutils’ own manifest template: include *.txt recursive-include examples *.txt *.py prune examples/sample?/build The meanings should be fairly clear: include all files in the distribution root matching "*.txt", all files anywhere under the "examples" directory matching "*.txt" or "*.py", and exclude all directories matching "examples/sample?/build". All of this is done *after* the standard include set, so you can exclude files from the standard set with explicit instructions in the manifest template. (Or, you can use the "--no-defaults" option to disable the standard set entirely.) There are several other commands available in the manifest template mini-language; see section Creating a source distribution: the sdist command. The order of commands in the manifest template matters: initially, we have the list of default files as described above, and each command in the template adds to or removes from that list of files. Once we have fully processed the manifest template, we remove files that should not be included in the source distribution: * all files in the Distutils “build” tree (default "build/") * all files in directories named "RCS", "CVS", ".svn", ".hg", ".git", ".bzr" or "_darcs" Now we have our complete list of files, which is written to the manifest for future reference, and then used to build the source distribution archive(s). You can disable the default set of included files with the "--no- defaults" option, and you can disable the standard exclude set with " --no-prune". Following the Distutils’ own manifest template, let’s trace how the **sdist** command builds the list of files to include in the Distutils source distribution: 1. include all Python source files in the "distutils" and "distutils/command" subdirectories (because packages corresponding to those two directories were mentioned in the "packages" option in the setup script—see section Writing the Setup Script) 2. include "README.txt", "setup.py", and "setup.cfg" (standard files) 3. include "test/test*.py" (standard files) 4. include "*.txt" in the distribution root (this will find "README.txt" a second time, but such redundancies are weeded out later) 5. include anything matching "*.txt" or "*.py" in the sub-tree under "examples", 6. exclude all files in the sub-trees starting at directories matching "examples/sample?/build"—this may exclude files included by the previous two steps, so it’s important that the "prune" command in the manifest template comes after the "recursive-include" command 7. exclude the entire "build" tree, and any "RCS", "CVS", ".svn", ".hg", ".git", ".bzr" and "_darcs" directories Just like in the setup script, file and directory names in the manifest template should always be slash-separated; the Distutils will take care of converting them to the standard representation on your platform. That way, the manifest template is portable across operating systems. 4.2. Manifest-related options ============================= The normal course of operations for the **sdist** command is as follows: * if the manifest file ("MANIFEST" by default) exists and the first line does not have a comment indicating it is generated from "MANIFEST.in", then it is used as is, unaltered * if the manifest file doesn’t exist or has been previously automatically generated, read "MANIFEST.in" and create the manifest * if neither "MANIFEST" nor "MANIFEST.in" exist, create a manifest with just the default file set * use the list of files now in "MANIFEST" (either just generated or read in) to create the source distribution archive(s) There are a couple of options that modify this behaviour. First, use the "--no-defaults" and "--no-prune" to disable the standard “include” and “exclude” sets. Second, you might just want to (re)generate the manifest, but not create a source distribution: python setup.py sdist --manifest-only "-o" is a shortcut for "--manifest-only". Uploading Packages to the Package Index *************************************** References to up to date PyPI documentation can be found at Reading the Python Packaging User Guide. 4. Building C and C++ Extensions ******************************** A C extension for CPython is a shared library (e.g. a ".so" file on Linux, ".pyd" on Windows), which exports an *initialization function*. To be importable, the shared library must be available on "PYTHONPATH", and must be named after the module name, with an appropriate extension. When using distutils, the correct filename is generated automatically. The initialization function has the signature: PyObject* PyInit_modulename(void) It returns either a fully-initialized module, or a "PyModuleDef" instance. See Initializing C modules for details. For modules with ASCII-only names, the function must be named "PyInit_", with "" replaced by the name of the module. When using Multi-phase initialization, non-ASCII module names are allowed. In this case, the initialization function name is "PyInitU_", with "" encoded using Python’s *punycode* encoding with hyphens replaced by underscores. In Python: def initfunc_name(name): try: suffix = b'_' + name.encode('ascii') except UnicodeEncodeError: suffix = b'U_' + name.encode('punycode').replace(b'-', b'_') return b'PyInit' + suffix It is possible to export multiple modules from a single shared library by defining multiple initialization functions. However, importing them requires using symbolic links or a custom importer, because by default only the function corresponding to the filename is found. See the *“Multiple modules in one library”* section in **PEP 489** for details. 4.1. Building C and C++ Extensions with distutils ================================================= Extension modules can be built using distutils, which is included in Python. Since distutils also supports creation of binary packages, users don’t necessarily need a compiler and distutils to install the extension. A distutils package contains a driver script, "setup.py". This is a plain Python file, which, in the most simple case, could look like this: from distutils.core import setup, Extension module1 = Extension('demo', sources = ['demo.c']) setup (name = 'PackageName', version = '1.0', description = 'This is a demo package', ext_modules = [module1]) With this "setup.py", and a file "demo.c", running python setup.py build will compile "demo.c", and produce an extension module named "demo" in the "build" directory. Depending on the system, the module file will end up in a subdirectory "build/lib.system", and may have a name like "demo.so" or "demo.pyd". In the "setup.py", all execution is performed by calling the "setup" function. This takes a variable number of keyword arguments, of which the example above uses only a subset. Specifically, the example specifies meta-information to build packages, and it specifies the contents of the package. Normally, a package will contain additional modules, like Python source modules, documentation, subpackages, etc. Please refer to the distutils documentation in Distributing Python Modules (Legacy version) to learn more about the features of distutils; this section explains building extension modules only. It is common to pre-compute arguments to "setup()", to better structure the driver script. In the example above, the "ext_modules" argument to "setup()" is a list of extension modules, each of which is an instance of the "Extension". In the example, the instance defines an extension named "demo" which is build by compiling a single source file, "demo.c". In many cases, building an extension is more complex, since additional preprocessor defines and libraries may be needed. This is demonstrated in the example below. from distutils.core import setup, Extension module1 = Extension('demo', define_macros = [('MAJOR_VERSION', '1'), ('MINOR_VERSION', '0')], include_dirs = ['/usr/local/include'], libraries = ['tcl83'], library_dirs = ['/usr/local/lib'], sources = ['demo.c']) setup (name = 'PackageName', version = '1.0', description = 'This is a demo package', author = 'Martin v. Loewis', author_email = 'martin@v.loewis.de', url = 'https://docs.python.org/extending/building', long_description = ''' This is really just a demo package. ''', ext_modules = [module1]) In this example, "setup()" is called with additional meta-information, which is recommended when distribution packages have to be built. For the extension itself, it specifies preprocessor defines, include directories, library directories, and libraries. Depending on the compiler, distutils passes this information in different ways to the compiler. For example, on Unix, this may result in the compilation commands gcc -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC -DMAJOR_VERSION=1 -DMINOR_VERSION=0 -I/usr/local/include -I/usr/local/include/python2.2 -c demo.c -o build/temp.linux-i686-2.2/demo.o gcc -shared build/temp.linux-i686-2.2/demo.o -L/usr/local/lib -ltcl83 -o build/lib.linux-i686-2.2/demo.so These lines are for demonstration purposes only; distutils users should trust that distutils gets the invocations right. 4.2. Distributing your extension modules ======================================== When an extension has been successfully built, there are three ways to use it. End-users will typically want to install the module, they do so by running python setup.py install Module maintainers should produce source packages; to do so, they run python setup.py sdist In some cases, additional files need to be included in a source distribution; this is done through a "MANIFEST.in" file; see Specifying the files to distribute for details. If the source distribution has been built successfully, maintainers can also create binary distributions. Depending on the platform, one of the following commands can be used to do so. python setup.py bdist_wininst python setup.py bdist_rpm python setup.py bdist_dumb 1. Embedding Python in Another Application ****************************************** The previous chapters discussed how to extend Python, that is, how to extend the functionality of Python by attaching a library of C functions to it. It is also possible to do it the other way around: enrich your C/C++ application by embedding Python in it. Embedding provides your application with the ability to implement some of the functionality of your application in Python rather than C or C++. This can be used for many purposes; one example would be to allow users to tailor the application to their needs by writing some scripts in Python. You can also use it yourself if some of the functionality can be written in Python more easily. Embedding Python is similar to extending it, but not quite. The difference is that when you extend Python, the main program of the application is still the Python interpreter, while if you embed Python, the main program may have nothing to do with Python — instead, some parts of the application occasionally call the Python interpreter to run some Python code. So if you are embedding Python, you are providing your own main program. One of the things this main program has to do is initialize the Python interpreter. At the very least, you have to call the function "Py_Initialize()". There are optional calls to pass command line arguments to Python. Then later you can call the interpreter from any part of the application. There are several different ways to call the interpreter: you can pass a string containing Python statements to "PyRun_SimpleString()", or you can pass a stdio file pointer and a file name (for identification in error messages only) to "PyRun_SimpleFile()". You can also call the lower-level operations described in the previous chapters to construct and use Python objects. See also: Python/C API Reference Manual The details of Python’s C interface are given in this manual. A great deal of necessary information can be found here. 1.1. Very High Level Embedding ============================== The simplest form of embedding Python is the use of the very high level interface. This interface is intended to execute a Python script without needing to interact with the application directly. This can for example be used to perform some operation on a file. #define PY_SSIZE_T_CLEAN #include int main(int argc, char *argv[]) { wchar_t *program = Py_DecodeLocale(argv[0], NULL); if (program == NULL) { fprintf(stderr, "Fatal error: cannot decode argv[0]\n"); exit(1); } Py_SetProgramName(program); /* optional but recommended */ Py_Initialize(); PyRun_SimpleString("from time import time,ctime\n" "print('Today is', ctime(time()))\n"); if (Py_FinalizeEx() < 0) { exit(120); } PyMem_RawFree(program); return 0; } The "Py_SetProgramName()" function should be called before "Py_Initialize()" to inform the interpreter about paths to Python run- time libraries. Next, the Python interpreter is initialized with "Py_Initialize()", followed by the execution of a hard-coded Python script that prints the date and time. Afterwards, the "Py_FinalizeEx()" call shuts the interpreter down, followed by the end of the program. In a real program, you may want to get the Python script from another source, perhaps a text-editor routine, a file, or a database. Getting the Python code from a file can better be done by using the "PyRun_SimpleFile()" function, which saves you the trouble of allocating memory space and loading the file contents. 1.2. Beyond Very High Level Embedding: An overview ================================================== The high level interface gives you the ability to execute arbitrary pieces of Python code from your application, but exchanging data values is quite cumbersome to say the least. If you want that, you should use lower level calls. At the cost of having to write more C code, you can achieve almost anything. It should be noted that extending Python and embedding Python is quite the same activity, despite the different intent. Most topics discussed in the previous chapters are still valid. To show this, consider what the extension code from Python to C really does: 1. Convert data values from Python to C, 2. Perform a function call to a C routine using the converted values, and 3. Convert the data values from the call from C to Python. When embedding Python, the interface code does: 1. Convert data values from C to Python, 2. Perform a function call to a Python interface routine using the converted values, and 3. Convert the data values from the call from Python to C. As you can see, the data conversion steps are simply swapped to accommodate the different direction of the cross-language transfer. The only difference is the routine that you call between both data conversions. When extending, you call a C routine, when embedding, you call a Python routine. This chapter will not discuss how to convert data from Python to C and vice versa. Also, proper use of references and dealing with errors is assumed to be understood. Since these aspects do not differ from extending the interpreter, you can refer to earlier chapters for the required information. 1.3. Pure Embedding =================== The first program aims to execute a function in a Python script. Like in the section about the very high level interface, the Python interpreter does not directly interact with the application (but that will change in the next section). The code to run a function defined in a Python script is: #define PY_SSIZE_T_CLEAN #include int main(int argc, char *argv[]) { PyObject *pName, *pModule, *pFunc; PyObject *pArgs, *pValue; int i; if (argc < 3) { fprintf(stderr,"Usage: call pythonfile funcname [args]\n"); return 1; } Py_Initialize(); pName = PyUnicode_DecodeFSDefault(argv[1]); /* Error checking of pName left out */ pModule = PyImport_Import(pName); Py_DECREF(pName); if (pModule != NULL) { pFunc = PyObject_GetAttrString(pModule, argv[2]); /* pFunc is a new reference */ if (pFunc && PyCallable_Check(pFunc)) { pArgs = PyTuple_New(argc - 3); for (i = 0; i < argc - 3; ++i) { pValue = PyLong_FromLong(atoi(argv[i + 3])); if (!pValue) { Py_DECREF(pArgs); Py_DECREF(pModule); fprintf(stderr, "Cannot convert argument\n"); return 1; } /* pValue reference stolen here: */ PyTuple_SetItem(pArgs, i, pValue); } pValue = PyObject_CallObject(pFunc, pArgs); Py_DECREF(pArgs); if (pValue != NULL) { printf("Result of call: %ld\n", PyLong_AsLong(pValue)); Py_DECREF(pValue); } else { Py_DECREF(pFunc); Py_DECREF(pModule); PyErr_Print(); fprintf(stderr,"Call failed\n"); return 1; } } else { if (PyErr_Occurred()) PyErr_Print(); fprintf(stderr, "Cannot find function \"%s\"\n", argv[2]); } Py_XDECREF(pFunc); Py_DECREF(pModule); } else { PyErr_Print(); fprintf(stderr, "Failed to load \"%s\"\n", argv[1]); return 1; } if (Py_FinalizeEx() < 0) { return 120; } return 0; } This code loads a Python script using "argv[1]", and calls the function named in "argv[2]". Its integer arguments are the other values of the "argv" array. If you compile and link this program (let’s call the finished executable **call**), and use it to execute a Python script, such as: def multiply(a,b): print("Will compute", a, "times", b) c = 0 for i in range(0, a): c = c + b return c then the result should be: $ call multiply multiply 3 2 Will compute 3 times 2 Result of call: 6 Although the program is quite large for its functionality, most of the code is for data conversion between Python and C, and for error reporting. The interesting part with respect to embedding Python starts with Py_Initialize(); pName = PyUnicode_DecodeFSDefault(argv[1]); /* Error checking of pName left out */ pModule = PyImport_Import(pName); After initializing the interpreter, the script is loaded using "PyImport_Import()". This routine needs a Python string as its argument, which is constructed using the "PyUnicode_FromString()" data conversion routine. pFunc = PyObject_GetAttrString(pModule, argv[2]); /* pFunc is a new reference */ if (pFunc && PyCallable_Check(pFunc)) { ... } Py_XDECREF(pFunc); Once the script is loaded, the name we’re looking for is retrieved using "PyObject_GetAttrString()". If the name exists, and the object returned is callable, you can safely assume that it is a function. The program then proceeds by constructing a tuple of arguments as normal. The call to the Python function is then made with: pValue = PyObject_CallObject(pFunc, pArgs); Upon return of the function, "pValue" is either "NULL" or it contains a reference to the return value of the function. Be sure to release the reference after examining the value. 1.4. Extending Embedded Python ============================== Until now, the embedded Python interpreter had no access to functionality from the application itself. The Python API allows this by extending the embedded interpreter. That is, the embedded interpreter gets extended with routines provided by the application. While it sounds complex, it is not so bad. Simply forget for a while that the application starts the Python interpreter. Instead, consider the application to be a set of subroutines, and write some glue code that gives Python access to those routines, just like you would write a normal Python extension. For example: static int numargs=0; /* Return the number of arguments of the application command line */ static PyObject* emb_numargs(PyObject *self, PyObject *args) { if(!PyArg_ParseTuple(args, ":numargs")) return NULL; return PyLong_FromLong(numargs); } static PyMethodDef EmbMethods[] = { {"numargs", emb_numargs, METH_VARARGS, "Return the number of arguments received by the process."}, {NULL, NULL, 0, NULL} }; static PyModuleDef EmbModule = { PyModuleDef_HEAD_INIT, "emb", NULL, -1, EmbMethods, NULL, NULL, NULL, NULL }; static PyObject* PyInit_emb(void) { return PyModule_Create(&EmbModule); } Insert the above code just above the "main()" function. Also, insert the following two statements before the call to "Py_Initialize()": numargs = argc; PyImport_AppendInittab("emb", &PyInit_emb); These two lines initialize the "numargs" variable, and make the "emb.numargs()" function accessible to the embedded Python interpreter. With these extensions, the Python script can do things like import emb print("Number of arguments", emb.numargs()) In a real application, the methods will expose an API of the application to Python. 1.5. Embedding Python in C++ ============================ It is also possible to embed Python in a C++ program; precisely how this is done will depend on the details of the C++ system used; in general you will need to write the main program in C++, and use the C++ compiler to compile and link your program. There is no need to recompile Python itself using C++. 1.6. Compiling and Linking under Unix-like systems ================================================== It is not necessarily trivial to find the right flags to pass to your compiler (and linker) in order to embed the Python interpreter into your application, particularly because Python needs to load library modules implemented as C dynamic extensions (".so" files) linked against it. To find out the required compiler and linker flags, you can execute the "python*X.Y*-config" script which is generated as part of the installation process (a "python3-config" script may also be available). This script has several options, of which the following will be directly useful to you: * "pythonX.Y-config --cflags" will give you the recommended flags when compiling: $ /opt/bin/python3.4-config --cflags -I/opt/include/python3.4m -I/opt/include/python3.4m -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes * "pythonX.Y-config --ldflags" will give you the recommended flags when linking: $ /opt/bin/python3.4-config --ldflags -L/opt/lib/python3.4/config-3.4m -lpthread -ldl -lutil -lm -lpython3.4m -Xlinker -export-dynamic Note: To avoid confusion between several Python installations (and especially between the system Python and your own compiled Python), it is recommended that you use the absolute path to "python*X.Y*-config", as in the above example. If this procedure doesn’t work for you (it is not guaranteed to work for all Unix-like platforms; however, we welcome bug reports) you will have to read your system’s documentation about dynamic linking and/or examine Python’s "Makefile" (use "sysconfig.get_makefile_filename()" to find its location) and compilation options. In this case, the "sysconfig" module is a useful tool to programmatically extract the configuration values that you will want to combine together. For example: >>> import sysconfig >>> sysconfig.get_config_var('LIBS') '-lpthread -ldl -lutil' >>> sysconfig.get_config_var('LINKFORSHARED') '-Xlinker -export-dynamic' 1. Extending Python with C or C++ ********************************* It is quite easy to add new built-in modules to Python, if you know how to program in C. Such *extension modules* can do two things that can’t be done directly in Python: they can implement new built-in object types, and they can call C library functions and system calls. To support extensions, the Python API (Application Programmers Interface) defines a set of functions, macros and variables that provide access to most aspects of the Python run-time system. The Python API is incorporated in a C source file by including the header ""Python.h"". The compilation of an extension module depends on its intended use as well as on your system setup; details are given in later chapters. Note: The C extension interface is specific to CPython, and extension modules do not work on other Python implementations. In many cases, it is possible to avoid writing C extensions and preserve portability to other implementations. For example, if your use case is calling C library functions or system calls, you should consider using the "ctypes" module or the cffi library rather than writing custom C code. These modules let you write Python code to interface with C code and are more portable between implementations of Python than writing and compiling a C extension module. 1.1. A Simple Example ===================== Let’s create an extension module called "spam" (the favorite food of Monty Python fans…) and let’s say we want to create a Python interface to the C library function "system()" [1]. This function takes a null- terminated character string as argument and returns an integer. We want this function to be callable from Python as follows: >>> import spam >>> status = spam.system("ls -l") Begin by creating a file "spammodule.c". (Historically, if a module is called "spam", the C file containing its implementation is called "spammodule.c"; if the module name is very long, like "spammify", the module name can be just "spammify.c".) The first two lines of our file can be: #define PY_SSIZE_T_CLEAN #include which pulls in the Python API (you can add a comment describing the purpose of the module and a copyright notice if you like). Note: Since Python may define some pre-processor definitions which affect the standard headers on some systems, you *must* include "Python.h" before any standard headers are included.It is recommended to always define "PY_SSIZE_T_CLEAN" before including "Python.h". See Extracting Parameters in Extension Functions for a description of this macro. All user-visible symbols defined by "Python.h" have a prefix of "Py" or "PY", except those defined in standard header files. For convenience, and since they are used extensively by the Python interpreter, ""Python.h"" includes a few standard header files: "", "", "", and "". If the latter header file does not exist on your system, it declares the functions "malloc()", "free()" and "realloc()" directly. The next thing we add to our module file is the C function that will be called when the Python expression "spam.system(string)" is evaluated (we’ll see shortly how it ends up being called): static PyObject * spam_system(PyObject *self, PyObject *args) { const char *command; int sts; if (!PyArg_ParseTuple(args, "s", &command)) return NULL; sts = system(command); return PyLong_FromLong(sts); } There is a straightforward translation from the argument list in Python (for example, the single expression ""ls -l"") to the arguments passed to the C function. The C function always has two arguments, conventionally named *self* and *args*. The *self* argument points to the module object for module-level functions; for a method it would point to the object instance. The *args* argument will be a pointer to a Python tuple object containing the arguments. Each item of the tuple corresponds to an argument in the call’s argument list. The arguments are Python objects — in order to do anything with them in our C function we have to convert them to C values. The function "PyArg_ParseTuple()" in the Python API checks the argument types and converts them to C values. It uses a template string to determine the required types of the arguments as well as the types of the C variables into which to store the converted values. More about this later. "PyArg_ParseTuple()" returns true (nonzero) if all arguments have the right type and its components have been stored in the variables whose addresses are passed. It returns false (zero) if an invalid argument list was passed. In the latter case it also raises an appropriate exception so the calling function can return "NULL" immediately (as we saw in the example). 1.2. Intermezzo: Errors and Exceptions ====================================== An important convention throughout the Python interpreter is the following: when a function fails, it should set an exception condition and return an error value (usually "-1" or a "NULL" pointer). Exception information is stored in three members of the interpreter’s thread state. These are "NULL" if there is no exception. Otherwise they are the C equivalents of the members of the Python tuple returned by "sys.exc_info()". These are the exception type, exception instance, and a traceback object. It is important to know about them to understand how errors are passed around. The Python API defines a number of functions to set various types of exceptions. The most common one is "PyErr_SetString()". Its arguments are an exception object and a C string. The exception object is usually a predefined object like "PyExc_ZeroDivisionError". The C string indicates the cause of the error and is converted to a Python string object and stored as the “associated value” of the exception. Another useful function is "PyErr_SetFromErrno()", which only takes an exception argument and constructs the associated value by inspection of the global variable "errno". The most general function is "PyErr_SetObject()", which takes two object arguments, the exception and its associated value. You don’t need to "Py_INCREF()" the objects passed to any of these functions. You can test non-destructively whether an exception has been set with "PyErr_Occurred()". This returns the current exception object, or "NULL" if no exception has occurred. You normally don’t need to call "PyErr_Occurred()" to see whether an error occurred in a function call, since you should be able to tell from the return value. When a function *f* that calls another function *g* detects that the latter fails, *f* should itself return an error value (usually "NULL" or "-1"). It should *not* call one of the "PyErr_*()" functions — one has already been called by *g*. *f*’s caller is then supposed to also return an error indication to *its* caller, again *without* calling "PyErr_*()", and so on — the most detailed cause of the error was already reported by the function that first detected it. Once the error reaches the Python interpreter’s main loop, this aborts the currently executing Python code and tries to find an exception handler specified by the Python programmer. (There are situations where a module can actually give a more detailed error message by calling another "PyErr_*()" function, and in such cases it is fine to do so. As a general rule, however, this is not necessary, and can cause information about the cause of the error to be lost: most operations can fail for a variety of reasons.) To ignore an exception set by a function call that failed, the exception condition must be cleared explicitly by calling "PyErr_Clear()". The only time C code should call "PyErr_Clear()" is if it doesn’t want to pass the error on to the interpreter but wants to handle it completely by itself (possibly by trying something else, or pretending nothing went wrong). Every failing "malloc()" call must be turned into an exception — the direct caller of "malloc()" (or "realloc()") must call "PyErr_NoMemory()" and return a failure indicator itself. All the object-creating functions (for example, "PyLong_FromLong()") already do this, so this note is only relevant to those who call "malloc()" directly. Also note that, with the important exception of "PyArg_ParseTuple()" and friends, functions that return an integer status usually return a positive value or zero for success and "-1" for failure, like Unix system calls. Finally, be careful to clean up garbage (by making "Py_XDECREF()" or "Py_DECREF()" calls for objects you have already created) when you return an error indicator! The choice of which exception to raise is entirely yours. There are predeclared C objects corresponding to all built-in Python exceptions, such as "PyExc_ZeroDivisionError", which you can use directly. Of course, you should choose exceptions wisely — don’t use "PyExc_TypeError" to mean that a file couldn’t be opened (that should probably be "PyExc_IOError"). If something’s wrong with the argument list, the "PyArg_ParseTuple()" function usually raises "PyExc_TypeError". If you have an argument whose value must be in a particular range or must satisfy other conditions, "PyExc_ValueError" is appropriate. You can also define a new exception that is unique to your module. For this, you usually declare a static object variable at the beginning of your file: static PyObject *SpamError; and initialize it in your module’s initialization function ("PyInit_spam()") with an exception object: PyMODINIT_FUNC PyInit_spam(void) { PyObject *m; m = PyModule_Create(&spammodule); if (m == NULL) return NULL; SpamError = PyErr_NewException("spam.error", NULL, NULL); Py_XINCREF(SpamError); if (PyModule_AddObject(m, "error", SpamError) < 0) { Py_XDECREF(SpamError); Py_CLEAR(SpamError); Py_DECREF(m); return NULL; } return m; } Note that the Python name for the exception object is "spam.error". The "PyErr_NewException()" function may create a class with the base class being "Exception" (unless another class is passed in instead of "NULL"), described in Built-in Exceptions. Note also that the "SpamError" variable retains a reference to the newly created exception class; this is intentional! Since the exception could be removed from the module by external code, an owned reference to the class is needed to ensure that it will not be discarded, causing "SpamError" to become a dangling pointer. Should it become a dangling pointer, C code which raises the exception could cause a core dump or other unintended side effects. We discuss the use of "PyMODINIT_FUNC" as a function return type later in this sample. The "spam.error" exception can be raised in your extension module using a call to "PyErr_SetString()" as shown below: static PyObject * spam_system(PyObject *self, PyObject *args) { const char *command; int sts; if (!PyArg_ParseTuple(args, "s", &command)) return NULL; sts = system(command); if (sts < 0) { PyErr_SetString(SpamError, "System command failed"); return NULL; } return PyLong_FromLong(sts); } 1.3. Back to the Example ======================== Going back to our example function, you should now be able to understand this statement: if (!PyArg_ParseTuple(args, "s", &command)) return NULL; It returns "NULL" (the error indicator for functions returning object pointers) if an error is detected in the argument list, relying on the exception set by "PyArg_ParseTuple()". Otherwise the string value of the argument has been copied to the local variable "command". This is a pointer assignment and you are not supposed to modify the string to which it points (so in Standard C, the variable "command" should properly be declared as "const char *command"). The next statement is a call to the Unix function "system()", passing it the string we just got from "PyArg_ParseTuple()": sts = system(command); Our "spam.system()" function must return the value of "sts" as a Python object. This is done using the function "PyLong_FromLong()". return PyLong_FromLong(sts); In this case, it will return an integer object. (Yes, even integers are objects on the heap in Python!) If you have a C function that returns no useful argument (a function returning "void"), the corresponding Python function must return "None". You need this idiom to do so (which is implemented by the "Py_RETURN_NONE" macro): Py_INCREF(Py_None); return Py_None; "Py_None" is the C name for the special Python object "None". It is a genuine Python object rather than a "NULL" pointer, which means “error” in most contexts, as we have seen. 1.4. The Module’s Method Table and Initialization Function ========================================================== I promised to show how "spam_system()" is called from Python programs. First, we need to list its name and address in a “method table”: static PyMethodDef SpamMethods[] = { ... {"system", spam_system, METH_VARARGS, "Execute a shell command."}, ... {NULL, NULL, 0, NULL} /* Sentinel */ }; Note the third entry ("METH_VARARGS"). This is a flag telling the interpreter the calling convention to be used for the C function. It should normally always be "METH_VARARGS" or "METH_VARARGS | METH_KEYWORDS"; a value of "0" means that an obsolete variant of "PyArg_ParseTuple()" is used. When using only "METH_VARARGS", the function should expect the Python- level parameters to be passed in as a tuple acceptable for parsing via "PyArg_ParseTuple()"; more information on this function is provided below. The "METH_KEYWORDS" bit may be set in the third field if keyword arguments should be passed to the function. In this case, the C function should accept a third "PyObject *" parameter which will be a dictionary of keywords. Use "PyArg_ParseTupleAndKeywords()" to parse the arguments to such a function. The method table must be referenced in the module definition structure: static struct PyModuleDef spammodule = { PyModuleDef_HEAD_INIT, "spam", /* name of module */ spam_doc, /* module documentation, may be NULL */ -1, /* size of per-interpreter state of the module, or -1 if the module keeps state in global variables. */ SpamMethods }; This structure, in turn, must be passed to the interpreter in the module’s initialization function. The initialization function must be named "PyInit_name()", where *name* is the name of the module, and should be the only non-"static" item defined in the module file: PyMODINIT_FUNC PyInit_spam(void) { return PyModule_Create(&spammodule); } Note that PyMODINIT_FUNC declares the function as "PyObject *" return type, declares any special linkage declarations required by the platform, and for C++ declares the function as "extern "C"". When the Python program imports module "spam" for the first time, "PyInit_spam()" is called. (See below for comments about embedding Python.) It calls "PyModule_Create()", which returns a module object, and inserts built-in function objects into the newly created module based upon the table (an array of "PyMethodDef" structures) found in the module definition. "PyModule_Create()" returns a pointer to the module object that it creates. It may abort with a fatal error for certain errors, or return "NULL" if the module could not be initialized satisfactorily. The init function must return the module object to its caller, so that it then gets inserted into "sys.modules". When embedding Python, the "PyInit_spam()" function is not called automatically unless there’s an entry in the "PyImport_Inittab" table. To add the module to the initialization table, use "PyImport_AppendInittab()", optionally followed by an import of the module: int main(int argc, char *argv[]) { wchar_t *program = Py_DecodeLocale(argv[0], NULL); if (program == NULL) { fprintf(stderr, "Fatal error: cannot decode argv[0]\n"); exit(1); } /* Add a built-in module, before Py_Initialize */ if (PyImport_AppendInittab("spam", PyInit_spam) == -1) { fprintf(stderr, "Error: could not extend in-built modules table\n"); exit(1); } /* Pass argv[0] to the Python interpreter */ Py_SetProgramName(program); /* Initialize the Python interpreter. Required. If this step fails, it will be a fatal error. */ Py_Initialize(); /* Optionally import the module; alternatively, import can be deferred until the embedded script imports it. */ PyObject *pmodule = PyImport_ImportModule("spam"); if (!pmodule) { PyErr_Print(); fprintf(stderr, "Error: could not import module 'spam'\n"); } ... PyMem_RawFree(program); return 0; } Note: Removing entries from "sys.modules" or importing compiled modules into multiple interpreters within a process (or following a "fork()" without an intervening "exec()") can create problems for some extension modules. Extension module authors should exercise caution when initializing internal data structures. A more substantial example module is included in the Python source distribution as "Modules/xxmodule.c". This file may be used as a template or simply read as an example. Note: Unlike our "spam" example, "xxmodule" uses *multi-phase initialization* (new in Python 3.5), where a PyModuleDef structure is returned from "PyInit_spam", and creation of the module is left to the import machinery. For details on multi-phase initialization, see **PEP 489**. 1.5. Compilation and Linkage ============================ There are two more things to do before you can use your new extension: compiling and linking it with the Python system. If you use dynamic loading, the details may depend on the style of dynamic loading your system uses; see the chapters about building extension modules (chapter Building C and C++ Extensions) and additional information that pertains only to building on Windows (chapter Building C and C++ Extensions on Windows) for more information about this. If you can’t use dynamic loading, or if you want to make your module a permanent part of the Python interpreter, you will have to change the configuration setup and rebuild the interpreter. Luckily, this is very simple on Unix: just place your file ("spammodule.c" for example) in the "Modules/" directory of an unpacked source distribution, add a line to the file "Modules/Setup.local" describing your file: spam spammodule.o and rebuild the interpreter by running **make** in the toplevel directory. You can also run **make** in the "Modules/" subdirectory, but then you must first rebuild "Makefile" there by running ‘**make** Makefile’. (This is necessary each time you change the "Setup" file.) If your module requires additional libraries to link with, these can be listed on the line in the configuration file as well, for instance: spam spammodule.o -lX11 1.6. Calling Python Functions from C ==================================== So far we have concentrated on making C functions callable from Python. The reverse is also useful: calling Python functions from C. This is especially the case for libraries that support so-called “callback” functions. If a C interface makes use of callbacks, the equivalent Python often needs to provide a callback mechanism to the Python programmer; the implementation will require calling the Python callback functions from a C callback. Other uses are also imaginable. Fortunately, the Python interpreter is easily called recursively, and there is a standard interface to call a Python function. (I won’t dwell on how to call the Python parser with a particular string as input — if you’re interested, have a look at the implementation of the "-c" command line option in "Modules/main.c" from the Python source code.) Calling a Python function is easy. First, the Python program must somehow pass you the Python function object. You should provide a function (or some other interface) to do this. When this function is called, save a pointer to the Python function object (be careful to "Py_INCREF()" it!) in a global variable — or wherever you see fit. For example, the following function might be part of a module definition: static PyObject *my_callback = NULL; static PyObject * my_set_callback(PyObject *dummy, PyObject *args) { PyObject *result = NULL; PyObject *temp; if (PyArg_ParseTuple(args, "O:set_callback", &temp)) { if (!PyCallable_Check(temp)) { PyErr_SetString(PyExc_TypeError, "parameter must be callable"); return NULL; } Py_XINCREF(temp); /* Add a reference to new callback */ Py_XDECREF(my_callback); /* Dispose of previous callback */ my_callback = temp; /* Remember new callback */ /* Boilerplate to return "None" */ Py_INCREF(Py_None); result = Py_None; } return result; } This function must be registered with the interpreter using the "METH_VARARGS" flag; this is described in section The Module’s Method Table and Initialization Function. The "PyArg_ParseTuple()" function and its arguments are documented in section Extracting Parameters in Extension Functions. The macros "Py_XINCREF()" and "Py_XDECREF()" increment/decrement the reference count of an object and are safe in the presence of "NULL" pointers (but note that *temp* will not be "NULL" in this context). More info on them in section Reference Counts. Later, when it is time to call the function, you call the C function "PyObject_CallObject()". This function has two arguments, both pointers to arbitrary Python objects: the Python function, and the argument list. The argument list must always be a tuple object, whose length is the number of arguments. To call the Python function with no arguments, pass in "NULL", or an empty tuple; to call it with one argument, pass a singleton tuple. "Py_BuildValue()" returns a tuple when its format string consists of zero or more format codes between parentheses. For example: int arg; PyObject *arglist; PyObject *result; ... arg = 123; ... /* Time to call the callback */ arglist = Py_BuildValue("(i)", arg); result = PyObject_CallObject(my_callback, arglist); Py_DECREF(arglist); "PyObject_CallObject()" returns a Python object pointer: this is the return value of the Python function. "PyObject_CallObject()" is “reference-count-neutral” with respect to its arguments. In the example a new tuple was created to serve as the argument list, which is "Py_DECREF()"-ed immediately after the "PyObject_CallObject()" call. The return value of "PyObject_CallObject()" is “new”: either it is a brand new object, or it is an existing object whose reference count has been incremented. So, unless you want to save it in a global variable, you should somehow "Py_DECREF()" the result, even (especially!) if you are not interested in its value. Before you do this, however, it is important to check that the return value isn’t "NULL". If it is, the Python function terminated by raising an exception. If the C code that called "PyObject_CallObject()" is called from Python, it should now return an error indication to its Python caller, so the interpreter can print a stack trace, or the calling Python code can handle the exception. If this is not possible or desirable, the exception should be cleared by calling "PyErr_Clear()". For example: if (result == NULL) return NULL; /* Pass error back */ ...use result... Py_DECREF(result); Depending on the desired interface to the Python callback function, you may also have to provide an argument list to "PyObject_CallObject()". In some cases the argument list is also provided by the Python program, through the same interface that specified the callback function. It can then be saved and used in the same manner as the function object. In other cases, you may have to construct a new tuple to pass as the argument list. The simplest way to do this is to call "Py_BuildValue()". For example, if you want to pass an integral event code, you might use the following code: PyObject *arglist; ... arglist = Py_BuildValue("(l)", eventcode); result = PyObject_CallObject(my_callback, arglist); Py_DECREF(arglist); if (result == NULL) return NULL; /* Pass error back */ /* Here maybe use the result */ Py_DECREF(result); Note the placement of "Py_DECREF(arglist)" immediately after the call, before the error check! Also note that strictly speaking this code is not complete: "Py_BuildValue()" may run out of memory, and this should be checked. You may also call a function with keyword arguments by using "PyObject_Call()", which supports arguments and keyword arguments. As in the above example, we use "Py_BuildValue()" to construct the dictionary. PyObject *dict; ... dict = Py_BuildValue("{s:i}", "name", val); result = PyObject_Call(my_callback, NULL, dict); Py_DECREF(dict); if (result == NULL) return NULL; /* Pass error back */ /* Here maybe use the result */ Py_DECREF(result); 1.7. Extracting Parameters in Extension Functions ================================================= The "PyArg_ParseTuple()" function is declared as follows: int PyArg_ParseTuple(PyObject *arg, const char *format, ...); The *arg* argument must be a tuple object containing an argument list passed from Python to a C function. The *format* argument must be a format string, whose syntax is explained in Parsing arguments and building values in the Python/C API Reference Manual. The remaining arguments must be addresses of variables whose type is determined by the format string. Note that while "PyArg_ParseTuple()" checks that the Python arguments have the required types, it cannot check the validity of the addresses of C variables passed to the call: if you make mistakes there, your code will probably crash or at least overwrite random bits in memory. So be careful! Note that any Python object references which are provided to the caller are *borrowed* references; do not decrement their reference count! Some example calls: #define PY_SSIZE_T_CLEAN /* Make "s#" use Py_ssize_t rather than int. */ #include int ok; int i, j; long k, l; const char *s; Py_ssize_t size; ok = PyArg_ParseTuple(args, ""); /* No arguments */ /* Python call: f() */ ok = PyArg_ParseTuple(args, "s", &s); /* A string */ /* Possible Python call: f('whoops!') */ ok = PyArg_ParseTuple(args, "lls", &k, &l, &s); /* Two longs and a string */ /* Possible Python call: f(1, 2, 'three') */ ok = PyArg_ParseTuple(args, "(ii)s#", &i, &j, &s, &size); /* A pair of ints and a string, whose size is also returned */ /* Possible Python call: f((1, 2), 'three') */ { const char *file; const char *mode = "r"; int bufsize = 0; ok = PyArg_ParseTuple(args, "s|si", &file, &mode, &bufsize); /* A string, and optionally another string and an integer */ /* Possible Python calls: f('spam') f('spam', 'w') f('spam', 'wb', 100000) */ } { int left, top, right, bottom, h, v; ok = PyArg_ParseTuple(args, "((ii)(ii))(ii)", &left, &top, &right, &bottom, &h, &v); /* A rectangle and a point */ /* Possible Python call: f(((0, 0), (400, 300)), (10, 10)) */ } { Py_complex c; ok = PyArg_ParseTuple(args, "D:myfunction", &c); /* a complex, also providing a function name for errors */ /* Possible Python call: myfunction(1+2j) */ } 1.8. Keyword Parameters for Extension Functions =============================================== The "PyArg_ParseTupleAndKeywords()" function is declared as follows: int PyArg_ParseTupleAndKeywords(PyObject *arg, PyObject *kwdict, const char *format, char *kwlist[], ...); The *arg* and *format* parameters are identical to those of the "PyArg_ParseTuple()" function. The *kwdict* parameter is the dictionary of keywords received as the third parameter from the Python runtime. The *kwlist* parameter is a "NULL"-terminated list of strings which identify the parameters; the names are matched with the type information from *format* from left to right. On success, "PyArg_ParseTupleAndKeywords()" returns true, otherwise it returns false and raises an appropriate exception. Note: Nested tuples cannot be parsed when using keyword arguments! Keyword parameters passed in which are not present in the *kwlist* will cause "TypeError" to be raised. Here is an example module which uses keywords, based on an example by Geoff Philbrick (philbrick@hks.com): #define PY_SSIZE_T_CLEAN /* Make "s#" use Py_ssize_t rather than int. */ #include static PyObject * keywdarg_parrot(PyObject *self, PyObject *args, PyObject *keywds) { int voltage; const char *state = "a stiff"; const char *action = "voom"; const char *type = "Norwegian Blue"; static char *kwlist[] = {"voltage", "state", "action", "type", NULL}; if (!PyArg_ParseTupleAndKeywords(args, keywds, "i|sss", kwlist, &voltage, &state, &action, &type)) return NULL; printf("-- This parrot wouldn't %s if you put %i Volts through it.\n", action, voltage); printf("-- Lovely plumage, the %s -- It's %s!\n", type, state); Py_RETURN_NONE; } static PyMethodDef keywdarg_methods[] = { /* The cast of the function is necessary since PyCFunction values * only take two PyObject* parameters, and keywdarg_parrot() takes * three. */ {"parrot", (PyCFunction)(void(*)(void))keywdarg_parrot, METH_VARARGS | METH_KEYWORDS, "Print a lovely skit to standard output."}, {NULL, NULL, 0, NULL} /* sentinel */ }; static struct PyModuleDef keywdargmodule = { PyModuleDef_HEAD_INIT, "keywdarg", NULL, -1, keywdarg_methods }; PyMODINIT_FUNC PyInit_keywdarg(void) { return PyModule_Create(&keywdargmodule); } 1.9. Building Arbitrary Values ============================== This function is the counterpart to "PyArg_ParseTuple()". It is declared as follows: PyObject *Py_BuildValue(const char *format, ...); It recognizes a set of format units similar to the ones recognized by "PyArg_ParseTuple()", but the arguments (which are input to the function, not output) must not be pointers, just values. It returns a new Python object, suitable for returning from a C function called from Python. One difference with "PyArg_ParseTuple()": while the latter requires its first argument to be a tuple (since Python argument lists are always represented as tuples internally), "Py_BuildValue()" does not always build a tuple. It builds a tuple only if its format string contains two or more format units. If the format string is empty, it returns "None"; if it contains exactly one format unit, it returns whatever object is described by that format unit. To force it to return a tuple of size 0 or one, parenthesize the format string. Examples (to the left the call, to the right the resulting Python value): Py_BuildValue("") None Py_BuildValue("i", 123) 123 Py_BuildValue("iii", 123, 456, 789) (123, 456, 789) Py_BuildValue("s", "hello") 'hello' Py_BuildValue("y", "hello") b'hello' Py_BuildValue("ss", "hello", "world") ('hello', 'world') Py_BuildValue("s#", "hello", 4) 'hell' Py_BuildValue("y#", "hello", 4) b'hell' Py_BuildValue("()") () Py_BuildValue("(i)", 123) (123,) Py_BuildValue("(ii)", 123, 456) (123, 456) Py_BuildValue("(i,i)", 123, 456) (123, 456) Py_BuildValue("[i,i]", 123, 456) [123, 456] Py_BuildValue("{s:i,s:i}", "abc", 123, "def", 456) {'abc': 123, 'def': 456} Py_BuildValue("((ii)(ii)) (ii)", 1, 2, 3, 4, 5, 6) (((1, 2), (3, 4)), (5, 6)) 1.10. Reference Counts ====================== In languages like C or C++, the programmer is responsible for dynamic allocation and deallocation of memory on the heap. In C, this is done using the functions "malloc()" and "free()". In C++, the operators "new" and "delete" are used with essentially the same meaning and we’ll restrict the following discussion to the C case. Every block of memory allocated with "malloc()" should eventually be returned to the pool of available memory by exactly one call to "free()". It is important to call "free()" at the right time. If a block’s address is forgotten but "free()" is not called for it, the memory it occupies cannot be reused until the program terminates. This is called a *memory leak*. On the other hand, if a program calls "free()" for a block and then continues to use the block, it creates a conflict with re-use of the block through another "malloc()" call. This is called *using freed memory*. It has the same bad consequences as referencing uninitialized data — core dumps, wrong results, mysterious crashes. Common causes of memory leaks are unusual paths through the code. For instance, a function may allocate a block of memory, do some calculation, and then free the block again. Now a change in the requirements for the function may add a test to the calculation that detects an error condition and can return prematurely from the function. It’s easy to forget to free the allocated memory block when taking this premature exit, especially when it is added later to the code. Such leaks, once introduced, often go undetected for a long time: the error exit is taken only in a small fraction of all calls, and most modern machines have plenty of virtual memory, so the leak only becomes apparent in a long-running process that uses the leaking function frequently. Therefore, it’s important to prevent leaks from happening by having a coding convention or strategy that minimizes this kind of errors. Since Python makes heavy use of "malloc()" and "free()", it needs a strategy to avoid memory leaks as well as the use of freed memory. The chosen method is called *reference counting*. The principle is simple: every object contains a counter, which is incremented when a reference to the object is stored somewhere, and which is decremented when a reference to it is deleted. When the counter reaches zero, the last reference to the object has been deleted and the object is freed. An alternative strategy is called *automatic garbage collection*. (Sometimes, reference counting is also referred to as a garbage collection strategy, hence my use of “automatic” to distinguish the two.) The big advantage of automatic garbage collection is that the user doesn’t need to call "free()" explicitly. (Another claimed advantage is an improvement in speed or memory usage — this is no hard fact however.) The disadvantage is that for C, there is no truly portable automatic garbage collector, while reference counting can be implemented portably (as long as the functions "malloc()" and "free()" are available — which the C Standard guarantees). Maybe some day a sufficiently portable automatic garbage collector will be available for C. Until then, we’ll have to live with reference counts. While Python uses the traditional reference counting implementation, it also offers a cycle detector that works to detect reference cycles. This allows applications to not worry about creating direct or indirect circular references; these are the weakness of garbage collection implemented using only reference counting. Reference cycles consist of objects which contain (possibly indirect) references to themselves, so that each object in the cycle has a reference count which is non-zero. Typical reference counting implementations are not able to reclaim the memory belonging to any objects in a reference cycle, or referenced from the objects in the cycle, even though there are no further references to the cycle itself. The cycle detector is able to detect garbage cycles and can reclaim them. The "gc" module exposes a way to run the detector (the "collect()" function), as well as configuration interfaces and the ability to disable the detector at runtime. The cycle detector is considered an optional component; though it is included by default, it can be disabled at build time using the "--without-cycle-gc" option to the **configure** script on Unix platforms (including Mac OS X). If the cycle detector is disabled in this way, the "gc" module will not be available. 1.10.1. Reference Counting in Python ------------------------------------ There are two macros, "Py_INCREF(x)" and "Py_DECREF(x)", which handle the incrementing and decrementing of the reference count. "Py_DECREF()" also frees the object when the count reaches zero. For flexibility, it doesn’t call "free()" directly — rather, it makes a call through a function pointer in the object’s *type object*. For this purpose (and others), every object also contains a pointer to its type object. The big question now remains: when to use "Py_INCREF(x)" and "Py_DECREF(x)"? Let’s first introduce some terms. Nobody “owns” an object; however, you can *own a reference* to an object. An object’s reference count is now defined as the number of owned references to it. The owner of a reference is responsible for calling "Py_DECREF()" when the reference is no longer needed. Ownership of a reference can be transferred. There are three ways to dispose of an owned reference: pass it on, store it, or call "Py_DECREF()". Forgetting to dispose of an owned reference creates a memory leak. It is also possible to *borrow* [2] a reference to an object. The borrower of a reference should not call "Py_DECREF()". The borrower must not hold on to the object longer than the owner from which it was borrowed. Using a borrowed reference after the owner has disposed of it risks using freed memory and should be avoided completely [3]. The advantage of borrowing over owning a reference is that you don’t need to take care of disposing of the reference on all possible paths through the code — in other words, with a borrowed reference you don’t run the risk of leaking when a premature exit is taken. The disadvantage of borrowing over owning is that there are some subtle situations where in seemingly correct code a borrowed reference can be used after the owner from which it was borrowed has in fact disposed of it. A borrowed reference can be changed into an owned reference by calling "Py_INCREF()". This does not affect the status of the owner from which the reference was borrowed — it creates a new owned reference, and gives full owner responsibilities (the new owner must dispose of the reference properly, as well as the previous owner). 1.10.2. Ownership Rules ----------------------- Whenever an object reference is passed into or out of a function, it is part of the function’s interface specification whether ownership is transferred with the reference or not. Most functions that return a reference to an object pass on ownership with the reference. In particular, all functions whose function it is to create a new object, such as "PyLong_FromLong()" and "Py_BuildValue()", pass ownership to the receiver. Even if the object is not actually new, you still receive ownership of a new reference to that object. For instance, "PyLong_FromLong()" maintains a cache of popular values and can return a reference to a cached item. Many functions that extract objects from other objects also transfer ownership with the reference, for instance "PyObject_GetAttrString()". The picture is less clear, here, however, since a few common routines are exceptions: "PyTuple_GetItem()", "PyList_GetItem()", "PyDict_GetItem()", and "PyDict_GetItemString()" all return references that you borrow from the tuple, list or dictionary. The function "PyImport_AddModule()" also returns a borrowed reference, even though it may actually create the object it returns: this is possible because an owned reference to the object is stored in "sys.modules". When you pass an object reference into another function, in general, the function borrows the reference from you — if it needs to store it, it will use "Py_INCREF()" to become an independent owner. There are exactly two important exceptions to this rule: "PyTuple_SetItem()" and "PyList_SetItem()". These functions take over ownership of the item passed to them — even if they fail! (Note that "PyDict_SetItem()" and friends don’t take over ownership — they are “normal.”) When a C function is called from Python, it borrows references to its arguments from the caller. The caller owns a reference to the object, so the borrowed reference’s lifetime is guaranteed until the function returns. Only when such a borrowed reference must be stored or passed on, it must be turned into an owned reference by calling "Py_INCREF()". The object reference returned from a C function that is called from Python must be an owned reference — ownership is transferred from the function to its caller. 1.10.3. Thin Ice ---------------- There are a few situations where seemingly harmless use of a borrowed reference can lead to problems. These all have to do with implicit invocations of the interpreter, which can cause the owner of a reference to dispose of it. The first and most important case to know about is using "Py_DECREF()" on an unrelated object while borrowing a reference to a list item. For instance: void bug(PyObject *list) { PyObject *item = PyList_GetItem(list, 0); PyList_SetItem(list, 1, PyLong_FromLong(0L)); PyObject_Print(item, stdout, 0); /* BUG! */ } This function first borrows a reference to "list[0]", then replaces "list[1]" with the value "0", and finally prints the borrowed reference. Looks harmless, right? But it’s not! Let’s follow the control flow into "PyList_SetItem()". The list owns references to all its items, so when item 1 is replaced, it has to dispose of the original item 1. Now let’s suppose the original item 1 was an instance of a user-defined class, and let’s further suppose that the class defined a "__del__()" method. If this class instance has a reference count of 1, disposing of it will call its "__del__()" method. Since it is written in Python, the "__del__()" method can execute arbitrary Python code. Could it perhaps do something to invalidate the reference to "item" in "bug()"? You bet! Assuming that the list passed into "bug()" is accessible to the "__del__()" method, it could execute a statement to the effect of "del list[0]", and assuming this was the last reference to that object, it would free the memory associated with it, thereby invalidating "item". The solution, once you know the source of the problem, is easy: temporarily increment the reference count. The correct version of the function reads: void no_bug(PyObject *list) { PyObject *item = PyList_GetItem(list, 0); Py_INCREF(item); PyList_SetItem(list, 1, PyLong_FromLong(0L)); PyObject_Print(item, stdout, 0); Py_DECREF(item); } This is a true story. An older version of Python contained variants of this bug and someone spent a considerable amount of time in a C debugger to figure out why his "__del__()" methods would fail… The second case of problems with a borrowed reference is a variant involving threads. Normally, multiple threads in the Python interpreter can’t get in each other’s way, because there is a global lock protecting Python’s entire object space. However, it is possible to temporarily release this lock using the macro "Py_BEGIN_ALLOW_THREADS", and to re-acquire it using "Py_END_ALLOW_THREADS". This is common around blocking I/O calls, to let other threads use the processor while waiting for the I/O to complete. Obviously, the following function has the same problem as the previous one: void bug(PyObject *list) { PyObject *item = PyList_GetItem(list, 0); Py_BEGIN_ALLOW_THREADS ...some blocking I/O call... Py_END_ALLOW_THREADS PyObject_Print(item, stdout, 0); /* BUG! */ } 1.10.4. NULL Pointers --------------------- In general, functions that take object references as arguments do not expect you to pass them "NULL" pointers, and will dump core (or cause later core dumps) if you do so. Functions that return object references generally return "NULL" only to indicate that an exception occurred. The reason for not testing for "NULL" arguments is that functions often pass the objects they receive on to other function — if each function were to test for "NULL", there would be a lot of redundant tests and the code would run more slowly. It is better to test for "NULL" only at the “source:” when a pointer that may be "NULL" is received, for example, from "malloc()" or from a function that may raise an exception. The macros "Py_INCREF()" and "Py_DECREF()" do not check for "NULL" pointers — however, their variants "Py_XINCREF()" and "Py_XDECREF()" do. The macros for checking for a particular object type ("Pytype_Check()") don’t check for "NULL" pointers — again, there is much code that calls several of these in a row to test an object against various different expected types, and this would generate redundant tests. There are no variants with "NULL" checking. The C function calling mechanism guarantees that the argument list passed to C functions ("args" in the examples) is never "NULL" — in fact it guarantees that it is always a tuple [4]. It is a severe error to ever let a "NULL" pointer “escape” to the Python user. 1.11. Writing Extensions in C++ =============================== It is possible to write extension modules in C++. Some restrictions apply. If the main program (the Python interpreter) is compiled and linked by the C compiler, global or static objects with constructors cannot be used. This is not a problem if the main program is linked by the C++ compiler. Functions that will be called by the Python interpreter (in particular, module initialization functions) have to be declared using "extern "C"". It is unnecessary to enclose the Python header files in "extern "C" {...}" — they use this form already if the symbol "__cplusplus" is defined (all recent C++ compilers define this symbol). 1.12. Providing a C API for an Extension Module =============================================== Many extension modules just provide new functions and types to be used from Python, but sometimes the code in an extension module can be useful for other extension modules. For example, an extension module could implement a type “collection” which works like lists without order. Just like the standard Python list type has a C API which permits extension modules to create and manipulate lists, this new collection type should have a set of C functions for direct manipulation from other extension modules. At first sight this seems easy: just write the functions (without declaring them "static", of course), provide an appropriate header file, and document the C API. And in fact this would work if all extension modules were always linked statically with the Python interpreter. When modules are used as shared libraries, however, the symbols defined in one module may not be visible to another module. The details of visibility depend on the operating system; some systems use one global namespace for the Python interpreter and all extension modules (Windows, for example), whereas others require an explicit list of imported symbols at module link time (AIX is one example), or offer a choice of different strategies (most Unices). And even if symbols are globally visible, the module whose functions one wishes to call might not have been loaded yet! Portability therefore requires not to make any assumptions about symbol visibility. This means that all symbols in extension modules should be declared "static", except for the module’s initialization function, in order to avoid name clashes with other extension modules (as discussed in section The Module’s Method Table and Initialization Function). And it means that symbols that *should* be accessible from other extension modules must be exported in a different way. Python provides a special mechanism to pass C-level information (pointers) from one extension module to another one: Capsules. A Capsule is a Python data type which stores a pointer ("void *"). Capsules can only be created and accessed via their C API, but they can be passed around like any other Python object. In particular, they can be assigned to a name in an extension module’s namespace. Other extension modules can then import this module, retrieve the value of this name, and then retrieve the pointer from the Capsule. There are many ways in which Capsules can be used to export the C API of an extension module. Each function could get its own Capsule, or all C API pointers could be stored in an array whose address is published in a Capsule. And the various tasks of storing and retrieving the pointers can be distributed in different ways between the module providing the code and the client modules. Whichever method you choose, it’s important to name your Capsules properly. The function "PyCapsule_New()" takes a name parameter ("const char *"); you’re permitted to pass in a "NULL" name, but we strongly encourage you to specify a name. Properly named Capsules provide a degree of runtime type-safety; there is no feasible way to tell one unnamed Capsule from another. In particular, Capsules used to expose C APIs should be given a name following this convention: modulename.attributename The convenience function "PyCapsule_Import()" makes it easy to load a C API provided via a Capsule, but only if the Capsule’s name matches this convention. This behavior gives C API users a high degree of certainty that the Capsule they load contains the correct C API. The following example demonstrates an approach that puts most of the burden on the writer of the exporting module, which is appropriate for commonly used library modules. It stores all C API pointers (just one in the example!) in an array of "void" pointers which becomes the value of a Capsule. The header file corresponding to the module provides a macro that takes care of importing the module and retrieving its C API pointers; client modules only have to call this macro before accessing the C API. The exporting module is a modification of the "spam" module from section A Simple Example. The function "spam.system()" does not call the C library function "system()" directly, but a function "PySpam_System()", which would of course do something more complicated in reality (such as adding “spam” to every command). This function "PySpam_System()" is also exported to other extension modules. The function "PySpam_System()" is a plain C function, declared "static" like everything else: static int PySpam_System(const char *command) { return system(command); } The function "spam_system()" is modified in a trivial way: static PyObject * spam_system(PyObject *self, PyObject *args) { const char *command; int sts; if (!PyArg_ParseTuple(args, "s", &command)) return NULL; sts = PySpam_System(command); return PyLong_FromLong(sts); } In the beginning of the module, right after the line #include two more lines must be added: #define SPAM_MODULE #include "spammodule.h" The "#define" is used to tell the header file that it is being included in the exporting module, not a client module. Finally, the module’s initialization function must take care of initializing the C API pointer array: PyMODINIT_FUNC PyInit_spam(void) { PyObject *m; static void *PySpam_API[PySpam_API_pointers]; PyObject *c_api_object; m = PyModule_Create(&spammodule); if (m == NULL) return NULL; /* Initialize the C API pointer array */ PySpam_API[PySpam_System_NUM] = (void *)PySpam_System; /* Create a Capsule containing the API pointer array's address */ c_api_object = PyCapsule_New((void *)PySpam_API, "spam._C_API", NULL); if (PyModule_AddObject(m, "_C_API", c_api_object) < 0) { Py_XDECREF(c_api_object); Py_DECREF(m); return NULL; } return m; } Note that "PySpam_API" is declared "static"; otherwise the pointer array would disappear when "PyInit_spam()" terminates! The bulk of the work is in the header file "spammodule.h", which looks like this: #ifndef Py_SPAMMODULE_H #define Py_SPAMMODULE_H #ifdef __cplusplus extern "C" { #endif /* Header file for spammodule */ /* C API functions */ #define PySpam_System_NUM 0 #define PySpam_System_RETURN int #define PySpam_System_PROTO (const char *command) /* Total number of C API pointers */ #define PySpam_API_pointers 1 #ifdef SPAM_MODULE /* This section is used when compiling spammodule.c */ static PySpam_System_RETURN PySpam_System PySpam_System_PROTO; #else /* This section is used in modules that use spammodule's API */ static void **PySpam_API; #define PySpam_System \ (*(PySpam_System_RETURN (*)PySpam_System_PROTO) PySpam_API[PySpam_System_NUM]) /* Return -1 on error, 0 on success. * PyCapsule_Import will set an exception if there's an error. */ static int import_spam(void) { PySpam_API = (void **)PyCapsule_Import("spam._C_API", 0); return (PySpam_API != NULL) ? 0 : -1; } #endif #ifdef __cplusplus } #endif #endif /* !defined(Py_SPAMMODULE_H) */ All that a client module must do in order to have access to the function "PySpam_System()" is to call the function (or rather macro) "import_spam()" in its initialization function: PyMODINIT_FUNC PyInit_client(void) { PyObject *m; m = PyModule_Create(&clientmodule); if (m == NULL) return NULL; if (import_spam() < 0) return NULL; /* additional initialization can happen here */ return m; } The main disadvantage of this approach is that the file "spammodule.h" is rather complicated. However, the basic structure is the same for each function that is exported, so it has to be learned only once. Finally it should be mentioned that Capsules offer additional functionality, which is especially useful for memory allocation and deallocation of the pointer stored in a Capsule. The details are described in the Python/C API Reference Manual in the section Capsules and in the implementation of Capsules (files "Include/pycapsule.h" and "Objects/pycapsule.c" in the Python source code distribution). -[ Footnotes ]- [1] An interface for this function already exists in the standard module "os" — it was chosen as a simple and straightforward example. [2] The metaphor of “borrowing” a reference is not completely correct: the owner still has a copy of the reference. [3] Checking that the reference count is at least 1 **does not work** — the reference count itself could be in freed memory and may thus be reused for another object! [4] These guarantees don’t hold when you use the “old” style calling convention — this is still found in much existing code. Extending and Embedding the Python Interpreter ********************************************** This document describes how to write modules in C or C++ to extend the Python interpreter with new modules. Those modules can not only define new functions but also new object types and their methods. The document also describes how to embed the Python interpreter in another application, for use as an extension language. Finally, it shows how to compile and link extension modules so that they can be loaded dynamically (at run time) into the interpreter, if the underlying operating system supports this feature. This document assumes basic knowledge about Python. For an informal introduction to the language, see The Python Tutorial. The Python Language Reference gives a more formal definition of the language. The Python Standard Library documents the existing object types, functions and modules (both built-in and written in Python) that give the language its wide application range. For a detailed description of the whole Python/C API, see the separate Python/C API Reference Manual. Recommended third party tools ============================= This guide only covers the basic tools for creating extensions provided as part of this version of CPython. Third party tools like Cython, cffi, SWIG and Numba offer both simpler and more sophisticated approaches to creating C and C++ extensions for Python. See also: Python Packaging User Guide: Binary Extensions The Python Packaging User Guide not only covers several available tools that simplify the creation of binary extensions, but also discusses the various reasons why creating an extension module may be desirable in the first place. Creating extensions without third party tools ============================================= This section of the guide covers creating C and C++ extensions without assistance from third party tools. It is intended primarily for creators of those tools, rather than being a recommended way to create your own C extensions. * 1. Extending Python with C or C++ * 1.1. A Simple Example * 1.2. Intermezzo: Errors and Exceptions * 1.3. Back to the Example * 1.4. The Module’s Method Table and Initialization Function * 1.5. Compilation and Linkage * 1.6. Calling Python Functions from C * 1.7. Extracting Parameters in Extension Functions * 1.8. Keyword Parameters for Extension Functions * 1.9. Building Arbitrary Values * 1.10. Reference Counts * 1.11. Writing Extensions in C++ * 1.12. Providing a C API for an Extension Module * 2. Defining Extension Types: Tutorial * 2.1. The Basics * 2.2. Adding data and methods to the Basic example * 2.3. Providing finer control over data attributes * 2.4. Supporting cyclic garbage collection * 2.5. Subclassing other types * 3. Defining Extension Types: Assorted Topics * 3.1. Finalization and De-allocation * 3.2. Object Presentation * 3.3. Attribute Management * 3.4. Object Comparison * 3.5. Abstract Protocol Support * 3.6. Weak Reference Support * 3.7. More Suggestions * 4. Building C and C++ Extensions * 4.1. Building C and C++ Extensions with distutils * 4.2. Distributing your extension modules * 5. Building C and C++ Extensions on Windows * 5.1. A Cookbook Approach * 5.2. Differences Between Unix and Windows * 5.3. Using DLLs in Practice Embedding the CPython runtime in a larger application ===================================================== Sometimes, rather than creating an extension that runs inside the Python interpreter as the main application, it is desirable to instead embed the CPython runtime inside a larger application. This section covers some of the details involved in doing that successfully. * 1. Embedding Python in Another Application * 1.1. Very High Level Embedding * 1.2. Beyond Very High Level Embedding: An overview * 1.3. Pure Embedding * 1.4. Extending Embedded Python * 1.5. Embedding Python in C++ * 1.6. Compiling and Linking under Unix-like systems 3. Defining Extension Types: Assorted Topics ******************************************** This section aims to give a quick fly-by on the various type methods you can implement and what they do. Here is the definition of "PyTypeObject", with some fields only used in debug builds omitted: typedef struct _typeobject { PyObject_VAR_HEAD const char *tp_name; /* For printing, in format "." */ Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */ /* Methods to implement standard operations */ destructor tp_dealloc; Py_ssize_t tp_vectorcall_offset; getattrfunc tp_getattr; setattrfunc tp_setattr; PyAsyncMethods *tp_as_async; /* formerly known as tp_compare (Python 2) or tp_reserved (Python 3) */ reprfunc tp_repr; /* Method suites for standard classes */ PyNumberMethods *tp_as_number; PySequenceMethods *tp_as_sequence; PyMappingMethods *tp_as_mapping; /* More standard operations (here for binary compatibility) */ hashfunc tp_hash; ternaryfunc tp_call; reprfunc tp_str; getattrofunc tp_getattro; setattrofunc tp_setattro; /* Functions to access object as input/output buffer */ PyBufferProcs *tp_as_buffer; /* Flags to define presence of optional/expanded features */ unsigned long tp_flags; const char *tp_doc; /* Documentation string */ /* call function for all accessible objects */ traverseproc tp_traverse; /* delete references to contained objects */ inquiry tp_clear; /* rich comparisons */ richcmpfunc tp_richcompare; /* weak reference enabler */ Py_ssize_t tp_weaklistoffset; /* Iterators */ getiterfunc tp_iter; iternextfunc tp_iternext; /* Attribute descriptor and subclassing stuff */ struct PyMethodDef *tp_methods; struct PyMemberDef *tp_members; struct PyGetSetDef *tp_getset; struct _typeobject *tp_base; PyObject *tp_dict; descrgetfunc tp_descr_get; descrsetfunc tp_descr_set; Py_ssize_t tp_dictoffset; initproc tp_init; allocfunc tp_alloc; newfunc tp_new; freefunc tp_free; /* Low-level free-memory routine */ inquiry tp_is_gc; /* For PyObject_IS_GC */ PyObject *tp_bases; PyObject *tp_mro; /* method resolution order */ PyObject *tp_cache; PyObject *tp_subclasses; PyObject *tp_weaklist; destructor tp_del; /* Type attribute cache version tag. Added in version 2.6 */ unsigned int tp_version_tag; destructor tp_finalize; } PyTypeObject; Now that’s a *lot* of methods. Don’t worry too much though – if you have a type you want to define, the chances are very good that you will only implement a handful of these. As you probably expect by now, we’re going to go over this and give more information about the various handlers. We won’t go in the order they are defined in the structure, because there is a lot of historical baggage that impacts the ordering of the fields. It’s often easiest to find an example that includes the fields you need and then change the values to suit your new type. const char *tp_name; /* For printing */ The name of the type – as mentioned in the previous chapter, this will appear in various places, almost entirely for diagnostic purposes. Try to choose something that will be helpful in such a situation! Py_ssize_t tp_basicsize, tp_itemsize; /* For allocation */ These fields tell the runtime how much memory to allocate when new objects of this type are created. Python has some built-in support for variable length structures (think: strings, tuples) which is where the "tp_itemsize" field comes in. This will be dealt with later. const char *tp_doc; Here you can put a string (or its address) that you want returned when the Python script references "obj.__doc__" to retrieve the doc string. Now we come to the basic type methods – the ones most extension types will implement. 3.1. Finalization and De-allocation =================================== destructor tp_dealloc; This function is called when the reference count of the instance of your type is reduced to zero and the Python interpreter wants to reclaim it. If your type has memory to free or other clean-up to perform, you can put it here. The object itself needs to be freed here as well. Here is an example of this function: static void newdatatype_dealloc(newdatatypeobject *obj) { free(obj->obj_UnderlyingDatatypePtr); Py_TYPE(obj)->tp_free((PyObject *)obj); } If your type supports garbage collection, the destructor should call "PyObject_GC_UnTrack()" before clearing any member fields: static void newdatatype_dealloc(newdatatypeobject *obj) { PyObject_GC_UnTrack(obj); Py_CLEAR(obj->other_obj); ... Py_TYPE(obj)->tp_free((PyObject *)obj); } One important requirement of the deallocator function is that it leaves any pending exceptions alone. This is important since deallocators are frequently called as the interpreter unwinds the Python stack; when the stack is unwound due to an exception (rather than normal returns), nothing is done to protect the deallocators from seeing that an exception has already been set. Any actions which a deallocator performs which may cause additional Python code to be executed may detect that an exception has been set. This can lead to misleading errors from the interpreter. The proper way to protect against this is to save a pending exception before performing the unsafe action, and restoring it when done. This can be done using the "PyErr_Fetch()" and "PyErr_Restore()" functions: static void my_dealloc(PyObject *obj) { MyObject *self = (MyObject *) obj; PyObject *cbresult; if (self->my_callback != NULL) { PyObject *err_type, *err_value, *err_traceback; /* This saves the current exception state */ PyErr_Fetch(&err_type, &err_value, &err_traceback); cbresult = PyObject_CallNoArgs(self->my_callback); if (cbresult == NULL) PyErr_WriteUnraisable(self->my_callback); else Py_DECREF(cbresult); /* This restores the saved exception state */ PyErr_Restore(err_type, err_value, err_traceback); Py_DECREF(self->my_callback); } Py_TYPE(obj)->tp_free((PyObject*)self); } Note: There are limitations to what you can safely do in a deallocator function. First, if your type supports garbage collection (using "tp_traverse" and/or "tp_clear"), some of the object’s members can have been cleared or finalized by the time "tp_dealloc" is called. Second, in "tp_dealloc", your object is in an unstable state: its reference count is equal to zero. Any call to a non-trivial object or API (as in the example above) might end up calling "tp_dealloc" again, causing a double free and a crash.Starting with Python 3.4, it is recommended not to put any complex finalization code in "tp_dealloc", and instead use the new "tp_finalize" type method. See also: **PEP 442** explains the new finalization scheme. 3.2. Object Presentation ======================== In Python, there are two ways to generate a textual representation of an object: the "repr()" function, and the "str()" function. (The "print()" function just calls "str()".) These handlers are both optional. reprfunc tp_repr; reprfunc tp_str; The "tp_repr" handler should return a string object containing a representation of the instance for which it is called. Here is a simple example: static PyObject * newdatatype_repr(newdatatypeobject * obj) { return PyUnicode_FromFormat("Repr-ified_newdatatype{{size:%d}}", obj->obj_UnderlyingDatatypePtr->size); } If no "tp_repr" handler is specified, the interpreter will supply a representation that uses the type’s "tp_name" and a uniquely- identifying value for the object. The "tp_str" handler is to "str()" what the "tp_repr" handler described above is to "repr()"; that is, it is called when Python code calls "str()" on an instance of your object. Its implementation is very similar to the "tp_repr" function, but the resulting string is intended for human consumption. If "tp_str" is not specified, the "tp_repr" handler is used instead. Here is a simple example: static PyObject * newdatatype_str(newdatatypeobject * obj) { return PyUnicode_FromFormat("Stringified_newdatatype{{size:%d}}", obj->obj_UnderlyingDatatypePtr->size); } 3.3. Attribute Management ========================= For every object which can support attributes, the corresponding type must provide the functions that control how the attributes are resolved. There needs to be a function which can retrieve attributes (if any are defined), and another to set attributes (if setting attributes is allowed). Removing an attribute is a special case, for which the new value passed to the handler is "NULL". Python supports two pairs of attribute handlers; a type that supports attributes only needs to implement the functions for one pair. The difference is that one pair takes the name of the attribute as a "char*", while the other accepts a "PyObject*". Each type can use whichever pair makes more sense for the implementation’s convenience. getattrfunc tp_getattr; /* char * version */ setattrfunc tp_setattr; /* ... */ getattrofunc tp_getattro; /* PyObject * version */ setattrofunc tp_setattro; If accessing attributes of an object is always a simple operation (this will be explained shortly), there are generic implementations which can be used to provide the "PyObject*" version of the attribute management functions. The actual need for type-specific attribute handlers almost completely disappeared starting with Python 2.2, though there are many examples which have not been updated to use some of the new generic mechanism that is available. 3.3.1. Generic Attribute Management ----------------------------------- Most extension types only use *simple* attributes. So, what makes the attributes simple? There are only a couple of conditions that must be met: 1. The name of the attributes must be known when "PyType_Ready()" is called. 2. No special processing is needed to record that an attribute was looked up or set, nor do actions need to be taken based on the value. Note that this list does not place any restrictions on the values of the attributes, when the values are computed, or how relevant data is stored. When "PyType_Ready()" is called, it uses three tables referenced by the type object to create *descriptor*s which are placed in the dictionary of the type object. Each descriptor controls access to one attribute of the instance object. Each of the tables is optional; if all three are "NULL", instances of the type will only have attributes that are inherited from their base type, and should leave the "tp_getattro" and "tp_setattro" fields "NULL" as well, allowing the base type to handle attributes. The tables are declared as three fields of the type object: struct PyMethodDef *tp_methods; struct PyMemberDef *tp_members; struct PyGetSetDef *tp_getset; If "tp_methods" is not "NULL", it must refer to an array of "PyMethodDef" structures. Each entry in the table is an instance of this structure: typedef struct PyMethodDef { const char *ml_name; /* method name */ PyCFunction ml_meth; /* implementation function */ int ml_flags; /* flags */ const char *ml_doc; /* docstring */ } PyMethodDef; One entry should be defined for each method provided by the type; no entries are needed for methods inherited from a base type. One additional entry is needed at the end; it is a sentinel that marks the end of the array. The "ml_name" field of the sentinel must be "NULL". The second table is used to define attributes which map directly to data stored in the instance. A variety of primitive C types are supported, and access may be read-only or read-write. The structures in the table are defined as: typedef struct PyMemberDef { const char *name; int type; int offset; int flags; const char *doc; } PyMemberDef; For each entry in the table, a *descriptor* will be constructed and added to the type which will be able to extract a value from the instance structure. The "type" field should contain one of the type codes defined in the "structmember.h" header; the value will be used to determine how to convert Python values to and from C values. The "flags" field is used to store flags which control how the attribute can be accessed. The following flag constants are defined in "structmember.h"; they may be combined using bitwise-OR. +-----------------------------+------------------------------------------------+ | Constant | Meaning | |=============================|================================================| | "READONLY" | Never writable. | +-----------------------------+------------------------------------------------+ | "READ_RESTRICTED" | Not readable in restricted mode. | +-----------------------------+------------------------------------------------+ | "WRITE_RESTRICTED" | Not writable in restricted mode. | +-----------------------------+------------------------------------------------+ | "RESTRICTED" | Not readable or writable in restricted mode. | +-----------------------------+------------------------------------------------+ An interesting advantage of using the "tp_members" table to build descriptors that are used at runtime is that any attribute defined this way can have an associated doc string simply by providing the text in the table. An application can use the introspection API to retrieve the descriptor from the class object, and get the doc string using its "__doc__" attribute. As with the "tp_methods" table, a sentinel entry with a "name" value of "NULL" is required. 3.3.2. Type-specific Attribute Management ----------------------------------------- For simplicity, only the "char*" version will be demonstrated here; the type of the name parameter is the only difference between the "char*" and "PyObject*" flavors of the interface. This example effectively does the same thing as the generic example above, but does not use the generic support added in Python 2.2. It explains how the handler functions are called, so that if you do need to extend their functionality, you’ll understand what needs to be done. The "tp_getattr" handler is called when the object requires an attribute look-up. It is called in the same situations where the "__getattr__()" method of a class would be called. Here is an example: static PyObject * newdatatype_getattr(newdatatypeobject *obj, char *name) { if (strcmp(name, "data") == 0) { return PyLong_FromLong(obj->data); } PyErr_Format(PyExc_AttributeError, "'%.50s' object has no attribute '%.400s'", tp->tp_name, name); return NULL; } The "tp_setattr" handler is called when the "__setattr__()" or "__delattr__()" method of a class instance would be called. When an attribute should be deleted, the third parameter will be "NULL". Here is an example that simply raises an exception; if this were really all you wanted, the "tp_setattr" handler should be set to "NULL". static int newdatatype_setattr(newdatatypeobject *obj, char *name, PyObject *v) { PyErr_Format(PyExc_RuntimeError, "Read-only attribute: %s", name); return -1; } 3.4. Object Comparison ====================== richcmpfunc tp_richcompare; The "tp_richcompare" handler is called when comparisons are needed. It is analogous to the rich comparison methods, like "__lt__()", and also called by "PyObject_RichCompare()" and "PyObject_RichCompareBool()". This function is called with two Python objects and the operator as arguments, where the operator is one of "Py_EQ", "Py_NE", "Py_LE", "Py_GE", "Py_LT" or "Py_GT". It should compare the two objects with respect to the specified operator and return "Py_True" or "Py_False" if the comparison is successful, "Py_NotImplemented" to indicate that comparison is not implemented and the other object’s comparison method should be tried, or "NULL" if an exception was set. Here is a sample implementation, for a datatype that is considered equal if the size of an internal pointer is equal: static PyObject * newdatatype_richcmp(PyObject *obj1, PyObject *obj2, int op) { PyObject *result; int c, size1, size2; /* code to make sure that both arguments are of type newdatatype omitted */ size1 = obj1->obj_UnderlyingDatatypePtr->size; size2 = obj2->obj_UnderlyingDatatypePtr->size; switch (op) { case Py_LT: c = size1 < size2; break; case Py_LE: c = size1 <= size2; break; case Py_EQ: c = size1 == size2; break; case Py_NE: c = size1 != size2; break; case Py_GT: c = size1 > size2; break; case Py_GE: c = size1 >= size2; break; } result = c ? Py_True : Py_False; Py_INCREF(result); return result; } 3.5. Abstract Protocol Support ============================== Python supports a variety of *abstract* ‘protocols;’ the specific interfaces provided to use these interfaces are documented in Abstract Objects Layer. A number of these abstract interfaces were defined early in the development of the Python implementation. In particular, the number, mapping, and sequence protocols have been part of Python since the beginning. Other protocols have been added over time. For protocols which depend on several handler routines from the type implementation, the older protocols have been defined as optional blocks of handlers referenced by the type object. For newer protocols there are additional slots in the main type object, with a flag bit being set to indicate that the slots are present and should be checked by the interpreter. (The flag bit does not indicate that the slot values are non-"NULL". The flag may be set to indicate the presence of a slot, but a slot may still be unfilled.) PyNumberMethods *tp_as_number; PySequenceMethods *tp_as_sequence; PyMappingMethods *tp_as_mapping; If you wish your object to be able to act like a number, a sequence, or a mapping object, then you place the address of a structure that implements the C type "PyNumberMethods", "PySequenceMethods", or "PyMappingMethods", respectively. It is up to you to fill in this structure with appropriate values. You can find examples of the use of each of these in the "Objects" directory of the Python source distribution. hashfunc tp_hash; This function, if you choose to provide it, should return a hash number for an instance of your data type. Here is a simple example: static Py_hash_t newdatatype_hash(newdatatypeobject *obj) { Py_hash_t result; result = obj->some_size + 32767 * obj->some_number; if (result == -1) result = -2; return result; } "Py_hash_t" is a signed integer type with a platform-varying width. Returning "-1" from "tp_hash" indicates an error, which is why you should be careful to avoid returning it when hash computation is successful, as seen above. ternaryfunc tp_call; This function is called when an instance of your data type is “called”, for example, if "obj1" is an instance of your data type and the Python script contains "obj1('hello')", the "tp_call" handler is invoked. This function takes three arguments: 1. *self* is the instance of the data type which is the subject of the call. If the call is "obj1('hello')", then *self* is "obj1". 2. *args* is a tuple containing the arguments to the call. You can use "PyArg_ParseTuple()" to extract the arguments. 3. *kwds* is a dictionary of keyword arguments that were passed. If this is non-"NULL" and you support keyword arguments, use "PyArg_ParseTupleAndKeywords()" to extract the arguments. If you do not want to support keyword arguments and this is non-"NULL", raise a "TypeError" with a message saying that keyword arguments are not supported. Here is a toy "tp_call" implementation: static PyObject * newdatatype_call(newdatatypeobject *self, PyObject *args, PyObject *kwds) { PyObject *result; const char *arg1; const char *arg2; const char *arg3; if (!PyArg_ParseTuple(args, "sss:call", &arg1, &arg2, &arg3)) { return NULL; } result = PyUnicode_FromFormat( "Returning -- value: [%d] arg1: [%s] arg2: [%s] arg3: [%s]\n", obj->obj_UnderlyingDatatypePtr->size, arg1, arg2, arg3); return result; } /* Iterators */ getiterfunc tp_iter; iternextfunc tp_iternext; These functions provide support for the iterator protocol. Both handlers take exactly one parameter, the instance for which they are being called, and return a new reference. In the case of an error, they should set an exception and return "NULL". "tp_iter" corresponds to the Python "__iter__()" method, while "tp_iternext" corresponds to the Python "__next__()" method. Any *iterable* object must implement the "tp_iter" handler, which must return an *iterator* object. Here the same guidelines apply as for Python classes: * For collections (such as lists and tuples) which can support multiple independent iterators, a new iterator should be created and returned by each call to "tp_iter". * Objects which can only be iterated over once (usually due to side effects of iteration, such as file objects) can implement "tp_iter" by returning a new reference to themselves – and should also therefore implement the "tp_iternext" handler. Any *iterator* object should implement both "tp_iter" and "tp_iternext". An iterator’s "tp_iter" handler should return a new reference to the iterator. Its "tp_iternext" handler should return a new reference to the next object in the iteration, if there is one. If the iteration has reached the end, "tp_iternext" may return "NULL" without setting an exception, or it may set "StopIteration" *in addition* to returning "NULL"; avoiding the exception can yield slightly better performance. If an actual error occurs, "tp_iternext" should always set an exception and return "NULL". 3.6. Weak Reference Support =========================== One of the goals of Python’s weak reference implementation is to allow any type to participate in the weak reference mechanism without incurring the overhead on performance-critical objects (such as numbers). See also: Documentation for the "weakref" module. For an object to be weakly referencable, the extension type must do two things: 1. Include a "PyObject*" field in the C object structure dedicated to the weak reference mechanism. The object’s constructor should leave it "NULL" (which is automatic when using the default "tp_alloc"). 2. Set the "tp_weaklistoffset" type member to the offset of the aforementioned field in the C object structure, so that the interpreter knows how to access and modify that field. Concretely, here is how a trivial object structure would be augmented with the required field: typedef struct { PyObject_HEAD PyObject *weakreflist; /* List of weak references */ } TrivialObject; And the corresponding member in the statically-declared type object: static PyTypeObject TrivialType = { PyVarObject_HEAD_INIT(NULL, 0) /* ... other members omitted for brevity ... */ .tp_weaklistoffset = offsetof(TrivialObject, weakreflist), }; The only further addition is that "tp_dealloc" needs to clear any weak references (by calling "PyObject_ClearWeakRefs()") if the field is non-"NULL": static void Trivial_dealloc(TrivialObject *self) { /* Clear weakrefs first before calling any destructors */ if (self->weakreflist != NULL) PyObject_ClearWeakRefs((PyObject *) self); /* ... remainder of destruction code omitted for brevity ... */ Py_TYPE(self)->tp_free((PyObject *) self); } 3.7. More Suggestions ===================== In order to learn how to implement any specific method for your new data type, get the *CPython* source code. Go to the "Objects" directory, then search the C source files for "tp_" plus the function you want (for example, "tp_richcompare"). You will find examples of the function you want to implement. When you need to verify that an object is a concrete instance of the type you are implementing, use the "PyObject_TypeCheck()" function. A sample of its use might be something like the following: if (!PyObject_TypeCheck(some_object, &MyType)) { PyErr_SetString(PyExc_TypeError, "arg #1 not a mything"); return NULL; } See also: Download CPython source releases. https://www.python.org/downloads/source/ The CPython project on GitHub, where the CPython source code is developed. https://github.com/python/cpython 2. Defining Extension Types: Tutorial ************************************* Python allows the writer of a C extension module to define new types that can be manipulated from Python code, much like the built-in "str" and "list" types. The code for all extension types follows a pattern, but there are some details that you need to understand before you can get started. This document is a gentle introduction to the topic. 2.1. The Basics =============== The *CPython* runtime sees all Python objects as variables of type "PyObject*", which serves as a “base type” for all Python objects. The "PyObject" structure itself only contains the object’s *reference count* and a pointer to the object’s “type object”. This is where the action is; the type object determines which (C) functions get called by the interpreter when, for instance, an attribute gets looked up on an object, a method called, or it is multiplied by another object. These C functions are called “type methods”. So, if you want to define a new extension type, you need to create a new type object. This sort of thing can only be explained by example, so here’s a minimal, but complete, module that defines a new type named "Custom" inside a C extension module "custom": Note: What we’re showing here is the traditional way of defining *static* extension types. It should be adequate for most uses. The C API also allows defining heap-allocated extension types using the "PyType_FromSpec()" function, which isn’t covered in this tutorial. #define PY_SSIZE_T_CLEAN #include typedef struct { PyObject_HEAD /* Type-specific fields go here. */ } CustomObject; static PyTypeObject CustomType = { PyVarObject_HEAD_INIT(NULL, 0) .tp_name = "custom.Custom", .tp_doc = PyDoc_STR("Custom objects"), .tp_basicsize = sizeof(CustomObject), .tp_itemsize = 0, .tp_flags = Py_TPFLAGS_DEFAULT, .tp_new = PyType_GenericNew, }; static PyModuleDef custommodule = { PyModuleDef_HEAD_INIT, .m_name = "custom", .m_doc = "Example module that creates an extension type.", .m_size = -1, }; PyMODINIT_FUNC PyInit_custom(void) { PyObject *m; if (PyType_Ready(&CustomType) < 0) return NULL; m = PyModule_Create(&custommodule); if (m == NULL) return NULL; Py_INCREF(&CustomType); if (PyModule_AddObject(m, "Custom", (PyObject *) &CustomType) < 0) { Py_DECREF(&CustomType); Py_DECREF(m); return NULL; } return m; } Now that’s quite a bit to take in at once, but hopefully bits will seem familiar from the previous chapter. This file defines three things: 1. What a "Custom" **object** contains: this is the "CustomObject" struct, which is allocated once for each "Custom" instance. 2. How the "Custom" **type** behaves: this is the "CustomType" struct, which defines a set of flags and function pointers that the interpreter inspects when specific operations are requested. 3. How to initialize the "custom" module: this is the "PyInit_custom" function and the associated "custommodule" struct. The first bit is: typedef struct { PyObject_HEAD } CustomObject; This is what a Custom object will contain. "PyObject_HEAD" is mandatory at the start of each object struct and defines a field called "ob_base" of type "PyObject", containing a pointer to a type object and a reference count (these can be accessed using the macros "Py_TYPE" and "Py_REFCNT" respectively). The reason for the macro is to abstract away the layout and to enable additional fields in debug builds. Note: There is no semicolon above after the "PyObject_HEAD" macro. Be wary of adding one by accident: some compilers will complain. Of course, objects generally store additional data besides the standard "PyObject_HEAD" boilerplate; for example, here is the definition for standard Python floats: typedef struct { PyObject_HEAD double ob_fval; } PyFloatObject; The second bit is the definition of the type object. static PyTypeObject CustomType = { PyVarObject_HEAD_INIT(NULL, 0) .tp_name = "custom.Custom", .tp_doc = PyDoc_STR("Custom objects"), .tp_basicsize = sizeof(CustomObject), .tp_itemsize = 0, .tp_flags = Py_TPFLAGS_DEFAULT, .tp_new = PyType_GenericNew, }; Note: We recommend using C99-style designated initializers as above, to avoid listing all the "PyTypeObject" fields that you don’t care about and also to avoid caring about the fields’ declaration order. The actual definition of "PyTypeObject" in "object.h" has many more fields than the definition above. The remaining fields will be filled with zeros by the C compiler, and it’s common practice to not specify them explicitly unless you need them. We’re going to pick it apart, one field at a time: PyVarObject_HEAD_INIT(NULL, 0) This line is mandatory boilerplate to initialize the "ob_base" field mentioned above. .tp_name = "custom.Custom", The name of our type. This will appear in the default textual representation of our objects and in some error messages, for example: >>> "" + custom.Custom() Traceback (most recent call last): File "", line 1, in TypeError: can only concatenate str (not "custom.Custom") to str Note that the name is a dotted name that includes both the module name and the name of the type within the module. The module in this case is "custom" and the type is "Custom", so we set the type name to "custom.Custom". Using the real dotted import path is important to make your type compatible with the "pydoc" and "pickle" modules. .tp_basicsize = sizeof(CustomObject), .tp_itemsize = 0, This is so that Python knows how much memory to allocate when creating new "Custom" instances. "tp_itemsize" is only used for variable-sized objects and should otherwise be zero. Note: If you want your type to be subclassable from Python, and your type has the same "tp_basicsize" as its base type, you may have problems with multiple inheritance. A Python subclass of your type will have to list your type first in its "__bases__", or else it will not be able to call your type’s "__new__()" method without getting an error. You can avoid this problem by ensuring that your type has a larger value for "tp_basicsize" than its base type does. Most of the time, this will be true anyway, because either your base type will be "object", or else you will be adding data members to your base type, and therefore increasing its size. We set the class flags to "Py_TPFLAGS_DEFAULT". .tp_flags = Py_TPFLAGS_DEFAULT, All types should include this constant in their flags. It enables all of the members defined until at least Python 3.3. If you need further members, you will need to OR the corresponding flags. We provide a doc string for the type in "tp_doc". .tp_doc = PyDoc_STR("Custom objects"), To enable object creation, we have to provide a "tp_new" handler. This is the equivalent of the Python method "__new__()", but has to be specified explicitly. In this case, we can just use the default implementation provided by the API function "PyType_GenericNew()". .tp_new = PyType_GenericNew, Everything else in the file should be familiar, except for some code in "PyInit_custom()": if (PyType_Ready(&CustomType) < 0) return; This initializes the "Custom" type, filling in a number of members to the appropriate default values, including "ob_type" that we initially set to "NULL". Py_INCREF(&CustomType); if (PyModule_AddObject(m, "Custom", (PyObject *) &CustomType) < 0) { Py_DECREF(&CustomType); Py_DECREF(m); return NULL; } This adds the type to the module dictionary. This allows us to create "Custom" instances by calling the "Custom" class: >>> import custom >>> mycustom = custom.Custom() That’s it! All that remains is to build it; put the above code in a file called "custom.c" and: from distutils.core import setup, Extension setup(name="custom", version="1.0", ext_modules=[Extension("custom", ["custom.c"])]) in a file called "setup.py"; then typing $ python setup.py build at a shell should produce a file "custom.so" in a subdirectory; move to that directory and fire up Python — you should be able to "import custom" and play around with Custom objects. That wasn’t so hard, was it? Of course, the current Custom type is pretty uninteresting. It has no data and doesn’t do anything. It can’t even be subclassed. Note: While this documentation showcases the standard "distutils" module for building C extensions, it is recommended in real-world use cases to use the newer and better-maintained "setuptools" library. Documentation on how to do this is out of scope for this document and can be found in the Python Packaging User’s Guide. 2.2. Adding data and methods to the Basic example ================================================= Let’s extend the basic example to add some data and methods. Let’s also make the type usable as a base class. We’ll create a new module, "custom2" that adds these capabilities: #define PY_SSIZE_T_CLEAN #include #include "structmember.h" typedef struct { PyObject_HEAD PyObject *first; /* first name */ PyObject *last; /* last name */ int number; } CustomObject; static void Custom_dealloc(CustomObject *self) { Py_XDECREF(self->first); Py_XDECREF(self->last); Py_TYPE(self)->tp_free((PyObject *) self); } static PyObject * Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds) { CustomObject *self; self = (CustomObject *) type->tp_alloc(type, 0); if (self != NULL) { self->first = PyUnicode_FromString(""); if (self->first == NULL) { Py_DECREF(self); return NULL; } self->last = PyUnicode_FromString(""); if (self->last == NULL) { Py_DECREF(self); return NULL; } self->number = 0; } return (PyObject *) self; } static int Custom_init(CustomObject *self, PyObject *args, PyObject *kwds) { static char *kwlist[] = {"first", "last", "number", NULL}; PyObject *first = NULL, *last = NULL, *tmp; if (!PyArg_ParseTupleAndKeywords(args, kwds, "|OOi", kwlist, &first, &last, &self->number)) return -1; if (first) { tmp = self->first; Py_INCREF(first); self->first = first; Py_XDECREF(tmp); } if (last) { tmp = self->last; Py_INCREF(last); self->last = last; Py_XDECREF(tmp); } return 0; } static PyMemberDef Custom_members[] = { {"first", T_OBJECT_EX, offsetof(CustomObject, first), 0, "first name"}, {"last", T_OBJECT_EX, offsetof(CustomObject, last), 0, "last name"}, {"number", T_INT, offsetof(CustomObject, number), 0, "custom number"}, {NULL} /* Sentinel */ }; static PyObject * Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored)) { if (self->first == NULL) { PyErr_SetString(PyExc_AttributeError, "first"); return NULL; } if (self->last == NULL) { PyErr_SetString(PyExc_AttributeError, "last"); return NULL; } return PyUnicode_FromFormat("%S %S", self->first, self->last); } static PyMethodDef Custom_methods[] = { {"name", (PyCFunction) Custom_name, METH_NOARGS, "Return the name, combining the first and last name" }, {NULL} /* Sentinel */ }; static PyTypeObject CustomType = { PyVarObject_HEAD_INIT(NULL, 0) .tp_name = "custom2.Custom", .tp_doc = PyDoc_STR("Custom objects"), .tp_basicsize = sizeof(CustomObject), .tp_itemsize = 0, .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, .tp_new = Custom_new, .tp_init = (initproc) Custom_init, .tp_dealloc = (destructor) Custom_dealloc, .tp_members = Custom_members, .tp_methods = Custom_methods, }; static PyModuleDef custommodule = { PyModuleDef_HEAD_INIT, .m_name = "custom2", .m_doc = "Example module that creates an extension type.", .m_size = -1, }; PyMODINIT_FUNC PyInit_custom2(void) { PyObject *m; if (PyType_Ready(&CustomType) < 0) return NULL; m = PyModule_Create(&custommodule); if (m == NULL) return NULL; Py_INCREF(&CustomType); if (PyModule_AddObject(m, "Custom", (PyObject *) &CustomType) < 0) { Py_DECREF(&CustomType); Py_DECREF(m); return NULL; } return m; } This version of the module has a number of changes. We’ve added an extra include: #include This include provides declarations that we use to handle attributes, as described a bit later. The "Custom" type now has three data attributes in its C struct, *first*, *last*, and *number*. The *first* and *last* variables are Python strings containing first and last names. The *number* attribute is a C integer. The object structure is updated accordingly: typedef struct { PyObject_HEAD PyObject *first; /* first name */ PyObject *last; /* last name */ int number; } CustomObject; Because we now have data to manage, we have to be more careful about object allocation and deallocation. At a minimum, we need a deallocation method: static void Custom_dealloc(CustomObject *self) { Py_XDECREF(self->first); Py_XDECREF(self->last); Py_TYPE(self)->tp_free((PyObject *) self); } which is assigned to the "tp_dealloc" member: .tp_dealloc = (destructor) Custom_dealloc, This method first clears the reference counts of the two Python attributes. "Py_XDECREF()" correctly handles the case where its argument is "NULL" (which might happen here if "tp_new" failed midway). It then calls the "tp_free" member of the object’s type (computed by "Py_TYPE(self)") to free the object’s memory. Note that the object’s type might not be "CustomType", because the object may be an instance of a subclass. Note: The explicit cast to "destructor" above is needed because we defined "Custom_dealloc" to take a "CustomObject *" argument, but the "tp_dealloc" function pointer expects to receive a "PyObject *" argument. Otherwise, the compiler will emit a warning. This is object-oriented polymorphism, in C! We want to make sure that the first and last names are initialized to empty strings, so we provide a "tp_new" implementation: static PyObject * Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds) { CustomObject *self; self = (CustomObject *) type->tp_alloc(type, 0); if (self != NULL) { self->first = PyUnicode_FromString(""); if (self->first == NULL) { Py_DECREF(self); return NULL; } self->last = PyUnicode_FromString(""); if (self->last == NULL) { Py_DECREF(self); return NULL; } self->number = 0; } return (PyObject *) self; } and install it in the "tp_new" member: .tp_new = Custom_new, The "tp_new" handler is responsible for creating (as opposed to initializing) objects of the type. It is exposed in Python as the "__new__()" method. It is not required to define a "tp_new" member, and indeed many extension types will simply reuse "PyType_GenericNew()" as done in the first version of the "Custom" type above. In this case, we use the "tp_new" handler to initialize the "first" and "last" attributes to non-"NULL" default values. "tp_new" is passed the type being instantiated (not necessarily "CustomType", if a subclass is instantiated) and any arguments passed when the type was called, and is expected to return the instance created. "tp_new" handlers always accept positional and keyword arguments, but they often ignore the arguments, leaving the argument handling to initializer (a.k.a. "tp_init" in C or "__init__" in Python) methods. Note: "tp_new" shouldn’t call "tp_init" explicitly, as the interpreter will do it itself. The "tp_new" implementation calls the "tp_alloc" slot to allocate memory: self = (CustomObject *) type->tp_alloc(type, 0); Since memory allocation may fail, we must check the "tp_alloc" result against "NULL" before proceeding. Note: We didn’t fill the "tp_alloc" slot ourselves. Rather "PyType_Ready()" fills it for us by inheriting it from our base class, which is "object" by default. Most types use the default allocation strategy. Note: If you are creating a co-operative "tp_new" (one that calls a base type’s "tp_new" or "__new__()"), you must *not* try to determine what method to call using method resolution order at runtime. Always statically determine what type you are going to call, and call its "tp_new" directly, or via "type->tp_base->tp_new". If you do not do this, Python subclasses of your type that also inherit from other Python-defined classes may not work correctly. (Specifically, you may not be able to create instances of such subclasses without getting a "TypeError".) We also define an initialization function which accepts arguments to provide initial values for our instance: static int Custom_init(CustomObject *self, PyObject *args, PyObject *kwds) { static char *kwlist[] = {"first", "last", "number", NULL}; PyObject *first = NULL, *last = NULL, *tmp; if (!PyArg_ParseTupleAndKeywords(args, kwds, "|OOi", kwlist, &first, &last, &self->number)) return -1; if (first) { tmp = self->first; Py_INCREF(first); self->first = first; Py_XDECREF(tmp); } if (last) { tmp = self->last; Py_INCREF(last); self->last = last; Py_XDECREF(tmp); } return 0; } by filling the "tp_init" slot. .tp_init = (initproc) Custom_init, The "tp_init" slot is exposed in Python as the "__init__()" method. It is used to initialize an object after it’s created. Initializers always accept positional and keyword arguments, and they should return either "0" on success or "-1" on error. Unlike the "tp_new" handler, there is no guarantee that "tp_init" is called at all (for example, the "pickle" module by default doesn’t call "__init__()" on unpickled instances). It can also be called multiple times. Anyone can call the "__init__()" method on our objects. For this reason, we have to be extra careful when assigning the new attribute values. We might be tempted, for example to assign the "first" member like this: if (first) { Py_XDECREF(self->first); Py_INCREF(first); self->first = first; } But this would be risky. Our type doesn’t restrict the type of the "first" member, so it could be any kind of object. It could have a destructor that causes code to be executed that tries to access the "first" member; or that destructor could release the *Global interpreter Lock* and let arbitrary code run in other threads that accesses and modifies our object. To be paranoid and protect ourselves against this possibility, we almost always reassign members before decrementing their reference counts. When don’t we have to do this? * when we absolutely know that the reference count is greater than 1; * when we know that deallocation of the object [1] will neither release the *GIL* nor cause any calls back into our type’s code; * when decrementing a reference count in a "tp_dealloc" handler on a type which doesn’t support cyclic garbage collection [2]. We want to expose our instance variables as attributes. There are a number of ways to do that. The simplest way is to define member definitions: static PyMemberDef Custom_members[] = { {"first", T_OBJECT_EX, offsetof(CustomObject, first), 0, "first name"}, {"last", T_OBJECT_EX, offsetof(CustomObject, last), 0, "last name"}, {"number", T_INT, offsetof(CustomObject, number), 0, "custom number"}, {NULL} /* Sentinel */ }; and put the definitions in the "tp_members" slot: .tp_members = Custom_members, Each member definition has a member name, type, offset, access flags and documentation string. See the Generic Attribute Management section below for details. A disadvantage of this approach is that it doesn’t provide a way to restrict the types of objects that can be assigned to the Python attributes. We expect the first and last names to be strings, but any Python objects can be assigned. Further, the attributes can be deleted, setting the C pointers to "NULL". Even though we can make sure the members are initialized to non-"NULL" values, the members can be set to "NULL" if the attributes are deleted. We define a single method, "Custom.name()", that outputs the objects name as the concatenation of the first and last names. static PyObject * Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored)) { if (self->first == NULL) { PyErr_SetString(PyExc_AttributeError, "first"); return NULL; } if (self->last == NULL) { PyErr_SetString(PyExc_AttributeError, "last"); return NULL; } return PyUnicode_FromFormat("%S %S", self->first, self->last); } The method is implemented as a C function that takes a "Custom" (or "Custom" subclass) instance as the first argument. Methods always take an instance as the first argument. Methods often take positional and keyword arguments as well, but in this case we don’t take any and don’t need to accept a positional argument tuple or keyword argument dictionary. This method is equivalent to the Python method: def name(self): return "%s %s" % (self.first, self.last) Note that we have to check for the possibility that our "first" and "last" members are "NULL". This is because they can be deleted, in which case they are set to "NULL". It would be better to prevent deletion of these attributes and to restrict the attribute values to be strings. We’ll see how to do that in the next section. Now that we’ve defined the method, we need to create an array of method definitions: static PyMethodDef Custom_methods[] = { {"name", (PyCFunction) Custom_name, METH_NOARGS, "Return the name, combining the first and last name" }, {NULL} /* Sentinel */ }; (note that we used the "METH_NOARGS" flag to indicate that the method is expecting no arguments other than *self*) and assign it to the "tp_methods" slot: .tp_methods = Custom_methods, Finally, we’ll make our type usable as a base class for subclassing. We’ve written our methods carefully so far so that they don’t make any assumptions about the type of the object being created or used, so all we need to do is to add the "Py_TPFLAGS_BASETYPE" to our class flag definition: .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, We rename "PyInit_custom()" to "PyInit_custom2()", update the module name in the "PyModuleDef" struct, and update the full class name in the "PyTypeObject" struct. Finally, we update our "setup.py" file to build the new module: from distutils.core import setup, Extension setup(name="custom", version="1.0", ext_modules=[ Extension("custom", ["custom.c"]), Extension("custom2", ["custom2.c"]), ]) 2.3. Providing finer control over data attributes ================================================= In this section, we’ll provide finer control over how the "first" and "last" attributes are set in the "Custom" example. In the previous version of our module, the instance variables "first" and "last" could be set to non-string values or even deleted. We want to make sure that these attributes always contain strings. #define PY_SSIZE_T_CLEAN #include #include "structmember.h" typedef struct { PyObject_HEAD PyObject *first; /* first name */ PyObject *last; /* last name */ int number; } CustomObject; static void Custom_dealloc(CustomObject *self) { Py_XDECREF(self->first); Py_XDECREF(self->last); Py_TYPE(self)->tp_free((PyObject *) self); } static PyObject * Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds) { CustomObject *self; self = (CustomObject *) type->tp_alloc(type, 0); if (self != NULL) { self->first = PyUnicode_FromString(""); if (self->first == NULL) { Py_DECREF(self); return NULL; } self->last = PyUnicode_FromString(""); if (self->last == NULL) { Py_DECREF(self); return NULL; } self->number = 0; } return (PyObject *) self; } static int Custom_init(CustomObject *self, PyObject *args, PyObject *kwds) { static char *kwlist[] = {"first", "last", "number", NULL}; PyObject *first = NULL, *last = NULL, *tmp; if (!PyArg_ParseTupleAndKeywords(args, kwds, "|UUi", kwlist, &first, &last, &self->number)) return -1; if (first) { tmp = self->first; Py_INCREF(first); self->first = first; Py_DECREF(tmp); } if (last) { tmp = self->last; Py_INCREF(last); self->last = last; Py_DECREF(tmp); } return 0; } static PyMemberDef Custom_members[] = { {"number", T_INT, offsetof(CustomObject, number), 0, "custom number"}, {NULL} /* Sentinel */ }; static PyObject * Custom_getfirst(CustomObject *self, void *closure) { Py_INCREF(self->first); return self->first; } static int Custom_setfirst(CustomObject *self, PyObject *value, void *closure) { PyObject *tmp; if (value == NULL) { PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute"); return -1; } if (!PyUnicode_Check(value)) { PyErr_SetString(PyExc_TypeError, "The first attribute value must be a string"); return -1; } tmp = self->first; Py_INCREF(value); self->first = value; Py_DECREF(tmp); return 0; } static PyObject * Custom_getlast(CustomObject *self, void *closure) { Py_INCREF(self->last); return self->last; } static int Custom_setlast(CustomObject *self, PyObject *value, void *closure) { PyObject *tmp; if (value == NULL) { PyErr_SetString(PyExc_TypeError, "Cannot delete the last attribute"); return -1; } if (!PyUnicode_Check(value)) { PyErr_SetString(PyExc_TypeError, "The last attribute value must be a string"); return -1; } tmp = self->last; Py_INCREF(value); self->last = value; Py_DECREF(tmp); return 0; } static PyGetSetDef Custom_getsetters[] = { {"first", (getter) Custom_getfirst, (setter) Custom_setfirst, "first name", NULL}, {"last", (getter) Custom_getlast, (setter) Custom_setlast, "last name", NULL}, {NULL} /* Sentinel */ }; static PyObject * Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored)) { return PyUnicode_FromFormat("%S %S", self->first, self->last); } static PyMethodDef Custom_methods[] = { {"name", (PyCFunction) Custom_name, METH_NOARGS, "Return the name, combining the first and last name" }, {NULL} /* Sentinel */ }; static PyTypeObject CustomType = { PyVarObject_HEAD_INIT(NULL, 0) .tp_name = "custom3.Custom", .tp_doc = PyDoc_STR("Custom objects"), .tp_basicsize = sizeof(CustomObject), .tp_itemsize = 0, .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, .tp_new = Custom_new, .tp_init = (initproc) Custom_init, .tp_dealloc = (destructor) Custom_dealloc, .tp_members = Custom_members, .tp_methods = Custom_methods, .tp_getset = Custom_getsetters, }; static PyModuleDef custommodule = { PyModuleDef_HEAD_INIT, .m_name = "custom3", .m_doc = "Example module that creates an extension type.", .m_size = -1, }; PyMODINIT_FUNC PyInit_custom3(void) { PyObject *m; if (PyType_Ready(&CustomType) < 0) return NULL; m = PyModule_Create(&custommodule); if (m == NULL) return NULL; Py_INCREF(&CustomType); if (PyModule_AddObject(m, "Custom", (PyObject *) &CustomType) < 0) { Py_DECREF(&CustomType); Py_DECREF(m); return NULL; } return m; } To provide greater control, over the "first" and "last" attributes, we’ll use custom getter and setter functions. Here are the functions for getting and setting the "first" attribute: static PyObject * Custom_getfirst(CustomObject *self, void *closure) { Py_INCREF(self->first); return self->first; } static int Custom_setfirst(CustomObject *self, PyObject *value, void *closure) { PyObject *tmp; if (value == NULL) { PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute"); return -1; } if (!PyUnicode_Check(value)) { PyErr_SetString(PyExc_TypeError, "The first attribute value must be a string"); return -1; } tmp = self->first; Py_INCREF(value); self->first = value; Py_DECREF(tmp); return 0; } The getter function is passed a "Custom" object and a “closure”, which is a void pointer. In this case, the closure is ignored. (The closure supports an advanced usage in which definition data is passed to the getter and setter. This could, for example, be used to allow a single set of getter and setter functions that decide the attribute to get or set based on data in the closure.) The setter function is passed the "Custom" object, the new value, and the closure. The new value may be "NULL", in which case the attribute is being deleted. In our setter, we raise an error if the attribute is deleted or if its new value is not a string. We create an array of "PyGetSetDef" structures: static PyGetSetDef Custom_getsetters[] = { {"first", (getter) Custom_getfirst, (setter) Custom_setfirst, "first name", NULL}, {"last", (getter) Custom_getlast, (setter) Custom_setlast, "last name", NULL}, {NULL} /* Sentinel */ }; and register it in the "tp_getset" slot: .tp_getset = Custom_getsetters, The last item in a "PyGetSetDef" structure is the “closure” mentioned above. In this case, we aren’t using a closure, so we just pass "NULL". We also remove the member definitions for these attributes: static PyMemberDef Custom_members[] = { {"number", T_INT, offsetof(CustomObject, number), 0, "custom number"}, {NULL} /* Sentinel */ }; We also need to update the "tp_init" handler to only allow strings [3] to be passed: static int Custom_init(CustomObject *self, PyObject *args, PyObject *kwds) { static char *kwlist[] = {"first", "last", "number", NULL}; PyObject *first = NULL, *last = NULL, *tmp; if (!PyArg_ParseTupleAndKeywords(args, kwds, "|UUi", kwlist, &first, &last, &self->number)) return -1; if (first) { tmp = self->first; Py_INCREF(first); self->first = first; Py_DECREF(tmp); } if (last) { tmp = self->last; Py_INCREF(last); self->last = last; Py_DECREF(tmp); } return 0; } With these changes, we can assure that the "first" and "last" members are never "NULL" so we can remove checks for "NULL" values in almost all cases. This means that most of the "Py_XDECREF()" calls can be converted to "Py_DECREF()" calls. The only place we can’t change these calls is in the "tp_dealloc" implementation, where there is the possibility that the initialization of these members failed in "tp_new". We also rename the module initialization function and module name in the initialization function, as we did before, and we add an extra definition to the "setup.py" file. 2.4. Supporting cyclic garbage collection ========================================= Python has a *cyclic garbage collector (GC)* that can identify unneeded objects even when their reference counts are not zero. This can happen when objects are involved in cycles. For example, consider: >>> l = [] >>> l.append(l) >>> del l In this example, we create a list that contains itself. When we delete it, it still has a reference from itself. Its reference count doesn’t drop to zero. Fortunately, Python’s cyclic garbage collector will eventually figure out that the list is garbage and free it. In the second version of the "Custom" example, we allowed any kind of object to be stored in the "first" or "last" attributes [4]. Besides, in the second and third versions, we allowed subclassing "Custom", and subclasses may add arbitrary attributes. For any of those two reasons, "Custom" objects can participate in cycles: >>> import custom3 >>> class Derived(custom3.Custom): pass ... >>> n = Derived() >>> n.some_attribute = n To allow a "Custom" instance participating in a reference cycle to be properly detected and collected by the cyclic GC, our "Custom" type needs to fill two additional slots and to enable a flag that enables these slots: #define PY_SSIZE_T_CLEAN #include #include "structmember.h" typedef struct { PyObject_HEAD PyObject *first; /* first name */ PyObject *last; /* last name */ int number; } CustomObject; static int Custom_traverse(CustomObject *self, visitproc visit, void *arg) { Py_VISIT(self->first); Py_VISIT(self->last); return 0; } static int Custom_clear(CustomObject *self) { Py_CLEAR(self->first); Py_CLEAR(self->last); return 0; } static void Custom_dealloc(CustomObject *self) { PyObject_GC_UnTrack(self); Custom_clear(self); Py_TYPE(self)->tp_free((PyObject *) self); } static PyObject * Custom_new(PyTypeObject *type, PyObject *args, PyObject *kwds) { CustomObject *self; self = (CustomObject *) type->tp_alloc(type, 0); if (self != NULL) { self->first = PyUnicode_FromString(""); if (self->first == NULL) { Py_DECREF(self); return NULL; } self->last = PyUnicode_FromString(""); if (self->last == NULL) { Py_DECREF(self); return NULL; } self->number = 0; } return (PyObject *) self; } static int Custom_init(CustomObject *self, PyObject *args, PyObject *kwds) { static char *kwlist[] = {"first", "last", "number", NULL}; PyObject *first = NULL, *last = NULL, *tmp; if (!PyArg_ParseTupleAndKeywords(args, kwds, "|UUi", kwlist, &first, &last, &self->number)) return -1; if (first) { tmp = self->first; Py_INCREF(first); self->first = first; Py_DECREF(tmp); } if (last) { tmp = self->last; Py_INCREF(last); self->last = last; Py_DECREF(tmp); } return 0; } static PyMemberDef Custom_members[] = { {"number", T_INT, offsetof(CustomObject, number), 0, "custom number"}, {NULL} /* Sentinel */ }; static PyObject * Custom_getfirst(CustomObject *self, void *closure) { Py_INCREF(self->first); return self->first; } static int Custom_setfirst(CustomObject *self, PyObject *value, void *closure) { if (value == NULL) { PyErr_SetString(PyExc_TypeError, "Cannot delete the first attribute"); return -1; } if (!PyUnicode_Check(value)) { PyErr_SetString(PyExc_TypeError, "The first attribute value must be a string"); return -1; } Py_INCREF(value); Py_CLEAR(self->first); self->first = value; return 0; } static PyObject * Custom_getlast(CustomObject *self, void *closure) { Py_INCREF(self->last); return self->last; } static int Custom_setlast(CustomObject *self, PyObject *value, void *closure) { if (value == NULL) { PyErr_SetString(PyExc_TypeError, "Cannot delete the last attribute"); return -1; } if (!PyUnicode_Check(value)) { PyErr_SetString(PyExc_TypeError, "The last attribute value must be a string"); return -1; } Py_INCREF(value); Py_CLEAR(self->last); self->last = value; return 0; } static PyGetSetDef Custom_getsetters[] = { {"first", (getter) Custom_getfirst, (setter) Custom_setfirst, "first name", NULL}, {"last", (getter) Custom_getlast, (setter) Custom_setlast, "last name", NULL}, {NULL} /* Sentinel */ }; static PyObject * Custom_name(CustomObject *self, PyObject *Py_UNUSED(ignored)) { return PyUnicode_FromFormat("%S %S", self->first, self->last); } static PyMethodDef Custom_methods[] = { {"name", (PyCFunction) Custom_name, METH_NOARGS, "Return the name, combining the first and last name" }, {NULL} /* Sentinel */ }; static PyTypeObject CustomType = { PyVarObject_HEAD_INIT(NULL, 0) .tp_name = "custom4.Custom", .tp_doc = PyDoc_STR("Custom objects"), .tp_basicsize = sizeof(CustomObject), .tp_itemsize = 0, .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC, .tp_new = Custom_new, .tp_init = (initproc) Custom_init, .tp_dealloc = (destructor) Custom_dealloc, .tp_traverse = (traverseproc) Custom_traverse, .tp_clear = (inquiry) Custom_clear, .tp_members = Custom_members, .tp_methods = Custom_methods, .tp_getset = Custom_getsetters, }; static PyModuleDef custommodule = { PyModuleDef_HEAD_INIT, .m_name = "custom4", .m_doc = "Example module that creates an extension type.", .m_size = -1, }; PyMODINIT_FUNC PyInit_custom4(void) { PyObject *m; if (PyType_Ready(&CustomType) < 0) return NULL; m = PyModule_Create(&custommodule); if (m == NULL) return NULL; Py_INCREF(&CustomType); if (PyModule_AddObject(m, "Custom", (PyObject *) &CustomType) < 0) { Py_DECREF(&CustomType); Py_DECREF(m); return NULL; } return m; } First, the traversal method lets the cyclic GC know about subobjects that could participate in cycles: static int Custom_traverse(CustomObject *self, visitproc visit, void *arg) { int vret; if (self->first) { vret = visit(self->first, arg); if (vret != 0) return vret; } if (self->last) { vret = visit(self->last, arg); if (vret != 0) return vret; } return 0; } For each subobject that can participate in cycles, we need to call the "visit()" function, which is passed to the traversal method. The "visit()" function takes as arguments the subobject and the extra argument *arg* passed to the traversal method. It returns an integer value that must be returned if it is non-zero. Python provides a "Py_VISIT()" macro that automates calling visit functions. With "Py_VISIT()", we can minimize the amount of boilerplate in "Custom_traverse": static int Custom_traverse(CustomObject *self, visitproc visit, void *arg) { Py_VISIT(self->first); Py_VISIT(self->last); return 0; } Note: The "tp_traverse" implementation must name its arguments exactly *visit* and *arg* in order to use "Py_VISIT()". Second, we need to provide a method for clearing any subobjects that can participate in cycles: static int Custom_clear(CustomObject *self) { Py_CLEAR(self->first); Py_CLEAR(self->last); return 0; } Notice the use of the "Py_CLEAR()" macro. It is the recommended and safe way to clear data attributes of arbitrary types while decrementing their reference counts. If you were to call "Py_XDECREF()" instead on the attribute before setting it to "NULL", there is a possibility that the attribute’s destructor would call back into code that reads the attribute again (*especially* if there is a reference cycle). Note: You could emulate "Py_CLEAR()" by writing: PyObject *tmp; tmp = self->first; self->first = NULL; Py_XDECREF(tmp); Nevertheless, it is much easier and less error-prone to always use "Py_CLEAR()" when deleting an attribute. Don’t try to micro- optimize at the expense of robustness! The deallocator "Custom_dealloc" may call arbitrary code when clearing attributes. It means the circular GC can be triggered inside the function. Since the GC assumes reference count is not zero, we need to untrack the object from the GC by calling "PyObject_GC_UnTrack()" before clearing members. Here is our reimplemented deallocator using "PyObject_GC_UnTrack()" and "Custom_clear": static void Custom_dealloc(CustomObject *self) { PyObject_GC_UnTrack(self); Custom_clear(self); Py_TYPE(self)->tp_free((PyObject *) self); } Finally, we add the "Py_TPFLAGS_HAVE_GC" flag to the class flags: .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE | Py_TPFLAGS_HAVE_GC, That’s pretty much it. If we had written custom "tp_alloc" or "tp_free" handlers, we’d need to modify them for cyclic garbage collection. Most extensions will use the versions automatically provided. 2.5. Subclassing other types ============================ It is possible to create new extension types that are derived from existing types. It is easiest to inherit from the built in types, since an extension can easily use the "PyTypeObject" it needs. It can be difficult to share these "PyTypeObject" structures between extension modules. In this example we will create a "SubList" type that inherits from the built-in "list" type. The new type will be completely compatible with regular lists, but will have an additional "increment()" method that increases an internal counter: >>> import sublist >>> s = sublist.SubList(range(3)) >>> s.extend(s) >>> print(len(s)) 6 >>> print(s.increment()) 1 >>> print(s.increment()) 2 #define PY_SSIZE_T_CLEAN #include typedef struct { PyListObject list; int state; } SubListObject; static PyObject * SubList_increment(SubListObject *self, PyObject *unused) { self->state++; return PyLong_FromLong(self->state); } static PyMethodDef SubList_methods[] = { {"increment", (PyCFunction) SubList_increment, METH_NOARGS, PyDoc_STR("increment state counter")}, {NULL}, }; static int SubList_init(SubListObject *self, PyObject *args, PyObject *kwds) { if (PyList_Type.tp_init((PyObject *) self, args, kwds) < 0) return -1; self->state = 0; return 0; } static PyTypeObject SubListType = { PyVarObject_HEAD_INIT(NULL, 0) .tp_name = "sublist.SubList", .tp_doc = PyDoc_STR("SubList objects"), .tp_basicsize = sizeof(SubListObject), .tp_itemsize = 0, .tp_flags = Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE, .tp_init = (initproc) SubList_init, .tp_methods = SubList_methods, }; static PyModuleDef sublistmodule = { PyModuleDef_HEAD_INIT, .m_name = "sublist", .m_doc = "Example module that creates an extension type.", .m_size = -1, }; PyMODINIT_FUNC PyInit_sublist(void) { PyObject *m; SubListType.tp_base = &PyList_Type; if (PyType_Ready(&SubListType) < 0) return NULL; m = PyModule_Create(&sublistmodule); if (m == NULL) return NULL; Py_INCREF(&SubListType); if (PyModule_AddObject(m, "SubList", (PyObject *) &SubListType) < 0) { Py_DECREF(&SubListType); Py_DECREF(m); return NULL; } return m; } As you can see, the source code closely resembles the "Custom" examples in previous sections. We will break down the main differences between them. typedef struct { PyListObject list; int state; } SubListObject; The primary difference for derived type objects is that the base type’s object structure must be the first value. The base type will already include the "PyObject_HEAD()" at the beginning of its structure. When a Python object is a "SubList" instance, its "PyObject *" pointer can be safely cast to both "PyListObject *" and "SubListObject *": static int SubList_init(SubListObject *self, PyObject *args, PyObject *kwds) { if (PyList_Type.tp_init((PyObject *) self, args, kwds) < 0) return -1; self->state = 0; return 0; } We see above how to call through to the "__init__" method of the base type. This pattern is important when writing a type with custom "tp_new" and "tp_dealloc" members. The "tp_new" handler should not actually create the memory for the object with its "tp_alloc", but let the base class handle it by calling its own "tp_new". The "PyTypeObject" struct supports a "tp_base" specifying the type’s concrete base class. Due to cross-platform compiler issues, you can’t fill that field directly with a reference to "PyList_Type"; it should be done later in the module initialization function: PyMODINIT_FUNC PyInit_sublist(void) { PyObject* m; SubListType.tp_base = &PyList_Type; if (PyType_Ready(&SubListType) < 0) return NULL; m = PyModule_Create(&sublistmodule); if (m == NULL) return NULL; Py_INCREF(&SubListType); if (PyModule_AddObject(m, "SubList", (PyObject *) &SubListType) < 0) { Py_DECREF(&SubListType); Py_DECREF(m); return NULL; } return m; } Before calling "PyType_Ready()", the type structure must have the "tp_base" slot filled in. When we are deriving an existing type, it is not necessary to fill out the "tp_alloc" slot with "PyType_GenericNew()" – the allocation function from the base type will be inherited. After that, calling "PyType_Ready()" and adding the type object to the module is the same as with the basic "Custom" examples. -[ Footnotes ]- [1] This is true when we know that the object is a basic type, like a string or a float. [2] We relied on this in the "tp_dealloc" handler in this example, because our type doesn’t support garbage collection. [3] We now know that the first and last members are strings, so perhaps we could be less careful about decrementing their reference counts, however, we accept instances of string subclasses. Even though deallocating normal strings won’t call back into our objects, we can’t guarantee that deallocating an instance of a string subclass won’t call back into our objects. [4] Also, even with our attributes restricted to strings instances, the user could pass arbitrary "str" subclasses and therefore still create reference cycles. 5. Building C and C++ Extensions on Windows ******************************************* This chapter briefly explains how to create a Windows extension module for Python using Microsoft Visual C++, and follows with more detailed background information on how it works. The explanatory material is useful for both the Windows programmer learning to build Python extensions and the Unix programmer interested in producing software which can be successfully built on both Unix and Windows. Module authors are encouraged to use the distutils approach for building extension modules, instead of the one described in this section. You will still need the C compiler that was used to build Python; typically Microsoft Visual C++. Note: This chapter mentions a number of filenames that include an encoded Python version number. These filenames are represented with the version number shown as "XY"; in practice, "'X'" will be the major version number and "'Y'" will be the minor version number of the Python release you’re working with. For example, if you are using Python 2.2.1, "XY" will actually be "22". 5.1. A Cookbook Approach ======================== There are two approaches to building extension modules on Windows, just as there are on Unix: use the "distutils" package to control the build process, or do things manually. The distutils approach works well for most extensions; documentation on using "distutils" to build and package extension modules is available in Distributing Python Modules (Legacy version). If you find you really need to do things manually, it may be instructive to study the project file for the winsound standard library module. 5.2. Differences Between Unix and Windows ========================================= Unix and Windows use completely different paradigms for run-time loading of code. Before you try to build a module that can be dynamically loaded, be aware of how your system works. In Unix, a shared object (".so") file contains code to be used by the program, and also the names of functions and data that it expects to find in the program. When the file is joined to the program, all references to those functions and data in the file’s code are changed to point to the actual locations in the program where the functions and data are placed in memory. This is basically a link operation. In Windows, a dynamic-link library (".dll") file has no dangling references. Instead, an access to functions or data goes through a lookup table. So the DLL code does not have to be fixed up at runtime to refer to the program’s memory; instead, the code already uses the DLL’s lookup table, and the lookup table is modified at runtime to point to the functions and data. In Unix, there is only one type of library file (".a") which contains code from several object files (".o"). During the link step to create a shared object file (".so"), the linker may find that it doesn’t know where an identifier is defined. The linker will look for it in the object files in the libraries; if it finds it, it will include all the code from that object file. In Windows, there are two types of library, a static library and an import library (both called ".lib"). A static library is like a Unix ".a" file; it contains code to be included as necessary. An import library is basically used only to reassure the linker that a certain identifier is legal, and will be present in the program when the DLL is loaded. So the linker uses the information from the import library to build the lookup table for using identifiers that are not included in the DLL. When an application or a DLL is linked, an import library may be generated, which will need to be used for all future DLLs that depend on the symbols in the application or DLL. Suppose you are building two dynamic-load modules, B and C, which should share another block of code A. On Unix, you would *not* pass "A.a" to the linker for "B.so" and "C.so"; that would cause it to be included twice, so that B and C would each have their own copy. In Windows, building "A.dll" will also build "A.lib". You *do* pass "A.lib" to the linker for B and C. "A.lib" does not contain code; it just contains information which will be used at runtime to access A’s code. In Windows, using an import library is sort of like using "import spam"; it gives you access to spam’s names, but does not create a separate copy. On Unix, linking with a library is more like "from spam import *"; it does create a separate copy. 5.3. Using DLLs in Practice =========================== Windows Python is built in Microsoft Visual C++; using other compilers may or may not work. The rest of this section is MSVC++ specific. When creating DLLs in Windows, you must pass "pythonXY.lib" to the linker. To build two DLLs, spam and ni (which uses C functions found in spam), you could use these commands: cl /LD /I/python/include spam.c ../libs/pythonXY.lib cl /LD /I/python/include ni.c spam.lib ../libs/pythonXY.lib The first command created three files: "spam.obj", "spam.dll" and "spam.lib". "Spam.dll" does not contain any Python functions (such as "PyArg_ParseTuple()"), but it does know how to find the Python code thanks to "pythonXY.lib". The second command created "ni.dll" (and ".obj" and ".lib"), which knows how to find the necessary functions from spam, and also from the Python executable. Not every identifier is exported to the lookup table. If you want any other modules (including Python) to be able to see your identifiers, you have to say "_declspec(dllexport)", as in "void _declspec(dllexport) initspam(void)" or "PyObject _declspec(dllexport) *NiGetSpamData(void)". Developer Studio will throw in a lot of import libraries that you do not really need, adding about 100K to your executable. To get rid of them, use the Project Settings dialog, Link tab, to specify *ignore default libraries*. Add the correct "msvcrtxx.lib" to the list of libraries. Design and History FAQ ********************** Why does Python use indentation for grouping of statements? =========================================================== Guido van Rossum believes that using indentation for grouping is extremely elegant and contributes a lot to the clarity of the average Python program. Most people learn to love this feature after a while. Since there are no begin/end brackets there cannot be a disagreement between grouping perceived by the parser and the human reader. Occasionally C programmers will encounter a fragment of code like this: if (x <= y) x++; y--; z++; Only the "x++" statement is executed if the condition is true, but the indentation leads many to believe otherwise. Even experienced C programmers will sometimes stare at it a long time wondering as to why "y" is being decremented even for "x > y". Because there are no begin/end brackets, Python is much less prone to coding-style conflicts. In C there are many different ways to place the braces. After becoming used to reading and writing code using a particular style, it is normal to feel somewhat uneasy when reading (or being required to write) in a different one. Many coding styles place begin/end brackets on a line by themselves. This makes programs considerably longer and wastes valuable screen space, making it harder to get a good overview of a program. Ideally, a function should fit on one screen (say, 20–30 lines). 20 lines of Python can do a lot more work than 20 lines of C. This is not solely due to the lack of begin/end brackets – the lack of declarations and the high-level data types are also responsible – but the indentation- based syntax certainly helps. Why am I getting strange results with simple arithmetic operations? =================================================================== See the next question. Why are floating-point calculations so inaccurate? ================================================== Users are often surprised by results like this: >>> 1.2 - 1.0 0.19999999999999996 and think it is a bug in Python. It’s not. This has little to do with Python, and much more to do with how the underlying platform handles floating-point numbers. The "float" type in CPython uses a C "double" for storage. A "float" object’s value is stored in binary floating-point with a fixed precision (typically 53 bits) and Python uses C operations, which in turn rely on the hardware implementation in the processor, to perform floating-point operations. This means that as far as floating-point operations are concerned, Python behaves like many popular languages including C and Java. Many numbers that can be written easily in decimal notation cannot be expressed exactly in binary floating-point. For example, after: >>> x = 1.2 the value stored for "x" is a (very good) approximation to the decimal value "1.2", but is not exactly equal to it. On a typical machine, the actual stored value is: 1.0011001100110011001100110011001100110011001100110011 (binary) which is exactly: 1.1999999999999999555910790149937383830547332763671875 (decimal) The typical precision of 53 bits provides Python floats with 15–16 decimal digits of accuracy. For a fuller explanation, please see the floating point arithmetic chapter in the Python tutorial. Why are Python strings immutable? ================================= There are several advantages. One is performance: knowing that a string is immutable means we can allocate space for it at creation time, and the storage requirements are fixed and unchanging. This is also one of the reasons for the distinction between tuples and lists. Another advantage is that strings in Python are considered as “elemental” as numbers. No amount of activity will change the value 8 to anything else, and in Python, no amount of activity will change the string “eight” to anything else. Why must ‘self’ be used explicitly in method definitions and calls? =================================================================== The idea was borrowed from Modula-3. It turns out to be very useful, for a variety of reasons. First, it’s more obvious that you are using a method or instance attribute instead of a local variable. Reading "self.x" or "self.meth()" makes it absolutely clear that an instance variable or method is used even if you don’t know the class definition by heart. In C++, you can sort of tell by the lack of a local variable declaration (assuming globals are rare or easily recognizable) – but in Python, there are no local variable declarations, so you’d have to look up the class definition to be sure. Some C++ and Java coding standards call for instance attributes to have an "m_" prefix, so this explicitness is still useful in those languages, too. Second, it means that no special syntax is necessary if you want to explicitly reference or call the method from a particular class. In C++, if you want to use a method from a base class which is overridden in a derived class, you have to use the "::" operator – in Python you can write "baseclass.methodname(self, )". This is particularly useful for "__init__()" methods, and in general in cases where a derived class method wants to extend the base class method of the same name and thus has to call the base class method somehow. Finally, for instance variables it solves a syntactic problem with assignment: since local variables in Python are (by definition!) those variables to which a value is assigned in a function body (and that aren’t explicitly declared global), there has to be some way to tell the interpreter that an assignment was meant to assign to an instance variable instead of to a local variable, and it should preferably be syntactic (for efficiency reasons). C++ does this through declarations, but Python doesn’t have declarations and it would be a pity having to introduce them just for this purpose. Using the explicit "self.var" solves this nicely. Similarly, for using instance variables, having to write "self.var" means that references to unqualified names inside a method don’t have to search the instance’s directories. To put it another way, local variables and instance variables live in two different namespaces, and you need to tell Python which namespace to use. Why can’t I use an assignment in an expression? =============================================== Starting in Python 3.8, you can! Assignment expressions using the walrus operator *:=* assign a variable in an expression: while chunk := fp.read(200): print(chunk) See **PEP 572** for more information. Why does Python use methods for some functionality (e.g. list.index()) but functions for other (e.g. len(list))? ================================================================================================================ As Guido said: (a) For some operations, prefix notation just reads better than postfix – prefix (and infix!) operations have a long tradition in mathematics which likes notations where the visuals help the mathematician thinking about a problem. Compare the easy with which we rewrite a formula like x*(a+b) into x*a + x*b to the clumsiness of doing the same thing using a raw OO notation. (b) When I read code that says len(x) I *know* that it is asking for the length of something. This tells me two things: the result is an integer, and the argument is some kind of container. To the contrary, when I read x.len(), I have to already know that x is some kind of container implementing an interface or inheriting from a class that has a standard len(). Witness the confusion we occasionally have when a class that is not implementing a mapping has a get() or keys() method, or something that isn’t a file has a write() method. -- https://mail.python.org/pipermail/python-3000/2006-November/004 643.html Why is join() a string method instead of a list or tuple method? ================================================================ Strings became much more like other standard types starting in Python 1.6, when methods were added which give the same functionality that has always been available using the functions of the string module. Most of these new methods have been widely accepted, but the one which appears to make some programmers feel uncomfortable is: ", ".join(['1', '2', '4', '8', '16']) which gives the result: "1, 2, 4, 8, 16" There are two common arguments against this usage. The first runs along the lines of: “It looks really ugly using a method of a string literal (string constant)”, to which the answer is that it might, but a string literal is just a fixed value. If the methods are to be allowed on names bound to strings there is no logical reason to make them unavailable on literals. The second objection is typically cast as: “I am really telling a sequence to join its members together with a string constant”. Sadly, you aren’t. For some reason there seems to be much less difficulty with having "split()" as a string method, since in that case it is easy to see that "1, 2, 4, 8, 16".split(", ") is an instruction to a string literal to return the substrings delimited by the given separator (or, by default, arbitrary runs of white space). "join()" is a string method because in using it you are telling the separator string to iterate over a sequence of strings and insert itself between adjacent elements. This method can be used with any argument which obeys the rules for sequence objects, including any new classes you might define yourself. Similar methods exist for bytes and bytearray objects. How fast are exceptions? ======================== A try/except block is extremely efficient if no exceptions are raised. Actually catching an exception is expensive. In versions of Python prior to 2.0 it was common to use this idiom: try: value = mydict[key] except KeyError: mydict[key] = getvalue(key) value = mydict[key] This only made sense when you expected the dict to have the key almost all the time. If that wasn’t the case, you coded it like this: if key in mydict: value = mydict[key] else: value = mydict[key] = getvalue(key) For this specific case, you could also use "value = dict.setdefault(key, getvalue(key))", but only if the "getvalue()" call is cheap enough because it is evaluated in all cases. Why isn’t there a switch or case statement in Python? ===================================================== You can do this easily enough with a sequence of "if... elif... elif... else". There have been some proposals for switch statement syntax, but there is no consensus (yet) on whether and how to do range tests. See **PEP 275** for complete details and the current status. For cases where you need to choose from a very large number of possibilities, you can create a dictionary mapping case values to functions to call. For example: functions = {'a': function_1, 'b': function_2, 'c': self.method_1} func = functions[value] func() For calling methods on objects, you can simplify yet further by using the "getattr()" built-in to retrieve methods with a particular name: class MyVisitor: def visit_a(self): ... def dispatch(self, value): method_name = 'visit_' + str(value) method = getattr(self, method_name) method() It’s suggested that you use a prefix for the method names, such as "visit_" in this example. Without such a prefix, if values are coming from an untrusted source, an attacker would be able to call any method on your object. Can’t you emulate threads in the interpreter instead of relying on an OS-specific thread implementation? ======================================================================================================== Answer 1: Unfortunately, the interpreter pushes at least one C stack frame for each Python stack frame. Also, extensions can call back into Python at almost random moments. Therefore, a complete threads implementation requires thread support for C. Answer 2: Fortunately, there is Stackless Python, which has a completely redesigned interpreter loop that avoids the C stack. Why can’t lambda expressions contain statements? ================================================ Python lambda expressions cannot contain statements because Python’s syntactic framework can’t handle statements nested inside expressions. However, in Python, this is not a serious problem. Unlike lambda forms in other languages, where they add functionality, Python lambdas are only a shorthand notation if you’re too lazy to define a function. Functions are already first class objects in Python, and can be declared in a local scope. Therefore the only advantage of using a lambda instead of a locally-defined function is that you don’t need to invent a name for the function – but that’s just a local variable to which the function object (which is exactly the same type of object that a lambda expression yields) is assigned! Can Python be compiled to machine code, C or some other language? ================================================================= Cython compiles a modified version of Python with optional annotations into C extensions. Nuitka is an up-and-coming compiler of Python into C++ code, aiming to support the full Python language. For compiling to Java you can consider VOC. How does Python manage memory? ============================== The details of Python memory management depend on the implementation. The standard implementation of Python, *CPython*, uses reference counting to detect inaccessible objects, and another mechanism to collect reference cycles, periodically executing a cycle detection algorithm which looks for inaccessible cycles and deletes the objects involved. The "gc" module provides functions to perform a garbage collection, obtain debugging statistics, and tune the collector’s parameters. Other implementations (such as Jython or PyPy), however, can rely on a different mechanism such as a full-blown garbage collector. This difference can cause some subtle porting problems if your Python code depends on the behavior of the reference counting implementation. In some Python implementations, the following code (which is fine in CPython) will probably run out of file descriptors: for file in very_long_list_of_files: f = open(file) c = f.read(1) Indeed, using CPython’s reference counting and destructor scheme, each new assignment to *f* closes the previous file. With a traditional GC, however, those file objects will only get collected (and closed) at varying and possibly long intervals. If you want to write code that will work with any Python implementation, you should explicitly close the file or use the "with" statement; this will work regardless of memory management scheme: for file in very_long_list_of_files: with open(file) as f: c = f.read(1) Why doesn’t CPython use a more traditional garbage collection scheme? ===================================================================== For one thing, this is not a C standard feature and hence it’s not portable. (Yes, we know about the Boehm GC library. It has bits of assembler code for *most* common platforms, not for all of them, and although it is mostly transparent, it isn’t completely transparent; patches are required to get Python to work with it.) Traditional GC also becomes a problem when Python is embedded into other applications. While in a standalone Python it’s fine to replace the standard malloc() and free() with versions provided by the GC library, an application embedding Python may want to have its *own* substitute for malloc() and free(), and may not want Python’s. Right now, CPython works with anything that implements malloc() and free() properly. Why isn’t all memory freed when CPython exits? ============================================== Objects referenced from the global namespaces of Python modules are not always deallocated when Python exits. This may happen if there are circular references. There are also certain bits of memory that are allocated by the C library that are impossible to free (e.g. a tool like Purify will complain about these). Python is, however, aggressive about cleaning up memory on exit and does try to destroy every single object. If you want to force Python to delete certain things on deallocation use the "atexit" module to run a function that will force those deletions. Why are there separate tuple and list data types? ================================================= Lists and tuples, while similar in many respects, are generally used in fundamentally different ways. Tuples can be thought of as being similar to Pascal records or C structs; they’re small collections of related data which may be of different types which are operated on as a group. For example, a Cartesian coordinate is appropriately represented as a tuple of two or three numbers. Lists, on the other hand, are more like arrays in other languages. They tend to hold a varying number of objects all of which have the same type and which are operated on one-by-one. For example, "os.listdir('.')" returns a list of strings representing the files in the current directory. Functions which operate on this output would generally not break if you added another file or two to the directory. Tuples are immutable, meaning that once a tuple has been created, you can’t replace any of its elements with a new value. Lists are mutable, meaning that you can always change a list’s elements. Only immutable elements can be used as dictionary keys, and hence only tuples and not lists can be used as keys. How are lists implemented in CPython? ===================================== CPython’s lists are really variable-length arrays, not Lisp-style linked lists. The implementation uses a contiguous array of references to other objects, and keeps a pointer to this array and the array’s length in a list head structure. This makes indexing a list "a[i]" an operation whose cost is independent of the size of the list or the value of the index. When items are appended or inserted, the array of references is resized. Some cleverness is applied to improve the performance of appending items repeatedly; when the array must be grown, some extra space is allocated so the next few times don’t require an actual resize. How are dictionaries implemented in CPython? ============================================ CPython’s dictionaries are implemented as resizable hash tables. Compared to B-trees, this gives better performance for lookup (the most common operation by far) under most circumstances, and the implementation is simpler. Dictionaries work by computing a hash code for each key stored in the dictionary using the "hash()" built-in function. The hash code varies widely depending on the key and a per-process seed; for example, “Python” could hash to -539294296 while “python”, a string that differs by a single bit, could hash to 1142331976. The hash code is then used to calculate a location in an internal array where the value will be stored. Assuming that you’re storing keys that all have different hash values, this means that dictionaries take constant time – O(1), in Big-O notation – to retrieve a key. Why must dictionary keys be immutable? ====================================== The hash table implementation of dictionaries uses a hash value calculated from the key value to find the key. If the key were a mutable object, its value could change, and thus its hash could also change. But since whoever changes the key object can’t tell that it was being used as a dictionary key, it can’t move the entry around in the dictionary. Then, when you try to look up the same object in the dictionary it won’t be found because its hash value is different. If you tried to look up the old value it wouldn’t be found either, because the value of the object found in that hash bin would be different. If you want a dictionary indexed with a list, simply convert the list to a tuple first; the function "tuple(L)" creates a tuple with the same entries as the list "L". Tuples are immutable and can therefore be used as dictionary keys. Some unacceptable solutions that have been proposed: * Hash lists by their address (object ID). This doesn’t work because if you construct a new list with the same value it won’t be found; e.g.: mydict = {[1, 2]: '12'} print(mydict[[1, 2]]) would raise a "KeyError" exception because the id of the "[1, 2]" used in the second line differs from that in the first line. In other words, dictionary keys should be compared using "==", not using "is". * Make a copy when using a list as a key. This doesn’t work because the list, being a mutable object, could contain a reference to itself, and then the copying code would run into an infinite loop. * Allow lists as keys but tell the user not to modify them. This would allow a class of hard-to-track bugs in programs when you forgot or modified a list by accident. It also invalidates an important invariant of dictionaries: every value in "d.keys()" is usable as a key of the dictionary. * Mark lists as read-only once they are used as a dictionary key. The problem is that it’s not just the top-level object that could change its value; you could use a tuple containing a list as a key. Entering anything as a key into a dictionary would require marking all objects reachable from there as read-only – and again, self- referential objects could cause an infinite loop. There is a trick to get around this if you need to, but use it at your own risk: You can wrap a mutable structure inside a class instance which has both a "__eq__()" and a "__hash__()" method. You must then make sure that the hash value for all such wrapper objects that reside in a dictionary (or other hash based structure), remain fixed while the object is in the dictionary (or other structure). class ListWrapper: def __init__(self, the_list): self.the_list = the_list def __eq__(self, other): return self.the_list == other.the_list def __hash__(self): l = self.the_list result = 98767 - len(l)*555 for i, el in enumerate(l): try: result = result + (hash(el) % 9999999) * 1001 + i except Exception: result = (result % 7777777) + i * 333 return result Note that the hash computation is complicated by the possibility that some members of the list may be unhashable and also by the possibility of arithmetic overflow. Furthermore it must always be the case that if "o1 == o2" (ie "o1.__eq__(o2) is True") then "hash(o1) == hash(o2)" (ie, "o1.__hash__() == o2.__hash__()"), regardless of whether the object is in a dictionary or not. If you fail to meet these restrictions dictionaries and other hash based structures will misbehave. In the case of ListWrapper, whenever the wrapper object is in a dictionary the wrapped list must not change to avoid anomalies. Don’t do this unless you are prepared to think hard about the requirements and the consequences of not meeting them correctly. Consider yourself warned. Why doesn’t list.sort() return the sorted list? =============================================== In situations where performance matters, making a copy of the list just to sort it would be wasteful. Therefore, "list.sort()" sorts the list in place. In order to remind you of that fact, it does not return the sorted list. This way, you won’t be fooled into accidentally overwriting a list when you need a sorted copy but also need to keep the unsorted version around. If you want to return a new list, use the built-in "sorted()" function instead. This function creates a new list from a provided iterable, sorts it and returns it. For example, here’s how to iterate over the keys of a dictionary in sorted order: for key in sorted(mydict): ... # do whatever with mydict[key]... How do you specify and enforce an interface spec in Python? =========================================================== An interface specification for a module as provided by languages such as C++ and Java describes the prototypes for the methods and functions of the module. Many feel that compile-time enforcement of interface specifications helps in the construction of large programs. Python 2.6 adds an "abc" module that lets you define Abstract Base Classes (ABCs). You can then use "isinstance()" and "issubclass()" to check whether an instance or a class implements a particular ABC. The "collections.abc" module defines a set of useful ABCs such as "Iterable", "Container", and "MutableMapping". For Python, many of the advantages of interface specifications can be obtained by an appropriate test discipline for components. A good test suite for a module can both provide a regression test and serve as a module interface specification and a set of examples. Many Python modules can be run as a script to provide a simple “self test.” Even modules which use complex external interfaces can often be tested in isolation using trivial “stub” emulations of the external interface. The "doctest" and "unittest" modules or third-party test frameworks can be used to construct exhaustive test suites that exercise every line of code in a module. An appropriate testing discipline can help build large complex applications in Python as well as having interface specifications would. In fact, it can be better because an interface specification cannot test certain properties of a program. For example, the "append()" method is expected to add new elements to the end of some internal list; an interface specification cannot test that your "append()" implementation will actually do this correctly, but it’s trivial to check this property in a test suite. Writing test suites is very helpful, and you might want to design your code to make it easily tested. One increasingly popular technique, test-driven development, calls for writing parts of the test suite first, before you write any of the actual code. Of course Python allows you to be sloppy and not write test cases at all. Why is there no goto? ===================== In the 1970s people realized that unrestricted goto could lead to messy “spaghetti” code that was hard to understand and revise. In a high-level language, it is also unneeded as long as there are ways to branch (in Python, with "if" statements and "or", "and", and "if-else" expressions) and loop (with "while" and "for" statements, possibly containing "continue" and "break"). One can also use exceptions to provide a “structured goto” that works even across function calls. Many feel that exceptions can conveniently emulate all reasonable uses of the “go” or “goto” constructs of C, Fortran, and other languages. For example: class label(Exception): pass # declare a label try: ... if condition: raise label() # goto label ... except label: # where to goto pass ... This doesn’t allow you to jump into the middle of a loop, but that’s usually considered an abuse of goto anyway. Use sparingly. Why can’t raw strings (r-strings) end with a backslash? ======================================================= More precisely, they can’t end with an odd number of backslashes: the unpaired backslash at the end escapes the closing quote character, leaving an unterminated string. Raw strings were designed to ease creating input for processors (chiefly regular expression engines) that want to do their own backslash escape processing. Such processors consider an unmatched trailing backslash to be an error anyway, so raw strings disallow that. In return, they allow you to pass on the string quote character by escaping it with a backslash. These rules work well when r-strings are used for their intended purpose. If you’re trying to build Windows pathnames, note that all Windows system calls accept forward slashes too: f = open("/mydir/file.txt") # works fine! If you’re trying to build a pathname for a DOS command, try e.g. one of dir = r"\this\is\my\dos\dir" "\\" dir = r"\this\is\my\dos\dir\ "[:-1] dir = "\\this\\is\\my\\dos\\dir\\" Why doesn’t Python have a “with” statement for attribute assignments? ===================================================================== Python has a ‘with’ statement that wraps the execution of a block, calling code on the entrance and exit from the block. Some languages have a construct that looks like this: with obj: a = 1 # equivalent to obj.a = 1 total = total + 1 # obj.total = obj.total + 1 In Python, such a construct would be ambiguous. Other languages, such as Object Pascal, Delphi, and C++, use static types, so it’s possible to know, in an unambiguous way, what member is being assigned to. This is the main point of static typing – the compiler *always* knows the scope of every variable at compile time. Python uses dynamic types. It is impossible to know in advance which attribute will be referenced at runtime. Member attributes may be added or removed from objects on the fly. This makes it impossible to know, from a simple reading, what attribute is being referenced: a local one, a global one, or a member attribute? For instance, take the following incomplete snippet: def foo(a): with a: print(x) The snippet assumes that “a” must have a member attribute called “x”. However, there is nothing in Python that tells the interpreter this. What should happen if “a” is, let us say, an integer? If there is a global variable named “x”, will it be used inside the with block? As you see, the dynamic nature of Python makes such choices much harder. The primary benefit of “with” and similar language features (reduction of code volume) can, however, easily be achieved in Python by assignment. Instead of: function(args).mydict[index][index].a = 21 function(args).mydict[index][index].b = 42 function(args).mydict[index][index].c = 63 write this: ref = function(args).mydict[index][index] ref.a = 21 ref.b = 42 ref.c = 63 This also has the side-effect of increasing execution speed because name bindings are resolved at run-time in Python, and the second version only needs to perform the resolution once. Why don’t generators support the with statement? ================================================ For technical reasons, a generator used directly as a context manager would not work correctly. When, as is most common, a generator is used as an iterator run to completion, no closing is needed. When it is, wrap it as “contextlib.closing(generator)” in the ‘with’ statement. Why are colons required for the if/while/def/class statements? ============================================================== The colon is required primarily to enhance readability (one of the results of the experimental ABC language). Consider this: if a == b print(a) versus if a == b: print(a) Notice how the second one is slightly easier to read. Notice further how a colon sets off the example in this FAQ answer; it’s a standard usage in English. Another minor reason is that the colon makes it easier for editors with syntax highlighting; they can look for colons to decide when indentation needs to be increased instead of having to do a more elaborate parsing of the program text. Why does Python allow commas at the end of lists and tuples? ============================================================ Python lets you add a trailing comma at the end of lists, tuples, and dictionaries: [1, 2, 3,] ('a', 'b', 'c',) d = { "A": [1, 5], "B": [6, 7], # last trailing comma is optional but good style } There are several reasons to allow this. When you have a literal value for a list, tuple, or dictionary spread across multiple lines, it’s easier to add more elements because you don’t have to remember to add a comma to the previous line. The lines can also be reordered without creating a syntax error. Accidentally omitting the comma can lead to errors that are hard to diagnose. For example: x = [ "fee", "fie" "foo", "fum" ] This list looks like it has four elements, but it actually contains three: “fee”, “fiefoo” and “fum”. Always adding the comma avoids this source of error. Allowing the trailing comma may also make programmatic code generation easier. Extending/Embedding FAQ *********************** Can I create my own functions in C? =================================== Yes, you can create built-in modules containing functions, variables, exceptions and even new types in C. This is explained in the document Extending and Embedding the Python Interpreter. Most intermediate or advanced Python books will also cover this topic. Can I create my own functions in C++? ===================================== Yes, using the C compatibility features found in C++. Place "extern "C" { ... }" around the Python include files and put "extern "C"" before each function that is going to be called by the Python interpreter. Global or static C++ objects with constructors are probably not a good idea. Writing C is hard; are there any alternatives? ============================================== There are a number of alternatives to writing your own C extensions, depending on what you’re trying to do. Cython and its relative Pyrex are compilers that accept a slightly modified form of Python and generate the corresponding C code. Cython and Pyrex make it possible to write an extension without having to learn Python’s C API. If you need to interface to some C or C++ library for which no Python extension currently exists, you can try wrapping the library’s data types and functions with a tool such as SWIG. SIP, CXX Boost, or Weave are also alternatives for wrapping C++ libraries. How can I execute arbitrary Python statements from C? ===================================================== The highest-level function to do this is "PyRun_SimpleString()" which takes a single string argument to be executed in the context of the module "__main__" and returns "0" for success and "-1" when an exception occurred (including "SyntaxError"). If you want more control, use "PyRun_String()"; see the source for "PyRun_SimpleString()" in "Python/pythonrun.c". How can I evaluate an arbitrary Python expression from C? ========================================================= Call the function "PyRun_String()" from the previous question with the start symbol "Py_eval_input"; it parses an expression, evaluates it and returns its value. How do I extract C values from a Python object? =============================================== That depends on the object’s type. If it’s a tuple, "PyTuple_Size()" returns its length and "PyTuple_GetItem()" returns the item at a specified index. Lists have similar functions, "PyListSize()" and "PyList_GetItem()". For bytes, "PyBytes_Size()" returns its length and "PyBytes_AsStringAndSize()" provides a pointer to its value and its length. Note that Python bytes objects may contain null bytes so C’s "strlen()" should not be used. To test the type of an object, first make sure it isn’t "NULL", and then use "PyBytes_Check()", "PyTuple_Check()", "PyList_Check()", etc. There is also a high-level API to Python objects which is provided by the so-called ‘abstract’ interface – read "Include/abstract.h" for further details. It allows interfacing with any kind of Python sequence using calls like "PySequence_Length()", "PySequence_GetItem()", etc. as well as many other useful protocols such as numbers ("PyNumber_Index()" et al.) and mappings in the PyMapping APIs. How do I use Py_BuildValue() to create a tuple of arbitrary length? =================================================================== You can’t. Use "PyTuple_Pack()" instead. How do I call an object’s method from C? ======================================== The "PyObject_CallMethod()" function can be used to call an arbitrary method of an object. The parameters are the object, the name of the method to call, a format string like that used with "Py_BuildValue()", and the argument values: PyObject * PyObject_CallMethod(PyObject *object, const char *method_name, const char *arg_format, ...); This works for any object that has methods – whether built-in or user- defined. You are responsible for eventually "Py_DECREF()"’ing the return value. To call, e.g., a file object’s “seek” method with arguments 10, 0 (assuming the file object pointer is “f”): res = PyObject_CallMethod(f, "seek", "(ii)", 10, 0); if (res == NULL) { ... an exception occurred ... } else { Py_DECREF(res); } Note that since "PyObject_CallObject()" *always* wants a tuple for the argument list, to call a function without arguments, pass “()” for the format, and to call a function with one argument, surround the argument in parentheses, e.g. “(i)”. How do I catch the output from PyErr_Print() (or anything that prints to stdout/stderr)? ======================================================================================== In Python code, define an object that supports the "write()" method. Assign this object to "sys.stdout" and "sys.stderr". Call print_error, or just allow the standard traceback mechanism to work. Then, the output will go wherever your "write()" method sends it. The easiest way to do this is to use the "io.StringIO" class: >>> import io, sys >>> sys.stdout = io.StringIO() >>> print('foo') >>> print('hello world!') >>> sys.stderr.write(sys.stdout.getvalue()) foo hello world! A custom object to do the same would look like this: >>> import io, sys >>> class StdoutCatcher(io.TextIOBase): ... def __init__(self): ... self.data = [] ... def write(self, stuff): ... self.data.append(stuff) ... >>> import sys >>> sys.stdout = StdoutCatcher() >>> print('foo') >>> print('hello world!') >>> sys.stderr.write(''.join(sys.stdout.data)) foo hello world! How do I access a module written in Python from C? ================================================== You can get a pointer to the module object as follows: module = PyImport_ImportModule(""); If the module hasn’t been imported yet (i.e. it is not yet present in "sys.modules"), this initializes the module; otherwise it simply returns the value of "sys.modules[""]". Note that it doesn’t enter the module into any namespace – it only ensures it has been initialized and is stored in "sys.modules". You can then access the module’s attributes (i.e. any name defined in the module) as follows: attr = PyObject_GetAttrString(module, ""); Calling "PyObject_SetAttrString()" to assign to variables in the module also works. How do I interface to C++ objects from Python? ============================================== Depending on your requirements, there are many approaches. To do this manually, begin by reading the “Extending and Embedding” document. Realize that for the Python run-time system, there isn’t a whole lot of difference between C and C++ – so the strategy of building a new Python type around a C structure (pointer) type will also work for C++ objects. For C++ libraries, see Writing C is hard; are there any alternatives?. I added a module using the Setup file and the make fails; why? ============================================================== Setup must end in a newline, if there is no newline there, the build process fails. (Fixing this requires some ugly shell script hackery, and this bug is so minor that it doesn’t seem worth the effort.) How do I debug an extension? ============================ When using GDB with dynamically loaded extensions, you can’t set a breakpoint in your extension until your extension is loaded. In your ".gdbinit" file (or interactively), add the command: br _PyImport_LoadDynamicModule Then, when you run GDB: $ gdb /local/bin/python gdb) run myscript.py gdb) continue # repeat until your extension is loaded gdb) finish # so that your extension is loaded gdb) br myfunction.c:50 gdb) continue I want to compile a Python module on my Linux system, but some files are missing. Why? ====================================================================================== Most packaged versions of Python don’t include the "/usr/lib/python2.*x*/config/" directory, which contains various files required for compiling Python extensions. For Red Hat, install the python-devel RPM to get the necessary files. For Debian, run "apt-get install python-dev". How do I tell “incomplete input” from “invalid input”? ====================================================== Sometimes you want to emulate the Python interactive interpreter’s behavior, where it gives you a continuation prompt when the input is incomplete (e.g. you typed the start of an “if” statement or you didn’t close your parentheses or triple string quotes), but it gives you a syntax error message immediately when the input is invalid. In Python you can use the "codeop" module, which approximates the parser’s behavior sufficiently. IDLE uses this, for example. The easiest way to do it in C is to call "PyRun_InteractiveLoop()" (perhaps in a separate thread) and let the Python interpreter handle the input for you. You can also set the "PyOS_ReadlineFunctionPointer()" to point at your custom input function. See "Modules/readline.c" and "Parser/myreadline.c" for more hints. How do I find undefined g++ symbols __builtin_new or __pure_virtual? ==================================================================== To dynamically load g++ extension modules, you must recompile Python, relink it using g++ (change LINKCC in the Python Modules Makefile), and link your extension module using g++ (e.g., "g++ -shared -o mymodule.so mymodule.o"). Can I create an object class with some methods implemented in C and others in Python (e.g. through inheritance)? ================================================================================================================ Yes, you can inherit from built-in classes such as "int", "list", "dict", etc. The Boost Python Library (BPL, http://www.boost.org/libs/python/doc/index.html) provides a way of doing this from C++ (i.e. you can inherit from an extension class written in C++ using the BPL). General Python FAQ ****************** General Information =================== What is Python? --------------- Python is an interpreted, interactive, object-oriented programming language. It incorporates modules, exceptions, dynamic typing, very high level dynamic data types, and classes. It supports multiple programming paradigms beyond object-oriented programming, such as procedural and functional programming. Python combines remarkable power with very clear syntax. It has interfaces to many system calls and libraries, as well as to various window systems, and is extensible in C or C++. It is also usable as an extension language for applications that need a programmable interface. Finally, Python is portable: it runs on many Unix variants including Linux and macOS, and on Windows. To find out more, start with The Python Tutorial. The Beginner’s Guide to Python links to other introductory tutorials and resources for learning Python. What is the Python Software Foundation? --------------------------------------- The Python Software Foundation is an independent non-profit organization that holds the copyright on Python versions 2.1 and newer. The PSF’s mission is to advance open source technology related to the Python programming language and to publicize the use of Python. The PSF’s home page is at https://www.python.org/psf/. Donations to the PSF are tax-exempt in the US. If you use Python and find it helpful, please contribute via the PSF donation page. Are there copyright restrictions on the use of Python? ------------------------------------------------------ You can do anything you want with the source, as long as you leave the copyrights in and display those copyrights in any documentation about Python that you produce. If you honor the copyright rules, it’s OK to use Python for commercial use, to sell copies of Python in source or binary form (modified or unmodified), or to sell products that incorporate Python in some form. We would still like to know about all commercial use of Python, of course. See the PSF license page to find further explanations and a link to the full text of the license. The Python logo is trademarked, and in certain cases permission is required to use it. Consult the Trademark Usage Policy for more information. Why was Python created in the first place? ------------------------------------------ Here’s a *very* brief summary of what started it all, written by Guido van Rossum: I had extensive experience with implementing an interpreted language in the ABC group at CWI, and from working with this group I had learned a lot about language design. This is the origin of many Python features, including the use of indentation for statement grouping and the inclusion of very-high-level data types (although the details are all different in Python). I had a number of gripes about the ABC language, but also liked many of its features. It was impossible to extend the ABC language (or its implementation) to remedy my complaints – in fact its lack of extensibility was one of its biggest problems. I had some experience with using Modula-2+ and talked with the designers of Modula-3 and read the Modula-3 report. Modula-3 is the origin of the syntax and semantics used for exceptions, and some other Python features. I was working in the Amoeba distributed operating system group at CWI. We needed a better way to do system administration than by writing either C programs or Bourne shell scripts, since Amoeba had its own system call interface which wasn’t easily accessible from the Bourne shell. My experience with error handling in Amoeba made me acutely aware of the importance of exceptions as a programming language feature. It occurred to me that a scripting language with a syntax like ABC but with access to the Amoeba system calls would fill the need. I realized that it would be foolish to write an Amoeba-specific language, so I decided that I needed a language that was generally extensible. During the 1989 Christmas holidays, I had a lot of time on my hand, so I decided to give it a try. During the next year, while still mostly working on it in my own time, Python was used in the Amoeba project with increasing success, and the feedback from colleagues made me add many early improvements. In February 1991, after just over a year of development, I decided to post to USENET. The rest is in the "Misc/HISTORY" file. What is Python good for? ------------------------ Python is a high-level general-purpose programming language that can be applied to many different classes of problems. The language comes with a large standard library that covers areas such as string processing (regular expressions, Unicode, calculating differences between files), Internet protocols (HTTP, FTP, SMTP, XML- RPC, POP, IMAP, CGI programming), software engineering (unit testing, logging, profiling, parsing Python code), and operating system interfaces (system calls, filesystems, TCP/IP sockets). Look at the table of contents for The Python Standard Library to get an idea of what’s available. A wide variety of third-party extensions are also available. Consult the Python Package Index to find packages of interest to you. How does the Python version numbering scheme work? -------------------------------------------------- Python versions are numbered A.B.C or A.B. A is the major version number – it is only incremented for really major changes in the language. B is the minor version number, incremented for less earth- shattering changes. C is the micro-level – it is incremented for each bugfix release. See **PEP 6** for more information about bugfix releases. Not all releases are bugfix releases. In the run-up to a new major release, a series of development releases are made, denoted as alpha, beta, or release candidate. Alphas are early releases in which interfaces aren’t yet finalized; it’s not unexpected to see an interface change between two alpha releases. Betas are more stable, preserving existing interfaces but possibly adding new modules, and release candidates are frozen, making no changes except as needed to fix critical bugs. Alpha, beta and release candidate versions have an additional suffix. The suffix for an alpha version is “aN” for some small number N, the suffix for a beta version is “bN” for some small number N, and the suffix for a release candidate version is “rcN” for some small number N. In other words, all versions labeled 2.0aN precede the versions labeled 2.0bN, which precede versions labeled 2.0rcN, and *those* precede 2.0. You may also find version numbers with a “+” suffix, e.g. “2.2+”. These are unreleased versions, built directly from the CPython development repository. In practice, after a final minor release is made, the version is incremented to the next minor version, which becomes the “a0” version, e.g. “2.4a0”. See also the documentation for "sys.version", "sys.hexversion", and "sys.version_info". How do I obtain a copy of the Python source? -------------------------------------------- The latest Python source distribution is always available from python.org, at https://www.python.org/downloads/. The latest development sources can be obtained at https://github.com/python/cpython/. The source distribution is a gzipped tar file containing the complete C source, Sphinx-formatted documentation, Python library modules, example programs, and several useful pieces of freely distributable software. The source will compile and run out of the box on most UNIX platforms. Consult the Getting Started section of the Python Developer’s Guide for more information on getting the source code and compiling it. How do I get documentation on Python? ------------------------------------- The standard documentation for the current stable version of Python is available at https://docs.python.org/3/. PDF, plain text, and downloadable HTML versions are also available at https://docs.python.org/3/download.html. The documentation is written in reStructuredText and processed by the Sphinx documentation tool. The reStructuredText source for the documentation is part of the Python source distribution. I’ve never programmed before. Is there a Python tutorial? --------------------------------------------------------- There are numerous tutorials and books available. The standard documentation includes The Python Tutorial. Consult the Beginner’s Guide to find information for beginning Python programmers, including lists of tutorials. Is there a newsgroup or mailing list devoted to Python? ------------------------------------------------------- There is a newsgroup, *comp.lang.python*, and a mailing list, python- list. The newsgroup and mailing list are gatewayed into each other – if you can read news it’s unnecessary to subscribe to the mailing list. *comp.lang.python* is high-traffic, receiving hundreds of postings every day, and Usenet readers are often more able to cope with this volume. Announcements of new software releases and events can be found in comp.lang.python.announce, a low-traffic moderated list that receives about five postings per day. It’s available as the python-announce mailing list. More info about other mailing lists and newsgroups can be found at https://www.python.org/community/lists/. How do I get a beta test version of Python? ------------------------------------------- Alpha and beta releases are available from https://www.python.org/downloads/. All releases are announced on the comp.lang.python and comp.lang.python.announce newsgroups and on the Python home page at https://www.python.org/; an RSS feed of news is available. You can also access the development version of Python through Git. See The Python Developer’s Guide for details. How do I submit bug reports and patches for Python? --------------------------------------------------- To report a bug or submit a patch, please use the Roundup installation at https://bugs.python.org/. You must have a Roundup account to report bugs; this makes it possible for us to contact you if we have follow-up questions. It will also enable Roundup to send you updates as we act on your bug. If you had previously used SourceForge to report bugs to Python, you can obtain your Roundup password through Roundup’s password reset procedure. For more information on how Python is developed, consult the Python Developer’s Guide. Are there any published articles about Python that I can reference? ------------------------------------------------------------------- It’s probably best to cite your favorite book about Python. The very first article about Python was written in 1991 and is now quite outdated. Guido van Rossum and Jelke de Boer, “Interactively Testing Remote Servers Using the Python Programming Language”, CWI Quarterly, Volume 4, Issue 4 (December 1991), Amsterdam, pp 283–303. Are there any books on Python? ------------------------------ Yes, there are many, and more are being published. See the python.org wiki at https://wiki.python.org/moin/PythonBooks for a list. You can also search online bookstores for “Python” and filter out the Monty Python references; or perhaps search for “Python” and “language”. Where in the world is www.python.org located? --------------------------------------------- The Python project’s infrastructure is located all over the world and is managed by the Python Infrastructure Team. Details here. Why is it called Python? ------------------------ When he began implementing Python, Guido van Rossum was also reading the published scripts from “Monty Python’s Flying Circus”, a BBC comedy series from the 1970s. Van Rossum thought he needed a name that was short, unique, and slightly mysterious, so he decided to call the language Python. Do I have to like “Monty Python’s Flying Circus”? ------------------------------------------------- No, but it helps. :) Python in the real world ======================== How stable is Python? --------------------- Very stable. New, stable releases have been coming out roughly every 6 to 18 months since 1991, and this seems likely to continue. As of version 3.9, Python will have a major new release every 12 months (**PEP 602**). The developers issue “bugfix” releases of older versions, so the stability of existing releases gradually improves. Bugfix releases, indicated by a third component of the version number (e.g. 3.5.3, 3.6.2), are managed for stability; only fixes for known problems are included in a bugfix release, and it’s guaranteed that interfaces will remain the same throughout a series of bugfix releases. The latest stable releases can always be found on the Python download page. There are two production-ready versions of Python: 2.x and 3.x. The recommended version is 3.x, which is supported by most widely used libraries. Although 2.x is still widely used, it is not maintained anymore. How many people are using Python? --------------------------------- There are probably millions of users, though it’s difficult to obtain an exact count. Python is available for free download, so there are no sales figures, and it’s available from many different sites and packaged with many Linux distributions, so download statistics don’t tell the whole story either. The comp.lang.python newsgroup is very active, but not all Python users post to the group or even read it. Have any significant projects been done in Python? -------------------------------------------------- See https://www.python.org/about/success for a list of projects that use Python. Consulting the proceedings for past Python conferences will reveal contributions from many different companies and organizations. High-profile Python projects include the Mailman mailing list manager and the Zope application server. Several Linux distributions, most notably Red Hat, have written part or all of their installer and system administration software in Python. Companies that use Python internally include Google, Yahoo, and Lucasfilm Ltd. What new developments are expected for Python in the future? ------------------------------------------------------------ See https://www.python.org/dev/peps/ for the Python Enhancement Proposals (PEPs). PEPs are design documents describing a suggested new feature for Python, providing a concise technical specification and a rationale. Look for a PEP titled “Python X.Y Release Schedule”, where X.Y is a version that hasn’t been publicly released yet. New development is discussed on the python-dev mailing list. Is it reasonable to propose incompatible changes to Python? ----------------------------------------------------------- In general, no. There are already millions of lines of Python code around the world, so any change in the language that invalidates more than a very small fraction of existing programs has to be frowned upon. Even if you can provide a conversion program, there’s still the problem of updating all documentation; many books have been written about Python, and we don’t want to invalidate them all at a single stroke. Providing a gradual upgrade path is necessary if a feature has to be changed. **PEP 5** describes the procedure followed for introducing backward-incompatible changes while minimizing disruption for users. Is Python a good language for beginning programmers? ---------------------------------------------------- Yes. It is still common to start students with a procedural and statically typed language such as Pascal, C, or a subset of C++ or Java. Students may be better served by learning Python as their first language. Python has a very simple and consistent syntax and a large standard library and, most importantly, using Python in a beginning programming course lets students concentrate on important programming skills such as problem decomposition and data type design. With Python, students can be quickly introduced to basic concepts such as loops and procedures. They can probably even work with user-defined objects in their very first course. For a student who has never programmed before, using a statically typed language seems unnatural. It presents additional complexity that the student must master and slows the pace of the course. The students are trying to learn to think like a computer, decompose problems, design consistent interfaces, and encapsulate data. While learning to use a statically typed language is important in the long term, it is not necessarily the best topic to address in the students’ first programming course. Many other aspects of Python make it a good first language. Like Java, Python has a large standard library so that students can be assigned programming projects very early in the course that *do* something. Assignments aren’t restricted to the standard four- function calculator and check balancing programs. By using the standard library, students can gain the satisfaction of working on realistic applications as they learn the fundamentals of programming. Using the standard library also teaches students about code reuse. Third-party modules such as PyGame are also helpful in extending the students’ reach. Python’s interactive interpreter enables students to test language features while they’re programming. They can keep a window with the interpreter running while they enter their program’s source in another window. If they can’t remember the methods for a list, they can do something like this: >>> L = [] >>> dir(L) ['__add__', '__class__', '__contains__', '__delattr__', '__delitem__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__', '__iter__', '__le__', '__len__', '__lt__', '__mul__', '__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__', '__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__', '__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort'] >>> [d for d in dir(L) if '__' not in d] ['append', 'clear', 'copy', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort'] >>> help(L.append) Help on built-in function append: append(...) L.append(object) -> None -- append object to end >>> L.append(1) >>> L [1] With the interpreter, documentation is never far from the student as they are programming. There are also good IDEs for Python. IDLE is a cross-platform IDE for Python that is written in Python using Tkinter. PythonWin is a Windows-specific IDE. Emacs users will be happy to know that there is a very good Python mode for Emacs. All of these programming environments provide syntax highlighting, auto-indenting, and access to the interactive interpreter while coding. Consult the Python wiki for a full list of Python editing environments. If you want to discuss Python’s use in education, you may be interested in joining the edu-sig mailing list. Graphic User Interface FAQ ************************** General GUI Questions ===================== What GUI toolkits exist for Python? =================================== Standard builds of Python include an object-oriented interface to the Tcl/Tk widget set, called tkinter. This is probably the easiest to install (since it comes included with most binary distributions of Python) and use. For more info about Tk, including pointers to the source, see the Tcl/Tk home page. Tcl/Tk is fully portable to the macOS, Windows, and Unix platforms. Depending on what platform(s) you are aiming at, there are also several alternatives. A list of cross-platform and platform-specific GUI frameworks can be found on the python wiki. Tkinter questions ================= How do I freeze Tkinter applications? ------------------------------------- Freeze is a tool to create stand-alone applications. When freezing Tkinter applications, the applications will not be truly stand-alone, as the application will still need the Tcl and Tk libraries. One solution is to ship the application with the Tcl and Tk libraries, and point to them at run-time using the "TCL_LIBRARY" and "TK_LIBRARY" environment variables. To get truly stand-alone applications, the Tcl scripts that form the library have to be integrated into the application as well. One tool supporting that is SAM (stand-alone modules), which is part of the Tix distribution (http://tix.sourceforge.net/). Build Tix with SAM enabled, perform the appropriate call to "Tclsam_init()", etc. inside Python’s "Modules/tkappinit.c", and link with libtclsam and libtksam (you might include the Tix libraries as well). Can I have Tk events handled while waiting for I/O? --------------------------------------------------- On platforms other than Windows, yes, and you don’t even need threads! But you’ll have to restructure your I/O code a bit. Tk has the equivalent of Xt’s "XtAddInput()" call, which allows you to register a callback function which will be called from the Tk mainloop when I/O is possible on a file descriptor. See File Handlers. I can’t get key bindings to work in Tkinter: why? ------------------------------------------------- An often-heard complaint is that event handlers bound to events with the "bind()" method don’t get handled even when the appropriate key is pressed. The most common cause is that the widget to which the binding applies doesn’t have “keyboard focus”. Check out the Tk documentation for the focus command. Usually a widget is given the keyboard focus by clicking in it (but not for labels; see the takefocus option). Python Frequently Asked Questions ********************************* * General Python FAQ * Programming FAQ * Design and History FAQ * Library and Extension FAQ * Extending/Embedding FAQ * Python on Windows FAQ * Graphic User Interface FAQ * “Why is Python Installed on my Computer?” FAQ “Why is Python Installed on my Computer?” FAQ ********************************************* What is Python? =============== Python is a programming language. It’s used for many different applications. It’s used in some high schools and colleges as an introductory programming language because Python is easy to learn, but it’s also used by professional software developers at places such as Google, NASA, and Lucasfilm Ltd. If you wish to learn more about Python, start with the Beginner’s Guide to Python. Why is Python installed on my machine? ====================================== If you find Python installed on your system but don’t remember installing it, there are several possible ways it could have gotten there. * Perhaps another user on the computer wanted to learn programming and installed it; you’ll have to figure out who’s been using the machine and might have installed it. * A third-party application installed on the machine might have been written in Python and included a Python installation. There are many such applications, from GUI programs to network servers and administrative scripts. * Some Windows machines also have Python installed. At this writing we’re aware of computers from Hewlett-Packard and Compaq that include Python. Apparently some of HP/Compaq’s administrative tools are written in Python. * Many Unix-compatible operating systems, such as macOS and some Linux distributions, have Python installed by default; it’s included in the base installation. Can I delete Python? ==================== That depends on where Python came from. If someone installed it deliberately, you can remove it without hurting anything. On Windows, use the Add/Remove Programs icon in the Control Panel. If Python was installed by a third-party application, you can also remove it, but that application will no longer work. You should use that application’s uninstaller rather than removing Python directly. If Python came with your operating system, removing it is not recommended. If you remove it, whatever tools were written in Python will no longer run, and some of them might be important to you. Reinstalling the whole system would then be required to fix things again. Library and Extension FAQ ************************* General Library Questions ========================= How do I find a module or application to perform task X? -------------------------------------------------------- Check the Library Reference to see if there’s a relevant standard library module. (Eventually you’ll learn what’s in the standard library and will be able to skip this step.) For third-party packages, search the Python Package Index or try Google or another Web search engine. Searching for “Python” plus a keyword or two for your topic of interest will usually find something helpful. Where is the math.py (socket.py, regex.py, etc.) source file? ------------------------------------------------------------- If you can’t find a source file for a module it may be a built-in or dynamically loaded module implemented in C, C++ or other compiled language. In this case you may not have the source file or it may be something like "mathmodule.c", somewhere in a C source directory (not on the Python Path). There are (at least) three kinds of modules in Python: 1. modules written in Python (.py); 2. modules written in C and dynamically loaded (.dll, .pyd, .so, .sl, etc); 3. modules written in C and linked with the interpreter; to get a list of these, type: import sys print(sys.builtin_module_names) How do I make a Python script executable on Unix? ------------------------------------------------- You need to do two things: the script file’s mode must be executable and the first line must begin with "#!" followed by the path of the Python interpreter. The first is done by executing "chmod +x scriptfile" or perhaps "chmod 755 scriptfile". The second can be done in a number of ways. The most straightforward way is to write #!/usr/local/bin/python as the very first line of your file, using the pathname for where the Python interpreter is installed on your platform. If you would like the script to be independent of where the Python interpreter lives, you can use the **env** program. Almost all Unix variants support the following, assuming the Python interpreter is in a directory on the user’s "PATH": #!/usr/bin/env python *Don’t* do this for CGI scripts. The "PATH" variable for CGI scripts is often very minimal, so you need to use the actual absolute pathname of the interpreter. Occasionally, a user’s environment is so full that the **/usr/bin/env** program fails; or there’s no env program at all. In that case, you can try the following hack (due to Alex Rezinsky): #! /bin/sh """:" exec python $0 ${1+"$@"} """ The minor disadvantage is that this defines the script’s __doc__ string. However, you can fix that by adding __doc__ = """...Whatever...""" Is there a curses/termcap package for Python? --------------------------------------------- For Unix variants: The standard Python source distribution comes with a curses module in the Modules subdirectory, though it’s not compiled by default. (Note that this is not available in the Windows distribution – there is no curses module for Windows.) The "curses" module supports basic curses features as well as many additional functions from ncurses and SYSV curses such as colour, alternative character set support, pads, and mouse support. This means the module isn’t compatible with operating systems that only have BSD curses, but there don’t seem to be any currently maintained OSes that fall into this category. Is there an equivalent to C’s onexit() in Python? ------------------------------------------------- The "atexit" module provides a register function that is similar to C’s "onexit()". Why don’t my signal handlers work? ---------------------------------- The most common problem is that the signal handler is declared with the wrong argument list. It is called as handler(signum, frame) so it should be declared with two parameters: def handler(signum, frame): ... Common tasks ============ How do I test a Python program or component? -------------------------------------------- Python comes with two testing frameworks. The "doctest" module finds examples in the docstrings for a module and runs them, comparing the output with the expected output given in the docstring. The "unittest" module is a fancier testing framework modelled on Java and Smalltalk testing frameworks. To make testing easier, you should use good modular design in your program. Your program should have almost all functionality encapsulated in either functions or class methods – and this sometimes has the surprising and delightful effect of making the program run faster (because local variable accesses are faster than global accesses). Furthermore the program should avoid depending on mutating global variables, since this makes testing much more difficult to do. The “global main logic” of your program may be as simple as if __name__ == "__main__": main_logic() at the bottom of the main module of your program. Once your program is organized as a tractable collection of function and class behaviours, you should write test functions that exercise the behaviours. A test suite that automates a sequence of tests can be associated with each module. This sounds like a lot of work, but since Python is so terse and flexible it’s surprisingly easy. You can make coding much more pleasant and fun by writing your test functions in parallel with the “production code”, since this makes it easy to find bugs and even design flaws earlier. “Support modules” that are not intended to be the main module of a program may include a self-test of the module. if __name__ == "__main__": self_test() Even programs that interact with complex external interfaces may be tested when the external interfaces are unavailable by using “fake” interfaces implemented in Python. How do I create documentation from doc strings? ----------------------------------------------- The "pydoc" module can create HTML from the doc strings in your Python source code. An alternative for creating API documentation purely from docstrings is epydoc. Sphinx can also include docstring content. How do I get a single keypress at a time? ----------------------------------------- For Unix variants there are several solutions. It’s straightforward to do this using curses, but curses is a fairly large module to learn. Threads ======= How do I program using threads? ------------------------------- Be sure to use the "threading" module and not the "_thread" module. The "threading" module builds convenient abstractions on top of the low-level primitives provided by the "_thread" module. None of my threads seem to run: why? ------------------------------------ As soon as the main thread exits, all threads are killed. Your main thread is running too quickly, giving the threads no time to do any work. A simple fix is to add a sleep to the end of the program that’s long enough for all the threads to finish: import threading, time def thread_task(name, n): for i in range(n): print(name, i) for i in range(10): T = threading.Thread(target=thread_task, args=(str(i), i)) T.start() time.sleep(10) # <---------------------------! But now (on many platforms) the threads don’t run in parallel, but appear to run sequentially, one at a time! The reason is that the OS thread scheduler doesn’t start a new thread until the previous thread is blocked. A simple fix is to add a tiny sleep to the start of the run function: def thread_task(name, n): time.sleep(0.001) # <--------------------! for i in range(n): print(name, i) for i in range(10): T = threading.Thread(target=thread_task, args=(str(i), i)) T.start() time.sleep(10) Instead of trying to guess a good delay value for "time.sleep()", it’s better to use some kind of semaphore mechanism. One idea is to use the "queue" module to create a queue object, let each thread append a token to the queue when it finishes, and let the main thread read as many tokens from the queue as there are threads. How do I parcel out work among a bunch of worker threads? --------------------------------------------------------- The easiest way is to use the "concurrent.futures" module, especially the "ThreadPoolExecutor" class. Or, if you want fine control over the dispatching algorithm, you can write your own logic manually. Use the "queue" module to create a queue containing a list of jobs. The "Queue" class maintains a list of objects and has a ".put(obj)" method that adds items to the queue and a ".get()" method to return them. The class will take care of the locking necessary to ensure that each job is handed out exactly once. Here’s a trivial example: import threading, queue, time # The worker thread gets jobs off the queue. When the queue is empty, it # assumes there will be no more work and exits. # (Realistically workers will run until terminated.) def worker(): print('Running worker') time.sleep(0.1) while True: try: arg = q.get(block=False) except queue.Empty: print('Worker', threading.currentThread(), end=' ') print('queue empty') break else: print('Worker', threading.currentThread(), end=' ') print('running with argument', arg) time.sleep(0.5) # Create queue q = queue.Queue() # Start a pool of 5 workers for i in range(5): t = threading.Thread(target=worker, name='worker %i' % (i+1)) t.start() # Begin adding work to the queue for i in range(50): q.put(i) # Give threads time to run print('Main thread sleeping') time.sleep(5) When run, this will produce the following output: Running worker Running worker Running worker Running worker Running worker Main thread sleeping Worker running with argument 0 Worker running with argument 1 Worker running with argument 2 Worker running with argument 3 Worker running with argument 4 Worker running with argument 5 ... Consult the module’s documentation for more details; the "Queue" class provides a featureful interface. What kinds of global value mutation are thread-safe? ---------------------------------------------------- A *global interpreter lock* (GIL) is used internally to ensure that only one thread runs in the Python VM at a time. In general, Python offers to switch among threads only between bytecode instructions; how frequently it switches can be set via "sys.setswitchinterval()". Each bytecode instruction and therefore all the C implementation code reached from each instruction is therefore atomic from the point of view of a Python program. In theory, this means an exact accounting requires an exact understanding of the PVM bytecode implementation. In practice, it means that operations on shared variables of built-in data types (ints, lists, dicts, etc) that “look atomic” really are. For example, the following operations are all atomic (L, L1, L2 are lists, D, D1, D2 are dicts, x, y are objects, i, j are ints): L.append(x) L1.extend(L2) x = L[i] x = L.pop() L1[i:j] = L2 L.sort() x = y x.field = y D[x] = y D1.update(D2) D.keys() These aren’t: i = i+1 L.append(L[-1]) L[i] = L[j] D[x] = D[x] + 1 Operations that replace other objects may invoke those other objects’ "__del__()" method when their reference count reaches zero, and that can affect things. This is especially true for the mass updates to dictionaries and lists. When in doubt, use a mutex! Can’t we get rid of the Global Interpreter Lock? ------------------------------------------------ The *global interpreter lock* (GIL) is often seen as a hindrance to Python’s deployment on high-end multiprocessor server machines, because a multi-threaded Python program effectively only uses one CPU, due to the insistence that (almost) all Python code can only run while the GIL is held. Back in the days of Python 1.5, Greg Stein actually implemented a comprehensive patch set (the “free threading” patches) that removed the GIL and replaced it with fine-grained locking. Adam Olsen recently did a similar experiment in his python-safethread project. Unfortunately, both experiments exhibited a sharp drop in single- thread performance (at least 30% slower), due to the amount of fine- grained locking necessary to compensate for the removal of the GIL. This doesn’t mean that you can’t make good use of Python on multi-CPU machines! You just have to be creative with dividing the work up between multiple *processes* rather than multiple *threads*. The "ProcessPoolExecutor" class in the new "concurrent.futures" module provides an easy way of doing so; the "multiprocessing" module provides a lower-level API in case you want more control over dispatching of tasks. Judicious use of C extensions will also help; if you use a C extension to perform a time-consuming task, the extension can release the GIL while the thread of execution is in the C code and allow other threads to get some work done. Some standard library modules such as "zlib" and "hashlib" already do this. It has been suggested that the GIL should be a per-interpreter-state lock rather than truly global; interpreters then wouldn’t be able to share objects. Unfortunately, this isn’t likely to happen either. It would be a tremendous amount of work, because many object implementations currently have global state. For example, small integers and short strings are cached; these caches would have to be moved to the interpreter state. Other object types have their own free list; these free lists would have to be moved to the interpreter state. And so on. And I doubt that it can even be done in finite time, because the same problem exists for 3rd party extensions. It is likely that 3rd party extensions are being written at a faster rate than you can convert them to store all their global state in the interpreter state. And finally, once you have multiple interpreters not sharing any state, what have you gained over running each interpreter in a separate process? Input and Output ================ How do I delete a file? (And other file questions…) --------------------------------------------------- Use "os.remove(filename)" or "os.unlink(filename)"; for documentation, see the "os" module. The two functions are identical; "unlink()" is simply the name of the Unix system call for this function. To remove a directory, use "os.rmdir()"; use "os.mkdir()" to create one. "os.makedirs(path)" will create any intermediate directories in "path" that don’t exist. "os.removedirs(path)" will remove intermediate directories as long as they’re empty; if you want to delete an entire directory tree and its contents, use "shutil.rmtree()". To rename a file, use "os.rename(old_path, new_path)". To truncate a file, open it using "f = open(filename, "rb+")", and use "f.truncate(offset)"; offset defaults to the current seek position. There’s also "os.ftruncate(fd, offset)" for files opened with "os.open()", where *fd* is the file descriptor (a small integer). The "shutil" module also contains a number of functions to work on files including "copyfile()", "copytree()", and "rmtree()". How do I copy a file? --------------------- The "shutil" module contains a "copyfile()" function. Note that on MacOS 9 it doesn’t copy the resource fork and Finder info. How do I read (or write) binary data? ------------------------------------- To read or write complex binary data formats, it’s best to use the "struct" module. It allows you to take a string containing binary data (usually numbers) and convert it to Python objects; and vice versa. For example, the following code reads two 2-byte integers and one 4-byte integer in big-endian format from a file: import struct with open(filename, "rb") as f: s = f.read(8) x, y, z = struct.unpack(">hhl", s) The ‘>’ in the format string forces big-endian data; the letter ‘h’ reads one “short integer” (2 bytes), and ‘l’ reads one “long integer” (4 bytes) from the string. For data that is more regular (e.g. a homogeneous list of ints or floats), you can also use the "array" module. Note: To read and write binary data, it is mandatory to open the file in binary mode (here, passing ""rb"" to "open()"). If you use ""r"" instead (the default), the file will be open in text mode and "f.read()" will return "str" objects rather than "bytes" objects. I can’t seem to use os.read() on a pipe created with os.popen(); why? --------------------------------------------------------------------- "os.read()" is a low-level function which takes a file descriptor, a small integer representing the opened file. "os.popen()" creates a high-level file object, the same type returned by the built-in "open()" function. Thus, to read *n* bytes from a pipe *p* created with "os.popen()", you need to use "p.read(n)". How do I access the serial (RS232) port? ---------------------------------------- For Win32, OSX, Linux, BSD, Jython, IronPython: https://pypi.org/project/pyserial/ For Unix, see a Usenet post by Mitch Chapman: https://groups.google.com/groups?selm=34A04430.CF9@ohioee.com Why doesn’t closing sys.stdout (stdin, stderr) really close it? --------------------------------------------------------------- Python *file objects* are a high-level layer of abstraction on low- level C file descriptors. For most file objects you create in Python via the built-in "open()" function, "f.close()" marks the Python file object as being closed from Python’s point of view, and also arranges to close the underlying C file descriptor. This also happens automatically in "f"’s destructor, when "f" becomes garbage. But stdin, stdout and stderr are treated specially by Python, because of the special status also given to them by C. Running "sys.stdout.close()" marks the Python-level file object as being closed, but does *not* close the associated C file descriptor. To close the underlying C file descriptor for one of these three, you should first be sure that’s what you really want to do (e.g., you may confuse extension modules trying to do I/O). If it is, use "os.close()": os.close(stdin.fileno()) os.close(stdout.fileno()) os.close(stderr.fileno()) Or you can use the numeric constants 0, 1 and 2, respectively. Network/Internet Programming ============================ What WWW tools are there for Python? ------------------------------------ See the chapters titled Internet Protocols and Support and Internet Data Handling in the Library Reference Manual. Python has many modules that will help you build server-side and client-side web systems. A summary of available frameworks is maintained by Paul Boddie at https://wiki.python.org/moin/WebProgramming. Cameron Laird maintains a useful set of pages about Python web technologies at http://phaseit.net/claird/comp.lang.python/web_python. How can I mimic CGI form submission (METHOD=POST)? -------------------------------------------------- I would like to retrieve web pages that are the result of POSTing a form. Is there existing code that would let me do this easily? Yes. Here’s a simple example that uses "urllib.request": #!/usr/local/bin/python import urllib.request # build the query string qs = "First=Josephine&MI=Q&Last=Public" # connect and send the server a path req = urllib.request.urlopen('http://www.some-server.out-there' '/cgi-bin/some-cgi-script', data=qs) with req: msg, hdrs = req.read(), req.info() Note that in general for percent-encoded POST operations, query strings must be quoted using "urllib.parse.urlencode()". For example, to send "name=Guy Steele, Jr.": >>> import urllib.parse >>> urllib.parse.urlencode({'name': 'Guy Steele, Jr.'}) 'name=Guy+Steele%2C+Jr.' See also: HOWTO Fetch Internet Resources Using The urllib Package for extensive examples. What module should I use to help with generating HTML? ------------------------------------------------------ You can find a collection of useful links on the Web Programming wiki page. How do I send mail from a Python script? ---------------------------------------- Use the standard library module "smtplib". Here’s a very simple interactive mail sender that uses it. This method will work on any host that supports an SMTP listener. import sys, smtplib fromaddr = input("From: ") toaddrs = input("To: ").split(',') print("Enter message, end with ^D:") msg = '' while True: line = sys.stdin.readline() if not line: break msg += line # The actual mail send server = smtplib.SMTP('localhost') server.sendmail(fromaddr, toaddrs, msg) server.quit() A Unix-only alternative uses sendmail. The location of the sendmail program varies between systems; sometimes it is "/usr/lib/sendmail", sometimes "/usr/sbin/sendmail". The sendmail manual page will help you out. Here’s some sample code: import os SENDMAIL = "/usr/sbin/sendmail" # sendmail location p = os.popen("%s -t -i" % SENDMAIL, "w") p.write("To: receiver@example.com\n") p.write("Subject: test\n") p.write("\n") # blank line separating headers from body p.write("Some text\n") p.write("some more text\n") sts = p.close() if sts != 0: print("Sendmail exit status", sts) How do I avoid blocking in the connect() method of a socket? ------------------------------------------------------------ The "select" module is commonly used to help with asynchronous I/O on sockets. To prevent the TCP connect from blocking, you can set the socket to non-blocking mode. Then when you do the "socket.connect()", you will either connect immediately (unlikely) or get an exception that contains the error number as ".errno". "errno.EINPROGRESS" indicates that the connection is in progress, but hasn’t finished yet. Different OSes will return different values, so you’re going to have to check what’s returned on your system. You can use the "socket.connect_ex()" method to avoid creating an exception. It will just return the errno value. To poll, you can call "socket.connect_ex()" again later – "0" or "errno.EISCONN" indicate that you’re connected – or you can pass this socket to "select.select()" to check if it’s writable. Note: The "asyncio" module provides a general purpose single-threaded and concurrent asynchronous library, which can be used for writing non- blocking network code. The third-party Twisted library is a popular and feature-rich alternative. Databases ========= Are there any interfaces to database packages in Python? -------------------------------------------------------- Yes. Interfaces to disk-based hashes such as "DBM" and "GDBM" are also included with standard Python. There is also the "sqlite3" module, which provides a lightweight disk-based relational database. Support for most relational databases is available. See the DatabaseProgramming wiki page for details. How do you implement persistent objects in Python? -------------------------------------------------- The "pickle" library module solves this in a very general way (though you still can’t store things like open files, sockets or windows), and the "shelve" library module uses pickle and (g)dbm to create persistent mappings containing arbitrary Python objects. Mathematics and Numerics ======================== How do I generate random numbers in Python? ------------------------------------------- The standard module "random" implements a random number generator. Usage is simple: import random random.random() This returns a random floating point number in the range [0, 1). There are also many other specialized generators in this module, such as: * "randrange(a, b)" chooses an integer in the range [a, b). * "uniform(a, b)" chooses a floating point number in the range [a, b). * "normalvariate(mean, sdev)" samples the normal (Gaussian) distribution. Some higher-level functions operate on sequences directly, such as: * "choice(S)" chooses a random element from a given sequence. * "shuffle(L)" shuffles a list in-place, i.e. permutes it randomly. There’s also a "Random" class you can instantiate to create independent multiple random number generators. Programming FAQ *************** General Questions ================= Is there a source code level debugger with breakpoints, single-stepping, etc.? ------------------------------------------------------------------------------ Yes. Several debuggers for Python are described below, and the built-in function "breakpoint()" allows you to drop into any of them. The pdb module is a simple but adequate console-mode debugger for Python. It is part of the standard Python library, and is "documented in the Library Reference Manual". You can also write your own debugger by using the code for pdb as an example. The IDLE interactive development environment, which is part of the standard Python distribution (normally available as Tools/scripts/idle), includes a graphical debugger. PythonWin is a Python IDE that includes a GUI debugger based on pdb. The PythonWin debugger colors breakpoints and has quite a few cool features such as debugging non-PythonWin programs. PythonWin is available as part of pywin32 project and as a part of the ActivePython distribution. Eric is an IDE built on PyQt and the Scintilla editing component. trepan3k is a gdb-like debugger. Visual Studio Code is an IDE with debugging tools that integrates with version-control software. There are a number of commercial Python IDEs that include graphical debuggers. They include: * Wing IDE * Komodo IDE * PyCharm Are there tools to help find bugs or perform static analysis? ------------------------------------------------------------- Yes. Pylint and Pyflakes do basic checking that will help you catch bugs sooner. Static type checkers such as Mypy, Pyre, and Pytype can check type hints in Python source code. How can I create a stand-alone binary from a Python script? ----------------------------------------------------------- You don’t need the ability to compile Python to C code if all you want is a stand-alone program that users can download and run without having to install the Python distribution first. There are a number of tools that determine the set of modules required by a program and bind these modules together with a Python binary to produce a single executable. One is to use the freeze tool, which is included in the Python source tree as "Tools/freeze". It converts Python byte code to C arrays; with a C compiler you can embed all your modules into a new program, which is then linked with the standard Python modules. It works by scanning your source recursively for import statements (in both forms) and looking for the modules in the standard Python path as well as in the source directory (for built-in modules). It then turns the bytecode for modules written in Python into C code (array initializers that can be turned into code objects using the marshal module) and creates a custom-made config file that only contains those built-in modules which are actually used in the program. It then compiles the generated C code and links it with the rest of the Python interpreter to form a self-contained binary which acts exactly like your script. The following packages can help with the creation of console and GUI executables: * Nuitka (Cross-platform) * PyInstaller (Cross-platform) * PyOxidizer (Cross-platform) * cx_Freeze (Cross-platform) * py2app (macOS only) * py2exe (Windows only) Are there coding standards or a style guide for Python programs? ---------------------------------------------------------------- Yes. The coding style required for standard library modules is documented as **PEP 8**. Core Language ============= Why am I getting an UnboundLocalError when the variable has a value? -------------------------------------------------------------------- It can be a surprise to get the UnboundLocalError in previously working code when it is modified by adding an assignment statement somewhere in the body of a function. This code: >>> x = 10 >>> def bar(): ... print(x) >>> bar() 10 works, but this code: >>> x = 10 >>> def foo(): ... print(x) ... x += 1 results in an UnboundLocalError: >>> foo() Traceback (most recent call last): ... UnboundLocalError: local variable 'x' referenced before assignment This is because when you make an assignment to a variable in a scope, that variable becomes local to that scope and shadows any similarly named variable in the outer scope. Since the last statement in foo assigns a new value to "x", the compiler recognizes it as a local variable. Consequently when the earlier "print(x)" attempts to print the uninitialized local variable and an error results. In the example above you can access the outer scope variable by declaring it global: >>> x = 10 >>> def foobar(): ... global x ... print(x) ... x += 1 >>> foobar() 10 This explicit declaration is required in order to remind you that (unlike the superficially analogous situation with class and instance variables) you are actually modifying the value of the variable in the outer scope: >>> print(x) 11 You can do a similar thing in a nested scope using the "nonlocal" keyword: >>> def foo(): ... x = 10 ... def bar(): ... nonlocal x ... print(x) ... x += 1 ... bar() ... print(x) >>> foo() 10 11 What are the rules for local and global variables in Python? ------------------------------------------------------------ In Python, variables that are only referenced inside a function are implicitly global. If a variable is assigned a value anywhere within the function’s body, it’s assumed to be a local unless explicitly declared as global. Though a bit surprising at first, a moment’s consideration explains this. On one hand, requiring "global" for assigned variables provides a bar against unintended side-effects. On the other hand, if "global" was required for all global references, you’d be using "global" all the time. You’d have to declare as global every reference to a built- in function or to a component of an imported module. This clutter would defeat the usefulness of the "global" declaration for identifying side-effects. Why do lambdas defined in a loop with different values all return the same result? ---------------------------------------------------------------------------------- Assume you use a for loop to define a few different lambdas (or even plain functions), e.g.: >>> squares = [] >>> for x in range(5): ... squares.append(lambda: x**2) This gives you a list that contains 5 lambdas that calculate "x**2". You might expect that, when called, they would return, respectively, "0", "1", "4", "9", and "16". However, when you actually try you will see that they all return "16": >>> squares[2]() 16 >>> squares[4]() 16 This happens because "x" is not local to the lambdas, but is defined in the outer scope, and it is accessed when the lambda is called — not when it is defined. At the end of the loop, the value of "x" is "4", so all the functions now return "4**2", i.e. "16". You can also verify this by changing the value of "x" and see how the results of the lambdas change: >>> x = 8 >>> squares[2]() 64 In order to avoid this, you need to save the values in variables local to the lambdas, so that they don’t rely on the value of the global "x": >>> squares = [] >>> for x in range(5): ... squares.append(lambda n=x: n**2) Here, "n=x" creates a new variable "n" local to the lambda and computed when the lambda is defined so that it has the same value that "x" had at that point in the loop. This means that the value of "n" will be "0" in the first lambda, "1" in the second, "2" in the third, and so on. Therefore each lambda will now return the correct result: >>> squares[2]() 4 >>> squares[4]() 16 Note that this behaviour is not peculiar to lambdas, but applies to regular functions too. How do I share global variables across modules? ----------------------------------------------- The canonical way to share information across modules within a single program is to create a special module (often called config or cfg). Just import the config module in all modules of your application; the module then becomes available as a global name. Because there is only one instance of each module, any changes made to the module object get reflected everywhere. For example: config.py: x = 0 # Default value of the 'x' configuration setting mod.py: import config config.x = 1 main.py: import config import mod print(config.x) Note that using a module is also the basis for implementing the Singleton design pattern, for the same reason. What are the “best practices” for using import in a module? ----------------------------------------------------------- In general, don’t use "from modulename import *". Doing so clutters the importer’s namespace, and makes it much harder for linters to detect undefined names. Import modules at the top of a file. Doing so makes it clear what other modules your code requires and avoids questions of whether the module name is in scope. Using one import per line makes it easy to add and delete module imports, but using multiple imports per line uses less screen space. It’s good practice if you import modules in the following order: 1. standard library modules – e.g. "sys", "os", "getopt", "re" 2. third-party library modules (anything installed in Python’s site- packages directory) – e.g. mx.DateTime, ZODB, PIL.Image, etc. 3. locally-developed modules It is sometimes necessary to move imports to a function or class to avoid problems with circular imports. Gordon McMillan says: Circular imports are fine where both modules use the “import ” form of import. They fail when the 2nd module wants to grab a name out of the first (“from module import name”) and the import is at the top level. That’s because names in the 1st are not yet available, because the first module is busy importing the 2nd. In this case, if the second module is only used in one function, then the import can easily be moved into that function. By the time the import is called, the first module will have finished initializing, and the second module can do its import. It may also be necessary to move imports out of the top level of code if some of the modules are platform-specific. In that case, it may not even be possible to import all of the modules at the top of the file. In this case, importing the correct modules in the corresponding platform-specific code is a good option. Only move imports into a local scope, such as inside a function definition, if it’s necessary to solve a problem such as avoiding a circular import or are trying to reduce the initialization time of a module. This technique is especially helpful if many of the imports are unnecessary depending on how the program executes. You may also want to move imports into a function if the modules are only ever used in that function. Note that loading a module the first time may be expensive because of the one time initialization of the module, but loading a module multiple times is virtually free, costing only a couple of dictionary lookups. Even if the module name has gone out of scope, the module is probably available in "sys.modules". Why are default values shared between objects? ---------------------------------------------- This type of bug commonly bites neophyte programmers. Consider this function: def foo(mydict={}): # Danger: shared reference to one dict for all calls ... compute something ... mydict[key] = value return mydict The first time you call this function, "mydict" contains a single item. The second time, "mydict" contains two items because when "foo()" begins executing, "mydict" starts out with an item already in it. It is often expected that a function call creates new objects for default values. This is not what happens. Default values are created exactly once, when the function is defined. If that object is changed, like the dictionary in this example, subsequent calls to the function will refer to this changed object. By definition, immutable objects such as numbers, strings, tuples, and "None", are safe from change. Changes to mutable objects such as dictionaries, lists, and class instances can lead to confusion. Because of this feature, it is good programming practice to not use mutable objects as default values. Instead, use "None" as the default value and inside the function, check if the parameter is "None" and create a new list/dictionary/whatever if it is. For example, don’t write: def foo(mydict={}): ... but: def foo(mydict=None): if mydict is None: mydict = {} # create a new dict for local namespace This feature can be useful. When you have a function that’s time- consuming to compute, a common technique is to cache the parameters and the resulting value of each call to the function, and return the cached value if the same value is requested again. This is called “memoizing”, and can be implemented like this: # Callers can only provide two parameters and optionally pass _cache by keyword def expensive(arg1, arg2, *, _cache={}): if (arg1, arg2) in _cache: return _cache[(arg1, arg2)] # Calculate the value result = ... expensive computation ... _cache[(arg1, arg2)] = result # Store result in the cache return result You could use a global variable containing a dictionary instead of the default value; it’s a matter of taste. How can I pass optional or keyword parameters from one function to another? --------------------------------------------------------------------------- Collect the arguments using the "*" and "**" specifiers in the function’s parameter list; this gives you the positional arguments as a tuple and the keyword arguments as a dictionary. You can then pass these arguments when calling another function by using "*" and "**": def f(x, *args, **kwargs): ... kwargs['width'] = '14.3c' ... g(x, *args, **kwargs) What is the difference between arguments and parameters? -------------------------------------------------------- *Parameters* are defined by the names that appear in a function definition, whereas *arguments* are the values actually passed to a function when calling it. Parameters define what types of arguments a function can accept. For example, given the function definition: def func(foo, bar=None, **kwargs): pass *foo*, *bar* and *kwargs* are parameters of "func". However, when calling "func", for example: func(42, bar=314, extra=somevar) the values "42", "314", and "somevar" are arguments. Why did changing list ‘y’ also change list ‘x’? ----------------------------------------------- If you wrote code like: >>> x = [] >>> y = x >>> y.append(10) >>> y [10] >>> x [10] you might be wondering why appending an element to "y" changed "x" too. There are two factors that produce this result: 1. Variables are simply names that refer to objects. Doing "y = x" doesn’t create a copy of the list – it creates a new variable "y" that refers to the same object "x" refers to. This means that there is only one object (the list), and both "x" and "y" refer to it. 2. Lists are *mutable*, which means that you can change their content. After the call to "append()", the content of the mutable object has changed from "[]" to "[10]". Since both the variables refer to the same object, using either name accesses the modified value "[10]". If we instead assign an immutable object to "x": >>> x = 5 # ints are immutable >>> y = x >>> x = x + 1 # 5 can't be mutated, we are creating a new object here >>> x 6 >>> y 5 we can see that in this case "x" and "y" are not equal anymore. This is because integers are *immutable*, and when we do "x = x + 1" we are not mutating the int "5" by incrementing its value; instead, we are creating a new object (the int "6") and assigning it to "x" (that is, changing which object "x" refers to). After this assignment we have two objects (the ints "6" and "5") and two variables that refer to them ("x" now refers to "6" but "y" still refers to "5"). Some operations (for example "y.append(10)" and "y.sort()") mutate the object, whereas superficially similar operations (for example "y = y + [10]" and "sorted(y)") create a new object. In general in Python (and in all cases in the standard library) a method that mutates an object will return "None" to help avoid getting the two types of operations confused. So if you mistakenly write "y.sort()" thinking it will give you a sorted copy of "y", you’ll instead end up with "None", which will likely cause your program to generate an easily diagnosed error. However, there is one class of operations where the same operation sometimes has different behaviors with different types: the augmented assignment operators. For example, "+=" mutates lists but not tuples or ints ("a_list += [1, 2, 3]" is equivalent to "a_list.extend([1, 2, 3])" and mutates "a_list", whereas "some_tuple += (1, 2, 3)" and "some_int += 1" create new objects). In other words: * If we have a mutable object ("list", "dict", "set", etc.), we can use some specific operations to mutate it and all the variables that refer to it will see the change. * If we have an immutable object ("str", "int", "tuple", etc.), all the variables that refer to it will always see the same value, but operations that transform that value into a new value always return a new object. If you want to know if two variables refer to the same object or not, you can use the "is" operator, or the built-in function "id()". How do I write a function with output parameters (call by reference)? --------------------------------------------------------------------- Remember that arguments are passed by assignment in Python. Since assignment just creates references to objects, there’s no alias between an argument name in the caller and callee, and so no call-by- reference per se. You can achieve the desired effect in a number of ways. 1. By returning a tuple of the results: >>> def func1(a, b): ... a = 'new-value' # a and b are local names ... b = b + 1 # assigned to new objects ... return a, b # return new values ... >>> x, y = 'old-value', 99 >>> func1(x, y) ('new-value', 100) This is almost always the clearest solution. 2. By using global variables. This isn’t thread-safe, and is not recommended. 3. By passing a mutable (changeable in-place) object: >>> def func2(a): ... a[0] = 'new-value' # 'a' references a mutable list ... a[1] = a[1] + 1 # changes a shared object ... >>> args = ['old-value', 99] >>> func2(args) >>> args ['new-value', 100] 4. By passing in a dictionary that gets mutated: >>> def func3(args): ... args['a'] = 'new-value' # args is a mutable dictionary ... args['b'] = args['b'] + 1 # change it in-place ... >>> args = {'a': 'old-value', 'b': 99} >>> func3(args) >>> args {'a': 'new-value', 'b': 100} 5. Or bundle up values in a class instance: >>> class Namespace: ... def __init__(self, /, **args): ... for key, value in args.items(): ... setattr(self, key, value) ... >>> def func4(args): ... args.a = 'new-value' # args is a mutable Namespace ... args.b = args.b + 1 # change object in-place ... >>> args = Namespace(a='old-value', b=99) >>> func4(args) >>> vars(args) {'a': 'new-value', 'b': 100} There’s almost never a good reason to get this complicated. Your best choice is to return a tuple containing the multiple results. How do you make a higher order function in Python? -------------------------------------------------- You have two choices: you can use nested scopes or you can use callable objects. For example, suppose you wanted to define "linear(a,b)" which returns a function "f(x)" that computes the value "a*x+b". Using nested scopes: def linear(a, b): def result(x): return a * x + b return result Or using a callable object: class linear: def __init__(self, a, b): self.a, self.b = a, b def __call__(self, x): return self.a * x + self.b In both cases, taxes = linear(0.3, 2) gives a callable object where "taxes(10e6) == 0.3 * 10e6 + 2". The callable object approach has the disadvantage that it is a bit slower and results in slightly longer code. However, note that a collection of callables can share their signature via inheritance: class exponential(linear): # __init__ inherited def __call__(self, x): return self.a * (x ** self.b) Object can encapsulate state for several methods: class counter: value = 0 def set(self, x): self.value = x def up(self): self.value = self.value + 1 def down(self): self.value = self.value - 1 count = counter() inc, dec, reset = count.up, count.down, count.set Here "inc()", "dec()" and "reset()" act like functions which share the same counting variable. How do I copy an object in Python? ---------------------------------- In general, try "copy.copy()" or "copy.deepcopy()" for the general case. Not all objects can be copied, but most can. Some objects can be copied more easily. Dictionaries have a "copy()" method: newdict = olddict.copy() Sequences can be copied by slicing: new_l = l[:] How can I find the methods or attributes of an object? ------------------------------------------------------ For an instance x of a user-defined class, "dir(x)" returns an alphabetized list of the names containing the instance attributes and methods and attributes defined by its class. How can my code discover the name of an object? ----------------------------------------------- Generally speaking, it can’t, because objects don’t really have names. Essentially, assignment always binds a name to a value; the same is true of "def" and "class" statements, but in that case the value is a callable. Consider the following code: >>> class A: ... pass ... >>> B = A >>> a = B() >>> b = a >>> print(b) <__main__.A object at 0x16D07CC> >>> print(a) <__main__.A object at 0x16D07CC> Arguably the class has a name: even though it is bound to two names and invoked through the name B the created instance is still reported as an instance of class A. However, it is impossible to say whether the instance’s name is a or b, since both names are bound to the same value. Generally speaking it should not be necessary for your code to “know the names” of particular values. Unless you are deliberately writing introspective programs, this is usually an indication that a change of approach might be beneficial. In comp.lang.python, Fredrik Lundh once gave an excellent analogy in answer to this question: The same way as you get the name of that cat you found on your porch: the cat (object) itself cannot tell you its name, and it doesn’t really care – so the only way to find out what it’s called is to ask all your neighbours (namespaces) if it’s their cat (object)… ….and don’t be surprised if you’ll find that it’s known by many names, or no name at all! What’s up with the comma operator’s precedence? ----------------------------------------------- Comma is not an operator in Python. Consider this session: >>> "a" in "b", "a" (False, 'a') Since the comma is not an operator, but a separator between expressions the above is evaluated as if you had entered: ("a" in "b"), "a" not: "a" in ("b", "a") The same is true of the various assignment operators ("=", "+=" etc). They are not truly operators but syntactic delimiters in assignment statements. Is there an equivalent of C’s “?:” ternary operator? ---------------------------------------------------- Yes, there is. The syntax is as follows: [on_true] if [expression] else [on_false] x, y = 50, 25 small = x if x < y else y Before this syntax was introduced in Python 2.5, a common idiom was to use logical operators: [expression] and [on_true] or [on_false] However, this idiom is unsafe, as it can give wrong results when *on_true* has a false boolean value. Therefore, it is always better to use the "... if ... else ..." form. Is it possible to write obfuscated one-liners in Python? -------------------------------------------------------- Yes. Usually this is done by nesting "lambda" within "lambda". See the following three examples, due to Ulf Bartelt: from functools import reduce # Primes < 1000 print(list(filter(None,map(lambda y:y*reduce(lambda x,y:x*y!=0, map(lambda x,y=y:y%x,range(2,int(pow(y,0.5)+1))),1),range(2,1000))))) # First 10 Fibonacci numbers print(list(map(lambda x,f=lambda x,f:(f(x-1,f)+f(x-2,f)) if x>1 else 1: f(x,f), range(10)))) # Mandelbrot set print((lambda Ru,Ro,Iu,Io,IM,Sx,Sy:reduce(lambda x,y:x+y,map(lambda y, Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,Sy=Sy,L=lambda yc,Iu=Iu,Io=Io,Ru=Ru,Ro=Ro,i=IM, Sx=Sx,Sy=Sy:reduce(lambda x,y:x+y,map(lambda x,xc=Ru,yc=yc,Ru=Ru,Ro=Ro, i=i,Sx=Sx,F=lambda xc,yc,x,y,k,f=lambda xc,yc,x,y,k,f:(k<=0)or (x*x+y*y >=4.0) or 1+f(xc,yc,x*x-y*y+xc,2.0*x*y+yc,k-1,f):f(xc,yc,x,y,k,f):chr( 64+F(Ru+x*(Ro-Ru)/Sx,yc,0,0,i)),range(Sx))):L(Iu+y*(Io-Iu)/Sy),range(Sy ))))(-2.1, 0.7, -1.2, 1.2, 30, 80, 24)) # \___ ___/ \___ ___/ | | |__ lines on screen # V V | |______ columns on screen # | | |__________ maximum of "iterations" # | |_________________ range on y axis # |____________________________ range on x axis Don’t try this at home, kids! What does the slash(/) in the parameter list of a function mean? ---------------------------------------------------------------- A slash in the argument list of a function denotes that the parameters prior to it are positional-only. Positional-only parameters are the ones without an externally-usable name. Upon calling a function that accepts positional-only parameters, arguments are mapped to parameters based solely on their position. For example, "divmod()" is a function that accepts positional-only parameters. Its documentation looks like this: >>> help(divmod) Help on built-in function divmod in module builtins: divmod(x, y, /) Return the tuple (x//y, x%y). Invariant: div*y + mod == x. The slash at the end of the parameter list means that both parameters are positional-only. Thus, calling "divmod()" with keyword arguments would lead to an error: >>> divmod(x=3, y=4) Traceback (most recent call last): File "", line 1, in TypeError: divmod() takes no keyword arguments Numbers and strings =================== How do I specify hexadecimal and octal integers? ------------------------------------------------ To specify an octal digit, precede the octal value with a zero, and then a lower or uppercase “o”. For example, to set the variable “a” to the octal value “10” (8 in decimal), type: >>> a = 0o10 >>> a 8 Hexadecimal is just as easy. Simply precede the hexadecimal number with a zero, and then a lower or uppercase “x”. Hexadecimal digits can be specified in lower or uppercase. For example, in the Python interpreter: >>> a = 0xa5 >>> a 165 >>> b = 0XB2 >>> b 178 Why does -22 // 10 return -3? ----------------------------- It’s primarily driven by the desire that "i % j" have the same sign as "j". If you want that, and also want: i == (i // j) * j + (i % j) then integer division has to return the floor. C also requires that identity to hold, and then compilers that truncate "i // j" need to make "i % j" have the same sign as "i". There are few real use cases for "i % j" when "j" is negative. When "j" is positive, there are many, and in virtually all of them it’s more useful for "i % j" to be ">= 0". If the clock says 10 now, what did it say 200 hours ago? "-190 % 12 == 2" is useful; "-190 % 12 == -10" is a bug waiting to bite. How do I get int literal attribute instead of SyntaxError? ---------------------------------------------------------- Trying to lookup an "int" literal attribute in the normal manner gives a syntax error because the period is seen as a decimal point: >>> 1.__class__ File "", line 1 1.__class__ ^ SyntaxError: invalid decimal literal The solution is to separate the literal from the period with either a space or parentheses. >>> 1 .__class__ >>> (1).__class__ How do I convert a string to a number? -------------------------------------- For integers, use the built-in "int()" type constructor, e.g. "int('144') == 144". Similarly, "float()" converts to floating-point, e.g. "float('144') == 144.0". By default, these interpret the number as decimal, so that "int('0144') == 144" holds true, and "int('0x144')" raises "ValueError". "int(string, base)" takes the base to convert from as a second optional argument, so "int( '0x144', 16) == 324". If the base is specified as 0, the number is interpreted using Python’s rules: a leading ‘0o’ indicates octal, and ‘0x’ indicates a hex number. Do not use the built-in function "eval()" if all you need is to convert strings to numbers. "eval()" will be significantly slower and it presents a security risk: someone could pass you a Python expression that might have unwanted side effects. For example, someone could pass "__import__('os').system("rm -rf $HOME")" which would erase your home directory. "eval()" also has the effect of interpreting numbers as Python expressions, so that e.g. "eval('09')" gives a syntax error because Python does not allow leading ‘0’ in a decimal number (except ‘0’). How do I convert a number to a string? -------------------------------------- To convert, e.g., the number 144 to the string ‘144’, use the built-in type constructor "str()". If you want a hexadecimal or octal representation, use the built-in functions "hex()" or "oct()". For fancy formatting, see the Formatted string literals and Format String Syntax sections, e.g. ""{:04d}".format(144)" yields "'0144'" and ""{:.3f}".format(1.0/3.0)" yields "'0.333'". How do I modify a string in place? ---------------------------------- You can’t, because strings are immutable. In most situations, you should simply construct a new string from the various parts you want to assemble it from. However, if you need an object with the ability to modify in-place unicode data, try using an "io.StringIO" object or the "array" module: >>> import io >>> s = "Hello, world" >>> sio = io.StringIO(s) >>> sio.getvalue() 'Hello, world' >>> sio.seek(7) 7 >>> sio.write("there!") 6 >>> sio.getvalue() 'Hello, there!' >>> import array >>> a = array.array('u', s) >>> print(a) array('u', 'Hello, world') >>> a[0] = 'y' >>> print(a) array('u', 'yello, world') >>> a.tounicode() 'yello, world' How do I use strings to call functions/methods? ----------------------------------------------- There are various techniques. * The best is to use a dictionary that maps strings to functions. The primary advantage of this technique is that the strings do not need to match the names of the functions. This is also the primary technique used to emulate a case construct: def a(): pass def b(): pass dispatch = {'go': a, 'stop': b} # Note lack of parens for funcs dispatch[get_input()]() # Note trailing parens to call function * Use the built-in function "getattr()": import foo getattr(foo, 'bar')() Note that "getattr()" works on any object, including classes, class instances, modules, and so on. This is used in several places in the standard library, like this: class Foo: def do_foo(self): ... def do_bar(self): ... f = getattr(foo_instance, 'do_' + opname) f() * Use "locals()" to resolve the function name: def myFunc(): print("hello") fname = "myFunc" f = locals()[fname] f() Is there an equivalent to Perl’s chomp() for removing trailing newlines from strings? ------------------------------------------------------------------------------------- You can use "S.rstrip("\r\n")" to remove all occurrences of any line terminator from the end of the string "S" without removing other trailing whitespace. If the string "S" represents more than one line, with several empty lines at the end, the line terminators for all the blank lines will be removed: >>> lines = ("line 1 \r\n" ... "\r\n" ... "\r\n") >>> lines.rstrip("\n\r") 'line 1 ' Since this is typically only desired when reading text one line at a time, using "S.rstrip()" this way works well. Is there a scanf() or sscanf() equivalent? ------------------------------------------ Not as such. For simple input parsing, the easiest approach is usually to split the line into whitespace-delimited words using the "split()" method of string objects and then convert decimal strings to numeric values using "int()" or "float()". "split()" supports an optional “sep” parameter which is useful if the line uses something other than whitespace as a separator. For more complicated input parsing, regular expressions are more powerful than C’s "sscanf()" and better suited for the task. What does ‘UnicodeDecodeError’ or ‘UnicodeEncodeError’ error mean? ------------------------------------------------------------------- See the Unicode HOWTO. Performance =========== My program is too slow. How do I speed it up? --------------------------------------------- That’s a tough one, in general. First, here are a list of things to remember before diving further: * Performance characteristics vary across Python implementations. This FAQ focuses on *CPython*. * Behaviour can vary across operating systems, especially when talking about I/O or multi-threading. * You should always find the hot spots in your program *before* attempting to optimize any code (see the "profile" module). * Writing benchmark scripts will allow you to iterate quickly when searching for improvements (see the "timeit" module). * It is highly recommended to have good code coverage (through unit testing or any other technique) before potentially introducing regressions hidden in sophisticated optimizations. That being said, there are many tricks to speed up Python code. Here are some general principles which go a long way towards reaching acceptable performance levels: * Making your algorithms faster (or changing to faster ones) can yield much larger benefits than trying to sprinkle micro-optimization tricks all over your code. * Use the right data structures. Study documentation for the Built-in Types and the "collections" module. * When the standard library provides a primitive for doing something, it is likely (although not guaranteed) to be faster than any alternative you may come up with. This is doubly true for primitives written in C, such as builtins and some extension types. For example, be sure to use either the "list.sort()" built-in method or the related "sorted()" function to do sorting (and see the Sorting HOW TO for examples of moderately advanced usage). * Abstractions tend to create indirections and force the interpreter to work more. If the levels of indirection outweigh the amount of useful work done, your program will be slower. You should avoid excessive abstraction, especially under the form of tiny functions or methods (which are also often detrimental to readability). If you have reached the limit of what pure Python can allow, there are tools to take you further away. For example, Cython can compile a slightly modified version of Python code into a C extension, and can be used on many different platforms. Cython can take advantage of compilation (and optional type annotations) to make your code significantly faster than when interpreted. If you are confident in your C programming skills, you can also write a C extension module yourself. See also: The wiki page devoted to performance tips. What is the most efficient way to concatenate many strings together? -------------------------------------------------------------------- "str" and "bytes" objects are immutable, therefore concatenating many strings together is inefficient as each concatenation creates a new object. In the general case, the total runtime cost is quadratic in the total string length. To accumulate many "str" objects, the recommended idiom is to place them into a list and call "str.join()" at the end: chunks = [] for s in my_strings: chunks.append(s) result = ''.join(chunks) (another reasonably efficient idiom is to use "io.StringIO") To accumulate many "bytes" objects, the recommended idiom is to extend a "bytearray" object using in-place concatenation (the "+=" operator): result = bytearray() for b in my_bytes_objects: result += b Sequences (Tuples/Lists) ======================== How do I convert between tuples and lists? ------------------------------------------ The type constructor "tuple(seq)" converts any sequence (actually, any iterable) into a tuple with the same items in the same order. For example, "tuple([1, 2, 3])" yields "(1, 2, 3)" and "tuple('abc')" yields "('a', 'b', 'c')". If the argument is a tuple, it does not make a copy but returns the same object, so it is cheap to call "tuple()" when you aren’t sure that an object is already a tuple. The type constructor "list(seq)" converts any sequence or iterable into a list with the same items in the same order. For example, "list((1, 2, 3))" yields "[1, 2, 3]" and "list('abc')" yields "['a', 'b', 'c']". If the argument is a list, it makes a copy just like "seq[:]" would. What’s a negative index? ------------------------ Python sequences are indexed with positive numbers and negative numbers. For positive numbers 0 is the first index 1 is the second index and so forth. For negative indices -1 is the last index and -2 is the penultimate (next to last) index and so forth. Think of "seq[-n]" as the same as "seq[len(seq)-n]". Using negative indices can be very convenient. For example "S[:-1]" is all of the string except for its last character, which is useful for removing the trailing newline from a string. How do I iterate over a sequence in reverse order? -------------------------------------------------- Use the "reversed()" built-in function: for x in reversed(sequence): ... # do something with x ... This won’t touch your original sequence, but build a new copy with reversed order to iterate over. How do you remove duplicates from a list? ----------------------------------------- See the Python Cookbook for a long discussion of many ways to do this: https://code.activestate.com/recipes/52560/ If you don’t mind reordering the list, sort it and then scan from the end of the list, deleting duplicates as you go: if mylist: mylist.sort() last = mylist[-1] for i in range(len(mylist)-2, -1, -1): if last == mylist[i]: del mylist[i] else: last = mylist[i] If all elements of the list may be used as set keys (i.e. they are all *hashable*) this is often faster mylist = list(set(mylist)) This converts the list into a set, thereby removing duplicates, and then back into a list. How do you remove multiple items from a list -------------------------------------------- As with removing duplicates, explicitly iterating in reverse with a delete condition is one possibility. However, it is easier and faster to use slice replacement with an implicit or explicit forward iteration. Here are three variations.: mylist[:] = filter(keep_function, mylist) mylist[:] = (x for x in mylist if keep_condition) mylist[:] = [x for x in mylist if keep_condition] The list comprehension may be fastest. How do you make an array in Python? ----------------------------------- Use a list: ["this", 1, "is", "an", "array"] Lists are equivalent to C or Pascal arrays in their time complexity; the primary difference is that a Python list can contain objects of many different types. The "array" module also provides methods for creating arrays of fixed types with compact representations, but they are slower to index than lists. Also note that NumPy and other third party packages define array-like structures with various characteristics as well. To get Lisp-style linked lists, you can emulate cons cells using tuples: lisp_list = ("like", ("this", ("example", None) ) ) If mutability is desired, you could use lists instead of tuples. Here the analogue of lisp car is "lisp_list[0]" and the analogue of cdr is "lisp_list[1]". Only do this if you’re sure you really need to, because it’s usually a lot slower than using Python lists. How do I create a multidimensional list? ---------------------------------------- You probably tried to make a multidimensional array like this: >>> A = [[None] * 2] * 3 This looks correct if you print it: >>> A [[None, None], [None, None], [None, None]] But when you assign a value, it shows up in multiple places: >>> A[0][0] = 5 >>> A [[5, None], [5, None], [5, None]] The reason is that replicating a list with "*" doesn’t create copies, it only creates references to the existing objects. The "*3" creates a list containing 3 references to the same list of length two. Changes to one row will show in all rows, which is almost certainly not what you want. The suggested approach is to create a list of the desired length first and then fill in each element with a newly created list: A = [None] * 3 for i in range(3): A[i] = [None] * 2 This generates a list containing 3 different lists of length two. You can also use a list comprehension: w, h = 2, 3 A = [[None] * w for i in range(h)] Or, you can use an extension that provides a matrix datatype; NumPy is the best known. How do I apply a method to a sequence of objects? ------------------------------------------------- Use a list comprehension: result = [obj.method() for obj in mylist] Why does a_tuple[i] += [‘item’] raise an exception when the addition works? --------------------------------------------------------------------------- This is because of a combination of the fact that augmented assignment operators are *assignment* operators, and the difference between mutable and immutable objects in Python. This discussion applies in general when augmented assignment operators are applied to elements of a tuple that point to mutable objects, but we’ll use a "list" and "+=" as our exemplar. If you wrote: >>> a_tuple = (1, 2) >>> a_tuple[0] += 1 Traceback (most recent call last): ... TypeError: 'tuple' object does not support item assignment The reason for the exception should be immediately clear: "1" is added to the object "a_tuple[0]" points to ("1"), producing the result object, "2", but when we attempt to assign the result of the computation, "2", to element "0" of the tuple, we get an error because we can’t change what an element of a tuple points to. Under the covers, what this augmented assignment statement is doing is approximately this: >>> result = a_tuple[0] + 1 >>> a_tuple[0] = result Traceback (most recent call last): ... TypeError: 'tuple' object does not support item assignment It is the assignment part of the operation that produces the error, since a tuple is immutable. When you write something like: >>> a_tuple = (['foo'], 'bar') >>> a_tuple[0] += ['item'] Traceback (most recent call last): ... TypeError: 'tuple' object does not support item assignment The exception is a bit more surprising, and even more surprising is the fact that even though there was an error, the append worked: >>> a_tuple[0] ['foo', 'item'] To see why this happens, you need to know that (a) if an object implements an "__iadd__" magic method, it gets called when the "+=" augmented assignment is executed, and its return value is what gets used in the assignment statement; and (b) for lists, "__iadd__" is equivalent to calling "extend" on the list and returning the list. That’s why we say that for lists, "+=" is a “shorthand” for "list.extend": >>> a_list = [] >>> a_list += [1] >>> a_list [1] This is equivalent to: >>> result = a_list.__iadd__([1]) >>> a_list = result The object pointed to by a_list has been mutated, and the pointer to the mutated object is assigned back to "a_list". The end result of the assignment is a no-op, since it is a pointer to the same object that "a_list" was previously pointing to, but the assignment still happens. Thus, in our tuple example what is happening is equivalent to: >>> result = a_tuple[0].__iadd__(['item']) >>> a_tuple[0] = result Traceback (most recent call last): ... TypeError: 'tuple' object does not support item assignment The "__iadd__" succeeds, and thus the list is extended, but even though "result" points to the same object that "a_tuple[0]" already points to, that final assignment still results in an error, because tuples are immutable. I want to do a complicated sort: can you do a Schwartzian Transform in Python? ------------------------------------------------------------------------------ The technique, attributed to Randal Schwartz of the Perl community, sorts the elements of a list by a metric which maps each element to its “sort value”. In Python, use the "key" argument for the "list.sort()" method: Isorted = L[:] Isorted.sort(key=lambda s: int(s[10:15])) How can I sort one list by values from another list? ---------------------------------------------------- Merge them into an iterator of tuples, sort the resulting list, and then pick out the element you want. >>> list1 = ["what", "I'm", "sorting", "by"] >>> list2 = ["something", "else", "to", "sort"] >>> pairs = zip(list1, list2) >>> pairs = sorted(pairs) >>> pairs [("I'm", 'else'), ('by', 'sort'), ('sorting', 'to'), ('what', 'something')] >>> result = [x[1] for x in pairs] >>> result ['else', 'sort', 'to', 'something'] Objects ======= What is a class? ---------------- A class is the particular object type created by executing a class statement. Class objects are used as templates to create instance objects, which embody both the data (attributes) and code (methods) specific to a datatype. A class can be based on one or more other classes, called its base class(es). It then inherits the attributes and methods of its base classes. This allows an object model to be successively refined by inheritance. You might have a generic "Mailbox" class that provides basic accessor methods for a mailbox, and subclasses such as "MboxMailbox", "MaildirMailbox", "OutlookMailbox" that handle various specific mailbox formats. What is a method? ----------------- A method is a function on some object "x" that you normally call as "x.name(arguments...)". Methods are defined as functions inside the class definition: class C: def meth(self, arg): return arg * 2 + self.attribute What is self? ------------- Self is merely a conventional name for the first argument of a method. A method defined as "meth(self, a, b, c)" should be called as "x.meth(a, b, c)" for some instance "x" of the class in which the definition occurs; the called method will think it is called as "meth(x, a, b, c)". See also Why must ‘self’ be used explicitly in method definitions and calls?. How do I check if an object is an instance of a given class or of a subclass of it? ----------------------------------------------------------------------------------- Use the built-in function "isinstance(obj, cls)". You can check if an object is an instance of any of a number of classes by providing a tuple instead of a single class, e.g. "isinstance(obj, (class1, class2, ...))", and can also check whether an object is one of Python’s built-in types, e.g. "isinstance(obj, str)" or "isinstance(obj, (int, float, complex))". Note that "isinstance()" also checks for virtual inheritance from an *abstract base class*. So, the test will return "True" for a registered class even if hasn’t directly or indirectly inherited from it. To test for “true inheritance”, scan the *MRO* of the class: from collections.abc import Mapping class P: pass class C(P): pass Mapping.register(P) >>> c = C() >>> isinstance(c, C) # direct True >>> isinstance(c, P) # indirect True >>> isinstance(c, Mapping) # virtual True # Actual inheritance chain >>> type(c).__mro__ (, , ) # Test for "true inheritance" >>> Mapping in type(c).__mro__ False Note that most programs do not use "isinstance()" on user-defined classes very often. If you are developing the classes yourself, a more proper object-oriented style is to define methods on the classes that encapsulate a particular behaviour, instead of checking the object’s class and doing a different thing based on what class it is. For example, if you have a function that does something: def search(obj): if isinstance(obj, Mailbox): ... # code to search a mailbox elif isinstance(obj, Document): ... # code to search a document elif ... A better approach is to define a "search()" method on all the classes and just call it: class Mailbox: def search(self): ... # code to search a mailbox class Document: def search(self): ... # code to search a document obj.search() What is delegation? ------------------- Delegation is an object oriented technique (also called a design pattern). Let’s say you have an object "x" and want to change the behaviour of just one of its methods. You can create a new class that provides a new implementation of the method you’re interested in changing and delegates all other methods to the corresponding method of "x". Python programmers can easily implement delegation. For example, the following class implements a class that behaves like a file but converts all written data to uppercase: class UpperOut: def __init__(self, outfile): self._outfile = outfile def write(self, s): self._outfile.write(s.upper()) def __getattr__(self, name): return getattr(self._outfile, name) Here the "UpperOut" class redefines the "write()" method to convert the argument string to uppercase before calling the underlying "self._outfile.write()" method. All other methods are delegated to the underlying "self._outfile" object. The delegation is accomplished via the "__getattr__" method; consult the language reference for more information about controlling attribute access. Note that for more general cases delegation can get trickier. When attributes must be set as well as retrieved, the class must define a "__setattr__()" method too, and it must do so carefully. The basic implementation of "__setattr__()" is roughly equivalent to the following: class X: ... def __setattr__(self, name, value): self.__dict__[name] = value ... Most "__setattr__()" implementations must modify "self.__dict__" to store local state for self without causing an infinite recursion. How do I call a method defined in a base class from a derived class that overrides it? -------------------------------------------------------------------------------------- Use the built-in "super()" function: class Derived(Base): def meth(self): super(Derived, self).meth() For version prior to 3.0, you may be using classic classes: For a class definition such as "class Derived(Base): ..." you can call method "meth()" defined in "Base" (or one of "Base"’s base classes) as "Base.meth(self, arguments...)". Here, "Base.meth" is an unbound method, so you need to provide the "self" argument. How can I organize my code to make it easier to change the base class? ---------------------------------------------------------------------- You could assign the base class to an alias and derive from the alias. Then all you have to change is the value assigned to the alias. Incidentally, this trick is also handy if you want to decide dynamically (e.g. depending on availability of resources) which base class to use. Example: class Base: ... BaseAlias = Base class Derived(BaseAlias): ... How do I create static class data and static class methods? ----------------------------------------------------------- Both static data and static methods (in the sense of C++ or Java) are supported in Python. For static data, simply define a class attribute. To assign a new value to the attribute, you have to explicitly use the class name in the assignment: class C: count = 0 # number of times C.__init__ called def __init__(self): C.count = C.count + 1 def getcount(self): return C.count # or return self.count "c.count" also refers to "C.count" for any "c" such that "isinstance(c, C)" holds, unless overridden by "c" itself or by some class on the base-class search path from "c.__class__" back to "C". Caution: within a method of C, an assignment like "self.count = 42" creates a new and unrelated instance named “count” in "self"’s own dict. Rebinding of a class-static data name must always specify the class whether inside a method or not: C.count = 314 Static methods are possible: class C: @staticmethod def static(arg1, arg2, arg3): # No 'self' parameter! ... However, a far more straightforward way to get the effect of a static method is via a simple module-level function: def getcount(): return C.count If your code is structured so as to define one class (or tightly related class hierarchy) per module, this supplies the desired encapsulation. How can I overload constructors (or methods) in Python? ------------------------------------------------------- This answer actually applies to all methods, but the question usually comes up first in the context of constructors. In C++ you’d write class C { C() { cout << "No arguments\n"; } C(int i) { cout << "Argument is " << i << "\n"; } } In Python you have to write a single constructor that catches all cases using default arguments. For example: class C: def __init__(self, i=None): if i is None: print("No arguments") else: print("Argument is", i) This is not entirely equivalent, but close enough in practice. You could also try a variable-length argument list, e.g. def __init__(self, *args): ... The same approach works for all method definitions. I try to use __spam and I get an error about _SomeClassName__spam. ------------------------------------------------------------------ Variable names with double leading underscores are “mangled” to provide a simple but effective way to define class private variables. Any identifier of the form "__spam" (at least two leading underscores, at most one trailing underscore) is textually replaced with "_classname__spam", where "classname" is the current class name with any leading underscores stripped. This doesn’t guarantee privacy: an outside user can still deliberately access the “_classname__spam” attribute, and private values are visible in the object’s "__dict__". Many Python programmers never bother to use private variable names at all. My class defines __del__ but it is not called when I delete the object. ----------------------------------------------------------------------- There are several possible reasons for this. The del statement does not necessarily call "__del__()" – it simply decrements the object’s reference count, and if this reaches zero "__del__()" is called. If your data structures contain circular links (e.g. a tree where each child has a parent reference and each parent has a list of children) the reference counts will never go back to zero. Once in a while Python runs an algorithm to detect such cycles, but the garbage collector might run some time after the last reference to your data structure vanishes, so your "__del__()" method may be called at an inconvenient and random time. This is inconvenient if you’re trying to reproduce a problem. Worse, the order in which object’s "__del__()" methods are executed is arbitrary. You can run "gc.collect()" to force a collection, but there *are* pathological cases where objects will never be collected. Despite the cycle collector, it’s still a good idea to define an explicit "close()" method on objects to be called whenever you’re done with them. The "close()" method can then remove attributes that refer to subobjects. Don’t call "__del__()" directly – "__del__()" should call "close()" and "close()" should make sure that it can be called more than once for the same object. Another way to avoid cyclical references is to use the "weakref" module, which allows you to point to objects without incrementing their reference count. Tree data structures, for instance, should use weak references for their parent and sibling references (if they need them!). Finally, if your "__del__()" method raises an exception, a warning message is printed to "sys.stderr". How do I get a list of all instances of a given class? ------------------------------------------------------ Python does not keep track of all instances of a class (or of a built- in type). You can program the class’s constructor to keep track of all instances by keeping a list of weak references to each instance. Why does the result of "id()" appear to be not unique? ------------------------------------------------------ The "id()" builtin returns an integer that is guaranteed to be unique during the lifetime of the object. Since in CPython, this is the object’s memory address, it happens frequently that after an object is deleted from memory, the next freshly created object is allocated at the same position in memory. This is illustrated by this example: >>> id(1000) 13901272 >>> id(2000) 13901272 The two ids belong to different integer objects that are created before, and deleted immediately after execution of the "id()" call. To be sure that objects whose id you want to examine are still alive, create another reference to the object: >>> a = 1000; b = 2000 >>> id(a) 13901272 >>> id(b) 13891296 When can I rely on identity tests with the *is* operator? --------------------------------------------------------- The "is" operator tests for object identity. The test "a is b" is equivalent to "id(a) == id(b)". The most important property of an identity test is that an object is always identical to itself, "a is a" always returns "True". Identity tests are usually faster than equality tests. And unlike equality tests, identity tests are guaranteed to return a boolean "True" or "False". However, identity tests can *only* be substituted for equality tests when object identity is assured. Generally, there are three circumstances where identity is guaranteed: 1) Assignments create new names but do not change object identity. After the assignment "new = old", it is guaranteed that "new is old". 2) Putting an object in a container that stores object references does not change object identity. After the list assignment "s[0] = x", it is guaranteed that "s[0] is x". 3) If an object is a singleton, it means that only one instance of that object can exist. After the assignments "a = None" and "b = None", it is guaranteed that "a is b" because "None" is a singleton. In most other circumstances, identity tests are inadvisable and equality tests are preferred. In particular, identity tests should not be used to check constants such as "int" and "str" which aren’t guaranteed to be singletons: >>> a = 1000 >>> b = 500 >>> c = b + 500 >>> a is c False >>> a = 'Python' >>> b = 'Py' >>> c = b + 'thon' >>> a is c False Likewise, new instances of mutable containers are never identical: >>> a = [] >>> b = [] >>> a is b False In the standard library code, you will see several common patterns for correctly using identity tests: 1) As recommended by **PEP 8**, an identity test is the preferred way to check for "None". This reads like plain English in code and avoids confusion with other objects that may have boolean values that evaluate to false. 2) Detecting optional arguments can be tricky when "None" is a valid input value. In those situations, you can create a singleton sentinel object guaranteed to be distinct from other objects. For example, here is how to implement a method that behaves like "dict.pop()": _sentinel = object() def pop(self, key, default=_sentinel): if key in self: value = self[key] del self[key] return value if default is _sentinel: raise KeyError(key) return default 3) Container implementations sometimes need to augment equality tests with identity tests. This prevents the code from being confused by objects such as "float('NaN')" that are not equal to themselves. For example, here is the implementation of "collections.abc.Sequence.__contains__()": def __contains__(self, value): for v in self: if v is value or v == value: return True return False How can a subclass control what data is stored in an immutable instance? ------------------------------------------------------------------------ When subclassing an immutable type, override the "__new__()" method instead of the "__init__()" method. The latter only runs *after* an instance is created, which is too late to alter data in an immutable instance. All of these immutable classes have a different signature than their parent class: from datetime import date class FirstOfMonthDate(date): "Always choose the first day of the month" def __new__(cls, year, month, day): return super().__new__(cls, year, month, 1) class NamedInt(int): "Allow text names for some numbers" xlat = {'zero': 0, 'one': 1, 'ten': 10} def __new__(cls, value): value = cls.xlat.get(value, value) return super().__new__(cls, value) class TitleStr(str): "Convert str to name suitable for a URL path" def __new__(cls, s): s = s.lower().replace(' ', '-') s = ''.join([c for c in s if c.isalnum() or c == '-']) return super().__new__(cls, s) The classes can be used like this: >>> FirstOfMonthDate(2012, 2, 14) FirstOfMonthDate(2012, 2, 1) >>> NamedInt('ten') 10 >>> NamedInt(20) 20 >>> TitleStr('Blog: Why Python Rocks') 'blog-why-python-rocks' Modules ======= How do I create a .pyc file? ---------------------------- When a module is imported for the first time (or when the source file has changed since the current compiled file was created) a ".pyc" file containing the compiled code should be created in a "__pycache__" subdirectory of the directory containing the ".py" file. The ".pyc" file will have a filename that starts with the same name as the ".py" file, and ends with ".pyc", with a middle component that depends on the particular "python" binary that created it. (See **PEP 3147** for details.) One reason that a ".pyc" file may not be created is a permissions problem with the directory containing the source file, meaning that the "__pycache__" subdirectory cannot be created. This can happen, for example, if you develop as one user but run as another, such as if you are testing with a web server. Unless the "PYTHONDONTWRITEBYTECODE" environment variable is set, creation of a .pyc file is automatic if you’re importing a module and Python has the ability (permissions, free space, etc…) to create a "__pycache__" subdirectory and write the compiled module to that subdirectory. Running Python on a top level script is not considered an import and no ".pyc" will be created. For example, if you have a top-level module "foo.py" that imports another module "xyz.py", when you run "foo" (by typing "python foo.py" as a shell command), a ".pyc" will be created for "xyz" because "xyz" is imported, but no ".pyc" file will be created for "foo" since "foo.py" isn’t being imported. If you need to create a ".pyc" file for "foo" – that is, to create a ".pyc" file for a module that is not imported – you can, using the "py_compile" and "compileall" modules. The "py_compile" module can manually compile any module. One way is to use the "compile()" function in that module interactively: >>> import py_compile >>> py_compile.compile('foo.py') This will write the ".pyc" to a "__pycache__" subdirectory in the same location as "foo.py" (or you can override that with the optional parameter "cfile"). You can also automatically compile all files in a directory or directories using the "compileall" module. You can do it from the shell prompt by running "compileall.py" and providing the path of a directory containing Python files to compile: python -m compileall . How do I find the current module name? -------------------------------------- A module can find out its own module name by looking at the predefined global variable "__name__". If this has the value "'__main__'", the program is running as a script. Many modules that are usually used by importing them also provide a command-line interface or a self-test, and only execute this code after checking "__name__": def main(): print('Running test...') ... if __name__ == '__main__': main() How can I have modules that mutually import each other? ------------------------------------------------------- Suppose you have the following modules: foo.py: from bar import bar_var foo_var = 1 bar.py: from foo import foo_var bar_var = 2 The problem is that the interpreter will perform the following steps: * main imports foo * Empty globals for foo are created * foo is compiled and starts executing * foo imports bar * Empty globals for bar are created * bar is compiled and starts executing * bar imports foo (which is a no-op since there already is a module named foo) * bar.foo_var = foo.foo_var The last step fails, because Python isn’t done with interpreting "foo" yet and the global symbol dictionary for "foo" is still empty. The same thing happens when you use "import foo", and then try to access "foo.foo_var" in global code. There are (at least) three possible workarounds for this problem. Guido van Rossum recommends avoiding all uses of "from import ...", and placing all code inside functions. Initializations of global variables and class variables should use constants or built-in functions only. This means everything from an imported module is referenced as ".". Jim Roskind suggests performing steps in the following order in each module: * exports (globals, functions, and classes that don’t need imported base classes) * "import" statements * active code (including globals that are initialized from imported values). Van Rossum doesn’t like this approach much because the imports appear in a strange place, but it does work. Matthias Urlichs recommends restructuring your code so that the recursive import is not necessary in the first place. These solutions are not mutually exclusive. __import__(‘x.y.z’) returns ; how do I get z? --------------------------------------------------------- Consider using the convenience function "import_module()" from "importlib" instead: z = importlib.import_module('x.y.z') When I edit an imported module and reimport it, the changes don’t show up. Why does this happen? ------------------------------------------------------------------------------------------------- For reasons of efficiency as well as consistency, Python only reads the module file on the first time a module is imported. If it didn’t, in a program consisting of many modules where each one imports the same basic module, the basic module would be parsed and re-parsed many times. To force re-reading of a changed module, do this: import importlib import modname importlib.reload(modname) Warning: this technique is not 100% fool-proof. In particular, modules containing statements like from modname import some_objects will continue to work with the old version of the imported objects. If the module contains class definitions, existing class instances will *not* be updated to use the new class definition. This can result in the following paradoxical behaviour: >>> import importlib >>> import cls >>> c = cls.C() # Create an instance of C >>> importlib.reload(cls) >>> isinstance(c, cls.C) # isinstance is false?!? False The nature of the problem is made clear if you print out the “identity” of the class objects: >>> hex(id(c.__class__)) '0x7352a0' >>> hex(id(cls.C)) '0x4198d0' Python on Windows FAQ ********************* How do I run a Python program under Windows? ============================================ This is not necessarily a straightforward question. If you are already familiar with running programs from the Windows command line then everything will seem obvious; otherwise, you might need a little more guidance. Unless you use some sort of integrated development environment, you will end up *typing* Windows commands into what is referred to as a “Command prompt window”. Usually you can create such a window from your search bar by searching for "cmd". You should be able to recognize when you have started such a window because you will see a Windows “command prompt”, which usually looks like this: C:\> The letter may be different, and there might be other things after it, so you might just as easily see something like: D:\YourName\Projects\Python> depending on how your computer has been set up and what else you have recently done with it. Once you have started such a window, you are well on the way to running Python programs. You need to realize that your Python scripts have to be processed by another program called the Python *interpreter*. The interpreter reads your script, compiles it into bytecodes, and then executes the bytecodes to run your program. So, how do you arrange for the interpreter to handle your Python? First, you need to make sure that your command window recognises the word “py” as an instruction to start the interpreter. If you have opened a command window, you should try entering the command "py" and hitting return: C:\Users\YourName> py You should then see something like: Python 3.6.4 (v3.6.4:d48eceb, Dec 19 2017, 06:04:45) [MSC v.1900 32 bit (Intel)] on win32 Type "help", "copyright", "credits" or "license" for more information. >>> You have started the interpreter in “interactive mode”. That means you can enter Python statements or expressions interactively and have them executed or evaluated while you wait. This is one of Python’s strongest features. Check it by entering a few expressions of your choice and seeing the results: >>> print("Hello") Hello >>> "Hello" * 3 'HelloHelloHello' Many people use the interactive mode as a convenient yet highly programmable calculator. When you want to end your interactive Python session, call the "exit()" function or hold the "Ctrl" key down while you enter a "Z", then hit the “"Enter"” key to get back to your Windows command prompt. You may also find that you have a Start-menu entry such as Start ‣ Programs ‣ Python 3.x ‣ Python (command line) that results in you seeing the ">>>" prompt in a new window. If so, the window will disappear after you call the "exit()" function or enter the "Ctrl-Z" character; Windows is running a single “python” command in the window, and closes it when you terminate the interpreter. Now that we know the "py" command is recognized, you can give your Python script to it. You’ll have to give either an absolute or a relative path to the Python script. Let’s say your Python script is located in your desktop and is named "hello.py", and your command prompt is nicely opened in your home directory so you’re seeing something similar to: C:\Users\YourName> So now you’ll ask the "py" command to give your script to Python by typing "py" followed by your script path: C:\Users\YourName> py Desktop\hello.py hello How do I make Python scripts executable? ======================================== On Windows, the standard Python installer already associates the .py extension with a file type (Python.File) and gives that file type an open command that runs the interpreter ("D:\Program Files\Python\python.exe "%1" %*"). This is enough to make scripts executable from the command prompt as ‘foo.py’. If you’d rather be able to execute the script by simple typing ‘foo’ with no extension you need to add .py to the PATHEXT environment variable. Why does Python sometimes take so long to start? ================================================ Usually Python starts very quickly on Windows, but occasionally there are bug reports that Python suddenly begins to take a long time to start up. This is made even more puzzling because Python will work fine on other Windows systems which appear to be configured identically. The problem may be caused by a misconfiguration of virus checking software on the problem machine. Some virus scanners have been known to introduce startup overhead of two orders of magnitude when the scanner is configured to monitor all reads from the filesystem. Try checking the configuration of virus scanning software on your systems to ensure that they are indeed configured identically. McAfee, when configured to scan all file system read activity, is a particular offender. How do I make an executable from a Python script? ================================================= See How can I create a stand-alone binary from a Python script? for a list of tools that can be used to make executables. Is a "*.pyd" file the same as a DLL? ==================================== Yes, .pyd files are dll’s, but there are a few differences. If you have a DLL named "foo.pyd", then it must have a function "PyInit_foo()". You can then write Python “import foo”, and Python will search for foo.pyd (as well as foo.py, foo.pyc) and if it finds it, will attempt to call "PyInit_foo()" to initialize it. You do not link your .exe with foo.lib, as that would cause Windows to require the DLL to be present. Note that the search path for foo.pyd is PYTHONPATH, not the same as the path that Windows uses to search for foo.dll. Also, foo.pyd need not be present to run your program, whereas if you linked your program with a dll, the dll is required. Of course, foo.pyd is required if you want to say "import foo". In a DLL, linkage is declared in the source code with "__declspec(dllexport)". In a .pyd, linkage is defined in a list of available functions. How can I embed Python into a Windows application? ================================================== Embedding the Python interpreter in a Windows app can be summarized as follows: 1. Do _not_ build Python into your .exe file directly. On Windows, Python must be a DLL to handle importing modules that are themselves DLL’s. (This is the first key undocumented fact.) Instead, link to "python*NN*.dll"; it is typically installed in "C:\Windows\System". *NN* is the Python version, a number such as “33” for Python 3.3. You can link to Python in two different ways. Load-time linking means linking against "python*NN*.lib", while run-time linking means linking against "python*NN*.dll". (General note: "python*NN*.lib" is the so-called “import lib” corresponding to "python*NN*.dll". It merely defines symbols for the linker.) Run-time linking greatly simplifies link options; everything happens at run time. Your code must load "python*NN*.dll" using the Windows "LoadLibraryEx()" routine. The code must also use access routines and data in "python*NN*.dll" (that is, Python’s C API’s) using pointers obtained by the Windows "GetProcAddress()" routine. Macros can make using these pointers transparent to any C code that calls routines in Python’s C API. 2. If you use SWIG, it is easy to create a Python “extension module” that will make the app’s data and methods available to Python. SWIG will handle just about all the grungy details for you. The result is C code that you link *into* your .exe file (!) You do _not_ have to create a DLL file, and this also simplifies linking. 3. SWIG will create an init function (a C function) whose name depends on the name of the extension module. For example, if the name of the module is leo, the init function will be called initleo(). If you use SWIG shadow classes, as you should, the init function will be called initleoc(). This initializes a mostly hidden helper class used by the shadow class. The reason you can link the C code in step 2 into your .exe file is that calling the initialization function is equivalent to importing the module into Python! (This is the second key undocumented fact.) 4. In short, you can use the following code to initialize the Python interpreter with your extension module. #include "python.h" ... Py_Initialize(); // Initialize Python. initmyAppc(); // Initialize (import) the helper class. PyRun_SimpleString("import myApp"); // Import the shadow class. 5. There are two problems with Python’s C API which will become apparent if you use a compiler other than MSVC, the compiler used to build pythonNN.dll. Problem 1: The so-called “Very High Level” functions that take FILE * arguments will not work in a multi-compiler environment because each compiler’s notion of a struct FILE will be different. From an implementation standpoint these are very _low_ level functions. Problem 2: SWIG generates the following code when generating wrappers to void functions: Py_INCREF(Py_None); _resultobj = Py_None; return _resultobj; Alas, Py_None is a macro that expands to a reference to a complex data structure called _Py_NoneStruct inside pythonNN.dll. Again, this code will fail in a mult-compiler environment. Replace such code by: return Py_BuildValue(""); It may be possible to use SWIG’s "%typemap" command to make the change automatically, though I have not been able to get this to work (I’m a complete SWIG newbie). 6. Using a Python shell script to put up a Python interpreter window from inside your Windows app is not a good idea; the resulting window will be independent of your app’s windowing system. Rather, you (or the wxPythonWindow class) should create a “native” interpreter window. It is easy to connect that window to the Python interpreter. You can redirect Python’s i/o to _any_ object that supports read and write, so all you need is a Python object (defined in your extension module) that contains read() and write() methods. How do I keep editors from inserting tabs into my Python source? ================================================================ The FAQ does not recommend using tabs, and the Python style guide, **PEP 8**, recommends 4 spaces for distributed Python code; this is also the Emacs python-mode default. Under any editor, mixing tabs and spaces is a bad idea. MSVC is no different in this respect, and is easily configured to use spaces: Take Tools ‣ Options ‣ Tabs, and for file type “Default” set “Tab size” and “Indent size” to 4, and select the “Insert spaces” radio button. Python raises "IndentationError" or "TabError" if mixed tabs and spaces are causing problems in leading whitespace. You may also run the "tabnanny" module to check a directory tree in batch mode. How do I check for a keypress without blocking? =============================================== Use the "msvcrt" module. This is a standard Windows-specific extension module. It defines a function "kbhit()" which checks whether a keyboard hit is present, and "getch()" which gets one character without echoing it. Glossary ******** ">>>" The default Python prompt of the interactive shell. Often seen for code examples which can be executed interactively in the interpreter. "..." Can refer to: * The default Python prompt of the interactive shell when entering the code for an indented code block, when within a pair of matching left and right delimiters (parentheses, square brackets, curly braces or triple quotes), or after specifying a decorator. * The "Ellipsis" built-in constant. 2to3 A tool that tries to convert Python 2.x code to Python 3.x code by handling most of the incompatibilities which can be detected by parsing the source and traversing the parse tree. 2to3 is available in the standard library as "lib2to3"; a standalone entry point is provided as "Tools/scripts/2to3". See 2to3 - Automated Python 2 to 3 code translation. abstract base class Abstract base classes complement *duck-typing* by providing a way to define interfaces when other techniques like "hasattr()" would be clumsy or subtly wrong (for example with magic methods). ABCs introduce virtual subclasses, which are classes that don’t inherit from a class but are still recognized by "isinstance()" and "issubclass()"; see the "abc" module documentation. Python comes with many built-in ABCs for data structures (in the "collections.abc" module), numbers (in the "numbers" module), streams (in the "io" module), import finders and loaders (in the "importlib.abc" module). You can create your own ABCs with the "abc" module. annotation A label associated with a variable, a class attribute or a function parameter or return value, used by convention as a *type hint*. Annotations of local variables cannot be accessed at runtime, but annotations of global variables, class attributes, and functions are stored in the "__annotations__" special attribute of modules, classes, and functions, respectively. See *variable annotation*, *function annotation*, **PEP 484** and **PEP 526**, which describe this functionality. argument A value passed to a *function* (or *method*) when calling the function. There are two kinds of argument: * *keyword argument*: an argument preceded by an identifier (e.g. "name=") in a function call or passed as a value in a dictionary preceded by "**". For example, "3" and "5" are both keyword arguments in the following calls to "complex()": complex(real=3, imag=5) complex(**{'real': 3, 'imag': 5}) * *positional argument*: an argument that is not a keyword argument. Positional arguments can appear at the beginning of an argument list and/or be passed as elements of an *iterable* preceded by "*". For example, "3" and "5" are both positional arguments in the following calls: complex(3, 5) complex(*(3, 5)) Arguments are assigned to the named local variables in a function body. See the Calls section for the rules governing this assignment. Syntactically, any expression can be used to represent an argument; the evaluated value is assigned to the local variable. See also the *parameter* glossary entry, the FAQ question on the difference between arguments and parameters, and **PEP 362**. asynchronous context manager An object which controls the environment seen in an "async with" statement by defining "__aenter__()" and "__aexit__()" methods. Introduced by **PEP 492**. asynchronous generator A function which returns an *asynchronous generator iterator*. It looks like a coroutine function defined with "async def" except that it contains "yield" expressions for producing a series of values usable in an "async for" loop. Usually refers to an asynchronous generator function, but may refer to an *asynchronous generator iterator* in some contexts. In cases where the intended meaning isn’t clear, using the full terms avoids ambiguity. An asynchronous generator function may contain "await" expressions as well as "async for", and "async with" statements. asynchronous generator iterator An object created by a *asynchronous generator* function. This is an *asynchronous iterator* which when called using the "__anext__()" method returns an awaitable object which will execute the body of the asynchronous generator function until the next "yield" expression. Each "yield" temporarily suspends processing, remembering the location execution state (including local variables and pending try-statements). When the *asynchronous generator iterator* effectively resumes with another awaitable returned by "__anext__()", it picks up where it left off. See **PEP 492** and **PEP 525**. asynchronous iterable An object, that can be used in an "async for" statement. Must return an *asynchronous iterator* from its "__aiter__()" method. Introduced by **PEP 492**. asynchronous iterator An object that implements the "__aiter__()" and "__anext__()" methods. "__anext__" must return an *awaitable* object. "async for" resolves the awaitables returned by an asynchronous iterator’s "__anext__()" method until it raises a "StopAsyncIteration" exception. Introduced by **PEP 492**. attribute A value associated with an object which is referenced by name using dotted expressions. For example, if an object *o* has an attribute *a* it would be referenced as *o.a*. awaitable An object that can be used in an "await" expression. Can be a *coroutine* or an object with an "__await__()" method. See also **PEP 492**. BDFL Benevolent Dictator For Life, a.k.a. Guido van Rossum, Python’s creator. binary file A *file object* able to read and write *bytes-like objects*. Examples of binary files are files opened in binary mode ("'rb'", "'wb'" or "'rb+'"), "sys.stdin.buffer", "sys.stdout.buffer", and instances of "io.BytesIO" and "gzip.GzipFile". See also *text file* for a file object able to read and write "str" objects. bytes-like object An object that supports the Buffer Protocol and can export a C-*contiguous* buffer. This includes all "bytes", "bytearray", and "array.array" objects, as well as many common "memoryview" objects. Bytes-like objects can be used for various operations that work with binary data; these include compression, saving to a binary file, and sending over a socket. Some operations need the binary data to be mutable. The documentation often refers to these as “read-write bytes-like objects”. Example mutable buffer objects include "bytearray" and a "memoryview" of a "bytearray". Other operations require the binary data to be stored in immutable objects (“read-only bytes-like objects”); examples of these include "bytes" and a "memoryview" of a "bytes" object. bytecode Python source code is compiled into bytecode, the internal representation of a Python program in the CPython interpreter. The bytecode is also cached in ".pyc" files so that executing the same file is faster the second time (recompilation from source to bytecode can be avoided). This “intermediate language” is said to run on a *virtual machine* that executes the machine code corresponding to each bytecode. Do note that bytecodes are not expected to work between different Python virtual machines, nor to be stable between Python releases. A list of bytecode instructions can be found in the documentation for the dis module. callback A subroutine function which is passed as an argument to be executed at some point in the future. class A template for creating user-defined objects. Class definitions normally contain method definitions which operate on instances of the class. class variable A variable defined in a class and intended to be modified only at class level (i.e., not in an instance of the class). coercion The implicit conversion of an instance of one type to another during an operation which involves two arguments of the same type. For example, "int(3.15)" converts the floating point number to the integer "3", but in "3+4.5", each argument is of a different type (one int, one float), and both must be converted to the same type before they can be added or it will raise a "TypeError". Without coercion, all arguments of even compatible types would have to be normalized to the same value by the programmer, e.g., "float(3)+4.5" rather than just "3+4.5". complex number An extension of the familiar real number system in which all numbers are expressed as a sum of a real part and an imaginary part. Imaginary numbers are real multiples of the imaginary unit (the square root of "-1"), often written "i" in mathematics or "j" in engineering. Python has built-in support for complex numbers, which are written with this latter notation; the imaginary part is written with a "j" suffix, e.g., "3+1j". To get access to complex equivalents of the "math" module, use "cmath". Use of complex numbers is a fairly advanced mathematical feature. If you’re not aware of a need for them, it’s almost certain you can safely ignore them. context manager An object which controls the environment seen in a "with" statement by defining "__enter__()" and "__exit__()" methods. See **PEP 343**. context variable A variable which can have different values depending on its context. This is similar to Thread-Local Storage in which each execution thread may have a different value for a variable. However, with context variables, there may be several contexts in one execution thread and the main usage for context variables is to keep track of variables in concurrent asynchronous tasks. See "contextvars". contiguous A buffer is considered contiguous exactly if it is either *C-contiguous* or *Fortran contiguous*. Zero-dimensional buffers are C and Fortran contiguous. In one-dimensional arrays, the items must be laid out in memory next to each other, in order of increasing indexes starting from zero. In multidimensional C-contiguous arrays, the last index varies the fastest when visiting items in order of memory address. However, in Fortran contiguous arrays, the first index varies the fastest. coroutine Coroutines are a more generalized form of subroutines. Subroutines are entered at one point and exited at another point. Coroutines can be entered, exited, and resumed at many different points. They can be implemented with the "async def" statement. See also **PEP 492**. coroutine function A function which returns a *coroutine* object. A coroutine function may be defined with the "async def" statement, and may contain "await", "async for", and "async with" keywords. These were introduced by **PEP 492**. CPython The canonical implementation of the Python programming language, as distributed on python.org. The term “CPython” is used when necessary to distinguish this implementation from others such as Jython or IronPython. decorator A function returning another function, usually applied as a function transformation using the "@wrapper" syntax. Common examples for decorators are "classmethod()" and "staticmethod()". The decorator syntax is merely syntactic sugar, the following two function definitions are semantically equivalent: def f(arg): ... f = staticmethod(f) @staticmethod def f(arg): ... The same concept exists for classes, but is less commonly used there. See the documentation for function definitions and class definitions for more about decorators. descriptor Any object which defines the methods "__get__()", "__set__()", or "__delete__()". When a class attribute is a descriptor, its special binding behavior is triggered upon attribute lookup. Normally, using *a.b* to get, set or delete an attribute looks up the object named *b* in the class dictionary for *a*, but if *b* is a descriptor, the respective descriptor method gets called. Understanding descriptors is a key to a deep understanding of Python because they are the basis for many features including functions, methods, properties, class methods, static methods, and reference to super classes. For more information about descriptors’ methods, see Implementing Descriptors or the Descriptor How To Guide. dictionary An associative array, where arbitrary keys are mapped to values. The keys can be any object with "__hash__()" and "__eq__()" methods. Called a hash in Perl. dictionary comprehension A compact way to process all or part of the elements in an iterable and return a dictionary with the results. "results = {n: n ** 2 for n in range(10)}" generates a dictionary containing key "n" mapped to value "n ** 2". See Displays for lists, sets and dictionaries. dictionary view The objects returned from "dict.keys()", "dict.values()", and "dict.items()" are called dictionary views. They provide a dynamic view on the dictionary’s entries, which means that when the dictionary changes, the view reflects these changes. To force the dictionary view to become a full list use "list(dictview)". See Dictionary view objects. docstring A string literal which appears as the first expression in a class, function or module. While ignored when the suite is executed, it is recognized by the compiler and put into the "__doc__" attribute of the enclosing class, function or module. Since it is available via introspection, it is the canonical place for documentation of the object. duck-typing A programming style which does not look at an object’s type to determine if it has the right interface; instead, the method or attribute is simply called or used (“If it looks like a duck and quacks like a duck, it must be a duck.”) By emphasizing interfaces rather than specific types, well-designed code improves its flexibility by allowing polymorphic substitution. Duck-typing avoids tests using "type()" or "isinstance()". (Note, however, that duck-typing can be complemented with *abstract base classes*.) Instead, it typically employs "hasattr()" tests or *EAFP* programming. EAFP Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many "try" and "except" statements. The technique contrasts with the *LBYL* style common to many other languages such as C. expression A piece of syntax which can be evaluated to some value. In other words, an expression is an accumulation of expression elements like literals, names, attribute access, operators or function calls which all return a value. In contrast to many other languages, not all language constructs are expressions. There are also *statement*s which cannot be used as expressions, such as "while". Assignments are also statements, not expressions. extension module A module written in C or C++, using Python’s C API to interact with the core and with user code. f-string String literals prefixed with "'f'" or "'F'" are commonly called “f-strings” which is short for formatted string literals. See also **PEP 498**. file object An object exposing a file-oriented API (with methods such as "read()" or "write()") to an underlying resource. Depending on the way it was created, a file object can mediate access to a real on- disk file or to another type of storage or communication device (for example standard input/output, in-memory buffers, sockets, pipes, etc.). File objects are also called *file-like objects* or *streams*. There are actually three categories of file objects: raw *binary files*, buffered *binary files* and *text files*. Their interfaces are defined in the "io" module. The canonical way to create a file object is by using the "open()" function. file-like object A synonym for *file object*. finder An object that tries to find the *loader* for a module that is being imported. Since Python 3.3, there are two types of finder: *meta path finders* for use with "sys.meta_path", and *path entry finders* for use with "sys.path_hooks". See **PEP 302**, **PEP 420** and **PEP 451** for much more detail. floor division Mathematical division that rounds down to nearest integer. The floor division operator is "//". For example, the expression "11 // 4" evaluates to "2" in contrast to the "2.75" returned by float true division. Note that "(-11) // 4" is "-3" because that is "-2.75" rounded *downward*. See **PEP 238**. function A series of statements which returns some value to a caller. It can also be passed zero or more *arguments* which may be used in the execution of the body. See also *parameter*, *method*, and the Function definitions section. function annotation An *annotation* of a function parameter or return value. Function annotations are usually used for *type hints*: for example, this function is expected to take two "int" arguments and is also expected to have an "int" return value: def sum_two_numbers(a: int, b: int) -> int: return a + b Function annotation syntax is explained in section Function definitions. See *variable annotation* and **PEP 484**, which describe this functionality. __future__ A future statement, "from __future__ import ", directs the compiler to compile the current module using syntax or semantics that will become standard in a future release of Python. The "__future__" module documents the possible values of *feature*. By importing this module and evaluating its variables, you can see when a new feature was first added to the language and when it will (or did) become the default: >>> import __future__ >>> __future__.division _Feature((2, 2, 0, 'alpha', 2), (3, 0, 0, 'alpha', 0), 8192) garbage collection The process of freeing memory when it is not used anymore. Python performs garbage collection via reference counting and a cyclic garbage collector that is able to detect and break reference cycles. The garbage collector can be controlled using the "gc" module. generator A function which returns a *generator iterator*. It looks like a normal function except that it contains "yield" expressions for producing a series of values usable in a for-loop or that can be retrieved one at a time with the "next()" function. Usually refers to a generator function, but may refer to a *generator iterator* in some contexts. In cases where the intended meaning isn’t clear, using the full terms avoids ambiguity. generator iterator An object created by a *generator* function. Each "yield" temporarily suspends processing, remembering the location execution state (including local variables and pending try-statements). When the *generator iterator* resumes, it picks up where it left off (in contrast to functions which start fresh on every invocation). generator expression An expression that returns an iterator. It looks like a normal expression followed by a "for" clause defining a loop variable, range, and an optional "if" clause. The combined expression generates values for an enclosing function: >>> sum(i*i for i in range(10)) # sum of squares 0, 1, 4, ... 81 285 generic function A function composed of multiple functions implementing the same operation for different types. Which implementation should be used during a call is determined by the dispatch algorithm. See also the *single dispatch* glossary entry, the "functools.singledispatch()" decorator, and **PEP 443**. generic type A *type* that can be parameterized; typically a container class such as "list" or "dict". Used for *type hints* and *annotations*. For more details, see generic alias types, **PEP 483**, **PEP 484**, **PEP 585**, and the "typing" module. GIL See *global interpreter lock*. global interpreter lock The mechanism used by the *CPython* interpreter to assure that only one thread executes Python *bytecode* at a time. This simplifies the CPython implementation by making the object model (including critical built-in types such as "dict") implicitly safe against concurrent access. Locking the entire interpreter makes it easier for the interpreter to be multi-threaded, at the expense of much of the parallelism afforded by multi-processor machines. However, some extension modules, either standard or third-party, are designed so as to release the GIL when doing computationally- intensive tasks such as compression or hashing. Also, the GIL is always released when doing I/O. Past efforts to create a “free-threaded” interpreter (one which locks shared data at a much finer granularity) have not been successful because performance suffered in the common single- processor case. It is believed that overcoming this performance issue would make the implementation much more complicated and therefore costlier to maintain. hash-based pyc A bytecode cache file that uses the hash rather than the last- modified time of the corresponding source file to determine its validity. See Cached bytecode invalidation. hashable An object is *hashable* if it has a hash value which never changes during its lifetime (it needs a "__hash__()" method), and can be compared to other objects (it needs an "__eq__()" method). Hashable objects which compare equal must have the same hash value. Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally. Most of Python’s immutable built-in objects are hashable; mutable containers (such as lists or dictionaries) are not; immutable containers (such as tuples and frozensets) are only hashable if their elements are hashable. Objects which are instances of user- defined classes are hashable by default. They all compare unequal (except with themselves), and their hash value is derived from their "id()". IDLE An Integrated Development Environment for Python. IDLE is a basic editor and interpreter environment which ships with the standard distribution of Python. immutable An object with a fixed value. Immutable objects include numbers, strings and tuples. Such an object cannot be altered. A new object has to be created if a different value has to be stored. They play an important role in places where a constant hash value is needed, for example as a key in a dictionary. import path A list of locations (or *path entries*) that are searched by the *path based finder* for modules to import. During import, this list of locations usually comes from "sys.path", but for subpackages it may also come from the parent package’s "__path__" attribute. importing The process by which Python code in one module is made available to Python code in another module. importer An object that both finds and loads a module; both a *finder* and *loader* object. interactive Python has an interactive interpreter which means you can enter statements and expressions at the interpreter prompt, immediately execute them and see their results. Just launch "python" with no arguments (possibly by selecting it from your computer’s main menu). It is a very powerful way to test out new ideas or inspect modules and packages (remember "help(x)"). interpreted Python is an interpreted language, as opposed to a compiled one, though the distinction can be blurry because of the presence of the bytecode compiler. This means that source files can be run directly without explicitly creating an executable which is then run. Interpreted languages typically have a shorter development/debug cycle than compiled ones, though their programs generally also run more slowly. See also *interactive*. interpreter shutdown When asked to shut down, the Python interpreter enters a special phase where it gradually releases all allocated resources, such as modules and various critical internal structures. It also makes several calls to the *garbage collector*. This can trigger the execution of code in user-defined destructors or weakref callbacks. Code executed during the shutdown phase can encounter various exceptions as the resources it relies on may not function anymore (common examples are library modules or the warnings machinery). The main reason for interpreter shutdown is that the "__main__" module or the script being run has finished executing. iterable An object capable of returning its members one at a time. Examples of iterables include all sequence types (such as "list", "str", and "tuple") and some non-sequence types like "dict", *file objects*, and objects of any classes you define with an "__iter__()" method or with a "__getitem__()" method that implements *Sequence* semantics. Iterables can be used in a "for" loop and in many other places where a sequence is needed ("zip()", "map()", …). When an iterable object is passed as an argument to the built-in function "iter()", it returns an iterator for the object. This iterator is good for one pass over the set of values. When using iterables, it is usually not necessary to call "iter()" or deal with iterator objects yourself. The "for" statement does that automatically for you, creating a temporary unnamed variable to hold the iterator for the duration of the loop. See also *iterator*, *sequence*, and *generator*. iterator An object representing a stream of data. Repeated calls to the iterator’s "__next__()" method (or passing it to the built-in function "next()") return successive items in the stream. When no more data are available a "StopIteration" exception is raised instead. At this point, the iterator object is exhausted and any further calls to its "__next__()" method just raise "StopIteration" again. Iterators are required to have an "__iter__()" method that returns the iterator object itself so every iterator is also iterable and may be used in most places where other iterables are accepted. One notable exception is code which attempts multiple iteration passes. A container object (such as a "list") produces a fresh new iterator each time you pass it to the "iter()" function or use it in a "for" loop. Attempting this with an iterator will just return the same exhausted iterator object used in the previous iteration pass, making it appear like an empty container. More information can be found in Iterator Types. key function A key function or collation function is a callable that returns a value used for sorting or ordering. For example, "locale.strxfrm()" is used to produce a sort key that is aware of locale specific sort conventions. A number of tools in Python accept key functions to control how elements are ordered or grouped. They include "min()", "max()", "sorted()", "list.sort()", "heapq.merge()", "heapq.nsmallest()", "heapq.nlargest()", and "itertools.groupby()". There are several ways to create a key function. For example. the "str.lower()" method can serve as a key function for case insensitive sorts. Alternatively, a key function can be built from a "lambda" expression such as "lambda r: (r[0], r[2])". Also, the "operator" module provides three key function constructors: "attrgetter()", "itemgetter()", and "methodcaller()". See the Sorting HOW TO for examples of how to create and use key functions. keyword argument See *argument*. lambda An anonymous inline function consisting of a single *expression* which is evaluated when the function is called. The syntax to create a lambda function is "lambda [parameters]: expression" LBYL Look before you leap. This coding style explicitly tests for pre- conditions before making calls or lookups. This style contrasts with the *EAFP* approach and is characterized by the presence of many "if" statements. In a multi-threaded environment, the LBYL approach can risk introducing a race condition between “the looking” and “the leaping”. For example, the code, "if key in mapping: return mapping[key]" can fail if another thread removes *key* from *mapping* after the test, but before the lookup. This issue can be solved with locks or by using the EAFP approach. list A built-in Python *sequence*. Despite its name it is more akin to an array in other languages than to a linked list since access to elements is O(1). list comprehension A compact way to process all or part of the elements in a sequence and return a list with the results. "result = ['{:#04x}'.format(x) for x in range(256) if x % 2 == 0]" generates a list of strings containing even hex numbers (0x..) in the range from 0 to 255. The "if" clause is optional. If omitted, all elements in "range(256)" are processed. loader An object that loads a module. It must define a method named "load_module()". A loader is typically returned by a *finder*. See **PEP 302** for details and "importlib.abc.Loader" for an *abstract base class*. magic method An informal synonym for *special method*. mapping A container object that supports arbitrary key lookups and implements the methods specified in the "Mapping" or "MutableMapping" abstract base classes. Examples include "dict", "collections.defaultdict", "collections.OrderedDict" and "collections.Counter". meta path finder A *finder* returned by a search of "sys.meta_path". Meta path finders are related to, but different from *path entry finders*. See "importlib.abc.MetaPathFinder" for the methods that meta path finders implement. metaclass The class of a class. Class definitions create a class name, a class dictionary, and a list of base classes. The metaclass is responsible for taking those three arguments and creating the class. Most object oriented programming languages provide a default implementation. What makes Python special is that it is possible to create custom metaclasses. Most users never need this tool, but when the need arises, metaclasses can provide powerful, elegant solutions. They have been used for logging attribute access, adding thread-safety, tracking object creation, implementing singletons, and many other tasks. More information can be found in Metaclasses. method A function which is defined inside a class body. If called as an attribute of an instance of that class, the method will get the instance object as its first *argument* (which is usually called "self"). See *function* and *nested scope*. method resolution order Method Resolution Order is the order in which base classes are searched for a member during lookup. See The Python 2.3 Method Resolution Order for details of the algorithm used by the Python interpreter since the 2.3 release. module An object that serves as an organizational unit of Python code. Modules have a namespace containing arbitrary Python objects. Modules are loaded into Python by the process of *importing*. See also *package*. module spec A namespace containing the import-related information used to load a module. An instance of "importlib.machinery.ModuleSpec". MRO See *method resolution order*. mutable Mutable objects can change their value but keep their "id()". See also *immutable*. named tuple The term “named tuple” applies to any type or class that inherits from tuple and whose indexable elements are also accessible using named attributes. The type or class may have other features as well. Several built-in types are named tuples, including the values returned by "time.localtime()" and "os.stat()". Another example is "sys.float_info": >>> sys.float_info[1] # indexed access 1024 >>> sys.float_info.max_exp # named field access 1024 >>> isinstance(sys.float_info, tuple) # kind of tuple True Some named tuples are built-in types (such as the above examples). Alternatively, a named tuple can be created from a regular class definition that inherits from "tuple" and that defines named fields. Such a class can be written by hand or it can be created with the factory function "collections.namedtuple()". The latter technique also adds some extra methods that may not be found in hand-written or built-in named tuples. namespace The place where a variable is stored. Namespaces are implemented as dictionaries. There are the local, global and built-in namespaces as well as nested namespaces in objects (in methods). Namespaces support modularity by preventing naming conflicts. For instance, the functions "builtins.open" and "os.open()" are distinguished by their namespaces. Namespaces also aid readability and maintainability by making it clear which module implements a function. For instance, writing "random.seed()" or "itertools.islice()" makes it clear that those functions are implemented by the "random" and "itertools" modules, respectively. namespace package A **PEP 420** *package* which serves only as a container for subpackages. Namespace packages may have no physical representation, and specifically are not like a *regular package* because they have no "__init__.py" file. See also *module*. nested scope The ability to refer to a variable in an enclosing definition. For instance, a function defined inside another function can refer to variables in the outer function. Note that nested scopes by default work only for reference and not for assignment. Local variables both read and write in the innermost scope. Likewise, global variables read and write to the global namespace. The "nonlocal" allows writing to outer scopes. new-style class Old name for the flavor of classes now used for all class objects. In earlier Python versions, only new-style classes could use Python’s newer, versatile features like "__slots__", descriptors, properties, "__getattribute__()", class methods, and static methods. object Any data with state (attributes or value) and defined behavior (methods). Also the ultimate base class of any *new-style class*. package A Python *module* which can contain submodules or recursively, subpackages. Technically, a package is a Python module with an "__path__" attribute. See also *regular package* and *namespace package*. parameter A named entity in a *function* (or method) definition that specifies an *argument* (or in some cases, arguments) that the function can accept. There are five kinds of parameter: * *positional-or-keyword*: specifies an argument that can be passed either *positionally* or as a *keyword argument*. This is the default kind of parameter, for example *foo* and *bar* in the following: def func(foo, bar=None): ... * *positional-only*: specifies an argument that can be supplied only by position. Positional-only parameters can be defined by including a "/" character in the parameter list of the function definition after them, for example *posonly1* and *posonly2* in the following: def func(posonly1, posonly2, /, positional_or_keyword): ... * *keyword-only*: specifies an argument that can be supplied only by keyword. Keyword-only parameters can be defined by including a single var-positional parameter or bare "*" in the parameter list of the function definition before them, for example *kw_only1* and *kw_only2* in the following: def func(arg, *, kw_only1, kw_only2): ... * *var-positional*: specifies that an arbitrary sequence of positional arguments can be provided (in addition to any positional arguments already accepted by other parameters). Such a parameter can be defined by prepending the parameter name with "*", for example *args* in the following: def func(*args, **kwargs): ... * *var-keyword*: specifies that arbitrarily many keyword arguments can be provided (in addition to any keyword arguments already accepted by other parameters). Such a parameter can be defined by prepending the parameter name with "**", for example *kwargs* in the example above. Parameters can specify both optional and required arguments, as well as default values for some optional arguments. See also the *argument* glossary entry, the FAQ question on the difference between arguments and parameters, the "inspect.Parameter" class, the Function definitions section, and **PEP 362**. path entry A single location on the *import path* which the *path based finder* consults to find modules for importing. path entry finder A *finder* returned by a callable on "sys.path_hooks" (i.e. a *path entry hook*) which knows how to locate modules given a *path entry*. See "importlib.abc.PathEntryFinder" for the methods that path entry finders implement. path entry hook A callable on the "sys.path_hook" list which returns a *path entry finder* if it knows how to find modules on a specific *path entry*. path based finder One of the default *meta path finders* which searches an *import path* for modules. path-like object An object representing a file system path. A path-like object is either a "str" or "bytes" object representing a path, or an object implementing the "os.PathLike" protocol. An object that supports the "os.PathLike" protocol can be converted to a "str" or "bytes" file system path by calling the "os.fspath()" function; "os.fsdecode()" and "os.fsencode()" can be used to guarantee a "str" or "bytes" result instead, respectively. Introduced by **PEP 519**. PEP Python Enhancement Proposal. A PEP is a design document providing information to the Python community, or describing a new feature for Python or its processes or environment. PEPs should provide a concise technical specification and a rationale for proposed features. PEPs are intended to be the primary mechanisms for proposing major new features, for collecting community input on an issue, and for documenting the design decisions that have gone into Python. The PEP author is responsible for building consensus within the community and documenting dissenting opinions. See **PEP 1**. portion A set of files in a single directory (possibly stored in a zip file) that contribute to a namespace package, as defined in **PEP 420**. positional argument See *argument*. provisional API A provisional API is one which has been deliberately excluded from the standard library’s backwards compatibility guarantees. While major changes to such interfaces are not expected, as long as they are marked provisional, backwards incompatible changes (up to and including removal of the interface) may occur if deemed necessary by core developers. Such changes will not be made gratuitously – they will occur only if serious fundamental flaws are uncovered that were missed prior to the inclusion of the API. Even for provisional APIs, backwards incompatible changes are seen as a “solution of last resort” - every attempt will still be made to find a backwards compatible resolution to any identified problems. This process allows the standard library to continue to evolve over time, without locking in problematic design errors for extended periods of time. See **PEP 411** for more details. provisional package See *provisional API*. Python 3000 Nickname for the Python 3.x release line (coined long ago when the release of version 3 was something in the distant future.) This is also abbreviated “Py3k”. Pythonic An idea or piece of code which closely follows the most common idioms of the Python language, rather than implementing code using concepts common to other languages. For example, a common idiom in Python is to loop over all elements of an iterable using a "for" statement. Many other languages don’t have this type of construct, so people unfamiliar with Python sometimes use a numerical counter instead: for i in range(len(food)): print(food[i]) As opposed to the cleaner, Pythonic method: for piece in food: print(piece) qualified name A dotted name showing the “path” from a module’s global scope to a class, function or method defined in that module, as defined in **PEP 3155**. For top-level functions and classes, the qualified name is the same as the object’s name: >>> class C: ... class D: ... def meth(self): ... pass ... >>> C.__qualname__ 'C' >>> C.D.__qualname__ 'C.D' >>> C.D.meth.__qualname__ 'C.D.meth' When used to refer to modules, the *fully qualified name* means the entire dotted path to the module, including any parent packages, e.g. "email.mime.text": >>> import email.mime.text >>> email.mime.text.__name__ 'email.mime.text' reference count The number of references to an object. When the reference count of an object drops to zero, it is deallocated. Reference counting is generally not visible to Python code, but it is a key element of the *CPython* implementation. The "sys" module defines a "getrefcount()" function that programmers can call to return the reference count for a particular object. regular package A traditional *package*, such as a directory containing an "__init__.py" file. See also *namespace package*. __slots__ A declaration inside a class that saves memory by pre-declaring space for instance attributes and eliminating instance dictionaries. Though popular, the technique is somewhat tricky to get right and is best reserved for rare cases where there are large numbers of instances in a memory-critical application. sequence An *iterable* which supports efficient element access using integer indices via the "__getitem__()" special method and defines a "__len__()" method that returns the length of the sequence. Some built-in sequence types are "list", "str", "tuple", and "bytes". Note that "dict" also supports "__getitem__()" and "__len__()", but is considered a mapping rather than a sequence because the lookups use arbitrary *immutable* keys rather than integers. The "collections.abc.Sequence" abstract base class defines a much richer interface that goes beyond just "__getitem__()" and "__len__()", adding "count()", "index()", "__contains__()", and "__reversed__()". Types that implement this expanded interface can be registered explicitly using "register()". set comprehension A compact way to process all or part of the elements in an iterable and return a set with the results. "results = {c for c in 'abracadabra' if c not in 'abc'}" generates the set of strings "{'r', 'd'}". See Displays for lists, sets and dictionaries. single dispatch A form of *generic function* dispatch where the implementation is chosen based on the type of a single argument. slice An object usually containing a portion of a *sequence*. A slice is created using the subscript notation, "[]" with colons between numbers when several are given, such as in "variable_name[1:3:5]". The bracket (subscript) notation uses "slice" objects internally. special method A method that is called implicitly by Python to execute a certain operation on a type, such as addition. Such methods have names starting and ending with double underscores. Special methods are documented in Special method names. statement A statement is part of a suite (a “block” of code). A statement is either an *expression* or one of several constructs with a keyword, such as "if", "while" or "for". text encoding A string in Python is a sequence of Unicode code points (in range "U+0000"–"U+10FFFF"). To store or transfer a string, it needs to be serialized as a sequence of bytes. Serializing a string into a sequence of bytes is known as “encoding”, and recreating the string from the sequence of bytes is known as “decoding”. There are a variety of different text serialization codecs, which are collectively referred to as “text encodings”. text file A *file object* able to read and write "str" objects. Often, a text file actually accesses a byte-oriented datastream and handles the *text encoding* automatically. Examples of text files are files opened in text mode ("'r'" or "'w'"), "sys.stdin", "sys.stdout", and instances of "io.StringIO". See also *binary file* for a file object able to read and write *bytes-like objects*. triple-quoted string A string which is bound by three instances of either a quotation mark (”) or an apostrophe (‘). While they don’t provide any functionality not available with single-quoted strings, they are useful for a number of reasons. They allow you to include unescaped single and double quotes within a string and they can span multiple lines without the use of the continuation character, making them especially useful when writing docstrings. type The type of a Python object determines what kind of object it is; every object has a type. An object’s type is accessible as its "__class__" attribute or can be retrieved with "type(obj)". type alias A synonym for a type, created by assigning the type to an identifier. Type aliases are useful for simplifying *type hints*. For example: def remove_gray_shades( colors: list[tuple[int, int, int]]) -> list[tuple[int, int, int]]: pass could be made more readable like this: Color = tuple[int, int, int] def remove_gray_shades(colors: list[Color]) -> list[Color]: pass See "typing" and **PEP 484**, which describe this functionality. type hint An *annotation* that specifies the expected type for a variable, a class attribute, or a function parameter or return value. Type hints are optional and are not enforced by Python but they are useful to static type analysis tools, and aid IDEs with code completion and refactoring. Type hints of global variables, class attributes, and functions, but not local variables, can be accessed using "typing.get_type_hints()". See "typing" and **PEP 484**, which describe this functionality. universal newlines A manner of interpreting text streams in which all of the following are recognized as ending a line: the Unix end-of-line convention "'\n'", the Windows convention "'\r\n'", and the old Macintosh convention "'\r'". See **PEP 278** and **PEP 3116**, as well as "bytes.splitlines()" for an additional use. variable annotation An *annotation* of a variable or a class attribute. When annotating a variable or a class attribute, assignment is optional: class C: field: 'annotation' Variable annotations are usually used for *type hints*: for example this variable is expected to take "int" values: count: int = 0 Variable annotation syntax is explained in section Annotated assignment statements. See *function annotation*, **PEP 484** and **PEP 526**, which describe this functionality. virtual environment A cooperatively isolated runtime environment that allows Python users and applications to install and upgrade Python distribution packages without interfering with the behaviour of other Python applications running on the same system. See also "venv". virtual machine A computer defined entirely in software. Python’s virtual machine executes the *bytecode* emitted by the bytecode compiler. Zen of Python Listing of Python design principles and philosophies that are helpful in understanding and using the language. The listing can be found by typing “"import this"” at the interactive prompt. Argparse Tutorial ***************** author: Tshepang Lekhonkhobe This tutorial is intended to be a gentle introduction to "argparse", the recommended command-line parsing module in the Python standard library. Note: There are two other modules that fulfill the same task, namely "getopt" (an equivalent for "getopt()" from the C language) and the deprecated "optparse". Note also that "argparse" is based on "optparse", and therefore very similar in terms of usage. Concepts ======== Let’s show the sort of functionality that we are going to explore in this introductory tutorial by making use of the **ls** command: $ ls cpython devguide prog.py pypy rm-unused-function.patch $ ls pypy ctypes_configure demo dotviewer include lib_pypy lib-python ... $ ls -l total 20 drwxr-xr-x 19 wena wena 4096 Feb 18 18:51 cpython drwxr-xr-x 4 wena wena 4096 Feb 8 12:04 devguide -rwxr-xr-x 1 wena wena 535 Feb 19 00:05 prog.py drwxr-xr-x 14 wena wena 4096 Feb 7 00:59 pypy -rw-r--r-- 1 wena wena 741 Feb 18 01:01 rm-unused-function.patch $ ls --help Usage: ls [OPTION]... [FILE]... List information about the FILEs (the current directory by default). Sort entries alphabetically if none of -cftuvSUX nor --sort is specified. ... A few concepts we can learn from the four commands: * The **ls** command is useful when run without any options at all. It defaults to displaying the contents of the current directory. * If we want beyond what it provides by default, we tell it a bit more. In this case, we want it to display a different directory, "pypy". What we did is specify what is known as a positional argument. It’s named so because the program should know what to do with the value, solely based on where it appears on the command line. This concept is more relevant to a command like **cp**, whose most basic usage is "cp SRC DEST". The first position is *what you want copied,* and the second position is *where you want it copied to*. * Now, say we want to change behaviour of the program. In our example, we display more info for each file instead of just showing the file names. The "-l" in that case is known as an optional argument. * That’s a snippet of the help text. It’s very useful in that you can come across a program you have never used before, and can figure out how it works simply by reading its help text. The basics ========== Let us start with a very simple example which does (almost) nothing: import argparse parser = argparse.ArgumentParser() parser.parse_args() Following is a result of running the code: $ python3 prog.py $ python3 prog.py --help usage: prog.py [-h] optional arguments: -h, --help show this help message and exit $ python3 prog.py --verbose usage: prog.py [-h] prog.py: error: unrecognized arguments: --verbose $ python3 prog.py foo usage: prog.py [-h] prog.py: error: unrecognized arguments: foo Here is what is happening: * Running the script without any options results in nothing displayed to stdout. Not so useful. * The second one starts to display the usefulness of the "argparse" module. We have done almost nothing, but already we get a nice help message. * The "--help" option, which can also be shortened to "-h", is the only option we get for free (i.e. no need to specify it). Specifying anything else results in an error. But even then, we do get a useful usage message, also for free. Introducing Positional arguments ================================ An example: import argparse parser = argparse.ArgumentParser() parser.add_argument("echo") args = parser.parse_args() print(args.echo) And running the code: $ python3 prog.py usage: prog.py [-h] echo prog.py: error: the following arguments are required: echo $ python3 prog.py --help usage: prog.py [-h] echo positional arguments: echo optional arguments: -h, --help show this help message and exit $ python3 prog.py foo foo Here is what’s happening: * We’ve added the "add_argument()" method, which is what we use to specify which command-line options the program is willing to accept. In this case, I’ve named it "echo" so that it’s in line with its function. * Calling our program now requires us to specify an option. * The "parse_args()" method actually returns some data from the options specified, in this case, "echo". * The variable is some form of ‘magic’ that "argparse" performs for free (i.e. no need to specify which variable that value is stored in). You will also notice that its name matches the string argument given to the method, "echo". Note however that, although the help display looks nice and all, it currently is not as helpful as it can be. For example we see that we got "echo" as a positional argument, but we don’t know what it does, other than by guessing or by reading the source code. So, let’s make it a bit more useful: import argparse parser = argparse.ArgumentParser() parser.add_argument("echo", help="echo the string you use here") args = parser.parse_args() print(args.echo) And we get: $ python3 prog.py -h usage: prog.py [-h] echo positional arguments: echo echo the string you use here optional arguments: -h, --help show this help message and exit Now, how about doing something even more useful: import argparse parser = argparse.ArgumentParser() parser.add_argument("square", help="display a square of a given number") args = parser.parse_args() print(args.square**2) Following is a result of running the code: $ python3 prog.py 4 Traceback (most recent call last): File "prog.py", line 5, in print(args.square**2) TypeError: unsupported operand type(s) for ** or pow(): 'str' and 'int' That didn’t go so well. That’s because "argparse" treats the options we give it as strings, unless we tell it otherwise. So, let’s tell "argparse" to treat that input as an integer: import argparse parser = argparse.ArgumentParser() parser.add_argument("square", help="display a square of a given number", type=int) args = parser.parse_args() print(args.square**2) Following is a result of running the code: $ python3 prog.py 4 16 $ python3 prog.py four usage: prog.py [-h] square prog.py: error: argument square: invalid int value: 'four' That went well. The program now even helpfully quits on bad illegal input before proceeding. Introducing Optional arguments ============================== So far we have been playing with positional arguments. Let us have a look on how to add optional ones: import argparse parser = argparse.ArgumentParser() parser.add_argument("--verbosity", help="increase output verbosity") args = parser.parse_args() if args.verbosity: print("verbosity turned on") And the output: $ python3 prog.py --verbosity 1 verbosity turned on $ python3 prog.py $ python3 prog.py --help usage: prog.py [-h] [--verbosity VERBOSITY] optional arguments: -h, --help show this help message and exit --verbosity VERBOSITY increase output verbosity $ python3 prog.py --verbosity usage: prog.py [-h] [--verbosity VERBOSITY] prog.py: error: argument --verbosity: expected one argument Here is what is happening: * The program is written so as to display something when "--verbosity" is specified and display nothing when not. * To show that the option is actually optional, there is no error when running the program without it. Note that by default, if an optional argument isn’t used, the relevant variable, in this case "args.verbosity", is given "None" as a value, which is the reason it fails the truth test of the "if" statement. * The help message is a bit different. * When using the "--verbosity" option, one must also specify some value, any value. The above example accepts arbitrary integer values for "--verbosity", but for our simple program, only two values are actually useful, "True" or "False". Let’s modify the code accordingly: import argparse parser = argparse.ArgumentParser() parser.add_argument("--verbose", help="increase output verbosity", action="store_true") args = parser.parse_args() if args.verbose: print("verbosity turned on") And the output: $ python3 prog.py --verbose verbosity turned on $ python3 prog.py --verbose 1 usage: prog.py [-h] [--verbose] prog.py: error: unrecognized arguments: 1 $ python3 prog.py --help usage: prog.py [-h] [--verbose] optional arguments: -h, --help show this help message and exit --verbose increase output verbosity Here is what is happening: * The option is now more of a flag than something that requires a value. We even changed the name of the option to match that idea. Note that we now specify a new keyword, "action", and give it the value ""store_true"". This means that, if the option is specified, assign the value "True" to "args.verbose". Not specifying it implies "False". * It complains when you specify a value, in true spirit of what flags actually are. * Notice the different help text. Short options ------------- If you are familiar with command line usage, you will notice that I haven’t yet touched on the topic of short versions of the options. It’s quite simple: import argparse parser = argparse.ArgumentParser() parser.add_argument("-v", "--verbose", help="increase output verbosity", action="store_true") args = parser.parse_args() if args.verbose: print("verbosity turned on") And here goes: $ python3 prog.py -v verbosity turned on $ python3 prog.py --help usage: prog.py [-h] [-v] optional arguments: -h, --help show this help message and exit -v, --verbose increase output verbosity Note that the new ability is also reflected in the help text. Combining Positional and Optional arguments =========================================== Our program keeps growing in complexity: import argparse parser = argparse.ArgumentParser() parser.add_argument("square", type=int, help="display a square of a given number") parser.add_argument("-v", "--verbose", action="store_true", help="increase output verbosity") args = parser.parse_args() answer = args.square**2 if args.verbose: print("the square of {} equals {}".format(args.square, answer)) else: print(answer) And now the output: $ python3 prog.py usage: prog.py [-h] [-v] square prog.py: error: the following arguments are required: square $ python3 prog.py 4 16 $ python3 prog.py 4 --verbose the square of 4 equals 16 $ python3 prog.py --verbose 4 the square of 4 equals 16 * We’ve brought back a positional argument, hence the complaint. * Note that the order does not matter. How about we give this program of ours back the ability to have multiple verbosity values, and actually get to use them: import argparse parser = argparse.ArgumentParser() parser.add_argument("square", type=int, help="display a square of a given number") parser.add_argument("-v", "--verbosity", type=int, help="increase output verbosity") args = parser.parse_args() answer = args.square**2 if args.verbosity == 2: print("the square of {} equals {}".format(args.square, answer)) elif args.verbosity == 1: print("{}^2 == {}".format(args.square, answer)) else: print(answer) And the output: $ python3 prog.py 4 16 $ python3 prog.py 4 -v usage: prog.py [-h] [-v VERBOSITY] square prog.py: error: argument -v/--verbosity: expected one argument $ python3 prog.py 4 -v 1 4^2 == 16 $ python3 prog.py 4 -v 2 the square of 4 equals 16 $ python3 prog.py 4 -v 3 16 These all look good except the last one, which exposes a bug in our program. Let’s fix it by restricting the values the "--verbosity" option can accept: import argparse parser = argparse.ArgumentParser() parser.add_argument("square", type=int, help="display a square of a given number") parser.add_argument("-v", "--verbosity", type=int, choices=[0, 1, 2], help="increase output verbosity") args = parser.parse_args() answer = args.square**2 if args.verbosity == 2: print("the square of {} equals {}".format(args.square, answer)) elif args.verbosity == 1: print("{}^2 == {}".format(args.square, answer)) else: print(answer) And the output: $ python3 prog.py 4 -v 3 usage: prog.py [-h] [-v {0,1,2}] square prog.py: error: argument -v/--verbosity: invalid choice: 3 (choose from 0, 1, 2) $ python3 prog.py 4 -h usage: prog.py [-h] [-v {0,1,2}] square positional arguments: square display a square of a given number optional arguments: -h, --help show this help message and exit -v {0,1,2}, --verbosity {0,1,2} increase output verbosity Note that the change also reflects both in the error message as well as the help string. Now, let’s use a different approach of playing with verbosity, which is pretty common. It also matches the way the CPython executable handles its own verbosity argument (check the output of "python --help"): import argparse parser = argparse.ArgumentParser() parser.add_argument("square", type=int, help="display the square of a given number") parser.add_argument("-v", "--verbosity", action="count", help="increase output verbosity") args = parser.parse_args() answer = args.square**2 if args.verbosity == 2: print("the square of {} equals {}".format(args.square, answer)) elif args.verbosity == 1: print("{}^2 == {}".format(args.square, answer)) else: print(answer) We have introduced another action, “count”, to count the number of occurrences of a specific optional arguments: $ python3 prog.py 4 16 $ python3 prog.py 4 -v 4^2 == 16 $ python3 prog.py 4 -vv the square of 4 equals 16 $ python3 prog.py 4 --verbosity --verbosity the square of 4 equals 16 $ python3 prog.py 4 -v 1 usage: prog.py [-h] [-v] square prog.py: error: unrecognized arguments: 1 $ python3 prog.py 4 -h usage: prog.py [-h] [-v] square positional arguments: square display a square of a given number optional arguments: -h, --help show this help message and exit -v, --verbosity increase output verbosity $ python3 prog.py 4 -vvv 16 * Yes, it’s now more of a flag (similar to "action="store_true"") in the previous version of our script. That should explain the complaint. * It also behaves similar to “store_true” action. * Now here’s a demonstration of what the “count” action gives. You’ve probably seen this sort of usage before. * And if you don’t specify the "-v" flag, that flag is considered to have "None" value. * As should be expected, specifying the long form of the flag, we should get the same output. * Sadly, our help output isn’t very informative on the new ability our script has acquired, but that can always be fixed by improving the documentation for our script (e.g. via the "help" keyword argument). * That last output exposes a bug in our program. Let’s fix: import argparse parser = argparse.ArgumentParser() parser.add_argument("square", type=int, help="display a square of a given number") parser.add_argument("-v", "--verbosity", action="count", help="increase output verbosity") args = parser.parse_args() answer = args.square**2 # bugfix: replace == with >= if args.verbosity >= 2: print("the square of {} equals {}".format(args.square, answer)) elif args.verbosity >= 1: print("{}^2 == {}".format(args.square, answer)) else: print(answer) And this is what it gives: $ python3 prog.py 4 -vvv the square of 4 equals 16 $ python3 prog.py 4 -vvvv the square of 4 equals 16 $ python3 prog.py 4 Traceback (most recent call last): File "prog.py", line 11, in if args.verbosity >= 2: TypeError: '>=' not supported between instances of 'NoneType' and 'int' * First output went well, and fixes the bug we had before. That is, we want any value >= 2 to be as verbose as possible. * Third output not so good. Let’s fix that bug: import argparse parser = argparse.ArgumentParser() parser.add_argument("square", type=int, help="display a square of a given number") parser.add_argument("-v", "--verbosity", action="count", default=0, help="increase output verbosity") args = parser.parse_args() answer = args.square**2 if args.verbosity >= 2: print("the square of {} equals {}".format(args.square, answer)) elif args.verbosity >= 1: print("{}^2 == {}".format(args.square, answer)) else: print(answer) We’ve just introduced yet another keyword, "default". We’ve set it to "0" in order to make it comparable to the other int values. Remember that by default, if an optional argument isn’t specified, it gets the "None" value, and that cannot be compared to an int value (hence the "TypeError" exception). And: $ python3 prog.py 4 16 You can go quite far just with what we’ve learned so far, and we have only scratched the surface. The "argparse" module is very powerful, and we’ll explore a bit more of it before we end this tutorial. Getting a little more advanced ============================== What if we wanted to expand our tiny program to perform other powers, not just squares: import argparse parser = argparse.ArgumentParser() parser.add_argument("x", type=int, help="the base") parser.add_argument("y", type=int, help="the exponent") parser.add_argument("-v", "--verbosity", action="count", default=0) args = parser.parse_args() answer = args.x**args.y if args.verbosity >= 2: print("{} to the power {} equals {}".format(args.x, args.y, answer)) elif args.verbosity >= 1: print("{}^{} == {}".format(args.x, args.y, answer)) else: print(answer) Output: $ python3 prog.py usage: prog.py [-h] [-v] x y prog.py: error: the following arguments are required: x, y $ python3 prog.py -h usage: prog.py [-h] [-v] x y positional arguments: x the base y the exponent optional arguments: -h, --help show this help message and exit -v, --verbosity $ python3 prog.py 4 2 -v 4^2 == 16 Notice that so far we’ve been using verbosity level to *change* the text that gets displayed. The following example instead uses verbosity level to display *more* text instead: import argparse parser = argparse.ArgumentParser() parser.add_argument("x", type=int, help="the base") parser.add_argument("y", type=int, help="the exponent") parser.add_argument("-v", "--verbosity", action="count", default=0) args = parser.parse_args() answer = args.x**args.y if args.verbosity >= 2: print("Running '{}'".format(__file__)) if args.verbosity >= 1: print("{}^{} == ".format(args.x, args.y), end="") print(answer) Output: $ python3 prog.py 4 2 16 $ python3 prog.py 4 2 -v 4^2 == 16 $ python3 prog.py 4 2 -vv Running 'prog.py' 4^2 == 16 Conflicting options ------------------- So far, we have been working with two methods of an "argparse.ArgumentParser" instance. Let’s introduce a third one, "add_mutually_exclusive_group()". It allows for us to specify options that conflict with each other. Let’s also change the rest of the program so that the new functionality makes more sense: we’ll introduce the "--quiet" option, which will be the opposite of the "-- verbose" one: import argparse parser = argparse.ArgumentParser() group = parser.add_mutually_exclusive_group() group.add_argument("-v", "--verbose", action="store_true") group.add_argument("-q", "--quiet", action="store_true") parser.add_argument("x", type=int, help="the base") parser.add_argument("y", type=int, help="the exponent") args = parser.parse_args() answer = args.x**args.y if args.quiet: print(answer) elif args.verbose: print("{} to the power {} equals {}".format(args.x, args.y, answer)) else: print("{}^{} == {}".format(args.x, args.y, answer)) Our program is now simpler, and we’ve lost some functionality for the sake of demonstration. Anyways, here’s the output: $ python3 prog.py 4 2 4^2 == 16 $ python3 prog.py 4 2 -q 16 $ python3 prog.py 4 2 -v 4 to the power 2 equals 16 $ python3 prog.py 4 2 -vq usage: prog.py [-h] [-v | -q] x y prog.py: error: argument -q/--quiet: not allowed with argument -v/--verbose $ python3 prog.py 4 2 -v --quiet usage: prog.py [-h] [-v | -q] x y prog.py: error: argument -q/--quiet: not allowed with argument -v/--verbose That should be easy to follow. I’ve added that last output so you can see the sort of flexibility you get, i.e. mixing long form options with short form ones. Before we conclude, you probably want to tell your users the main purpose of your program, just in case they don’t know: import argparse parser = argparse.ArgumentParser(description="calculate X to the power of Y") group = parser.add_mutually_exclusive_group() group.add_argument("-v", "--verbose", action="store_true") group.add_argument("-q", "--quiet", action="store_true") parser.add_argument("x", type=int, help="the base") parser.add_argument("y", type=int, help="the exponent") args = parser.parse_args() answer = args.x**args.y if args.quiet: print(answer) elif args.verbose: print("{} to the power {} equals {}".format(args.x, args.y, answer)) else: print("{}^{} == {}".format(args.x, args.y, answer)) Note that slight difference in the usage text. Note the "[-v | -q]", which tells us that we can either use "-v" or "-q", but not both at the same time: $ python3 prog.py --help usage: prog.py [-h] [-v | -q] x y calculate X to the power of Y positional arguments: x the base y the exponent optional arguments: -h, --help show this help message and exit -v, --verbose -q, --quiet Conclusion ========== The "argparse" module offers a lot more than shown here. Its docs are quite detailed and thorough, and full of examples. Having gone through this tutorial, you should easily digest them without feeling overwhelmed. Argument Clinic How-To ********************** author: Larry Hastings Abstract ^^^^^^^^ Argument Clinic is a preprocessor for CPython C files. Its purpose is to automate all the boilerplate involved with writing argument parsing code for “builtins”. This document shows you how to convert your first C function to work with Argument Clinic, and then introduces some advanced topics on Argument Clinic usage. Currently Argument Clinic is considered internal-only for CPython. Its use is not supported for files outside CPython, and no guarantees are made regarding backwards compatibility for future versions. In other words: if you maintain an external C extension for CPython, you’re welcome to experiment with Argument Clinic in your own code. But the version of Argument Clinic that ships with the next version of CPython *could* be totally incompatible and break all your code. The Goals Of Argument Clinic ============================ Argument Clinic’s primary goal is to take over responsibility for all argument parsing code inside CPython. This means that, when you convert a function to work with Argument Clinic, that function should no longer do any of its own argument parsing—the code generated by Argument Clinic should be a “black box” to you, where CPython calls in at the top, and your code gets called at the bottom, with "PyObject *args" (and maybe "PyObject *kwargs") magically converted into the C variables and types you need. In order for Argument Clinic to accomplish its primary goal, it must be easy to use. Currently, working with CPython’s argument parsing library is a chore, requiring maintaining redundant information in a surprising number of places. When you use Argument Clinic, you don’t have to repeat yourself. Obviously, no one would want to use Argument Clinic unless it’s solving their problem—and without creating new problems of its own. So it’s paramount that Argument Clinic generate correct code. It’d be nice if the code was faster, too, but at the very least it should not introduce a major speed regression. (Eventually Argument Clinic *should* make a major speedup possible—we could rewrite its code generator to produce tailor-made argument parsing code, rather than calling the general-purpose CPython argument parsing library. That would make for the fastest argument parsing possible!) Additionally, Argument Clinic must be flexible enough to work with any approach to argument parsing. Python has some functions with some very strange parsing behaviors; Argument Clinic’s goal is to support all of them. Finally, the original motivation for Argument Clinic was to provide introspection “signatures” for CPython builtins. It used to be, the introspection query functions would throw an exception if you passed in a builtin. With Argument Clinic, that’s a thing of the past! One idea you should keep in mind, as you work with Argument Clinic: the more information you give it, the better job it’ll be able to do. Argument Clinic is admittedly relatively simple right now. But as it evolves it will get more sophisticated, and it should be able to do many interesting and smart things with all the information you give it. Basic Concepts And Usage ======================== Argument Clinic ships with CPython; you’ll find it in "Tools/clinic/clinic.py". If you run that script, specifying a C file as an argument: $ python3 Tools/clinic/clinic.py foo.c Argument Clinic will scan over the file looking for lines that look exactly like this: /*[clinic input] When it finds one, it reads everything up to a line that looks exactly like this: [clinic start generated code]*/ Everything in between these two lines is input for Argument Clinic. All of these lines, including the beginning and ending comment lines, are collectively called an Argument Clinic “block”. When Argument Clinic parses one of these blocks, it generates output. This output is rewritten into the C file immediately after the block, followed by a comment containing a checksum. The Argument Clinic block now looks like this: /*[clinic input] ... clinic input goes here ... [clinic start generated code]*/ ... clinic output goes here ... /*[clinic end generated code: checksum=...]*/ If you run Argument Clinic on the same file a second time, Argument Clinic will discard the old output and write out the new output with a fresh checksum line. However, if the input hasn’t changed, the output won’t change either. You should never modify the output portion of an Argument Clinic block. Instead, change the input until it produces the output you want. (That’s the purpose of the checksum—to detect if someone changed the output, as these edits would be lost the next time Argument Clinic writes out fresh output.) For the sake of clarity, here’s the terminology we’ll use with Argument Clinic: * The first line of the comment ("/*[clinic input]") is the *start line*. * The last line of the initial comment ("[clinic start generated code]*/") is the *end line*. * The last line ("/*[clinic end generated code: checksum=...]*/") is the *checksum line*. * In between the start line and the end line is the *input*. * In between the end line and the checksum line is the *output*. * All the text collectively, from the start line to the checksum line inclusively, is the *block*. (A block that hasn’t been successfully processed by Argument Clinic yet doesn’t have output or a checksum line, but it’s still considered a block.) Converting Your First Function ============================== The best way to get a sense of how Argument Clinic works is to convert a function to work with it. Here, then, are the bare minimum steps you’d need to follow to convert a function to work with Argument Clinic. Note that for code you plan to check in to CPython, you really should take the conversion farther, using some of the advanced concepts you’ll see later on in the document (like “return converters” and “self converters”). But we’ll keep it simple for this walkthrough so you can learn. Let’s dive in! * Make sure you’re working with a freshly updated checkout of the CPython trunk. * Find a Python builtin that calls either "PyArg_ParseTuple()" or "PyArg_ParseTupleAndKeywords()", and hasn’t been converted to work with Argument Clinic yet. For my example I’m using "_pickle.Pickler.dump()". * If the call to the "PyArg_Parse" function uses any of the following format units: O& O! es es# et et# or if it has multiple calls to "PyArg_ParseTuple()", you should choose a different function. Argument Clinic *does* support all of these scenarios. But these are advanced topics—let’s do something simpler for your first function. Also, if the function has multiple calls to "PyArg_ParseTuple()" or "PyArg_ParseTupleAndKeywords()" where it supports different types for the same argument, or if the function uses something besides PyArg_Parse functions to parse its arguments, it probably isn’t suitable for conversion to Argument Clinic. Argument Clinic doesn’t support generic functions or polymorphic parameters. * Add the following boilerplate above the function, creating our block: /*[clinic input] [clinic start generated code]*/ * Cut the docstring and paste it in between the "[clinic]" lines, removing all the junk that makes it a properly quoted C string. When you’re done you should have just the text, based at the left margin, with no line wider than 80 characters. (Argument Clinic will preserve indents inside the docstring.) If the old docstring had a first line that looked like a function signature, throw that line away. (The docstring doesn’t need it anymore—when you use "help()" on your builtin in the future, the first line will be built automatically based on the function’s signature.) Sample: /*[clinic input] Write a pickled representation of obj to the open file. [clinic start generated code]*/ * If your docstring doesn’t have a “summary” line, Argument Clinic will complain. So let’s make sure it has one. The “summary” line should be a paragraph consisting of a single 80-column line at the beginning of the docstring. (Our example docstring consists solely of a summary line, so the sample code doesn’t have to change for this step.) * Above the docstring, enter the name of the function, followed by a blank line. This should be the Python name of the function, and should be the full dotted path to the function—it should start with the name of the module, include any sub-modules, and if the function is a method on a class it should include the class name too. Sample: /*[clinic input] _pickle.Pickler.dump Write a pickled representation of obj to the open file. [clinic start generated code]*/ * If this is the first time that module or class has been used with Argument Clinic in this C file, you must declare the module and/or class. Proper Argument Clinic hygiene prefers declaring these in a separate block somewhere near the top of the C file, in the same way that include files and statics go at the top. (In our sample code we’ll just show the two blocks next to each other.) The name of the class and module should be the same as the one seen by Python. Check the name defined in the "PyModuleDef" or "PyTypeObject" as appropriate. When you declare a class, you must also specify two aspects of its type in C: the type declaration you’d use for a pointer to an instance of this class, and a pointer to the "PyTypeObject" for this class. Sample: /*[clinic input] module _pickle class _pickle.Pickler "PicklerObject *" "&Pickler_Type" [clinic start generated code]*/ /*[clinic input] _pickle.Pickler.dump Write a pickled representation of obj to the open file. [clinic start generated code]*/ * Declare each of the parameters to the function. Each parameter should get its own line. All the parameter lines should be indented from the function name and the docstring. The general form of these parameter lines is as follows: name_of_parameter: converter If the parameter has a default value, add that after the converter: name_of_parameter: converter = default_value Argument Clinic’s support for “default values” is quite sophisticated; please see the section below on default values for more information. Add a blank line below the parameters. What’s a “converter”? It establishes both the type of the variable used in C, and the method to convert the Python value into a C value at runtime. For now you’re going to use what’s called a “legacy converter”—a convenience syntax intended to make porting old code into Argument Clinic easier. For each parameter, copy the “format unit” for that parameter from the "PyArg_Parse()" format argument and specify *that* as its converter, as a quoted string. (“format unit” is the formal name for the one-to-three character substring of the "format" parameter that tells the argument parsing function what the type of the variable is and how to convert it. For more on format units please see Parsing arguments and building values.) For multicharacter format units like "z#", use the entire two-or- three character string. Sample: /*[clinic input] module _pickle class _pickle.Pickler "PicklerObject *" "&Pickler_Type" [clinic start generated code]*/ /*[clinic input] _pickle.Pickler.dump obj: 'O' Write a pickled representation of obj to the open file. [clinic start generated code]*/ * If your function has "|" in the format string, meaning some parameters have default values, you can ignore it. Argument Clinic infers which parameters are optional based on whether or not they have default values. If your function has "$" in the format string, meaning it takes keyword-only arguments, specify "*" on a line by itself before the first keyword-only argument, indented the same as the parameter lines. ("_pickle.Pickler.dump" has neither, so our sample is unchanged.) * If the existing C function calls "PyArg_ParseTuple()" (as opposed to "PyArg_ParseTupleAndKeywords()"), then all its arguments are positional-only. To mark all parameters as positional-only in Argument Clinic, add a "/" on a line by itself after the last parameter, indented the same as the parameter lines. Currently this is all-or-nothing; either all parameters are positional-only, or none of them are. (In the future Argument Clinic may relax this restriction.) Sample: /*[clinic input] module _pickle class _pickle.Pickler "PicklerObject *" "&Pickler_Type" [clinic start generated code]*/ /*[clinic input] _pickle.Pickler.dump obj: 'O' / Write a pickled representation of obj to the open file. [clinic start generated code]*/ * It’s helpful to write a per-parameter docstring for each parameter. But per-parameter docstrings are optional; you can skip this step if you prefer. Here’s how to add a per-parameter docstring. The first line of the per-parameter docstring must be indented further than the parameter definition. The left margin of this first line establishes the left margin for the whole per-parameter docstring; all the text you write will be outdented by this amount. You can write as much text as you like, across multiple lines if you wish. Sample: /*[clinic input] module _pickle class _pickle.Pickler "PicklerObject *" "&Pickler_Type" [clinic start generated code]*/ /*[clinic input] _pickle.Pickler.dump obj: 'O' The object to be pickled. / Write a pickled representation of obj to the open file. [clinic start generated code]*/ * Save and close the file, then run "Tools/clinic/clinic.py" on it. With luck everything worked—your block now has output, and a ".c.h" file has been generated! Reopen the file in your text editor to see: /*[clinic input] _pickle.Pickler.dump obj: 'O' The object to be pickled. / Write a pickled representation of obj to the open file. [clinic start generated code]*/ static PyObject * _pickle_Pickler_dump(PicklerObject *self, PyObject *obj) /*[clinic end generated code: output=87ecad1261e02ac7 input=552eb1c0f52260d9]*/ Obviously, if Argument Clinic didn’t produce any output, it’s because it found an error in your input. Keep fixing your errors and retrying until Argument Clinic processes your file without complaint. For readability, most of the glue code has been generated to a ".c.h" file. You’ll need to include that in your original ".c" file, typically right after the clinic module block: #include "clinic/_pickle.c.h" * Double-check that the argument-parsing code Argument Clinic generated looks basically the same as the existing code. First, ensure both places use the same argument-parsing function. The existing code must call either "PyArg_ParseTuple()" or "PyArg_ParseTupleAndKeywords()"; ensure that the code generated by Argument Clinic calls the *exact* same function. Second, the format string passed in to "PyArg_ParseTuple()" or "PyArg_ParseTupleAndKeywords()" should be *exactly* the same as the hand-written one in the existing function, up to the colon or semi- colon. (Argument Clinic always generates its format strings with a ":" followed by the name of the function. If the existing code’s format string ends with ";", to provide usage help, this change is harmless—don’t worry about it.) Third, for parameters whose format units require two arguments (like a length variable, or an encoding string, or a pointer to a conversion function), ensure that the second argument is *exactly* the same between the two invocations. Fourth, inside the output portion of the block you’ll find a preprocessor macro defining the appropriate static "PyMethodDef" structure for this builtin: #define __PICKLE_PICKLER_DUMP_METHODDEF \ {"dump", (PyCFunction)__pickle_Pickler_dump, METH_O, __pickle_Pickler_dump__doc__}, This static structure should be *exactly* the same as the existing static "PyMethodDef" structure for this builtin. If any of these items differ in *any way*, adjust your Argument Clinic function specification and rerun "Tools/clinic/clinic.py" until they *are* the same. * Notice that the last line of its output is the declaration of your “impl” function. This is where the builtin’s implementation goes. Delete the existing prototype of the function you’re modifying, but leave the opening curly brace. Now delete its argument parsing code and the declarations of all the variables it dumps the arguments into. Notice how the Python arguments are now arguments to this impl function; if the implementation used different names for these variables, fix it. Let’s reiterate, just because it’s kind of weird. Your code should now look like this: static return_type your_function_impl(...) /*[clinic end generated code: checksum=...]*/ { ... Argument Clinic generated the checksum line and the function prototype just above it. You should write the opening (and closing) curly braces for the function, and the implementation inside. Sample: /*[clinic input] module _pickle class _pickle.Pickler "PicklerObject *" "&Pickler_Type" [clinic start generated code]*/ /*[clinic end generated code: checksum=da39a3ee5e6b4b0d3255bfef95601890afd80709]*/ /*[clinic input] _pickle.Pickler.dump obj: 'O' The object to be pickled. / Write a pickled representation of obj to the open file. [clinic start generated code]*/ PyDoc_STRVAR(__pickle_Pickler_dump__doc__, "Write a pickled representation of obj to the open file.\n" "\n" ... static PyObject * _pickle_Pickler_dump_impl(PicklerObject *self, PyObject *obj) /*[clinic end generated code: checksum=3bd30745bf206a48f8b576a1da3d90f55a0a4187]*/ { /* Check whether the Pickler was initialized correctly (issue3664). Developers often forget to call __init__() in their subclasses, which would trigger a segfault without this check. */ if (self->write == NULL) { PyErr_Format(PicklingError, "Pickler.__init__() was not called by %s.__init__()", Py_TYPE(self)->tp_name); return NULL; } if (_Pickler_ClearBuffer(self) < 0) return NULL; ... * Remember the macro with the "PyMethodDef" structure for this function? Find the existing "PyMethodDef" structure for this function and replace it with a reference to the macro. (If the builtin is at module scope, this will probably be very near the end of the file; if the builtin is a class method, this will probably be below but relatively near to the implementation.) Note that the body of the macro contains a trailing comma. So when you replace the existing static "PyMethodDef" structure with the macro, *don’t* add a comma to the end. Sample: static struct PyMethodDef Pickler_methods[] = { __PICKLE_PICKLER_DUMP_METHODDEF __PICKLE_PICKLER_CLEAR_MEMO_METHODDEF {NULL, NULL} /* sentinel */ }; * Compile, then run the relevant portions of the regression-test suite. This change should not introduce any new compile-time warnings or errors, and there should be no externally-visible change to Python’s behavior. Well, except for one difference: "inspect.signature()" run on your function should now provide a valid signature! Congratulations, you’ve ported your first function to work with Argument Clinic! Advanced Topics =============== Now that you’ve had some experience working with Argument Clinic, it’s time for some advanced topics. Symbolic default values ----------------------- The default value you provide for a parameter can’t be any arbitrary expression. Currently the following are explicitly supported: * Numeric constants (integer and float) * String constants * "True", "False", and "None" * Simple symbolic constants like "sys.maxsize", which must start with the name of the module In case you’re curious, this is implemented in "from_builtin()" in "Lib/inspect.py". (In the future, this may need to get even more elaborate, to allow full expressions like "CONSTANT - 1".) Renaming the C functions and variables generated by Argument Clinic ------------------------------------------------------------------- Argument Clinic automatically names the functions it generates for you. Occasionally this may cause a problem, if the generated name collides with the name of an existing C function. There’s an easy solution: override the names used for the C functions. Just add the keyword ""as"" to your function declaration line, followed by the function name you wish to use. Argument Clinic will use that function name for the base (generated) function, then add ""_impl"" to the end and use that for the name of the impl function. For example, if we wanted to rename the C function names generated for "pickle.Pickler.dump", it’d look like this: /*[clinic input] pickle.Pickler.dump as pickler_dumper ... The base function would now be named "pickler_dumper()", and the impl function would now be named "pickler_dumper_impl()". Similarly, you may have a problem where you want to give a parameter a specific Python name, but that name may be inconvenient in C. Argument Clinic allows you to give a parameter different names in Python and in C, using the same ""as"" syntax: /*[clinic input] pickle.Pickler.dump obj: object file as file_obj: object protocol: object = NULL * fix_imports: bool = True Here, the name used in Python (in the signature and the "keywords" array) would be "file", but the C variable would be named "file_obj". You can use this to rename the "self" parameter too! Converting functions using PyArg_UnpackTuple -------------------------------------------- To convert a function parsing its arguments with "PyArg_UnpackTuple()", simply write out all the arguments, specifying each as an "object". You may specify the "type" argument to cast the type as appropriate. All arguments should be marked positional-only (add a "/" on a line by itself after the last argument). Currently the generated code will use "PyArg_ParseTuple()", but this will change soon. Optional Groups --------------- Some legacy functions have a tricky approach to parsing their arguments: they count the number of positional arguments, then use a "switch" statement to call one of several different "PyArg_ParseTuple()" calls depending on how many positional arguments there are. (These functions cannot accept keyword-only arguments.) This approach was used to simulate optional arguments back before "PyArg_ParseTupleAndKeywords()" was created. While functions using this approach can often be converted to use "PyArg_ParseTupleAndKeywords()", optional arguments, and default values, it’s not always possible. Some of these legacy functions have behaviors "PyArg_ParseTupleAndKeywords()" doesn’t directly support. The most obvious example is the builtin function "range()", which has an optional argument on the *left* side of its required argument! Another example is "curses.window.addch()", which has a group of two arguments that must always be specified together. (The arguments are called "x" and "y"; if you call the function passing in "x", you must also pass in "y"—and if you don’t pass in "x" you may not pass in "y" either.) In any case, the goal of Argument Clinic is to support argument parsing for all existing CPython builtins without changing their semantics. Therefore Argument Clinic supports this alternate approach to parsing, using what are called *optional groups*. Optional groups are groups of arguments that must all be passed in together. They can be to the left or the right of the required arguments. They can *only* be used with positional-only parameters. Note: Optional groups are *only* intended for use when converting functions that make multiple calls to "PyArg_ParseTuple()"! Functions that use *any* other approach for parsing arguments should *almost never* be converted to Argument Clinic using optional groups. Functions using optional groups currently cannot have accurate signatures in Python, because Python just doesn’t understand the concept. Please avoid using optional groups wherever possible. To specify an optional group, add a "[" on a line by itself before the parameters you wish to group together, and a "]" on a line by itself after these parameters. As an example, here’s how "curses.window.addch" uses optional groups to make the first two parameters and the last parameter optional: /*[clinic input] curses.window.addch [ x: int X-coordinate. y: int Y-coordinate. ] ch: object Character to add. [ attr: long Attributes for the character. ] / ... Notes: * For every optional group, one additional parameter will be passed into the impl function representing the group. The parameter will be an int named "group_{direction}_{number}", where "{direction}" is either "right" or "left" depending on whether the group is before or after the required parameters, and "{number}" is a monotonically increasing number (starting at 1) indicating how far away the group is from the required parameters. When the impl is called, this parameter will be set to zero if this group was unused, and set to non-zero if this group was used. (By used or unused, I mean whether or not the parameters received arguments in this invocation.) * If there are no required arguments, the optional groups will behave as if they’re to the right of the required arguments. * In the case of ambiguity, the argument parsing code favors parameters on the left (before the required parameters). * Optional groups can only contain positional-only parameters. * Optional groups are *only* intended for legacy code. Please do not use optional groups for new code. Using real Argument Clinic converters, instead of “legacy converters” --------------------------------------------------------------------- To save time, and to minimize how much you need to learn to achieve your first port to Argument Clinic, the walkthrough above tells you to use “legacy converters”. “Legacy converters” are a convenience, designed explicitly to make porting existing code to Argument Clinic easier. And to be clear, their use is acceptable when porting code for Python 3.4. However, in the long term we probably want all our blocks to use Argument Clinic’s real syntax for converters. Why? A couple reasons: * The proper converters are far easier to read and clearer in their intent. * There are some format units that are unsupported as “legacy converters”, because they require arguments, and the legacy converter syntax doesn’t support specifying arguments. * In the future we may have a new argument parsing library that isn’t restricted to what "PyArg_ParseTuple()" supports; this flexibility won’t be available to parameters using legacy converters. Therefore, if you don’t mind a little extra effort, please use the normal converters instead of legacy converters. In a nutshell, the syntax for Argument Clinic (non-legacy) converters looks like a Python function call. However, if there are no explicit arguments to the function (all functions take their default values), you may omit the parentheses. Thus "bool" and "bool()" are exactly the same converters. All arguments to Argument Clinic converters are keyword-only. All Argument Clinic converters accept the following arguments: "c_default" The default value for this parameter when defined in C. Specifically, this will be the initializer for the variable declared in the “parse function”. See the section on default values for how to use this. Specified as a string. "annotation" The annotation value for this parameter. Not currently supported, because **PEP 8** mandates that the Python library may not use annotations. In addition, some converters accept additional arguments. Here is a list of these arguments, along with their meanings: "accept" A set of Python types (and possibly pseudo-types); this restricts the allowable Python argument to values of these types. (This is not a general-purpose facility; as a rule it only supports specific lists of types as shown in the legacy converter table.) To accept "None", add "NoneType" to this set. "bitwise" Only supported for unsigned integers. The native integer value of this Python argument will be written to the parameter without any range checking, even for negative values. "converter" Only supported by the "object" converter. Specifies the name of a C “converter function” to use to convert this object to a native type. "encoding" Only supported for strings. Specifies the encoding to use when converting this string from a Python str (Unicode) value into a C "char *" value. "subclass_of" Only supported for the "object" converter. Requires that the Python value be a subclass of a Python type, as expressed in C. "type" Only supported for the "object" and "self" converters. Specifies the C type that will be used to declare the variable. Default value is ""PyObject *"". "zeroes" Only supported for strings. If true, embedded NUL bytes ("'\\0'") are permitted inside the value. The length of the string will be passed in to the impl function, just after the string parameter, as a parameter named "_length". Please note, not every possible combination of arguments will work. Usually these arguments are implemented by specific "PyArg_ParseTuple" *format units*, with specific behavior. For example, currently you cannot call "unsigned_short" without also specifying "bitwise=True". Although it’s perfectly reasonable to think this would work, these semantics don’t map to any existing format unit. So Argument Clinic doesn’t support it. (Or, at least, not yet.) Below is a table showing the mapping of legacy converters into real Argument Clinic converters. On the left is the legacy converter, on the right is the text you’d replace it with. +-----------+-----------------------------------------------------------------------------------+ | "'B'" | "unsigned_char(bitwise=True)" | +-----------+-----------------------------------------------------------------------------------+ | "'b'" | "unsigned_char" | +-----------+-----------------------------------------------------------------------------------+ | "'c'" | "char" | +-----------+-----------------------------------------------------------------------------------+ | "'C'" | "int(accept={str})" | +-----------+-----------------------------------------------------------------------------------+ | "'d'" | "double" | +-----------+-----------------------------------------------------------------------------------+ | "'D'" | "Py_complex" | +-----------+-----------------------------------------------------------------------------------+ | "'es'" | "str(encoding='name_of_encoding')" | +-----------+-----------------------------------------------------------------------------------+ | "'es#'" | "str(encoding='name_of_encoding', zeroes=True)" | +-----------+-----------------------------------------------------------------------------------+ | "'et'" | "str(encoding='name_of_encoding', accept={bytes, bytearray, str})" | +-----------+-----------------------------------------------------------------------------------+ | "'et#'" | "str(encoding='name_of_encoding', accept={bytes, bytearray, str}, zeroes=True)" | +-----------+-----------------------------------------------------------------------------------+ | "'f'" | "float" | +-----------+-----------------------------------------------------------------------------------+ | "'h'" | "short" | +-----------+-----------------------------------------------------------------------------------+ | "'H'" | "unsigned_short(bitwise=True)" | +-----------+-----------------------------------------------------------------------------------+ | "'i'" | "int" | +-----------+-----------------------------------------------------------------------------------+ | "'I'" | "unsigned_int(bitwise=True)" | +-----------+-----------------------------------------------------------------------------------+ | "'k'" | "unsigned_long(bitwise=True)" | +-----------+-----------------------------------------------------------------------------------+ | "'K'" | "unsigned_long_long(bitwise=True)" | +-----------+-----------------------------------------------------------------------------------+ | "'l'" | "long" | +-----------+-----------------------------------------------------------------------------------+ | "'L'" | "long long" | +-----------+-----------------------------------------------------------------------------------+ | "'n'" | "Py_ssize_t" | +-----------+-----------------------------------------------------------------------------------+ | "'O'" | "object" | +-----------+-----------------------------------------------------------------------------------+ | "'O!'" | "object(subclass_of='&PySomething_Type')" | +-----------+-----------------------------------------------------------------------------------+ | "'O&'" | "object(converter='name_of_c_function')" | +-----------+-----------------------------------------------------------------------------------+ | "'p'" | "bool" | +-----------+-----------------------------------------------------------------------------------+ | "'S'" | "PyBytesObject" | +-----------+-----------------------------------------------------------------------------------+ | "'s'" | "str" | +-----------+-----------------------------------------------------------------------------------+ | "'s#'" | "str(zeroes=True)" | +-----------+-----------------------------------------------------------------------------------+ | "'s*'" | "Py_buffer(accept={buffer, str})" | +-----------+-----------------------------------------------------------------------------------+ | "'U'" | "unicode" | +-----------+-----------------------------------------------------------------------------------+ | "'u'" | "Py_UNICODE" | +-----------+-----------------------------------------------------------------------------------+ | "'u#'" | "Py_UNICODE(zeroes=True)" | +-----------+-----------------------------------------------------------------------------------+ | "'w*'" | "Py_buffer(accept={rwbuffer})" | +-----------+-----------------------------------------------------------------------------------+ | "'Y'" | "PyByteArrayObject" | +-----------+-----------------------------------------------------------------------------------+ | "'y'" | "str(accept={bytes})" | +-----------+-----------------------------------------------------------------------------------+ | "'y#'" | "str(accept={robuffer}, zeroes=True)" | +-----------+-----------------------------------------------------------------------------------+ | "'y*'" | "Py_buffer" | +-----------+-----------------------------------------------------------------------------------+ | "'Z'" | "Py_UNICODE(accept={str, NoneType})" | +-----------+-----------------------------------------------------------------------------------+ | "'Z#'" | "Py_UNICODE(accept={str, NoneType}, zeroes=True)" | +-----------+-----------------------------------------------------------------------------------+ | "'z'" | "str(accept={str, NoneType})" | +-----------+-----------------------------------------------------------------------------------+ | "'z#'" | "str(accept={str, NoneType}, zeroes=True)" | +-----------+-----------------------------------------------------------------------------------+ | "'z*'" | "Py_buffer(accept={buffer, str, NoneType})" | +-----------+-----------------------------------------------------------------------------------+ As an example, here’s our sample "pickle.Pickler.dump" using the proper converter: /*[clinic input] pickle.Pickler.dump obj: object The object to be pickled. / Write a pickled representation of obj to the open file. [clinic start generated code]*/ One advantage of real converters is that they’re more flexible than legacy converters. For example, the "unsigned_int" converter (and all the "unsigned_" converters) can be specified without "bitwise=True". Their default behavior performs range checking on the value, and they won’t accept negative numbers. You just can’t do that with a legacy converter! Argument Clinic will show you all the converters it has available. For each converter it’ll show you all the parameters it accepts, along with the default value for each parameter. Just run "Tools/clinic/clinic.py --converters" to see the full list. Py_buffer --------- When using the "Py_buffer" converter (or the "'s*'", "'w*'", "'*y'", or "'z*'" legacy converters), you *must* not call "PyBuffer_Release()" on the provided buffer. Argument Clinic generates code that does it for you (in the parsing function). Advanced converters ------------------- Remember those format units you skipped for your first time because they were advanced? Here’s how to handle those too. The trick is, all those format units take arguments—either conversion functions, or types, or strings specifying an encoding. (But “legacy converters” don’t support arguments. That’s why we skipped them for your first function.) The argument you specified to the format unit is now an argument to the converter; this argument is either "converter" (for "O&"), "subclass_of" (for "O!"), or "encoding" (for all the format units that start with "e"). When using "subclass_of", you may also want to use the other custom argument for "object()": "type", which lets you set the type actually used for the parameter. For example, if you want to ensure that the object is a subclass of "PyUnicode_Type", you probably want to use the converter "object(type='PyUnicodeObject *', subclass_of='&PyUnicode_Type')". One possible problem with using Argument Clinic: it takes away some possible flexibility for the format units starting with "e". When writing a "PyArg_Parse" call by hand, you could theoretically decide at runtime what encoding string to pass in to "PyArg_ParseTuple()". But now this string must be hard-coded at Argument-Clinic- preprocessing-time. This limitation is deliberate; it made supporting this format unit much easier, and may allow for future optimizations. This restriction doesn’t seem unreasonable; CPython itself always passes in static hard-coded encoding strings for parameters whose format units start with "e". Parameter default values ------------------------ Default values for parameters can be any of a number of values. At their simplest, they can be string, int, or float literals: foo: str = "abc" bar: int = 123 bat: float = 45.6 They can also use any of Python’s built-in constants: yep: bool = True nope: bool = False nada: object = None There’s also special support for a default value of "NULL", and for simple expressions, documented in the following sections. The "NULL" default value ------------------------ For string and object parameters, you can set them to "None" to indicate that there’s no default. However, that means the C variable will be initialized to "Py_None". For convenience’s sakes, there’s a special value called "NULL" for just this reason: from Python’s perspective it behaves like a default value of "None", but the C variable is initialized with "NULL". Expressions specified as default values --------------------------------------- The default value for a parameter can be more than just a literal value. It can be an entire expression, using math operators and looking up attributes on objects. However, this support isn’t exactly simple, because of some non-obvious semantics. Consider the following example: foo: Py_ssize_t = sys.maxsize - 1 "sys.maxsize" can have different values on different platforms. Therefore Argument Clinic can’t simply evaluate that expression locally and hard-code it in C. So it stores the default in such a way that it will get evaluated at runtime, when the user asks for the function’s signature. What namespace is available when the expression is evaluated? It’s evaluated in the context of the module the builtin came from. So, if your module has an attribute called “"max_widgets"”, you may simply use it: foo: Py_ssize_t = max_widgets If the symbol isn’t found in the current module, it fails over to looking in "sys.modules". That’s how it can find "sys.maxsize" for example. (Since you don’t know in advance what modules the user will load into their interpreter, it’s best to restrict yourself to modules that are preloaded by Python itself.) Evaluating default values only at runtime means Argument Clinic can’t compute the correct equivalent C default value. So you need to tell it explicitly. When you use an expression, you must also specify the equivalent expression in C, using the "c_default" parameter to the converter: foo: Py_ssize_t(c_default="PY_SSIZE_T_MAX - 1") = sys.maxsize - 1 Another complication: Argument Clinic can’t know in advance whether or not the expression you supply is valid. It parses it to make sure it looks legal, but it can’t *actually* know. You must be very careful when using expressions to specify values that are guaranteed to be valid at runtime! Finally, because expressions must be representable as static C values, there are many restrictions on legal expressions. Here’s a list of Python features you’re not permitted to use: * Function calls. * Inline if statements ("3 if foo else 5"). * Automatic sequence unpacking ("*[1, 2, 3]"). * List/set/dict comprehensions and generator expressions. * Tuple/list/set/dict literals. Using a return converter ------------------------ By default the impl function Argument Clinic generates for you returns "PyObject *". But your C function often computes some C type, then converts it into the "PyObject *" at the last moment. Argument Clinic handles converting your inputs from Python types into native C types—why not have it convert your return value from a native C type into a Python type too? That’s what a “return converter” does. It changes your impl function to return some C type, then adds code to the generated (non-impl) function to handle converting that value into the appropriate "PyObject *". The syntax for return converters is similar to that of parameter converters. You specify the return converter like it was a return annotation on the function itself. Return converters behave much the same as parameter converters; they take arguments, the arguments are all keyword-only, and if you’re not changing any of the default arguments you can omit the parentheses. (If you use both ""as"" *and* a return converter for your function, the ""as"" should come before the return converter.) There’s one additional complication when using return converters: how do you indicate an error has occurred? Normally, a function returns a valid (non-"NULL") pointer for success, and "NULL" for failure. But if you use an integer return converter, all integers are valid. How can Argument Clinic detect an error? Its solution: each return converter implicitly looks for a special value that indicates an error. If you return that value, and an error has been set ("PyErr_Occurred()" returns a true value), then the generated code will propagate the error. Otherwise it will encode the value you return like normal. Currently Argument Clinic supports only a few return converters: bool int unsigned int long unsigned int size_t Py_ssize_t float double DecodeFSDefault None of these take parameters. For the first three, return -1 to indicate error. For "DecodeFSDefault", the return type is "const char *"; return a "NULL" pointer to indicate an error. (There’s also an experimental "NoneType" converter, which lets you return "Py_None" on success or "NULL" on failure, without having to increment the reference count on "Py_None". I’m not sure it adds enough clarity to be worth using.) To see all the return converters Argument Clinic supports, along with their parameters (if any), just run "Tools/clinic/clinic.py --converters" for the full list. Cloning existing functions -------------------------- If you have a number of functions that look similar, you may be able to use Clinic’s “clone” feature. When you clone an existing function, you reuse: * its parameters, including * their names, * their converters, with all parameters, * their default values, * their per-parameter docstrings, * their *kind* (whether they’re positional only, positional or keyword, or keyword only), and * its return converter. The only thing not copied from the original function is its docstring; the syntax allows you to specify a new docstring. Here’s the syntax for cloning a function: /*[clinic input] module.class.new_function [as c_basename] = module.class.existing_function Docstring for new_function goes here. [clinic start generated code]*/ (The functions can be in different modules or classes. I wrote "module.class" in the sample just to illustrate that you must use the full path to *both* functions.) Sorry, there’s no syntax for partially-cloning a function, or cloning a function then modifying it. Cloning is an all-or nothing proposition. Also, the function you are cloning from must have been previously defined in the current file. Calling Python code ------------------- The rest of the advanced topics require you to write Python code which lives inside your C file and modifies Argument Clinic’s runtime state. This is simple: you simply define a Python block. A Python block uses different delimiter lines than an Argument Clinic function block. It looks like this: /*[python input] # python code goes here [python start generated code]*/ All the code inside the Python block is executed at the time it’s parsed. All text written to stdout inside the block is redirected into the “output” after the block. As an example, here’s a Python block that adds a static integer variable to the C code: /*[python input] print('static int __ignored_unused_variable__ = 0;') [python start generated code]*/ static int __ignored_unused_variable__ = 0; /*[python checksum:...]*/ Using a “self converter” ------------------------ Argument Clinic automatically adds a “self” parameter for you using a default converter. It automatically sets the "type" of this parameter to the “pointer to an instance” you specified when you declared the type. However, you can override Argument Clinic’s converter and specify one yourself. Just add your own "self" parameter as the first parameter in a block, and ensure that its converter is an instance of "self_converter" or a subclass thereof. What’s the point? This lets you override the type of "self", or give it a different default name. How do you specify the custom type you want to cast "self" to? If you only have one or two functions with the same type for "self", you can directly use Argument Clinic’s existing "self" converter, passing in the type you want to use as the "type" parameter: /*[clinic input] _pickle.Pickler.dump self: self(type="PicklerObject *") obj: object / Write a pickled representation of the given object to the open file. [clinic start generated code]*/ On the other hand, if you have a lot of functions that will use the same type for "self", it’s best to create your own converter, subclassing "self_converter" but overwriting the "type" member: /*[python input] class PicklerObject_converter(self_converter): type = "PicklerObject *" [python start generated code]*/ /*[clinic input] _pickle.Pickler.dump self: PicklerObject obj: object / Write a pickled representation of the given object to the open file. [clinic start generated code]*/ Writing a custom converter -------------------------- As we hinted at in the previous section… you can write your own converters! A converter is simply a Python class that inherits from "CConverter". The main purpose of a custom converter is if you have a parameter using the "O&" format unit—parsing this parameter means calling a "PyArg_ParseTuple()" “converter function”. Your converter class should be named "*something*_converter". If the name follows this convention, then your converter class will be automatically registered with Argument Clinic; its name will be the name of your class with the "_converter" suffix stripped off. (This is accomplished with a metaclass.) You shouldn’t subclass "CConverter.__init__". Instead, you should write a "converter_init()" function. "converter_init()" always accepts a "self" parameter; after that, all additional parameters *must* be keyword-only. Any arguments passed in to the converter in Argument Clinic will be passed along to your "converter_init()". There are some additional members of "CConverter" you may wish to specify in your subclass. Here’s the current list: "type" The C type to use for this variable. "type" should be a Python string specifying the type, e.g. "int". If this is a pointer type, the type string should end with "' *'". "default" The Python default value for this parameter, as a Python value. Or the magic value "unspecified" if there is no default. "py_default" "default" as it should appear in Python code, as a string. Or "None" if there is no default. "c_default" "default" as it should appear in C code, as a string. Or "None" if there is no default. "c_ignored_default" The default value used to initialize the C variable when there is no default, but not specifying a default may result in an “uninitialized variable” warning. This can easily happen when using option groups—although properly-written code will never actually use this value, the variable does get passed in to the impl, and the C compiler will complain about the “use” of the uninitialized value. This value should always be a non-empty string. "converter" The name of the C converter function, as a string. "impl_by_reference" A boolean value. If true, Argument Clinic will add a "&" in front of the name of the variable when passing it into the impl function. "parse_by_reference" A boolean value. If true, Argument Clinic will add a "&" in front of the name of the variable when passing it into "PyArg_ParseTuple()". Here’s the simplest example of a custom converter, from "Modules/zlibmodule.c": /*[python input] class ssize_t_converter(CConverter): type = 'Py_ssize_t' converter = 'ssize_t_converter' [python start generated code]*/ /*[python end generated code: output=da39a3ee5e6b4b0d input=35521e4e733823c7]*/ This block adds a converter to Argument Clinic named "ssize_t". Parameters declared as "ssize_t" will be declared as type "Py_ssize_t", and will be parsed by the "'O&'" format unit, which will call the "ssize_t_converter" converter function. "ssize_t" variables automatically support default values. More sophisticated custom converters can insert custom C code to handle initialization and cleanup. You can see more examples of custom converters in the CPython source tree; grep the C files for the string "CConverter". Writing a custom return converter --------------------------------- Writing a custom return converter is much like writing a custom converter. Except it’s somewhat simpler, because return converters are themselves much simpler. Return converters must subclass "CReturnConverter". There are no examples yet of custom return converters, because they are not widely used yet. If you wish to write your own return converter, please read "Tools/clinic/clinic.py", specifically the implementation of "CReturnConverter" and all its subclasses. METH_O and METH_NOARGS ---------------------- To convert a function using "METH_O", make sure the function’s single argument is using the "object" converter, and mark the arguments as positional-only: /*[clinic input] meth_o_sample argument: object / [clinic start generated code]*/ To convert a function using "METH_NOARGS", just don’t specify any arguments. You can still use a self converter, a return converter, and specify a "type" argument to the object converter for "METH_O". tp_new and tp_init functions ---------------------------- You can convert "tp_new" and "tp_init" functions. Just name them "__new__" or "__init__" as appropriate. Notes: * The function name generated for "__new__" doesn’t end in "__new__" like it would by default. It’s just the name of the class, converted into a valid C identifier. * No "PyMethodDef" "#define" is generated for these functions. * "__init__" functions return "int", not "PyObject *". * Use the docstring as the class docstring. * Although "__new__" and "__init__" functions must always accept both the "args" and "kwargs" objects, when converting you may specify any signature for these functions that you like. (If your function doesn’t support keywords, the parsing function generated will throw an exception if it receives any.) Changing and redirecting Clinic’s output ---------------------------------------- It can be inconvenient to have Clinic’s output interspersed with your conventional hand-edited C code. Luckily, Clinic is configurable: you can buffer up its output for printing later (or earlier!), or write its output to a separate file. You can also add a prefix or suffix to every line of Clinic’s generated output. While changing Clinic’s output in this manner can be a boon to readability, it may result in Clinic code using types before they are defined, or your code attempting to use Clinic-generated code before it is defined. These problems can be easily solved by rearranging the declarations in your file, or moving where Clinic’s generated code goes. (This is why the default behavior of Clinic is to output everything into the current block; while many people consider this hampers readability, it will never require rearranging your code to fix definition-before-use problems.) Let’s start with defining some terminology: *field* A field, in this context, is a subsection of Clinic’s output. For example, the "#define" for the "PyMethodDef" structure is a field, called "methoddef_define". Clinic has seven different fields it can output per function definition: docstring_prototype docstring_definition methoddef_define impl_prototype parser_prototype parser_definition impl_definition All the names are of the form ""_"", where """" is the semantic object represented (the parsing function, the impl function, the docstring, or the methoddef structure) and """" represents what kind of statement the field is. Field names that end in ""_prototype"" represent forward declarations of that thing, without the actual body/data of the thing; field names that end in ""_definition"" represent the actual definition of the thing, with the body/data of the thing. (""methoddef"" is special, it’s the only one that ends with ""_define"", representing that it’s a preprocessor #define.) *destination* A destination is a place Clinic can write output to. There are five built-in destinations: "block" The default destination: printed in the output section of the current Clinic block. "buffer" A text buffer where you can save text for later. Text sent here is appended to the end of any existing text. It’s an error to have any text left in the buffer when Clinic finishes processing a file. "file" A separate “clinic file” that will be created automatically by Clinic. The filename chosen for the file is "{basename}.clinic{extension}", where "basename" and "extension" were assigned the output from "os.path.splitext()" run on the current file. (Example: the "file" destination for "_pickle.c" would be written to "_pickle.clinic.c".) **Important: When using a** "file" **destination, you** *must check in* **the generated file!** "two-pass" A buffer like "buffer". However, a two-pass buffer can only be dumped once, and it prints out all text sent to it during all processing, even from Clinic blocks *after* the dumping point. "suppress" The text is suppressed—thrown away. Clinic defines five new directives that let you reconfigure its output. The first new directive is "dump": dump This dumps the current contents of the named destination into the output of the current block, and empties it. This only works with "buffer" and "two-pass" destinations. The second new directive is "output". The most basic form of "output" is like this: output This tells Clinic to output *field* to *destination*. "output" also supports a special meta-destination, called "everything", which tells Clinic to output *all* fields to that *destination*. "output" has a number of other functions: output push output pop output preset "output push" and "output pop" allow you to push and pop configurations on an internal configuration stack, so that you can temporarily modify the output configuration, then easily restore the previous configuration. Simply push before your change to save the current configuration, then pop when you wish to restore the previous configuration. "output preset" sets Clinic’s output to one of several built-in preset configurations, as follows: "block" Clinic’s original starting configuration. Writes everything immediately after the input block. Suppress the "parser_prototype" and "docstring_prototype", write everything else to "block". "file" Designed to write everything to the “clinic file” that it can. You then "#include" this file near the top of your file. You may need to rearrange your file to make this work, though usually this just means creating forward declarations for various "typedef" and "PyTypeObject" definitions. Suppress the "parser_prototype" and "docstring_prototype", write the "impl_definition" to "block", and write everything else to "file". The default filename is ""{dirname}/clinic/{basename}.h"". "buffer" Save up most of the output from Clinic, to be written into your file near the end. For Python files implementing modules or builtin types, it’s recommended that you dump the buffer just above the static structures for your module or builtin type; these are normally very near the end. Using "buffer" may require even more editing than "file", if your file has static "PyMethodDef" arrays defined in the middle of the file. Suppress the "parser_prototype", "impl_prototype", and "docstring_prototype", write the "impl_definition" to "block", and write everything else to "file". "two-pass" Similar to the "buffer" preset, but writes forward declarations to the "two-pass" buffer, and definitions to the "buffer". This is similar to the "buffer" preset, but may require less editing than "buffer". Dump the "two-pass" buffer near the top of your file, and dump the "buffer" near the end just like you would when using the "buffer" preset. Suppresses the "impl_prototype", write the "impl_definition" to "block", write "docstring_prototype", "methoddef_define", and "parser_prototype" to "two-pass", write everything else to "buffer". "partial-buffer" Similar to the "buffer" preset, but writes more things to "block", only writing the really big chunks of generated code to "buffer". This avoids the definition-before-use problem of "buffer" completely, at the small cost of having slightly more stuff in the block’s output. Dump the "buffer" near the end, just like you would when using the "buffer" preset. Suppresses the "impl_prototype", write the "docstring_definition" and "parser_definition" to "buffer", write everything else to "block". The third new directive is "destination": destination [...] This performs an operation on the destination named "name". There are two defined subcommands: "new" and "clear". The "new" subcommand works like this: destination new This creates a new destination with name "" and type "". There are five destination types: "suppress" Throws the text away. "block" Writes the text to the current block. This is what Clinic originally did. "buffer" A simple text buffer, like the “buffer” builtin destination above. "file" A text file. The file destination takes an extra argument, a template to use for building the filename, like so: destination new The template can use three strings internally that will be replaced by bits of the filename: {path} The full path to the file, including directory and full filename. {dirname} The name of the directory the file is in. {basename} Just the name of the file, not including the directory. {basename_root} Basename with the extension clipped off (everything up to but not including the last ‘.’). {basename_extension} The last ‘.’ and everything after it. If the basename does not contain a period, this will be the empty string. If there are no periods in the filename, {basename} and {filename} are the same, and {extension} is empty. “{basename}{extension}” is always exactly the same as “{filename}”.” "two-pass" A two-pass buffer, like the “two-pass” builtin destination above. The "clear" subcommand works like this: destination clear It removes all the accumulated text up to this point in the destination. (I don’t know what you’d need this for, but I thought maybe it’d be useful while someone’s experimenting.) The fourth new directive is "set": set line_prefix "string" set line_suffix "string" "set" lets you set two internal variables in Clinic. "line_prefix" is a string that will be prepended to every line of Clinic’s output; "line_suffix" is a string that will be appended to every line of Clinic’s output. Both of these support two format strings: "{block comment start}" Turns into the string "/*", the start-comment text sequence for C files. "{block comment end}" Turns into the string "*/", the end-comment text sequence for C files. The final new directive is one you shouldn’t need to use directly, called "preserve": preserve This tells Clinic that the current contents of the output should be kept, unmodified. This is used internally by Clinic when dumping output into "file" files; wrapping it in a Clinic block lets Clinic use its existing checksum functionality to ensure the file was not modified by hand before it gets overwritten. The #ifdef trick ---------------- If you’re converting a function that isn’t available on all platforms, there’s a trick you can use to make life a little easier. The existing code probably looks like this: #ifdef HAVE_FUNCTIONNAME static module_functionname(...) { ... } #endif /* HAVE_FUNCTIONNAME */ And then in the "PyMethodDef" structure at the bottom the existing code will have: #ifdef HAVE_FUNCTIONNAME {'functionname', ... }, #endif /* HAVE_FUNCTIONNAME */ In this scenario, you should enclose the body of your impl function inside the "#ifdef", like so: #ifdef HAVE_FUNCTIONNAME /*[clinic input] module.functionname ... [clinic start generated code]*/ static module_functionname(...) { ... } #endif /* HAVE_FUNCTIONNAME */ Then, remove those three lines from the "PyMethodDef" structure, replacing them with the macro Argument Clinic generated: MODULE_FUNCTIONNAME_METHODDEF (You can find the real name for this macro inside the generated code. Or you can calculate it yourself: it’s the name of your function as defined on the first line of your block, but with periods changed to underscores, uppercased, and ""_METHODDEF"" added to the end.) Perhaps you’re wondering: what if "HAVE_FUNCTIONNAME" isn’t defined? The "MODULE_FUNCTIONNAME_METHODDEF" macro won’t be defined either! Here’s where Argument Clinic gets very clever. It actually detects that the Argument Clinic block might be deactivated by the "#ifdef". When that happens, it generates a little extra code that looks like this: #ifndef MODULE_FUNCTIONNAME_METHODDEF #define MODULE_FUNCTIONNAME_METHODDEF #endif /* !defined(MODULE_FUNCTIONNAME_METHODDEF) */ That means the macro always works. If the function is defined, this turns into the correct structure, including the trailing comma. If the function is undefined, this turns into nothing. However, this causes one ticklish problem: where should Argument Clinic put this extra code when using the “block” output preset? It can’t go in the output block, because that could be deactivated by the "#ifdef". (That’s the whole point!) In this situation, Argument Clinic writes the extra code to the “buffer” destination. This may mean that you get a complaint from Argument Clinic: Warning in file "Modules/posixmodule.c" on line 12357: Destination buffer 'buffer' not empty at end of file, emptying. When this happens, just open your file, find the "dump buffer" block that Argument Clinic added to your file (it’ll be at the very bottom), then move it above the "PyMethodDef" structure where that macro is used. Using Argument Clinic in Python files ------------------------------------- It’s actually possible to use Argument Clinic to preprocess Python files. There’s no point to using Argument Clinic blocks, of course, as the output wouldn’t make any sense to the Python interpreter. But using Argument Clinic to run Python blocks lets you use Python as a Python preprocessor! Since Python comments are different from C comments, Argument Clinic blocks embedded in Python files look slightly different. They look like this: #/*[python input] #print("def foo(): pass") #[python start generated code]*/ def foo(): pass #/*[python checksum:...]*/ Porting Extension Modules to Python 3 ************************************* We recommend the following resources for porting extension modules to Python 3: * The Migrating C extensions chapter from *Supporting Python 3: An in- depth guide*, a book on moving from Python 2 to Python 3 in general, guides the reader through porting an extension module. * The Porting guide from the *py3c* project provides opinionated suggestions with supporting code. * The Cython and CFFI libraries offer abstractions over Python’s C API. Extensions generally need to be re-written to use one of them, but the library then handles differences between various Python versions and implementations. Curses Programming with Python ****************************** Author: A.M. Kuchling, Eric S. Raymond Release: 2.04 Abstract ^^^^^^^^ This document describes how to use the "curses" extension module to control text-mode displays. What is curses? =============== The curses library supplies a terminal-independent screen-painting and keyboard-handling facility for text-based terminals; such terminals include VT100s, the Linux console, and the simulated terminal provided by various programs. Display terminals support various control codes to perform common operations such as moving the cursor, scrolling the screen, and erasing areas. Different terminals use widely differing codes, and often have their own minor quirks. In a world of graphical displays, one might ask “why bother”? It’s true that character-cell display terminals are an obsolete technology, but there are niches in which being able to do fancy things with them are still valuable. One niche is on small-footprint or embedded Unixes that don’t run an X server. Another is tools such as OS installers and kernel configurators that may have to run before any graphical support is available. The curses library provides fairly basic functionality, providing the programmer with an abstraction of a display containing multiple non- overlapping windows of text. The contents of a window can be changed in various ways—adding text, erasing it, changing its appearance—and the curses library will figure out what control codes need to be sent to the terminal to produce the right output. curses doesn’t provide many user-interface concepts such as buttons, checkboxes, or dialogs; if you need such features, consider a user interface library such as Urwid. The curses library was originally written for BSD Unix; the later System V versions of Unix from AT&T added many enhancements and new functions. BSD curses is no longer maintained, having been replaced by ncurses, which is an open-source implementation of the AT&T interface. If you’re using an open-source Unix such as Linux or FreeBSD, your system almost certainly uses ncurses. Since most current commercial Unix versions are based on System V code, all the functions described here will probably be available. The older versions of curses carried by some proprietary Unixes may not support everything, though. The Windows version of Python doesn’t include the "curses" module. A ported version called UniCurses is available. The Python curses module ------------------------ The Python module is a fairly simple wrapper over the C functions provided by curses; if you’re already familiar with curses programming in C, it’s really easy to transfer that knowledge to Python. The biggest difference is that the Python interface makes things simpler by merging different C functions such as "addstr()", "mvaddstr()", and "mvwaddstr()" into a single "addstr()" method. You’ll see this covered in more detail later. This HOWTO is an introduction to writing text-mode programs with curses and Python. It doesn’t attempt to be a complete guide to the curses API; for that, see the Python library guide’s section on ncurses, and the C manual pages for ncurses. It will, however, give you the basic ideas. Starting and ending a curses application ======================================== Before doing anything, curses must be initialized. This is done by calling the "initscr()" function, which will determine the terminal type, send any required setup codes to the terminal, and create various internal data structures. If successful, "initscr()" returns a window object representing the entire screen; this is usually called "stdscr" after the name of the corresponding C variable. import curses stdscr = curses.initscr() Usually curses applications turn off automatic echoing of keys to the screen, in order to be able to read keys and only display them under certain circumstances. This requires calling the "noecho()" function. curses.noecho() Applications will also commonly need to react to keys instantly, without requiring the Enter key to be pressed; this is called cbreak mode, as opposed to the usual buffered input mode. curses.cbreak() Terminals usually return special keys, such as the cursor keys or navigation keys such as Page Up and Home, as a multibyte escape sequence. While you could write your application to expect such sequences and process them accordingly, curses can do it for you, returning a special value such as "curses.KEY_LEFT". To get curses to do the job, you’ll have to enable keypad mode. stdscr.keypad(True) Terminating a curses application is much easier than starting one. You’ll need to call: curses.nocbreak() stdscr.keypad(False) curses.echo() to reverse the curses-friendly terminal settings. Then call the "endwin()" function to restore the terminal to its original operating mode. curses.endwin() A common problem when debugging a curses application is to get your terminal messed up when the application dies without restoring the terminal to its previous state. In Python this commonly happens when your code is buggy and raises an uncaught exception. Keys are no longer echoed to the screen when you type them, for example, which makes using the shell difficult. In Python you can avoid these complications and make debugging much easier by importing the "curses.wrapper()" function and using it like this: from curses import wrapper def main(stdscr): # Clear screen stdscr.clear() # This raises ZeroDivisionError when i == 10. for i in range(0, 11): v = i-10 stdscr.addstr(i, 0, '10 divided by {} is {}'.format(v, 10/v)) stdscr.refresh() stdscr.getkey() wrapper(main) The "wrapper()" function takes a callable object and does the initializations described above, also initializing colors if color support is present. "wrapper()" then runs your provided callable. Once the callable returns, "wrapper()" will restore the original state of the terminal. The callable is called inside a "try"…"except" that catches exceptions, restores the state of the terminal, and then re- raises the exception. Therefore your terminal won’t be left in a funny state on exception and you’ll be able to read the exception’s message and traceback. Windows and Pads ================ Windows are the basic abstraction in curses. A window object represents a rectangular area of the screen, and supports methods to display text, erase it, allow the user to input strings, and so forth. The "stdscr" object returned by the "initscr()" function is a window object that covers the entire screen. Many programs may need only this single window, but you might wish to divide the screen into smaller windows, in order to redraw or clear them separately. The "newwin()" function creates a new window of a given size, returning the new window object. begin_x = 20; begin_y = 7 height = 5; width = 40 win = curses.newwin(height, width, begin_y, begin_x) Note that the coordinate system used in curses is unusual. Coordinates are always passed in the order *y,x*, and the top-left corner of a window is coordinate (0,0). This breaks the normal convention for handling coordinates where the *x* coordinate comes first. This is an unfortunate difference from most other computer applications, but it’s been part of curses since it was first written, and it’s too late to change things now. Your application can determine the size of the screen by using the "curses.LINES" and "curses.COLS" variables to obtain the *y* and *x* sizes. Legal coordinates will then extend from "(0,0)" to "(curses.LINES - 1, curses.COLS - 1)". When you call a method to display or erase text, the effect doesn’t immediately show up on the display. Instead you must call the "refresh()" method of window objects to update the screen. This is because curses was originally written with slow 300-baud terminal connections in mind; with these terminals, minimizing the time required to redraw the screen was very important. Instead curses accumulates changes to the screen and displays them in the most efficient manner when you call "refresh()". For example, if your program displays some text in a window and then clears the window, there’s no need to send the original text because they’re never visible. In practice, explicitly telling curses to redraw a window doesn’t really complicate programming with curses much. Most programs go into a flurry of activity, and then pause waiting for a keypress or some other action on the part of the user. All you have to do is to be sure that the screen has been redrawn before pausing to wait for user input, by first calling "stdscr.refresh()" or the "refresh()" method of some other relevant window. A pad is a special case of a window; it can be larger than the actual display screen, and only a portion of the pad displayed at a time. Creating a pad requires the pad’s height and width, while refreshing a pad requires giving the coordinates of the on-screen area where a subsection of the pad will be displayed. pad = curses.newpad(100, 100) # These loops fill the pad with letters; addch() is # explained in the next section for y in range(0, 99): for x in range(0, 99): pad.addch(y,x, ord('a') + (x*x+y*y) % 26) # Displays a section of the pad in the middle of the screen. # (0,0) : coordinate of upper-left corner of pad area to display. # (5,5) : coordinate of upper-left corner of window area to be filled # with pad content. # (20, 75) : coordinate of lower-right corner of window area to be # : filled with pad content. pad.refresh( 0,0, 5,5, 20,75) The "refresh()" call displays a section of the pad in the rectangle extending from coordinate (5,5) to coordinate (20,75) on the screen; the upper left corner of the displayed section is coordinate (0,0) on the pad. Beyond that difference, pads are exactly like ordinary windows and support the same methods. If you have multiple windows and pads on screen there is a more efficient way to update the screen and prevent annoying screen flicker as each part of the screen gets updated. "refresh()" actually does two things: 1. Calls the "noutrefresh()" method of each window to update an underlying data structure representing the desired state of the screen. 2. Calls the function "doupdate()" function to change the physical screen to match the desired state recorded in the data structure. Instead you can call "noutrefresh()" on a number of windows to update the data structure, and then call "doupdate()" to update the screen. Displaying Text =============== From a C programmer’s point of view, curses may sometimes look like a twisty maze of functions, all subtly different. For example, "addstr()" displays a string at the current cursor location in the "stdscr" window, while "mvaddstr()" moves to a given y,x coordinate first before displaying the string. "waddstr()" is just like "addstr()", but allows specifying a window to use instead of using "stdscr" by default. "mvwaddstr()" allows specifying both a window and a coordinate. Fortunately the Python interface hides all these details. "stdscr" is a window object like any other, and methods such as "addstr()" accept multiple argument forms. Usually there are four different forms. +-----------------------------------+-------------------------------------------------+ | Form | Description | |===================================|=================================================| | *str* or *ch* | Display the string *str* or character *ch* at | | | the current position | +-----------------------------------+-------------------------------------------------+ | *str* or *ch*, *attr* | Display the string *str* or character *ch*, | | | using attribute *attr* at the current position | +-----------------------------------+-------------------------------------------------+ | *y*, *x*, *str* or *ch* | Move to position *y,x* within the window, and | | | display *str* or *ch* | +-----------------------------------+-------------------------------------------------+ | *y*, *x*, *str* or *ch*, *attr* | Move to position *y,x* within the window, and | | | display *str* or *ch*, using attribute *attr* | +-----------------------------------+-------------------------------------------------+ Attributes allow displaying text in highlighted forms such as boldface, underline, reverse code, or in color. They’ll be explained in more detail in the next subsection. The "addstr()" method takes a Python string or bytestring as the value to be displayed. The contents of bytestrings are sent to the terminal as-is. Strings are encoded to bytes using the value of the window’s "encoding" attribute; this defaults to the default system encoding as returned by "locale.getpreferredencoding()". The "addch()" methods take a character, which can be either a string of length 1, a bytestring of length 1, or an integer. Constants are provided for extension characters; these constants are integers greater than 255. For example, "ACS_PLMINUS" is a +/- symbol, and "ACS_ULCORNER" is the upper left corner of a box (handy for drawing borders). You can also use the appropriate Unicode character. Windows remember where the cursor was left after the last operation, so if you leave out the *y,x* coordinates, the string or character will be displayed wherever the last operation left off. You can also move the cursor with the "move(y,x)" method. Because some terminals always display a flashing cursor, you may want to ensure that the cursor is positioned in some location where it won’t be distracting; it can be confusing to have the cursor blinking at some apparently random location. If your application doesn’t need a blinking cursor at all, you can call "curs_set(False)" to make it invisible. For compatibility with older curses versions, there’s a "leaveok(bool)" function that’s a synonym for "curs_set()". When *bool* is true, the curses library will attempt to suppress the flashing cursor, and you won’t need to worry about leaving it in odd locations. Attributes and Color -------------------- Characters can be displayed in different ways. Status lines in a text-based application are commonly shown in reverse video, or a text viewer may need to highlight certain words. curses supports this by allowing you to specify an attribute for each cell on the screen. An attribute is an integer, each bit representing a different attribute. You can try to display text with multiple attribute bits set, but curses doesn’t guarantee that all the possible combinations are available, or that they’re all visually distinct. That depends on the ability of the terminal being used, so it’s safest to stick to the most commonly available attributes, listed here. +------------------------+----------------------------------------+ | Attribute | Description | |========================|========================================| | "A_BLINK" | Blinking text | +------------------------+----------------------------------------+ | "A_BOLD" | Extra bright or bold text | +------------------------+----------------------------------------+ | "A_DIM" | Half bright text | +------------------------+----------------------------------------+ | "A_REVERSE" | Reverse-video text | +------------------------+----------------------------------------+ | "A_STANDOUT" | The best highlighting mode available | +------------------------+----------------------------------------+ | "A_UNDERLINE" | Underlined text | +------------------------+----------------------------------------+ So, to display a reverse-video status line on the top line of the screen, you could code: stdscr.addstr(0, 0, "Current mode: Typing mode", curses.A_REVERSE) stdscr.refresh() The curses library also supports color on those terminals that provide it. The most common such terminal is probably the Linux console, followed by color xterms. To use color, you must call the "start_color()" function soon after calling "initscr()", to initialize the default color set (the "curses.wrapper()" function does this automatically). Once that’s done, the "has_colors()" function returns TRUE if the terminal in use can actually display color. (Note: curses uses the American spelling ‘color’, instead of the Canadian/British spelling ‘colour’. If you’re used to the British spelling, you’ll have to resign yourself to misspelling it for the sake of these functions.) The curses library maintains a finite number of color pairs, containing a foreground (or text) color and a background color. You can get the attribute value corresponding to a color pair with the "color_pair()" function; this can be bitwise-OR’ed with other attributes such as "A_REVERSE", but again, such combinations are not guaranteed to work on all terminals. An example, which displays a line of text using color pair 1: stdscr.addstr("Pretty text", curses.color_pair(1)) stdscr.refresh() As I said before, a color pair consists of a foreground and background color. The "init_pair(n, f, b)" function changes the definition of color pair *n*, to foreground color f and background color b. Color pair 0 is hard-wired to white on black, and cannot be changed. Colors are numbered, and "start_color()" initializes 8 basic colors when it activates color mode. They are: 0:black, 1:red, 2:green, 3:yellow, 4:blue, 5:magenta, 6:cyan, and 7:white. The "curses" module defines named constants for each of these colors: "curses.COLOR_BLACK", "curses.COLOR_RED", and so forth. Let’s put all this together. To change color 1 to red text on a white background, you would call: curses.init_pair(1, curses.COLOR_RED, curses.COLOR_WHITE) When you change a color pair, any text already displayed using that color pair will change to the new colors. You can also display new text in this color with: stdscr.addstr(0,0, "RED ALERT!", curses.color_pair(1)) Very fancy terminals can change the definitions of the actual colors to a given RGB value. This lets you change color 1, which is usually red, to purple or blue or any other color you like. Unfortunately, the Linux console doesn’t support this, so I’m unable to try it out, and can’t provide any examples. You can check if your terminal can do this by calling "can_change_color()", which returns "True" if the capability is there. If you’re lucky enough to have such a talented terminal, consult your system’s man pages for more information. User Input ========== The C curses library offers only very simple input mechanisms. Python’s "curses" module adds a basic text-input widget. (Other libraries such as Urwid have more extensive collections of widgets.) There are two methods for getting input from a window: * "getch()" refreshes the screen and then waits for the user to hit a key, displaying the key if "echo()" has been called earlier. You can optionally specify a coordinate to which the cursor should be moved before pausing. * "getkey()" does the same thing but converts the integer to a string. Individual characters are returned as 1-character strings, and special keys such as function keys return longer strings containing a key name such as "KEY_UP" or "^G". It’s possible to not wait for the user using the "nodelay()" window method. After "nodelay(True)", "getch()" and "getkey()" for the window become non-blocking. To signal that no input is ready, "getch()" returns "curses.ERR" (a value of -1) and "getkey()" raises an exception. There’s also a "halfdelay()" function, which can be used to (in effect) set a timer on each "getch()"; if no input becomes available within a specified delay (measured in tenths of a second), curses raises an exception. The "getch()" method returns an integer; if it’s between 0 and 255, it represents the ASCII code of the key pressed. Values greater than 255 are special keys such as Page Up, Home, or the cursor keys. You can compare the value returned to constants such as "curses.KEY_PPAGE", "curses.KEY_HOME", or "curses.KEY_LEFT". The main loop of your program may look something like this: while True: c = stdscr.getch() if c == ord('p'): PrintDocument() elif c == ord('q'): break # Exit the while loop elif c == curses.KEY_HOME: x = y = 0 The "curses.ascii" module supplies ASCII class membership functions that take either integer or 1-character string arguments; these may be useful in writing more readable tests for such loops. It also supplies conversion functions that take either integer or 1 -character-string arguments and return the same type. For example, "curses.ascii.ctrl()" returns the control character corresponding to its argument. There’s also a method to retrieve an entire string, "getstr()". It isn’t used very often, because its functionality is quite limited; the only editing keys available are the backspace key and the Enter key, which terminates the string. It can optionally be limited to a fixed number of characters. curses.echo() # Enable echoing of characters # Get a 15-character string, with the cursor on the top line s = stdscr.getstr(0,0, 15) The "curses.textpad" module supplies a text box that supports an Emacs-like set of keybindings. Various methods of the "Textbox" class support editing with input validation and gathering the edit results either with or without trailing spaces. Here’s an example: import curses from curses.textpad import Textbox, rectangle def main(stdscr): stdscr.addstr(0, 0, "Enter IM message: (hit Ctrl-G to send)") editwin = curses.newwin(5,30, 2,1) rectangle(stdscr, 1,0, 1+5+1, 1+30+1) stdscr.refresh() box = Textbox(editwin) # Let the user edit until Ctrl-G is struck. box.edit() # Get resulting contents message = box.gather() See the library documentation on "curses.textpad" for more details. For More Information ==================== This HOWTO doesn’t cover some advanced topics, such as reading the contents of the screen or capturing mouse events from an xterm instance, but the Python library page for the "curses" module is now reasonably complete. You should browse it next. If you’re in doubt about the detailed behavior of the curses functions, consult the manual pages for your curses implementation, whether it’s ncurses or a proprietary Unix vendor’s. The manual pages will document any quirks, and provide complete lists of all the functions, attributes, and "ACS_*" characters available to you. Because the curses API is so large, some functions aren’t supported in the Python interface. Often this isn’t because they’re difficult to implement, but because no one has needed them yet. Also, Python doesn’t yet support the menu library associated with ncurses. Patches adding support for these would be welcome; see the Python Developer’s Guide to learn more about submitting patches to Python. * Writing Programs with NCURSES: a lengthy tutorial for C programmers. * The ncurses man page * The ncurses FAQ * “Use curses… don’t swear”: video of a PyCon 2013 talk on controlling terminals using curses or Urwid. * “Console Applications with Urwid”: video of a PyCon CA 2012 talk demonstrating some applications written using Urwid. Descriptor HowTo Guide ********************** Author: Raymond Hettinger Contact: Contents ^^^^^^^^ * Descriptor HowTo Guide * Primer * Simple example: A descriptor that returns a constant * Dynamic lookups * Managed attributes * Customized names * Closing thoughts * Complete Practical Example * Validator class * Custom validators * Practical application * Technical Tutorial * Abstract * Definition and introduction * Descriptor protocol * Overview of descriptor invocation * Invocation from an instance * Invocation from a class * Invocation from super * Summary of invocation logic * Automatic name notification * ORM example * Pure Python Equivalents * Properties * Functions and methods * Kinds of methods * Static methods * Class methods * Member objects and __slots__ *Descriptors* let objects customize attribute lookup, storage, and deletion. This guide has four major sections: 1. The “primer” gives a basic overview, moving gently from simple examples, adding one feature at a time. Start here if you’re new to descriptors. 2. The second section shows a complete, practical descriptor example. If you already know the basics, start there. 3. The third section provides a more technical tutorial that goes into the detailed mechanics of how descriptors work. Most people don’t need this level of detail. 4. The last section has pure Python equivalents for built-in descriptors that are written in C. Read this if you’re curious about how functions turn into bound methods or about the implementation of common tools like "classmethod()", "staticmethod()", "property()", and *__slots__*. Primer ====== In this primer, we start with the most basic possible example and then we’ll add new capabilities one by one. Simple example: A descriptor that returns a constant ---------------------------------------------------- The "Ten" class is a descriptor that always returns the constant "10" from its "__get__()" method: class Ten: def __get__(self, obj, objtype=None): return 10 To use the descriptor, it must be stored as a class variable in another class: class A: x = 5 # Regular class attribute y = Ten() # Descriptor instance An interactive session shows the difference between normal attribute lookup and descriptor lookup: >>> a = A() # Make an instance of class A >>> a.x # Normal attribute lookup 5 >>> a.y # Descriptor lookup 10 In the "a.x" attribute lookup, the dot operator finds the key "x" and the value "5" in the class dictionary. In the "a.y" lookup, the dot operator finds a descriptor instance, recognized by its "__get__" method, and calls that method which returns "10". Note that the value "10" is not stored in either the class dictionary or the instance dictionary. Instead, the value "10" is computed on demand. This example shows how a simple descriptor works, but it isn’t very useful. For retrieving constants, normal attribute lookup would be better. In the next section, we’ll create something more useful, a dynamic lookup. Dynamic lookups --------------- Interesting descriptors typically run computations instead of returning constants: import os class DirectorySize: def __get__(self, obj, objtype=None): return len(os.listdir(obj.dirname)) class Directory: size = DirectorySize() # Descriptor instance def __init__(self, dirname): self.dirname = dirname # Regular instance attribute An interactive session shows that the lookup is dynamic — it computes different, updated answers each time: >>> s = Directory('songs') >>> g = Directory('games') >>> s.size # The songs directory has twenty files 20 >>> g.size # The games directory has three files 3 >>> os.remove('games/chess') # Delete a game >>> g.size # File count is automatically updated 2 Besides showing how descriptors can run computations, this example also reveals the purpose of the parameters to "__get__()". The *self* parameter is *size*, an instance of *DirectorySize*. The *obj* parameter is either *g* or *s*, an instance of *Directory*. It is the *obj* parameter that lets the "__get__()" method learn the target directory. The *objtype* parameter is the class *Directory*. Managed attributes ------------------ A popular use for descriptors is managing access to instance data. The descriptor is assigned to a public attribute in the class dictionary while the actual data is stored as a private attribute in the instance dictionary. The descriptor’s "__get__()" and "__set__()" methods are triggered when the public attribute is accessed. In the following example, *age* is the public attribute and *_age* is the private attribute. When the public attribute is accessed, the descriptor logs the lookup or update: import logging logging.basicConfig(level=logging.INFO) class LoggedAgeAccess: def __get__(self, obj, objtype=None): value = obj._age logging.info('Accessing %r giving %r', 'age', value) return value def __set__(self, obj, value): logging.info('Updating %r to %r', 'age', value) obj._age = value class Person: age = LoggedAgeAccess() # Descriptor instance def __init__(self, name, age): self.name = name # Regular instance attribute self.age = age # Calls __set__() def birthday(self): self.age += 1 # Calls both __get__() and __set__() An interactive session shows that all access to the managed attribute *age* is logged, but that the regular attribute *name* is not logged: >>> mary = Person('Mary M', 30) # The initial age update is logged INFO:root:Updating 'age' to 30 >>> dave = Person('David D', 40) INFO:root:Updating 'age' to 40 >>> vars(mary) # The actual data is in a private attribute {'name': 'Mary M', '_age': 30} >>> vars(dave) {'name': 'David D', '_age': 40} >>> mary.age # Access the data and log the lookup INFO:root:Accessing 'age' giving 30 30 >>> mary.birthday() # Updates are logged as well INFO:root:Accessing 'age' giving 30 INFO:root:Updating 'age' to 31 >>> dave.name # Regular attribute lookup isn't logged 'David D' >>> dave.age # Only the managed attribute is logged INFO:root:Accessing 'age' giving 40 40 One major issue with this example is that the private name *_age* is hardwired in the *LoggedAgeAccess* class. That means that each instance can only have one logged attribute and that its name is unchangeable. In the next example, we’ll fix that problem. Customized names ---------------- When a class uses descriptors, it can inform each descriptor about which variable name was used. In this example, the "Person" class has two descriptor instances, *name* and *age*. When the "Person" class is defined, it makes a callback to "__set_name__()" in *LoggedAccess* so that the field names can be recorded, giving each descriptor its own *public_name* and *private_name*: import logging logging.basicConfig(level=logging.INFO) class LoggedAccess: def __set_name__(self, owner, name): self.public_name = name self.private_name = '_' + name def __get__(self, obj, objtype=None): value = getattr(obj, self.private_name) logging.info('Accessing %r giving %r', self.public_name, value) return value def __set__(self, obj, value): logging.info('Updating %r to %r', self.public_name, value) setattr(obj, self.private_name, value) class Person: name = LoggedAccess() # First descriptor instance age = LoggedAccess() # Second descriptor instance def __init__(self, name, age): self.name = name # Calls the first descriptor self.age = age # Calls the second descriptor def birthday(self): self.age += 1 An interactive session shows that the "Person" class has called "__set_name__()" so that the field names would be recorded. Here we call "vars()" to look up the descriptor without triggering it: >>> vars(vars(Person)['name']) {'public_name': 'name', 'private_name': '_name'} >>> vars(vars(Person)['age']) {'public_name': 'age', 'private_name': '_age'} The new class now logs access to both *name* and *age*: >>> pete = Person('Peter P', 10) INFO:root:Updating 'name' to 'Peter P' INFO:root:Updating 'age' to 10 >>> kate = Person('Catherine C', 20) INFO:root:Updating 'name' to 'Catherine C' INFO:root:Updating 'age' to 20 The two *Person* instances contain only the private names: >>> vars(pete) {'_name': 'Peter P', '_age': 10} >>> vars(kate) {'_name': 'Catherine C', '_age': 20} Closing thoughts ---------------- A *descriptor* is what we call any object that defines "__get__()", "__set__()", or "__delete__()". Optionally, descriptors can have a "__set_name__()" method. This is only used in cases where a descriptor needs to know either the class where it was created or the name of class variable it was assigned to. (This method, if present, is called even if the class is not a descriptor.) Descriptors get invoked by the dot “operator” during attribute lookup. If a descriptor is accessed indirectly with "vars(some_class)[descriptor_name]", the descriptor instance is returned without invoking it. Descriptors only work when used as class variables. When put in instances, they have no effect. The main motivation for descriptors is to provide a hook allowing objects stored in class variables to control what happens during attribute lookup. Traditionally, the calling class controls what happens during lookup. Descriptors invert that relationship and allow the data being looked- up to have a say in the matter. Descriptors are used throughout the language. It is how functions turn into bound methods. Common tools like "classmethod()", "staticmethod()", "property()", and "functools.cached_property()" are all implemented as descriptors. Complete Practical Example ========================== In this example, we create a practical and powerful tool for locating notoriously hard to find data corruption bugs. Validator class --------------- A validator is a descriptor for managed attribute access. Prior to storing any data, it verifies that the new value meets various type and range restrictions. If those restrictions aren’t met, it raises an exception to prevent data corruption at its source. This "Validator" class is both an *abstract base class* and a managed attribute descriptor: from abc import ABC, abstractmethod class Validator(ABC): def __set_name__(self, owner, name): self.private_name = '_' + name def __get__(self, obj, objtype=None): return getattr(obj, self.private_name) def __set__(self, obj, value): self.validate(value) setattr(obj, self.private_name, value) @abstractmethod def validate(self, value): pass Custom validators need to inherit from "Validator" and must supply a "validate()" method to test various restrictions as needed. Custom validators ----------------- Here are three practical data validation utilities: 1. "OneOf" verifies that a value is one of a restricted set of options. 2. "Number" verifies that a value is either an "int" or "float". Optionally, it verifies that a value is between a given minimum or maximum. 3. "String" verifies that a value is a "str". Optionally, it validates a given minimum or maximum length. It can validate a user-defined predicate as well. class OneOf(Validator): def __init__(self, *options): self.options = set(options) def validate(self, value): if value not in self.options: raise ValueError(f'Expected {value!r} to be one of {self.options!r}') class Number(Validator): def __init__(self, minvalue=None, maxvalue=None): self.minvalue = minvalue self.maxvalue = maxvalue def validate(self, value): if not isinstance(value, (int, float)): raise TypeError(f'Expected {value!r} to be an int or float') if self.minvalue is not None and value < self.minvalue: raise ValueError( f'Expected {value!r} to be at least {self.minvalue!r}' ) if self.maxvalue is not None and value > self.maxvalue: raise ValueError( f'Expected {value!r} to be no more than {self.maxvalue!r}' ) class String(Validator): def __init__(self, minsize=None, maxsize=None, predicate=None): self.minsize = minsize self.maxsize = maxsize self.predicate = predicate def validate(self, value): if not isinstance(value, str): raise TypeError(f'Expected {value!r} to be an str') if self.minsize is not None and len(value) < self.minsize: raise ValueError( f'Expected {value!r} to be no smaller than {self.minsize!r}' ) if self.maxsize is not None and len(value) > self.maxsize: raise ValueError( f'Expected {value!r} to be no bigger than {self.maxsize!r}' ) if self.predicate is not None and not self.predicate(value): raise ValueError( f'Expected {self.predicate} to be true for {value!r}' ) Practical application --------------------- Here’s how the data validators can be used in a real class: class Component: name = String(minsize=3, maxsize=10, predicate=str.isupper) kind = OneOf('wood', 'metal', 'plastic') quantity = Number(minvalue=0) def __init__(self, name, kind, quantity): self.name = name self.kind = kind self.quantity = quantity The descriptors prevent invalid instances from being created: >>> Component('Widget', 'metal', 5) # Blocked: 'Widget' is not all uppercase Traceback (most recent call last): ... ValueError: Expected to be true for 'Widget' >>> Component('WIDGET', 'metle', 5) # Blocked: 'metle' is misspelled Traceback (most recent call last): ... ValueError: Expected 'metle' to be one of {'metal', 'plastic', 'wood'} >>> Component('WIDGET', 'metal', -5) # Blocked: -5 is negative Traceback (most recent call last): ... ValueError: Expected -5 to be at least 0 >>> Component('WIDGET', 'metal', 'V') # Blocked: 'V' isn't a number Traceback (most recent call last): ... TypeError: Expected 'V' to be an int or float >>> c = Component('WIDGET', 'metal', 5) # Allowed: The inputs are valid Technical Tutorial ================== What follows is a more technical tutorial for the mechanics and details of how descriptors work. Abstract -------- Defines descriptors, summarizes the protocol, and shows how descriptors are called. Provides an example showing how object relational mappings work. Learning about descriptors not only provides access to a larger toolset, it creates a deeper understanding of how Python works. Definition and introduction --------------------------- In general, a descriptor is an attribute value that has one of the methods in the descriptor protocol. Those methods are "__get__()", "__set__()", and "__delete__()". If any of those methods are defined for an attribute, it is said to be a *descriptor*. The default behavior for attribute access is to get, set, or delete the attribute from an object’s dictionary. For instance, "a.x" has a lookup chain starting with "a.__dict__['x']", then "type(a).__dict__['x']", and continuing through the method resolution order of "type(a)". If the looked-up value is an object defining one of the descriptor methods, then Python may override the default behavior and invoke the descriptor method instead. Where this occurs in the precedence chain depends on which descriptor methods were defined. Descriptors are a powerful, general purpose protocol. They are the mechanism behind properties, methods, static methods, class methods, and "super()". They are used throughout Python itself. Descriptors simplify the underlying C code and offer a flexible set of new tools for everyday Python programs. Descriptor protocol ------------------- "descr.__get__(self, obj, type=None) -> value" "descr.__set__(self, obj, value) -> None" "descr.__delete__(self, obj) -> None" That is all there is to it. Define any of these methods and an object is considered a descriptor and can override default behavior upon being looked up as an attribute. If an object defines "__set__()" or "__delete__()", it is considered a data descriptor. Descriptors that only define "__get__()" are called non-data descriptors (they are often used for methods but other uses are possible). Data and non-data descriptors differ in how overrides are calculated with respect to entries in an instance’s dictionary. If an instance’s dictionary has an entry with the same name as a data descriptor, the data descriptor takes precedence. If an instance’s dictionary has an entry with the same name as a non-data descriptor, the dictionary entry takes precedence. To make a read-only data descriptor, define both "__get__()" and "__set__()" with the "__set__()" raising an "AttributeError" when called. Defining the "__set__()" method with an exception raising placeholder is enough to make it a data descriptor. Overview of descriptor invocation --------------------------------- A descriptor can be called directly with "desc.__get__(obj)" or "desc.__get__(None, cls)". But it is more common for a descriptor to be invoked automatically from attribute access. The expression "obj.x" looks up the attribute "x" in the chain of namespaces for "obj". If the search finds a descriptor outside of the instance "__dict__", its "__get__()" method is invoked according to the precedence rules listed below. The details of invocation depend on whether "obj" is an object, class, or instance of super. Invocation from an instance --------------------------- Instance lookup scans through a chain of namespaces giving data descriptors the highest priority, followed by instance variables, then non-data descriptors, then class variables, and lastly "__getattr__()" if it is provided. If a descriptor is found for "a.x", then it is invoked with: "desc.__get__(a, type(a))". The logic for a dotted lookup is in "object.__getattribute__()". Here is a pure Python equivalent: def object_getattribute(obj, name): "Emulate PyObject_GenericGetAttr() in Objects/object.c" null = object() objtype = type(obj) cls_var = getattr(objtype, name, null) descr_get = getattr(type(cls_var), '__get__', null) if descr_get is not null: if (hasattr(type(cls_var), '__set__') or hasattr(type(cls_var), '__delete__')): return descr_get(cls_var, obj, objtype) # data descriptor if hasattr(obj, '__dict__') and name in vars(obj): return vars(obj)[name] # instance variable if descr_get is not null: return descr_get(cls_var, obj, objtype) # non-data descriptor if cls_var is not null: return cls_var # class variable raise AttributeError(name) Note, there is no "__getattr__()" hook in the "__getattribute__()" code. That is why calling "__getattribute__()" directly or with "super().__getattribute__" will bypass "__getattr__()" entirely. Instead, it is the dot operator and the "getattr()" function that are responsible for invoking "__getattr__()" whenever "__getattribute__()" raises an "AttributeError". Their logic is encapsulated in a helper function: def getattr_hook(obj, name): "Emulate slot_tp_getattr_hook() in Objects/typeobject.c" try: return obj.__getattribute__(name) except AttributeError: if not hasattr(type(obj), '__getattr__'): raise return type(obj).__getattr__(obj, name) # __getattr__ Invocation from a class ----------------------- The logic for a dotted lookup such as "A.x" is in "type.__getattribute__()". The steps are similar to those for "object.__getattribute__()" but the instance dictionary lookup is replaced by a search through the class’s *method resolution order*. If a descriptor is found, it is invoked with "desc.__get__(None, A)". The full C implementation can be found in "type_getattro()" and "_PyType_Lookup()" in Objects/typeobject.c. Invocation from super --------------------- The logic for super’s dotted lookup is in the "__getattribute__()" method for object returned by "super()". A dotted lookup such as "super(A, obj).m" searches "obj.__class__.__mro__" for the base class "B" immediately following "A" and then returns "B.__dict__['m'].__get__(obj, A)". If not a descriptor, "m" is returned unchanged. The full C implementation can be found in "super_getattro()" in Objects/typeobject.c. A pure Python equivalent can be found in Guido’s Tutorial. Summary of invocation logic --------------------------- The mechanism for descriptors is embedded in the "__getattribute__()" methods for "object", "type", and "super()". The important points to remember are: * Descriptors are invoked by the "__getattribute__()" method. * Classes inherit this machinery from "object", "type", or "super()". * Overriding "__getattribute__()" prevents automatic descriptor calls because all the descriptor logic is in that method. * "object.__getattribute__()" and "type.__getattribute__()" make different calls to "__get__()". The first includes the instance and may include the class. The second puts in "None" for the instance and always includes the class. * Data descriptors always override instance dictionaries. * Non-data descriptors may be overridden by instance dictionaries. Automatic name notification --------------------------- Sometimes it is desirable for a descriptor to know what class variable name it was assigned to. When a new class is created, the "type" metaclass scans the dictionary of the new class. If any of the entries are descriptors and if they define "__set_name__()", that method is called with two arguments. The *owner* is the class where the descriptor is used, and the *name* is the class variable the descriptor was assigned to. The implementation details are in "type_new()" and "set_names()" in Objects/typeobject.c. Since the update logic is in "type.__new__()", notifications only take place at the time of class creation. If descriptors are added to the class afterwards, "__set_name__()" will need to be called manually. ORM example ----------- The following code is simplified skeleton showing how data descriptors could be used to implement an object relational mapping. The essential idea is that the data is stored in an external database. The Python instances only hold keys to the database’s tables. Descriptors take care of lookups or updates: class Field: def __set_name__(self, owner, name): self.fetch = f'SELECT {name} FROM {owner.table} WHERE {owner.key}=?;' self.store = f'UPDATE {owner.table} SET {name}=? WHERE {owner.key}=?;' def __get__(self, obj, objtype=None): return conn.execute(self.fetch, [obj.key]).fetchone()[0] def __set__(self, obj, value): conn.execute(self.store, [value, obj.key]) conn.commit() We can use the "Field" class to define models that describe the schema for each table in a database: class Movie: table = 'Movies' # Table name key = 'title' # Primary key director = Field() year = Field() def __init__(self, key): self.key = key class Song: table = 'Music' key = 'title' artist = Field() year = Field() genre = Field() def __init__(self, key): self.key = key To use the models, first connect to the database: >>> import sqlite3 >>> conn = sqlite3.connect('entertainment.db') An interactive session shows how data is retrieved from the database and how it can be updated: >>> Movie('Star Wars').director 'George Lucas' >>> jaws = Movie('Jaws') >>> f'Released in {jaws.year} by {jaws.director}' 'Released in 1975 by Steven Spielberg' >>> Song('Country Roads').artist 'John Denver' >>> Movie('Star Wars').director = 'J.J. Abrams' >>> Movie('Star Wars').director 'J.J. Abrams' Pure Python Equivalents ======================= The descriptor protocol is simple and offers exciting possibilities. Several use cases are so common that they have been prepackaged into built-in tools. Properties, bound methods, static methods, class methods, and __slots__ are all based on the descriptor protocol. Properties ---------- Calling "property()" is a succinct way of building a data descriptor that triggers a function call upon access to an attribute. Its signature is: property(fget=None, fset=None, fdel=None, doc=None) -> property The documentation shows a typical use to define a managed attribute "x": class C: def getx(self): return self.__x def setx(self, value): self.__x = value def delx(self): del self.__x x = property(getx, setx, delx, "I'm the 'x' property.") To see how "property()" is implemented in terms of the descriptor protocol, here is a pure Python equivalent: class Property: "Emulate PyProperty_Type() in Objects/descrobject.c" def __init__(self, fget=None, fset=None, fdel=None, doc=None): self.fget = fget self.fset = fset self.fdel = fdel if doc is None and fget is not None: doc = fget.__doc__ self.__doc__ = doc def __get__(self, obj, objtype=None): if obj is None: return self if self.fget is None: raise AttributeError("unreadable attribute") return self.fget(obj) def __set__(self, obj, value): if self.fset is None: raise AttributeError("can't set attribute") self.fset(obj, value) def __delete__(self, obj): if self.fdel is None: raise AttributeError("can't delete attribute") self.fdel(obj) def getter(self, fget): return type(self)(fget, self.fset, self.fdel, self.__doc__) def setter(self, fset): return type(self)(self.fget, fset, self.fdel, self.__doc__) def deleter(self, fdel): return type(self)(self.fget, self.fset, fdel, self.__doc__) The "property()" builtin helps whenever a user interface has granted attribute access and then subsequent changes require the intervention of a method. For instance, a spreadsheet class may grant access to a cell value through "Cell('b10').value". Subsequent improvements to the program require the cell to be recalculated on every access; however, the programmer does not want to affect existing client code accessing the attribute directly. The solution is to wrap access to the value attribute in a property data descriptor: class Cell: ... @property def value(self): "Recalculate the cell before returning value" self.recalc() return self._value Either the built-in "property()" or our "Property()" equivalent would work in this example. Functions and methods --------------------- Python’s object oriented features are built upon a function based environment. Using non-data descriptors, the two are merged seamlessly. Functions stored in class dictionaries get turned into methods when invoked. Methods only differ from regular functions in that the object instance is prepended to the other arguments. By convention, the instance is called *self* but could be called *this* or any other variable name. Methods can be created manually with "types.MethodType" which is roughly equivalent to: class MethodType: "Emulate PyMethod_Type in Objects/classobject.c" def __init__(self, func, obj): self.__func__ = func self.__self__ = obj def __call__(self, *args, **kwargs): func = self.__func__ obj = self.__self__ return func(obj, *args, **kwargs) To support automatic creation of methods, functions include the "__get__()" method for binding methods during attribute access. This means that functions are non-data descriptors that return bound methods during dotted lookup from an instance. Here’s how it works: class Function: ... def __get__(self, obj, objtype=None): "Simulate func_descr_get() in Objects/funcobject.c" if obj is None: return self return MethodType(self, obj) Running the following class in the interpreter shows how the function descriptor works in practice: class D: def f(self, x): return x The function has a *qualified name* attribute to support introspection: >>> D.f.__qualname__ 'D.f' Accessing the function through the class dictionary does not invoke "__get__()". Instead, it just returns the underlying function object: >>> D.__dict__['f'] Dotted access from a class calls "__get__()" which just returns the underlying function unchanged: >>> D.f The interesting behavior occurs during dotted access from an instance. The dotted lookup calls "__get__()" which returns a bound method object: >>> d = D() >>> d.f > Internally, the bound method stores the underlying function and the bound instance: >>> d.f.__func__ >>> d.f.__self__ <__main__.D object at 0x1012e1f98> If you have ever wondered where *self* comes from in regular methods or where *cls* comes from in class methods, this is it! Kinds of methods ---------------- Non-data descriptors provide a simple mechanism for variations on the usual patterns of binding functions into methods. To recap, functions have a "__get__()" method so that they can be converted to a method when accessed as attributes. The non-data descriptor transforms an "obj.f(*args)" call into "f(obj, *args)". Calling "cls.f(*args)" becomes "f(*args)". This chart summarizes the binding and its two most useful variants: +-------------------+------------------------+--------------------+ | Transformation | Called from an object | Called from a | | | | class | |===================|========================|====================| | function | f(obj, *args) | f(*args) | +-------------------+------------------------+--------------------+ | staticmethod | f(*args) | f(*args) | +-------------------+------------------------+--------------------+ | classmethod | f(type(obj), *args) | f(cls, *args) | +-------------------+------------------------+--------------------+ Static methods -------------- Static methods return the underlying function without changes. Calling either "c.f" or "C.f" is the equivalent of a direct lookup into "object.__getattribute__(c, "f")" or "object.__getattribute__(C, "f")". As a result, the function becomes identically accessible from either an object or a class. Good candidates for static methods are methods that do not reference the "self" variable. For instance, a statistics package may include a container class for experimental data. The class provides normal methods for computing the average, mean, median, and other descriptive statistics that depend on the data. However, there may be useful functions which are conceptually related but do not depend on the data. For instance, "erf(x)" is handy conversion routine that comes up in statistical work but does not directly depend on a particular dataset. It can be called either from an object or the class: "s.erf(1.5) --> .9332" or "Sample.erf(1.5) --> .9332". Since static methods return the underlying function with no changes, the example calls are unexciting: class E: @staticmethod def f(x): return x * 10 >>> E.f(3) 30 >>> E().f(3) 30 Using the non-data descriptor protocol, a pure Python version of "staticmethod()" would look like this: class StaticMethod: "Emulate PyStaticMethod_Type() in Objects/funcobject.c" def __init__(self, f): self.f = f def __get__(self, obj, objtype=None): return self.f Class methods ------------- Unlike static methods, class methods prepend the class reference to the argument list before calling the function. This format is the same for whether the caller is an object or a class: class F: @classmethod def f(cls, x): return cls.__name__, x >>> F.f(3) ('F', 3) >>> F().f(3) ('F', 3) This behavior is useful whenever the method only needs to have a class reference and does not rely on data stored in a specific instance. One use for class methods is to create alternate class constructors. For example, the classmethod "dict.fromkeys()" creates a new dictionary from a list of keys. The pure Python equivalent is: class Dict(dict): @classmethod def fromkeys(cls, iterable, value=None): "Emulate dict_fromkeys() in Objects/dictobject.c" d = cls() for key in iterable: d[key] = value return d Now a new dictionary of unique keys can be constructed like this: >>> d = Dict.fromkeys('abracadabra') >>> type(d) is Dict True >>> d {'a': None, 'b': None, 'r': None, 'c': None, 'd': None} Using the non-data descriptor protocol, a pure Python version of "classmethod()" would look like this: class ClassMethod: "Emulate PyClassMethod_Type() in Objects/funcobject.c" def __init__(self, f): self.f = f def __get__(self, obj, cls=None): if cls is None: cls = type(obj) if hasattr(type(self.f), '__get__'): return self.f.__get__(cls) return MethodType(self.f, cls) The code path for "hasattr(type(self.f), '__get__')" was added in Python 3.9 and makes it possible for "classmethod()" to support chained decorators. For example, a classmethod and property could be chained together: class G: @classmethod @property def __doc__(cls): return f'A doc for {cls.__name__!r}' >>> G.__doc__ "A doc for 'G'" Member objects and __slots__ ---------------------------- When a class defines "__slots__", it replaces instance dictionaries with a fixed-length array of slot values. From a user point of view that has several effects: 1. Provides immediate detection of bugs due to misspelled attribute assignments. Only attribute names specified in "__slots__" are allowed: class Vehicle: __slots__ = ('id_number', 'make', 'model') >>> auto = Vehicle() >>> auto.id_nubmer = 'VYE483814LQEX' Traceback (most recent call last): ... AttributeError: 'Vehicle' object has no attribute 'id_nubmer' 2. Helps create immutable objects where descriptors manage access to private attributes stored in "__slots__": class Immutable: __slots__ = ('_dept', '_name') # Replace the instance dictionary def __init__(self, dept, name): self._dept = dept # Store to private attribute self._name = name # Store to private attribute @property # Read-only descriptor def dept(self): return self._dept @property def name(self): # Read-only descriptor return self._name >>> mark = Immutable('Botany', 'Mark Watney') >>> mark.dept 'Botany' >>> mark.dept = 'Space Pirate' Traceback (most recent call last): ... AttributeError: can't set attribute >>> mark.location = 'Mars' Traceback (most recent call last): ... AttributeError: 'Immutable' object has no attribute 'location' 3. Saves memory. On a 64-bit Linux build, an instance with two attributes takes 48 bytes with "__slots__" and 152 bytes without. This flyweight design pattern likely only matters when a large number of instances are going to be created. 4. Blocks tools like "functools.cached_property()" which require an instance dictionary to function correctly: from functools import cached_property class CP: __slots__ = () # Eliminates the instance dict @cached_property # Requires an instance dict def pi(self): return 4 * sum((-1.0)**n / (2.0*n + 1.0) for n in reversed(range(100_000))) >>> CP().pi Traceback (most recent call last): ... TypeError: No '__dict__' attribute on 'CP' instance to cache 'pi' property. It is not possible to create an exact drop-in pure Python version of "__slots__" because it requires direct access to C structures and control over object memory allocation. However, we can build a mostly faithful simulation where the actual C structure for slots is emulated by a private "_slotvalues" list. Reads and writes to that private structure are managed by member descriptors: null = object() class Member: def __init__(self, name, clsname, offset): 'Emulate PyMemberDef in Include/structmember.h' # Also see descr_new() in Objects/descrobject.c self.name = name self.clsname = clsname self.offset = offset def __get__(self, obj, objtype=None): 'Emulate member_get() in Objects/descrobject.c' # Also see PyMember_GetOne() in Python/structmember.c value = obj._slotvalues[self.offset] if value is null: raise AttributeError(self.name) return value def __set__(self, obj, value): 'Emulate member_set() in Objects/descrobject.c' obj._slotvalues[self.offset] = value def __delete__(self, obj): 'Emulate member_delete() in Objects/descrobject.c' value = obj._slotvalues[self.offset] if value is null: raise AttributeError(self.name) obj._slotvalues[self.offset] = null def __repr__(self): 'Emulate member_repr() in Objects/descrobject.c' return f'' The "type.__new__()" method takes care of adding member objects to class variables: class Type(type): 'Simulate how the type metaclass adds member objects for slots' def __new__(mcls, clsname, bases, mapping): 'Emuluate type_new() in Objects/typeobject.c' # type_new() calls PyTypeReady() which calls add_methods() slot_names = mapping.get('slot_names', []) for offset, name in enumerate(slot_names): mapping[name] = Member(name, clsname, offset) return type.__new__(mcls, clsname, bases, mapping) The "object.__new__()" method takes care of creating instances that have slots instead of an instance dictionary. Here is a rough simulation in pure Python: class Object: 'Simulate how object.__new__() allocates memory for __slots__' def __new__(cls, *args): 'Emulate object_new() in Objects/typeobject.c' inst = super().__new__(cls) if hasattr(cls, 'slot_names'): empty_slots = [null] * len(cls.slot_names) object.__setattr__(inst, '_slotvalues', empty_slots) return inst def __setattr__(self, name, value): 'Emulate _PyObject_GenericSetAttrWithDict() Objects/object.c' cls = type(self) if hasattr(cls, 'slot_names') and name not in cls.slot_names: raise AttributeError( f'{type(self).__name__!r} object has no attribute {name!r}' ) super().__setattr__(name, value) def __delattr__(self, name): 'Emulate _PyObject_GenericSetAttrWithDict() Objects/object.c' cls = type(self) if hasattr(cls, 'slot_names') and name not in cls.slot_names: raise AttributeError( f'{type(self).__name__!r} object has no attribute {name!r}' ) super().__delattr__(name) To use the simulation in a real class, just inherit from "Object" and set the *metaclass* to "Type": class H(Object, metaclass=Type): 'Instance variables stored in slots' slot_names = ['x', 'y'] def __init__(self, x, y): self.x = x self.y = y At this point, the metaclass has loaded member objects for *x* and *y*: >>> from pprint import pp >>> pp(dict(vars(H))) {'__module__': '__main__', '__doc__': 'Instance variables stored in slots', 'slot_names': ['x', 'y'], '__init__': , 'x': , 'y': } When instances are created, they have a "slot_values" list where the attributes are stored: >>> h = H(10, 20) >>> vars(h) {'_slotvalues': [10, 20]} >>> h.x = 55 >>> vars(h) {'_slotvalues': [55, 20]} Misspelled or unassigned attributes will raise an exception: >>> h.xz Traceback (most recent call last): ... AttributeError: 'H' object has no attribute 'xz' Functional Programming HOWTO **************************** Author: A. M. Kuchling Release: 0.32 In this document, we’ll take a tour of Python’s features suitable for implementing programs in a functional style. After an introduction to the concepts of functional programming, we’ll look at language features such as *iterator*s and *generator*s and relevant library modules such as "itertools" and "functools". Introduction ============ This section explains the basic concept of functional programming; if you’re just interested in learning about Python language features, skip to the next section on Iterators. Programming languages support decomposing problems in several different ways: * Most programming languages are **procedural**: programs are lists of instructions that tell the computer what to do with the program’s input. C, Pascal, and even Unix shells are procedural languages. * In **declarative** languages, you write a specification that describes the problem to be solved, and the language implementation figures out how to perform the computation efficiently. SQL is the declarative language you’re most likely to be familiar with; a SQL query describes the data set you want to retrieve, and the SQL engine decides whether to scan tables or use indexes, which subclauses should be performed first, etc. * **Object-oriented** programs manipulate collections of objects. Objects have internal state and support methods that query or modify this internal state in some way. Smalltalk and Java are object- oriented languages. C++ and Python are languages that support object-oriented programming, but don’t force the use of object- oriented features. * **Functional** programming decomposes a problem into a set of functions. Ideally, functions only take inputs and produce outputs, and don’t have any internal state that affects the output produced for a given input. Well-known functional languages include the ML family (Standard ML, OCaml, and other variants) and Haskell. The designers of some computer languages choose to emphasize one particular approach to programming. This often makes it difficult to write programs that use a different approach. Other languages are multi-paradigm languages that support several different approaches. Lisp, C++, and Python are multi-paradigm; you can write programs or libraries that are largely procedural, object-oriented, or functional in all of these languages. In a large program, different sections might be written using different approaches; the GUI might be object- oriented while the processing logic is procedural or functional, for example. In a functional program, input flows through a set of functions. Each function operates on its input and produces some output. Functional style discourages functions with side effects that modify internal state or make other changes that aren’t visible in the function’s return value. Functions that have no side effects at all are called **purely functional**. Avoiding side effects means not using data structures that get updated as a program runs; every function’s output must only depend on its input. Some languages are very strict about purity and don’t even have assignment statements such as "a=3" or "c = a + b", but it’s difficult to avoid all side effects, such as printing to the screen or writing to a disk file. Another example is a call to the "print()" or "time.sleep()" function, neither of which returns a useful value. Both are called only for their side effects of sending some text to the screen or pausing execution for a second. Python programs written in functional style usually won’t go to the extreme of avoiding all I/O or all assignments; instead, they’ll provide a functional-appearing interface but will use non-functional features internally. For example, the implementation of a function will still use assignments to local variables, but won’t modify global variables or have other side effects. Functional programming can be considered the opposite of object- oriented programming. Objects are little capsules containing some internal state along with a collection of method calls that let you modify this state, and programs consist of making the right set of state changes. Functional programming wants to avoid state changes as much as possible and works with data flowing between functions. In Python you might combine the two approaches by writing functions that take and return instances representing objects in your application (e-mail messages, transactions, etc.). Functional design may seem like an odd constraint to work under. Why should you avoid objects and side effects? There are theoretical and practical advantages to the functional style: * Formal provability. * Modularity. * Composability. * Ease of debugging and testing. Formal provability ------------------ A theoretical benefit is that it’s easier to construct a mathematical proof that a functional program is correct. For a long time researchers have been interested in finding ways to mathematically prove programs correct. This is different from testing a program on numerous inputs and concluding that its output is usually correct, or reading a program’s source code and concluding that the code looks right; the goal is instead a rigorous proof that a program produces the right result for all possible inputs. The technique used to prove programs correct is to write down **invariants**, properties of the input data and of the program’s variables that are always true. For each line of code, you then show that if invariants X and Y are true **before** the line is executed, the slightly different invariants X’ and Y’ are true **after** the line is executed. This continues until you reach the end of the program, at which point the invariants should match the desired conditions on the program’s output. Functional programming’s avoidance of assignments arose because assignments are difficult to handle with this technique; assignments can break invariants that were true before the assignment without producing any new invariants that can be propagated onward. Unfortunately, proving programs correct is largely impractical and not relevant to Python software. Even trivial programs require proofs that are several pages long; the proof of correctness for a moderately complicated program would be enormous, and few or none of the programs you use daily (the Python interpreter, your XML parser, your web browser) could be proven correct. Even if you wrote down or generated a proof, there would then be the question of verifying the proof; maybe there’s an error in it, and you wrongly believe you’ve proved the program correct. Modularity ---------- A more practical benefit of functional programming is that it forces you to break apart your problem into small pieces. Programs are more modular as a result. It’s easier to specify and write a small function that does one thing than a large function that performs a complicated transformation. Small functions are also easier to read and to check for errors. Ease of debugging and testing ----------------------------- Testing and debugging a functional-style program is easier. Debugging is simplified because functions are generally small and clearly specified. When a program doesn’t work, each function is an interface point where you can check that the data are correct. You can look at the intermediate inputs and outputs to quickly isolate the function that’s responsible for a bug. Testing is easier because each function is a potential subject for a unit test. Functions don’t depend on system state that needs to be replicated before running a test; instead you only have to synthesize the right input and then check that the output matches expectations. Composability ------------- As you work on a functional-style program, you’ll write a number of functions with varying inputs and outputs. Some of these functions will be unavoidably specialized to a particular application, but others will be useful in a wide variety of programs. For example, a function that takes a directory path and returns all the XML files in the directory, or a function that takes a filename and returns its contents, can be applied to many different situations. Over time you’ll form a personal library of utilities. Often you’ll assemble new programs by arranging existing functions in a new configuration and writing a few functions specialized for the current task. Iterators ========= I’ll start by looking at a Python language feature that’s an important foundation for writing functional-style programs: iterators. An iterator is an object representing a stream of data; this object returns the data one element at a time. A Python iterator must support a method called "__next__()" that takes no arguments and always returns the next element of the stream. If there are no more elements in the stream, "__next__()" must raise the "StopIteration" exception. Iterators don’t have to be finite, though; it’s perfectly reasonable to write an iterator that produces an infinite stream of data. The built-in "iter()" function takes an arbitrary object and tries to return an iterator that will return the object’s contents or elements, raising "TypeError" if the object doesn’t support iteration. Several of Python’s built-in data types support iteration, the most common being lists and dictionaries. An object is called *iterable* if you can get an iterator for it. You can experiment with the iteration interface manually: >>> L = [1, 2, 3] >>> it = iter(L) >>> it <...iterator object at ...> >>> it.__next__() # same as next(it) 1 >>> next(it) 2 >>> next(it) 3 >>> next(it) Traceback (most recent call last): File "", line 1, in StopIteration >>> Python expects iterable objects in several different contexts, the most important being the "for" statement. In the statement "for X in Y", Y must be an iterator or some object for which "iter()" can create an iterator. These two statements are equivalent: for i in iter(obj): print(i) for i in obj: print(i) Iterators can be materialized as lists or tuples by using the "list()" or "tuple()" constructor functions: >>> L = [1, 2, 3] >>> iterator = iter(L) >>> t = tuple(iterator) >>> t (1, 2, 3) Sequence unpacking also supports iterators: if you know an iterator will return N elements, you can unpack them into an N-tuple: >>> L = [1, 2, 3] >>> iterator = iter(L) >>> a, b, c = iterator >>> a, b, c (1, 2, 3) Built-in functions such as "max()" and "min()" can take a single iterator argument and will return the largest or smallest element. The ""in"" and ""not in"" operators also support iterators: "X in iterator" is true if X is found in the stream returned by the iterator. You’ll run into obvious problems if the iterator is infinite; "max()", "min()" will never return, and if the element X never appears in the stream, the ""in"" and ""not in"" operators won’t return either. Note that you can only go forward in an iterator; there’s no way to get the previous element, reset the iterator, or make a copy of it. Iterator objects can optionally provide these additional capabilities, but the iterator protocol only specifies the "__next__()" method. Functions may therefore consume all of the iterator’s output, and if you need to do something different with the same stream, you’ll have to create a new iterator. Data Types That Support Iterators --------------------------------- We’ve already seen how lists and tuples support iterators. In fact, any Python sequence type, such as strings, will automatically support creation of an iterator. Calling "iter()" on a dictionary returns an iterator that will loop over the dictionary’s keys: >>> m = {'Jan': 1, 'Feb': 2, 'Mar': 3, 'Apr': 4, 'May': 5, 'Jun': 6, ... 'Jul': 7, 'Aug': 8, 'Sep': 9, 'Oct': 10, 'Nov': 11, 'Dec': 12} >>> for key in m: ... print(key, m[key]) Jan 1 Feb 2 Mar 3 Apr 4 May 5 Jun 6 Jul 7 Aug 8 Sep 9 Oct 10 Nov 11 Dec 12 Note that starting with Python 3.7, dictionary iteration order is guaranteed to be the same as the insertion order. In earlier versions, the behaviour was unspecified and could vary between implementations. Applying "iter()" to a dictionary always loops over the keys, but dictionaries have methods that return other iterators. If you want to iterate over values or key/value pairs, you can explicitly call the "values()" or "items()" methods to get an appropriate iterator. The "dict()" constructor can accept an iterator that returns a finite stream of "(key, value)" tuples: >>> L = [('Italy', 'Rome'), ('France', 'Paris'), ('US', 'Washington DC')] >>> dict(iter(L)) {'Italy': 'Rome', 'France': 'Paris', 'US': 'Washington DC'} Files also support iteration by calling the "readline()" method until there are no more lines in the file. This means you can read each line of a file like this: for line in file: # do something for each line ... Sets can take their contents from an iterable and let you iterate over the set’s elements: S = {2, 3, 5, 7, 11, 13} for i in S: print(i) Generator expressions and list comprehensions ============================================= Two common operations on an iterator’s output are 1) performing some operation for every element, 2) selecting a subset of elements that meet some condition. For example, given a list of strings, you might want to strip off trailing whitespace from each line or extract all the strings containing a given substring. List comprehensions and generator expressions (short form: “listcomps” and “genexps”) are a concise notation for such operations, borrowed from the functional programming language Haskell (https://www.haskell.org/). You can strip all the whitespace from a stream of strings with the following code: line_list = [' line 1\n', 'line 2 \n', ...] # Generator expression -- returns iterator stripped_iter = (line.strip() for line in line_list) # List comprehension -- returns list stripped_list = [line.strip() for line in line_list] You can select only certain elements by adding an ""if"" condition: stripped_list = [line.strip() for line in line_list if line != ""] With a list comprehension, you get back a Python list; "stripped_list" is a list containing the resulting lines, not an iterator. Generator expressions return an iterator that computes the values as necessary, not needing to materialize all the values at once. This means that list comprehensions aren’t useful if you’re working with iterators that return an infinite stream or a very large amount of data. Generator expressions are preferable in these situations. Generator expressions are surrounded by parentheses (“()”) and list comprehensions are surrounded by square brackets (“[]”). Generator expressions have the form: ( expression for expr in sequence1 if condition1 for expr2 in sequence2 if condition2 for expr3 in sequence3 ... if condition3 for exprN in sequenceN if conditionN ) Again, for a list comprehension only the outside brackets are different (square brackets instead of parentheses). The elements of the generated output will be the successive values of "expression". The "if" clauses are all optional; if present, "expression" is only evaluated and added to the result when "condition" is true. Generator expressions always have to be written inside parentheses, but the parentheses signalling a function call also count. If you want to create an iterator that will be immediately passed to a function you can write: obj_total = sum(obj.count for obj in list_all_objects()) The "for...in" clauses contain the sequences to be iterated over. The sequences do not have to be the same length, because they are iterated over from left to right, **not** in parallel. For each element in "sequence1", "sequence2" is looped over from the beginning. "sequence3" is then looped over for each resulting pair of elements from "sequence1" and "sequence2". To put it another way, a list comprehension or generator expression is equivalent to the following Python code: for expr1 in sequence1: if not (condition1): continue # Skip this element for expr2 in sequence2: if not (condition2): continue # Skip this element ... for exprN in sequenceN: if not (conditionN): continue # Skip this element # Output the value of # the expression. This means that when there are multiple "for...in" clauses but no "if" clauses, the length of the resulting output will be equal to the product of the lengths of all the sequences. If you have two lists of length 3, the output list is 9 elements long: >>> seq1 = 'abc' >>> seq2 = (1, 2, 3) >>> [(x, y) for x in seq1 for y in seq2] [('a', 1), ('a', 2), ('a', 3), ('b', 1), ('b', 2), ('b', 3), ('c', 1), ('c', 2), ('c', 3)] To avoid introducing an ambiguity into Python’s grammar, if "expression" is creating a tuple, it must be surrounded with parentheses. The first list comprehension below is a syntax error, while the second one is correct: # Syntax error [x, y for x in seq1 for y in seq2] # Correct [(x, y) for x in seq1 for y in seq2] Generators ========== Generators are a special class of functions that simplify the task of writing iterators. Regular functions compute a value and return it, but generators return an iterator that returns a stream of values. You’re doubtless familiar with how regular function calls work in Python or C. When you call a function, it gets a private namespace where its local variables are created. When the function reaches a "return" statement, the local variables are destroyed and the value is returned to the caller. A later call to the same function creates a new private namespace and a fresh set of local variables. But, what if the local variables weren’t thrown away on exiting a function? What if you could later resume the function where it left off? This is what generators provide; they can be thought of as resumable functions. Here’s the simplest example of a generator function: >>> def generate_ints(N): ... for i in range(N): ... yield i Any function containing a "yield" keyword is a generator function; this is detected by Python’s *bytecode* compiler which compiles the function specially as a result. When you call a generator function, it doesn’t return a single value; instead it returns a generator object that supports the iterator protocol. On executing the "yield" expression, the generator outputs the value of "i", similar to a "return" statement. The big difference between "yield" and a "return" statement is that on reaching a "yield" the generator’s state of execution is suspended and local variables are preserved. On the next call to the generator’s "__next__()" method, the function will resume executing. Here’s a sample usage of the "generate_ints()" generator: >>> gen = generate_ints(3) >>> gen >>> next(gen) 0 >>> next(gen) 1 >>> next(gen) 2 >>> next(gen) Traceback (most recent call last): File "stdin", line 1, in File "stdin", line 2, in generate_ints StopIteration You could equally write "for i in generate_ints(5)", or "a, b, c = generate_ints(3)". Inside a generator function, "return value" causes "StopIteration(value)" to be raised from the "__next__()" method. Once this happens, or the bottom of the function is reached, the procession of values ends and the generator cannot yield any further values. You could achieve the effect of generators manually by writing your own class and storing all the local variables of the generator as instance variables. For example, returning a list of integers could be done by setting "self.count" to 0, and having the "__next__()" method increment "self.count" and return it. However, for a moderately complicated generator, writing a corresponding class can be much messier. The test suite included with Python’s library, Lib/test/test_generators.py, contains a number of more interesting examples. Here’s one generator that implements an in-order traversal of a tree using generators recursively. # A recursive generator that generates Tree leaves in in-order. def inorder(t): if t: for x in inorder(t.left): yield x yield t.label for x in inorder(t.right): yield x Two other examples in "test_generators.py" produce solutions for the N-Queens problem (placing N queens on an NxN chess board so that no queen threatens another) and the Knight’s Tour (finding a route that takes a knight to every square of an NxN chessboard without visiting any square twice). Passing values into a generator ------------------------------- In Python 2.4 and earlier, generators only produced output. Once a generator’s code was invoked to create an iterator, there was no way to pass any new information into the function when its execution is resumed. You could hack together this ability by making the generator look at a global variable or by passing in some mutable object that callers then modify, but these approaches are messy. In Python 2.5 there’s a simple way to pass values into a generator. "yield" became an expression, returning a value that can be assigned to a variable or otherwise operated on: val = (yield i) I recommend that you **always** put parentheses around a "yield" expression when you’re doing something with the returned value, as in the above example. The parentheses aren’t always necessary, but it’s easier to always add them instead of having to remember when they’re needed. (**PEP 342** explains the exact rules, which are that a "yield"-expression must always be parenthesized except when it occurs at the top-level expression on the right-hand side of an assignment. This means you can write "val = yield i" but have to use parentheses when there’s an operation, as in "val = (yield i) + 12".) Values are sent into a generator by calling its "send(value)" method. This method resumes the generator’s code and the "yield" expression returns the specified value. If the regular "__next__()" method is called, the "yield" returns "None". Here’s a simple counter that increments by 1 and allows changing the value of the internal counter. def counter(maximum): i = 0 while i < maximum: val = (yield i) # If value provided, change counter if val is not None: i = val else: i += 1 And here’s an example of changing the counter: >>> it = counter(10) >>> next(it) 0 >>> next(it) 1 >>> it.send(8) 8 >>> next(it) 9 >>> next(it) Traceback (most recent call last): File "t.py", line 15, in it.next() StopIteration Because "yield" will often be returning "None", you should always check for this case. Don’t just use its value in expressions unless you’re sure that the "send()" method will be the only method used to resume your generator function. In addition to "send()", there are two other methods on generators: * "throw(value)" is used to raise an exception inside the generator; the exception is raised by the "yield" expression where the generator’s execution is paused. * "close()" raises a "GeneratorExit" exception inside the generator to terminate the iteration. On receiving this exception, the generator’s code must either raise "GeneratorExit" or "StopIteration"; catching the exception and doing anything else is illegal and will trigger a "RuntimeError". "close()" will also be called by Python’s garbage collector when the generator is garbage- collected. If you need to run cleanup code when a "GeneratorExit" occurs, I suggest using a "try: ... finally:" suite instead of catching "GeneratorExit". The cumulative effect of these changes is to turn generators from one- way producers of information into both producers and consumers. Generators also become **coroutines**, a more generalized form of subroutines. Subroutines are entered at one point and exited at another point (the top of the function, and a "return" statement), but coroutines can be entered, exited, and resumed at many different points (the "yield" statements). Built-in functions ================== Let’s look in more detail at built-in functions often used with iterators. Two of Python’s built-in functions, "map()" and "filter()" duplicate the features of generator expressions: "map(f, iterA, iterB, ...)" returns an iterator over the sequence "f(iterA[0], iterB[0]), f(iterA[1], iterB[1]), f(iterA[2], iterB[2]), ...". >>> def upper(s): ... return s.upper() >>> list(map(upper, ['sentence', 'fragment'])) ['SENTENCE', 'FRAGMENT'] >>> [upper(s) for s in ['sentence', 'fragment']] ['SENTENCE', 'FRAGMENT'] You can of course achieve the same effect with a list comprehension. "filter(predicate, iter)" returns an iterator over all the sequence elements that meet a certain condition, and is similarly duplicated by list comprehensions. A **predicate** is a function that returns the truth value of some condition; for use with "filter()", the predicate must take a single value. >>> def is_even(x): ... return (x % 2) == 0 >>> list(filter(is_even, range(10))) [0, 2, 4, 6, 8] This can also be written as a list comprehension: >>> list(x for x in range(10) if is_even(x)) [0, 2, 4, 6, 8] "enumerate(iter, start=0)" counts off the elements in the iterable returning 2-tuples containing the count (from *start*) and each element. >>> for item in enumerate(['subject', 'verb', 'object']): ... print(item) (0, 'subject') (1, 'verb') (2, 'object') "enumerate()" is often used when looping through a list and recording the indexes at which certain conditions are met: f = open('data.txt', 'r') for i, line in enumerate(f): if line.strip() == '': print('Blank line at line #%i' % i) "sorted(iterable, key=None, reverse=False)" collects all the elements of the iterable into a list, sorts the list, and returns the sorted result. The *key* and *reverse* arguments are passed through to the constructed list’s "sort()" method. >>> import random >>> # Generate 8 random numbers between [0, 10000) >>> rand_list = random.sample(range(10000), 8) >>> rand_list [769, 7953, 9828, 6431, 8442, 9878, 6213, 2207] >>> sorted(rand_list) [769, 2207, 6213, 6431, 7953, 8442, 9828, 9878] >>> sorted(rand_list, reverse=True) [9878, 9828, 8442, 7953, 6431, 6213, 2207, 769] (For a more detailed discussion of sorting, see the Sorting HOW TO.) The "any(iter)" and "all(iter)" built-ins look at the truth values of an iterable’s contents. "any()" returns "True" if any element in the iterable is a true value, and "all()" returns "True" if all of the elements are true values: >>> any([0, 1, 0]) True >>> any([0, 0, 0]) False >>> any([1, 1, 1]) True >>> all([0, 1, 0]) False >>> all([0, 0, 0]) False >>> all([1, 1, 1]) True "zip(iterA, iterB, ...)" takes one element from each iterable and returns them in a tuple: zip(['a', 'b', 'c'], (1, 2, 3)) => ('a', 1), ('b', 2), ('c', 3) It doesn’t construct an in-memory list and exhaust all the input iterators before returning; instead tuples are constructed and returned only if they’re requested. (The technical term for this behaviour is lazy evaluation.) This iterator is intended to be used with iterables that are all of the same length. If the iterables are of different lengths, the resulting stream will be the same length as the shortest iterable. zip(['a', 'b'], (1, 2, 3)) => ('a', 1), ('b', 2) You should avoid doing this, though, because an element may be taken from the longer iterators and discarded. This means you can’t go on to use the iterators further because you risk skipping a discarded element. The itertools module ==================== The "itertools" module contains a number of commonly-used iterators as well as functions for combining several iterators. This section will introduce the module’s contents by showing small examples. The module’s functions fall into a few broad classes: * Functions that create a new iterator based on an existing iterator. * Functions for treating an iterator’s elements as function arguments. * Functions for selecting portions of an iterator’s output. * A function for grouping an iterator’s output. Creating new iterators ---------------------- "itertools.count(start, step)" returns an infinite stream of evenly spaced values. You can optionally supply the starting number, which defaults to 0, and the interval between numbers, which defaults to 1: itertools.count() => 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... itertools.count(10) => 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... itertools.count(10, 5) => 10, 15, 20, 25, 30, 35, 40, 45, 50, 55, ... "itertools.cycle(iter)" saves a copy of the contents of a provided iterable and returns a new iterator that returns its elements from first to last. The new iterator will repeat these elements infinitely. itertools.cycle([1, 2, 3, 4, 5]) => 1, 2, 3, 4, 5, 1, 2, 3, 4, 5, ... "itertools.repeat(elem, [n])" returns the provided element *n* times, or returns the element endlessly if *n* is not provided. itertools.repeat('abc') => abc, abc, abc, abc, abc, abc, abc, abc, abc, abc, ... itertools.repeat('abc', 5) => abc, abc, abc, abc, abc "itertools.chain(iterA, iterB, ...)" takes an arbitrary number of iterables as input, and returns all the elements of the first iterator, then all the elements of the second, and so on, until all of the iterables have been exhausted. itertools.chain(['a', 'b', 'c'], (1, 2, 3)) => a, b, c, 1, 2, 3 "itertools.islice(iter, [start], stop, [step])" returns a stream that’s a slice of the iterator. With a single *stop* argument, it will return the first *stop* elements. If you supply a starting index, you’ll get *stop-start* elements, and if you supply a value for *step*, elements will be skipped accordingly. Unlike Python’s string and list slicing, you can’t use negative values for *start*, *stop*, or *step*. itertools.islice(range(10), 8) => 0, 1, 2, 3, 4, 5, 6, 7 itertools.islice(range(10), 2, 8) => 2, 3, 4, 5, 6, 7 itertools.islice(range(10), 2, 8, 2) => 2, 4, 6 "itertools.tee(iter, [n])" replicates an iterator; it returns *n* independent iterators that will all return the contents of the source iterator. If you don’t supply a value for *n*, the default is 2. Replicating iterators requires saving some of the contents of the source iterator, so this can consume significant memory if the iterator is large and one of the new iterators is consumed more than the others. itertools.tee( itertools.count() ) => iterA, iterB where iterA -> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... and iterB -> 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, ... Calling functions on elements ----------------------------- The "operator" module contains a set of functions corresponding to Python’s operators. Some examples are "operator.add(a, b)" (adds two values), "operator.ne(a, b)" (same as "a != b"), and "operator.attrgetter('id')" (returns a callable that fetches the ".id" attribute). "itertools.starmap(func, iter)" assumes that the iterable will return a stream of tuples, and calls *func* using these tuples as the arguments: itertools.starmap(os.path.join, [('/bin', 'python'), ('/usr', 'bin', 'java'), ('/usr', 'bin', 'perl'), ('/usr', 'bin', 'ruby')]) => /bin/python, /usr/bin/java, /usr/bin/perl, /usr/bin/ruby Selecting elements ------------------ Another group of functions chooses a subset of an iterator’s elements based on a predicate. "itertools.filterfalse(predicate, iter)" is the opposite of "filter()", returning all elements for which the predicate returns false: itertools.filterfalse(is_even, itertools.count()) => 1, 3, 5, 7, 9, 11, 13, 15, ... "itertools.takewhile(predicate, iter)" returns elements for as long as the predicate returns true. Once the predicate returns false, the iterator will signal the end of its results. def less_than_10(x): return x < 10 itertools.takewhile(less_than_10, itertools.count()) => 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 itertools.takewhile(is_even, itertools.count()) => 0 "itertools.dropwhile(predicate, iter)" discards elements while the predicate returns true, and then returns the rest of the iterable’s results. itertools.dropwhile(less_than_10, itertools.count()) => 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, ... itertools.dropwhile(is_even, itertools.count()) => 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, ... "itertools.compress(data, selectors)" takes two iterators and returns only those elements of *data* for which the corresponding element of *selectors* is true, stopping whenever either one is exhausted: itertools.compress([1, 2, 3, 4, 5], [True, True, False, False, True]) => 1, 2, 5 Combinatoric functions ---------------------- The "itertools.combinations(iterable, r)" returns an iterator giving all possible *r*-tuple combinations of the elements contained in *iterable*. itertools.combinations([1, 2, 3, 4, 5], 2) => (1, 2), (1, 3), (1, 4), (1, 5), (2, 3), (2, 4), (2, 5), (3, 4), (3, 5), (4, 5) itertools.combinations([1, 2, 3, 4, 5], 3) => (1, 2, 3), (1, 2, 4), (1, 2, 5), (1, 3, 4), (1, 3, 5), (1, 4, 5), (2, 3, 4), (2, 3, 5), (2, 4, 5), (3, 4, 5) The elements within each tuple remain in the same order as *iterable* returned them. For example, the number 1 is always before 2, 3, 4, or 5 in the examples above. A similar function, "itertools.permutations(iterable, r=None)", removes this constraint on the order, returning all possible arrangements of length *r*: itertools.permutations([1, 2, 3, 4, 5], 2) => (1, 2), (1, 3), (1, 4), (1, 5), (2, 1), (2, 3), (2, 4), (2, 5), (3, 1), (3, 2), (3, 4), (3, 5), (4, 1), (4, 2), (4, 3), (4, 5), (5, 1), (5, 2), (5, 3), (5, 4) itertools.permutations([1, 2, 3, 4, 5]) => (1, 2, 3, 4, 5), (1, 2, 3, 5, 4), (1, 2, 4, 3, 5), ... (5, 4, 3, 2, 1) If you don’t supply a value for *r* the length of the iterable is used, meaning that all the elements are permuted. Note that these functions produce all of the possible combinations by position and don’t require that the contents of *iterable* are unique: itertools.permutations('aba', 3) => ('a', 'b', 'a'), ('a', 'a', 'b'), ('b', 'a', 'a'), ('b', 'a', 'a'), ('a', 'a', 'b'), ('a', 'b', 'a') The identical tuple "('a', 'a', 'b')" occurs twice, but the two ‘a’ strings came from different positions. The "itertools.combinations_with_replacement(iterable, r)" function relaxes a different constraint: elements can be repeated within a single tuple. Conceptually an element is selected for the first position of each tuple and then is replaced before the second element is selected. itertools.combinations_with_replacement([1, 2, 3, 4, 5], 2) => (1, 1), (1, 2), (1, 3), (1, 4), (1, 5), (2, 2), (2, 3), (2, 4), (2, 5), (3, 3), (3, 4), (3, 5), (4, 4), (4, 5), (5, 5) Grouping elements ----------------- The last function I’ll discuss, "itertools.groupby(iter, key_func=None)", is the most complicated. "key_func(elem)" is a function that can compute a key value for each element returned by the iterable. If you don’t supply a key function, the key is simply each element itself. "groupby()" collects all the consecutive elements from the underlying iterable that have the same key value, and returns a stream of 2-tuples containing a key value and an iterator for the elements with that key. city_list = [('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL'), ('Anchorage', 'AK'), ('Nome', 'AK'), ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ'), ... ] def get_state(city_state): return city_state[1] itertools.groupby(city_list, get_state) => ('AL', iterator-1), ('AK', iterator-2), ('AZ', iterator-3), ... where iterator-1 => ('Decatur', 'AL'), ('Huntsville', 'AL'), ('Selma', 'AL') iterator-2 => ('Anchorage', 'AK'), ('Nome', 'AK') iterator-3 => ('Flagstaff', 'AZ'), ('Phoenix', 'AZ'), ('Tucson', 'AZ') "groupby()" assumes that the underlying iterable’s contents will already be sorted based on the key. Note that the returned iterators also use the underlying iterable, so you have to consume the results of iterator-1 before requesting iterator-2 and its corresponding key. The functools module ==================== The "functools" module in Python 2.5 contains some higher-order functions. A **higher-order function** takes one or more functions as input and returns a new function. The most useful tool in this module is the "functools.partial()" function. For programs written in a functional style, you’ll sometimes want to construct variants of existing functions that have some of the parameters filled in. Consider a Python function "f(a, b, c)"; you may wish to create a new function "g(b, c)" that’s equivalent to "f(1, b, c)"; you’re filling in a value for one of "f()"’s parameters. This is called “partial function application”. The constructor for "partial()" takes the arguments "(function, arg1, arg2, ..., kwarg1=value1, kwarg2=value2)". The resulting object is callable, so you can just call it to invoke "function" with the filled-in arguments. Here’s a small but realistic example: import functools def log(message, subsystem): """Write the contents of 'message' to the specified subsystem.""" print('%s: %s' % (subsystem, message)) ... server_log = functools.partial(log, subsystem='server') server_log('Unable to open socket') "functools.reduce(func, iter, [initial_value])" cumulatively performs an operation on all the iterable’s elements and, therefore, can’t be applied to infinite iterables. *func* must be a function that takes two elements and returns a single value. "functools.reduce()" takes the first two elements A and B returned by the iterator and calculates "func(A, B)". It then requests the third element, C, calculates "func(func(A, B), C)", combines this result with the fourth element returned, and continues until the iterable is exhausted. If the iterable returns no values at all, a "TypeError" exception is raised. If the initial value is supplied, it’s used as a starting point and "func(initial_value, A)" is the first calculation. >>> import operator, functools >>> functools.reduce(operator.concat, ['A', 'BB', 'C']) 'ABBC' >>> functools.reduce(operator.concat, []) Traceback (most recent call last): ... TypeError: reduce() of empty sequence with no initial value >>> functools.reduce(operator.mul, [1, 2, 3], 1) 6 >>> functools.reduce(operator.mul, [], 1) 1 If you use "operator.add()" with "functools.reduce()", you’ll add up all the elements of the iterable. This case is so common that there’s a special built-in called "sum()" to compute it: >>> import functools, operator >>> functools.reduce(operator.add, [1, 2, 3, 4], 0) 10 >>> sum([1, 2, 3, 4]) 10 >>> sum([]) 0 For many uses of "functools.reduce()", though, it can be clearer to just write the obvious "for" loop: import functools # Instead of: product = functools.reduce(operator.mul, [1, 2, 3], 1) # You can write: product = 1 for i in [1, 2, 3]: product *= i A related function is "itertools.accumulate(iterable, func=operator.add)". It performs the same calculation, but instead of returning only the final result, "accumulate()" returns an iterator that also yields each partial result: itertools.accumulate([1, 2, 3, 4, 5]) => 1, 3, 6, 10, 15 itertools.accumulate([1, 2, 3, 4, 5], operator.mul) => 1, 2, 6, 24, 120 The operator module ------------------- The "operator" module was mentioned earlier. It contains a set of functions corresponding to Python’s operators. These functions are often useful in functional-style code because they save you from writing trivial functions that perform a single operation. Some of the functions in this module are: * Math operations: "add()", "sub()", "mul()", "floordiv()", "abs()", … * Logical operations: "not_()", "truth()". * Bitwise operations: "and_()", "or_()", "invert()". * Comparisons: "eq()", "ne()", "lt()", "le()", "gt()", and "ge()". * Object identity: "is_()", "is_not()". Consult the operator module’s documentation for a complete list. Small functions and the lambda expression ========================================= When writing functional-style programs, you’ll often need little functions that act as predicates or that combine elements in some way. If there’s a Python built-in or a module function that’s suitable, you don’t need to define a new function at all: stripped_lines = [line.strip() for line in lines] existing_files = filter(os.path.exists, file_list) If the function you need doesn’t exist, you need to write it. One way to write small functions is to use the "lambda" expression. "lambda" takes a number of parameters and an expression combining these parameters, and creates an anonymous function that returns the value of the expression: adder = lambda x, y: x+y print_assign = lambda name, value: name + '=' + str(value) An alternative is to just use the "def" statement and define a function in the usual way: def adder(x, y): return x + y def print_assign(name, value): return name + '=' + str(value) Which alternative is preferable? That’s a style question; my usual course is to avoid using "lambda". One reason for my preference is that "lambda" is quite limited in the functions it can define. The result has to be computable as a single expression, which means you can’t have multiway "if... elif... else" comparisons or "try... except" statements. If you try to do too much in a "lambda" statement, you’ll end up with an overly complicated expression that’s hard to read. Quick, what’s the following code doing? import functools total = functools.reduce(lambda a, b: (0, a[1] + b[1]), items)[1] You can figure it out, but it takes time to disentangle the expression to figure out what’s going on. Using a short nested "def" statements makes things a little bit better: import functools def combine(a, b): return 0, a[1] + b[1] total = functools.reduce(combine, items)[1] But it would be best of all if I had simply used a "for" loop: total = 0 for a, b in items: total += b Or the "sum()" built-in and a generator expression: total = sum(b for a, b in items) Many uses of "functools.reduce()" are clearer when written as "for" loops. Fredrik Lundh once suggested the following set of rules for refactoring uses of "lambda": 1. Write a lambda function. 2. Write a comment explaining what the heck that lambda does. 3. Study the comment for a while, and think of a name that captures the essence of the comment. 4. Convert the lambda to a def statement, using that name. 5. Remove the comment. I really like these rules, but you’re free to disagree about whether this lambda-free style is better. Revision History and Acknowledgements ===================================== The author would like to thank the following people for offering suggestions, corrections and assistance with various drafts of this article: Ian Bicking, Nick Coghlan, Nick Efford, Raymond Hettinger, Jim Jewett, Mike Krell, Leandro Lameiro, Jussi Salmela, Collin Winter, Blake Winton. Version 0.1: posted June 30 2006. Version 0.11: posted July 1 2006. Typo fixes. Version 0.2: posted July 10 2006. Merged genexp and listcomp sections into one. Typo fixes. Version 0.21: Added more references suggested on the tutor mailing list. Version 0.30: Adds a section on the "functional" module written by Collin Winter; adds short section on the operator module; a few other edits. References ========== General ------- **Structure and Interpretation of Computer Programs**, by Harold Abelson and Gerald Jay Sussman with Julie Sussman. Full text at https://mitpress.mit.edu/sicp/. In this classic textbook of computer science, chapters 2 and 3 discuss the use of sequences and streams to organize the data flow inside a program. The book uses Scheme for its examples, but many of the design approaches described in these chapters are applicable to functional-style Python code. http://www.defmacro.org/ramblings/fp.html: A general introduction to functional programming that uses Java examples and has a lengthy historical introduction. https://en.wikipedia.org/wiki/Functional_programming: General Wikipedia entry describing functional programming. https://en.wikipedia.org/wiki/Coroutine: Entry for coroutines. https://en.wikipedia.org/wiki/Currying: Entry for the concept of currying. Python-specific --------------- http://gnosis.cx/TPiP/: The first chapter of David Mertz’s book *Text Processing in Python* discusses functional programming for text processing, in the section titled “Utilizing Higher-Order Functions in Text Processing”. Mertz also wrote a 3-part series of articles on functional programming for IBM’s DeveloperWorks site; see part 1, part 2, and part 3, Python documentation -------------------- Documentation for the "itertools" module. Documentation for the "functools" module. Documentation for the "operator" module. **PEP 289**: “Generator Expressions” **PEP 342**: “Coroutines via Enhanced Generators” describes the new generator features in Python 2.5. Python HOWTOs ************* Python HOWTOs are documents that cover a single, specific topic, and attempt to cover it fairly completely. Modelled on the Linux Documentation Project’s HOWTO collection, this collection is an effort to foster documentation that’s more detailed than the Python Library Reference. Currently, the HOWTOs are: * Porting Python 2 Code to Python 3 * Porting Extension Modules to Python 3 * Curses Programming with Python * Descriptor HowTo Guide * Functional Programming HOWTO * Logging HOWTO * Logging Cookbook * Regular Expression HOWTO * Socket Programming HOWTO * Sorting HOW TO * Unicode HOWTO * HOWTO Fetch Internet Resources Using The urllib Package * Argparse Tutorial * An introduction to the ipaddress module * Argument Clinic How-To * Instrumenting CPython with DTrace and SystemTap Instrumenting CPython with DTrace and SystemTap *********************************************** author: David Malcolm author: Łukasz Langa DTrace and SystemTap are monitoring tools, each providing a way to inspect what the processes on a computer system are doing. They both use domain-specific languages allowing a user to write scripts which: * filter which processes are to be observed * gather data from the processes of interest * generate reports on the data As of Python 3.6, CPython can be built with embedded “markers”, also known as “probes”, that can be observed by a DTrace or SystemTap script, making it easier to monitor what the CPython processes on a system are doing. **CPython implementation detail:** DTrace markers are implementation details of the CPython interpreter. No guarantees are made about probe compatibility between versions of CPython. DTrace scripts can stop working or work incorrectly without warning when changing CPython versions. Enabling the static markers =========================== macOS comes with built-in support for DTrace. On Linux, in order to build CPython with the embedded markers for SystemTap, the SystemTap development tools must be installed. On a Linux machine, this can be done via: $ yum install systemtap-sdt-devel or: $ sudo apt-get install systemtap-sdt-dev CPython must then be configured "--with-dtrace": checking for --with-dtrace... yes On macOS, you can list available DTrace probes by running a Python process in the background and listing all probes made available by the Python provider: $ python3.6 -q & $ sudo dtrace -l -P python$! # or: dtrace -l -m python3.6 ID PROVIDER MODULE FUNCTION NAME 29564 python18035 python3.6 _PyEval_EvalFrameDefault function-entry 29565 python18035 python3.6 dtrace_function_entry function-entry 29566 python18035 python3.6 _PyEval_EvalFrameDefault function-return 29567 python18035 python3.6 dtrace_function_return function-return 29568 python18035 python3.6 collect gc-done 29569 python18035 python3.6 collect gc-start 29570 python18035 python3.6 _PyEval_EvalFrameDefault line 29571 python18035 python3.6 maybe_dtrace_line line On Linux, you can verify if the SystemTap static markers are present in the built binary by seeing if it contains a “.note.stapsdt” section. $ readelf -S ./python | grep .note.stapsdt [30] .note.stapsdt NOTE 0000000000000000 00308d78 If you’ve built Python as a shared library (with –enable-shared), you need to look instead within the shared library. For example: $ readelf -S libpython3.3dm.so.1.0 | grep .note.stapsdt [29] .note.stapsdt NOTE 0000000000000000 00365b68 Sufficiently modern readelf can print the metadata: $ readelf -n ./python Displaying notes found at file offset 0x00000254 with length 0x00000020: Owner Data size Description GNU 0x00000010 NT_GNU_ABI_TAG (ABI version tag) OS: Linux, ABI: 2.6.32 Displaying notes found at file offset 0x00000274 with length 0x00000024: Owner Data size Description GNU 0x00000014 NT_GNU_BUILD_ID (unique build ID bitstring) Build ID: df924a2b08a7e89f6e11251d4602022977af2670 Displaying notes found at file offset 0x002d6c30 with length 0x00000144: Owner Data size Description stapsdt 0x00000031 NT_STAPSDT (SystemTap probe descriptors) Provider: python Name: gc__start Location: 0x00000000004371c3, Base: 0x0000000000630ce2, Semaphore: 0x00000000008d6bf6 Arguments: -4@%ebx stapsdt 0x00000030 NT_STAPSDT (SystemTap probe descriptors) Provider: python Name: gc__done Location: 0x00000000004374e1, Base: 0x0000000000630ce2, Semaphore: 0x00000000008d6bf8 Arguments: -8@%rax stapsdt 0x00000045 NT_STAPSDT (SystemTap probe descriptors) Provider: python Name: function__entry Location: 0x000000000053db6c, Base: 0x0000000000630ce2, Semaphore: 0x00000000008d6be8 Arguments: 8@%rbp 8@%r12 -4@%eax stapsdt 0x00000046 NT_STAPSDT (SystemTap probe descriptors) Provider: python Name: function__return Location: 0x000000000053dba8, Base: 0x0000000000630ce2, Semaphore: 0x00000000008d6bea Arguments: 8@%rbp 8@%r12 -4@%eax The above metadata contains information for SystemTap describing how it can patch strategically-placed machine code instructions to enable the tracing hooks used by a SystemTap script. Static DTrace probes ==================== The following example DTrace script can be used to show the call/return hierarchy of a Python script, only tracing within the invocation of a function called “start”. In other words, import-time function invocations are not going to be listed: self int indent; python$target:::function-entry /copyinstr(arg1) == "start"/ { self->trace = 1; } python$target:::function-entry /self->trace/ { printf("%d\t%*s:", timestamp, 15, probename); printf("%*s", self->indent, ""); printf("%s:%s:%d\n", basename(copyinstr(arg0)), copyinstr(arg1), arg2); self->indent++; } python$target:::function-return /self->trace/ { self->indent--; printf("%d\t%*s:", timestamp, 15, probename); printf("%*s", self->indent, ""); printf("%s:%s:%d\n", basename(copyinstr(arg0)), copyinstr(arg1), arg2); } python$target:::function-return /copyinstr(arg1) == "start"/ { self->trace = 0; } It can be invoked like this: $ sudo dtrace -q -s call_stack.d -c "python3.6 script.py" The output looks like this: 156641360502280 function-entry:call_stack.py:start:23 156641360518804 function-entry: call_stack.py:function_1:1 156641360532797 function-entry: call_stack.py:function_3:9 156641360546807 function-return: call_stack.py:function_3:10 156641360563367 function-return: call_stack.py:function_1:2 156641360578365 function-entry: call_stack.py:function_2:5 156641360591757 function-entry: call_stack.py:function_1:1 156641360605556 function-entry: call_stack.py:function_3:9 156641360617482 function-return: call_stack.py:function_3:10 156641360629814 function-return: call_stack.py:function_1:2 156641360642285 function-return: call_stack.py:function_2:6 156641360656770 function-entry: call_stack.py:function_3:9 156641360669707 function-return: call_stack.py:function_3:10 156641360687853 function-entry: call_stack.py:function_4:13 156641360700719 function-return: call_stack.py:function_4:14 156641360719640 function-entry: call_stack.py:function_5:18 156641360732567 function-return: call_stack.py:function_5:21 156641360747370 function-return:call_stack.py:start:28 Static SystemTap markers ======================== The low-level way to use the SystemTap integration is to use the static markers directly. This requires you to explicitly state the binary file containing them. For example, this SystemTap script can be used to show the call/return hierarchy of a Python script: probe process("python").mark("function__entry") { filename = user_string($arg1); funcname = user_string($arg2); lineno = $arg3; printf("%s => %s in %s:%d\\n", thread_indent(1), funcname, filename, lineno); } probe process("python").mark("function__return") { filename = user_string($arg1); funcname = user_string($arg2); lineno = $arg3; printf("%s <= %s in %s:%d\\n", thread_indent(-1), funcname, filename, lineno); } It can be invoked like this: $ stap \ show-call-hierarchy.stp \ -c "./python test.py" The output looks like this: 11408 python(8274): => __contains__ in Lib/_abcoll.py:362 11414 python(8274): => __getitem__ in Lib/os.py:425 11418 python(8274): => encode in Lib/os.py:490 11424 python(8274): <= encode in Lib/os.py:493 11428 python(8274): <= __getitem__ in Lib/os.py:426 11433 python(8274): <= __contains__ in Lib/_abcoll.py:366 where the columns are: * time in microseconds since start of script * name of executable * PID of process and the remainder indicates the call/return hierarchy as the script executes. For a *–enable-shared* build of CPython, the markers are contained within the libpython shared library, and the probe’s dotted path needs to reflect this. For example, this line from the above example: probe process("python").mark("function__entry") { should instead read: probe process("python").library("libpython3.6dm.so.1.0").mark("function__entry") { (assuming a debug build of CPython 3.6) Available static markers ======================== function__entry(str filename, str funcname, int lineno) This marker indicates that execution of a Python function has begun. It is only triggered for pure-Python (bytecode) functions. The filename, function name, and line number are provided back to the tracing script as positional arguments, which must be accessed using "$arg1", "$arg2", "$arg3": * "$arg1" : "(const char *)" filename, accessible using "user_string($arg1)" * "$arg2" : "(const char *)" function name, accessible using "user_string($arg2)" * "$arg3" : "int" line number function__return(str filename, str funcname, int lineno) This marker is the converse of "function__entry()", and indicates that execution of a Python function has ended (either via "return", or via an exception). It is only triggered for pure-Python (bytecode) functions. The arguments are the same as for "function__entry()" line(str filename, str funcname, int lineno) This marker indicates a Python line is about to be executed. It is the equivalent of line-by-line tracing with a Python profiler. It is not triggered within C functions. The arguments are the same as for "function__entry()". gc__start(int generation) Fires when the Python interpreter starts a garbage collection cycle. "arg0" is the generation to scan, like "gc.collect()". gc__done(long collected) Fires when the Python interpreter finishes a garbage collection cycle. "arg0" is the number of collected objects. import__find__load__start(str modulename) Fires before "importlib" attempts to find and load the module. "arg0" is the module name. New in version 3.7. import__find__load__done(str modulename, int found) Fires after "importlib"’s find_and_load function is called. "arg0" is the module name, "arg1" indicates if module was successfully loaded. New in version 3.7. audit(str event, void *tuple) Fires when "sys.audit()" or "PySys_Audit()" is called. "arg0" is the event name as C string, "arg1" is a "PyObject" pointer to a tuple object. New in version 3.8. SystemTap Tapsets ================= The higher-level way to use the SystemTap integration is to use a “tapset”: SystemTap’s equivalent of a library, which hides some of the lower-level details of the static markers. Here is a tapset file, based on a non-shared build of CPython: /* Provide a higher-level wrapping around the function__entry and function__return markers: \*/ probe python.function.entry = process("python").mark("function__entry") { filename = user_string($arg1); funcname = user_string($arg2); lineno = $arg3; frameptr = $arg4 } probe python.function.return = process("python").mark("function__return") { filename = user_string($arg1); funcname = user_string($arg2); lineno = $arg3; frameptr = $arg4 } If this file is installed in SystemTap’s tapset directory (e.g. "/usr/share/systemtap/tapset"), then these additional probepoints become available: python.function.entry(str filename, str funcname, int lineno, frameptr) This probe point indicates that execution of a Python function has begun. It is only triggered for pure-Python (bytecode) functions. python.function.return(str filename, str funcname, int lineno, frameptr) This probe point is the converse of "python.function.return", and indicates that execution of a Python function has ended (either via "return", or via an exception). It is only triggered for pure- Python (bytecode) functions. Examples ======== This SystemTap script uses the tapset above to more cleanly implement the example given above of tracing the Python function-call hierarchy, without needing to directly name the static markers: probe python.function.entry { printf("%s => %s in %s:%d\n", thread_indent(1), funcname, filename, lineno); } probe python.function.return { printf("%s <= %s in %s:%d\n", thread_indent(-1), funcname, filename, lineno); } The following script uses the tapset above to provide a top-like view of all running CPython code, showing the top 20 most frequently- entered bytecode frames, each second, across the whole system: global fn_calls; probe python.function.entry { fn_calls[pid(), filename, funcname, lineno] += 1; } probe timer.ms(1000) { printf("\033[2J\033[1;1H") /* clear screen \*/ printf("%6s %80s %6s %30s %6s\n", "PID", "FILENAME", "LINE", "FUNCTION", "CALLS") foreach ([pid, filename, funcname, lineno] in fn_calls- limit 20) { printf("%6d %80s %6d %30s %6d\n", pid, filename, lineno, funcname, fn_calls[pid, filename, funcname, lineno]); } delete fn_calls; } An introduction to the ipaddress module *************************************** author: Peter Moody author: Nick Coghlan Overview ^^^^^^^^ This document aims to provide a gentle introduction to the "ipaddress" module. It is aimed primarily at users that aren’t already familiar with IP networking terminology, but may also be useful to network engineers wanting an overview of how "ipaddress" represents IP network addressing concepts. Creating Address/Network/Interface objects ========================================== Since "ipaddress" is a module for inspecting and manipulating IP addresses, the first thing you’ll want to do is create some objects. You can use "ipaddress" to create objects from strings and integers. A Note on IP Versions --------------------- For readers that aren’t particularly familiar with IP addressing, it’s important to know that the Internet Protocol is currently in the process of moving from version 4 of the protocol to version 6. This transition is occurring largely because version 4 of the protocol doesn’t provide enough addresses to handle the needs of the whole world, especially given the increasing number of devices with direct connections to the internet. Explaining the details of the differences between the two versions of the protocol is beyond the scope of this introduction, but readers need to at least be aware that these two versions exist, and it will sometimes be necessary to force the use of one version or the other. IP Host Addresses ----------------- Addresses, often referred to as “host addresses” are the most basic unit when working with IP addressing. The simplest way to create addresses is to use the "ipaddress.ip_address()" factory function, which automatically determines whether to create an IPv4 or IPv6 address based on the passed in value: >>> ipaddress.ip_address('192.0.2.1') IPv4Address('192.0.2.1') >>> ipaddress.ip_address('2001:DB8::1') IPv6Address('2001:db8::1') Addresses can also be created directly from integers. Values that will fit within 32 bits are assumed to be IPv4 addresses: >>> ipaddress.ip_address(3221225985) IPv4Address('192.0.2.1') >>> ipaddress.ip_address(42540766411282592856903984951653826561) IPv6Address('2001:db8::1') To force the use of IPv4 or IPv6 addresses, the relevant classes can be invoked directly. This is particularly useful to force creation of IPv6 addresses for small integers: >>> ipaddress.ip_address(1) IPv4Address('0.0.0.1') >>> ipaddress.IPv4Address(1) IPv4Address('0.0.0.1') >>> ipaddress.IPv6Address(1) IPv6Address('::1') Defining Networks ----------------- Host addresses are usually grouped together into IP networks, so "ipaddress" provides a way to create, inspect and manipulate network definitions. IP network objects are constructed from strings that define the range of host addresses that are part of that network. The simplest form for that information is a “network address/network prefix” pair, where the prefix defines the number of leading bits that are compared to determine whether or not an address is part of the network and the network address defines the expected value of those bits. As for addresses, a factory function is provided that determines the correct IP version automatically: >>> ipaddress.ip_network('192.0.2.0/24') IPv4Network('192.0.2.0/24') >>> ipaddress.ip_network('2001:db8::0/96') IPv6Network('2001:db8::/96') Network objects cannot have any host bits set. The practical effect of this is that "192.0.2.1/24" does not describe a network. Such definitions are referred to as interface objects since the ip- on-a-network notation is commonly used to describe network interfaces of a computer on a given network and are described further in the next section. By default, attempting to create a network object with host bits set will result in "ValueError" being raised. To request that the additional bits instead be coerced to zero, the flag "strict=False" can be passed to the constructor: >>> ipaddress.ip_network('192.0.2.1/24') Traceback (most recent call last): ... ValueError: 192.0.2.1/24 has host bits set >>> ipaddress.ip_network('192.0.2.1/24', strict=False) IPv4Network('192.0.2.0/24') While the string form offers significantly more flexibility, networks can also be defined with integers, just like host addresses. In this case, the network is considered to contain only the single address identified by the integer, so the network prefix includes the entire network address: >>> ipaddress.ip_network(3221225984) IPv4Network('192.0.2.0/32') >>> ipaddress.ip_network(42540766411282592856903984951653826560) IPv6Network('2001:db8::/128') As with addresses, creation of a particular kind of network can be forced by calling the class constructor directly instead of using the factory function. Host Interfaces --------------- As mentioned just above, if you need to describe an address on a particular network, neither the address nor the network classes are sufficient. Notation like "192.0.2.1/24" is commonly used by network engineers and the people who write tools for firewalls and routers as shorthand for “the host "192.0.2.1" on the network "192.0.2.0/24"”, Accordingly, "ipaddress" provides a set of hybrid classes that associate an address with a particular network. The interface for creation is identical to that for defining network objects, except that the address portion isn’t constrained to being a network address. >>> ipaddress.ip_interface('192.0.2.1/24') IPv4Interface('192.0.2.1/24') >>> ipaddress.ip_interface('2001:db8::1/96') IPv6Interface('2001:db8::1/96') Integer inputs are accepted (as with networks), and use of a particular IP version can be forced by calling the relevant constructor directly. Inspecting Address/Network/Interface Objects ============================================ You’ve gone to the trouble of creating an IPv(4|6)(Address|Network|Interface) object, so you probably want to get information about it. "ipaddress" tries to make doing this easy and intuitive. Extracting the IP version: >>> addr4 = ipaddress.ip_address('192.0.2.1') >>> addr6 = ipaddress.ip_address('2001:db8::1') >>> addr6.version 6 >>> addr4.version 4 Obtaining the network from an interface: >>> host4 = ipaddress.ip_interface('192.0.2.1/24') >>> host4.network IPv4Network('192.0.2.0/24') >>> host6 = ipaddress.ip_interface('2001:db8::1/96') >>> host6.network IPv6Network('2001:db8::/96') Finding out how many individual addresses are in a network: >>> net4 = ipaddress.ip_network('192.0.2.0/24') >>> net4.num_addresses 256 >>> net6 = ipaddress.ip_network('2001:db8::0/96') >>> net6.num_addresses 4294967296 Iterating through the “usable” addresses on a network: >>> net4 = ipaddress.ip_network('192.0.2.0/24') >>> for x in net4.hosts(): ... print(x) 192.0.2.1 192.0.2.2 192.0.2.3 192.0.2.4 ... 192.0.2.252 192.0.2.253 192.0.2.254 Obtaining the netmask (i.e. set bits corresponding to the network prefix) or the hostmask (any bits that are not part of the netmask): >>> net4 = ipaddress.ip_network('192.0.2.0/24') >>> net4.netmask IPv4Address('255.255.255.0') >>> net4.hostmask IPv4Address('0.0.0.255') >>> net6 = ipaddress.ip_network('2001:db8::0/96') >>> net6.netmask IPv6Address('ffff:ffff:ffff:ffff:ffff:ffff::') >>> net6.hostmask IPv6Address('::ffff:ffff') Exploding or compressing the address: >>> addr6.exploded '2001:0db8:0000:0000:0000:0000:0000:0001' >>> addr6.compressed '2001:db8::1' >>> net6.exploded '2001:0db8:0000:0000:0000:0000:0000:0000/96' >>> net6.compressed '2001:db8::/96' While IPv4 doesn’t support explosion or compression, the associated objects still provide the relevant properties so that version neutral code can easily ensure the most concise or most verbose form is used for IPv6 addresses while still correctly handling IPv4 addresses. Networks as lists of Addresses ============================== It’s sometimes useful to treat networks as lists. This means it is possible to index them like this: >>> net4[1] IPv4Address('192.0.2.1') >>> net4[-1] IPv4Address('192.0.2.255') >>> net6[1] IPv6Address('2001:db8::1') >>> net6[-1] IPv6Address('2001:db8::ffff:ffff') It also means that network objects lend themselves to using the list membership test syntax like this: if address in network: # do something Containment testing is done efficiently based on the network prefix: >>> addr4 = ipaddress.ip_address('192.0.2.1') >>> addr4 in ipaddress.ip_network('192.0.2.0/24') True >>> addr4 in ipaddress.ip_network('192.0.3.0/24') False Comparisons =========== "ipaddress" provides some simple, hopefully intuitive ways to compare objects, where it makes sense: >>> ipaddress.ip_address('192.0.2.1') < ipaddress.ip_address('192.0.2.2') True A "TypeError" exception is raised if you try to compare objects of different versions or different types. Using IP Addresses with other modules ===================================== Other modules that use IP addresses (such as "socket") usually won’t accept objects from this module directly. Instead, they must be coerced to an integer or string that the other module will accept: >>> addr4 = ipaddress.ip_address('192.0.2.1') >>> str(addr4) '192.0.2.1' >>> int(addr4) 3221225985 Getting more detail when instance creation fails ================================================ When creating address/network/interface objects using the version- agnostic factory functions, any errors will be reported as "ValueError" with a generic error message that simply says the passed in value was not recognized as an object of that type. The lack of a specific error is because it’s necessary to know whether the value is *supposed* to be IPv4 or IPv6 in order to provide more detail on why it has been rejected. To support use cases where it is useful to have access to this additional detail, the individual class constructors actually raise the "ValueError" subclasses "ipaddress.AddressValueError" and "ipaddress.NetmaskValueError" to indicate exactly which part of the definition failed to parse correctly. The error messages are significantly more detailed when using the class constructors directly. For example: >>> ipaddress.ip_address("192.168.0.256") Traceback (most recent call last): ... ValueError: '192.168.0.256' does not appear to be an IPv4 or IPv6 address >>> ipaddress.IPv4Address("192.168.0.256") Traceback (most recent call last): ... ipaddress.AddressValueError: Octet 256 (> 255) not permitted in '192.168.0.256' >>> ipaddress.ip_network("192.168.0.1/64") Traceback (most recent call last): ... ValueError: '192.168.0.1/64' does not appear to be an IPv4 or IPv6 network >>> ipaddress.IPv4Network("192.168.0.1/64") Traceback (most recent call last): ... ipaddress.NetmaskValueError: '64' is not a valid netmask However, both of the module specific exceptions have "ValueError" as their parent class, so if you’re not concerned with the particular type of error, you can still write code like the following: try: network = ipaddress.IPv4Network(address) except ValueError: print('address/netmask is invalid for IPv4:', address) Logging Cookbook **************** Author: Vinay Sajip This page contains a number of recipes related to logging, which have been found useful in the past. Using logging in multiple modules ================================= Multiple calls to "logging.getLogger('someLogger')" return a reference to the same logger object. This is true not only within the same module, but also across modules as long as it is in the same Python interpreter process. It is true for references to the same object; additionally, application code can define and configure a parent logger in one module and create (but not configure) a child logger in a separate module, and all logger calls to the child will pass up to the parent. Here is a main module: import logging import auxiliary_module # create logger with 'spam_application' logger = logging.getLogger('spam_application') logger.setLevel(logging.DEBUG) # create file handler which logs even debug messages fh = logging.FileHandler('spam.log') fh.setLevel(logging.DEBUG) # create console handler with a higher log level ch = logging.StreamHandler() ch.setLevel(logging.ERROR) # create formatter and add it to the handlers formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') fh.setFormatter(formatter) ch.setFormatter(formatter) # add the handlers to the logger logger.addHandler(fh) logger.addHandler(ch) logger.info('creating an instance of auxiliary_module.Auxiliary') a = auxiliary_module.Auxiliary() logger.info('created an instance of auxiliary_module.Auxiliary') logger.info('calling auxiliary_module.Auxiliary.do_something') a.do_something() logger.info('finished auxiliary_module.Auxiliary.do_something') logger.info('calling auxiliary_module.some_function()') auxiliary_module.some_function() logger.info('done with auxiliary_module.some_function()') Here is the auxiliary module: import logging # create logger module_logger = logging.getLogger('spam_application.auxiliary') class Auxiliary: def __init__(self): self.logger = logging.getLogger('spam_application.auxiliary.Auxiliary') self.logger.info('creating an instance of Auxiliary') def do_something(self): self.logger.info('doing something') a = 1 + 1 self.logger.info('done doing something') def some_function(): module_logger.info('received a call to "some_function"') The output looks like this: 2005-03-23 23:47:11,663 - spam_application - INFO - creating an instance of auxiliary_module.Auxiliary 2005-03-23 23:47:11,665 - spam_application.auxiliary.Auxiliary - INFO - creating an instance of Auxiliary 2005-03-23 23:47:11,665 - spam_application - INFO - created an instance of auxiliary_module.Auxiliary 2005-03-23 23:47:11,668 - spam_application - INFO - calling auxiliary_module.Auxiliary.do_something 2005-03-23 23:47:11,668 - spam_application.auxiliary.Auxiliary - INFO - doing something 2005-03-23 23:47:11,669 - spam_application.auxiliary.Auxiliary - INFO - done doing something 2005-03-23 23:47:11,670 - spam_application - INFO - finished auxiliary_module.Auxiliary.do_something 2005-03-23 23:47:11,671 - spam_application - INFO - calling auxiliary_module.some_function() 2005-03-23 23:47:11,672 - spam_application.auxiliary - INFO - received a call to 'some_function' 2005-03-23 23:47:11,673 - spam_application - INFO - done with auxiliary_module.some_function() Logging from multiple threads ============================= Logging from multiple threads requires no special effort. The following example shows logging from the main (initial) thread and another thread: import logging import threading import time def worker(arg): while not arg['stop']: logging.debug('Hi from myfunc') time.sleep(0.5) def main(): logging.basicConfig(level=logging.DEBUG, format='%(relativeCreated)6d %(threadName)s %(message)s') info = {'stop': False} thread = threading.Thread(target=worker, args=(info,)) thread.start() while True: try: logging.debug('Hello from main') time.sleep(0.75) except KeyboardInterrupt: info['stop'] = True break thread.join() if __name__ == '__main__': main() When run, the script should print something like the following: 0 Thread-1 Hi from myfunc 3 MainThread Hello from main 505 Thread-1 Hi from myfunc 755 MainThread Hello from main 1007 Thread-1 Hi from myfunc 1507 MainThread Hello from main 1508 Thread-1 Hi from myfunc 2010 Thread-1 Hi from myfunc 2258 MainThread Hello from main 2512 Thread-1 Hi from myfunc 3009 MainThread Hello from main 3013 Thread-1 Hi from myfunc 3515 Thread-1 Hi from myfunc 3761 MainThread Hello from main 4017 Thread-1 Hi from myfunc 4513 MainThread Hello from main 4518 Thread-1 Hi from myfunc This shows the logging output interspersed as one might expect. This approach works for more threads than shown here, of course. Multiple handlers and formatters ================================ Loggers are plain Python objects. The "addHandler()" method has no minimum or maximum quota for the number of handlers you may add. Sometimes it will be beneficial for an application to log all messages of all severities to a text file while simultaneously logging errors or above to the console. To set this up, simply configure the appropriate handlers. The logging calls in the application code will remain unchanged. Here is a slight modification to the previous simple module-based configuration example: import logging logger = logging.getLogger('simple_example') logger.setLevel(logging.DEBUG) # create file handler which logs even debug messages fh = logging.FileHandler('spam.log') fh.setLevel(logging.DEBUG) # create console handler with a higher log level ch = logging.StreamHandler() ch.setLevel(logging.ERROR) # create formatter and add it to the handlers formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') ch.setFormatter(formatter) fh.setFormatter(formatter) # add the handlers to logger logger.addHandler(ch) logger.addHandler(fh) # 'application' code logger.debug('debug message') logger.info('info message') logger.warning('warn message') logger.error('error message') logger.critical('critical message') Notice that the ‘application’ code does not care about multiple handlers. All that changed was the addition and configuration of a new handler named *fh*. The ability to create new handlers with higher- or lower-severity filters can be very helpful when writing and testing an application. Instead of using many "print" statements for debugging, use "logger.debug": Unlike the print statements, which you will have to delete or comment out later, the logger.debug statements can remain intact in the source code and remain dormant until you need them again. At that time, the only change that needs to happen is to modify the severity level of the logger and/or handler to debug. Logging to multiple destinations ================================ Let’s say you want to log to console and file with different message formats and in differing circumstances. Say you want to log messages with levels of DEBUG and higher to file, and those messages at level INFO and higher to the console. Let’s also assume that the file should contain timestamps, but the console messages should not. Here’s how you can achieve this: import logging # set up logging to file - see previous section for more details logging.basicConfig(level=logging.DEBUG, format='%(asctime)s %(name)-12s %(levelname)-8s %(message)s', datefmt='%m-%d %H:%M', filename='/temp/myapp.log', filemode='w') # define a Handler which writes INFO messages or higher to the sys.stderr console = logging.StreamHandler() console.setLevel(logging.INFO) # set a format which is simpler for console use formatter = logging.Formatter('%(name)-12s: %(levelname)-8s %(message)s') # tell the handler to use this format console.setFormatter(formatter) # add the handler to the root logger logging.getLogger('').addHandler(console) # Now, we can log to the root logger, or any other logger. First the root... logging.info('Jackdaws love my big sphinx of quartz.') # Now, define a couple of other loggers which might represent areas in your # application: logger1 = logging.getLogger('myapp.area1') logger2 = logging.getLogger('myapp.area2') logger1.debug('Quick zephyrs blow, vexing daft Jim.') logger1.info('How quickly daft jumping zebras vex.') logger2.warning('Jail zesty vixen who grabbed pay from quack.') logger2.error('The five boxing wizards jump quickly.') When you run this, on the console you will see root : INFO Jackdaws love my big sphinx of quartz. myapp.area1 : INFO How quickly daft jumping zebras vex. myapp.area2 : WARNING Jail zesty vixen who grabbed pay from quack. myapp.area2 : ERROR The five boxing wizards jump quickly. and in the file you will see something like 10-22 22:19 root INFO Jackdaws love my big sphinx of quartz. 10-22 22:19 myapp.area1 DEBUG Quick zephyrs blow, vexing daft Jim. 10-22 22:19 myapp.area1 INFO How quickly daft jumping zebras vex. 10-22 22:19 myapp.area2 WARNING Jail zesty vixen who grabbed pay from quack. 10-22 22:19 myapp.area2 ERROR The five boxing wizards jump quickly. As you can see, the DEBUG message only shows up in the file. The other messages are sent to both destinations. This example uses console and file handlers, but you can use any number and combination of handlers you choose. Configuration server example ============================ Here is an example of a module using the logging configuration server: import logging import logging.config import time import os # read initial config file logging.config.fileConfig('logging.conf') # create and start listener on port 9999 t = logging.config.listen(9999) t.start() logger = logging.getLogger('simpleExample') try: # loop through logging calls to see the difference # new configurations make, until Ctrl+C is pressed while True: logger.debug('debug message') logger.info('info message') logger.warning('warn message') logger.error('error message') logger.critical('critical message') time.sleep(5) except KeyboardInterrupt: # cleanup logging.config.stopListening() t.join() And here is a script that takes a filename and sends that file to the server, properly preceded with the binary-encoded length, as the new logging configuration: #!/usr/bin/env python import socket, sys, struct with open(sys.argv[1], 'rb') as f: data_to_send = f.read() HOST = 'localhost' PORT = 9999 s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) print('connecting...') s.connect((HOST, PORT)) print('sending config...') s.send(struct.pack('>L', len(data_to_send))) s.send(data_to_send) s.close() print('complete') Dealing with handlers that block ================================ Sometimes you have to get your logging handlers to do their work without blocking the thread you’re logging from. This is common in Web applications, though of course it also occurs in other scenarios. A common culprit which demonstrates sluggish behaviour is the "SMTPHandler": sending emails can take a long time, for a number of reasons outside the developer’s control (for example, a poorly performing mail or network infrastructure). But almost any network- based handler can block: Even a "SocketHandler" operation may do a DNS query under the hood which is too slow (and this query can be deep in the socket library code, below the Python layer, and outside your control). One solution is to use a two-part approach. For the first part, attach only a "QueueHandler" to those loggers which are accessed from performance-critical threads. They simply write to their queue, which can be sized to a large enough capacity or initialized with no upper bound to their size. The write to the queue will typically be accepted quickly, though you will probably need to catch the "queue.Full" exception as a precaution in your code. If you are a library developer who has performance-critical threads in their code, be sure to document this (together with a suggestion to attach only "QueueHandlers" to your loggers) for the benefit of other developers who will use your code. The second part of the solution is "QueueListener", which has been designed as the counterpart to "QueueHandler". A "QueueListener" is very simple: it’s passed a queue and some handlers, and it fires up an internal thread which listens to its queue for LogRecords sent from "QueueHandlers" (or any other source of "LogRecords", for that matter). The "LogRecords" are removed from the queue and passed to the handlers for processing. The advantage of having a separate "QueueListener" class is that you can use the same instance to service multiple "QueueHandlers". This is more resource-friendly than, say, having threaded versions of the existing handler classes, which would eat up one thread per handler for no particular benefit. An example of using these two classes follows (imports omitted): que = queue.Queue(-1) # no limit on size queue_handler = QueueHandler(que) handler = logging.StreamHandler() listener = QueueListener(que, handler) root = logging.getLogger() root.addHandler(queue_handler) formatter = logging.Formatter('%(threadName)s: %(message)s') handler.setFormatter(formatter) listener.start() # The log output will display the thread which generated # the event (the main thread) rather than the internal # thread which monitors the internal queue. This is what # you want to happen. root.warning('Look out!') listener.stop() which, when run, will produce: MainThread: Look out! Changed in version 3.5: Prior to Python 3.5, the "QueueListener" always passed every message received from the queue to every handler it was initialized with. (This was because it was assumed that level filtering was all done on the other side, where the queue is filled.) From 3.5 onwards, this behaviour can be changed by passing a keyword argument "respect_handler_level=True" to the listener’s constructor. When this is done, the listener compares the level of each message with the handler’s level, and only passes a message to a handler if it’s appropriate to do so. Sending and receiving logging events across a network ===================================================== Let’s say you want to send logging events across a network, and handle them at the receiving end. A simple way of doing this is attaching a "SocketHandler" instance to the root logger at the sending end: import logging, logging.handlers rootLogger = logging.getLogger('') rootLogger.setLevel(logging.DEBUG) socketHandler = logging.handlers.SocketHandler('localhost', logging.handlers.DEFAULT_TCP_LOGGING_PORT) # don't bother with a formatter, since a socket handler sends the event as # an unformatted pickle rootLogger.addHandler(socketHandler) # Now, we can log to the root logger, or any other logger. First the root... logging.info('Jackdaws love my big sphinx of quartz.') # Now, define a couple of other loggers which might represent areas in your # application: logger1 = logging.getLogger('myapp.area1') logger2 = logging.getLogger('myapp.area2') logger1.debug('Quick zephyrs blow, vexing daft Jim.') logger1.info('How quickly daft jumping zebras vex.') logger2.warning('Jail zesty vixen who grabbed pay from quack.') logger2.error('The five boxing wizards jump quickly.') At the receiving end, you can set up a receiver using the "socketserver" module. Here is a basic working example: import pickle import logging import logging.handlers import socketserver import struct class LogRecordStreamHandler(socketserver.StreamRequestHandler): """Handler for a streaming logging request. This basically logs the record using whatever logging policy is configured locally. """ def handle(self): """ Handle multiple requests - each expected to be a 4-byte length, followed by the LogRecord in pickle format. Logs the record according to whatever policy is configured locally. """ while True: chunk = self.connection.recv(4) if len(chunk) < 4: break slen = struct.unpack('>L', chunk)[0] chunk = self.connection.recv(slen) while len(chunk) < slen: chunk = chunk + self.connection.recv(slen - len(chunk)) obj = self.unPickle(chunk) record = logging.makeLogRecord(obj) self.handleLogRecord(record) def unPickle(self, data): return pickle.loads(data) def handleLogRecord(self, record): # if a name is specified, we use the named logger rather than the one # implied by the record. if self.server.logname is not None: name = self.server.logname else: name = record.name logger = logging.getLogger(name) # N.B. EVERY record gets logged. This is because Logger.handle # is normally called AFTER logger-level filtering. If you want # to do filtering, do it at the client end to save wasting # cycles and network bandwidth! logger.handle(record) class LogRecordSocketReceiver(socketserver.ThreadingTCPServer): """ Simple TCP socket-based logging receiver suitable for testing. """ allow_reuse_address = True def __init__(self, host='localhost', port=logging.handlers.DEFAULT_TCP_LOGGING_PORT, handler=LogRecordStreamHandler): socketserver.ThreadingTCPServer.__init__(self, (host, port), handler) self.abort = 0 self.timeout = 1 self.logname = None def serve_until_stopped(self): import select abort = 0 while not abort: rd, wr, ex = select.select([self.socket.fileno()], [], [], self.timeout) if rd: self.handle_request() abort = self.abort def main(): logging.basicConfig( format='%(relativeCreated)5d %(name)-15s %(levelname)-8s %(message)s') tcpserver = LogRecordSocketReceiver() print('About to start TCP server...') tcpserver.serve_until_stopped() if __name__ == '__main__': main() First run the server, and then the client. On the client side, nothing is printed on the console; on the server side, you should see something like: About to start TCP server... 59 root INFO Jackdaws love my big sphinx of quartz. 59 myapp.area1 DEBUG Quick zephyrs blow, vexing daft Jim. 69 myapp.area1 INFO How quickly daft jumping zebras vex. 69 myapp.area2 WARNING Jail zesty vixen who grabbed pay from quack. 69 myapp.area2 ERROR The five boxing wizards jump quickly. Note that there are some security issues with pickle in some scenarios. If these affect you, you can use an alternative serialization scheme by overriding the "makePickle()" method and implementing your alternative there, as well as adapting the above script to use your alternative serialization. Running a logging socket listener in production ----------------------------------------------- To run a logging listener in production, you may need to use a process-management tool such as Supervisor. Here is a Gist which provides the bare-bones files to run the above functionality using Supervisor: you will need to change the */path/to/* parts in the Gist to reflect the actual paths you want to use. Adding contextual information to your logging output ==================================================== Sometimes you want logging output to contain contextual information in addition to the parameters passed to the logging call. For example, in a networked application, it may be desirable to log client-specific information in the log (e.g. remote client’s username, or IP address). Although you could use the *extra* parameter to achieve this, it’s not always convenient to pass the information in this way. While it might be tempting to create "Logger" instances on a per-connection basis, this is not a good idea because these instances are not garbage collected. While this is not a problem in practice, when the number of "Logger" instances is dependent on the level of granularity you want to use in logging an application, it could be hard to manage if the number of "Logger" instances becomes effectively unbounded. Using LoggerAdapters to impart contextual information ----------------------------------------------------- An easy way in which you can pass contextual information to be output along with logging event information is to use the "LoggerAdapter" class. This class is designed to look like a "Logger", so that you can call "debug()", "info()", "warning()", "error()", "exception()", "critical()" and "log()". These methods have the same signatures as their counterparts in "Logger", so you can use the two types of instances interchangeably. When you create an instance of "LoggerAdapter", you pass it a "Logger" instance and a dict-like object which contains your contextual information. When you call one of the logging methods on an instance of "LoggerAdapter", it delegates the call to the underlying instance of "Logger" passed to its constructor, and arranges to pass the contextual information in the delegated call. Here’s a snippet from the code of "LoggerAdapter": def debug(self, msg, /, *args, **kwargs): """ Delegate a debug call to the underlying logger, after adding contextual information from this adapter instance. """ msg, kwargs = self.process(msg, kwargs) self.logger.debug(msg, *args, **kwargs) The "process()" method of "LoggerAdapter" is where the contextual information is added to the logging output. It’s passed the message and keyword arguments of the logging call, and it passes back (potentially) modified versions of these to use in the call to the underlying logger. The default implementation of this method leaves the message alone, but inserts an ‘extra’ key in the keyword argument whose value is the dict-like object passed to the constructor. Of course, if you had passed an ‘extra’ keyword argument in the call to the adapter, it will be silently overwritten. The advantage of using ‘extra’ is that the values in the dict-like object are merged into the "LogRecord" instance’s __dict__, allowing you to use customized strings with your "Formatter" instances which know about the keys of the dict-like object. If you need a different method, e.g. if you want to prepend or append the contextual information to the message string, you just need to subclass "LoggerAdapter" and override "process()" to do what you need. Here is a simple example: class CustomAdapter(logging.LoggerAdapter): """ This example adapter expects the passed in dict-like object to have a 'connid' key, whose value in brackets is prepended to the log message. """ def process(self, msg, kwargs): return '[%s] %s' % (self.extra['connid'], msg), kwargs which you can use like this: logger = logging.getLogger(__name__) adapter = CustomAdapter(logger, {'connid': some_conn_id}) Then any events that you log to the adapter will have the value of "some_conn_id" prepended to the log messages. Using objects other than dicts to pass contextual information ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ You don’t need to pass an actual dict to a "LoggerAdapter" - you could pass an instance of a class which implements "__getitem__" and "__iter__" so that it looks like a dict to logging. This would be useful if you want to generate values dynamically (whereas the values in a dict would be constant). Using Filters to impart contextual information ---------------------------------------------- You can also add contextual information to log output using a user- defined "Filter". "Filter" instances are allowed to modify the "LogRecords" passed to them, including adding additional attributes which can then be output using a suitable format string, or if needed a custom "Formatter". For example in a web application, the request being processed (or at least, the interesting parts of it) can be stored in a threadlocal ("threading.local") variable, and then accessed from a "Filter" to add, say, information from the request - say, the remote IP address and remote user’s username - to the "LogRecord", using the attribute names ‘ip’ and ‘user’ as in the "LoggerAdapter" example above. In that case, the same format string can be used to get similar output to that shown above. Here’s an example script: import logging from random import choice class ContextFilter(logging.Filter): """ This is a filter which injects contextual information into the log. Rather than use actual contextual information, we just use random data in this demo. """ USERS = ['jim', 'fred', 'sheila'] IPS = ['123.231.231.123', '127.0.0.1', '192.168.0.1'] def filter(self, record): record.ip = choice(ContextFilter.IPS) record.user = choice(ContextFilter.USERS) return True if __name__ == '__main__': levels = (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL) logging.basicConfig(level=logging.DEBUG, format='%(asctime)-15s %(name)-5s %(levelname)-8s IP: %(ip)-15s User: %(user)-8s %(message)s') a1 = logging.getLogger('a.b.c') a2 = logging.getLogger('d.e.f') f = ContextFilter() a1.addFilter(f) a2.addFilter(f) a1.debug('A debug message') a1.info('An info message with %s', 'some parameters') for x in range(10): lvl = choice(levels) lvlname = logging.getLevelName(lvl) a2.log(lvl, 'A message at %s level with %d %s', lvlname, 2, 'parameters') which, when run, produces something like: 2010-09-06 22:38:15,292 a.b.c DEBUG IP: 123.231.231.123 User: fred A debug message 2010-09-06 22:38:15,300 a.b.c INFO IP: 192.168.0.1 User: sheila An info message with some parameters 2010-09-06 22:38:15,300 d.e.f CRITICAL IP: 127.0.0.1 User: sheila A message at CRITICAL level with 2 parameters 2010-09-06 22:38:15,300 d.e.f ERROR IP: 127.0.0.1 User: jim A message at ERROR level with 2 parameters 2010-09-06 22:38:15,300 d.e.f DEBUG IP: 127.0.0.1 User: sheila A message at DEBUG level with 2 parameters 2010-09-06 22:38:15,300 d.e.f ERROR IP: 123.231.231.123 User: fred A message at ERROR level with 2 parameters 2010-09-06 22:38:15,300 d.e.f CRITICAL IP: 192.168.0.1 User: jim A message at CRITICAL level with 2 parameters 2010-09-06 22:38:15,300 d.e.f CRITICAL IP: 127.0.0.1 User: sheila A message at CRITICAL level with 2 parameters 2010-09-06 22:38:15,300 d.e.f DEBUG IP: 192.168.0.1 User: jim A message at DEBUG level with 2 parameters 2010-09-06 22:38:15,301 d.e.f ERROR IP: 127.0.0.1 User: sheila A message at ERROR level with 2 parameters 2010-09-06 22:38:15,301 d.e.f DEBUG IP: 123.231.231.123 User: fred A message at DEBUG level with 2 parameters 2010-09-06 22:38:15,301 d.e.f INFO IP: 123.231.231.123 User: fred A message at INFO level with 2 parameters Logging to a single file from multiple processes ================================================ Although logging is thread-safe, and logging to a single file from multiple threads in a single process *is* supported, logging to a single file from *multiple processes* is *not* supported, because there is no standard way to serialize access to a single file across multiple processes in Python. If you need to log to a single file from multiple processes, one way of doing this is to have all the processes log to a "SocketHandler", and have a separate process which implements a socket server which reads from the socket and logs to file. (If you prefer, you can dedicate one thread in one of the existing processes to perform this function.) This section documents this approach in more detail and includes a working socket receiver which can be used as a starting point for you to adapt in your own applications. You could also write your own handler which uses the "Lock" class from the "multiprocessing" module to serialize access to the file from your processes. The existing "FileHandler" and subclasses do not make use of "multiprocessing" at present, though they may do so in the future. Note that at present, the "multiprocessing" module does not provide working lock functionality on all platforms (see https://bugs.python.org/issue3770). Alternatively, you can use a "Queue" and a "QueueHandler" to send all logging events to one of the processes in your multi-process application. The following example script demonstrates how you can do this; in the example a separate listener process listens for events sent by other processes and logs them according to its own logging configuration. Although the example only demonstrates one way of doing it (for example, you may want to use a listener thread rather than a separate listener process – the implementation would be analogous) it does allow for completely different logging configurations for the listener and the other processes in your application, and can be used as the basis for code meeting your own specific requirements: # You'll need these imports in your own code import logging import logging.handlers import multiprocessing # Next two import lines for this demo only from random import choice, random import time # # Because you'll want to define the logging configurations for listener and workers, the # listener and worker process functions take a configurer parameter which is a callable # for configuring logging for that process. These functions are also passed the queue, # which they use for communication. # # In practice, you can configure the listener however you want, but note that in this # simple example, the listener does not apply level or filter logic to received records. # In practice, you would probably want to do this logic in the worker processes, to avoid # sending events which would be filtered out between processes. # # The size of the rotated files is made small so you can see the results easily. def listener_configurer(): root = logging.getLogger() h = logging.handlers.RotatingFileHandler('mptest.log', 'a', 300, 10) f = logging.Formatter('%(asctime)s %(processName)-10s %(name)s %(levelname)-8s %(message)s') h.setFormatter(f) root.addHandler(h) # This is the listener process top-level loop: wait for logging events # (LogRecords)on the queue and handle them, quit when you get a None for a # LogRecord. def listener_process(queue, configurer): configurer() while True: try: record = queue.get() if record is None: # We send this as a sentinel to tell the listener to quit. break logger = logging.getLogger(record.name) logger.handle(record) # No level or filter logic applied - just do it! except Exception: import sys, traceback print('Whoops! Problem:', file=sys.stderr) traceback.print_exc(file=sys.stderr) # Arrays used for random selections in this demo LEVELS = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL] LOGGERS = ['a.b.c', 'd.e.f'] MESSAGES = [ 'Random message #1', 'Random message #2', 'Random message #3', ] # The worker configuration is done at the start of the worker process run. # Note that on Windows you can't rely on fork semantics, so each process # will run the logging configuration code when it starts. def worker_configurer(queue): h = logging.handlers.QueueHandler(queue) # Just the one handler needed root = logging.getLogger() root.addHandler(h) # send all messages, for demo; no other level or filter logic applied. root.setLevel(logging.DEBUG) # This is the worker process top-level loop, which just logs ten events with # random intervening delays before terminating. # The print messages are just so you know it's doing something! def worker_process(queue, configurer): configurer(queue) name = multiprocessing.current_process().name print('Worker started: %s' % name) for i in range(10): time.sleep(random()) logger = logging.getLogger(choice(LOGGERS)) level = choice(LEVELS) message = choice(MESSAGES) logger.log(level, message) print('Worker finished: %s' % name) # Here's where the demo gets orchestrated. Create the queue, create and start # the listener, create ten workers and start them, wait for them to finish, # then send a None to the queue to tell the listener to finish. def main(): queue = multiprocessing.Queue(-1) listener = multiprocessing.Process(target=listener_process, args=(queue, listener_configurer)) listener.start() workers = [] for i in range(10): worker = multiprocessing.Process(target=worker_process, args=(queue, worker_configurer)) workers.append(worker) worker.start() for w in workers: w.join() queue.put_nowait(None) listener.join() if __name__ == '__main__': main() A variant of the above script keeps the logging in the main process, in a separate thread: import logging import logging.config import logging.handlers from multiprocessing import Process, Queue import random import threading import time def logger_thread(q): while True: record = q.get() if record is None: break logger = logging.getLogger(record.name) logger.handle(record) def worker_process(q): qh = logging.handlers.QueueHandler(q) root = logging.getLogger() root.setLevel(logging.DEBUG) root.addHandler(qh) levels = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL] loggers = ['foo', 'foo.bar', 'foo.bar.baz', 'spam', 'spam.ham', 'spam.ham.eggs'] for i in range(100): lvl = random.choice(levels) logger = logging.getLogger(random.choice(loggers)) logger.log(lvl, 'Message no. %d', i) if __name__ == '__main__': q = Queue() d = { 'version': 1, 'formatters': { 'detailed': { 'class': 'logging.Formatter', 'format': '%(asctime)s %(name)-15s %(levelname)-8s %(processName)-10s %(message)s' } }, 'handlers': { 'console': { 'class': 'logging.StreamHandler', 'level': 'INFO', }, 'file': { 'class': 'logging.FileHandler', 'filename': 'mplog.log', 'mode': 'w', 'formatter': 'detailed', }, 'foofile': { 'class': 'logging.FileHandler', 'filename': 'mplog-foo.log', 'mode': 'w', 'formatter': 'detailed', }, 'errors': { 'class': 'logging.FileHandler', 'filename': 'mplog-errors.log', 'mode': 'w', 'level': 'ERROR', 'formatter': 'detailed', }, }, 'loggers': { 'foo': { 'handlers': ['foofile'] } }, 'root': { 'level': 'DEBUG', 'handlers': ['console', 'file', 'errors'] }, } workers = [] for i in range(5): wp = Process(target=worker_process, name='worker %d' % (i + 1), args=(q,)) workers.append(wp) wp.start() logging.config.dictConfig(d) lp = threading.Thread(target=logger_thread, args=(q,)) lp.start() # At this point, the main process could do some useful work of its own # Once it's done that, it can wait for the workers to terminate... for wp in workers: wp.join() # And now tell the logging thread to finish up, too q.put(None) lp.join() This variant shows how you can e.g. apply configuration for particular loggers - e.g. the "foo" logger has a special handler which stores all events in the "foo" subsystem in a file "mplog-foo.log". This will be used by the logging machinery in the main process (even though the logging events are generated in the worker processes) to direct the messages to the appropriate destinations. Using concurrent.futures.ProcessPoolExecutor -------------------------------------------- If you want to use "concurrent.futures.ProcessPoolExecutor" to start your worker processes, you need to create the queue slightly differently. Instead of queue = multiprocessing.Queue(-1) you should use queue = multiprocessing.Manager().Queue(-1) # also works with the examples above and you can then replace the worker creation from this: workers = [] for i in range(10): worker = multiprocessing.Process(target=worker_process, args=(queue, worker_configurer)) workers.append(worker) worker.start() for w in workers: w.join() to this (remembering to first import "concurrent.futures"): with concurrent.futures.ProcessPoolExecutor(max_workers=10) as executor: for i in range(10): executor.submit(worker_process, queue, worker_configurer) Deploying Web applications using Gunicorn and uWSGI --------------------------------------------------- When deploying Web applications using Gunicorn or uWSGI (or similar), multiple worker processes are created to handle client requests. In such environments, avoid creating file-based handlers directly in your web application. Instead, use a "SocketHandler" to log from the web application to a listener in a separate process. This can be set up using a process management tool such as Supervisor - see Running a logging socket listener in production for more details. Using file rotation =================== Sometimes you want to let a log file grow to a certain size, then open a new file and log to that. You may want to keep a certain number of these files, and when that many files have been created, rotate the files so that the number of files and the size of the files both remain bounded. For this usage pattern, the logging package provides a "RotatingFileHandler": import glob import logging import logging.handlers LOG_FILENAME = 'logging_rotatingfile_example.out' # Set up a specific logger with our desired output level my_logger = logging.getLogger('MyLogger') my_logger.setLevel(logging.DEBUG) # Add the log message handler to the logger handler = logging.handlers.RotatingFileHandler( LOG_FILENAME, maxBytes=20, backupCount=5) my_logger.addHandler(handler) # Log some messages for i in range(20): my_logger.debug('i = %d' % i) # See what files are created logfiles = glob.glob('%s*' % LOG_FILENAME) for filename in logfiles: print(filename) The result should be 6 separate files, each with part of the log history for the application: logging_rotatingfile_example.out logging_rotatingfile_example.out.1 logging_rotatingfile_example.out.2 logging_rotatingfile_example.out.3 logging_rotatingfile_example.out.4 logging_rotatingfile_example.out.5 The most current file is always "logging_rotatingfile_example.out", and each time it reaches the size limit it is renamed with the suffix ".1". Each of the existing backup files is renamed to increment the suffix (".1" becomes ".2", etc.) and the ".6" file is erased. Obviously this example sets the log length much too small as an extreme example. You would want to set *maxBytes* to an appropriate value. Use of alternative formatting styles ==================================== When logging was added to the Python standard library, the only way of formatting messages with variable content was to use the %-formatting method. Since then, Python has gained two new formatting approaches: "string.Template" (added in Python 2.4) and "str.format()" (added in Python 2.6). Logging (as of 3.2) provides improved support for these two additional formatting styles. The "Formatter" class been enhanced to take an additional, optional keyword parameter named "style". This defaults to "'%'", but other possible values are "'{'" and "'$'", which correspond to the other two formatting styles. Backwards compatibility is maintained by default (as you would expect), but by explicitly specifying a style parameter, you get the ability to specify format strings which work with "str.format()" or "string.Template". Here’s an example console session to show the possibilities: >>> import logging >>> root = logging.getLogger() >>> root.setLevel(logging.DEBUG) >>> handler = logging.StreamHandler() >>> bf = logging.Formatter('{asctime} {name} {levelname:8s} {message}', ... style='{') >>> handler.setFormatter(bf) >>> root.addHandler(handler) >>> logger = logging.getLogger('foo.bar') >>> logger.debug('This is a DEBUG message') 2010-10-28 15:11:55,341 foo.bar DEBUG This is a DEBUG message >>> logger.critical('This is a CRITICAL message') 2010-10-28 15:12:11,526 foo.bar CRITICAL This is a CRITICAL message >>> df = logging.Formatter('$asctime $name ${levelname} $message', ... style='$') >>> handler.setFormatter(df) >>> logger.debug('This is a DEBUG message') 2010-10-28 15:13:06,924 foo.bar DEBUG This is a DEBUG message >>> logger.critical('This is a CRITICAL message') 2010-10-28 15:13:11,494 foo.bar CRITICAL This is a CRITICAL message >>> Note that the formatting of logging messages for final output to logs is completely independent of how an individual logging message is constructed. That can still use %-formatting, as shown here: >>> logger.error('This is an%s %s %s', 'other,', 'ERROR,', 'message') 2010-10-28 15:19:29,833 foo.bar ERROR This is another, ERROR, message >>> Logging calls ("logger.debug()", "logger.info()" etc.) only take positional parameters for the actual logging message itself, with keyword parameters used only for determining options for how to handle the actual logging call (e.g. the "exc_info" keyword parameter to indicate that traceback information should be logged, or the "extra" keyword parameter to indicate additional contextual information to be added to the log). So you cannot directly make logging calls using "str.format()" or "string.Template" syntax, because internally the logging package uses %-formatting to merge the format string and the variable arguments. There would be no changing this while preserving backward compatibility, since all logging calls which are out there in existing code will be using %-format strings. There is, however, a way that you can use {}- and $- formatting to construct your individual log messages. Recall that for a message you can use an arbitrary object as a message format string, and that the logging package will call "str()" on that object to get the actual format string. Consider the following two classes: class BraceMessage: def __init__(self, fmt, /, *args, **kwargs): self.fmt = fmt self.args = args self.kwargs = kwargs def __str__(self): return self.fmt.format(*self.args, **self.kwargs) class DollarMessage: def __init__(self, fmt, /, **kwargs): self.fmt = fmt self.kwargs = kwargs def __str__(self): from string import Template return Template(self.fmt).substitute(**self.kwargs) Either of these can be used in place of a format string, to allow {}- or $-formatting to be used to build the actual “message” part which appears in the formatted log output in place of “%(message)s” or “{message}” or “$message”. It’s a little unwieldy to use the class names whenever you want to log something, but it’s quite palatable if you use an alias such as __ (double underscore — not to be confused with _, the single underscore used as a synonym/alias for "gettext.gettext()" or its brethren). The above classes are not included in Python, though they’re easy enough to copy and paste into your own code. They can be used as follows (assuming that they’re declared in a module called "wherever"): >>> from wherever import BraceMessage as __ >>> print(__('Message with {0} {name}', 2, name='placeholders')) Message with 2 placeholders >>> class Point: pass ... >>> p = Point() >>> p.x = 0.5 >>> p.y = 0.5 >>> print(__('Message with coordinates: ({point.x:.2f}, {point.y:.2f})', ... point=p)) Message with coordinates: (0.50, 0.50) >>> from wherever import DollarMessage as __ >>> print(__('Message with $num $what', num=2, what='placeholders')) Message with 2 placeholders >>> While the above examples use "print()" to show how the formatting works, you would of course use "logger.debug()" or similar to actually log using this approach. One thing to note is that you pay no significant performance penalty with this approach: the actual formatting happens not when you make the logging call, but when (and if) the logged message is actually about to be output to a log by a handler. So the only slightly unusual thing which might trip you up is that the parentheses go around the format string and the arguments, not just the format string. That’s because the __ notation is just syntax sugar for a constructor call to one of the XXXMessage classes. If you prefer, you can use a "LoggerAdapter" to achieve a similar effect to the above, as in the following example: import logging class Message: def __init__(self, fmt, args): self.fmt = fmt self.args = args def __str__(self): return self.fmt.format(*self.args) class StyleAdapter(logging.LoggerAdapter): def __init__(self, logger, extra=None): super().__init__(logger, extra or {}) def log(self, level, msg, /, *args, **kwargs): if self.isEnabledFor(level): msg, kwargs = self.process(msg, kwargs) self.logger._log(level, Message(msg, args), (), **kwargs) logger = StyleAdapter(logging.getLogger(__name__)) def main(): logger.debug('Hello, {}', 'world!') if __name__ == '__main__': logging.basicConfig(level=logging.DEBUG) main() The above script should log the message "Hello, world!" when run with Python 3.2 or later. Customizing "LogRecord" ======================= Every logging event is represented by a "LogRecord" instance. When an event is logged and not filtered out by a logger’s level, a "LogRecord" is created, populated with information about the event and then passed to the handlers for that logger (and its ancestors, up to and including the logger where further propagation up the hierarchy is disabled). Before Python 3.2, there were only two places where this creation was done: * "Logger.makeRecord()", which is called in the normal process of logging an event. This invoked "LogRecord" directly to create an instance. * "makeLogRecord()", which is called with a dictionary containing attributes to be added to the LogRecord. This is typically invoked when a suitable dictionary has been received over the network (e.g. in pickle form via a "SocketHandler", or in JSON form via an "HTTPHandler"). This has usually meant that if you need to do anything special with a "LogRecord", you’ve had to do one of the following. * Create your own "Logger" subclass, which overrides "Logger.makeRecord()", and set it using "setLoggerClass()" before any loggers that you care about are instantiated. * Add a "Filter" to a logger or handler, which does the necessary special manipulation you need when its "filter()" method is called. The first approach would be a little unwieldy in the scenario where (say) several different libraries wanted to do different things. Each would attempt to set its own "Logger" subclass, and the one which did this last would win. The second approach works reasonably well for many cases, but does not allow you to e.g. use a specialized subclass of "LogRecord". Library developers can set a suitable filter on their loggers, but they would have to remember to do this every time they introduced a new logger (which they would do simply by adding new packages or modules and doing logger = logging.getLogger(__name__) at module level). It’s probably one too many things to think about. Developers could also add the filter to a "NullHandler" attached to their top-level logger, but this would not be invoked if an application developer attached a handler to a lower-level library logger — so output from that handler would not reflect the intentions of the library developer. In Python 3.2 and later, "LogRecord" creation is done through a factory, which you can specify. The factory is just a callable you can set with "setLogRecordFactory()", and interrogate with "getLogRecordFactory()". The factory is invoked with the same signature as the "LogRecord" constructor, as "LogRecord" is the default setting for the factory. This approach allows a custom factory to control all aspects of LogRecord creation. For example, you could return a subclass, or just add some additional attributes to the record once created, using a pattern similar to this: old_factory = logging.getLogRecordFactory() def record_factory(*args, **kwargs): record = old_factory(*args, **kwargs) record.custom_attribute = 0xdecafbad return record logging.setLogRecordFactory(record_factory) This pattern allows different libraries to chain factories together, and as long as they don’t overwrite each other’s attributes or unintentionally overwrite the attributes provided as standard, there should be no surprises. However, it should be borne in mind that each link in the chain adds run-time overhead to all logging operations, and the technique should only be used when the use of a "Filter" does not provide the desired result. Subclassing QueueHandler - a ZeroMQ example =========================================== You can use a "QueueHandler" subclass to send messages to other kinds of queues, for example a ZeroMQ ‘publish’ socket. In the example below,the socket is created separately and passed to the handler (as its ‘queue’): import zmq # using pyzmq, the Python binding for ZeroMQ import json # for serializing records portably ctx = zmq.Context() sock = zmq.Socket(ctx, zmq.PUB) # or zmq.PUSH, or other suitable value sock.bind('tcp://*:5556') # or wherever class ZeroMQSocketHandler(QueueHandler): def enqueue(self, record): self.queue.send_json(record.__dict__) handler = ZeroMQSocketHandler(sock) Of course there are other ways of organizing this, for example passing in the data needed by the handler to create the socket: class ZeroMQSocketHandler(QueueHandler): def __init__(self, uri, socktype=zmq.PUB, ctx=None): self.ctx = ctx or zmq.Context() socket = zmq.Socket(self.ctx, socktype) socket.bind(uri) super().__init__(socket) def enqueue(self, record): self.queue.send_json(record.__dict__) def close(self): self.queue.close() Subclassing QueueListener - a ZeroMQ example ============================================ You can also subclass "QueueListener" to get messages from other kinds of queues, for example a ZeroMQ ‘subscribe’ socket. Here’s an example: class ZeroMQSocketListener(QueueListener): def __init__(self, uri, /, *handlers, **kwargs): self.ctx = kwargs.get('ctx') or zmq.Context() socket = zmq.Socket(self.ctx, zmq.SUB) socket.setsockopt_string(zmq.SUBSCRIBE, '') # subscribe to everything socket.connect(uri) super().__init__(socket, *handlers, **kwargs) def dequeue(self): msg = self.queue.recv_json() return logging.makeLogRecord(msg) See also: Module "logging" API reference for the logging module. Module "logging.config" Configuration API for the logging module. Module "logging.handlers" Useful handlers included with the logging module. A basic logging tutorial A more advanced logging tutorial An example dictionary-based configuration ========================================= Below is an example of a logging configuration dictionary - it’s taken from the documentation on the Django project. This dictionary is passed to "dictConfig()" to put the configuration into effect: LOGGING = { 'version': 1, 'disable_existing_loggers': True, 'formatters': { 'verbose': { 'format': '%(levelname)s %(asctime)s %(module)s %(process)d %(thread)d %(message)s' }, 'simple': { 'format': '%(levelname)s %(message)s' }, }, 'filters': { 'special': { '()': 'project.logging.SpecialFilter', 'foo': 'bar', } }, 'handlers': { 'null': { 'level':'DEBUG', 'class':'django.utils.log.NullHandler', }, 'console':{ 'level':'DEBUG', 'class':'logging.StreamHandler', 'formatter': 'simple' }, 'mail_admins': { 'level': 'ERROR', 'class': 'django.utils.log.AdminEmailHandler', 'filters': ['special'] } }, 'loggers': { 'django': { 'handlers':['null'], 'propagate': True, 'level':'INFO', }, 'django.request': { 'handlers': ['mail_admins'], 'level': 'ERROR', 'propagate': False, }, 'myproject.custom': { 'handlers': ['console', 'mail_admins'], 'level': 'INFO', 'filters': ['special'] } } } For more information about this configuration, you can see the relevant section of the Django documentation. Using a rotator and namer to customize log rotation processing ============================================================== An example of how you can define a namer and rotator is given in the following snippet, which shows zlib-based compression of the log file: def namer(name): return name + ".gz" def rotator(source, dest): with open(source, "rb") as sf: data = sf.read() compressed = zlib.compress(data, 9) with open(dest, "wb") as df: df.write(compressed) os.remove(source) rh = logging.handlers.RotatingFileHandler(...) rh.rotator = rotator rh.namer = namer These are not “true” .gz files, as they are bare compressed data, with no “container” such as you’d find in an actual gzip file. This snippet is just for illustration purposes. A more elaborate multiprocessing example ======================================== The following working example shows how logging can be used with multiprocessing using configuration files. The configurations are fairly simple, but serve to illustrate how more complex ones could be implemented in a real multiprocessing scenario. In the example, the main process spawns a listener process and some worker processes. Each of the main process, the listener and the workers have three separate configurations (the workers all share the same configuration). We can see logging in the main process, how the workers log to a QueueHandler and how the listener implements a QueueListener and a more complex logging configuration, and arranges to dispatch events received via the queue to the handlers specified in the configuration. Note that these configurations are purely illustrative, but you should be able to adapt this example to your own scenario. Here’s the script - the docstrings and the comments hopefully explain how it works: import logging import logging.config import logging.handlers from multiprocessing import Process, Queue, Event, current_process import os import random import time class MyHandler: """ A simple handler for logging events. It runs in the listener process and dispatches events to loggers based on the name in the received record, which then get dispatched, by the logging system, to the handlers configured for those loggers. """ def handle(self, record): if record.name == "root": logger = logging.getLogger() else: logger = logging.getLogger(record.name) if logger.isEnabledFor(record.levelno): # The process name is transformed just to show that it's the listener # doing the logging to files and console record.processName = '%s (for %s)' % (current_process().name, record.processName) logger.handle(record) def listener_process(q, stop_event, config): """ This could be done in the main process, but is just done in a separate process for illustrative purposes. This initialises logging according to the specified configuration, starts the listener and waits for the main process to signal completion via the event. The listener is then stopped, and the process exits. """ logging.config.dictConfig(config) listener = logging.handlers.QueueListener(q, MyHandler()) listener.start() if os.name == 'posix': # On POSIX, the setup logger will have been configured in the # parent process, but should have been disabled following the # dictConfig call. # On Windows, since fork isn't used, the setup logger won't # exist in the child, so it would be created and the message # would appear - hence the "if posix" clause. logger = logging.getLogger('setup') logger.critical('Should not appear, because of disabled logger ...') stop_event.wait() listener.stop() def worker_process(config): """ A number of these are spawned for the purpose of illustration. In practice, they could be a heterogeneous bunch of processes rather than ones which are identical to each other. This initialises logging according to the specified configuration, and logs a hundred messages with random levels to randomly selected loggers. A small sleep is added to allow other processes a chance to run. This is not strictly needed, but it mixes the output from the different processes a bit more than if it's left out. """ logging.config.dictConfig(config) levels = [logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL] loggers = ['foo', 'foo.bar', 'foo.bar.baz', 'spam', 'spam.ham', 'spam.ham.eggs'] if os.name == 'posix': # On POSIX, the setup logger will have been configured in the # parent process, but should have been disabled following the # dictConfig call. # On Windows, since fork isn't used, the setup logger won't # exist in the child, so it would be created and the message # would appear - hence the "if posix" clause. logger = logging.getLogger('setup') logger.critical('Should not appear, because of disabled logger ...') for i in range(100): lvl = random.choice(levels) logger = logging.getLogger(random.choice(loggers)) logger.log(lvl, 'Message no. %d', i) time.sleep(0.01) def main(): q = Queue() # The main process gets a simple configuration which prints to the console. config_initial = { 'version': 1, 'handlers': { 'console': { 'class': 'logging.StreamHandler', 'level': 'INFO' } }, 'root': { 'handlers': ['console'], 'level': 'DEBUG' } } # The worker process configuration is just a QueueHandler attached to the # root logger, which allows all messages to be sent to the queue. # We disable existing loggers to disable the "setup" logger used in the # parent process. This is needed on POSIX because the logger will # be there in the child following a fork(). config_worker = { 'version': 1, 'disable_existing_loggers': True, 'handlers': { 'queue': { 'class': 'logging.handlers.QueueHandler', 'queue': q } }, 'root': { 'handlers': ['queue'], 'level': 'DEBUG' } } # The listener process configuration shows that the full flexibility of # logging configuration is available to dispatch events to handlers however # you want. # We disable existing loggers to disable the "setup" logger used in the # parent process. This is needed on POSIX because the logger will # be there in the child following a fork(). config_listener = { 'version': 1, 'disable_existing_loggers': True, 'formatters': { 'detailed': { 'class': 'logging.Formatter', 'format': '%(asctime)s %(name)-15s %(levelname)-8s %(processName)-10s %(message)s' }, 'simple': { 'class': 'logging.Formatter', 'format': '%(name)-15s %(levelname)-8s %(processName)-10s %(message)s' } }, 'handlers': { 'console': { 'class': 'logging.StreamHandler', 'formatter': 'simple', 'level': 'INFO' }, 'file': { 'class': 'logging.FileHandler', 'filename': 'mplog.log', 'mode': 'w', 'formatter': 'detailed' }, 'foofile': { 'class': 'logging.FileHandler', 'filename': 'mplog-foo.log', 'mode': 'w', 'formatter': 'detailed' }, 'errors': { 'class': 'logging.FileHandler', 'filename': 'mplog-errors.log', 'mode': 'w', 'formatter': 'detailed', 'level': 'ERROR' } }, 'loggers': { 'foo': { 'handlers': ['foofile'] } }, 'root': { 'handlers': ['console', 'file', 'errors'], 'level': 'DEBUG' } } # Log some initial events, just to show that logging in the parent works # normally. logging.config.dictConfig(config_initial) logger = logging.getLogger('setup') logger.info('About to create workers ...') workers = [] for i in range(5): wp = Process(target=worker_process, name='worker %d' % (i + 1), args=(config_worker,)) workers.append(wp) wp.start() logger.info('Started worker: %s', wp.name) logger.info('About to create listener ...') stop_event = Event() lp = Process(target=listener_process, name='listener', args=(q, stop_event, config_listener)) lp.start() logger.info('Started listener') # We now hang around for the workers to finish their work. for wp in workers: wp.join() # Workers all done, listening can now stop. # Logging in the parent still works normally. logger.info('Telling listener to stop ...') stop_event.set() lp.join() logger.info('All done.') if __name__ == '__main__': main() Inserting a BOM into messages sent to a SysLogHandler ===================================================== **RFC 5424** requires that a Unicode message be sent to a syslog daemon as a set of bytes which have the following structure: an optional pure-ASCII component, followed by a UTF-8 Byte Order Mark (BOM), followed by Unicode encoded using UTF-8. (See the **relevant section of the specification**.) In Python 3.1, code was added to "SysLogHandler" to insert a BOM into the message, but unfortunately, it was implemented incorrectly, with the BOM appearing at the beginning of the message and hence not allowing any pure-ASCII component to appear before it. As this behaviour is broken, the incorrect BOM insertion code is being removed from Python 3.2.4 and later. However, it is not being replaced, and if you want to produce **RFC 5424**-compliant messages which include a BOM, an optional pure-ASCII sequence before it and arbitrary Unicode after it, encoded using UTF-8, then you need to do the following: 1. Attach a "Formatter" instance to your "SysLogHandler" instance, with a format string such as: 'ASCII section\ufeffUnicode section' The Unicode code point U+FEFF, when encoded using UTF-8, will be encoded as a UTF-8 BOM – the byte-string "b'\xef\xbb\xbf'". 2. Replace the ASCII section with whatever placeholders you like, but make sure that the data that appears in there after substitution is always ASCII (that way, it will remain unchanged after UTF-8 encoding). 3. Replace the Unicode section with whatever placeholders you like; if the data which appears there after substitution contains characters outside the ASCII range, that’s fine – it will be encoded using UTF-8. The formatted message *will* be encoded using UTF-8 encoding by "SysLogHandler". If you follow the above rules, you should be able to produce **RFC 5424**-compliant messages. If you don’t, logging may not complain, but your messages will not be RFC 5424-compliant, and your syslog daemon may complain. Implementing structured logging =============================== Although most logging messages are intended for reading by humans, and thus not readily machine-parseable, there might be circumstances where you want to output messages in a structured format which *is* capable of being parsed by a program (without needing complex regular expressions to parse the log message). This is straightforward to achieve using the logging package. There are a number of ways in which this could be achieved, but the following is a simple approach which uses JSON to serialise the event in a machine-parseable manner: import json import logging class StructuredMessage: def __init__(self, message, /, **kwargs): self.message = message self.kwargs = kwargs def __str__(self): return '%s >>> %s' % (self.message, json.dumps(self.kwargs)) _ = StructuredMessage # optional, to improve readability logging.basicConfig(level=logging.INFO, format='%(message)s') logging.info(_('message 1', foo='bar', bar='baz', num=123, fnum=123.456)) If the above script is run, it prints: message 1 >>> {"fnum": 123.456, "num": 123, "bar": "baz", "foo": "bar"} Note that the order of items might be different according to the version of Python used. If you need more specialised processing, you can use a custom JSON encoder, as in the following complete example: from __future__ import unicode_literals import json import logging # This next bit is to ensure the script runs unchanged on 2.x and 3.x try: unicode except NameError: unicode = str class Encoder(json.JSONEncoder): def default(self, o): if isinstance(o, set): return tuple(o) elif isinstance(o, unicode): return o.encode('unicode_escape').decode('ascii') return super().default(o) class StructuredMessage: def __init__(self, message, /, **kwargs): self.message = message self.kwargs = kwargs def __str__(self): s = Encoder().encode(self.kwargs) return '%s >>> %s' % (self.message, s) _ = StructuredMessage # optional, to improve readability def main(): logging.basicConfig(level=logging.INFO, format='%(message)s') logging.info(_('message 1', set_value={1, 2, 3}, snowman='\u2603')) if __name__ == '__main__': main() When the above script is run, it prints: message 1 >>> {"snowman": "\u2603", "set_value": [1, 2, 3]} Note that the order of items might be different according to the version of Python used. Customizing handlers with "dictConfig()" ======================================== There are times when you want to customize logging handlers in particular ways, and if you use "dictConfig()" you may be able to do this without subclassing. As an example, consider that you may want to set the ownership of a log file. On POSIX, this is easily done using "shutil.chown()", but the file handlers in the stdlib don’t offer built-in support. You can customize handler creation using a plain function such as: def owned_file_handler(filename, mode='a', encoding=None, owner=None): if owner: if not os.path.exists(filename): open(filename, 'a').close() shutil.chown(filename, *owner) return logging.FileHandler(filename, mode, encoding) You can then specify, in a logging configuration passed to "dictConfig()", that a logging handler be created by calling this function: LOGGING = { 'version': 1, 'disable_existing_loggers': False, 'formatters': { 'default': { 'format': '%(asctime)s %(levelname)s %(name)s %(message)s' }, }, 'handlers': { 'file':{ # The values below are popped from this dictionary and # used to create the handler, set the handler's level and # its formatter. '()': owned_file_handler, 'level':'DEBUG', 'formatter': 'default', # The values below are passed to the handler creator callable # as keyword arguments. 'owner': ['pulse', 'pulse'], 'filename': 'chowntest.log', 'mode': 'w', 'encoding': 'utf-8', }, }, 'root': { 'handlers': ['file'], 'level': 'DEBUG', }, } In this example I am setting the ownership using the "pulse" user and group, just for the purposes of illustration. Putting it together into a working script, "chowntest.py": import logging, logging.config, os, shutil def owned_file_handler(filename, mode='a', encoding=None, owner=None): if owner: if not os.path.exists(filename): open(filename, 'a').close() shutil.chown(filename, *owner) return logging.FileHandler(filename, mode, encoding) LOGGING = { 'version': 1, 'disable_existing_loggers': False, 'formatters': { 'default': { 'format': '%(asctime)s %(levelname)s %(name)s %(message)s' }, }, 'handlers': { 'file':{ # The values below are popped from this dictionary and # used to create the handler, set the handler's level and # its formatter. '()': owned_file_handler, 'level':'DEBUG', 'formatter': 'default', # The values below are passed to the handler creator callable # as keyword arguments. 'owner': ['pulse', 'pulse'], 'filename': 'chowntest.log', 'mode': 'w', 'encoding': 'utf-8', }, }, 'root': { 'handlers': ['file'], 'level': 'DEBUG', }, } logging.config.dictConfig(LOGGING) logger = logging.getLogger('mylogger') logger.debug('A debug message') To run this, you will probably need to run as "root": $ sudo python3.3 chowntest.py $ cat chowntest.log 2013-11-05 09:34:51,128 DEBUG mylogger A debug message $ ls -l chowntest.log -rw-r--r-- 1 pulse pulse 55 2013-11-05 09:34 chowntest.log Note that this example uses Python 3.3 because that’s where "shutil.chown()" makes an appearance. This approach should work with any Python version that supports "dictConfig()" - namely, Python 2.7, 3.2 or later. With pre-3.3 versions, you would need to implement the actual ownership change using e.g. "os.chown()". In practice, the handler-creating function may be in a utility module somewhere in your project. Instead of the line in the configuration: '()': owned_file_handler, you could use e.g.: '()': 'ext://project.util.owned_file_handler', where "project.util" can be replaced with the actual name of the package where the function resides. In the above working script, using "'ext://__main__.owned_file_handler'" should work. Here, the actual callable is resolved by "dictConfig()" from the "ext://" specification. This example hopefully also points the way to how you could implement other types of file change - e.g. setting specific POSIX permission bits - in the same way, using "os.chmod()". Of course, the approach could also be extended to types of handler other than a "FileHandler" - for example, one of the rotating file handlers, or a different type of handler altogether. Using particular formatting styles throughout your application ============================================================== In Python 3.2, the "Formatter" gained a "style" keyword parameter which, while defaulting to "%" for backward compatibility, allowed the specification of "{" or "$" to support the formatting approaches supported by "str.format()" and "string.Template". Note that this governs the formatting of logging messages for final output to logs, and is completely orthogonal to how an individual logging message is constructed. Logging calls ("debug()", "info()" etc.) only take positional parameters for the actual logging message itself, with keyword parameters used only for determining options for how to handle the logging call (e.g. the "exc_info" keyword parameter to indicate that traceback information should be logged, or the "extra" keyword parameter to indicate additional contextual information to be added to the log). So you cannot directly make logging calls using "str.format()" or "string.Template" syntax, because internally the logging package uses %-formatting to merge the format string and the variable arguments. There would no changing this while preserving backward compatibility, since all logging calls which are out there in existing code will be using %-format strings. There have been suggestions to associate format styles with specific loggers, but that approach also runs into backward compatibility problems because any existing code could be using a given logger name and using %-formatting. For logging to work interoperably between any third-party libraries and your code, decisions about formatting need to be made at the level of the individual logging call. This opens up a couple of ways in which alternative formatting styles can be accommodated. Using LogRecord factories ------------------------- In Python 3.2, along with the "Formatter" changes mentioned above, the logging package gained the ability to allow users to set their own "LogRecord" subclasses, using the "setLogRecordFactory()" function. You can use this to set your own subclass of "LogRecord", which does the Right Thing by overriding the "getMessage()" method. The base class implementation of this method is where the "msg % args" formatting happens, and where you can substitute your alternate formatting; however, you should be careful to support all formatting styles and allow %-formatting as the default, to ensure interoperability with other code. Care should also be taken to call "str(self.msg)", just as the base implementation does. Refer to the reference documentation on "setLogRecordFactory()" and "LogRecord" for more information. Using custom message objects ---------------------------- There is another, perhaps simpler way that you can use {}- and $- formatting to construct your individual log messages. You may recall (from Using arbitrary objects as messages) that when logging you can use an arbitrary object as a message format string, and that the logging package will call "str()" on that object to get the actual format string. Consider the following two classes: class BraceMessage: def __init__(self, fmt, /, *args, **kwargs): self.fmt = fmt self.args = args self.kwargs = kwargs def __str__(self): return self.fmt.format(*self.args, **self.kwargs) class DollarMessage: def __init__(self, fmt, /, **kwargs): self.fmt = fmt self.kwargs = kwargs def __str__(self): from string import Template return Template(self.fmt).substitute(**self.kwargs) Either of these can be used in place of a format string, to allow {}- or $-formatting to be used to build the actual “message” part which appears in the formatted log output in place of “%(message)s” or “{message}” or “$message”. If you find it a little unwieldy to use the class names whenever you want to log something, you can make it more palatable if you use an alias such as "M" or "_" for the message (or perhaps "__", if you are using "_" for localization). Examples of this approach are given below. Firstly, formatting with "str.format()": >>> __ = BraceMessage >>> print(__('Message with {0} {1}', 2, 'placeholders')) Message with 2 placeholders >>> class Point: pass ... >>> p = Point() >>> p.x = 0.5 >>> p.y = 0.5 >>> print(__('Message with coordinates: ({point.x:.2f}, {point.y:.2f})', point=p)) Message with coordinates: (0.50, 0.50) Secondly, formatting with "string.Template": >>> __ = DollarMessage >>> print(__('Message with $num $what', num=2, what='placeholders')) Message with 2 placeholders >>> One thing to note is that you pay no significant performance penalty with this approach: the actual formatting happens not when you make the logging call, but when (and if) the logged message is actually about to be output to a log by a handler. So the only slightly unusual thing which might trip you up is that the parentheses go around the format string and the arguments, not just the format string. That’s because the __ notation is just syntax sugar for a constructor call to one of the "XXXMessage" classes shown above. Configuring filters with "dictConfig()" ======================================= You *can* configure filters using "dictConfig()", though it might not be obvious at first glance how to do it (hence this recipe). Since "Filter" is the only filter class included in the standard library, and it is unlikely to cater to many requirements (it’s only there as a base class), you will typically need to define your own "Filter" subclass with an overridden "filter()" method. To do this, specify the "()" key in the configuration dictionary for the filter, specifying a callable which will be used to create the filter (a class is the most obvious, but you can provide any callable which returns a "Filter" instance). Here is a complete example: import logging import logging.config import sys class MyFilter(logging.Filter): def __init__(self, param=None): self.param = param def filter(self, record): if self.param is None: allow = True else: allow = self.param not in record.msg if allow: record.msg = 'changed: ' + record.msg return allow LOGGING = { 'version': 1, 'filters': { 'myfilter': { '()': MyFilter, 'param': 'noshow', } }, 'handlers': { 'console': { 'class': 'logging.StreamHandler', 'filters': ['myfilter'] } }, 'root': { 'level': 'DEBUG', 'handlers': ['console'] }, } if __name__ == '__main__': logging.config.dictConfig(LOGGING) logging.debug('hello') logging.debug('hello - noshow') This example shows how you can pass configuration data to the callable which constructs the instance, in the form of keyword parameters. When run, the above script will print: changed: hello which shows that the filter is working as configured. A couple of extra points to note: * If you can’t refer to the callable directly in the configuration (e.g. if it lives in a different module, and you can’t import it directly where the configuration dictionary is), you can use the form "ext://..." as described in Access to external objects. For example, you could have used the text "'ext://__main__.MyFilter'" instead of "MyFilter" in the above example. * As well as for filters, this technique can also be used to configure custom handlers and formatters. See User-defined objects for more information on how logging supports using user-defined objects in its configuration, and see the other cookbook recipe Customizing handlers with dictConfig() above. Customized exception formatting =============================== There might be times when you want to do customized exception formatting - for argument’s sake, let’s say you want exactly one line per logged event, even when exception information is present. You can do this with a custom formatter class, as shown in the following example: import logging class OneLineExceptionFormatter(logging.Formatter): def formatException(self, exc_info): """ Format an exception so that it prints on a single line. """ result = super().formatException(exc_info) return repr(result) # or format into one line however you want to def format(self, record): s = super().format(record) if record.exc_text: s = s.replace('\n', '') + '|' return s def configure_logging(): fh = logging.FileHandler('output.txt', 'w') f = OneLineExceptionFormatter('%(asctime)s|%(levelname)s|%(message)s|', '%d/%m/%Y %H:%M:%S') fh.setFormatter(f) root = logging.getLogger() root.setLevel(logging.DEBUG) root.addHandler(fh) def main(): configure_logging() logging.info('Sample message') try: x = 1 / 0 except ZeroDivisionError as e: logging.exception('ZeroDivisionError: %s', e) if __name__ == '__main__': main() When run, this produces a file with exactly two lines: 28/01/2015 07:21:23|INFO|Sample message| 28/01/2015 07:21:23|ERROR|ZeroDivisionError: integer division or modulo by zero|'Traceback (most recent call last):\n File "logtest7.py", line 30, in main\n x = 1 / 0\nZeroDivisionError: integer division or modulo by zero'| While the above treatment is simplistic, it points the way to how exception information can be formatted to your liking. The "traceback" module may be helpful for more specialized needs. Speaking logging messages ========================= There might be situations when it is desirable to have logging messages rendered in an audible rather than a visible format. This is easy to do if you have text-to-speech (TTS) functionality available in your system, even if it doesn’t have a Python binding. Most TTS systems have a command line program you can run, and this can be invoked from a handler using "subprocess". It’s assumed here that TTS command line programs won’t expect to interact with users or take a long time to complete, and that the frequency of logged messages will be not so high as to swamp the user with messages, and that it’s acceptable to have the messages spoken one at a time rather than concurrently, The example implementation below waits for one message to be spoken before the next is processed, and this might cause other handlers to be kept waiting. Here is a short example showing the approach, which assumes that the "espeak" TTS package is available: import logging import subprocess import sys class TTSHandler(logging.Handler): def emit(self, record): msg = self.format(record) # Speak slowly in a female English voice cmd = ['espeak', '-s150', '-ven+f3', msg] p = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT) # wait for the program to finish p.communicate() def configure_logging(): h = TTSHandler() root = logging.getLogger() root.addHandler(h) # the default formatter just returns the message root.setLevel(logging.DEBUG) def main(): logging.info('Hello') logging.debug('Goodbye') if __name__ == '__main__': configure_logging() sys.exit(main()) When run, this script should say “Hello” and then “Goodbye” in a female voice. The above approach can, of course, be adapted to other TTS systems and even other systems altogether which can process messages via external programs run from a command line. Buffering logging messages and outputting them conditionally ============================================================ There might be situations where you want to log messages in a temporary area and only output them if a certain condition occurs. For example, you may want to start logging debug events in a function, and if the function completes without errors, you don’t want to clutter the log with the collected debug information, but if there is an error, you want all the debug information to be output as well as the error. Here is an example which shows how you could do this using a decorator for your functions where you want logging to behave this way. It makes use of the "logging.handlers.MemoryHandler", which allows buffering of logged events until some condition occurs, at which point the buffered events are "flushed" - passed to another handler (the "target" handler) for processing. By default, the "MemoryHandler" flushed when its buffer gets filled up or an event whose level is greater than or equal to a specified threshold is seen. You can use this recipe with a more specialised subclass of "MemoryHandler" if you want custom flushing behavior. The example script has a simple function, "foo", which just cycles through all the logging levels, writing to "sys.stderr" to say what level it’s about to log at, and then actually logging a message at that level. You can pass a parameter to "foo" which, if true, will log at ERROR and CRITICAL levels - otherwise, it only logs at DEBUG, INFO and WARNING levels. The script just arranges to decorate "foo" with a decorator which will do the conditional logging that’s required. The decorator takes a logger as a parameter and attaches a memory handler for the duration of the call to the decorated function. The decorator can be additionally parameterised using a target handler, a level at which flushing should occur, and a capacity for the buffer (number of records buffered). These default to a "StreamHandler" which writes to "sys.stderr", "logging.ERROR" and "100" respectively. Here’s the script: import logging from logging.handlers import MemoryHandler import sys logger = logging.getLogger(__name__) logger.addHandler(logging.NullHandler()) def log_if_errors(logger, target_handler=None, flush_level=None, capacity=None): if target_handler is None: target_handler = logging.StreamHandler() if flush_level is None: flush_level = logging.ERROR if capacity is None: capacity = 100 handler = MemoryHandler(capacity, flushLevel=flush_level, target=target_handler) def decorator(fn): def wrapper(*args, **kwargs): logger.addHandler(handler) try: return fn(*args, **kwargs) except Exception: logger.exception('call failed') raise finally: super(MemoryHandler, handler).flush() logger.removeHandler(handler) return wrapper return decorator def write_line(s): sys.stderr.write('%s\n' % s) def foo(fail=False): write_line('about to log at DEBUG ...') logger.debug('Actually logged at DEBUG') write_line('about to log at INFO ...') logger.info('Actually logged at INFO') write_line('about to log at WARNING ...') logger.warning('Actually logged at WARNING') if fail: write_line('about to log at ERROR ...') logger.error('Actually logged at ERROR') write_line('about to log at CRITICAL ...') logger.critical('Actually logged at CRITICAL') return fail decorated_foo = log_if_errors(logger)(foo) if __name__ == '__main__': logger.setLevel(logging.DEBUG) write_line('Calling undecorated foo with False') assert not foo(False) write_line('Calling undecorated foo with True') assert foo(True) write_line('Calling decorated foo with False') assert not decorated_foo(False) write_line('Calling decorated foo with True') assert decorated_foo(True) When this script is run, the following output should be observed: Calling undecorated foo with False about to log at DEBUG ... about to log at INFO ... about to log at WARNING ... Calling undecorated foo with True about to log at DEBUG ... about to log at INFO ... about to log at WARNING ... about to log at ERROR ... about to log at CRITICAL ... Calling decorated foo with False about to log at DEBUG ... about to log at INFO ... about to log at WARNING ... Calling decorated foo with True about to log at DEBUG ... about to log at INFO ... about to log at WARNING ... about to log at ERROR ... Actually logged at DEBUG Actually logged at INFO Actually logged at WARNING Actually logged at ERROR about to log at CRITICAL ... Actually logged at CRITICAL As you can see, actual logging output only occurs when an event is logged whose severity is ERROR or greater, but in that case, any previous events at lower severities are also logged. You can of course use the conventional means of decoration: @log_if_errors(logger) def foo(fail=False): ... Formatting times using UTC (GMT) via configuration ================================================== Sometimes you want to format times using UTC, which can be done using a class such as *UTCFormatter*, shown below: import logging import time class UTCFormatter(logging.Formatter): converter = time.gmtime and you can then use the "UTCFormatter" in your code instead of "Formatter". If you want to do that via configuration, you can use the "dictConfig()" API with an approach illustrated by the following complete example: import logging import logging.config import time class UTCFormatter(logging.Formatter): converter = time.gmtime LOGGING = { 'version': 1, 'disable_existing_loggers': False, 'formatters': { 'utc': { '()': UTCFormatter, 'format': '%(asctime)s %(message)s', }, 'local': { 'format': '%(asctime)s %(message)s', } }, 'handlers': { 'console1': { 'class': 'logging.StreamHandler', 'formatter': 'utc', }, 'console2': { 'class': 'logging.StreamHandler', 'formatter': 'local', }, }, 'root': { 'handlers': ['console1', 'console2'], } } if __name__ == '__main__': logging.config.dictConfig(LOGGING) logging.warning('The local time is %s', time.asctime()) When this script is run, it should print something like: 2015-10-17 12:53:29,501 The local time is Sat Oct 17 13:53:29 2015 2015-10-17 13:53:29,501 The local time is Sat Oct 17 13:53:29 2015 showing how the time is formatted both as local time and UTC, one for each handler. Using a context manager for selective logging ============================================= There are times when it would be useful to temporarily change the logging configuration and revert it back after doing something. For this, a context manager is the most obvious way of saving and restoring the logging context. Here is a simple example of such a context manager, which allows you to optionally change the logging level and add a logging handler purely in the scope of the context manager: import logging import sys class LoggingContext: def __init__(self, logger, level=None, handler=None, close=True): self.logger = logger self.level = level self.handler = handler self.close = close def __enter__(self): if self.level is not None: self.old_level = self.logger.level self.logger.setLevel(self.level) if self.handler: self.logger.addHandler(self.handler) def __exit__(self, et, ev, tb): if self.level is not None: self.logger.setLevel(self.old_level) if self.handler: self.logger.removeHandler(self.handler) if self.handler and self.close: self.handler.close() # implicit return of None => don't swallow exceptions If you specify a level value, the logger’s level is set to that value in the scope of the with block covered by the context manager. If you specify a handler, it is added to the logger on entry to the block and removed on exit from the block. You can also ask the manager to close the handler for you on block exit - you could do this if you don’t need the handler any more. To illustrate how it works, we can add the following block of code to the above: if __name__ == '__main__': logger = logging.getLogger('foo') logger.addHandler(logging.StreamHandler()) logger.setLevel(logging.INFO) logger.info('1. This should appear just once on stderr.') logger.debug('2. This should not appear.') with LoggingContext(logger, level=logging.DEBUG): logger.debug('3. This should appear once on stderr.') logger.debug('4. This should not appear.') h = logging.StreamHandler(sys.stdout) with LoggingContext(logger, level=logging.DEBUG, handler=h, close=True): logger.debug('5. This should appear twice - once on stderr and once on stdout.') logger.info('6. This should appear just once on stderr.') logger.debug('7. This should not appear.') We initially set the logger’s level to "INFO", so message #1 appears and message #2 doesn’t. We then change the level to "DEBUG" temporarily in the following "with" block, and so message #3 appears. After the block exits, the logger’s level is restored to "INFO" and so message #4 doesn’t appear. In the next "with" block, we set the level to "DEBUG" again but also add a handler writing to "sys.stdout". Thus, message #5 appears twice on the console (once via "stderr" and once via "stdout"). After the "with" statement’s completion, the status is as it was before so message #6 appears (like message #1) whereas message #7 doesn’t (just like message #2). If we run the resulting script, the result is as follows: $ python logctx.py 1. This should appear just once on stderr. 3. This should appear once on stderr. 5. This should appear twice - once on stderr and once on stdout. 5. This should appear twice - once on stderr and once on stdout. 6. This should appear just once on stderr. If we run it again, but pipe "stderr" to "/dev/null", we see the following, which is the only message written to "stdout": $ python logctx.py 2>/dev/null 5. This should appear twice - once on stderr and once on stdout. Once again, but piping "stdout" to "/dev/null", we get: $ python logctx.py >/dev/null 1. This should appear just once on stderr. 3. This should appear once on stderr. 5. This should appear twice - once on stderr and once on stdout. 6. This should appear just once on stderr. In this case, the message #5 printed to "stdout" doesn’t appear, as expected. Of course, the approach described here can be generalised, for example to attach logging filters temporarily. Note that the above code works in Python 2 as well as Python 3. A CLI application starter template ================================== Here’s an example which shows how you can: * Use a logging level based on command-line arguments * Dispatch to multiple subcommands in separate files, all logging at the same level in a consistent way * Make use of simple, minimal configuration Suppose we have a command-line application whose job is to stop, start or restart some services. This could be organised for the purposes of illustration as a file "app.py" that is the main script for the application, with individual commands implemented in "start.py", "stop.py" and "restart.py". Suppose further that we want to control the verbosity of the application via a command-line argument, defaulting to "logging.INFO". Here’s one way that "app.py" could be written: import argparse import importlib import logging import os import sys def main(args=None): scriptname = os.path.basename(__file__) parser = argparse.ArgumentParser(scriptname) levels = ('DEBUG', 'INFO', 'WARNING', 'ERROR', 'CRITICAL') parser.add_argument('--log-level', default='INFO', choices=levels) subparsers = parser.add_subparsers(dest='command', help='Available commands:') start_cmd = subparsers.add_parser('start', help='Start a service') start_cmd.add_argument('name', metavar='NAME', help='Name of service to start') stop_cmd = subparsers.add_parser('stop', help='Stop one or more services') stop_cmd.add_argument('names', metavar='NAME', nargs='+', help='Name of service to stop') restart_cmd = subparsers.add_parser('restart', help='Restart one or more services') restart_cmd.add_argument('names', metavar='NAME', nargs='+', help='Name of service to restart') options = parser.parse_args() # the code to dispatch commands could all be in this file. For the purposes # of illustration only, we implement each command in a separate module. try: mod = importlib.import_module(options.command) cmd = getattr(mod, 'command') except (ImportError, AttributeError): print('Unable to find the code for command \'%s\'' % options.command) return 1 # Could get fancy here and load configuration from file or dictionary logging.basicConfig(level=options.log_level, format='%(levelname)s %(name)s %(message)s') cmd(options) if __name__ == '__main__': sys.exit(main()) And the "start", "stop" and "restart" commands can be implemented in separate modules, like so for starting: # start.py import logging logger = logging.getLogger(__name__) def command(options): logger.debug('About to start %s', options.name) # actually do the command processing here ... logger.info('Started the \'%s\' service.', options.name) and thus for stopping: # stop.py import logging logger = logging.getLogger(__name__) def command(options): n = len(options.names) if n == 1: plural = '' services = '\'%s\'' % options.names[0] else: plural = 's' services = ', '.join('\'%s\'' % name for name in options.names) i = services.rfind(', ') services = services[:i] + ' and ' + services[i + 2:] logger.debug('About to stop %s', services) # actually do the command processing here ... logger.info('Stopped the %s service%s.', services, plural) and similarly for restarting: # restart.py import logging logger = logging.getLogger(__name__) def command(options): n = len(options.names) if n == 1: plural = '' services = '\'%s\'' % options.names[0] else: plural = 's' services = ', '.join('\'%s\'' % name for name in options.names) i = services.rfind(', ') services = services[:i] + ' and ' + services[i + 2:] logger.debug('About to restart %s', services) # actually do the command processing here ... logger.info('Restarted the %s service%s.', services, plural) If we run this application with the default log level, we get output like this: $ python app.py start foo INFO start Started the 'foo' service. $ python app.py stop foo bar INFO stop Stopped the 'foo' and 'bar' services. $ python app.py restart foo bar baz INFO restart Restarted the 'foo', 'bar' and 'baz' services. The first word is the logging level, and the second word is the module or package name of the place where the event was logged. If we change the logging level, then we can change the information sent to the log. For example, if we want more information: $ python app.py --log-level DEBUG start foo DEBUG start About to start foo INFO start Started the 'foo' service. $ python app.py --log-level DEBUG stop foo bar DEBUG stop About to stop 'foo' and 'bar' INFO stop Stopped the 'foo' and 'bar' services. $ python app.py --log-level DEBUG restart foo bar baz DEBUG restart About to restart 'foo', 'bar' and 'baz' INFO restart Restarted the 'foo', 'bar' and 'baz' services. And if we want less: $ python app.py --log-level WARNING start foo $ python app.py --log-level WARNING stop foo bar $ python app.py --log-level WARNING restart foo bar baz In this case, the commands don’t print anything to the console, since nothing at "WARNING" level or above is logged by them. A Qt GUI for logging ==================== A question that comes up from time to time is about how to log to a GUI application. The Qt framework is a popular cross-platform UI framework with Python bindings using PySide2 or PyQt5 libraries. The following example shows how to log to a Qt GUI. This introduces a simple "QtHandler" class which takes a callable, which should be a slot in the main thread that does GUI updates. A worker thread is also created to show how you can log to the GUI from both the UI itself (via a button for manual logging) as well as a worker thread doing work in the background (here, just logging messages at random levels with random short delays in between). The worker thread is implemented using Qt’s "QThread" class rather than the "threading" module, as there are circumstances where one has to use "QThread", which offers better integration with other "Qt" components. The code should work with recent releases of either "PySide2" or "PyQt5". You should be able to adapt the approach to earlier versions of Qt. Please refer to the comments in the code snippet for more detailed information. import datetime import logging import random import sys import time # Deal with minor differences between PySide2 and PyQt5 try: from PySide2 import QtCore, QtGui, QtWidgets Signal = QtCore.Signal Slot = QtCore.Slot except ImportError: from PyQt5 import QtCore, QtGui, QtWidgets Signal = QtCore.pyqtSignal Slot = QtCore.pyqtSlot logger = logging.getLogger(__name__) # # Signals need to be contained in a QObject or subclass in order to be correctly # initialized. # class Signaller(QtCore.QObject): signal = Signal(str, logging.LogRecord) # # Output to a Qt GUI is only supposed to happen on the main thread. So, this # handler is designed to take a slot function which is set up to run in the main # thread. In this example, the function takes a string argument which is a # formatted log message, and the log record which generated it. The formatted # string is just a convenience - you could format a string for output any way # you like in the slot function itself. # # You specify the slot function to do whatever GUI updates you want. The handler # doesn't know or care about specific UI elements. # class QtHandler(logging.Handler): def __init__(self, slotfunc, *args, **kwargs): super().__init__(*args, **kwargs) self.signaller = Signaller() self.signaller.signal.connect(slotfunc) def emit(self, record): s = self.format(record) self.signaller.signal.emit(s, record) # # This example uses QThreads, which means that the threads at the Python level # are named something like "Dummy-1". The function below gets the Qt name of the # current thread. # def ctname(): return QtCore.QThread.currentThread().objectName() # # Used to generate random levels for logging. # LEVELS = (logging.DEBUG, logging.INFO, logging.WARNING, logging.ERROR, logging.CRITICAL) # # This worker class represents work that is done in a thread separate to the # main thread. The way the thread is kicked off to do work is via a button press # that connects to a slot in the worker. # # Because the default threadName value in the LogRecord isn't much use, we add # a qThreadName which contains the QThread name as computed above, and pass that # value in an "extra" dictionary which is used to update the LogRecord with the # QThread name. # # This example worker just outputs messages sequentially, interspersed with # random delays of the order of a few seconds. # class Worker(QtCore.QObject): @Slot() def start(self): extra = {'qThreadName': ctname() } logger.debug('Started work', extra=extra) i = 1 # Let the thread run until interrupted. This allows reasonably clean # thread termination. while not QtCore.QThread.currentThread().isInterruptionRequested(): delay = 0.5 + random.random() * 2 time.sleep(delay) level = random.choice(LEVELS) logger.log(level, 'Message after delay of %3.1f: %d', delay, i, extra=extra) i += 1 # # Implement a simple UI for this cookbook example. This contains: # # * A read-only text edit window which holds formatted log messages # * A button to start work and log stuff in a separate thread # * A button to log something from the main thread # * A button to clear the log window # class Window(QtWidgets.QWidget): COLORS = { logging.DEBUG: 'black', logging.INFO: 'blue', logging.WARNING: 'orange', logging.ERROR: 'red', logging.CRITICAL: 'purple', } def __init__(self, app): super().__init__() self.app = app self.textedit = te = QtWidgets.QPlainTextEdit(self) # Set whatever the default monospace font is for the platform f = QtGui.QFont('nosuchfont') f.setStyleHint(f.Monospace) te.setFont(f) te.setReadOnly(True) PB = QtWidgets.QPushButton self.work_button = PB('Start background work', self) self.log_button = PB('Log a message at a random level', self) self.clear_button = PB('Clear log window', self) self.handler = h = QtHandler(self.update_status) # Remember to use qThreadName rather than threadName in the format string. fs = '%(asctime)s %(qThreadName)-12s %(levelname)-8s %(message)s' formatter = logging.Formatter(fs) h.setFormatter(formatter) logger.addHandler(h) # Set up to terminate the QThread when we exit app.aboutToQuit.connect(self.force_quit) # Lay out all the widgets layout = QtWidgets.QVBoxLayout(self) layout.addWidget(te) layout.addWidget(self.work_button) layout.addWidget(self.log_button) layout.addWidget(self.clear_button) self.setFixedSize(900, 400) # Connect the non-worker slots and signals self.log_button.clicked.connect(self.manual_update) self.clear_button.clicked.connect(self.clear_display) # Start a new worker thread and connect the slots for the worker self.start_thread() self.work_button.clicked.connect(self.worker.start) # Once started, the button should be disabled self.work_button.clicked.connect(lambda : self.work_button.setEnabled(False)) def start_thread(self): self.worker = Worker() self.worker_thread = QtCore.QThread() self.worker.setObjectName('Worker') self.worker_thread.setObjectName('WorkerThread') # for qThreadName self.worker.moveToThread(self.worker_thread) # This will start an event loop in the worker thread self.worker_thread.start() def kill_thread(self): # Just tell the worker to stop, then tell it to quit and wait for that # to happen self.worker_thread.requestInterruption() if self.worker_thread.isRunning(): self.worker_thread.quit() self.worker_thread.wait() else: print('worker has already exited.') def force_quit(self): # For use when the window is closed if self.worker_thread.isRunning(): self.kill_thread() # The functions below update the UI and run in the main thread because # that's where the slots are set up @Slot(str, logging.LogRecord) def update_status(self, status, record): color = self.COLORS.get(record.levelno, 'black') s = '
%s
' % (color, status) self.textedit.appendHtml(s) @Slot() def manual_update(self): # This function uses the formatted message passed in, but also uses # information from the record to format the message in an appropriate # color according to its severity (level). level = random.choice(LEVELS) extra = {'qThreadName': ctname() } logger.log(level, 'Manually logged!', extra=extra) @Slot() def clear_display(self): self.textedit.clear() def main(): QtCore.QThread.currentThread().setObjectName('MainThread') logging.getLogger().setLevel(logging.DEBUG) app = QtWidgets.QApplication(sys.argv) example = Window(app) example.show() sys.exit(app.exec_()) if __name__=='__main__': main() Patterns to avoid ================= Although the preceding sections have described ways of doing things you might need to do or deal with, it is worth mentioning some usage patterns which are *unhelpful*, and which should therefore be avoided in most cases. The following sections are in no particular order. Opening the same log file multiple times ---------------------------------------- On Windows, you will generally not be able to open the same file multiple times as this will lead to a “file is in use by another process” error. However, on POSIX platforms you’ll not get any errors if you open the same file multiple times. This could be done accidentally, for example by: * Adding a file handler more than once which references the same file (e.g. by a copy/paste/forget-to-change error). * Opening two files that look different, as they have different names, but are the same because one is a symbolic link to the other. * Forking a process, following which both parent and child have a reference to the same file. This might be through use of the "multiprocessing" module, for example. Opening a file multiple times might *appear* to work most of the time, but can lead to a number of problems in practice: * Logging output can be garbled because multiple threads or processes try to write to the same file. Although logging guards against concurrent use of the same handler instance by multiple threads, there is no such protection if concurrent writes are attempted by two different threads using two different handler instances which happen to point to the same file. * An attempt to delete a file (e.g. during file rotation) silently fails, because there is another reference pointing to it. This can lead to confusion and wasted debugging time - log entries end up in unexpected places, or are lost altogether. Use the techniques outlined in Logging to a single file from multiple processes to circumvent such issues. Using loggers as attributes in a class or passing them as parameters -------------------------------------------------------------------- While there might be unusual cases where you’ll need to do this, in general there is no point because loggers are singletons. Code can always access a given logger instance by name using "logging.getLogger(name)", so passing instances around and holding them as instance attributes is pointless. Note that in other languages such as Java and C#, loggers are often static class attributes. However, this pattern doesn’t make sense in Python, where the module (and not the class) is the unit of software decomposition. Adding handlers other than "NullHandler" to a logger in a library ----------------------------------------------------------------- Configuring logging by adding handlers, formatters and filters is the responsibility of the application developer, not the library developer. If you are maintaining a library, ensure that you don’t add handlers to any of your loggers other than a "NullHandler" instance. Creating a lot of loggers ------------------------- Loggers are singletons that are never freed during a script execution, and so creating lots of loggers will use up memory which can’t then be freed. Rather than create a logger per e.g. file processed or network connection made, use the existing mechanisms for passing contextual information into your logs and restrict the loggers created to those describing areas within your application (generally modules, but occasionally slightly more fine-grained than that). Logging HOWTO ************* Author: Vinay Sajip Basic Logging Tutorial ====================== Logging is a means of tracking events that happen when some software runs. The software’s developer adds logging calls to their code to indicate that certain events have occurred. An event is described by a descriptive message which can optionally contain variable data (i.e. data that is potentially different for each occurrence of the event). Events also have an importance which the developer ascribes to the event; the importance can also be called the *level* or *severity*. When to use logging ------------------- Logging provides a set of convenience functions for simple logging usage. These are "debug()", "info()", "warning()", "error()" and "critical()". To determine when to use logging, see the table below, which states, for each of a set of common tasks, the best tool to use for it. +---------------------------------------+----------------------------------------+ | Task you want to perform | The best tool for the task | |=======================================|========================================| | Display console output for ordinary | "print()" | | usage of a command line script or | | | program | | +---------------------------------------+----------------------------------------+ | Report events that occur during | "logging.info()" (or "logging.debug()" | | normal operation of a program (e.g. | for very detailed output for | | for status monitoring or fault | diagnostic purposes) | | investigation) | | +---------------------------------------+----------------------------------------+ | Issue a warning regarding a | "warnings.warn()" in library code if | | particular runtime event | the issue is avoidable and the client | | | application should be modified to | | | eliminate the warning | | | "logging.warning()" if there is | | | nothing the client application can do | | | about the situation, but the event | | | should still be noted | +---------------------------------------+----------------------------------------+ | Report an error regarding a | Raise an exception | | particular runtime event | | +---------------------------------------+----------------------------------------+ | Report suppression of an error | "logging.error()", | | without raising an exception (e.g. | "logging.exception()" or | | error handler in a long-running | "logging.critical()" as appropriate | | server process) | for the specific error and application | | | domain | +---------------------------------------+----------------------------------------+ The logging functions are named after the level or severity of the events they are used to track. The standard levels and their applicability are described below (in increasing order of severity): +----------------+-----------------------------------------------+ | Level | When it’s used | |================|===============================================| | "DEBUG" | Detailed information, typically of interest | | | only when diagnosing problems. | +----------------+-----------------------------------------------+ | "INFO" | Confirmation that things are working as | | | expected. | +----------------+-----------------------------------------------+ | "WARNING" | An indication that something unexpected | | | happened, or indicative of some problem in | | | the near future (e.g. ‘disk space low’). The | | | software is still working as expected. | +----------------+-----------------------------------------------+ | "ERROR" | Due to a more serious problem, the software | | | has not been able to perform some function. | +----------------+-----------------------------------------------+ | "CRITICAL" | A serious error, indicating that the program | | | itself may be unable to continue running. | +----------------+-----------------------------------------------+ The default level is "WARNING", which means that only events of this level and above will be tracked, unless the logging package is configured to do otherwise. Events that are tracked can be handled in different ways. The simplest way of handling tracked events is to print them to the console. Another common way is to write them to a disk file. A simple example ---------------- A very simple example is: import logging logging.warning('Watch out!') # will print a message to the console logging.info('I told you so') # will not print anything If you type these lines into a script and run it, you’ll see: WARNING:root:Watch out! printed out on the console. The "INFO" message doesn’t appear because the default level is "WARNING". The printed message includes the indication of the level and the description of the event provided in the logging call, i.e. ‘Watch out!’. Don’t worry about the ‘root’ part for now: it will be explained later. The actual output can be formatted quite flexibly if you need that; formatting options will also be explained later. Logging to a file ----------------- A very common situation is that of recording logging events in a file, so let’s look at that next. Be sure to try the following in a newly- started Python interpreter, and don’t just continue from the session described above: import logging logging.basicConfig(filename='example.log', encoding='utf-8', level=logging.DEBUG) logging.debug('This message should go to the log file') logging.info('So should this') logging.warning('And this, too') logging.error('And non-ASCII stuff, too, like Øresund and Malmö') Changed in version 3.9: The *encoding* argument was added. In earlier Python versions, or if not specified, the encoding used is the default value used by "open()". While not shown in the above example, an *errors* argument can also now be passed, which determines how encoding errors are handled. For available values and the default, see the documentation for "open()". And now if we open the file and look at what we have, we should find the log messages: DEBUG:root:This message should go to the log file INFO:root:So should this WARNING:root:And this, too ERROR:root:And non-ASCII stuff, too, like Øresund and Malmö This example also shows how you can set the logging level which acts as the threshold for tracking. In this case, because we set the threshold to "DEBUG", all of the messages were printed. If you want to set the logging level from a command-line option such as: --log=INFO and you have the value of the parameter passed for "--log" in some variable *loglevel*, you can use: getattr(logging, loglevel.upper()) to get the value which you’ll pass to "basicConfig()" via the *level* argument. You may want to error check any user input value, perhaps as in the following example: # assuming loglevel is bound to the string value obtained from the # command line argument. Convert to upper case to allow the user to # specify --log=DEBUG or --log=debug numeric_level = getattr(logging, loglevel.upper(), None) if not isinstance(numeric_level, int): raise ValueError('Invalid log level: %s' % loglevel) logging.basicConfig(level=numeric_level, ...) The call to "basicConfig()" should come *before* any calls to "debug()", "info()" etc. As it’s intended as a one-off simple configuration facility, only the first call will actually do anything: subsequent calls are effectively no-ops. If you run the above script several times, the messages from successive runs are appended to the file *example.log*. If you want each run to start afresh, not remembering the messages from earlier runs, you can specify the *filemode* argument, by changing the call in the above example to: logging.basicConfig(filename='example.log', filemode='w', level=logging.DEBUG) The output will be the same as before, but the log file is no longer appended to, so the messages from earlier runs are lost. Logging from multiple modules ----------------------------- If your program consists of multiple modules, here’s an example of how you could organize logging in it: # myapp.py import logging import mylib def main(): logging.basicConfig(filename='myapp.log', level=logging.INFO) logging.info('Started') mylib.do_something() logging.info('Finished') if __name__ == '__main__': main() # mylib.py import logging def do_something(): logging.info('Doing something') If you run *myapp.py*, you should see this in *myapp.log*: INFO:root:Started INFO:root:Doing something INFO:root:Finished which is hopefully what you were expecting to see. You can generalize this to multiple modules, using the pattern in *mylib.py*. Note that for this simple usage pattern, you won’t know, by looking in the log file, *where* in your application your messages came from, apart from looking at the event description. If you want to track the location of your messages, you’ll need to refer to the documentation beyond the tutorial level – see Advanced Logging Tutorial. Logging variable data --------------------- To log variable data, use a format string for the event description message and append the variable data as arguments. For example: import logging logging.warning('%s before you %s', 'Look', 'leap!') will display: WARNING:root:Look before you leap! As you can see, merging of variable data into the event description message uses the old, %-style of string formatting. This is for backwards compatibility: the logging package pre-dates newer formatting options such as "str.format()" and "string.Template". These newer formatting options *are* supported, but exploring them is outside the scope of this tutorial: see Using particular formatting styles throughout your application for more information. Changing the format of displayed messages ----------------------------------------- To change the format which is used to display messages, you need to specify the format you want to use: import logging logging.basicConfig(format='%(levelname)s:%(message)s', level=logging.DEBUG) logging.debug('This message should appear on the console') logging.info('So should this') logging.warning('And this, too') which would print: DEBUG:This message should appear on the console INFO:So should this WARNING:And this, too Notice that the ‘root’ which appeared in earlier examples has disappeared. For a full set of things that can appear in format strings, you can refer to the documentation for LogRecord attributes, but for simple usage, you just need the *levelname* (severity), *message* (event description, including variable data) and perhaps to display when the event occurred. This is described in the next section. Displaying the date/time in messages ------------------------------------ To display the date and time of an event, you would place ‘%(asctime)s’ in your format string: import logging logging.basicConfig(format='%(asctime)s %(message)s') logging.warning('is when this event was logged.') which should print something like this: 2010-12-12 11:41:42,612 is when this event was logged. The default format for date/time display (shown above) is like ISO8601 or **RFC 3339**. If you need more control over the formatting of the date/time, provide a *datefmt* argument to "basicConfig", as in this example: import logging logging.basicConfig(format='%(asctime)s %(message)s', datefmt='%m/%d/%Y %I:%M:%S %p') logging.warning('is when this event was logged.') which would display something like this: 12/12/2010 11:46:36 AM is when this event was logged. The format of the *datefmt* argument is the same as supported by "time.strftime()". Next Steps ---------- That concludes the basic tutorial. It should be enough to get you up and running with logging. There’s a lot more that the logging package offers, but to get the best out of it, you’ll need to invest a little more of your time in reading the following sections. If you’re ready for that, grab some of your favourite beverage and carry on. If your logging needs are simple, then use the above examples to incorporate logging into your own scripts, and if you run into problems or don’t understand something, please post a question on the comp.lang.python Usenet group (available at https://groups.google.com/forum/#!forum/comp.lang.python) and you should receive help before too long. Still here? You can carry on reading the next few sections, which provide a slightly more advanced/in-depth tutorial than the basic one above. After that, you can take a look at the Logging Cookbook. Advanced Logging Tutorial ========================= The logging library takes a modular approach and offers several categories of components: loggers, handlers, filters, and formatters. * Loggers expose the interface that application code directly uses. * Handlers send the log records (created by loggers) to the appropriate destination. * Filters provide a finer grained facility for determining which log records to output. * Formatters specify the layout of log records in the final output. Log event information is passed between loggers, handlers, filters and formatters in a "LogRecord" instance. Logging is performed by calling methods on instances of the "Logger" class (hereafter called *loggers*). Each instance has a name, and they are conceptually arranged in a namespace hierarchy using dots (periods) as separators. For example, a logger named ‘scan’ is the parent of loggers ‘scan.text’, ‘scan.html’ and ‘scan.pdf’. Logger names can be anything you want, and indicate the area of an application in which a logged message originates. A good convention to use when naming loggers is to use a module-level logger, in each module which uses logging, named as follows: logger = logging.getLogger(__name__) This means that logger names track the package/module hierarchy, and it’s intuitively obvious where events are logged just from the logger name. The root of the hierarchy of loggers is called the root logger. That’s the logger used by the functions "debug()", "info()", "warning()", "error()" and "critical()", which just call the same-named method of the root logger. The functions and the methods have the same signatures. The root logger’s name is printed as ‘root’ in the logged output. It is, of course, possible to log messages to different destinations. Support is included in the package for writing log messages to files, HTTP GET/POST locations, email via SMTP, generic sockets, queues, or OS-specific logging mechanisms such as syslog or the Windows NT event log. Destinations are served by *handler* classes. You can create your own log destination class if you have special requirements not met by any of the built-in handler classes. By default, no destination is set for any logging messages. You can specify a destination (such as console or file) by using "basicConfig()" as in the tutorial examples. If you call the functions "debug()", "info()", "warning()", "error()" and "critical()", they will check to see if no destination is set; and if one is not set, they will set a destination of the console ("sys.stderr") and a default format for the displayed message before delegating to the root logger to do the actual message output. The default format set by "basicConfig()" for messages is: severity:logger name:message You can change this by passing a format string to "basicConfig()" with the *format* keyword argument. For all options regarding how a format string is constructed, see Formatter Objects. Logging Flow ------------ The flow of log event information in loggers and handlers is illustrated in the following diagram. [image] Loggers ------- "Logger" objects have a threefold job. First, they expose several methods to application code so that applications can log messages at runtime. Second, logger objects determine which log messages to act upon based upon severity (the default filtering facility) or filter objects. Third, logger objects pass along relevant log messages to all interested log handlers. The most widely used methods on logger objects fall into two categories: configuration and message sending. These are the most common configuration methods: * "Logger.setLevel()" specifies the lowest-severity log message a logger will handle, where debug is the lowest built-in severity level and critical is the highest built-in severity. For example, if the severity level is INFO, the logger will handle only INFO, WARNING, ERROR, and CRITICAL messages and will ignore DEBUG messages. * "Logger.addHandler()" and "Logger.removeHandler()" add and remove handler objects from the logger object. Handlers are covered in more detail in Handlers. * "Logger.addFilter()" and "Logger.removeFilter()" add and remove filter objects from the logger object. Filters are covered in more detail in Filter Objects. You don’t need to always call these methods on every logger you create. See the last two paragraphs in this section. With the logger object configured, the following methods create log messages: * "Logger.debug()", "Logger.info()", "Logger.warning()", "Logger.error()", and "Logger.critical()" all create log records with a message and a level that corresponds to their respective method names. The message is actually a format string, which may contain the standard string substitution syntax of "%s", "%d", "%f", and so on. The rest of their arguments is a list of objects that correspond with the substitution fields in the message. With regard to "**kwargs", the logging methods care only about a keyword of "exc_info" and use it to determine whether to log exception information. * "Logger.exception()" creates a log message similar to "Logger.error()". The difference is that "Logger.exception()" dumps a stack trace along with it. Call this method only from an exception handler. * "Logger.log()" takes a log level as an explicit argument. This is a little more verbose for logging messages than using the log level convenience methods listed above, but this is how to log at custom log levels. "getLogger()" returns a reference to a logger instance with the specified name if it is provided, or "root" if not. The names are period-separated hierarchical structures. Multiple calls to "getLogger()" with the same name will return a reference to the same logger object. Loggers that are further down in the hierarchical list are children of loggers higher up in the list. For example, given a logger with a name of "foo", loggers with names of "foo.bar", "foo.bar.baz", and "foo.bam" are all descendants of "foo". Loggers have a concept of *effective level*. If a level is not explicitly set on a logger, the level of its parent is used instead as its effective level. If the parent has no explicit level set, *its* parent is examined, and so on - all ancestors are searched until an explicitly set level is found. The root logger always has an explicit level set ("WARNING" by default). When deciding whether to process an event, the effective level of the logger is used to determine whether the event is passed to the logger’s handlers. Child loggers propagate messages up to the handlers associated with their ancestor loggers. Because of this, it is unnecessary to define and configure handlers for all the loggers an application uses. It is sufficient to configure handlers for a top-level logger and create child loggers as needed. (You can, however, turn off propagation by setting the *propagate* attribute of a logger to "False".) Handlers -------- "Handler" objects are responsible for dispatching the appropriate log messages (based on the log messages’ severity) to the handler’s specified destination. "Logger" objects can add zero or more handler objects to themselves with an "addHandler()" method. As an example scenario, an application may want to send all log messages to a log file, all log messages of error or higher to stdout, and all messages of critical to an email address. This scenario requires three individual handlers where each handler is responsible for sending messages of a specific severity to a specific location. The standard library includes quite a few handler types (see Useful Handlers); the tutorials use mainly "StreamHandler" and "FileHandler" in its examples. There are very few methods in a handler for application developers to concern themselves with. The only handler methods that seem relevant for application developers who are using the built-in handler objects (that is, not creating custom handlers) are the following configuration methods: * The "setLevel()" method, just as in logger objects, specifies the lowest severity that will be dispatched to the appropriate destination. Why are there two "setLevel()" methods? The level set in the logger determines which severity of messages it will pass to its handlers. The level set in each handler determines which messages that handler will send on. * "setFormatter()" selects a Formatter object for this handler to use. * "addFilter()" and "removeFilter()" respectively configure and deconfigure filter objects on handlers. Application code should not directly instantiate and use instances of "Handler". Instead, the "Handler" class is a base class that defines the interface that all handlers should have and establishes some default behavior that child classes can use (or override). Formatters ---------- Formatter objects configure the final order, structure, and contents of the log message. Unlike the base "logging.Handler" class, application code may instantiate formatter classes, although you could likely subclass the formatter if your application needs special behavior. The constructor takes three optional arguments – a message format string, a date format string and a style indicator. logging.Formatter.__init__(fmt=None, datefmt=None, style='%') If there is no message format string, the default is to use the raw message. If there is no date format string, the default date format is: %Y-%m-%d %H:%M:%S with the milliseconds tacked on at the end. The "style" is one of *%*, ‘{’ or ‘$’. If one of these is not specified, then ‘%’ will be used. If the "style" is ‘%’, the message format string uses "%()s" styled string substitution; the possible keys are documented in LogRecord attributes. If the style is ‘{’, the message format string is assumed to be compatible with "str.format()" (using keyword arguments), while if the style is ‘$’ then the message format string should conform to what is expected by "string.Template.substitute()". Changed in version 3.2: Added the "style" parameter. The following message format string will log the time in a human- readable format, the severity of the message, and the contents of the message, in that order: '%(asctime)s - %(levelname)s - %(message)s' Formatters use a user-configurable function to convert the creation time of a record to a tuple. By default, "time.localtime()" is used; to change this for a particular formatter instance, set the "converter" attribute of the instance to a function with the same signature as "time.localtime()" or "time.gmtime()". To change it for all formatters, for example if you want all logging times to be shown in GMT, set the "converter" attribute in the Formatter class (to "time.gmtime" for GMT display). Configuring Logging ------------------- Programmers can configure logging in three ways: 1. Creating loggers, handlers, and formatters explicitly using Python code that calls the configuration methods listed above. 2. Creating a logging config file and reading it using the "fileConfig()" function. 3. Creating a dictionary of configuration information and passing it to the "dictConfig()" function. For the reference documentation on the last two options, see Configuration functions. The following example configures a very simple logger, a console handler, and a simple formatter using Python code: import logging # create logger logger = logging.getLogger('simple_example') logger.setLevel(logging.DEBUG) # create console handler and set level to debug ch = logging.StreamHandler() ch.setLevel(logging.DEBUG) # create formatter formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s') # add formatter to ch ch.setFormatter(formatter) # add ch to logger logger.addHandler(ch) # 'application' code logger.debug('debug message') logger.info('info message') logger.warning('warn message') logger.error('error message') logger.critical('critical message') Running this module from the command line produces the following output: $ python simple_logging_module.py 2005-03-19 15:10:26,618 - simple_example - DEBUG - debug message 2005-03-19 15:10:26,620 - simple_example - INFO - info message 2005-03-19 15:10:26,695 - simple_example - WARNING - warn message 2005-03-19 15:10:26,697 - simple_example - ERROR - error message 2005-03-19 15:10:26,773 - simple_example - CRITICAL - critical message The following Python module creates a logger, handler, and formatter nearly identical to those in the example listed above, with the only difference being the names of the objects: import logging import logging.config logging.config.fileConfig('logging.conf') # create logger logger = logging.getLogger('simpleExample') # 'application' code logger.debug('debug message') logger.info('info message') logger.warning('warn message') logger.error('error message') logger.critical('critical message') Here is the logging.conf file: [loggers] keys=root,simpleExample [handlers] keys=consoleHandler [formatters] keys=simpleFormatter [logger_root] level=DEBUG handlers=consoleHandler [logger_simpleExample] level=DEBUG handlers=consoleHandler qualname=simpleExample propagate=0 [handler_consoleHandler] class=StreamHandler level=DEBUG formatter=simpleFormatter args=(sys.stdout,) [formatter_simpleFormatter] format=%(asctime)s - %(name)s - %(levelname)s - %(message)s datefmt= The output is nearly identical to that of the non-config-file-based example: $ python simple_logging_config.py 2005-03-19 15:38:55,977 - simpleExample - DEBUG - debug message 2005-03-19 15:38:55,979 - simpleExample - INFO - info message 2005-03-19 15:38:56,054 - simpleExample - WARNING - warn message 2005-03-19 15:38:56,055 - simpleExample - ERROR - error message 2005-03-19 15:38:56,130 - simpleExample - CRITICAL - critical message You can see that the config file approach has a few advantages over the Python code approach, mainly separation of configuration and code and the ability of noncoders to easily modify the logging properties. Warning: The "fileConfig()" function takes a default parameter, "disable_existing_loggers", which defaults to "True" for reasons of backward compatibility. This may or may not be what you want, since it will cause any non-root loggers existing before the "fileConfig()" call to be disabled unless they (or an ancestor) are explicitly named in the configuration. Please refer to the reference documentation for more information, and specify "False" for this parameter if you wish.The dictionary passed to "dictConfig()" can also specify a Boolean value with key "disable_existing_loggers", which if not specified explicitly in the dictionary also defaults to being interpreted as "True". This leads to the logger-disabling behaviour described above, which may not be what you want - in which case, provide the key explicitly with a value of "False". Note that the class names referenced in config files need to be either relative to the logging module, or absolute values which can be resolved using normal import mechanisms. Thus, you could use either "WatchedFileHandler" (relative to the logging module) or "mypackage.mymodule.MyHandler" (for a class defined in package "mypackage" and module "mymodule", where "mypackage" is available on the Python import path). In Python 3.2, a new means of configuring logging has been introduced, using dictionaries to hold configuration information. This provides a superset of the functionality of the config-file-based approach outlined above, and is the recommended configuration method for new applications and deployments. Because a Python dictionary is used to hold configuration information, and since you can populate that dictionary using different means, you have more options for configuration. For example, you can use a configuration file in JSON format, or, if you have access to YAML processing functionality, a file in YAML format, to populate the configuration dictionary. Or, of course, you can construct the dictionary in Python code, receive it in pickled form over a socket, or use whatever approach makes sense for your application. Here’s an example of the same configuration as above, in YAML format for the new dictionary-based approach: version: 1 formatters: simple: format: '%(asctime)s - %(name)s - %(levelname)s - %(message)s' handlers: console: class: logging.StreamHandler level: DEBUG formatter: simple stream: ext://sys.stdout loggers: simpleExample: level: DEBUG handlers: [console] propagate: no root: level: DEBUG handlers: [console] For more information about logging using a dictionary, see Configuration functions. What happens if no configuration is provided -------------------------------------------- If no logging configuration is provided, it is possible to have a situation where a logging event needs to be output, but no handlers can be found to output the event. The behaviour of the logging package in these circumstances is dependent on the Python version. For versions of Python prior to 3.2, the behaviour is as follows: * If *logging.raiseExceptions* is "False" (production mode), the event is silently dropped. * If *logging.raiseExceptions* is "True" (development mode), a message ‘No handlers could be found for logger X.Y.Z’ is printed once. In Python 3.2 and later, the behaviour is as follows: * The event is output using a ‘handler of last resort’, stored in "logging.lastResort". This internal handler is not associated with any logger, and acts like a "StreamHandler" which writes the event description message to the current value of "sys.stderr" (therefore respecting any redirections which may be in effect). No formatting is done on the message - just the bare event description message is printed. The handler’s level is set to "WARNING", so all events at this and greater severities will be output. To obtain the pre-3.2 behaviour, "logging.lastResort" can be set to "None". Configuring Logging for a Library --------------------------------- When developing a library which uses logging, you should take care to document how the library uses logging - for example, the names of loggers used. Some consideration also needs to be given to its logging configuration. If the using application does not use logging, and library code makes logging calls, then (as described in the previous section) events of severity "WARNING" and greater will be printed to "sys.stderr". This is regarded as the best default behaviour. If for some reason you *don’t* want these messages printed in the absence of any logging configuration, you can attach a do-nothing handler to the top-level logger for your library. This avoids the message being printed, since a handler will always be found for the library’s events: it just doesn’t produce any output. If the library user configures logging for application use, presumably that configuration will add some handlers, and if levels are suitably configured then logging calls made in library code will send output to those handlers, as normal. A do-nothing handler is included in the logging package: "NullHandler" (since Python 3.1). An instance of this handler could be added to the top-level logger of the logging namespace used by the library (*if* you want to prevent your library’s logged events being output to "sys.stderr" in the absence of logging configuration). If all logging by a library *foo* is done using loggers with names matching ‘foo.x’, ‘foo.x.y’, etc. then the code: import logging logging.getLogger('foo').addHandler(logging.NullHandler()) should have the desired effect. If an organisation produces a number of libraries, then the logger name specified can be ‘orgname.foo’ rather than just ‘foo’. Note: It is strongly advised that you *do not add any handlers other than* "NullHandler" *to your library’s loggers*. This is because the configuration of handlers is the prerogative of the application developer who uses your library. The application developer knows their target audience and what handlers are most appropriate for their application: if you add handlers ‘under the hood’, you might well interfere with their ability to carry out unit tests and deliver logs which suit their requirements. Logging Levels ============== The numeric values of logging levels are given in the following table. These are primarily of interest if you want to define your own levels, and need them to have specific values relative to the predefined levels. If you define a level with the same numeric value, it overwrites the predefined value; the predefined name is lost. +----------------+-----------------+ | Level | Numeric value | |================|=================| | "CRITICAL" | 50 | +----------------+-----------------+ | "ERROR" | 40 | +----------------+-----------------+ | "WARNING" | 30 | +----------------+-----------------+ | "INFO" | 20 | +----------------+-----------------+ | "DEBUG" | 10 | +----------------+-----------------+ | "NOTSET" | 0 | +----------------+-----------------+ Levels can also be associated with loggers, being set either by the developer or through loading a saved logging configuration. When a logging method is called on a logger, the logger compares its own level with the level associated with the method call. If the logger’s level is higher than the method call’s, no logging message is actually generated. This is the basic mechanism controlling the verbosity of logging output. Logging messages are encoded as instances of the "LogRecord" class. When a logger decides to actually log an event, a "LogRecord" instance is created from the logging message. Logging messages are subjected to a dispatch mechanism through the use of *handlers*, which are instances of subclasses of the "Handler" class. Handlers are responsible for ensuring that a logged message (in the form of a "LogRecord") ends up in a particular location (or set of locations) which is useful for the target audience for that message (such as end users, support desk staff, system administrators, developers). Handlers are passed "LogRecord" instances intended for particular destinations. Each logger can have zero, one or more handlers associated with it (via the "addHandler()" method of "Logger"). In addition to any handlers directly associated with a logger, *all handlers associated with all ancestors of the logger* are called to dispatch the message (unless the *propagate* flag for a logger is set to a false value, at which point the passing to ancestor handlers stops). Just as for loggers, handlers can have levels associated with them. A handler’s level acts as a filter in the same way as a logger’s level does. If a handler decides to actually dispatch an event, the "emit()" method is used to send the message to its destination. Most user- defined subclasses of "Handler" will need to override this "emit()". Custom Levels ------------- Defining your own levels is possible, but should not be necessary, as the existing levels have been chosen on the basis of practical experience. However, if you are convinced that you need custom levels, great care should be exercised when doing this, and it is possibly *a very bad idea to define custom levels if you are developing a library*. That’s because if multiple library authors all define their own custom levels, there is a chance that the logging output from such multiple libraries used together will be difficult for the using developer to control and/or interpret, because a given numeric value might mean different things for different libraries. Useful Handlers =============== In addition to the base "Handler" class, many useful subclasses are provided: 1. "StreamHandler" instances send messages to streams (file-like objects). 2. "FileHandler" instances send messages to disk files. 3. "BaseRotatingHandler" is the base class for handlers that rotate log files at a certain point. It is not meant to be instantiated directly. Instead, use "RotatingFileHandler" or "TimedRotatingFileHandler". 4. "RotatingFileHandler" instances send messages to disk files, with support for maximum log file sizes and log file rotation. 5. "TimedRotatingFileHandler" instances send messages to disk files, rotating the log file at certain timed intervals. 6. "SocketHandler" instances send messages to TCP/IP sockets. Since 3.4, Unix domain sockets are also supported. 7. "DatagramHandler" instances send messages to UDP sockets. Since 3.4, Unix domain sockets are also supported. 8. "SMTPHandler" instances send messages to a designated email address. 9. "SysLogHandler" instances send messages to a Unix syslog daemon, possibly on a remote machine. 10. "NTEventLogHandler" instances send messages to a Windows NT/2000/XP event log. 11. "MemoryHandler" instances send messages to a buffer in memory, which is flushed whenever specific criteria are met. 12. "HTTPHandler" instances send messages to an HTTP server using either "GET" or "POST" semantics. 13. "WatchedFileHandler" instances watch the file they are logging to. If the file changes, it is closed and reopened using the file name. This handler is only useful on Unix-like systems; Windows does not support the underlying mechanism used. 14. "QueueHandler" instances send messages to a queue, such as those implemented in the "queue" or "multiprocessing" modules. 15. "NullHandler" instances do nothing with error messages. They are used by library developers who want to use logging, but want to avoid the ‘No handlers could be found for logger XXX’ message which can be displayed if the library user has not configured logging. See Configuring Logging for a Library for more information. New in version 3.1: The "NullHandler" class. New in version 3.2: The "QueueHandler" class. The "NullHandler", "StreamHandler" and "FileHandler" classes are defined in the core logging package. The other handlers are defined in a sub-module, "logging.handlers". (There is also another sub-module, "logging.config", for configuration functionality.) Logged messages are formatted for presentation through instances of the "Formatter" class. They are initialized with a format string suitable for use with the % operator and a dictionary. For formatting multiple messages in a batch, instances of "BufferingFormatter" can be used. In addition to the format string (which is applied to each message in the batch), there is provision for header and trailer format strings. When filtering based on logger level and/or handler level is not enough, instances of "Filter" can be added to both "Logger" and "Handler" instances (through their "addFilter()" method). Before deciding to process a message further, both loggers and handlers consult all their filters for permission. If any filter returns a false value, the message is not processed further. The basic "Filter" functionality allows filtering by specific logger name. If this feature is used, messages sent to the named logger and its children are allowed through the filter, and all others dropped. Exceptions raised during logging ================================ The logging package is designed to swallow exceptions which occur while logging in production. This is so that errors which occur while handling logging events - such as logging misconfiguration, network or other similar errors - do not cause the application using logging to terminate prematurely. "SystemExit" and "KeyboardInterrupt" exceptions are never swallowed. Other exceptions which occur during the "emit()" method of a "Handler" subclass are passed to its "handleError()" method. The default implementation of "handleError()" in "Handler" checks to see if a module-level variable, "raiseExceptions", is set. If set, a traceback is printed to "sys.stderr". If not set, the exception is swallowed. Note: The default value of "raiseExceptions" is "True". This is because during development, you typically want to be notified of any exceptions that occur. It’s advised that you set "raiseExceptions" to "False" for production usage. Using arbitrary objects as messages =================================== In the preceding sections and examples, it has been assumed that the message passed when logging the event is a string. However, this is not the only possibility. You can pass an arbitrary object as a message, and its "__str__()" method will be called when the logging system needs to convert it to a string representation. In fact, if you want to, you can avoid computing a string representation altogether - for example, the "SocketHandler" emits an event by pickling it and sending it over the wire. Optimization ============ Formatting of message arguments is deferred until it cannot be avoided. However, computing the arguments passed to the logging method can also be expensive, and you may want to avoid doing it if the logger will just throw away your event. To decide what to do, you can call the "isEnabledFor()" method which takes a level argument and returns true if the event would be created by the Logger for that level of call. You can write code like this: if logger.isEnabledFor(logging.DEBUG): logger.debug('Message with %s, %s', expensive_func1(), expensive_func2()) so that if the logger’s threshold is set above "DEBUG", the calls to "expensive_func1()" and "expensive_func2()" are never made. Note: In some cases, "isEnabledFor()" can itself be more expensive than you’d like (e.g. for deeply nested loggers where an explicit level is only set high up in the logger hierarchy). In such cases (or if you want to avoid calling a method in tight loops), you can cache the result of a call to "isEnabledFor()" in a local or instance variable, and use that instead of calling the method each time. Such a cached value would only need to be recomputed when the logging configuration changes dynamically while the application is running (which is not all that common). There are other optimizations which can be made for specific applications which need more precise control over what logging information is collected. Here’s a list of things you can do to avoid processing during logging which you don’t need: +-------------------------------------------------+------------------------------------------+ | What you don’t want to collect | How to avoid collecting it | |=================================================|==========================================| | Information about where calls were made from. | Set "logging._srcfile" to "None". This | | | avoids calling "sys._getframe()", which | | | may help to speed up your code in | | | environments like PyPy (which can’t | | | speed up code that uses | | | "sys._getframe()"). | +-------------------------------------------------+------------------------------------------+ | Threading information. | Set "logging.logThreads" to "0". | +-------------------------------------------------+------------------------------------------+ | Process information. | Set "logging.logProcesses" to "0". | +-------------------------------------------------+------------------------------------------+ Also note that the core logging module only includes the basic handlers. If you don’t import "logging.handlers" and "logging.config", they won’t take up any memory. See also: Module "logging" API reference for the logging module. Module "logging.config" Configuration API for the logging module. Module "logging.handlers" Useful handlers included with the logging module. A logging cookbook Porting Python 2 Code to Python 3 ********************************* author: Brett Cannon Abstract ^^^^^^^^ With Python 3 being the future of Python while Python 2 is still in active use, it is good to have your project available for both major releases of Python. This guide is meant to help you figure out how best to support both Python 2 & 3 simultaneously. If you are looking to port an extension module instead of pure Python code, please see Porting Extension Modules to Python 3. If you would like to read one core Python developer’s take on why Python 3 came into existence, you can read Nick Coghlan’s Python 3 Q & A or Brett Cannon’s Why Python 3 exists. For help with porting, you can view the archived python-porting mailing list. The Short Explanation ===================== To make your project be single-source Python 2/3 compatible, the basic steps are: 1. Only worry about supporting Python 2.7 2. Make sure you have good test coverage (coverage.py can help; "python -m pip install coverage") 3. Learn the differences between Python 2 & 3 4. Use Futurize (or Modernize) to update your code (e.g. "python -m pip install future") 5. Use Pylint to help make sure you don’t regress on your Python 3 support ("python -m pip install pylint") 6. Use caniusepython3 to find out which of your dependencies are blocking your use of Python 3 ("python -m pip install caniusepython3") 7. Once your dependencies are no longer blocking you, use continuous integration to make sure you stay compatible with Python 2 & 3 (tox can help test against multiple versions of Python; "python -m pip install tox") 8. Consider using optional static type checking to make sure your type usage works in both Python 2 & 3 (e.g. use mypy to check your typing under both Python 2 & Python 3; "python -m pip install mypy"). Note: Note: Using "python -m pip install" guarantees that the "pip" you invoke is the one installed for the Python currently in use, whether it be a system-wide "pip" or one installed within a virtual environment. Details ======= A key point about supporting Python 2 & 3 simultaneously is that you can start **today**! Even if your dependencies are not supporting Python 3 yet that does not mean you can’t modernize your code **now** to support Python 3. Most changes required to support Python 3 lead to cleaner code using newer practices even in Python 2 code. Another key point is that modernizing your Python 2 code to also support Python 3 is largely automated for you. While you might have to make some API decisions thanks to Python 3 clarifying text data versus binary data, the lower-level work is now mostly done for you and thus can at least benefit from the automated changes immediately. Keep those key points in mind while you read on about the details of porting your code to support Python 2 & 3 simultaneously. Drop support for Python 2.6 and older ------------------------------------- While you can make Python 2.5 work with Python 3, it is **much** easier if you only have to work with Python 2.7. If dropping Python 2.5 is not an option then the six project can help you support Python 2.5 & 3 simultaneously ("python -m pip install six"). Do realize, though, that nearly all the projects listed in this HOWTO will not be available to you. If you are able to skip Python 2.5 and older, then the required changes to your code should continue to look and feel like idiomatic Python code. At worst you will have to use a function instead of a method in some instances or have to import a function instead of using a built-in one, but otherwise the overall transformation should not feel foreign to you. But you should aim for only supporting Python 2.7. Python 2.6 is no longer freely supported and thus is not receiving bugfixes. This means **you** will have to work around any issues you come across with Python 2.6. There are also some tools mentioned in this HOWTO which do not support Python 2.6 (e.g., Pylint), and this will become more commonplace as time goes on. It will simply be easier for you if you only support the versions of Python that you have to support. Make sure you specify the proper version support in your "setup.py" file ------------------------------------------------------------------------ In your "setup.py" file you should have the proper trove classifier specifying what versions of Python you support. As your project does not support Python 3 yet you should at least have "Programming Language :: Python :: 2 :: Only" specified. Ideally you should also specify each major/minor version of Python that you do support, e.g. "Programming Language :: Python :: 2.7". Have good test coverage ----------------------- Once you have your code supporting the oldest version of Python 2 you want it to, you will want to make sure your test suite has good coverage. A good rule of thumb is that if you want to be confident enough in your test suite that any failures that appear after having tools rewrite your code are actual bugs in the tools and not in your code. If you want a number to aim for, try to get over 80% coverage (and don’t feel bad if you find it hard to get better than 90% coverage). If you don’t already have a tool to measure test coverage then coverage.py is recommended. Learn the differences between Python 2 & 3 ------------------------------------------ Once you have your code well-tested you are ready to begin porting your code to Python 3! But to fully understand how your code is going to change and what you want to look out for while you code, you will want to learn what changes Python 3 makes in terms of Python 2. Typically the two best ways of doing that is reading the “What’s New” doc for each release of Python 3 and the Porting to Python 3 book (which is free online). There is also a handy cheat sheet from the Python-Future project. Update your code ---------------- Once you feel like you know what is different in Python 3 compared to Python 2, it’s time to update your code! You have a choice between two tools in porting your code automatically: Futurize and Modernize. Which tool you choose will depend on how much like Python 3 you want your code to be. Futurize does its best to make Python 3 idioms and practices exist in Python 2, e.g. backporting the "bytes" type from Python 3 so that you have semantic parity between the major versions of Python. Modernize, on the other hand, is more conservative and targets a Python 2/3 subset of Python, directly relying on six to help provide compatibility. As Python 3 is the future, it might be best to consider Futurize to begin adjusting to any new practices that Python 3 introduces which you are not accustomed to yet. Regardless of which tool you choose, they will update your code to run under Python 3 while staying compatible with the version of Python 2 you started with. Depending on how conservative you want to be, you may want to run the tool over your test suite first and visually inspect the diff to make sure the transformation is accurate. After you have transformed your test suite and verified that all the tests still pass as expected, then you can transform your application code knowing that any tests which fail is a translation failure. Unfortunately the tools can’t automate everything to make your code work under Python 3 and so there are a handful of things you will need to update manually to get full Python 3 support (which of these steps are necessary vary between the tools). Read the documentation for the tool you choose to use to see what it fixes by default and what it can do optionally to know what will (not) be fixed for you and what you may have to fix on your own (e.g. using "io.open()" over the built-in "open()" function is off by default in Modernize). Luckily, though, there are only a couple of things to watch out for which can be considered large issues that may be hard to debug if not watched for. Division ~~~~~~~~ In Python 3, "5 / 2 == 2.5" and not "2"; all division between "int" values result in a "float". This change has actually been planned since Python 2.2 which was released in 2002. Since then users have been encouraged to add "from __future__ import division" to any and all files which use the "/" and "//" operators or to be running the interpreter with the "-Q" flag. If you have not been doing this then you will need to go through your code and do two things: 1. Add "from __future__ import division" to your files 2. Update any division operator as necessary to either use "//" to use floor division or continue using "/" and expect a float The reason that "/" isn’t simply translated to "//" automatically is that if an object defines a "__truediv__" method but not "__floordiv__" then your code would begin to fail (e.g. a user-defined class that uses "/" to signify some operation but not "//" for the same thing or at all). Text versus binary data ~~~~~~~~~~~~~~~~~~~~~~~ In Python 2 you could use the "str" type for both text and binary data. Unfortunately this confluence of two different concepts could lead to brittle code which sometimes worked for either kind of data, sometimes not. It also could lead to confusing APIs if people didn’t explicitly state that something that accepted "str" accepted either text or binary data instead of one specific type. This complicated the situation especially for anyone supporting multiple languages as APIs wouldn’t bother explicitly supporting "unicode" when they claimed text data support. To make the distinction between text and binary data clearer and more pronounced, Python 3 did what most languages created in the age of the internet have done and made text and binary data distinct types that cannot blindly be mixed together (Python predates widespread access to the internet). For any code that deals only with text or only binary data, this separation doesn’t pose an issue. But for code that has to deal with both, it does mean you might have to now care about when you are using text compared to binary data, which is why this cannot be entirely automated. To start, you will need to decide which APIs take text and which take binary (it is **highly** recommended you don’t design APIs that can take both due to the difficulty of keeping the code working; as stated earlier it is difficult to do well). In Python 2 this means making sure the APIs that take text can work with "unicode" and those that work with binary data work with the "bytes" type from Python 3 (which is a subset of "str" in Python 2 and acts as an alias for "bytes" type in Python 2). Usually the biggest issue is realizing which methods exist on which types in Python 2 & 3 simultaneously (for text that’s "unicode" in Python 2 and "str" in Python 3, for binary that’s "str"/"bytes" in Python 2 and "bytes" in Python 3). The following table lists the **unique** methods of each data type across Python 2 & 3 (e.g., the "decode()" method is usable on the equivalent binary data type in either Python 2 or 3, but it can’t be used by the textual data type consistently between Python 2 and 3 because "str" in Python 3 doesn’t have the method). Do note that as of Python 3.5 the "__mod__" method was added to the bytes type. +--------------------------+-----------------------+ | **Text data** | **Binary data** | +--------------------------+-----------------------+ | | decode | +--------------------------+-----------------------+ | encode | | +--------------------------+-----------------------+ | format | | +--------------------------+-----------------------+ | isdecimal | | +--------------------------+-----------------------+ | isnumeric | | +--------------------------+-----------------------+ Making the distinction easier to handle can be accomplished by encoding and decoding between binary data and text at the edge of your code. This means that when you receive text in binary data, you should immediately decode it. And if your code needs to send text as binary data then encode it as late as possible. This allows your code to work with only text internally and thus eliminates having to keep track of what type of data you are working with. The next issue is making sure you know whether the string literals in your code represent text or binary data. You should add a "b" prefix to any literal that presents binary data. For text you should add a "u" prefix to the text literal. (there is a "__future__" import to force all unspecified literals to be Unicode, but usage has shown it isn’t as effective as adding a "b" or "u" prefix to all literals explicitly) As part of this dichotomy you also need to be careful about opening files. Unless you have been working on Windows, there is a chance you have not always bothered to add the "b" mode when opening a binary file (e.g., "rb" for binary reading). Under Python 3, binary files and text files are clearly distinct and mutually incompatible; see the "io" module for details. Therefore, you **must** make a decision of whether a file will be used for binary access (allowing binary data to be read and/or written) or textual access (allowing text data to be read and/or written). You should also use "io.open()" for opening files instead of the built-in "open()" function as the "io" module is consistent from Python 2 to 3 while the built-in "open()" function is not (in Python 3 it’s actually "io.open()"). Do not bother with the outdated practice of using "codecs.open()" as that’s only necessary for keeping compatibility with Python 2.5. The constructors of both "str" and "bytes" have different semantics for the same arguments between Python 2 & 3. Passing an integer to "bytes" in Python 2 will give you the string representation of the integer: "bytes(3) == '3'". But in Python 3, an integer argument to "bytes" will give you a bytes object as long as the integer specified, filled with null bytes: "bytes(3) == b'\x00\x00\x00'". A similar worry is necessary when passing a bytes object to "str". In Python 2 you just get the bytes object back: "str(b'3') == b'3'". But in Python 3 you get the string representation of the bytes object: "str(b'3') == "b'3'"". Finally, the indexing of binary data requires careful handling (slicing does **not** require any special handling). In Python 2, "b'123'[1] == b'2'" while in Python 3 "b'123'[1] == 50". Because binary data is simply a collection of binary numbers, Python 3 returns the integer value for the byte you index on. But in Python 2 because "bytes == str", indexing returns a one-item slice of bytes. The six project has a function named "six.indexbytes()" which will return an integer like in Python 3: "six.indexbytes(b'123', 1)". To summarize: 1. Decide which of your APIs take text and which take binary data 2. Make sure that your code that works with text also works with "unicode" and code for binary data works with "bytes" in Python 2 (see the table above for what methods you cannot use for each type) 3. Mark all binary literals with a "b" prefix, textual literals with a "u" prefix 4. Decode binary data to text as soon as possible, encode text as binary data as late as possible 5. Open files using "io.open()" and make sure to specify the "b" mode when appropriate 6. Be careful when indexing into binary data Use feature detection instead of version detection ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Inevitably you will have code that has to choose what to do based on what version of Python is running. The best way to do this is with feature detection of whether the version of Python you’re running under supports what you need. If for some reason that doesn’t work then you should make the version check be against Python 2 and not Python 3. To help explain this, let’s look at an example. Let’s pretend that you need access to a feature of "importlib" that is available in Python’s standard library since Python 3.3 and available for Python 2 through importlib2 on PyPI. You might be tempted to write code to access e.g. the "importlib.abc" module by doing the following: import sys if sys.version_info[0] == 3: from importlib import abc else: from importlib2 import abc The problem with this code is what happens when Python 4 comes out? It would be better to treat Python 2 as the exceptional case instead of Python 3 and assume that future Python versions will be more compatible with Python 3 than Python 2: import sys if sys.version_info[0] > 2: from importlib import abc else: from importlib2 import abc The best solution, though, is to do no version detection at all and instead rely on feature detection. That avoids any potential issues of getting the version detection wrong and helps keep you future- compatible: try: from importlib import abc except ImportError: from importlib2 import abc Prevent compatibility regressions --------------------------------- Once you have fully translated your code to be compatible with Python 3, you will want to make sure your code doesn’t regress and stop working under Python 3. This is especially true if you have a dependency which is blocking you from actually running under Python 3 at the moment. To help with staying compatible, any new modules you create should have at least the following block of code at the top of it: from __future__ import absolute_import from __future__ import division from __future__ import print_function You can also run Python 2 with the "-3" flag to be warned about various compatibility issues your code triggers during execution. If you turn warnings into errors with "-Werror" then you can make sure that you don’t accidentally miss a warning. You can also use the Pylint project and its "--py3k" flag to lint your code to receive warnings when your code begins to deviate from Python 3 compatibility. This also prevents you from having to run Modernize or Futurize over your code regularly to catch compatibility regressions. This does require you only support Python 2.7 and Python 3.4 or newer as that is Pylint’s minimum Python version support. Check which dependencies block your transition ---------------------------------------------- **After** you have made your code compatible with Python 3 you should begin to care about whether your dependencies have also been ported. The caniusepython3 project was created to help you determine which projects – directly or indirectly – are blocking you from supporting Python 3. There is both a command-line tool as well as a web interface at https://caniusepython3.com. The project also provides code which you can integrate into your test suite so that you will have a failing test when you no longer have dependencies blocking you from using Python 3. This allows you to avoid having to manually check your dependencies and to be notified quickly when you can start running on Python 3. Update your "setup.py" file to denote Python 3 compatibility ------------------------------------------------------------ Once your code works under Python 3, you should update the classifiers in your "setup.py" to contain "Programming Language :: Python :: 3" and to not specify sole Python 2 support. This will tell anyone using your code that you support Python 2 **and** 3. Ideally you will also want to add classifiers for each major/minor version of Python you now support. Use continuous integration to stay compatible --------------------------------------------- Once you are able to fully run under Python 3 you will want to make sure your code always works under both Python 2 & 3. Probably the best tool for running your tests under multiple Python interpreters is tox. You can then integrate tox with your continuous integration system so that you never accidentally break Python 2 or 3 support. You may also want to use the "-bb" flag with the Python 3 interpreter to trigger an exception when you are comparing bytes to strings or bytes to an int (the latter is available starting in Python 3.5). By default type-differing comparisons simply return "False", but if you made a mistake in your separation of text/binary data handling or indexing on bytes you wouldn’t easily find the mistake. This flag will raise an exception when these kinds of comparisons occur, making the mistake much easier to track down. And that’s mostly it! At this point your code base is compatible with both Python 2 and 3 simultaneously. Your testing will also be set up so that you don’t accidentally break Python 2 or 3 compatibility regardless of which version you typically run your tests under while developing. Consider using optional static type checking -------------------------------------------- Another way to help port your code is to use a static type checker like mypy or pytype on your code. These tools can be used to analyze your code as if it’s being run under Python 2, then you can run the tool a second time as if your code is running under Python 3. By running a static type checker twice like this you can discover if you’re e.g. misusing binary data type in one version of Python compared to another. If you add optional type hints to your code you can also explicitly state whether your APIs use textual or binary data, helping to make sure everything functions as expected in both versions of Python. Regular Expression HOWTO ************************ Author: A.M. Kuchling Abstract ^^^^^^^^ This document is an introductory tutorial to using regular expressions in Python with the "re" module. It provides a gentler introduction than the corresponding section in the Library Reference. Introduction ============ Regular expressions (called REs, or regexes, or regex patterns) are essentially a tiny, highly specialized programming language embedded inside Python and made available through the "re" module. Using this little language, you specify the rules for the set of possible strings that you want to match; this set might contain English sentences, or e-mail addresses, or TeX commands, or anything you like. You can then ask questions such as “Does this string match the pattern?”, or “Is there a match for the pattern anywhere in this string?”. You can also use REs to modify a string or to split it apart in various ways. Regular expression patterns are compiled into a series of bytecodes which are then executed by a matching engine written in C. For advanced use, it may be necessary to pay careful attention to how the engine will execute a given RE, and write the RE in a certain way in order to produce bytecode that runs faster. Optimization isn’t covered in this document, because it requires that you have a good understanding of the matching engine’s internals. The regular expression language is relatively small and restricted, so not all possible string processing tasks can be done using regular expressions. There are also tasks that *can* be done with regular expressions, but the expressions turn out to be very complicated. In these cases, you may be better off writing Python code to do the processing; while Python code will be slower than an elaborate regular expression, it will also probably be more understandable. Simple Patterns =============== We’ll start by learning about the simplest possible regular expressions. Since regular expressions are used to operate on strings, we’ll begin with the most common task: matching characters. For a detailed explanation of the computer science underlying regular expressions (deterministic and non-deterministic finite automata), you can refer to almost any textbook on writing compilers. Matching Characters ------------------- Most letters and characters will simply match themselves. For example, the regular expression "test" will match the string "test" exactly. (You can enable a case-insensitive mode that would let this RE match "Test" or "TEST" as well; more about this later.) There are exceptions to this rule; some characters are special *metacharacters*, and don’t match themselves. Instead, they signal that some out-of-the-ordinary thing should be matched, or they affect other portions of the RE by repeating them or changing their meaning. Much of this document is devoted to discussing various metacharacters and what they do. Here’s a complete list of the metacharacters; their meanings will be discussed in the rest of this HOWTO. . ^ $ * + ? { } [ ] \ | ( ) The first metacharacters we’ll look at are "[" and "]". They’re used for specifying a character class, which is a set of characters that you wish to match. Characters can be listed individually, or a range of characters can be indicated by giving two characters and separating them by a "'-'". For example, "[abc]" will match any of the characters "a", "b", or "c"; this is the same as "[a-c]", which uses a range to express the same set of characters. If you wanted to match only lowercase letters, your RE would be "[a-z]". Metacharacters (except "\") are not active inside classes. For example, "[akm$]" will match any of the characters "'a'", "'k'", "'m'", or "'$'"; "'$'" is usually a metacharacter, but inside a character class it’s stripped of its special nature. You can match the characters not listed within the class by *complementing* the set. This is indicated by including a "'^'" as the first character of the class. For example, "[^5]" will match any character except "'5'". If the caret appears elsewhere in a character class, it does not have special meaning. For example: "[5^]" will match either a "'5'" or a "'^'". Perhaps the most important metacharacter is the backslash, "\". As in Python string literals, the backslash can be followed by various characters to signal various special sequences. It’s also used to escape all the metacharacters so you can still match them in patterns; for example, if you need to match a "[" or "\", you can precede them with a backslash to remove their special meaning: "\[" or "\\". Some of the special sequences beginning with "'\'" represent predefined sets of characters that are often useful, such as the set of digits, the set of letters, or the set of anything that isn’t whitespace. Let’s take an example: "\w" matches any alphanumeric character. If the regex pattern is expressed in bytes, this is equivalent to the class "[a-zA-Z0-9_]". If the regex pattern is a string, "\w" will match all the characters marked as letters in the Unicode database provided by the "unicodedata" module. You can use the more restricted definition of "\w" in a string pattern by supplying the "re.ASCII" flag when compiling the regular expression. The following list of special sequences isn’t complete. For a complete list of sequences and expanded class definitions for Unicode string patterns, see the last part of Regular Expression Syntax in the Standard Library reference. In general, the Unicode versions match any character that’s in the appropriate category in the Unicode database. "\d" Matches any decimal digit; this is equivalent to the class "[0-9]". "\D" Matches any non-digit character; this is equivalent to the class "[^0-9]". "\s" Matches any whitespace character; this is equivalent to the class "[ \t\n\r\f\v]". "\S" Matches any non-whitespace character; this is equivalent to the class "[^ \t\n\r\f\v]". "\w" Matches any alphanumeric character; this is equivalent to the class "[a-zA-Z0-9_]". "\W" Matches any non-alphanumeric character; this is equivalent to the class "[^a-zA-Z0-9_]". These sequences can be included inside a character class. For example, "[\s,.]" is a character class that will match any whitespace character, or "','" or "'.'". The final metacharacter in this section is ".". It matches anything except a newline character, and there’s an alternate mode ("re.DOTALL") where it will match even a newline. "." is often used where you want to match “any character”. Repeating Things ---------------- Being able to match varying sets of characters is the first thing regular expressions can do that isn’t already possible with the methods available on strings. However, if that was the only additional capability of regexes, they wouldn’t be much of an advance. Another capability is that you can specify that portions of the RE must be repeated a certain number of times. The first metacharacter for repeating things that we’ll look at is "*". "*" doesn’t match the literal character "'*'"; instead, it specifies that the previous character can be matched zero or more times, instead of exactly once. For example, "ca*t" will match "'ct'" (0 "'a'" characters), "'cat'" (1 "'a'"), "'caaat'" (3 "'a'" characters), and so forth. Repetitions such as "*" are *greedy*; when repeating a RE, the matching engine will try to repeat it as many times as possible. If later portions of the pattern don’t match, the matching engine will then back up and try again with fewer repetitions. A step-by-step example will make this more obvious. Let’s consider the expression "a[bcd]*b". This matches the letter "'a'", zero or more letters from the class "[bcd]", and finally ends with a "'b'". Now imagine matching this RE against the string "'abcbd'". +--------+-------------+-----------------------------------+ | Step | Matched | Explanation | |========|=============|===================================| | 1 | "a" | The "a" in the RE matches. | +--------+-------------+-----------------------------------+ | 2 | "abcbd" | The engine matches "[bcd]*", | | | | going as far as it can, which is | | | | to the end of the string. | +--------+-------------+-----------------------------------+ | 3 | *Failure* | The engine tries to match "b", | | | | but the current position is at | | | | the end of the string, so it | | | | fails. | +--------+-------------+-----------------------------------+ | 4 | "abcb" | Back up, so that "[bcd]*" | | | | matches one less character. | +--------+-------------+-----------------------------------+ | 5 | *Failure* | Try "b" again, but the current | | | | position is at the last | | | | character, which is a "'d'". | +--------+-------------+-----------------------------------+ | 6 | "abc" | Back up again, so that "[bcd]*" | | | | is only matching "bc". | +--------+-------------+-----------------------------------+ | 6 | "abcb" | Try "b" again. This time the | | | | character at the current position | | | | is "'b'", so it succeeds. | +--------+-------------+-----------------------------------+ The end of the RE has now been reached, and it has matched "'abcb'". This demonstrates how the matching engine goes as far as it can at first, and if no match is found it will then progressively back up and retry the rest of the RE again and again. It will back up until it has tried zero matches for "[bcd]*", and if that subsequently fails, the engine will conclude that the string doesn’t match the RE at all. Another repeating metacharacter is "+", which matches one or more times. Pay careful attention to the difference between "*" and "+"; "*" matches *zero* or more times, so whatever’s being repeated may not be present at all, while "+" requires at least *one* occurrence. To use a similar example, "ca+t" will match "'cat'" (1 "'a'"), "'caaat'" (3 "'a'"s), but won’t match "'ct'". There are two more repeating qualifiers. The question mark character, "?", matches either once or zero times; you can think of it as marking something as being optional. For example, "home-?brew" matches either "'homebrew'" or "'home-brew'". The most complicated repeated qualifier is "{m,n}", where *m* and *n* are decimal integers. This qualifier means there must be at least *m* repetitions, and at most *n*. For example, "a/{1,3}b" will match "'a/b'", "'a//b'", and "'a///b'". It won’t match "'ab'", which has no slashes, or "'a////b'", which has four. You can omit either *m* or *n*; in that case, a reasonable value is assumed for the missing value. Omitting *m* is interpreted as a lower limit of 0, while omitting *n* results in an upper bound of infinity. Readers of a reductionist bent may notice that the three other qualifiers can all be expressed using this notation. "{0,}" is the same as "*", "{1,}" is equivalent to "+", and "{0,1}" is the same as "?". It’s better to use "*", "+", or "?" when you can, simply because they’re shorter and easier to read. Using Regular Expressions ========================= Now that we’ve looked at some simple regular expressions, how do we actually use them in Python? The "re" module provides an interface to the regular expression engine, allowing you to compile REs into objects and then perform matches with them. Compiling Regular Expressions ----------------------------- Regular expressions are compiled into pattern objects, which have methods for various operations such as searching for pattern matches or performing string substitutions. >>> import re >>> p = re.compile('ab*') >>> p re.compile('ab*') "re.compile()" also accepts an optional *flags* argument, used to enable various special features and syntax variations. We’ll go over the available settings later, but for now a single example will do: >>> p = re.compile('ab*', re.IGNORECASE) The RE is passed to "re.compile()" as a string. REs are handled as strings because regular expressions aren’t part of the core Python language, and no special syntax was created for expressing them. (There are applications that don’t need REs at all, so there’s no need to bloat the language specification by including them.) Instead, the "re" module is simply a C extension module included with Python, just like the "socket" or "zlib" modules. Putting REs in strings keeps the Python language simpler, but has one disadvantage which is the topic of the next section. The Backslash Plague -------------------- As stated earlier, regular expressions use the backslash character ("'\'") to indicate special forms or to allow special characters to be used without invoking their special meaning. This conflicts with Python’s usage of the same character for the same purpose in string literals. Let’s say you want to write a RE that matches the string "\section", which might be found in a LaTeX file. To figure out what to write in the program code, start with the desired string to be matched. Next, you must escape any backslashes and other metacharacters by preceding them with a backslash, resulting in the string "\\section". The resulting string that must be passed to "re.compile()" must be "\\section". However, to express this as a Python string literal, both backslashes must be escaped *again*. +---------------------+--------------------------------------------+ | Characters | Stage | |=====================|============================================| | "\section" | Text string to be matched | +---------------------+--------------------------------------------+ | "\\section" | Escaped backslash for "re.compile()" | +---------------------+--------------------------------------------+ | ""\\\\section"" | Escaped backslashes for a string literal | +---------------------+--------------------------------------------+ In short, to match a literal backslash, one has to write "'\\\\'" as the RE string, because the regular expression must be "\\", and each backslash must be expressed as "\\" inside a regular Python string literal. In REs that feature backslashes repeatedly, this leads to lots of repeated backslashes and makes the resulting strings difficult to understand. The solution is to use Python’s raw string notation for regular expressions; backslashes are not handled in any special way in a string literal prefixed with "'r'", so "r"\n"" is a two-character string containing "'\'" and "'n'", while ""\n"" is a one-character string containing a newline. Regular expressions will often be written in Python code using this raw string notation. In addition, special escape sequences that are valid in regular expressions, but not valid as Python string literals, now result in a "DeprecationWarning" and will eventually become a "SyntaxError", which means the sequences will be invalid if raw string notation or escaping the backslashes isn’t used. +---------------------+--------------------+ | Regular String | Raw string | |=====================|====================| | ""ab*"" | "r"ab*"" | +---------------------+--------------------+ | ""\\\\section"" | "r"\\section"" | +---------------------+--------------------+ | ""\\w+\\s+\\1"" | "r"\w+\s+\1"" | +---------------------+--------------------+ Performing Matches ------------------ Once you have an object representing a compiled regular expression, what do you do with it? Pattern objects have several methods and attributes. Only the most significant ones will be covered here; consult the "re" docs for a complete listing. +--------------------+-------------------------------------------------+ | Method/Attribute | Purpose | |====================|=================================================| | "match()" | Determine if the RE matches at the beginning of | | | the string. | +--------------------+-------------------------------------------------+ | "search()" | Scan through a string, looking for any location | | | where this RE matches. | +--------------------+-------------------------------------------------+ | "findall()" | Find all substrings where the RE matches, and | | | returns them as a list. | +--------------------+-------------------------------------------------+ | "finditer()" | Find all substrings where the RE matches, and | | | returns them as an *iterator*. | +--------------------+-------------------------------------------------+ "match()" and "search()" return "None" if no match can be found. If they’re successful, a match object instance is returned, containing information about the match: where it starts and ends, the substring it matched, and more. You can learn about this by interactively experimenting with the "re" module. If you have "tkinter" available, you may also want to look at Tools/demo/redemo.py, a demonstration program included with the Python distribution. It allows you to enter REs and strings, and displays whether the RE matches or fails. "redemo.py" can be quite useful when trying to debug a complicated RE. This HOWTO uses the standard Python interpreter for its examples. First, run the Python interpreter, import the "re" module, and compile a RE: >>> import re >>> p = re.compile('[a-z]+') >>> p re.compile('[a-z]+') Now, you can try matching various strings against the RE "[a-z]+". An empty string shouldn’t match at all, since "+" means ‘one or more repetitions’. "match()" should return "None" in this case, which will cause the interpreter to print no output. You can explicitly print the result of "match()" to make this clear. >>> p.match("") >>> print(p.match("")) None Now, let’s try it on a string that it should match, such as "tempo". In this case, "match()" will return a match object, so you should store the result in a variable for later use. >>> m = p.match('tempo') >>> m Now you can query the match object for information about the matching string. Match object instances also have several methods and attributes; the most important ones are: +--------------------+----------------------------------------------+ | Method/Attribute | Purpose | |====================|==============================================| | "group()" | Return the string matched by the RE | +--------------------+----------------------------------------------+ | "start()" | Return the starting position of the match | +--------------------+----------------------------------------------+ | "end()" | Return the ending position of the match | +--------------------+----------------------------------------------+ | "span()" | Return a tuple containing the (start, end) | | | positions of the match | +--------------------+----------------------------------------------+ Trying these methods will soon clarify their meaning: >>> m.group() 'tempo' >>> m.start(), m.end() (0, 5) >>> m.span() (0, 5) "group()" returns the substring that was matched by the RE. "start()" and "end()" return the starting and ending index of the match. "span()" returns both start and end indexes in a single tuple. Since the "match()" method only checks if the RE matches at the start of a string, "start()" will always be zero. However, the "search()" method of patterns scans through the string, so the match may not start at zero in that case. >>> print(p.match('::: message')) None >>> m = p.search('::: message'); print(m) >>> m.group() 'message' >>> m.span() (4, 11) In actual programs, the most common style is to store the match object in a variable, and then check if it was "None". This usually looks like: p = re.compile( ... ) m = p.match( 'string goes here' ) if m: print('Match found: ', m.group()) else: print('No match') Two pattern methods return all of the matches for a pattern. "findall()" returns a list of matching strings: >>> p = re.compile(r'\d+') >>> p.findall('12 drummers drumming, 11 pipers piping, 10 lords a-leaping') ['12', '11', '10'] The "r" prefix, making the literal a raw string literal, is needed in this example because escape sequences in a normal “cooked” string literal that are not recognized by Python, as opposed to regular expressions, now result in a "DeprecationWarning" and will eventually become a "SyntaxError". See The Backslash Plague. "findall()" has to create the entire list before it can be returned as the result. The "finditer()" method returns a sequence of match object instances as an *iterator*: >>> iterator = p.finditer('12 drummers drumming, 11 ... 10 ...') >>> iterator >>> for match in iterator: ... print(match.span()) ... (0, 2) (22, 24) (29, 31) Module-Level Functions ---------------------- You don’t have to create a pattern object and call its methods; the "re" module also provides top-level functions called "match()", "search()", "findall()", "sub()", and so forth. These functions take the same arguments as the corresponding pattern method with the RE string added as the first argument, and still return either "None" or a match object instance. >>> print(re.match(r'From\s+', 'Fromage amk')) None >>> re.match(r'From\s+', 'From amk Thu May 14 19:12:10 1998') Under the hood, these functions simply create a pattern object for you and call the appropriate method on it. They also store the compiled object in a cache, so future calls using the same RE won’t need to parse the pattern again and again. Should you use these module-level functions, or should you get the pattern and call its methods yourself? If you’re accessing a regex within a loop, pre-compiling it will save a few function calls. Outside of loops, there’s not much difference thanks to the internal cache. Compilation Flags ----------------- Compilation flags let you modify some aspects of how regular expressions work. Flags are available in the "re" module under two names, a long name such as "IGNORECASE" and a short, one-letter form such as "I". (If you’re familiar with Perl’s pattern modifiers, the one-letter forms use the same letters; the short form of "re.VERBOSE" is "re.X", for example.) Multiple flags can be specified by bitwise OR-ing them; "re.I | re.M" sets both the "I" and "M" flags, for example. Here’s a table of the available flags, followed by a more detailed explanation of each one. +-----------------------------------+----------------------------------------------+ | Flag | Meaning | |===================================|==============================================| | "ASCII", "A" | Makes several escapes like "\w", "\b", "\s" | | | and "\d" match only on ASCII characters with | | | the respective property. | +-----------------------------------+----------------------------------------------+ | "DOTALL", "S" | Make "." match any character, including | | | newlines. | +-----------------------------------+----------------------------------------------+ | "IGNORECASE", "I" | Do case-insensitive matches. | +-----------------------------------+----------------------------------------------+ | "LOCALE", "L" | Do a locale-aware match. | +-----------------------------------+----------------------------------------------+ | "MULTILINE", "M" | Multi-line matching, affecting "^" and "$". | +-----------------------------------+----------------------------------------------+ | "VERBOSE", "X" (for ‘extended’) | Enable verbose REs, which can be organized | | | more cleanly and understandably. | +-----------------------------------+----------------------------------------------+ I IGNORECASE Perform case-insensitive matching; character class and literal strings will match letters by ignoring case. For example, "[A-Z]" will match lowercase letters, too. Full Unicode matching also works unless the "ASCII" flag is used to disable non-ASCII matches. When the Unicode patterns "[a-z]" or "[A-Z]" are used in combination with the "IGNORECASE" flag, they will match the 52 ASCII letters and 4 additional non-ASCII letters: ‘İ’ (U+0130, Latin capital letter I with dot above), ‘ı’ (U+0131, Latin small letter dotless i), ‘ſ’ (U+017F, Latin small letter long s) and ‘K’ (U+212A, Kelvin sign). "Spam" will match "'Spam'", "'spam'", "'spAM'", or "'ſpam'" (the latter is matched only in Unicode mode). This lowercasing doesn’t take the current locale into account; it will if you also set the "LOCALE" flag. L LOCALE Make "\w", "\W", "\b", "\B" and case-insensitive matching dependent on the current locale instead of the Unicode database. Locales are a feature of the C library intended to help in writing programs that take account of language differences. For example, if you’re processing encoded French text, you’d want to be able to write "\w+" to match words, but "\w" only matches the character class "[A-Za-z]" in bytes patterns; it won’t match bytes corresponding to "é" or "ç". If your system is configured properly and a French locale is selected, certain C functions will tell the program that the byte corresponding to "é" should also be considered a letter. Setting the "LOCALE" flag when compiling a regular expression will cause the resulting compiled object to use these C functions for "\w"; this is slower, but also enables "\w+" to match French words as you’d expect. The use of this flag is discouraged in Python 3 as the locale mechanism is very unreliable, it only handles one “culture” at a time, and it only works with 8-bit locales. Unicode matching is already enabled by default in Python 3 for Unicode (str) patterns, and it is able to handle different locales/languages. M MULTILINE ("^" and "$" haven’t been explained yet; they’ll be introduced in section More Metacharacters.) Usually "^" matches only at the beginning of the string, and "$" matches only at the end of the string and immediately before the newline (if any) at the end of the string. When this flag is specified, "^" matches at the beginning of the string and at the beginning of each line within the string, immediately following each newline. Similarly, the "$" metacharacter matches either at the end of the string and at the end of each line (immediately preceding each newline). S DOTALL Makes the "'.'" special character match any character at all, including a newline; without this flag, "'.'" will match anything *except* a newline. A ASCII Make "\w", "\W", "\b", "\B", "\s" and "\S" perform ASCII-only matching instead of full Unicode matching. This is only meaningful for Unicode patterns, and is ignored for byte patterns. X VERBOSE This flag allows you to write regular expressions that are more readable by granting you more flexibility in how you can format them. When this flag has been specified, whitespace within the RE string is ignored, except when the whitespace is in a character class or preceded by an unescaped backslash; this lets you organize and indent the RE more clearly. This flag also lets you put comments within a RE that will be ignored by the engine; comments are marked by a "'#'" that’s neither in a character class or preceded by an unescaped backslash. For example, here’s a RE that uses "re.VERBOSE"; see how much easier it is to read? charref = re.compile(r""" &[#] # Start of a numeric entity reference ( 0[0-7]+ # Octal form | [0-9]+ # Decimal form | x[0-9a-fA-F]+ # Hexadecimal form ) ; # Trailing semicolon """, re.VERBOSE) Without the verbose setting, the RE would look like this: charref = re.compile("&#(0[0-7]+" "|[0-9]+" "|x[0-9a-fA-F]+);") In the above example, Python’s automatic concatenation of string literals has been used to break up the RE into smaller pieces, but it’s still more difficult to understand than the version using "re.VERBOSE". More Pattern Power ================== So far we’ve only covered a part of the features of regular expressions. In this section, we’ll cover some new metacharacters, and how to use groups to retrieve portions of the text that was matched. More Metacharacters ------------------- There are some metacharacters that we haven’t covered yet. Most of them will be covered in this section. Some of the remaining metacharacters to be discussed are *zero-width assertions*. They don’t cause the engine to advance through the string; instead, they consume no characters at all, and simply succeed or fail. For example, "\b" is an assertion that the current position is located at a word boundary; the position isn’t changed by the "\b" at all. This means that zero-width assertions should never be repeated, because if they match once at a given location, they can obviously be matched an infinite number of times. "|" Alternation, or the “or” operator. If *A* and *B* are regular expressions, "A|B" will match any string that matches either *A* or *B*. "|" has very low precedence in order to make it work reasonably when you’re alternating multi-character strings. "Crow|Servo" will match either "'Crow'" or "'Servo'", not "'Cro'", a "'w'" or an "'S'", and "'ervo'". To match a literal "'|'", use "\|", or enclose it inside a character class, as in "[|]". "^" Matches at the beginning of lines. Unless the "MULTILINE" flag has been set, this will only match at the beginning of the string. In "MULTILINE" mode, this also matches immediately after each newline within the string. For example, if you wish to match the word "From" only at the beginning of a line, the RE to use is "^From". >>> print(re.search('^From', 'From Here to Eternity')) >>> print(re.search('^From', 'Reciting From Memory')) None To match a literal "'^'", use "\^". "$" Matches at the end of a line, which is defined as either the end of the string, or any location followed by a newline character. >>> print(re.search('}$', '{block}')) >>> print(re.search('}$', '{block} ')) None >>> print(re.search('}$', '{block}\n')) To match a literal "'$'", use "\$" or enclose it inside a character class, as in "[$]". "\A" Matches only at the start of the string. When not in "MULTILINE" mode, "\A" and "^" are effectively the same. In "MULTILINE" mode, they’re different: "\A" still matches only at the beginning of the string, but "^" may match at any location inside the string that follows a newline character. "\Z" Matches only at the end of the string. "\b" Word boundary. This is a zero-width assertion that matches only at the beginning or end of a word. A word is defined as a sequence of alphanumeric characters, so the end of a word is indicated by whitespace or a non-alphanumeric character. The following example matches "class" only when it’s a complete word; it won’t match when it’s contained inside another word. >>> p = re.compile(r'\bclass\b') >>> print(p.search('no class at all')) >>> print(p.search('the declassified algorithm')) None >>> print(p.search('one subclass is')) None There are two subtleties you should remember when using this special sequence. First, this is the worst collision between Python’s string literals and regular expression sequences. In Python’s string literals, "\b" is the backspace character, ASCII value 8. If you’re not using raw strings, then Python will convert the "\b" to a backspace, and your RE won’t match as you expect it to. The following example looks the same as our previous RE, but omits the "'r'" in front of the RE string. >>> p = re.compile('\bclass\b') >>> print(p.search('no class at all')) None >>> print(p.search('\b' + 'class' + '\b')) Second, inside a character class, where there’s no use for this assertion, "\b" represents the backspace character, for compatibility with Python’s string literals. "\B" Another zero-width assertion, this is the opposite of "\b", only matching when the current position is not at a word boundary. Grouping -------- Frequently you need to obtain more information than just whether the RE matched or not. Regular expressions are often used to dissect strings by writing a RE divided into several subgroups which match different components of interest. For example, an RFC-822 header line is divided into a header name and a value, separated by a "':'", like this: From: author@example.com User-Agent: Thunderbird 1.5.0.9 (X11/20061227) MIME-Version: 1.0 To: editor@example.com This can be handled by writing a regular expression which matches an entire header line, and has one group which matches the header name, and another group which matches the header’s value. Groups are marked by the "'('", "')'" metacharacters. "'('" and "')'" have much the same meaning as they do in mathematical expressions; they group together the expressions contained inside them, and you can repeat the contents of a group with a repeating qualifier, such as "*", "+", "?", or "{m,n}". For example, "(ab)*" will match zero or more repetitions of "ab". >>> p = re.compile('(ab)*') >>> print(p.match('ababababab').span()) (0, 10) Groups indicated with "'('", "')'" also capture the starting and ending index of the text that they match; this can be retrieved by passing an argument to "group()", "start()", "end()", and "span()". Groups are numbered starting with 0. Group 0 is always present; it’s the whole RE, so match object methods all have group 0 as their default argument. Later we’ll see how to express groups that don’t capture the span of text that they match. >>> p = re.compile('(a)b') >>> m = p.match('ab') >>> m.group() 'ab' >>> m.group(0) 'ab' Subgroups are numbered from left to right, from 1 upward. Groups can be nested; to determine the number, just count the opening parenthesis characters, going from left to right. >>> p = re.compile('(a(b)c)d') >>> m = p.match('abcd') >>> m.group(0) 'abcd' >>> m.group(1) 'abc' >>> m.group(2) 'b' "group()" can be passed multiple group numbers at a time, in which case it will return a tuple containing the corresponding values for those groups. >>> m.group(2,1,2) ('b', 'abc', 'b') The "groups()" method returns a tuple containing the strings for all the subgroups, from 1 up to however many there are. >>> m.groups() ('abc', 'b') Backreferences in a pattern allow you to specify that the contents of an earlier capturing group must also be found at the current location in the string. For example, "\1" will succeed if the exact contents of group 1 can be found at the current position, and fails otherwise. Remember that Python’s string literals also use a backslash followed by numbers to allow including arbitrary characters in a string, so be sure to use a raw string when incorporating backreferences in a RE. For example, the following RE detects doubled words in a string. >>> p = re.compile(r'\b(\w+)\s+\1\b') >>> p.search('Paris in the the spring').group() 'the the' Backreferences like this aren’t often useful for just searching through a string — there are few text formats which repeat data in this way — but you’ll soon find out that they’re *very* useful when performing string substitutions. Non-capturing and Named Groups ------------------------------ Elaborate REs may use many groups, both to capture substrings of interest, and to group and structure the RE itself. In complex REs, it becomes difficult to keep track of the group numbers. There are two features which help with this problem. Both of them use a common syntax for regular expression extensions, so we’ll look at that first. Perl 5 is well known for its powerful additions to standard regular expressions. For these new features the Perl developers couldn’t choose new single-keystroke metacharacters or new special sequences beginning with "\" without making Perl’s regular expressions confusingly different from standard REs. If they chose "&" as a new metacharacter, for example, old expressions would be assuming that "&" was a regular character and wouldn’t have escaped it by writing "\&" or "[&]". The solution chosen by the Perl developers was to use "(?...)" as the extension syntax. "?" immediately after a parenthesis was a syntax error because the "?" would have nothing to repeat, so this didn’t introduce any compatibility problems. The characters immediately after the "?" indicate what extension is being used, so "(?=foo)" is one thing (a positive lookahead assertion) and "(?:foo)" is something else (a non-capturing group containing the subexpression "foo"). Python supports several of Perl’s extensions and adds an extension syntax to Perl’s extension syntax. If the first character after the question mark is a "P", you know that it’s an extension that’s specific to Python. Now that we’ve looked at the general extension syntax, we can return to the features that simplify working with groups in complex REs. Sometimes you’ll want to use a group to denote a part of a regular expression, but aren’t interested in retrieving the group’s contents. You can make this fact explicit by using a non-capturing group: "(?:...)", where you can replace the "..." with any other regular expression. >>> m = re.match("([abc])+", "abc") >>> m.groups() ('c',) >>> m = re.match("(?:[abc])+", "abc") >>> m.groups() () Except for the fact that you can’t retrieve the contents of what the group matched, a non-capturing group behaves exactly the same as a capturing group; you can put anything inside it, repeat it with a repetition metacharacter such as "*", and nest it within other groups (capturing or non-capturing). "(?:...)" is particularly useful when modifying an existing pattern, since you can add new groups without changing how all the other groups are numbered. It should be mentioned that there’s no performance difference in searching between capturing and non-capturing groups; neither form is any faster than the other. A more significant feature is named groups: instead of referring to them by numbers, groups can be referenced by a name. The syntax for a named group is one of the Python-specific extensions: "(?P...)". *name* is, obviously, the name of the group. Named groups behave exactly like capturing groups, and additionally associate a name with a group. The match object methods that deal with capturing groups all accept either integers that refer to the group by number or strings that contain the desired group’s name. Named groups are still given numbers, so you can retrieve information about a group in two ways: >>> p = re.compile(r'(?P\b\w+\b)') >>> m = p.search( '(((( Lots of punctuation )))' ) >>> m.group('word') 'Lots' >>> m.group(1) 'Lots' Additionally, you can retrieve named groups as a dictionary with "groupdict()": >>> m = re.match(r'(?P\w+) (?P\w+)', 'Jane Doe') >>> m.groupdict() {'first': 'Jane', 'last': 'Doe'} Named groups are handy because they let you use easily-remembered names, instead of having to remember numbers. Here’s an example RE from the "imaplib" module: InternalDate = re.compile(r'INTERNALDATE "' r'(?P[ 123][0-9])-(?P[A-Z][a-z][a-z])-' r'(?P[0-9][0-9][0-9][0-9])' r' (?P[0-9][0-9]):(?P[0-9][0-9]):(?P[0-9][0-9])' r' (?P[-+])(?P[0-9][0-9])(?P[0-9][0-9])' r'"') It’s obviously much easier to retrieve "m.group('zonem')", instead of having to remember to retrieve group 9. The syntax for backreferences in an expression such as "(...)\1" refers to the number of the group. There’s naturally a variant that uses the group name instead of the number. This is another Python extension: "(?P=name)" indicates that the contents of the group called *name* should again be matched at the current point. The regular expression for finding doubled words, "\b(\w+)\s+\1\b" can also be written as "\b(?P\w+)\s+(?P=word)\b": >>> p = re.compile(r'\b(?P\w+)\s+(?P=word)\b') >>> p.search('Paris in the the spring').group() 'the the' Lookahead Assertions -------------------- Another zero-width assertion is the lookahead assertion. Lookahead assertions are available in both positive and negative form, and look like this: "(?=...)" Positive lookahead assertion. This succeeds if the contained regular expression, represented here by "...", successfully matches at the current location, and fails otherwise. But, once the contained expression has been tried, the matching engine doesn’t advance at all; the rest of the pattern is tried right where the assertion started. "(?!...)" Negative lookahead assertion. This is the opposite of the positive assertion; it succeeds if the contained expression *doesn’t* match at the current position in the string. To make this concrete, let’s look at a case where a lookahead is useful. Consider a simple pattern to match a filename and split it apart into a base name and an extension, separated by a ".". For example, in "news.rc", "news" is the base name, and "rc" is the filename’s extension. The pattern to match this is quite simple: ".*[.].*$" Notice that the "." needs to be treated specially because it’s a metacharacter, so it’s inside a character class to only match that specific character. Also notice the trailing "$"; this is added to ensure that all the rest of the string must be included in the extension. This regular expression matches "foo.bar" and "autoexec.bat" and "sendmail.cf" and "printers.conf". Now, consider complicating the problem a bit; what if you want to match filenames where the extension is not "bat"? Some incorrect attempts: ".*[.][^b].*$" The first attempt above tries to exclude "bat" by requiring that the first character of the extension is not a "b". This is wrong, because the pattern also doesn’t match "foo.bar". ".*[.]([^b]..|.[^a].|..[^t])$" The expression gets messier when you try to patch up the first solution by requiring one of the following cases to match: the first character of the extension isn’t "b"; the second character isn’t "a"; or the third character isn’t "t". This accepts "foo.bar" and rejects "autoexec.bat", but it requires a three-letter extension and won’t accept a filename with a two-letter extension such as "sendmail.cf". We’ll complicate the pattern again in an effort to fix it. ".*[.]([^b].?.?|.[^a]?.?|..?[^t]?)$" In the third attempt, the second and third letters are all made optional in order to allow matching extensions shorter than three characters, such as "sendmail.cf". The pattern’s getting really complicated now, which makes it hard to read and understand. Worse, if the problem changes and you want to exclude both "bat" and "exe" as extensions, the pattern would get even more complicated and confusing. A negative lookahead cuts through all this confusion: ".*[.](?!bat$)[^.]*$" The negative lookahead means: if the expression "bat" doesn’t match at this point, try the rest of the pattern; if "bat$" does match, the whole pattern will fail. The trailing "$" is required to ensure that something like "sample.batch", where the extension only starts with "bat", will be allowed. The "[^.]*" makes sure that the pattern works when there are multiple dots in the filename. Excluding another filename extension is now easy; simply add it as an alternative inside the assertion. The following pattern excludes filenames that end in either "bat" or "exe": ".*[.](?!bat$|exe$)[^.]*$" Modifying Strings ================= Up to this point, we’ve simply performed searches against a static string. Regular expressions are also commonly used to modify strings in various ways, using the following pattern methods: +--------------------+-------------------------------------------------+ | Method/Attribute | Purpose | |====================|=================================================| | "split()" | Split the string into a list, splitting it | | | wherever the RE matches | +--------------------+-------------------------------------------------+ | "sub()" | Find all substrings where the RE matches, and | | | replace them with a different string | +--------------------+-------------------------------------------------+ | "subn()" | Does the same thing as "sub()", but returns | | | the new string and the number of replacements | +--------------------+-------------------------------------------------+ Splitting Strings ----------------- The "split()" method of a pattern splits a string apart wherever the RE matches, returning a list of the pieces. It’s similar to the "split()" method of strings but provides much more generality in the delimiters that you can split by; string "split()" only supports splitting by whitespace or by a fixed string. As you’d expect, there’s a module-level "re.split()" function, too. .split(string[, maxsplit=0]) Split *string* by the matches of the regular expression. If capturing parentheses are used in the RE, then their contents will also be returned as part of the resulting list. If *maxsplit* is nonzero, at most *maxsplit* splits are performed. You can limit the number of splits made, by passing a value for *maxsplit*. When *maxsplit* is nonzero, at most *maxsplit* splits will be made, and the remainder of the string is returned as the final element of the list. In the following example, the delimiter is any sequence of non-alphanumeric characters. >>> p = re.compile(r'\W+') >>> p.split('This is a test, short and sweet, of split().') ['This', 'is', 'a', 'test', 'short', 'and', 'sweet', 'of', 'split', ''] >>> p.split('This is a test, short and sweet, of split().', 3) ['This', 'is', 'a', 'test, short and sweet, of split().'] Sometimes you’re not only interested in what the text between delimiters is, but also need to know what the delimiter was. If capturing parentheses are used in the RE, then their values are also returned as part of the list. Compare the following calls: >>> p = re.compile(r'\W+') >>> p2 = re.compile(r'(\W+)') >>> p.split('This... is a test.') ['This', 'is', 'a', 'test', ''] >>> p2.split('This... is a test.') ['This', '... ', 'is', ' ', 'a', ' ', 'test', '.', ''] The module-level function "re.split()" adds the RE to be used as the first argument, but is otherwise the same. >>> re.split(r'[\W]+', 'Words, words, words.') ['Words', 'words', 'words', ''] >>> re.split(r'([\W]+)', 'Words, words, words.') ['Words', ', ', 'words', ', ', 'words', '.', ''] >>> re.split(r'[\W]+', 'Words, words, words.', 1) ['Words', 'words, words.'] Search and Replace ------------------ Another common task is to find all the matches for a pattern, and replace them with a different string. The "sub()" method takes a replacement value, which can be either a string or a function, and the string to be processed. .sub(replacement, string[, count=0]) Returns the string obtained by replacing the leftmost non- overlapping occurrences of the RE in *string* by the replacement *replacement*. If the pattern isn’t found, *string* is returned unchanged. The optional argument *count* is the maximum number of pattern occurrences to be replaced; *count* must be a non-negative integer. The default value of 0 means to replace all occurrences. Here’s a simple example of using the "sub()" method. It replaces colour names with the word "colour": >>> p = re.compile('(blue|white|red)') >>> p.sub('colour', 'blue socks and red shoes') 'colour socks and colour shoes' >>> p.sub('colour', 'blue socks and red shoes', count=1) 'colour socks and red shoes' The "subn()" method does the same work, but returns a 2-tuple containing the new string value and the number of replacements that were performed: >>> p = re.compile('(blue|white|red)') >>> p.subn('colour', 'blue socks and red shoes') ('colour socks and colour shoes', 2) >>> p.subn('colour', 'no colours at all') ('no colours at all', 0) Empty matches are replaced only when they’re not adjacent to a previous empty match. >>> p = re.compile('x*') >>> p.sub('-', 'abxd') '-a-b--d-' If *replacement* is a string, any backslash escapes in it are processed. That is, "\n" is converted to a single newline character, "\r" is converted to a carriage return, and so forth. Unknown escapes such as "\&" are left alone. Backreferences, such as "\6", are replaced with the substring matched by the corresponding group in the RE. This lets you incorporate portions of the original text in the resulting replacement string. This example matches the word "section" followed by a string enclosed in "{", "}", and changes "section" to "subsection": >>> p = re.compile('section{ ( [^}]* ) }', re.VERBOSE) >>> p.sub(r'subsection{\1}','section{First} section{second}') 'subsection{First} subsection{second}' There’s also a syntax for referring to named groups as defined by the "(?P...)" syntax. "\g" will use the substring matched by the group named "name", and "\g" uses the corresponding group number. "\g<2>" is therefore equivalent to "\2", but isn’t ambiguous in a replacement string such as "\g<2>0". ("\20" would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character "'0'".) The following substitutions are all equivalent, but use all three variations of the replacement string. >>> p = re.compile('section{ (?P [^}]* ) }', re.VERBOSE) >>> p.sub(r'subsection{\1}','section{First}') 'subsection{First}' >>> p.sub(r'subsection{\g<1>}','section{First}') 'subsection{First}' >>> p.sub(r'subsection{\g}','section{First}') 'subsection{First}' *replacement* can also be a function, which gives you even more control. If *replacement* is a function, the function is called for every non-overlapping occurrence of *pattern*. On each call, the function is passed a match object argument for the match and can use this information to compute the desired replacement string and return it. In the following example, the replacement function translates decimals into hexadecimal: >>> def hexrepl(match): ... "Return the hex string for a decimal number" ... value = int(match.group()) ... return hex(value) ... >>> p = re.compile(r'\d+') >>> p.sub(hexrepl, 'Call 65490 for printing, 49152 for user code.') 'Call 0xffd2 for printing, 0xc000 for user code.' When using the module-level "re.sub()" function, the pattern is passed as the first argument. The pattern may be provided as an object or as a string; if you need to specify regular expression flags, you must either use a pattern object as the first parameter, or use embedded modifiers in the pattern string, e.g. "sub("(?i)b+", "x", "bbbb BBBB")" returns "'x x'". Common Problems =============== Regular expressions are a powerful tool for some applications, but in some ways their behaviour isn’t intuitive and at times they don’t behave the way you may expect them to. This section will point out some of the most common pitfalls. Use String Methods ------------------ Sometimes using the "re" module is a mistake. If you’re matching a fixed string, or a single character class, and you’re not using any "re" features such as the "IGNORECASE" flag, then the full power of regular expressions may not be required. Strings have several methods for performing operations with fixed strings and they’re usually much faster, because the implementation is a single small C loop that’s been optimized for the purpose, instead of the large, more generalized regular expression engine. One example might be replacing a single fixed string with another one; for example, you might replace "word" with "deed". "re.sub()" seems like the function to use for this, but consider the "replace()" method. Note that "replace()" will also replace "word" inside words, turning "swordfish" into "sdeedfish", but the naive RE "word" would have done that, too. (To avoid performing the substitution on parts of words, the pattern would have to be "\bword\b", in order to require that "word" have a word boundary on either side. This takes the job beyond "replace()"’s abilities.) Another common task is deleting every occurrence of a single character from a string or replacing it with another single character. You might do this with something like "re.sub('\n', ' ', S)", but "translate()" is capable of doing both tasks and will be faster than any regular expression operation can be. In short, before turning to the "re" module, consider whether your problem can be solved with a faster and simpler string method. match() versus search() ----------------------- The "match()" function only checks if the RE matches at the beginning of the string while "search()" will scan forward through the string for a match. It’s important to keep this distinction in mind. Remember, "match()" will only report a successful match which will start at 0; if the match wouldn’t start at zero, "match()" will *not* report it. >>> print(re.match('super', 'superstition').span()) (0, 5) >>> print(re.match('super', 'insuperable')) None On the other hand, "search()" will scan forward through the string, reporting the first match it finds. >>> print(re.search('super', 'superstition').span()) (0, 5) >>> print(re.search('super', 'insuperable').span()) (2, 7) Sometimes you’ll be tempted to keep using "re.match()", and just add ".*" to the front of your RE. Resist this temptation and use "re.search()" instead. The regular expression compiler does some analysis of REs in order to speed up the process of looking for a match. One such analysis figures out what the first character of a match must be; for example, a pattern starting with "Crow" must match starting with a "'C'". The analysis lets the engine quickly scan through the string looking for the starting character, only trying the full match if a "'C'" is found. Adding ".*" defeats this optimization, requiring scanning to the end of the string and then backtracking to find a match for the rest of the RE. Use "re.search()" instead. Greedy versus Non-Greedy ------------------------ When repeating a regular expression, as in "a*", the resulting action is to consume as much of the pattern as possible. This fact often bites you when you’re trying to match a pair of balanced delimiters, such as the angle brackets surrounding an HTML tag. The naive pattern for matching a single HTML tag doesn’t work because of the greedy nature of ".*". >>> s = 'Title' >>> len(s) 32 >>> print(re.match('<.*>', s).span()) (0, 32) >>> print(re.match('<.*>', s).group()) Title The RE matches the "'<'" in "''", and the ".*" consumes the rest of the string. There’s still more left in the RE, though, and the ">" can’t match at the end of the string, so the regular expression engine has to backtrack character by character until it finds a match for the ">". The final match extends from the "'<'" in "''" to the "'>'" in "''", which isn’t what you want. In this case, the solution is to use the non-greedy qualifiers "*?", "+?", "??", or "{m,n}?", which match as *little* text as possible. In the above example, the "'>'" is tried immediately after the first "'<'" matches, and when it fails, the engine advances a character at a time, retrying the "'>'" at every step. This produces just the right result: >>> print(re.match('<.*?>', s).group()) (Note that parsing HTML or XML with regular expressions is painful. Quick-and-dirty patterns will handle common cases, but HTML and XML have special cases that will break the obvious regular expression; by the time you’ve written a regular expression that handles all of the possible cases, the patterns will be *very* complicated. Use an HTML or XML parser module for such tasks.) Using re.VERBOSE ---------------- By now you’ve probably noticed that regular expressions are a very compact notation, but they’re not terribly readable. REs of moderate complexity can become lengthy collections of backslashes, parentheses, and metacharacters, making them difficult to read and understand. For such REs, specifying the "re.VERBOSE" flag when compiling the regular expression can be helpful, because it allows you to format the regular expression more clearly. The "re.VERBOSE" flag has several effects. Whitespace in the regular expression that *isn’t* inside a character class is ignored. This means that an expression such as "dog | cat" is equivalent to the less readable "dog|cat", but "[a b]" will still match the characters "'a'", "'b'", or a space. In addition, you can also put comments inside a RE; comments extend from a "#" character to the next newline. When used with triple-quoted strings, this enables REs to be formatted more neatly: pat = re.compile(r""" \s* # Skip leading whitespace (?P
[^:]+) # Header name \s* : # Whitespace, and a colon (?P.*?) # The header's value -- *? used to # lose the following trailing whitespace \s*$ # Trailing whitespace to end-of-line """, re.VERBOSE) This is far more readable than: pat = re.compile(r"\s*(?P
[^:]+)\s*:(?P.*?)\s*$") Feedback ======== Regular expressions are a complicated topic. Did this document help you understand them? Were there parts that were unclear, or Problems you encountered that weren’t covered here? If so, please send suggestions for improvements to the author. The most complete book on regular expressions is almost certainly Jeffrey Friedl’s Mastering Regular Expressions, published by O’Reilly. Unfortunately, it exclusively concentrates on Perl and Java’s flavours of regular expressions, and doesn’t contain any Python material at all, so it won’t be useful as a reference for programming in Python. (The first edition covered Python’s now-removed "regex" module, which won’t help you much.) Consider checking it out from your library. Socket Programming HOWTO ************************ Author: Gordon McMillan Abstract ^^^^^^^^ Sockets are used nearly everywhere, but are one of the most severely misunderstood technologies around. This is a 10,000 foot overview of sockets. It’s not really a tutorial - you’ll still have work to do in getting things operational. It doesn’t cover the fine points (and there are a lot of them), but I hope it will give you enough background to begin using them decently. Sockets ======= I’m only going to talk about INET (i.e. IPv4) sockets, but they account for at least 99% of the sockets in use. And I’ll only talk about STREAM (i.e. TCP) sockets - unless you really know what you’re doing (in which case this HOWTO isn’t for you!), you’ll get better behavior and performance from a STREAM socket than anything else. I will try to clear up the mystery of what a socket is, as well as some hints on how to work with blocking and non-blocking sockets. But I’ll start by talking about blocking sockets. You’ll need to know how they work before dealing with non-blocking sockets. Part of the trouble with understanding these things is that “socket” can mean a number of subtly different things, depending on context. So first, let’s make a distinction between a “client” socket - an endpoint of a conversation, and a “server” socket, which is more like a switchboard operator. The client application (your browser, for example) uses “client” sockets exclusively; the web server it’s talking to uses both “server” sockets and “client” sockets. History ------- Of the various forms of IPC (Inter Process Communication), sockets are by far the most popular. On any given platform, there are likely to be other forms of IPC that are faster, but for cross-platform communication, sockets are about the only game in town. They were invented in Berkeley as part of the BSD flavor of Unix. They spread like wildfire with the Internet. With good reason — the combination of sockets with INET makes talking to arbitrary machines around the world unbelievably easy (at least compared to other schemes). Creating a Socket ================= Roughly speaking, when you clicked on the link that brought you to this page, your browser did something like the following: # create an INET, STREAMing socket s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # now connect to the web server on port 80 - the normal http port s.connect(("www.python.org", 80)) When the "connect" completes, the socket "s" can be used to send in a request for the text of the page. The same socket will read the reply, and then be destroyed. That’s right, destroyed. Client sockets are normally only used for one exchange (or a small set of sequential exchanges). What happens in the web server is a bit more complex. First, the web server creates a “server socket”: # create an INET, STREAMing socket serversocket = socket.socket(socket.AF_INET, socket.SOCK_STREAM) # bind the socket to a public host, and a well-known port serversocket.bind((socket.gethostname(), 80)) # become a server socket serversocket.listen(5) A couple things to notice: we used "socket.gethostname()" so that the socket would be visible to the outside world. If we had used "s.bind(('localhost', 80))" or "s.bind(('127.0.0.1', 80))" we would still have a “server” socket, but one that was only visible within the same machine. "s.bind(('', 80))" specifies that the socket is reachable by any address the machine happens to have. A second thing to note: low number ports are usually reserved for “well known” services (HTTP, SNMP etc). If you’re playing around, use a nice high number (4 digits). Finally, the argument to "listen" tells the socket library that we want it to queue up as many as 5 connect requests (the normal max) before refusing outside connections. If the rest of the code is written properly, that should be plenty. Now that we have a “server” socket, listening on port 80, we can enter the mainloop of the web server: while True: # accept connections from outside (clientsocket, address) = serversocket.accept() # now do something with the clientsocket # in this case, we'll pretend this is a threaded server ct = client_thread(clientsocket) ct.run() There’s actually 3 general ways in which this loop could work - dispatching a thread to handle "clientsocket", create a new process to handle "clientsocket", or restructure this app to use non-blocking sockets, and multiplex between our “server” socket and any active "clientsocket"s using "select". More about that later. The important thing to understand now is this: this is *all* a “server” socket does. It doesn’t send any data. It doesn’t receive any data. It just produces “client” sockets. Each "clientsocket" is created in response to some *other* “client” socket doing a "connect()" to the host and port we’re bound to. As soon as we’ve created that "clientsocket", we go back to listening for more connections. The two “clients” are free to chat it up - they are using some dynamically allocated port which will be recycled when the conversation ends. IPC --- If you need fast IPC between two processes on one machine, you should look into pipes or shared memory. If you do decide to use AF_INET sockets, bind the “server” socket to "'localhost'". On most platforms, this will take a shortcut around a couple of layers of network code and be quite a bit faster. See also: The "multiprocessing" integrates cross-platform IPC into a higher- level API. Using a Socket ============== The first thing to note, is that the web browser’s “client” socket and the web server’s “client” socket are identical beasts. That is, this is a “peer to peer” conversation. Or to put it another way, *as the designer, you will have to decide what the rules of etiquette are for a conversation*. Normally, the "connect"ing socket starts the conversation, by sending in a request, or perhaps a signon. But that’s a design decision - it’s not a rule of sockets. Now there are two sets of verbs to use for communication. You can use "send" and "recv", or you can transform your client socket into a file-like beast and use "read" and "write". The latter is the way Java presents its sockets. I’m not going to talk about it here, except to warn you that you need to use "flush" on sockets. These are buffered “files”, and a common mistake is to "write" something, and then "read" for a reply. Without a "flush" in there, you may wait forever for the reply, because the request may still be in your output buffer. Now we come to the major stumbling block of sockets - "send" and "recv" operate on the network buffers. They do not necessarily handle all the bytes you hand them (or expect from them), because their major focus is handling the network buffers. In general, they return when the associated network buffers have been filled ("send") or emptied ("recv"). They then tell you how many bytes they handled. It is *your* responsibility to call them again until your message has been completely dealt with. When a "recv" returns 0 bytes, it means the other side has closed (or is in the process of closing) the connection. You will not receive any more data on this connection. Ever. You may be able to send data successfully; I’ll talk more about this later. A protocol like HTTP uses a socket for only one transfer. The client sends a request, then reads a reply. That’s it. The socket is discarded. This means that a client can detect the end of the reply by receiving 0 bytes. But if you plan to reuse your socket for further transfers, you need to realize that *there is no* EOT (End of Transfer) *on a socket.* I repeat: if a socket "send" or "recv" returns after handling 0 bytes, the connection has been broken. If the connection has *not* been broken, you may wait on a "recv" forever, because the socket will *not* tell you that there’s nothing more to read (for now). Now if you think about that a bit, you’ll come to realize a fundamental truth of sockets: *messages must either be fixed length* (yuck), *or be delimited* (shrug), *or indicate how long they are* (much better), *or end by shutting down the connection*. The choice is entirely yours, (but some ways are righter than others). Assuming you don’t want to end the connection, the simplest solution is a fixed length message: class MySocket: """demonstration class only - coded for clarity, not efficiency """ def __init__(self, sock=None): if sock is None: self.sock = socket.socket( socket.AF_INET, socket.SOCK_STREAM) else: self.sock = sock def connect(self, host, port): self.sock.connect((host, port)) def mysend(self, msg): totalsent = 0 while totalsent < MSGLEN: sent = self.sock.send(msg[totalsent:]) if sent == 0: raise RuntimeError("socket connection broken") totalsent = totalsent + sent def myreceive(self): chunks = [] bytes_recd = 0 while bytes_recd < MSGLEN: chunk = self.sock.recv(min(MSGLEN - bytes_recd, 2048)) if chunk == b'': raise RuntimeError("socket connection broken") chunks.append(chunk) bytes_recd = bytes_recd + len(chunk) return b''.join(chunks) The sending code here is usable for almost any messaging scheme - in Python you send strings, and you can use "len()" to determine its length (even if it has embedded "\0" characters). It’s mostly the receiving code that gets more complex. (And in C, it’s not much worse, except you can’t use "strlen" if the message has embedded "\0"s.) The easiest enhancement is to make the first character of the message an indicator of message type, and have the type determine the length. Now you have two "recv"s - the first to get (at least) that first character so you can look up the length, and the second in a loop to get the rest. If you decide to go the delimited route, you’ll be receiving in some arbitrary chunk size, (4096 or 8192 is frequently a good match for network buffer sizes), and scanning what you’ve received for a delimiter. One complication to be aware of: if your conversational protocol allows multiple messages to be sent back to back (without some kind of reply), and you pass "recv" an arbitrary chunk size, you may end up reading the start of a following message. You’ll need to put that aside and hold onto it, until it’s needed. Prefixing the message with its length (say, as 5 numeric characters) gets more complex, because (believe it or not), you may not get all 5 characters in one "recv". In playing around, you’ll get away with it; but in high network loads, your code will very quickly break unless you use two "recv" loops - the first to determine the length, the second to get the data part of the message. Nasty. This is also when you’ll discover that "send" does not always manage to get rid of everything in one pass. And despite having read this, you will eventually get bit by it! In the interests of space, building your character, (and preserving my competitive position), these enhancements are left as an exercise for the reader. Lets move on to cleaning up. Binary Data ----------- It is perfectly possible to send binary data over a socket. The major problem is that not all machines use the same formats for binary data. For example, a Motorola chip will represent a 16 bit integer with the value 1 as the two hex bytes 00 01. Intel and DEC, however, are byte- reversed - that same 1 is 01 00. Socket libraries have calls for converting 16 and 32 bit integers - "ntohl, htonl, ntohs, htons" where “n” means *network* and “h” means *host*, “s” means *short* and “l” means *long*. Where network order is host order, these do nothing, but where the machine is byte-reversed, these swap the bytes around appropriately. In these days of 32 bit machines, the ascii representation of binary data is frequently smaller than the binary representation. That’s because a surprising amount of the time, all those longs have the value 0, or maybe 1. The string “0” would be two bytes, while binary is four. Of course, this doesn’t fit well with fixed-length messages. Decisions, decisions. Disconnecting ============= Strictly speaking, you’re supposed to use "shutdown" on a socket before you "close" it. The "shutdown" is an advisory to the socket at the other end. Depending on the argument you pass it, it can mean “I’m not going to send anymore, but I’ll still listen”, or “I’m not listening, good riddance!”. Most socket libraries, however, are so used to programmers neglecting to use this piece of etiquette that normally a "close" is the same as "shutdown(); close()". So in most situations, an explicit "shutdown" is not needed. One way to use "shutdown" effectively is in an HTTP-like exchange. The client sends a request and then does a "shutdown(1)". This tells the server “This client is done sending, but can still receive.” The server can detect “EOF” by a receive of 0 bytes. It can assume it has the complete request. The server sends a reply. If the "send" completes successfully then, indeed, the client was still receiving. Python takes the automatic shutdown a step further, and says that when a socket is garbage collected, it will automatically do a "close" if it’s needed. But relying on this is a very bad habit. If your socket just disappears without doing a "close", the socket at the other end may hang indefinitely, thinking you’re just being slow. *Please* "close" your sockets when you’re done. When Sockets Die ---------------- Probably the worst thing about using blocking sockets is what happens when the other side comes down hard (without doing a "close"). Your socket is likely to hang. TCP is a reliable protocol, and it will wait a long, long time before giving up on a connection. If you’re using threads, the entire thread is essentially dead. There’s not much you can do about it. As long as you aren’t doing something dumb, like holding a lock while doing a blocking read, the thread isn’t really consuming much in the way of resources. Do *not* try to kill the thread - part of the reason that threads are more efficient than processes is that they avoid the overhead associated with the automatic recycling of resources. In other words, if you do manage to kill the thread, your whole process is likely to be screwed up. Non-blocking Sockets ==================== If you’ve understood the preceding, you already know most of what you need to know about the mechanics of using sockets. You’ll still use the same calls, in much the same ways. It’s just that, if you do it right, your app will be almost inside-out. In Python, you use "socket.setblocking(False)" to make it non- blocking. In C, it’s more complex, (for one thing, you’ll need to choose between the BSD flavor "O_NONBLOCK" and the almost indistinguishable POSIX flavor "O_NDELAY", which is completely different from "TCP_NODELAY"), but it’s the exact same idea. You do this after creating the socket, but before using it. (Actually, if you’re nuts, you can switch back and forth.) The major mechanical difference is that "send", "recv", "connect" and "accept" can return without having done anything. You have (of course) a number of choices. You can check return code and error codes and generally drive yourself crazy. If you don’t believe me, try it sometime. Your app will grow large, buggy and suck CPU. So let’s skip the brain-dead solutions and do it right. Use "select". In C, coding "select" is fairly complex. In Python, it’s a piece of cake, but it’s close enough to the C version that if you understand "select" in Python, you’ll have little trouble with it in C: ready_to_read, ready_to_write, in_error = \ select.select( potential_readers, potential_writers, potential_errs, timeout) You pass "select" three lists: the first contains all sockets that you might want to try reading; the second all the sockets you might want to try writing to, and the last (normally left empty) those that you want to check for errors. You should note that a socket can go into more than one list. The "select" call is blocking, but you can give it a timeout. This is generally a sensible thing to do - give it a nice long timeout (say a minute) unless you have good reason to do otherwise. In return, you will get three lists. They contain the sockets that are actually readable, writable and in error. Each of these lists is a subset (possibly empty) of the corresponding list you passed in. If a socket is in the output readable list, you can be as-close-to- certain-as-we-ever-get-in-this-business that a "recv" on that socket will return *something*. Same idea for the writable list. You’ll be able to send *something*. Maybe not all you want to, but *something* is better than nothing. (Actually, any reasonably healthy socket will return as writable - it just means outbound network buffer space is available.) If you have a “server” socket, put it in the potential_readers list. If it comes out in the readable list, your "accept" will (almost certainly) work. If you have created a new socket to "connect" to someone else, put it in the potential_writers list. If it shows up in the writable list, you have a decent chance that it has connected. Actually, "select" can be handy even with blocking sockets. It’s one way of determining whether you will block - the socket returns as readable when there’s something in the buffers. However, this still doesn’t help with the problem of determining whether the other end is done, or just busy with something else. **Portability alert**: On Unix, "select" works both with the sockets and files. Don’t try this on Windows. On Windows, "select" works with sockets only. Also note that in C, many of the more advanced socket options are done differently on Windows. In fact, on Windows I usually use threads (which work very, very well) with my sockets. Sorting HOW TO ************** Author: Andrew Dalke and Raymond Hettinger Release: 0.1 Python lists have a built-in "list.sort()" method that modifies the list in-place. There is also a "sorted()" built-in function that builds a new sorted list from an iterable. In this document, we explore the various techniques for sorting data using Python. Sorting Basics ============== A simple ascending sort is very easy: just call the "sorted()" function. It returns a new sorted list: >>> sorted([5, 2, 3, 1, 4]) [1, 2, 3, 4, 5] You can also use the "list.sort()" method. It modifies the list in- place (and returns "None" to avoid confusion). Usually it’s less convenient than "sorted()" - but if you don’t need the original list, it’s slightly more efficient. >>> a = [5, 2, 3, 1, 4] >>> a.sort() >>> a [1, 2, 3, 4, 5] Another difference is that the "list.sort()" method is only defined for lists. In contrast, the "sorted()" function accepts any iterable. >>> sorted({1: 'D', 2: 'B', 3: 'B', 4: 'E', 5: 'A'}) [1, 2, 3, 4, 5] Key Functions ============= Both "list.sort()" and "sorted()" have a *key* parameter to specify a function (or other callable) to be called on each list element prior to making comparisons. For example, here’s a case-insensitive string comparison: >>> sorted("This is a test string from Andrew".split(), key=str.lower) ['a', 'Andrew', 'from', 'is', 'string', 'test', 'This'] The value of the *key* parameter should be a function (or other callable) that takes a single argument and returns a key to use for sorting purposes. This technique is fast because the key function is called exactly once for each input record. A common pattern is to sort complex objects using some of the object’s indices as keys. For example: >>> student_tuples = [ ... ('john', 'A', 15), ... ('jane', 'B', 12), ... ('dave', 'B', 10), ... ] >>> sorted(student_tuples, key=lambda student: student[2]) # sort by age [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] The same technique works for objects with named attributes. For example: >>> class Student: ... def __init__(self, name, grade, age): ... self.name = name ... self.grade = grade ... self.age = age ... def __repr__(self): ... return repr((self.name, self.grade, self.age)) >>> student_objects = [ ... Student('john', 'A', 15), ... Student('jane', 'B', 12), ... Student('dave', 'B', 10), ... ] >>> sorted(student_objects, key=lambda student: student.age) # sort by age [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] Operator Module Functions ========================= The key-function patterns shown above are very common, so Python provides convenience functions to make accessor functions easier and faster. The "operator" module has "itemgetter()", "attrgetter()", and a "methodcaller()" function. Using those functions, the above examples become simpler and faster: >>> from operator import itemgetter, attrgetter >>> sorted(student_tuples, key=itemgetter(2)) [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] >>> sorted(student_objects, key=attrgetter('age')) [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] The operator module functions allow multiple levels of sorting. For example, to sort by *grade* then by *age*: >>> sorted(student_tuples, key=itemgetter(1,2)) [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] >>> sorted(student_objects, key=attrgetter('grade', 'age')) [('john', 'A', 15), ('dave', 'B', 10), ('jane', 'B', 12)] Ascending and Descending ======================== Both "list.sort()" and "sorted()" accept a *reverse* parameter with a boolean value. This is used to flag descending sorts. For example, to get the student data in reverse *age* order: >>> sorted(student_tuples, key=itemgetter(2), reverse=True) [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] >>> sorted(student_objects, key=attrgetter('age'), reverse=True) [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] Sort Stability and Complex Sorts ================================ Sorts are guaranteed to be stable. That means that when multiple records have the same key, their original order is preserved. >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] >>> sorted(data, key=itemgetter(0)) [('blue', 1), ('blue', 2), ('red', 1), ('red', 2)] Notice how the two records for *blue* retain their original order so that "('blue', 1)" is guaranteed to precede "('blue', 2)". This wonderful property lets you build complex sorts in a series of sorting steps. For example, to sort the student data by descending *grade* and then ascending *age*, do the *age* sort first and then sort again using *grade*: >>> s = sorted(student_objects, key=attrgetter('age')) # sort on secondary key >>> sorted(s, key=attrgetter('grade'), reverse=True) # now sort on primary key, descending [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] This can be abstracted out into a wrapper function that can take a list and tuples of field and order to sort them on multiple passes. >>> def multisort(xs, specs): ... for key, reverse in reversed(specs): ... xs.sort(key=attrgetter(key), reverse=reverse) ... return xs >>> multisort(list(student_objects), (('grade', True), ('age', False))) [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] The Timsort algorithm used in Python does multiple sorts efficiently because it can take advantage of any ordering already present in a dataset. The Old Way Using Decorate-Sort-Undecorate ========================================== This idiom is called Decorate-Sort-Undecorate after its three steps: * First, the initial list is decorated with new values that control the sort order. * Second, the decorated list is sorted. * Finally, the decorations are removed, creating a list that contains only the initial values in the new order. For example, to sort the student data by *grade* using the DSU approach: >>> decorated = [(student.grade, i, student) for i, student in enumerate(student_objects)] >>> decorated.sort() >>> [student for grade, i, student in decorated] # undecorate [('john', 'A', 15), ('jane', 'B', 12), ('dave', 'B', 10)] This idiom works because tuples are compared lexicographically; the first items are compared; if they are the same then the second items are compared, and so on. It is not strictly necessary in all cases to include the index *i* in the decorated list, but including it gives two benefits: * The sort is stable – if two items have the same key, their order will be preserved in the sorted list. * The original items do not have to be comparable because the ordering of the decorated tuples will be determined by at most the first two items. So for example the original list could contain complex numbers which cannot be sorted directly. Another name for this idiom is Schwartzian transform, after Randal L. Schwartz, who popularized it among Perl programmers. Now that Python sorting provides key-functions, this technique is not often needed. The Old Way Using the *cmp* Parameter ===================================== Many constructs given in this HOWTO assume Python 2.4 or later. Before that, there was no "sorted()" builtin and "list.sort()" took no keyword arguments. Instead, all of the Py2.x versions supported a *cmp* parameter to handle user specified comparison functions. In Py3.0, the *cmp* parameter was removed entirely (as part of a larger effort to simplify and unify the language, eliminating the conflict between rich comparisons and the "__cmp__()" magic method). In Py2.x, sort allowed an optional function which can be called for doing the comparisons. That function should take two arguments to be compared and then return a negative value for less-than, return zero if they are equal, or return a positive value for greater-than. For example, we can do: >>> def numeric_compare(x, y): ... return x - y >>> sorted([5, 2, 4, 1, 3], cmp=numeric_compare) [1, 2, 3, 4, 5] Or you can reverse the order of comparison with: >>> def reverse_numeric(x, y): ... return y - x >>> sorted([5, 2, 4, 1, 3], cmp=reverse_numeric) [5, 4, 3, 2, 1] When porting code from Python 2.x to 3.x, the situation can arise when you have the user supplying a comparison function and you need to convert that to a key function. The following wrapper makes that easy to do: def cmp_to_key(mycmp): 'Convert a cmp= function into a key= function' class K: def __init__(self, obj, *args): self.obj = obj def __lt__(self, other): return mycmp(self.obj, other.obj) < 0 def __gt__(self, other): return mycmp(self.obj, other.obj) > 0 def __eq__(self, other): return mycmp(self.obj, other.obj) == 0 def __le__(self, other): return mycmp(self.obj, other.obj) <= 0 def __ge__(self, other): return mycmp(self.obj, other.obj) >= 0 def __ne__(self, other): return mycmp(self.obj, other.obj) != 0 return K To convert to a key function, just wrap the old comparison function: >>> sorted([5, 2, 4, 1, 3], key=cmp_to_key(reverse_numeric)) [5, 4, 3, 2, 1] In Python 3.2, the "functools.cmp_to_key()" function was added to the "functools" module in the standard library. Odd and Ends ============ * For locale aware sorting, use "locale.strxfrm()" for a key function or "locale.strcoll()" for a comparison function. * The *reverse* parameter still maintains sort stability (so that records with equal keys retain the original order). Interestingly, that effect can be simulated without the parameter by using the builtin "reversed()" function twice: >>> data = [('red', 1), ('blue', 1), ('red', 2), ('blue', 2)] >>> standard_way = sorted(data, key=itemgetter(0), reverse=True) >>> double_reversed = list(reversed(sorted(reversed(data), key=itemgetter(0)))) >>> assert standard_way == double_reversed >>> standard_way [('red', 1), ('red', 2), ('blue', 1), ('blue', 2)] * The sort routines use "<" when making comparisons between two objects. So, it is easy to add a standard sort order to a class by defining an "__lt__()" method: >>> Student.__lt__ = lambda self, other: self.age < other.age >>> sorted(student_objects) [('dave', 'B', 10), ('jane', 'B', 12), ('john', 'A', 15)] However, note that "<" can fall back to using "__gt__()" if "__lt__()" is not implemented (see "object.__lt__()"). * Key functions need not depend directly on the objects being sorted. A key function can also access external resources. For instance, if the student grades are stored in a dictionary, they can be used to sort a separate list of student names: >>> students = ['dave', 'john', 'jane'] >>> newgrades = {'john': 'F', 'jane':'A', 'dave': 'C'} >>> sorted(students, key=newgrades.__getitem__) ['jane', 'dave', 'john'] Unicode HOWTO ************* Release: 1.12 This HOWTO discusses Python’s support for the Unicode specification for representing textual data, and explains various problems that people commonly encounter when trying to work with Unicode. Introduction to Unicode ======================= Definitions ----------- Today’s programs need to be able to handle a wide variety of characters. Applications are often internationalized to display messages and output in a variety of user-selectable languages; the same program might need to output an error message in English, French, Japanese, Hebrew, or Russian. Web content can be written in any of these languages and can also include a variety of emoji symbols. Python’s string type uses the Unicode Standard for representing characters, which lets Python programs work with all these different possible characters. Unicode (https://www.unicode.org/) is a specification that aims to list every character used by human languages and give each character its own unique code. The Unicode specifications are continually revised and updated to add new languages and symbols. A **character** is the smallest possible component of a text. ‘A’, ‘B’, ‘C’, etc., are all different characters. So are ‘È’ and ‘Í’. Characters vary depending on the language or context you’re talking about. For example, there’s a character for “Roman Numeral One”, ‘Ⅰ’, that’s separate from the uppercase letter ‘I’. They’ll usually look the same, but these are two different characters that have different meanings. The Unicode standard describes how characters are represented by **code points**. A code point value is an integer in the range 0 to 0x10FFFF (about 1.1 million values, the actual number assigned is less than that). In the standard and in this document, a code point is written using the notation "U+265E" to mean the character with value "0x265e" (9,822 in decimal). The Unicode standard contains a lot of tables listing characters and their corresponding code points: 0061 'a'; LATIN SMALL LETTER A 0062 'b'; LATIN SMALL LETTER B 0063 'c'; LATIN SMALL LETTER C ... 007B '{'; LEFT CURLY BRACKET ... 2167 'Ⅷ'; ROMAN NUMERAL EIGHT 2168 'Ⅸ'; ROMAN NUMERAL NINE ... 265E '♞'; BLACK CHESS KNIGHT 265F '♟'; BLACK CHESS PAWN ... 1F600 '😀'; GRINNING FACE 1F609 '😉'; WINKING FACE ... Strictly, these definitions imply that it’s meaningless to say ‘this is character "U+265E"’. "U+265E" is a code point, which represents some particular character; in this case, it represents the character ‘BLACK CHESS KNIGHT’, ‘♞’. In informal contexts, this distinction between code points and characters will sometimes be forgotten. A character is represented on a screen or on paper by a set of graphical elements that’s called a **glyph**. The glyph for an uppercase A, for example, is two diagonal strokes and a horizontal stroke, though the exact details will depend on the font being used. Most Python code doesn’t need to worry about glyphs; figuring out the correct glyph to display is generally the job of a GUI toolkit or a terminal’s font renderer. Encodings --------- To summarize the previous section: a Unicode string is a sequence of code points, which are numbers from 0 through "0x10FFFF" (1,114,111 decimal). This sequence of code points needs to be represented in memory as a set of **code units**, and **code units** are then mapped to 8-bit bytes. The rules for translating a Unicode string into a sequence of bytes are called a **character encoding**, or just an **encoding**. The first encoding you might think of is using 32-bit integers as the code unit, and then using the CPU’s representation of 32-bit integers. In this representation, the string “Python” might look like this: P y t h o n 0x50 00 00 00 79 00 00 00 74 00 00 00 68 00 00 00 6f 00 00 00 6e 00 00 00 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 This representation is straightforward but using it presents a number of problems. 1. It’s not portable; different processors order the bytes differently. 2. It’s very wasteful of space. In most texts, the majority of the code points are less than 127, or less than 255, so a lot of space is occupied by "0x00" bytes. The above string takes 24 bytes compared to the 6 bytes needed for an ASCII representation. Increased RAM usage doesn’t matter too much (desktop computers have gigabytes of RAM, and strings aren’t usually that large), but expanding our usage of disk and network bandwidth by a factor of 4 is intolerable. 3. It’s not compatible with existing C functions such as "strlen()", so a new family of wide string functions would need to be used. Therefore this encoding isn’t used very much, and people instead choose other encodings that are more efficient and convenient, such as UTF-8. UTF-8 is one of the most commonly used encodings, and Python often defaults to using it. UTF stands for “Unicode Transformation Format”, and the ‘8’ means that 8-bit values are used in the encoding. (There are also UTF-16 and UTF-32 encodings, but they are less frequently used than UTF-8.) UTF-8 uses the following rules: 1. If the code point is < 128, it’s represented by the corresponding byte value. 2. If the code point is >= 128, it’s turned into a sequence of two, three, or four bytes, where each byte of the sequence is between 128 and 255. UTF-8 has several convenient properties: 1. It can handle any Unicode code point. 2. A Unicode string is turned into a sequence of bytes that contains embedded zero bytes only where they represent the null character (U+0000). This means that UTF-8 strings can be processed by C functions such as "strcpy()" and sent through protocols that can’t handle zero bytes for anything other than end-of-string markers. 3. A string of ASCII text is also valid UTF-8 text. 4. UTF-8 is fairly compact; the majority of commonly used characters can be represented with one or two bytes. 5. If bytes are corrupted or lost, it’s possible to determine the start of the next UTF-8-encoded code point and resynchronize. It’s also unlikely that random 8-bit data will look like valid UTF-8. 6. UTF-8 is a byte oriented encoding. The encoding specifies that each character is represented by a specific sequence of one or more bytes. This avoids the byte-ordering issues that can occur with integer and word oriented encodings, like UTF-16 and UTF-32, where the sequence of bytes varies depending on the hardware on which the string was encoded. References ---------- The Unicode Consortium site has character charts, a glossary, and PDF versions of the Unicode specification. Be prepared for some difficult reading. A chronology of the origin and development of Unicode is also available on the site. On the Computerphile Youtube channel, Tom Scott briefly discusses the history of Unicode and UTF-8 (9 minutes 36 seconds). To help understand the standard, Jukka Korpela has written an introductory guide to reading the Unicode character tables. Another good introductory article was written by Joel Spolsky. If this introduction didn’t make things clear to you, you should try reading this alternate article before continuing. Wikipedia entries are often helpful; see the entries for “character encoding” and UTF-8, for example. Python’s Unicode Support ======================== Now that you’ve learned the rudiments of Unicode, we can look at Python’s Unicode features. The String Type --------------- Since Python 3.0, the language’s "str" type contains Unicode characters, meaning any string created using ""unicode rocks!"", "'unicode rocks!'", or the triple-quoted string syntax is stored as Unicode. The default encoding for Python source code is UTF-8, so you can simply include a Unicode character in a string literal: try: with open('/tmp/input.txt', 'r') as f: ... except OSError: # 'File not found' error message. print("Fichier non trouvé") Side note: Python 3 also supports using Unicode characters in identifiers: répertoire = "/tmp/records.log" with open(répertoire, "w") as f: f.write("test\n") If you can’t enter a particular character in your editor or want to keep the source code ASCII-only for some reason, you can also use escape sequences in string literals. (Depending on your system, you may see the actual capital-delta glyph instead of a u escape.) >>> "\N{GREEK CAPITAL LETTER DELTA}" # Using the character name '\u0394' >>> "\u0394" # Using a 16-bit hex value '\u0394' >>> "\U00000394" # Using a 32-bit hex value '\u0394' In addition, one can create a string using the "decode()" method of "bytes". This method takes an *encoding* argument, such as "UTF-8", and optionally an *errors* argument. The *errors* argument specifies the response when the input string can’t be converted according to the encoding’s rules. Legal values for this argument are "'strict'" (raise a "UnicodeDecodeError" exception), "'replace'" (use "U+FFFD", "REPLACEMENT CHARACTER"), "'ignore'" (just leave the character out of the Unicode result), or "'backslashreplace'" (inserts a "\xNN" escape sequence). The following examples show the differences: >>> b'\x80abc'.decode("utf-8", "strict") Traceback (most recent call last): ... UnicodeDecodeError: 'utf-8' codec can't decode byte 0x80 in position 0: invalid start byte >>> b'\x80abc'.decode("utf-8", "replace") '\ufffdabc' >>> b'\x80abc'.decode("utf-8", "backslashreplace") '\\x80abc' >>> b'\x80abc'.decode("utf-8", "ignore") 'abc' Encodings are specified as strings containing the encoding’s name. Python comes with roughly 100 different encodings; see the Python Library Reference at Standard Encodings for a list. Some encodings have multiple names; for example, "'latin-1'", "'iso_8859_1'" and "'8859"’ are all synonyms for the same encoding. One-character Unicode strings can also be created with the "chr()" built-in function, which takes integers and returns a Unicode string of length 1 that contains the corresponding code point. The reverse operation is the built-in "ord()" function that takes a one-character Unicode string and returns the code point value: >>> chr(57344) '\ue000' >>> ord('\ue000') 57344 Converting to Bytes ------------------- The opposite method of "bytes.decode()" is "str.encode()", which returns a "bytes" representation of the Unicode string, encoded in the requested *encoding*. The *errors* parameter is the same as the parameter of the "decode()" method but supports a few more possible handlers. As well as "'strict'", "'ignore'", and "'replace'" (which in this case inserts a question mark instead of the unencodable character), there is also "'xmlcharrefreplace'" (inserts an XML character reference), "backslashreplace" (inserts a "\uNNNN" escape sequence) and "namereplace" (inserts a "\N{...}" escape sequence). The following example shows the different results: >>> u = chr(40960) + 'abcd' + chr(1972) >>> u.encode('utf-8') b'\xea\x80\x80abcd\xde\xb4' >>> u.encode('ascii') Traceback (most recent call last): ... UnicodeEncodeError: 'ascii' codec can't encode character '\ua000' in position 0: ordinal not in range(128) >>> u.encode('ascii', 'ignore') b'abcd' >>> u.encode('ascii', 'replace') b'?abcd?' >>> u.encode('ascii', 'xmlcharrefreplace') b'ꀀabcd޴' >>> u.encode('ascii', 'backslashreplace') b'\\ua000abcd\\u07b4' >>> u.encode('ascii', 'namereplace') b'\\N{YI SYLLABLE IT}abcd\\u07b4' The low-level routines for registering and accessing the available encodings are found in the "codecs" module. Implementing new encodings also requires understanding the "codecs" module. However, the encoding and decoding functions returned by this module are usually more low-level than is comfortable, and writing new encodings is a specialized task, so the module won’t be covered in this HOWTO. Unicode Literals in Python Source Code -------------------------------------- In Python source code, specific Unicode code points can be written using the "\u" escape sequence, which is followed by four hex digits giving the code point. The "\U" escape sequence is similar, but expects eight hex digits, not four: >>> s = "a\xac\u1234\u20ac\U00008000" ... # ^^^^ two-digit hex escape ... # ^^^^^^ four-digit Unicode escape ... # ^^^^^^^^^^ eight-digit Unicode escape >>> [ord(c) for c in s] [97, 172, 4660, 8364, 32768] Using escape sequences for code points greater than 127 is fine in small doses, but becomes an annoyance if you’re using many accented characters, as you would in a program with messages in French or some other accent-using language. You can also assemble strings using the "chr()" built-in function, but this is even more tedious. Ideally, you’d want to be able to write literals in your language’s natural encoding. You could then edit Python source code with your favorite editor which would display the accented characters naturally, and have the right characters used at runtime. Python supports writing source code in UTF-8 by default, but you can use almost any encoding if you declare the encoding being used. This is done by including a special comment as either the first or second line of the source file: #!/usr/bin/env python # -*- coding: latin-1 -*- u = 'abcdé' print(ord(u[-1])) The syntax is inspired by Emacs’s notation for specifying variables local to a file. Emacs supports many different variables, but Python only supports ‘coding’. The "-*-" symbols indicate to Emacs that the comment is special; they have no significance to Python but are a convention. Python looks for "coding: name" or "coding=name" in the comment. If you don’t include such a comment, the default encoding used will be UTF-8 as already mentioned. See also **PEP 263** for more information. Unicode Properties ------------------ The Unicode specification includes a database of information about code points. For each defined code point, the information includes the character’s name, its category, the numeric value if applicable (for characters representing numeric concepts such as the Roman numerals, fractions such as one-third and four-fifths, etc.). There are also display-related properties, such as how to use the code point in bidirectional text. The following program displays some information about several characters, and prints the numeric value of one particular character: import unicodedata u = chr(233) + chr(0x0bf2) + chr(3972) + chr(6000) + chr(13231) for i, c in enumerate(u): print(i, '%04x' % ord(c), unicodedata.category(c), end=" ") print(unicodedata.name(c)) # Get numeric value of second character print(unicodedata.numeric(u[1])) When run, this prints: 0 00e9 Ll LATIN SMALL LETTER E WITH ACUTE 1 0bf2 No TAMIL NUMBER ONE THOUSAND 2 0f84 Mn TIBETAN MARK HALANTA 3 1770 Lo TAGBANWA LETTER SA 4 33af So SQUARE RAD OVER S SQUARED 1000.0 The category codes are abbreviations describing the nature of the character. These are grouped into categories such as “Letter”, “Number”, “Punctuation”, or “Symbol”, which in turn are broken up into subcategories. To take the codes from the above output, "'Ll'" means ‘Letter, lowercase’, "'No'" means “Number, other”, "'Mn'" is “Mark, nonspacing”, and "'So'" is “Symbol, other”. See the General Category Values section of the Unicode Character Database documentation for a list of category codes. Comparing Strings ----------------- Unicode adds some complication to comparing strings, because the same set of characters can be represented by different sequences of code points. For example, a letter like ‘ê’ can be represented as a single code point U+00EA, or as U+0065 U+0302, which is the code point for ‘e’ followed by a code point for ‘COMBINING CIRCUMFLEX ACCENT’. These will produce the same output when printed, but one is a string of length 1 and the other is of length 2. One tool for a case-insensitive comparison is the "casefold()" string method that converts a string to a case-insensitive form following an algorithm described by the Unicode Standard. This algorithm has special handling for characters such as the German letter ‘ß’ (code point U+00DF), which becomes the pair of lowercase letters ‘ss’. >>> street = 'Gürzenichstraße' >>> street.casefold() 'gürzenichstrasse' A second tool is the "unicodedata" module’s "normalize()" function that converts strings to one of several normal forms, where letters followed by a combining character are replaced with single characters. "normalize()" can be used to perform string comparisons that won’t falsely report inequality if two strings use combining characters differently: import unicodedata def compare_strs(s1, s2): def NFD(s): return unicodedata.normalize('NFD', s) return NFD(s1) == NFD(s2) single_char = 'ê' multiple_chars = '\N{LATIN SMALL LETTER E}\N{COMBINING CIRCUMFLEX ACCENT}' print('length of first string=', len(single_char)) print('length of second string=', len(multiple_chars)) print(compare_strs(single_char, multiple_chars)) When run, this outputs: $ python3 compare-strs.py length of first string= 1 length of second string= 2 True The first argument to the "normalize()" function is a string giving the desired normalization form, which can be one of ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’. The Unicode Standard also specifies how to do caseless comparisons: import unicodedata def compare_caseless(s1, s2): def NFD(s): return unicodedata.normalize('NFD', s) return NFD(NFD(s1).casefold()) == NFD(NFD(s2).casefold()) # Example usage single_char = 'ê' multiple_chars = '\N{LATIN CAPITAL LETTER E}\N{COMBINING CIRCUMFLEX ACCENT}' print(compare_caseless(single_char, multiple_chars)) This will print "True". (Why is "NFD()" invoked twice? Because there are a few characters that make "casefold()" return a non-normalized string, so the result needs to be normalized again. See section 3.13 of the Unicode Standard for a discussion and an example.) Unicode Regular Expressions --------------------------- The regular expressions supported by the "re" module can be provided either as bytes or strings. Some of the special character sequences such as "\d" and "\w" have different meanings depending on whether the pattern is supplied as bytes or a string. For example, "\d" will match the characters "[0-9]" in bytes but in strings will match any character that’s in the "'Nd'" category. The string in this example has the number 57 written in both Thai and Arabic numerals: import re p = re.compile(r'\d+') s = "Over \u0e55\u0e57 57 flavours" m = p.search(s) print(repr(m.group())) When executed, "\d+" will match the Thai numerals and print them out. If you supply the "re.ASCII" flag to "compile()", "\d+" will match the substring “57” instead. Similarly, "\w" matches a wide variety of Unicode characters but only "[a-zA-Z0-9_]" in bytes or if "re.ASCII" is supplied, and "\s" will match either Unicode whitespace characters or "[ \t\n\r\f\v]". References ---------- Some good alternative discussions of Python’s Unicode support are: * Processing Text Files in Python 3, by Nick Coghlan. * Pragmatic Unicode, a PyCon 2012 presentation by Ned Batchelder. The "str" type is described in the Python library reference at Text Sequence Type — str. The documentation for the "unicodedata" module. The documentation for the "codecs" module. Marc-André Lemburg gave a presentation titled “Python and Unicode” (PDF slides) at EuroPython 2002. The slides are an excellent overview of the design of Python 2’s Unicode features (where the Unicode string type is called "unicode" and literals start with "u"). Reading and Writing Unicode Data ================================ Once you’ve written some code that works with Unicode data, the next problem is input/output. How do you get Unicode strings into your program, and how do you convert Unicode into a form suitable for storage or transmission? It’s possible that you may not need to do anything depending on your input sources and output destinations; you should check whether the libraries used in your application support Unicode natively. XML parsers often return Unicode data, for example. Many relational databases also support Unicode-valued columns and can return Unicode values from an SQL query. Unicode data is usually converted to a particular encoding before it gets written to disk or sent over a socket. It’s possible to do all the work yourself: open a file, read an 8-bit bytes object from it, and convert the bytes with "bytes.decode(encoding)". However, the manual approach is not recommended. One problem is the multi-byte nature of encodings; one Unicode character can be represented by several bytes. If you want to read the file in arbitrary-sized chunks (say, 1024 or 4096 bytes), you need to write error-handling code to catch the case where only part of the bytes encoding a single Unicode character are read at the end of a chunk. One solution would be to read the entire file into memory and then perform the decoding, but that prevents you from working with files that are extremely large; if you need to read a 2 GiB file, you need 2 GiB of RAM. (More, really, since for at least a moment you’d need to have both the encoded string and its Unicode version in memory.) The solution would be to use the low-level decoding interface to catch the case of partial coding sequences. The work of implementing this has already been done for you: the built-in "open()" function can return a file-like object that assumes the file’s contents are in a specified encoding and accepts Unicode parameters for methods such as "read()" and "write()". This works through "open()"’s *encoding* and *errors* parameters which are interpreted just like those in "str.encode()" and "bytes.decode()". Reading Unicode from a file is therefore simple: with open('unicode.txt', encoding='utf-8') as f: for line in f: print(repr(line)) It’s also possible to open files in update mode, allowing both reading and writing: with open('test', encoding='utf-8', mode='w+') as f: f.write('\u4500 blah blah blah\n') f.seek(0) print(repr(f.readline()[:1])) The Unicode character "U+FEFF" is used as a byte-order mark (BOM), and is often written as the first character of a file in order to assist with autodetection of the file’s byte ordering. Some encodings, such as UTF-16, expect a BOM to be present at the start of a file; when such an encoding is used, the BOM will be automatically written as the first character and will be silently dropped when the file is read. There are variants of these encodings, such as ‘utf-16-le’ and ‘utf-16-be’ for little-endian and big-endian encodings, that specify one particular byte ordering and don’t skip the BOM. In some areas, it is also convention to use a “BOM” at the start of UTF-8 encoded files; the name is misleading since UTF-8 is not byte- order dependent. The mark simply announces that the file is encoded in UTF-8. For reading such files, use the ‘utf-8-sig’ codec to automatically skip the mark if present. Unicode filenames ----------------- Most of the operating systems in common use today support filenames that contain arbitrary Unicode characters. Usually this is implemented by converting the Unicode string into some encoding that varies depending on the system. Today Python is converging on using UTF-8: Python on MacOS has used UTF-8 for several versions, and Python 3.6 switched to using UTF-8 on Windows as well. On Unix systems, there will only be a filesystem encoding if you’ve set the "LANG" or "LC_CTYPE" environment variables; if you haven’t, the default encoding is again UTF-8. The "sys.getfilesystemencoding()" function returns the encoding to use on your current system, in case you want to do the encoding manually, but there’s not much reason to bother. When opening a file for reading or writing, you can usually just provide the Unicode string as the filename, and it will be automatically converted to the right encoding for you: filename = 'filename\u4500abc' with open(filename, 'w') as f: f.write('blah\n') Functions in the "os" module such as "os.stat()" will also accept Unicode filenames. The "os.listdir()" function returns filenames, which raises an issue: should it return the Unicode version of filenames, or should it return bytes containing the encoded versions? "os.listdir()" can do both, depending on whether you provided the directory path as bytes or a Unicode string. If you pass a Unicode string as the path, filenames will be decoded using the filesystem’s encoding and a list of Unicode strings will be returned, while passing a byte path will return the filenames as bytes. For example, assuming the default filesystem encoding is UTF-8, running the following program: fn = 'filename\u4500abc' f = open(fn, 'w') f.close() import os print(os.listdir(b'.')) print(os.listdir('.')) will produce the following output: $ python listdir-test.py [b'filename\xe4\x94\x80abc', ...] ['filename\u4500abc', ...] The first list contains UTF-8-encoded filenames, and the second list contains the Unicode versions. Note that on most occasions, you should can just stick with using Unicode with these APIs. The bytes APIs should only be used on systems where undecodable file names can be present; that’s pretty much only Unix systems now. Tips for Writing Unicode-aware Programs --------------------------------------- This section provides some suggestions on writing software that deals with Unicode. The most important tip is: Software should only work with Unicode strings internally, decoding the input data as soon as possible and encoding the output only at the end. If you attempt to write processing functions that accept both Unicode and byte strings, you will find your program vulnerable to bugs wherever you combine the two different kinds of strings. There is no automatic encoding or decoding: if you do e.g. "str + bytes", a "TypeError" will be raised. When using data coming from a web browser or some other untrusted source, a common technique is to check for illegal characters in a string before using the string in a generated command line or storing it in a database. If you’re doing this, be careful to check the decoded string, not the encoded bytes data; some encodings may have interesting properties, such as not being bijective or not being fully ASCII-compatible. This is especially true if the input data also specifies the encoding, since the attacker can then choose a clever way to hide malicious text in the encoded bytestream. Converting Between File Encodings ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The "StreamRecoder" class can transparently convert between encodings, taking a stream that returns data in encoding #1 and behaving like a stream returning data in encoding #2. For example, if you have an input file *f* that’s in Latin-1, you can wrap it with a "StreamRecoder" to return bytes encoded in UTF-8: new_f = codecs.StreamRecoder(f, # en/decoder: used by read() to encode its results and # by write() to decode its input. codecs.getencoder('utf-8'), codecs.getdecoder('utf-8'), # reader/writer: used to read and write to the stream. codecs.getreader('latin-1'), codecs.getwriter('latin-1') ) Files in an Unknown Encoding ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ What can you do if you need to make a change to a file, but don’t know the file’s encoding? If you know the encoding is ASCII-compatible and only want to examine or modify the ASCII parts, you can open the file with the "surrogateescape" error handler: with open(fname, 'r', encoding="ascii", errors="surrogateescape") as f: data = f.read() # make changes to the string 'data' with open(fname + '.new', 'w', encoding="ascii", errors="surrogateescape") as f: f.write(data) The "surrogateescape" error handler will decode any non-ASCII bytes as code points in a special range running from U+DC80 to U+DCFF. These code points will then turn back into the same bytes when the "surrogateescape" error handler is used to encode the data and write it back out. References ---------- One section of Mastering Python 3 Input/Output, a PyCon 2010 talk by David Beazley, discusses text processing and binary data handling. The PDF slides for Marc-André Lemburg’s presentation “Writing Unicode- aware Applications in Python” discuss questions of character encodings as well as how to internationalize and localize an application. These slides cover Python 2.x only. The Guts of Unicode in Python is a PyCon 2013 talk by Benjamin Peterson that discusses the internal Unicode representation in Python 3.3. Acknowledgements ================ The initial draft of this document was written by Andrew Kuchling. It has since been revised further by Alexander Belopolsky, Georg Brandl, Andrew Kuchling, and Ezio Melotti. Thanks to the following people who have noted errors or offered suggestions on this article: Éric Araujo, Nicholas Bastin, Nick Coghlan, Marius Gedminas, Kent Johnson, Ken Krugler, Marc-André Lemburg, Martin von Löwis, Terry J. Reedy, Serhiy Storchaka, Eryk Sun, Chad Whitacre, Graham Wideman. HOWTO Fetch Internet Resources Using The urllib Package ******************************************************* Author: Michael Foord Note: There is a French translation of an earlier revision of this HOWTO, available at urllib2 - Le Manuel manquant. Introduction ============ Related Articles ^^^^^^^^^^^^^^^^ You may also find useful the following article on fetching web resources with Python: * Basic Authentication A tutorial on *Basic Authentication*, with examples in Python. **urllib.request** is a Python module for fetching URLs (Uniform Resource Locators). It offers a very simple interface, in the form of the *urlopen* function. This is capable of fetching URLs using a variety of different protocols. It also offers a slightly more complex interface for handling common situations - like basic authentication, cookies, proxies and so on. These are provided by objects called handlers and openers. urllib.request supports fetching URLs for many “URL schemes” (identified by the string before the "":"" in URL - for example ""ftp"" is the URL scheme of ""ftp://python.org/"") using their associated network protocols (e.g. FTP, HTTP). This tutorial focuses on the most common case, HTTP. For straightforward situations *urlopen* is very easy to use. But as soon as you encounter errors or non-trivial cases when opening HTTP URLs, you will need some understanding of the HyperText Transfer Protocol. The most comprehensive and authoritative reference to HTTP is **RFC 2616**. This is a technical document and not intended to be easy to read. This HOWTO aims to illustrate using *urllib*, with enough detail about HTTP to help you through. It is not intended to replace the "urllib.request" docs, but is supplementary to them. Fetching URLs ============= The simplest way to use urllib.request is as follows: import urllib.request with urllib.request.urlopen('http://python.org/') as response: html = response.read() If you wish to retrieve a resource via URL and store it in a temporary location, you can do so via the "shutil.copyfileobj()" and "tempfile.NamedTemporaryFile()" functions: import shutil import tempfile import urllib.request with urllib.request.urlopen('http://python.org/') as response: with tempfile.NamedTemporaryFile(delete=False) as tmp_file: shutil.copyfileobj(response, tmp_file) with open(tmp_file.name) as html: pass Many uses of urllib will be that simple (note that instead of an ‘http:’ URL we could have used a URL starting with ‘ftp:’, ‘file:’, etc.). However, it’s the purpose of this tutorial to explain the more complicated cases, concentrating on HTTP. HTTP is based on requests and responses - the client makes requests and servers send responses. urllib.request mirrors this with a "Request" object which represents the HTTP request you are making. In its simplest form you create a Request object that specifies the URL you want to fetch. Calling "urlopen" with this Request object returns a response object for the URL requested. This response is a file-like object, which means you can for example call ".read()" on the response: import urllib.request req = urllib.request.Request('http://www.voidspace.org.uk') with urllib.request.urlopen(req) as response: the_page = response.read() Note that urllib.request makes use of the same Request interface to handle all URL schemes. For example, you can make an FTP request like so: req = urllib.request.Request('ftp://example.com/') In the case of HTTP, there are two extra things that Request objects allow you to do: First, you can pass data to be sent to the server. Second, you can pass extra information (“metadata”) *about* the data or about the request itself, to the server - this information is sent as HTTP “headers”. Let’s look at each of these in turn. Data ---- Sometimes you want to send data to a URL (often the URL will refer to a CGI (Common Gateway Interface) script or other web application). With HTTP, this is often done using what’s known as a **POST** request. This is often what your browser does when you submit a HTML form that you filled in on the web. Not all POSTs have to come from forms: you can use a POST to transmit arbitrary data to your own application. In the common case of HTML forms, the data needs to be encoded in a standard way, and then passed to the Request object as the "data" argument. The encoding is done using a function from the "urllib.parse" library. import urllib.parse import urllib.request url = 'http://www.someserver.com/cgi-bin/register.cgi' values = {'name' : 'Michael Foord', 'location' : 'Northampton', 'language' : 'Python' } data = urllib.parse.urlencode(values) data = data.encode('ascii') # data should be bytes req = urllib.request.Request(url, data) with urllib.request.urlopen(req) as response: the_page = response.read() Note that other encodings are sometimes required (e.g. for file upload from HTML forms - see HTML Specification, Form Submission for more details). If you do not pass the "data" argument, urllib uses a **GET** request. One way in which GET and POST requests differ is that POST requests often have “side-effects”: they change the state of the system in some way (for example by placing an order with the website for a hundredweight of tinned spam to be delivered to your door). Though the HTTP standard makes it clear that POSTs are intended to *always* cause side-effects, and GET requests *never* to cause side-effects, nothing prevents a GET request from having side-effects, nor a POST requests from having no side-effects. Data can also be passed in an HTTP GET request by encoding it in the URL itself. This is done as follows: >>> import urllib.request >>> import urllib.parse >>> data = {} >>> data['name'] = 'Somebody Here' >>> data['location'] = 'Northampton' >>> data['language'] = 'Python' >>> url_values = urllib.parse.urlencode(data) >>> print(url_values) # The order may differ from below. name=Somebody+Here&language=Python&location=Northampton >>> url = 'http://www.example.com/example.cgi' >>> full_url = url + '?' + url_values >>> data = urllib.request.urlopen(full_url) Notice that the full URL is created by adding a "?" to the URL, followed by the encoded values. Headers ------- We’ll discuss here one particular HTTP header, to illustrate how to add headers to your HTTP request. Some websites [1] dislike being browsed by programs, or send different versions to different browsers [2]. By default urllib identifies itself as "Python-urllib/x.y" (where "x" and "y" are the major and minor version numbers of the Python release, e.g. "Python- urllib/2.5"), which may confuse the site, or just plain not work. The way a browser identifies itself is through the "User-Agent" header [3]. When you create a Request object you can pass a dictionary of headers in. The following example makes the same request as above, but identifies itself as a version of Internet Explorer [4]. import urllib.parse import urllib.request url = 'http://www.someserver.com/cgi-bin/register.cgi' user_agent = 'Mozilla/5.0 (Windows NT 6.1; Win64; x64)' values = {'name': 'Michael Foord', 'location': 'Northampton', 'language': 'Python' } headers = {'User-Agent': user_agent} data = urllib.parse.urlencode(values) data = data.encode('ascii') req = urllib.request.Request(url, data, headers) with urllib.request.urlopen(req) as response: the_page = response.read() The response also has two useful methods. See the section on info and geturl which comes after we have a look at what happens when things go wrong. Handling Exceptions =================== *urlopen* raises "URLError" when it cannot handle a response (though as usual with Python APIs, built-in exceptions such as "ValueError", "TypeError" etc. may also be raised). "HTTPError" is the subclass of "URLError" raised in the specific case of HTTP URLs. The exception classes are exported from the "urllib.error" module. URLError -------- Often, URLError is raised because there is no network connection (no route to the specified server), or the specified server doesn’t exist. In this case, the exception raised will have a ‘reason’ attribute, which is a tuple containing an error code and a text error message. e.g. >>> req = urllib.request.Request('http://www.pretend_server.org') >>> try: urllib.request.urlopen(req) ... except urllib.error.URLError as e: ... print(e.reason) ... (4, 'getaddrinfo failed') HTTPError --------- Every HTTP response from the server contains a numeric “status code”. Sometimes the status code indicates that the server is unable to fulfil the request. The default handlers will handle some of these responses for you (for example, if the response is a “redirection” that requests the client fetch the document from a different URL, urllib will handle that for you). For those it can’t handle, urlopen will raise an "HTTPError". Typical errors include ‘404’ (page not found), ‘403’ (request forbidden), and ‘401’ (authentication required). See section 10 of **RFC 2616** for a reference on all the HTTP error codes. The "HTTPError" instance raised will have an integer ‘code’ attribute, which corresponds to the error sent by the server. Error Codes ~~~~~~~~~~~ Because the default handlers handle redirects (codes in the 300 range), and codes in the 100–299 range indicate success, you will usually only see error codes in the 400–599 range. "http.server.BaseHTTPRequestHandler.responses" is a useful dictionary of response codes in that shows all the response codes used by **RFC 2616**. The dictionary is reproduced here for convenience # Table mapping response codes to messages; entries have the # form {code: (shortmessage, longmessage)}. responses = { 100: ('Continue', 'Request received, please continue'), 101: ('Switching Protocols', 'Switching to new protocol; obey Upgrade header'), 200: ('OK', 'Request fulfilled, document follows'), 201: ('Created', 'Document created, URL follows'), 202: ('Accepted', 'Request accepted, processing continues off-line'), 203: ('Non-Authoritative Information', 'Request fulfilled from cache'), 204: ('No Content', 'Request fulfilled, nothing follows'), 205: ('Reset Content', 'Clear input form for further input.'), 206: ('Partial Content', 'Partial content follows.'), 300: ('Multiple Choices', 'Object has several resources -- see URI list'), 301: ('Moved Permanently', 'Object moved permanently -- see URI list'), 302: ('Found', 'Object moved temporarily -- see URI list'), 303: ('See Other', 'Object moved -- see Method and URL list'), 304: ('Not Modified', 'Document has not changed since given time'), 305: ('Use Proxy', 'You must use proxy specified in Location to access this ' 'resource.'), 307: ('Temporary Redirect', 'Object moved temporarily -- see URI list'), 400: ('Bad Request', 'Bad request syntax or unsupported method'), 401: ('Unauthorized', 'No permission -- see authorization schemes'), 402: ('Payment Required', 'No payment -- see charging schemes'), 403: ('Forbidden', 'Request forbidden -- authorization will not help'), 404: ('Not Found', 'Nothing matches the given URI'), 405: ('Method Not Allowed', 'Specified method is invalid for this server.'), 406: ('Not Acceptable', 'URI not available in preferred format.'), 407: ('Proxy Authentication Required', 'You must authenticate with ' 'this proxy before proceeding.'), 408: ('Request Timeout', 'Request timed out; try again later.'), 409: ('Conflict', 'Request conflict.'), 410: ('Gone', 'URI no longer exists and has been permanently removed.'), 411: ('Length Required', 'Client must specify Content-Length.'), 412: ('Precondition Failed', 'Precondition in headers is false.'), 413: ('Request Entity Too Large', 'Entity is too large.'), 414: ('Request-URI Too Long', 'URI is too long.'), 415: ('Unsupported Media Type', 'Entity body in unsupported format.'), 416: ('Requested Range Not Satisfiable', 'Cannot satisfy request range.'), 417: ('Expectation Failed', 'Expect condition could not be satisfied.'), 500: ('Internal Server Error', 'Server got itself in trouble'), 501: ('Not Implemented', 'Server does not support this operation'), 502: ('Bad Gateway', 'Invalid responses from another server/proxy.'), 503: ('Service Unavailable', 'The server cannot process the request due to a high load'), 504: ('Gateway Timeout', 'The gateway server did not receive a timely response'), 505: ('HTTP Version Not Supported', 'Cannot fulfill request.'), } When an error is raised the server responds by returning an HTTP error code *and* an error page. You can use the "HTTPError" instance as a response on the page returned. This means that as well as the code attribute, it also has read, geturl, and info, methods as returned by the "urllib.response" module: >>> req = urllib.request.Request('http://www.python.org/fish.html') >>> try: ... urllib.request.urlopen(req) ... except urllib.error.HTTPError as e: ... print(e.code) ... print(e.read()) ... 404 b'\n\n\nPage Not Found\n ... Wrapping it Up -------------- So if you want to be prepared for "HTTPError" *or* "URLError" there are two basic approaches. I prefer the second approach. Number 1 ~~~~~~~~ from urllib.request import Request, urlopen from urllib.error import URLError, HTTPError req = Request(someurl) try: response = urlopen(req) except HTTPError as e: print('The server couldn\'t fulfill the request.') print('Error code: ', e.code) except URLError as e: print('We failed to reach a server.') print('Reason: ', e.reason) else: # everything is fine Note: The "except HTTPError" *must* come first, otherwise "except URLError" will *also* catch an "HTTPError". Number 2 ~~~~~~~~ from urllib.request import Request, urlopen from urllib.error import URLError req = Request(someurl) try: response = urlopen(req) except URLError as e: if hasattr(e, 'reason'): print('We failed to reach a server.') print('Reason: ', e.reason) elif hasattr(e, 'code'): print('The server couldn\'t fulfill the request.') print('Error code: ', e.code) else: # everything is fine info and geturl =============== The response returned by urlopen (or the "HTTPError" instance) has two useful methods "info()" and "geturl()" and is defined in the module "urllib.response".. **geturl** - this returns the real URL of the page fetched. This is useful because "urlopen" (or the opener object used) may have followed a redirect. The URL of the page fetched may not be the same as the URL requested. **info** - this returns a dictionary-like object that describes the page fetched, particularly the headers sent by the server. It is currently an "http.client.HTTPMessage" instance. Typical headers include ‘Content-length’, ‘Content-type’, and so on. See the Quick Reference to HTTP Headers for a useful listing of HTTP headers with brief explanations of their meaning and use. Openers and Handlers ==================== When you fetch a URL you use an opener (an instance of the perhaps confusingly-named "urllib.request.OpenerDirector"). Normally we have been using the default opener - via "urlopen" - but you can create custom openers. Openers use handlers. All the “heavy lifting” is done by the handlers. Each handler knows how to open URLs for a particular URL scheme (http, ftp, etc.), or how to handle an aspect of URL opening, for example HTTP redirections or HTTP cookies. You will want to create openers if you want to fetch URLs with specific handlers installed, for example to get an opener that handles cookies, or to get an opener that does not handle redirections. To create an opener, instantiate an "OpenerDirector", and then call ".add_handler(some_handler_instance)" repeatedly. Alternatively, you can use "build_opener", which is a convenience function for creating opener objects with a single function call. "build_opener" adds several handlers by default, but provides a quick way to add more and/or override the default handlers. Other sorts of handlers you might want to can handle proxies, authentication, and other common but slightly specialised situations. "install_opener" can be used to make an "opener" object the (global) default opener. This means that calls to "urlopen" will use the opener you have installed. Opener objects have an "open" method, which can be called directly to fetch urls in the same way as the "urlopen" function: there’s no need to call "install_opener", except as a convenience. Basic Authentication ==================== To illustrate creating and installing a handler we will use the "HTTPBasicAuthHandler". For a more detailed discussion of this subject – including an explanation of how Basic Authentication works - see the Basic Authentication Tutorial. When authentication is required, the server sends a header (as well as the 401 error code) requesting authentication. This specifies the authentication scheme and a ‘realm’. The header looks like: "WWW- Authenticate: SCHEME realm="REALM"". e.g. WWW-Authenticate: Basic realm="cPanel Users" The client should then retry the request with the appropriate name and password for the realm included as a header in the request. This is ‘basic authentication’. In order to simplify this process we can create an instance of "HTTPBasicAuthHandler" and an opener to use this handler. The "HTTPBasicAuthHandler" uses an object called a password manager to handle the mapping of URLs and realms to passwords and usernames. If you know what the realm is (from the authentication header sent by the server), then you can use a "HTTPPasswordMgr". Frequently one doesn’t care what the realm is. In that case, it is convenient to use "HTTPPasswordMgrWithDefaultRealm". This allows you to specify a default username and password for a URL. This will be supplied in the absence of you providing an alternative combination for a specific realm. We indicate this by providing "None" as the realm argument to the "add_password" method. The top-level URL is the first URL that requires authentication. URLs “deeper” than the URL you pass to .add_password() will also match. # create a password manager password_mgr = urllib.request.HTTPPasswordMgrWithDefaultRealm() # Add the username and password. # If we knew the realm, we could use it instead of None. top_level_url = "http://example.com/foo/" password_mgr.add_password(None, top_level_url, username, password) handler = urllib.request.HTTPBasicAuthHandler(password_mgr) # create "opener" (OpenerDirector instance) opener = urllib.request.build_opener(handler) # use the opener to fetch a URL opener.open(a_url) # Install the opener. # Now all calls to urllib.request.urlopen use our opener. urllib.request.install_opener(opener) Note: In the above example we only supplied our "HTTPBasicAuthHandler" to "build_opener". By default openers have the handlers for normal situations – "ProxyHandler" (if a proxy setting such as an "http_proxy" environment variable is set), "UnknownHandler", "HTTPHandler", "HTTPDefaultErrorHandler", "HTTPRedirectHandler", "FTPHandler", "FileHandler", "DataHandler", "HTTPErrorProcessor". "top_level_url" is in fact *either* a full URL (including the ‘http:’ scheme component and the hostname and optionally the port number) e.g. ""http://example.com/"" *or* an “authority” (i.e. the hostname, optionally including the port number) e.g. ""example.com"" or ""example.com:8080"" (the latter example includes a port number). The authority, if present, must NOT contain the “userinfo” component - for example ""joe:password@example.com"" is not correct. Proxies ======= **urllib** will auto-detect your proxy settings and use those. This is through the "ProxyHandler", which is part of the normal handler chain when a proxy setting is detected. Normally that’s a good thing, but there are occasions when it may not be helpful [5]. One way to do this is to setup our own "ProxyHandler", with no proxies defined. This is done using similar steps to setting up a Basic Authentication handler: >>> proxy_support = urllib.request.ProxyHandler({}) >>> opener = urllib.request.build_opener(proxy_support) >>> urllib.request.install_opener(opener) Note: Currently "urllib.request" *does not* support fetching of "https" locations through a proxy. However, this can be enabled by extending urllib.request as shown in the recipe [6]. Note: "HTTP_PROXY" will be ignored if a variable "REQUEST_METHOD" is set; see the documentation on "getproxies()". Sockets and Layers ================== The Python support for fetching resources from the web is layered. urllib uses the "http.client" library, which in turn uses the socket library. As of Python 2.3 you can specify how long a socket should wait for a response before timing out. This can be useful in applications which have to fetch web pages. By default the socket module has *no timeout* and can hang. Currently, the socket timeout is not exposed at the http.client or urllib.request levels. However, you can set the default timeout globally for all sockets using import socket import urllib.request # timeout in seconds timeout = 10 socket.setdefaulttimeout(timeout) # this call to urllib.request.urlopen now uses the default timeout # we have set in the socket module req = urllib.request.Request('http://www.voidspace.org.uk') response = urllib.request.urlopen(req) ====================================================================== Footnotes ========= This document was reviewed and revised by John Lee. [1] Google for example. [2] Browser sniffing is a very bad practice for website design - building sites using web standards is much more sensible. Unfortunately a lot of sites still send different versions to different browsers. [3] The user agent for MSIE 6 is *‘Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)’* [4] For details of more HTTP request headers, see Quick Reference to HTTP Headers. [5] In my case I have to use a proxy to access the internet at work. If you attempt to fetch *localhost* URLs through this proxy it blocks them. IE is set to use the proxy, which urllib picks up on. In order to test scripts with a localhost server, I have to prevent urllib from using the proxy. [6] urllib opener for SSL proxy (CONNECT method): ASPN Cookbook Recipe. Installing Python Modules (Legacy version) ****************************************** Author: Greg Ward See also: Installing Python Modules The up to date module installation documentation. For regular Python usage, you almost certainly want that document rather than this one. Note: This document is being retained solely until the "setuptools" documentation at https://setuptools.readthedocs.io/en/latest/setuptools.html independently covers all of the relevant information currently included here. Note: This guide only covers the basic tools for building and distributing extensions that are provided as part of this version of Python. Third party tools offer easier to use and more secure alternatives. Refer to the quick recommendations section in the Python Packaging User Guide for more information. Introduction ============ In Python 2.0, the "distutils" API was first added to the standard library. This provided Linux distro maintainers with a standard way of converting Python projects into Linux distro packages, and system administrators with a standard way of installing them directly onto target systems. In the many years since Python 2.0 was released, tightly coupling the build system and package installer to the language runtime release cycle has turned out to be problematic, and it is now recommended that projects use the "pip" package installer and the "setuptools" build system, rather than using "distutils" directly. See Installing Python Modules and Distributing Python Modules for more details. This legacy documentation is being retained only until we’re confident that the "setuptools" documentation covers everything needed. Distutils based source distributions ------------------------------------ If you download a module source distribution, you can tell pretty quickly if it was packaged and distributed in the standard way, i.e. using the Distutils. First, the distribution’s name and version number will be featured prominently in the name of the downloaded archive, e.g. "foo-1.0.tar.gz" or "widget-0.9.7.zip". Next, the archive will unpack into a similarly-named directory: "foo-1.0" or "widget-0.9.7". Additionally, the distribution will contain a setup script "setup.py", and a file named "README.txt" or possibly just "README", which should explain that building and installing the module distribution is a simple matter of running one command from a terminal: python setup.py install For Windows, this command should be run from a command prompt window (Start ‣ Accessories): setup.py install If all these things are true, then you already know how to build and install the modules you’ve just downloaded: Run the command above. Unless you need to install things in a non-standard way or customize the build process, you don’t really need this manual. Or rather, the above command is everything you need to get out of this manual. Standard Build and Install ========================== As described in section Distutils based source distributions, building and installing a module distribution using the Distutils is usually one simple command to run from a terminal: python setup.py install Platform variations ------------------- You should always run the setup command from the distribution root directory, i.e. the top-level subdirectory that the module source distribution unpacks into. For example, if you’ve just downloaded a module source distribution "foo-1.0.tar.gz" onto a Unix system, the normal thing to do is: gunzip -c foo-1.0.tar.gz | tar xf - # unpacks into directory foo-1.0 cd foo-1.0 python setup.py install On Windows, you’d probably download "foo-1.0.zip". If you downloaded the archive file to "C:\Temp", then it would unpack into "C:\Temp\foo-1.0"; you can use either an archive manipulator with a graphical user interface (such as WinZip) or a command-line tool (such as **unzip** or **pkunzip**) to unpack the archive. Then, open a command prompt window and run: cd c:\Temp\foo-1.0 python setup.py install Splitting the job up -------------------- Running "setup.py install" builds and installs all modules in one run. If you prefer to work incrementally—especially useful if you want to customize the build process, or if things are going wrong—you can use the setup script to do one thing at a time. This is particularly helpful when the build and install will be done by different users—for example, you might want to build a module distribution and hand it off to a system administrator for installation (or do it yourself, with super-user privileges). For example, you can build everything in one step, and then install everything in a second step, by invoking the setup script twice: python setup.py build python setup.py install If you do this, you will notice that running the **install** command first runs the **build** command, which—in this case—quickly notices that it has nothing to do, since everything in the "build" directory is up-to-date. You may not need this ability to break things down often if all you do is install modules downloaded off the ‘net, but it’s very handy for more advanced tasks. If you get into distributing your own Python modules and extensions, you’ll run lots of individual Distutils commands on their own. How building works ------------------ As implied above, the **build** command is responsible for putting the files to install into a *build directory*. By default, this is "build" under the distribution root; if you’re excessively concerned with speed, or want to keep the source tree pristine, you can change the build directory with the "--build-base" option. For example: python setup.py build --build-base=/path/to/pybuild/foo-1.0 (Or you could do this permanently with a directive in your system or personal Distutils configuration file; see section Distutils Configuration Files.) Normally, this isn’t necessary. The default layout for the build tree is as follows: --- build/ --- lib/ or --- build/ --- lib./ temp./ where "" expands to a brief description of the current OS/hardware platform and Python version. The first form, with just a "lib" directory, is used for “pure module distributions”—that is, module distributions that include only pure Python modules. If a module distribution contains any extensions (modules written in C/C++), then the second form, with two "" directories, is used. In that case, the "temp.*plat*" directory holds temporary files generated by the compile/link process that don’t actually get installed. In either case, the "lib" (or "lib.*plat*") directory contains all Python modules (pure Python and extensions) that will be installed. In the future, more directories will be added to handle Python scripts, documentation, binary executables, and whatever else is needed to handle the job of installing Python modules and applications. How installation works ---------------------- After the **build** command runs (whether you run it explicitly, or the **install** command does it for you), the work of the **install** command is relatively simple: all it has to do is copy everything under "build/lib" (or "build/lib.*plat*") to your chosen installation directory. If you don’t choose an installation directory—i.e., if you just run "setup.py install"—then the **install** command installs to the standard location for third-party Python modules. This location varies by platform and by how you built/installed Python itself. On Unix (and macOS, which is also Unix-based), it also depends on whether the module distribution being installed is pure Python or contains extensions (“non-pure”): +-------------------+-------------------------------------------------------+----------------------------------------------------+---------+ | Platform | Standard installation location | Default value | Notes | |===================|=======================================================|====================================================|=========| | Unix (pure) | "*prefix*/lib/python*X.Y*/site-packages" | "/usr/local/lib/python*X.Y*/site-packages" | (1) | +-------------------+-------------------------------------------------------+----------------------------------------------------+---------+ | Unix (non-pure) | "*exec-prefix*/lib/python*X.Y*/site-packages" | "/usr/local/lib/python*X.Y*/site-packages" | (1) | +-------------------+-------------------------------------------------------+----------------------------------------------------+---------+ | Windows | "*prefix*\Lib\site-packages" | "C:\Python*XY*\Lib\site-packages" | (2) | +-------------------+-------------------------------------------------------+----------------------------------------------------+---------+ Notes: 1. Most Linux distributions include Python as a standard part of the system, so "*prefix*" and "*exec-prefix*" are usually both "/usr" on Linux. If you build Python yourself on Linux (or any Unix-like system), the default "*prefix*" and "*exec-prefix*" are "/usr/local". 2. The default installation directory on Windows was "C:\Program Files\Python" under Python 1.6a1, 1.5.2, and earlier. "*prefix*" and "*exec-prefix*" stand for the directories that Python is installed to, and where it finds its libraries at run-time. They are always the same under Windows, and very often the same under Unix and macOS. You can find out what your Python installation uses for "*prefix*" and "*exec-prefix*" by running Python in interactive mode and typing a few simple commands. Under Unix, just type "python" at the shell prompt. Under Windows, choose Start ‣ Programs ‣ Python X.Y ‣ Python (command line). Once the interpreter is started, you type Python code at the prompt. For example, on my Linux system, I type the three Python statements shown below, and get the output as shown, to find out my "*prefix*" and "*exec-prefix*": Python 2.4 (#26, Aug 7 2004, 17:19:02) Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.prefix '/usr' >>> sys.exec_prefix '/usr' A few other placeholders are used in this document: "*X.Y*" stands for the version of Python, for example "3.2"; "*abiflags*" will be replaced by the value of "sys.abiflags" or the empty string for platforms which don’t define ABI flags; "*distname*" will be replaced by the name of the module distribution being installed. Dots and capitalization are important in the paths; for example, a value that uses "python3.2" on UNIX will typically use "Python32" on Windows. If you don’t want to install modules to the standard location, or if you don’t have permission to write there, then you need to read about alternate installations in section Alternate Installation. If you want to customize your installation directories more heavily, see section Custom Installation on custom installations. Alternate Installation ====================== Often, it is necessary or desirable to install modules to a location other than the standard location for third-party Python modules. For example, on a Unix system you might not have permission to write to the standard third-party module directory. Or you might wish to try out a module before making it a standard part of your local Python installation. This is especially true when upgrading a distribution already present: you want to make sure your existing base of scripts still works with the new version before actually upgrading. The Distutils **install** command is designed to make installing module distributions to an alternate location simple and painless. The basic idea is that you supply a base directory for the installation, and the **install** command picks a set of directories (called an *installation scheme*) under this base directory in which to install files. The details differ across platforms, so read whichever of the following sections applies to you. Note that the various alternate installation schemes are mutually exclusive: you can pass "--user", or "--home", or "--prefix" and "-- exec-prefix", or "--install-base" and "--install-platbase", but you can’t mix from these groups. Alternate installation: the user scheme --------------------------------------- This scheme is designed to be the most convenient solution for users that don’t have write permission to the global site-packages directory or don’t want to install into it. It is enabled with a simple option: python setup.py install --user Files will be installed into subdirectories of "site.USER_BASE" (written as "*userbase*" hereafter). This scheme installs pure Python modules and extension modules in the same location (also known as "site.USER_SITE"). Here are the values for UNIX, including macOS: +-----------------+-------------------------------------------------------------+ | Type of file | Installation directory | |=================|=============================================================| | modules | "*userbase*/lib/python*X.Y*/site-packages" | +-----------------+-------------------------------------------------------------+ | scripts | "*userbase*/bin" | +-----------------+-------------------------------------------------------------+ | data | "*userbase*" | +-----------------+-------------------------------------------------------------+ | C headers | "*userbase*/include/python*X.Y**abiflags*/*distname*" | +-----------------+-------------------------------------------------------------+ And here are the values used on Windows: +-----------------+-------------------------------------------------------------+ | Type of file | Installation directory | |=================|=============================================================| | modules | "*userbase*\Python*XY*\site-packages" | +-----------------+-------------------------------------------------------------+ | scripts | "*userbase*\Python*XY*\Scripts" | +-----------------+-------------------------------------------------------------+ | data | "*userbase*" | +-----------------+-------------------------------------------------------------+ | C headers | "*userbase*\Python*XY*\Include{distname}" | +-----------------+-------------------------------------------------------------+ The advantage of using this scheme compared to the other ones described below is that the user site-packages directory is under normal conditions always included in "sys.path" (see "site" for more information), which means that there is no additional step to perform after running the "setup.py" script to finalize the installation. The **build_ext** command also has a "--user" option to add "*userbase*/include" to the compiler search path for header files and "*userbase*/lib" to the compiler search path for libraries as well as to the runtime search path for shared C libraries (rpath). Alternate installation: the home scheme --------------------------------------- The idea behind the “home scheme” is that you build and maintain a personal stash of Python modules. This scheme’s name is derived from the idea of a “home” directory on Unix, since it’s not unusual for a Unix user to make their home directory have a layout similar to "/usr/" or "/usr/local/". This scheme can be used by anyone, regardless of the operating system they are installing for. Installing a new module distribution is as simple as python setup.py install --home= where you can supply any directory you like for the "--home" option. On Unix, lazy typists can just type a tilde ("~"); the **install** command will expand this to your home directory: python setup.py install --home=~ To make Python find the distributions installed with this scheme, you may have to modify Python’s search path or edit "sitecustomize" (see "site") to call "site.addsitedir()" or edit "sys.path". The "--home" option defines the installation base directory. Files are installed to the following directories under the installation base as follows: +-----------------+-------------------------------------------------------------+ | Type of file | Installation directory | |=================|=============================================================| | modules | "*home*/lib/python" | +-----------------+-------------------------------------------------------------+ | scripts | "*home*/bin" | +-----------------+-------------------------------------------------------------+ | data | "*home*" | +-----------------+-------------------------------------------------------------+ | C headers | "*home*/include/python/*distname*" | +-----------------+-------------------------------------------------------------+ (Mentally replace slashes with backslashes if you’re on Windows.) Alternate installation: Unix (the prefix scheme) ------------------------------------------------ The “prefix scheme” is useful when you wish to use one Python installation to perform the build/install (i.e., to run the setup script), but install modules into the third-party module directory of a different Python installation (or something that looks like a different Python installation). If this sounds a trifle unusual, it is—that’s why the user and home schemes come before. However, there are at least two known cases where the prefix scheme will be useful. First, consider that many Linux distributions put Python in "/usr", rather than the more traditional "/usr/local". This is entirely appropriate, since in those cases Python is part of “the system” rather than a local add-on. However, if you are installing Python modules from source, you probably want them to go in "/usr/local/lib/python2.*X*" rather than "/usr/lib/python2.*X*". This can be done with /usr/bin/python setup.py install --prefix=/usr/local Another possibility is a network filesystem where the name used to write to a remote directory is different from the name used to read it: for example, the Python interpreter accessed as "/usr/local/bin/python" might search for modules in "/usr/local/lib/python2.*X*", but those modules would have to be installed to, say, "/mnt/*@server*/export/lib/python2.*X*". This could be done with /usr/local/bin/python setup.py install --prefix=/mnt/@server/export In either case, the "--prefix" option defines the installation base, and the "--exec-prefix" option defines the platform-specific installation base, which is used for platform-specific files. (Currently, this just means non-pure module distributions, but could be expanded to C libraries, binary executables, etc.) If "--exec- prefix" is not supplied, it defaults to "--prefix". Files are installed as follows: +-------------------+------------------------------------------------------------+ | Type of file | Installation directory | |===================|============================================================| | Python modules | "*prefix*/lib/python*X.Y*/site-packages" | +-------------------+------------------------------------------------------------+ | extension modules | "*exec-prefix*/lib/python*X.Y*/site-packages" | +-------------------+------------------------------------------------------------+ | scripts | "*prefix*/bin" | +-------------------+------------------------------------------------------------+ | data | "*prefix*" | +-------------------+------------------------------------------------------------+ | C headers | "*prefix*/include/python*X.Y**abiflags*/*distname*" | +-------------------+------------------------------------------------------------+ There is no requirement that "--prefix" or "--exec-prefix" actually point to an alternate Python installation; if the directories listed above do not already exist, they are created at installation time. Incidentally, the real reason the prefix scheme is important is simply that a standard Unix installation uses the prefix scheme, but with "-- prefix" and "--exec-prefix" supplied by Python itself as "sys.prefix" and "sys.exec_prefix". Thus, you might think you’ll never use the prefix scheme, but every time you run "python setup.py install" without any other options, you’re using it. Note that installing extensions to an alternate Python installation has no effect on how those extensions are built: in particular, the Python header files ("Python.h" and friends) installed with the Python interpreter used to run the setup script will be used in compiling extensions. It is your responsibility to ensure that the interpreter used to run extensions installed in this way is compatible with the interpreter used to build them. The best way to do this is to ensure that the two interpreters are the same version of Python (possibly different builds, or possibly copies of the same build). (Of course, if your "--prefix" and "--exec-prefix" don’t even point to an alternate Python installation, this is immaterial.) Alternate installation: Windows (the prefix scheme) --------------------------------------------------- Windows has no concept of a user’s home directory, and since the standard Python installation under Windows is simpler than under Unix, the "--prefix" option has traditionally been used to install additional packages in separate locations on Windows. python setup.py install --prefix="\Temp\Python" to install modules to the "\Temp\Python" directory on the current drive. The installation base is defined by the "--prefix" option; the "-- exec-prefix" option is not supported under Windows, which means that pure Python modules and extension modules are installed into the same location. Files are installed as follows: +-----------------+------------------------------------------------------------+ | Type of file | Installation directory | |=================|============================================================| | modules | "*prefix*\Lib\site-packages" | +-----------------+------------------------------------------------------------+ | scripts | "*prefix*\Scripts" | +-----------------+------------------------------------------------------------+ | data | "*prefix*" | +-----------------+------------------------------------------------------------+ | C headers | "*prefix*\Include{distname}" | +-----------------+------------------------------------------------------------+ Custom Installation =================== Sometimes, the alternate installation schemes described in section Alternate Installation just don’t do what you want. You might want to tweak just one or two directories while keeping everything under the same base directory, or you might want to completely redefine the installation scheme. In either case, you’re creating a *custom installation scheme*. To create a custom installation scheme, you start with one of the alternate schemes and override some of the installation directories used for the various types of files, using these options: +------------------------+-------------------------+ | Type of file | Override option | |========================|=========================| | Python modules | "--install-purelib" | +------------------------+-------------------------+ | extension modules | "--install-platlib" | +------------------------+-------------------------+ | all modules | "--install-lib" | +------------------------+-------------------------+ | scripts | "--install-scripts" | +------------------------+-------------------------+ | data | "--install-data" | +------------------------+-------------------------+ | C headers | "--install-headers" | +------------------------+-------------------------+ These override options can be relative, absolute, or explicitly defined in terms of one of the installation base directories. (There are two installation base directories, and they are normally the same—they only differ when you use the Unix “prefix scheme” and supply different "--prefix" and "--exec-prefix" options; using "--install- lib" will override values computed or given for "--install-purelib" and "--install-platlib", and is recommended for schemes that don’t make a difference between Python and extension modules.) For example, say you’re installing a module distribution to your home directory under Unix—but you want scripts to go in "~/scripts" rather than "~/bin". As you might expect, you can override this directory with the "--install-scripts" option; in this case, it makes most sense to supply a relative path, which will be interpreted relative to the installation base directory (your home directory, in this case): python setup.py install --home=~ --install-scripts=scripts Another Unix example: suppose your Python installation was built and installed with a prefix of "/usr/local/python", so under a standard installation scripts will wind up in "/usr/local/python/bin". If you want them in "/usr/local/bin" instead, you would supply this absolute directory for the "--install-scripts" option: python setup.py install --install-scripts=/usr/local/bin (This performs an installation using the “prefix scheme”, where the prefix is whatever your Python interpreter was installed with— "/usr/local/python" in this case.) If you maintain Python on Windows, you might want third-party modules to live in a subdirectory of "*prefix*", rather than right in "*prefix*" itself. This is almost as easy as customizing the script installation directory—you just have to remember that there are two types of modules to worry about, Python and extension modules, which can conveniently be both controlled by one option: python setup.py install --install-lib=Site The specified installation directory is relative to "*prefix*". Of course, you also have to ensure that this directory is in Python’s module search path, such as by putting a ".pth" file in a site directory (see "site"). See section Modifying Python’s Search Path to find out how to modify Python’s search path. If you want to define an entire installation scheme, you just have to supply all of the installation directory options. The recommended way to do this is to supply relative paths; for example, if you want to maintain all Python module-related files under "python" in your home directory, and you want a separate directory for each platform that you use your home directory from, you might define the following installation scheme: python setup.py install --home=~ \ --install-purelib=python/lib \ --install-platlib=python/lib.$PLAT \ --install-scripts=python/scripts --install-data=python/data or, equivalently, python setup.py install --home=~/python \ --install-purelib=lib \ --install-platlib='lib.$PLAT' \ --install-scripts=scripts --install-data=data "$PLAT" is not (necessarily) an environment variable—it will be expanded by the Distutils as it parses your command line options, just as it does when parsing your configuration file(s). Obviously, specifying the entire installation scheme every time you install a new module distribution would be very tedious. Thus, you can put these options into your Distutils config file (see section Distutils Configuration Files): [install] install-base=$HOME install-purelib=python/lib install-platlib=python/lib.$PLAT install-scripts=python/scripts install-data=python/data or, equivalently, [install] install-base=$HOME/python install-purelib=lib install-platlib=lib.$PLAT install-scripts=scripts install-data=data Note that these two are *not* equivalent if you supply a different installation base directory when you run the setup script. For example, python setup.py install --install-base=/tmp would install pure modules to "/tmp/python/lib" in the first case, and to "/tmp/lib" in the second case. (For the second case, you probably want to supply an installation base of "/tmp/python".) You probably noticed the use of "$HOME" and "$PLAT" in the sample configuration file input. These are Distutils configuration variables, which bear a strong resemblance to environment variables. In fact, you can use environment variables in config files on platforms that have such a notion but the Distutils additionally define a few extra variables that may not be in your environment, such as "$PLAT". (And of course, on systems that don’t have environment variables, such as Mac OS 9, the configuration variables supplied by the Distutils are the only ones you can use.) See section Distutils Configuration Files for details. Note: When a virtual environment is activated, any options that change the installation path will be ignored from all distutils configuration files to prevent inadvertently installing projects outside of the virtual environment. Modifying Python’s Search Path ------------------------------ When the Python interpreter executes an "import" statement, it searches for both Python code and extension modules along a search path. A default value for the path is configured into the Python binary when the interpreter is built. You can determine the path by importing the "sys" module and printing the value of "sys.path". $ python Python 2.2 (#11, Oct 3 2002, 13:31:27) [GCC 2.96 20000731 (Red Hat Linux 7.3 2.96-112)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys.path ['', '/usr/local/lib/python2.3', '/usr/local/lib/python2.3/plat-linux2', '/usr/local/lib/python2.3/lib-tk', '/usr/local/lib/python2.3/lib-dynload', '/usr/local/lib/python2.3/site-packages'] >>> The null string in "sys.path" represents the current working directory. The expected convention for locally installed packages is to put them in the "*…*/site-packages/" directory, but you may want to install Python modules into some arbitrary directory. For example, your site may have a convention of keeping all software related to the web server under "/www". Add-on Python modules might then belong in "/www/python", and in order to import them, this directory must be added to "sys.path". There are several different ways to add the directory. The most convenient way is to add a path configuration file to a directory that’s already on Python’s path, usually to the ".../site- packages/" directory. Path configuration files have an extension of ".pth", and each line must contain a single path that will be appended to "sys.path". (Because the new paths are appended to "sys.path", modules in the added directories will not override standard modules. This means you can’t use this mechanism for installing fixed versions of standard modules.) Paths can be absolute or relative, in which case they’re relative to the directory containing the ".pth" file. See the documentation of the "site" module for more information. A slightly less convenient way is to edit the "site.py" file in Python’s standard library, and modify "sys.path". "site.py" is automatically imported when the Python interpreter is executed, unless the "-S" switch is supplied to suppress this behaviour. So you could simply edit "site.py" and add two lines to it: import sys sys.path.append('/www/python/') However, if you reinstall the same major version of Python (perhaps when upgrading from 2.2 to 2.2.2, for example) "site.py" will be overwritten by the stock version. You’d have to remember that it was modified and save a copy before doing the installation. There are two environment variables that can modify "sys.path". "PYTHONHOME" sets an alternate value for the prefix of the Python installation. For example, if "PYTHONHOME" is set to "/www/python", the search path will be set to "['', '/www/python/lib/pythonX.Y/', '/www/python/lib/pythonX.Y/plat-linux2', ...]". The "PYTHONPATH" variable can be set to a list of paths that will be added to the beginning of "sys.path". For example, if "PYTHONPATH" is set to "/www/python:/opt/py", the search path will begin with "['/www/python', '/opt/py']". (Note that directories must exist in order to be added to "sys.path"; the "site" module removes paths that don’t exist.) Finally, "sys.path" is just a regular Python list, so any Python application can modify it by adding or removing entries. Distutils Configuration Files ============================= As mentioned above, you can use Distutils configuration files to record personal or site preferences for any Distutils options. That is, any option to any command can be stored in one of two or three (depending on your platform) configuration files, which will be consulted before the command-line is parsed. This means that configuration files will override default values, and the command-line will in turn override configuration files. Furthermore, if multiple configuration files apply, values from “earlier” files are overridden by “later” files. Location and names of config files ---------------------------------- The names and locations of the configuration files vary slightly across platforms. On Unix and macOS, the three configuration files (in the order they are processed) are: +----------------+------------------------------------------------------------+---------+ | Type of file | Location and filename | Notes | |================|============================================================|=========| | system | "*prefix*/lib/python*ver*/distutils/distutils.cfg" | (1) | +----------------+------------------------------------------------------------+---------+ | personal | "$HOME/.pydistutils.cfg" | (2) | +----------------+------------------------------------------------------------+---------+ | local | "setup.cfg" | (3) | +----------------+------------------------------------------------------------+---------+ And on Windows, the configuration files are: +----------------+---------------------------------------------------+---------+ | Type of file | Location and filename | Notes | |================|===================================================|=========| | system | "*prefix*\Lib\distutils\distutils.cfg" | (4) | +----------------+---------------------------------------------------+---------+ | personal | "%HOME%\pydistutils.cfg" | (5) | +----------------+---------------------------------------------------+---------+ | local | "setup.cfg" | (3) | +----------------+---------------------------------------------------+---------+ On all platforms, the “personal” file can be temporarily disabled by passing the *–no-user-cfg* option. Notes: 1. Strictly speaking, the system-wide configuration file lives in the directory where the Distutils are installed; under Python 1.6 and later on Unix, this is as shown. For Python 1.5.2, the Distutils will normally be installed to "*prefix*/lib/python1.5/site- packages/distutils", so the system configuration file should be put there under Python 1.5.2. 2. On Unix, if the "HOME" environment variable is not defined, the user’s home directory will be determined with the "getpwuid()" function from the standard "pwd" module. This is done by the "os.path.expanduser()" function used by Distutils. 3. I.e., in the current directory (usually the location of the setup script). 4. (See also note (1).) Under Python 1.6 and later, Python’s default “installation prefix” is "C:\Python", so the system configuration file is normally "C:\Python\Lib\distutils\distutils.cfg". Under Python 1.5.2, the default prefix was "C:\Program Files\Python", and the Distutils were not part of the standard library—so the system configuration file would be "C:\Program Files\Python\distutils\distutils.cfg" in a standard Python 1.5.2 installation under Windows. 5. On Windows, if the "HOME" environment variable is not defined, "USERPROFILE" then "HOMEDRIVE" and "HOMEPATH" will be tried. This is done by the "os.path.expanduser()" function used by Distutils. Syntax of config files ---------------------- The Distutils configuration files all have the same syntax. The config files are grouped into sections. There is one section for each Distutils command, plus a "global" section for global options that affect every command. Each section consists of one option per line, specified as "option=value". For example, the following is a complete config file that just forces all commands to run quietly by default: [global] verbose=0 If this is installed as the system config file, it will affect all processing of any Python module distribution by any user on the current system. If it is installed as your personal config file (on systems that support them), it will affect only module distributions processed by you. And if it is used as the "setup.cfg" for a particular module distribution, it affects only that distribution. You could override the default “build base” directory and make the **build*** commands always forcibly rebuild all files with the following: [build] build-base=blib force=1 which corresponds to the command-line arguments python setup.py build --build-base=blib --force except that including the **build** command on the command-line means that command will be run. Including a particular command in config files has no such implication; it only means that if the command is run, the options in the config file will apply. (Or if other commands that derive values from it are run, they will use the values in the config file.) You can find out the complete list of options for any command using the "--help" option, e.g.: python setup.py build --help and you can find out the complete list of global options by using "-- help" without a command: python setup.py --help See also the “Reference” section of the “Distributing Python Modules” manual. Building Extensions: Tips and Tricks ==================================== Whenever possible, the Distutils try to use the configuration information made available by the Python interpreter used to run the "setup.py" script. For example, the same compiler and linker flags used to compile Python will also be used for compiling extensions. Usually this will work well, but in complicated situations this might be inappropriate. This section discusses how to override the usual Distutils behaviour. Tweaking compiler/linker flags ------------------------------ Compiling a Python extension written in C or C++ will sometimes require specifying custom flags for the compiler and linker in order to use a particular library or produce a special kind of object code. This is especially true if the extension hasn’t been tested on your platform, or if you’re trying to cross-compile Python. In the most general case, the extension author might have foreseen that compiling the extensions would be complicated, and provided a "Setup" file for you to edit. This will likely only be done if the module distribution contains many separate extension modules, or if they often require elaborate sets of compiler flags in order to work. A "Setup" file, if present, is parsed in order to get a list of extensions to build. Each line in a "Setup" describes a single module. Lines have the following structure: module ... [sourcefile ...] [cpparg ...] [library ...] Let’s examine each of the fields in turn. * *module* is the name of the extension module to be built, and should be a valid Python identifier. You can’t just change this in order to rename a module (edits to the source code would also be needed), so this should be left alone. * *sourcefile* is anything that’s likely to be a source code file, at least judging by the filename. Filenames ending in ".c" are assumed to be written in C, filenames ending in ".C", ".cc", and ".c++" are assumed to be C++, and filenames ending in ".m" or ".mm" are assumed to be in Objective C. * *cpparg* is an argument for the C preprocessor, and is anything starting with "-I", "-D", "-U" or "-C". * *library* is anything ending in ".a" or beginning with "-l" or "-L". If a particular platform requires a special library on your platform, you can add it by editing the "Setup" file and running "python setup.py build". For example, if the module defined by the line foo foomodule.c must be linked with the math library "libm.a" on your platform, simply add "-lm" to the line: foo foomodule.c -lm Arbitrary switches intended for the compiler or the linker can be supplied with the "-Xcompiler" *arg* and "-Xlinker" *arg* options: foo foomodule.c -Xcompiler -o32 -Xlinker -shared -lm The next option after "-Xcompiler" and "-Xlinker" will be appended to the proper command line, so in the above example the compiler will be passed the "-o32" option, and the linker will be passed "-shared". If a compiler option requires an argument, you’ll have to supply multiple "-Xcompiler" options; for example, to pass "-x c++" the "Setup" file would have to contain "-Xcompiler -x -Xcompiler c++". Compiler flags can also be supplied through setting the "CFLAGS" environment variable. If set, the contents of "CFLAGS" will be added to the compiler flags specified in the "Setup" file. Using non-Microsoft compilers on Windows ---------------------------------------- Borland/CodeGear C++ ~~~~~~~~~~~~~~~~~~~~ This subsection describes the necessary steps to use Distutils with the Borland C++ compiler version 5.5. First you have to know that Borland’s object file format (OMF) is different from the format used by the Python version you can download from the Python or ActiveState Web site. (Python is built with Microsoft Visual C++, which uses COFF as the object file format.) For this reason you have to convert Python’s library "python25.lib" into the Borland format. You can do this as follows: coff2omf python25.lib python25_bcpp.lib The "coff2omf" program comes with the Borland compiler. The file "python25.lib" is in the "Libs" directory of your Python installation. If your extension uses other libraries (zlib, …) you have to convert them too. The converted files have to reside in the same directories as the normal libraries. How does Distutils manage to use these libraries with their changed names? If the extension needs a library (eg. "foo") Distutils checks first if it finds a library with suffix "_bcpp" (eg. "foo_bcpp.lib") and then uses this library. In the case it doesn’t find such a special library it uses the default name ("foo.lib".) [1] To let Distutils compile your extension with Borland C++ you now have to type: python setup.py build --compiler=bcpp If you want to use the Borland C++ compiler as the default, you could specify this in your personal or system-wide configuration file for Distutils (see section Distutils Configuration Files.) See also: C++Builder Compiler Information about the free C++ compiler from Borland, including links to the download pages. Creating Python Extensions Using Borland’s Free Compiler Document describing how to use Borland’s free command-line C++ compiler to build Python. GNU C / Cygwin / MinGW ~~~~~~~~~~~~~~~~~~~~~~ This section describes the necessary steps to use Distutils with the GNU C/C++ compilers in their Cygwin and MinGW distributions. [2] For a Python interpreter that was built with Cygwin, everything should work without any of these following steps. Not all extensions can be built with MinGW or Cygwin, but many can. Extensions most likely to not work are those that use C++ or depend on Microsoft Visual C extensions. To let Distutils compile your extension with Cygwin you have to type: python setup.py build --compiler=cygwin and for Cygwin in no-cygwin mode [3] or for MinGW type: python setup.py build --compiler=mingw32 If you want to use any of these options/compilers as default, you should consider writing it in your personal or system-wide configuration file for Distutils (see section Distutils Configuration Files.) Older Versions of Python and MinGW """""""""""""""""""""""""""""""""" The following instructions only apply if you’re using a version of Python inferior to 2.4.1 with a MinGW inferior to 3.0.0 (with binutils-2.13.90-20030111-1). These compilers require some special libraries. This task is more complex than for Borland’s C++, because there is no program to convert the library. First you have to create a list of symbols which the Python DLL exports. (You can find a good program for this task at htt ps://sourceforge.net/projects/mingw/files/MinGW/Extension/pexports/). pexports python25.dll >python25.def The location of an installed "python25.dll" will depend on the installation options and the version and language of Windows. In a “just for me” installation, it will appear in the root of the installation directory. In a shared installation, it will be located in the system directory. Then you can create from these information an import library for gcc. /cygwin/bin/dlltool --dllname python25.dll --def python25.def --output-lib libpython25.a The resulting library has to be placed in the same directory as "python25.lib". (Should be the "libs" directory under your Python installation directory.) If your extension uses other libraries (zlib,…) you might have to convert them too. The converted files have to reside in the same directories as the normal libraries do. See also: Building Python modules on MS Windows platform with MinGW Information about building the required libraries for the MinGW environment. -[ Footnotes ]- [1] This also means you could replace all existing COFF-libraries with OMF-libraries of the same name. [2] Check https://www.sourceware.org/cygwin/ for more information [3] Then you have no POSIX emulation available, but you also don’t need "cygwin1.dll". Installing Python Modules ************************* Email: distutils-sig@python.org As a popular open source development project, Python has an active supporting community of contributors and users that also make their software available for other Python developers to use under open source license terms. This allows Python users to share and collaborate effectively, benefiting from the solutions others have already created to common (and sometimes even rare!) problems, as well as potentially contributing their own solutions to the common pool. This guide covers the installation part of the process. For a guide to creating and sharing your own Python projects, refer to the distribution guide. Note: For corporate and other institutional users, be aware that many organisations have their own policies around using and contributing to open source software. Please take such policies into account when making use of the distribution and installation tools provided with Python. Key terms ========= * "pip" is the preferred installer program. Starting with Python 3.4, it is included by default with the Python binary installers. * A *virtual environment* is a semi-isolated Python environment that allows packages to be installed for use by a particular application, rather than being installed system wide. * "venv" is the standard tool for creating virtual environments, and has been part of Python since Python 3.3. Starting with Python 3.4, it defaults to installing "pip" into all created virtual environments. * "virtualenv" is a third party alternative (and predecessor) to "venv". It allows virtual environments to be used on versions of Python prior to 3.4, which either don’t provide "venv" at all, or aren’t able to automatically install "pip" into created environments. * The Python Package Index is a public repository of open source licensed packages made available for use by other Python users. * the Python Packaging Authority is the group of developers and documentation authors responsible for the maintenance and evolution of the standard packaging tools and the associated metadata and file format standards. They maintain a variety of tools, documentation, and issue trackers on both GitHub and Bitbucket. * "distutils" is the original build and distribution system first added to the Python standard library in 1998. While direct use of "distutils" is being phased out, it still laid the foundation for the current packaging and distribution infrastructure, and it not only remains part of the standard library, but its name lives on in other ways (such as the name of the mailing list used to coordinate Python packaging standards development). Changed in version 3.5: The use of "venv" is now recommended for creating virtual environments. See also: Python Packaging User Guide: Creating and using virtual environments Basic usage =========== The standard packaging tools are all designed to be used from the command line. The following command will install the latest version of a module and its dependencies from the Python Package Index: python -m pip install SomePackage Note: For POSIX users (including macOS and Linux users), the examples in this guide assume the use of a *virtual environment*.For Windows users, the examples in this guide assume that the option to adjust the system PATH environment variable was selected when installing Python. It’s also possible to specify an exact or minimum version directly on the command line. When using comparator operators such as ">", "<" or some other special character which get interpreted by shell, the package name and the version should be enclosed within double quotes: python -m pip install SomePackage==1.0.4 # specific version python -m pip install "SomePackage>=1.0.4" # minimum version Normally, if a suitable module is already installed, attempting to install it again will have no effect. Upgrading existing modules must be requested explicitly: python -m pip install --upgrade SomePackage More information and resources regarding "pip" and its capabilities can be found in the Python Packaging User Guide. Creation of virtual environments is done through the "venv" module. Installing packages into an active virtual environment uses the commands shown above. See also: Python Packaging User Guide: Installing Python Distribution Packages How do I …? =========== These are quick answers or links for some common tasks. … install "pip" in versions of Python prior to Python 3.4? ---------------------------------------------------------- Python only started bundling "pip" with Python 3.4. For earlier versions, "pip" needs to be “bootstrapped” as described in the Python Packaging User Guide. See also: Python Packaging User Guide: Requirements for Installing Packages … install packages just for the current user? --------------------------------------------- Passing the "--user" option to "python -m pip install" will install a package just for the current user, rather than for all users of the system. … install scientific Python packages? ------------------------------------- A number of scientific Python packages have complex binary dependencies, and aren’t currently easy to install using "pip" directly. At this point in time, it will often be easier for users to install these packages by other means rather than attempting to install them with "pip". See also: Python Packaging User Guide: Installing Scientific Packages … work with multiple versions of Python installed in parallel? -------------------------------------------------------------- On Linux, macOS, and other POSIX systems, use the versioned Python commands in combination with the "-m" switch to run the appropriate copy of "pip": python2 -m pip install SomePackage # default Python 2 python2.7 -m pip install SomePackage # specifically Python 2.7 python3 -m pip install SomePackage # default Python 3 python3.4 -m pip install SomePackage # specifically Python 3.4 Appropriately versioned "pip" commands may also be available. On Windows, use the "py" Python launcher in combination with the "-m" switch: py -2 -m pip install SomePackage # default Python 2 py -2.7 -m pip install SomePackage # specifically Python 2.7 py -3 -m pip install SomePackage # default Python 3 py -3.4 -m pip install SomePackage # specifically Python 3.4 Common installation issues ========================== Installing into the system Python on Linux ------------------------------------------ On Linux systems, a Python installation will typically be included as part of the distribution. Installing into this Python installation requires root access to the system, and may interfere with the operation of the system package manager and other components of the system if a component is unexpectedly upgraded using "pip". On such systems, it is often better to use a virtual environment or a per-user installation when installing packages with "pip". Pip not installed ----------------- It is possible that "pip" does not get installed by default. One potential fix is: python -m ensurepip --default-pip There are also additional resources for installing pip. Installing binary extensions ---------------------------- Python has typically relied heavily on source based distribution, with end users being expected to compile extension modules from source as part of the installation process. With the introduction of support for the binary "wheel" format, and the ability to publish wheels for at least Windows and macOS through the Python Package Index, this problem is expected to diminish over time, as users are more regularly able to install pre-built extensions rather than needing to build them themselves. Some of the solutions for installing scientific software that are not yet available as pre-built "wheel" files may also help with obtaining other binary extensions without needing to build them locally. See also: Python Packaging User Guide: Binary Extensions 2to3 - Automated Python 2 to 3 code translation *********************************************** 2to3 is a Python program that reads Python 2.x source code and applies a series of *fixers* to transform it into valid Python 3.x code. The standard library contains a rich set of fixers that will handle almost all code. 2to3 supporting library "lib2to3" is, however, a flexible and generic library, so it is possible to write your own fixers for 2to3. Using 2to3 ========== 2to3 will usually be installed with the Python interpreter as a script. It is also located in the "Tools/scripts" directory of the Python root. 2to3’s basic arguments are a list of files or directories to transform. The directories are recursively traversed for Python sources. Here is a sample Python 2.x source file, "example.py": def greet(name): print "Hello, {0}!".format(name) print "What's your name?" name = raw_input() greet(name) It can be converted to Python 3.x code via 2to3 on the command line: $ 2to3 example.py A diff against the original source file is printed. 2to3 can also write the needed modifications right back to the source file. (A backup of the original file is made unless "-n" is also given.) Writing the changes back is enabled with the "-w" flag: $ 2to3 -w example.py After transformation, "example.py" looks like this: def greet(name): print("Hello, {0}!".format(name)) print("What's your name?") name = input() greet(name) Comments and exact indentation are preserved throughout the translation process. By default, 2to3 runs a set of predefined fixers. The "-l" flag lists all available fixers. An explicit set of fixers to run can be given with "-f". Likewise the "-x" explicitly disables a fixer. The following example runs only the "imports" and "has_key" fixers: $ 2to3 -f imports -f has_key example.py This command runs every fixer except the "apply" fixer: $ 2to3 -x apply example.py Some fixers are *explicit*, meaning they aren’t run by default and must be listed on the command line to be run. Here, in addition to the default fixers, the "idioms" fixer is run: $ 2to3 -f all -f idioms example.py Notice how passing "all" enables all default fixers. Sometimes 2to3 will find a place in your source code that needs to be changed, but 2to3 cannot fix automatically. In this case, 2to3 will print a warning beneath the diff for a file. You should address the warning in order to have compliant 3.x code. 2to3 can also refactor doctests. To enable this mode, use the "-d" flag. Note that *only* doctests will be refactored. This also doesn’t require the module to be valid Python. For example, doctest like examples in a reST document could also be refactored with this option. The "-v" option enables output of more information on the translation process. Since some print statements can be parsed as function calls or statements, 2to3 cannot always read files containing the print function. When 2to3 detects the presence of the "from __future__ import print_function" compiler directive, it modifies its internal grammar to interpret "print()" as a function. This change can also be enabled manually with the "-p" flag. Use "-p" to run fixers on code that already has had its print statements converted. Also "-e" can be used to make "exec()" a function. The "-o" or "--output-dir" option allows specification of an alternate directory for processed output files to be written to. The "-n" flag is required when using this as backup files do not make sense when not overwriting the input files. New in version 3.2.3: The "-o" option was added. The "-W" or "--write-unchanged-files" flag tells 2to3 to always write output files even if no changes were required to the file. This is most useful with "-o" so that an entire Python source tree is copied with translation from one directory to another. This option implies the "-w" flag as it would not make sense otherwise. New in version 3.2.3: The "-W" flag was added. The "--add-suffix" option specifies a string to append to all output filenames. The "-n" flag is required when specifying this as backups are not necessary when writing to different filenames. Example: $ 2to3 -n -W --add-suffix=3 example.py Will cause a converted file named "example.py3" to be written. New in version 3.2.3: The "--add-suffix" option was added. To translate an entire project from one directory tree to another use: $ 2to3 --output-dir=python3-version/mycode -W -n python2-version/mycode Fixers ====== Each step of transforming code is encapsulated in a fixer. The command "2to3 -l" lists them. As documented above, each can be turned on and off individually. They are described here in more detail. apply Removes usage of "apply()". For example "apply(function, *args, **kwargs)" is converted to "function(*args, **kwargs)". asserts Replaces deprecated "unittest" method names with the correct ones. +----------------------------------+--------------------------------------------+ | From | To | |==================================|============================================| | "failUnlessEqual(a, b)" | "assertEqual(a, b)" | +----------------------------------+--------------------------------------------+ | "assertEquals(a, b)" | "assertEqual(a, b)" | +----------------------------------+--------------------------------------------+ | "failIfEqual(a, b)" | "assertNotEqual(a, b)" | +----------------------------------+--------------------------------------------+ | "assertNotEquals(a, b)" | "assertNotEqual(a, b)" | +----------------------------------+--------------------------------------------+ | "failUnless(a)" | "assertTrue(a)" | +----------------------------------+--------------------------------------------+ | "assert_(a)" | "assertTrue(a)" | +----------------------------------+--------------------------------------------+ | "failIf(a)" | "assertFalse(a)" | +----------------------------------+--------------------------------------------+ | "failUnlessRaises(exc, cal)" | "assertRaises(exc, cal)" | +----------------------------------+--------------------------------------------+ | "failUnlessAlmostEqual(a, b)" | "assertAlmostEqual(a, b)" | +----------------------------------+--------------------------------------------+ | "assertAlmostEquals(a, b)" | "assertAlmostEqual(a, b)" | +----------------------------------+--------------------------------------------+ | "failIfAlmostEqual(a, b)" | "assertNotAlmostEqual(a, b)" | +----------------------------------+--------------------------------------------+ | "assertNotAlmostEquals(a, b)" | "assertNotAlmostEqual(a, b)" | +----------------------------------+--------------------------------------------+ basestring Converts "basestring" to "str". buffer Converts "buffer" to "memoryview". This fixer is optional because the "memoryview" API is similar but not exactly the same as that of "buffer". dict Fixes dictionary iteration methods. "dict.iteritems()" is converted to "dict.items()", "dict.iterkeys()" to "dict.keys()", and "dict.itervalues()" to "dict.values()". Similarly, "dict.viewitems()", "dict.viewkeys()" and "dict.viewvalues()" are converted respectively to "dict.items()", "dict.keys()" and "dict.values()". It also wraps existing usages of "dict.items()", "dict.keys()", and "dict.values()" in a call to "list". except Converts "except X, T" to "except X as T". exec Converts the "exec" statement to the "exec()" function. execfile Removes usage of "execfile()". The argument to "execfile()" is wrapped in calls to "open()", "compile()", and "exec()". exitfunc Changes assignment of "sys.exitfunc" to use of the "atexit" module. filter Wraps "filter()" usage in a "list" call. funcattrs Fixes function attributes that have been renamed. For example, "my_function.func_closure" is converted to "my_function.__closure__". future Removes "from __future__ import new_feature" statements. getcwdu Renames "os.getcwdu()" to "os.getcwd()". has_key Changes "dict.has_key(key)" to "key in dict". idioms This optional fixer performs several transformations that make Python code more idiomatic. Type comparisons like "type(x) is SomeClass" and "type(x) == SomeClass" are converted to "isinstance(x, SomeClass)". "while 1" becomes "while True". This fixer also tries to make use of "sorted()" in appropriate places. For example, this block L = list(some_iterable) L.sort() is changed to L = sorted(some_iterable) import Detects sibling imports and converts them to relative imports. imports Handles module renames in the standard library. imports2 Handles other modules renames in the standard library. It is separate from the "imports" fixer only because of technical limitations. input Converts "input(prompt)" to "eval(input(prompt))". intern Converts "intern()" to "sys.intern()". isinstance Fixes duplicate types in the second argument of "isinstance()". For example, "isinstance(x, (int, int))" is converted to "isinstance(x, int)" and "isinstance(x, (int, float, int))" is converted to "isinstance(x, (int, float))". itertools_imports Removes imports of "itertools.ifilter()", "itertools.izip()", and "itertools.imap()". Imports of "itertools.ifilterfalse()" are also changed to "itertools.filterfalse()". itertools Changes usage of "itertools.ifilter()", "itertools.izip()", and "itertools.imap()" to their built-in equivalents. "itertools.ifilterfalse()" is changed to "itertools.filterfalse()". long Renames "long" to "int". map Wraps "map()" in a "list" call. It also changes "map(None, x)" to "list(x)". Using "from future_builtins import map" disables this fixer. metaclass Converts the old metaclass syntax ("__metaclass__ = Meta" in the class body) to the new ("class X(metaclass=Meta)"). methodattrs Fixes old method attribute names. For example, "meth.im_func" is converted to "meth.__func__". ne Converts the old not-equal syntax, "<>", to "!=". next Converts the use of iterator’s "next()" methods to the "next()" function. It also renames "next()" methods to "__next__()". nonzero Renames definitions of methods called "__nonzero__()" to "__bool__()". numliterals Converts octal literals into the new syntax. operator Converts calls to various functions in the "operator" module to other, but equivalent, function calls. When needed, the appropriate "import" statements are added, e.g. "import collections.abc". The following mapping are made: +------------------------------------+-----------------------------------------------+ | From | To | |====================================|===============================================| | "operator.isCallable(obj)" | "callable(obj)" | +------------------------------------+-----------------------------------------------+ | "operator.sequenceIncludes(obj)" | "operator.contains(obj)" | +------------------------------------+-----------------------------------------------+ | "operator.isSequenceType(obj)" | "isinstance(obj, collections.abc.Sequence)" | +------------------------------------+-----------------------------------------------+ | "operator.isMappingType(obj)" | "isinstance(obj, collections.abc.Mapping)" | +------------------------------------+-----------------------------------------------+ | "operator.isNumberType(obj)" | "isinstance(obj, numbers.Number)" | +------------------------------------+-----------------------------------------------+ | "operator.repeat(obj, n)" | "operator.mul(obj, n)" | +------------------------------------+-----------------------------------------------+ | "operator.irepeat(obj, n)" | "operator.imul(obj, n)" | +------------------------------------+-----------------------------------------------+ paren Add extra parenthesis where they are required in list comprehensions. For example, "[x for x in 1, 2]" becomes "[x for x in (1, 2)]". print Converts the "print" statement to the "print()" function. raise Converts "raise E, V" to "raise E(V)", and "raise E, V, T" to "raise E(V).with_traceback(T)". If "E" is a tuple, the translation will be incorrect because substituting tuples for exceptions has been removed in 3.0. raw_input Converts "raw_input()" to "input()". reduce Handles the move of "reduce()" to "functools.reduce()". reload Converts "reload()" to "importlib.reload()". renames Changes "sys.maxint" to "sys.maxsize". repr Replaces backtick repr with the "repr()" function. set_literal Replaces use of the "set" constructor with set literals. This fixer is optional. standarderror Renames "StandardError" to "Exception". sys_exc Changes the deprecated "sys.exc_value", "sys.exc_type", "sys.exc_traceback" to use "sys.exc_info()". throw Fixes the API change in generator’s "throw()" method. tuple_params Removes implicit tuple parameter unpacking. This fixer inserts temporary variables. types Fixes code broken from the removal of some members in the "types" module. unicode Renames "unicode" to "str". urllib Handles the rename of "urllib" and "urllib2" to the "urllib" package. ws_comma Removes excess whitespace from comma separated items. This fixer is optional. xrange Renames "xrange()" to "range()" and wraps existing "range()" calls with "list". xreadlines Changes "for x in file.xreadlines()" to "for x in file". zip Wraps "zip()" usage in a "list" call. This is disabled when "from future_builtins import zip" appears. "lib2to3" - 2to3’s library ========================== **Source code:** Lib/lib2to3/ ====================================================================== Deprecated since version 3.10: Python 3.9 will switch to a PEG parser (see **PEP 617**), and Python 3.10 may include new language syntax that is not parsable by lib2to3’s LL(1) parser. The "lib2to3" module may be removed from the standard library in a future Python version. Consider third-party alternatives such as LibCST or parso. Note: The "lib2to3" API should be considered unstable and may change drastically in the future. "__future__" — Future statement definitions ******************************************* **Source code:** Lib/__future__.py ====================================================================== "__future__" is a real module, and serves three purposes: * To avoid confusing existing tools that analyze import statements and expect to find the modules they’re importing. * To ensure that future statements run under releases prior to 2.1 at least yield runtime exceptions (the import of "__future__" will fail, because there was no module of that name prior to 2.1). * To document when incompatible changes were introduced, and when they will be — or were — made mandatory. This is a form of executable documentation, and can be inspected programmatically via importing "__future__" and examining its contents. Each statement in "__future__.py" is of the form: FeatureName = _Feature(OptionalRelease, MandatoryRelease, CompilerFlag) where, normally, *OptionalRelease* is less than *MandatoryRelease*, and both are 5-tuples of the same form as "sys.version_info": (PY_MAJOR_VERSION, # the 2 in 2.1.0a3; an int PY_MINOR_VERSION, # the 1; an int PY_MICRO_VERSION, # the 0; an int PY_RELEASE_LEVEL, # "alpha", "beta", "candidate" or "final"; string PY_RELEASE_SERIAL # the 3; an int ) *OptionalRelease* records the first release in which the feature was accepted. In the case of a *MandatoryRelease* that has not yet occurred, *MandatoryRelease* predicts the release in which the feature will become part of the language. Else *MandatoryRelease* records when the feature became part of the language; in releases at or after that, modules no longer need a future statement to use the feature in question, but may continue to use such imports. *MandatoryRelease* may also be "None", meaning that a planned feature got dropped. Instances of class "_Feature" have two corresponding methods, "getOptionalRelease()" and "getMandatoryRelease()". *CompilerFlag* is the (bitfield) flag that should be passed in the fourth argument to the built-in function "compile()" to enable the feature in dynamically compiled code. This flag is stored in the "compiler_flag" attribute on "_Feature" instances. No feature description will ever be deleted from "__future__". Since its introduction in Python 2.1 the following features have found their way into the language using this mechanism: +--------------------+---------------+----------------+-----------------------------------------------+ | feature | optional in | mandatory in | effect | |====================|===============|================|===============================================| | nested_scopes | 2.1.0b1 | 2.2 | **PEP 227**: *Statically Nested Scopes* | +--------------------+---------------+----------------+-----------------------------------------------+ | generators | 2.2.0a1 | 2.3 | **PEP 255**: *Simple Generators* | +--------------------+---------------+----------------+-----------------------------------------------+ | division | 2.2.0a2 | 3.0 | **PEP 238**: *Changing the Division Operator* | +--------------------+---------------+----------------+-----------------------------------------------+ | absolute_import | 2.5.0a1 | 3.0 | **PEP 328**: *Imports: Multi-Line and | | | | | Absolute/Relative* | +--------------------+---------------+----------------+-----------------------------------------------+ | with_statement | 2.5.0a1 | 2.6 | **PEP 343**: *The “with” Statement* | +--------------------+---------------+----------------+-----------------------------------------------+ | print_function | 2.6.0a2 | 3.0 | **PEP 3105**: *Make print a function* | +--------------------+---------------+----------------+-----------------------------------------------+ | unicode_literals | 2.6.0a2 | 3.0 | **PEP 3112**: *Bytes literals in Python 3000* | +--------------------+---------------+----------------+-----------------------------------------------+ | generator_stop | 3.5.0b1 | 3.7 | **PEP 479**: *StopIteration handling inside | | | | | generators* | +--------------------+---------------+----------------+-----------------------------------------------+ | annotations | 3.7.0b1 | TBD [1] | **PEP 563**: *Postponed evaluation of | | | | | annotations* | +--------------------+---------------+----------------+-----------------------------------------------+ [1] "from __future__ import annotations" was previously scheduled to become mandatory in Python 3.10, but the Python Steering Council twice decided to delay the change (announcement for Python 3.10; announcement for Python 3.11). No final decision has been made yet. See also **PEP 563** and **PEP 649**. See also: Future statements How the compiler treats future imports. "__main__" — Top-level script environment ***************************************** ====================================================================== "'__main__'" is the name of the scope in which top-level code executes. A module’s __name__ is set equal to "'__main__'" when read from standard input, a script, or from an interactive prompt. A module can discover whether or not it is running in the main scope by checking its own "__name__", which allows a common idiom for conditionally executing code in a module when it is run as a script or with "python -m" but not when it is imported: if __name__ == "__main__": # execute only if run as a script main() For a package, the same effect can be achieved by including a "__main__.py" module, the contents of which will be executed when the module is run with "-m". "_thread" — Low-level threading API *********************************** ====================================================================== This module provides low-level primitives for working with multiple threads (also called *light-weight processes* or *tasks*) — multiple threads of control sharing their global data space. For synchronization, simple locks (also called *mutexes* or *binary semaphores*) are provided. The "threading" module provides an easier to use and higher-level threading API built on top of this module. Changed in version 3.7: This module used to be optional, it is now always available. This module defines the following constants and functions: exception _thread.error Raised on thread-specific errors. Changed in version 3.3: This is now a synonym of the built-in "RuntimeError". _thread.LockType This is the type of lock objects. _thread.start_new_thread(function, args[, kwargs]) Start a new thread and return its identifier. The thread executes the function *function* with the argument list *args* (which must be a tuple). The optional *kwargs* argument specifies a dictionary of keyword arguments. When the function returns, the thread silently exits. When the function terminates with an unhandled exception, "sys.unraisablehook()" is called to handle the exception. The *object* attribute of the hook argument is *function*. By default, a stack trace is printed and then the thread exits (but other threads continue to run). When the function raises a "SystemExit" exception, it is silently ignored. Changed in version 3.8: "sys.unraisablehook()" is now used to handle unhandled exceptions. _thread.interrupt_main() Simulate the effect of a "signal.SIGINT" signal arriving in the main thread. A thread can use this function to interrupt the main thread. If "signal.SIGINT" isn’t handled by Python (it was set to "signal.SIG_DFL" or "signal.SIG_IGN"), this function does nothing. _thread.exit() Raise the "SystemExit" exception. When not caught, this will cause the thread to exit silently. _thread.allocate_lock() Return a new lock object. Methods of locks are described below. The lock is initially unlocked. _thread.get_ident() Return the ‘thread identifier’ of the current thread. This is a nonzero integer. Its value has no direct meaning; it is intended as a magic cookie to be used e.g. to index a dictionary of thread- specific data. Thread identifiers may be recycled when a thread exits and another thread is created. _thread.get_native_id() Return the native integral Thread ID of the current thread assigned by the kernel. This is a non-negative integer. Its value may be used to uniquely identify this particular thread system-wide (until the thread terminates, after which the value may be recycled by the OS). Availability: Windows, FreeBSD, Linux, macOS, OpenBSD, NetBSD, AIX. New in version 3.8. _thread.stack_size([size]) Return the thread stack size used when creating new threads. The optional *size* argument specifies the stack size to be used for subsequently created threads, and must be 0 (use platform or configured default) or a positive integer value of at least 32,768 (32 KiB). If *size* is not specified, 0 is used. If changing the thread stack size is unsupported, a "RuntimeError" is raised. If the specified stack size is invalid, a "ValueError" is raised and the stack size is unmodified. 32 KiB is currently the minimum supported stack size value to guarantee sufficient stack space for the interpreter itself. Note that some platforms may have particular restrictions on values for the stack size, such as requiring a minimum stack size > 32 KiB or requiring allocation in multiples of the system memory page size - platform documentation should be referred to for more information (4 KiB pages are common; using multiples of 4096 for the stack size is the suggested approach in the absence of more specific information). Availability: Windows, systems with POSIX threads. _thread.TIMEOUT_MAX The maximum value allowed for the *timeout* parameter of "Lock.acquire()". Specifying a timeout greater than this value will raise an "OverflowError". New in version 3.2. Lock objects have the following methods: lock.acquire(waitflag=1, timeout=-1) Without any optional argument, this method acquires the lock unconditionally, if necessary waiting until it is released by another thread (only one thread at a time can acquire a lock — that’s their reason for existence). If the integer *waitflag* argument is present, the action depends on its value: if it is zero, the lock is only acquired if it can be acquired immediately without waiting, while if it is nonzero, the lock is acquired unconditionally as above. If the floating-point *timeout* argument is present and positive, it specifies the maximum wait time in seconds before returning. A negative *timeout* argument specifies an unbounded wait. You cannot specify a *timeout* if *waitflag* is zero. The return value is "True" if the lock is acquired successfully, "False" if not. Changed in version 3.2: The *timeout* parameter is new. Changed in version 3.2: Lock acquires can now be interrupted by signals on POSIX. lock.release() Releases the lock. The lock must have been acquired earlier, but not necessarily by the same thread. lock.locked() Return the status of the lock: "True" if it has been acquired by some thread, "False" if not. In addition to these methods, lock objects can also be used via the "with" statement, e.g.: import _thread a_lock = _thread.allocate_lock() with a_lock: print("a_lock is locked while this executes") **Caveats:** * Threads interact strangely with interrupts: the "KeyboardInterrupt" exception will be received by an arbitrary thread. (When the "signal" module is available, interrupts always go to the main thread.) * Calling "sys.exit()" or raising the "SystemExit" exception is equivalent to calling "_thread.exit()". * It is not possible to interrupt the "acquire()" method on a lock — the "KeyboardInterrupt" exception will happen after the lock has been acquired. * When the main thread exits, it is system defined whether the other threads survive. On most systems, they are killed without executing "try" … "finally" clauses or executing object destructors. * When the main thread exits, it does not do any of its usual cleanup (except that "try" … "finally" clauses are honored), and the standard I/O files are not flushed. "abc" — Abstract Base Classes ***************************** **Source code:** Lib/abc.py ====================================================================== This module provides the infrastructure for defining *abstract base classes* (ABCs) in Python, as outlined in **PEP 3119**; see the PEP for why this was added to Python. (See also **PEP 3141** and the "numbers" module regarding a type hierarchy for numbers based on ABCs.) The "collections" module has some concrete classes that derive from ABCs; these can, of course, be further derived. In addition, the "collections.abc" submodule has some ABCs that can be used to test whether a class or instance provides a particular interface, for example, if it is hashable or if it is a mapping. This module provides the metaclass "ABCMeta" for defining ABCs and a helper class "ABC" to alternatively define ABCs through inheritance: class abc.ABC A helper class that has "ABCMeta" as its metaclass. With this class, an abstract base class can be created by simply deriving from "ABC" avoiding sometimes confusing metaclass usage, for example: from abc import ABC class MyABC(ABC): pass Note that the type of "ABC" is still "ABCMeta", therefore inheriting from "ABC" requires the usual precautions regarding metaclass usage, as multiple inheritance may lead to metaclass conflicts. One may also define an abstract base class by passing the metaclass keyword and using "ABCMeta" directly, for example: from abc import ABCMeta class MyABC(metaclass=ABCMeta): pass New in version 3.4. class abc.ABCMeta Metaclass for defining Abstract Base Classes (ABCs). Use this metaclass to create an ABC. An ABC can be subclassed directly, and then acts as a mix-in class. You can also register unrelated concrete classes (even built-in classes) and unrelated ABCs as “virtual subclasses” – these and their descendants will be considered subclasses of the registering ABC by the built-in "issubclass()" function, but the registering ABC won’t show up in their MRO (Method Resolution Order) nor will method implementations defined by the registering ABC be callable (not even via "super()"). [1] Classes created with a metaclass of "ABCMeta" have the following method: register(subclass) Register *subclass* as a “virtual subclass” of this ABC. For example: from abc import ABC class MyABC(ABC): pass MyABC.register(tuple) assert issubclass(tuple, MyABC) assert isinstance((), MyABC) Changed in version 3.3: Returns the registered subclass, to allow usage as a class decorator. Changed in version 3.4: To detect calls to "register()", you can use the "get_cache_token()" function. You can also override this method in an abstract base class: __subclasshook__(subclass) (Must be defined as a class method.) Check whether *subclass* is considered a subclass of this ABC. This means that you can customize the behavior of "issubclass" further without the need to call "register()" on every class you want to consider a subclass of the ABC. (This class method is called from the "__subclasscheck__()" method of the ABC.) This method should return "True", "False" or "NotImplemented". If it returns "True", the *subclass* is considered a subclass of this ABC. If it returns "False", the *subclass* is not considered a subclass of this ABC, even if it would normally be one. If it returns "NotImplemented", the subclass check is continued with the usual mechanism. For a demonstration of these concepts, look at this example ABC definition: class Foo: def __getitem__(self, index): ... def __len__(self): ... def get_iterator(self): return iter(self) class MyIterable(ABC): @abstractmethod def __iter__(self): while False: yield None def get_iterator(self): return self.__iter__() @classmethod def __subclasshook__(cls, C): if cls is MyIterable: if any("__iter__" in B.__dict__ for B in C.__mro__): return True return NotImplemented MyIterable.register(Foo) The ABC "MyIterable" defines the standard iterable method, "__iter__()", as an abstract method. The implementation given here can still be called from subclasses. The "get_iterator()" method is also part of the "MyIterable" abstract base class, but it does not have to be overridden in non-abstract derived classes. The "__subclasshook__()" class method defined here says that any class that has an "__iter__()" method in its "__dict__" (or in that of one of its base classes, accessed via the "__mro__" list) is considered a "MyIterable" too. Finally, the last line makes "Foo" a virtual subclass of "MyIterable", even though it does not define an "__iter__()" method (it uses the old-style iterable protocol, defined in terms of "__len__()" and "__getitem__()"). Note that this will not make "get_iterator" available as a method of "Foo", so it is provided separately. The "abc" module also provides the following decorator: @abc.abstractmethod A decorator indicating abstract methods. Using this decorator requires that the class’s metaclass is "ABCMeta" or is derived from it. A class that has a metaclass derived from "ABCMeta" cannot be instantiated unless all of its abstract methods and properties are overridden. The abstract methods can be called using any of the normal ‘super’ call mechanisms. "abstractmethod()" may be used to declare abstract methods for properties and descriptors. Dynamically adding abstract methods to a class, or attempting to modify the abstraction status of a method or class once it is created, are not supported. The "abstractmethod()" only affects subclasses derived using regular inheritance; “virtual subclasses” registered with the ABC’s "register()" method are not affected. When "abstractmethod()" is applied in combination with other method descriptors, it should be applied as the innermost decorator, as shown in the following usage examples: class C(ABC): @abstractmethod def my_abstract_method(self, arg1): ... @classmethod @abstractmethod def my_abstract_classmethod(cls, arg2): ... @staticmethod @abstractmethod def my_abstract_staticmethod(arg3): ... @property @abstractmethod def my_abstract_property(self): ... @my_abstract_property.setter @abstractmethod def my_abstract_property(self, val): ... @abstractmethod def _get_x(self): ... @abstractmethod def _set_x(self, val): ... x = property(_get_x, _set_x) In order to correctly interoperate with the abstract base class machinery, the descriptor must identify itself as abstract using "__isabstractmethod__". In general, this attribute should be "True" if any of the methods used to compose the descriptor are abstract. For example, Python’s built-in "property" does the equivalent of: class Descriptor: ... @property def __isabstractmethod__(self): return any(getattr(f, '__isabstractmethod__', False) for f in (self._fget, self._fset, self._fdel)) Note: Unlike Java abstract methods, these abstract methods may have an implementation. This implementation can be called via the "super()" mechanism from the class that overrides it. This could be useful as an end-point for a super-call in a framework that uses cooperative multiple-inheritance. The "abc" module also supports the following legacy decorators: @abc.abstractclassmethod New in version 3.2. Deprecated since version 3.3: It is now possible to use "classmethod" with "abstractmethod()", making this decorator redundant. A subclass of the built-in "classmethod()", indicating an abstract classmethod. Otherwise it is similar to "abstractmethod()". This special case is deprecated, as the "classmethod()" decorator is now correctly identified as abstract when applied to an abstract method: class C(ABC): @classmethod @abstractmethod def my_abstract_classmethod(cls, arg): ... @abc.abstractstaticmethod New in version 3.2. Deprecated since version 3.3: It is now possible to use "staticmethod" with "abstractmethod()", making this decorator redundant. A subclass of the built-in "staticmethod()", indicating an abstract staticmethod. Otherwise it is similar to "abstractmethod()". This special case is deprecated, as the "staticmethod()" decorator is now correctly identified as abstract when applied to an abstract method: class C(ABC): @staticmethod @abstractmethod def my_abstract_staticmethod(arg): ... @abc.abstractproperty Deprecated since version 3.3: It is now possible to use "property", "property.getter()", "property.setter()" and "property.deleter()" with "abstractmethod()", making this decorator redundant. A subclass of the built-in "property()", indicating an abstract property. This special case is deprecated, as the "property()" decorator is now correctly identified as abstract when applied to an abstract method: class C(ABC): @property @abstractmethod def my_abstract_property(self): ... The above example defines a read-only property; you can also define a read-write abstract property by appropriately marking one or more of the underlying methods as abstract: class C(ABC): @property def x(self): ... @x.setter @abstractmethod def x(self, val): ... If only some components are abstract, only those components need to be updated to create a concrete property in a subclass: class D(C): @C.x.setter def x(self, val): ... The "abc" module also provides the following functions: abc.get_cache_token() Returns the current abstract base class cache token. The token is an opaque object (that supports equality testing) identifying the current version of the abstract base class cache for virtual subclasses. The token changes with every call to "ABCMeta.register()" on any ABC. New in version 3.4. -[ Footnotes ]- [1] C++ programmers should note that Python’s virtual base class concept is not the same as C++’s. "aifc" — Read and write AIFF and AIFC files ******************************************* **Source code:** Lib/aifc.py Deprecated since version 3.11: The "aifc" module is deprecated (see **PEP 594** for details). ====================================================================== This module provides support for reading and writing AIFF and AIFF-C files. AIFF is Audio Interchange File Format, a format for storing digital audio samples in a file. AIFF-C is a newer version of the format that includes the ability to compress the audio data. Audio files have a number of parameters that describe the audio data. The sampling rate or frame rate is the number of times per second the sound is sampled. The number of channels indicate if the audio is mono, stereo, or quadro. Each frame consists of one sample per channel. The sample size is the size in bytes of each sample. Thus a frame consists of "nchannels * samplesize" bytes, and a second’s worth of audio consists of "nchannels * samplesize * framerate" bytes. For example, CD quality audio has a sample size of two bytes (16 bits), uses two channels (stereo) and has a frame rate of 44,100 frames/second. This gives a frame size of 4 bytes (2*2), and a second’s worth occupies 2*2*44100 bytes (176,400 bytes). Module "aifc" defines the following function: aifc.open(file, mode=None) Open an AIFF or AIFF-C file and return an object instance with methods that are described below. The argument *file* is either a string naming a file or a *file object*. *mode* must be "'r'" or "'rb'" when the file must be opened for reading, or "'w'" or "'wb'" when the file must be opened for writing. If omitted, "file.mode" is used if it exists, otherwise "'rb'" is used. When used for writing, the file object should be seekable, unless you know ahead of time how many samples you are going to write in total and use "writeframesraw()" and "setnframes()". The "open()" function may be used in a "with" statement. When the "with" block completes, the "close()" method is called. Changed in version 3.4: Support for the "with" statement was added. Objects returned by "open()" when a file is opened for reading have the following methods: aifc.getnchannels() Return the number of audio channels (1 for mono, 2 for stereo). aifc.getsampwidth() Return the size in bytes of individual samples. aifc.getframerate() Return the sampling rate (number of audio frames per second). aifc.getnframes() Return the number of audio frames in the file. aifc.getcomptype() Return a bytes array of length 4 describing the type of compression used in the audio file. For AIFF files, the returned value is "b'NONE'". aifc.getcompname() Return a bytes array convertible to a human-readable description of the type of compression used in the audio file. For AIFF files, the returned value is "b'not compressed'". aifc.getparams() Returns a "namedtuple()" "(nchannels, sampwidth, framerate, nframes, comptype, compname)", equivalent to output of the "get*()" methods. aifc.getmarkers() Return a list of markers in the audio file. A marker consists of a tuple of three elements. The first is the mark ID (an integer), the second is the mark position in frames from the beginning of the data (an integer), the third is the name of the mark (a string). aifc.getmark(id) Return the tuple as described in "getmarkers()" for the mark with the given *id*. aifc.readframes(nframes) Read and return the next *nframes* frames from the audio file. The returned data is a string containing for each frame the uncompressed samples of all channels. aifc.rewind() Rewind the read pointer. The next "readframes()" will start from the beginning. aifc.setpos(pos) Seek to the specified frame number. aifc.tell() Return the current frame number. aifc.close() Close the AIFF file. After calling this method, the object can no longer be used. Objects returned by "open()" when a file is opened for writing have all the above methods, except for "readframes()" and "setpos()". In addition the following methods exist. The "get*()" methods can only be called after the corresponding "set*()" methods have been called. Before the first "writeframes()" or "writeframesraw()", all parameters except for the number of frames must be filled in. aifc.aiff() Create an AIFF file. The default is that an AIFF-C file is created, unless the name of the file ends in "'.aiff'" in which case the default is an AIFF file. aifc.aifc() Create an AIFF-C file. The default is that an AIFF-C file is created, unless the name of the file ends in "'.aiff'" in which case the default is an AIFF file. aifc.setnchannels(nchannels) Specify the number of channels in the audio file. aifc.setsampwidth(width) Specify the size in bytes of audio samples. aifc.setframerate(rate) Specify the sampling frequency in frames per second. aifc.setnframes(nframes) Specify the number of frames that are to be written to the audio file. If this parameter is not set, or not set correctly, the file needs to support seeking. aifc.setcomptype(type, name) Specify the compression type. If not specified, the audio data will not be compressed. In AIFF files, compression is not possible. The name parameter should be a human-readable description of the compression type as a bytes array, the type parameter should be a bytes array of length 4. Currently the following compression types are supported: "b'NONE'", "b'ULAW'", "b'ALAW'", "b'G722'". aifc.setparams(nchannels, sampwidth, framerate, comptype, compname) Set all the above parameters at once. The argument is a tuple consisting of the various parameters. This means that it is possible to use the result of a "getparams()" call as argument to "setparams()". aifc.setmark(id, pos, name) Add a mark with the given id (larger than 0), and the given name at the given position. This method can be called at any time before "close()". aifc.tell() Return the current write position in the output file. Useful in combination with "setmark()". aifc.writeframes(data) Write data to the output file. This method can only be called after the audio file parameters have been set. Changed in version 3.4: Any *bytes-like object* is now accepted. aifc.writeframesraw(data) Like "writeframes()", except that the header of the audio file is not updated. Changed in version 3.4: Any *bytes-like object* is now accepted. aifc.close() Close the AIFF file. The header of the file is updated to reflect the actual size of the audio data. After calling this method, the object can no longer be used. Generic Operating System Services ********************************* The modules described in this chapter provide interfaces to operating system features that are available on (almost) all operating systems, such as files and a clock. The interfaces are generally modeled after the Unix or C interfaces, but they are available on most other systems as well. Here’s an overview: * "os" — Miscellaneous operating system interfaces * File Names, Command Line Arguments, and Environment Variables * Process Parameters * File Object Creation * File Descriptor Operations * Querying the size of a terminal * Inheritance of File Descriptors * Files and Directories * Linux extended attributes * Process Management * Interface to the scheduler * Miscellaneous System Information * Random numbers * "io" — Core tools for working with streams * Overview * Text I/O * Binary I/O * Raw I/O * High-level Module Interface * Class hierarchy * I/O Base Classes * Raw File I/O * Buffered Streams * Text I/O * Performance * Binary I/O * Text I/O * Multi-threading * Reentrancy * "time" — Time access and conversions * Functions * Clock ID Constants * Timezone Constants * "argparse" — Parser for command-line options, arguments and sub- commands * Example * Creating a parser * Adding arguments * Parsing arguments * ArgumentParser objects * prog * usage * description * epilog * parents * formatter_class * prefix_chars * fromfile_prefix_chars * argument_default * allow_abbrev * conflict_handler * add_help * exit_on_error * The add_argument() method * name or flags * action * nargs * const * default * type * choices * required * help * metavar * dest * Action classes * The parse_args() method * Option value syntax * Invalid arguments * Arguments containing "-" * Argument abbreviations (prefix matching) * Beyond "sys.argv" * The Namespace object * Other utilities * Sub-commands * FileType objects * Argument groups * Mutual exclusion * Parser defaults * Printing help * Partial parsing * Customizing file parsing * Exiting methods * Intermixed parsing * Upgrading optparse code * "getopt" — C-style parser for command line options * "logging" — Logging facility for Python * Logger Objects * Logging Levels * Handler Objects * Formatter Objects * Filter Objects * LogRecord Objects * LogRecord attributes * LoggerAdapter Objects * Thread Safety * Module-Level Functions * Module-Level Attributes * Integration with the warnings module * "logging.config" — Logging configuration * Configuration functions * Security considerations * Configuration dictionary schema * Dictionary Schema Details * Incremental Configuration * Object connections * User-defined objects * Access to external objects * Access to internal objects * Import resolution and custom importers * Configuration file format * "logging.handlers" — Logging handlers * StreamHandler * FileHandler * NullHandler * WatchedFileHandler * BaseRotatingHandler * RotatingFileHandler * TimedRotatingFileHandler * SocketHandler * DatagramHandler * SysLogHandler * NTEventLogHandler * SMTPHandler * MemoryHandler * HTTPHandler * QueueHandler * QueueListener * "getpass" — Portable password input * "curses" — Terminal handling for character-cell displays * Functions * Window Objects * Constants * "curses.textpad" — Text input widget for curses programs * Textbox objects * "curses.ascii" — Utilities for ASCII characters * "curses.panel" — A panel stack extension for curses * Functions * Panel Objects * "platform" — Access to underlying platform’s identifying data * Cross Platform * Java Platform * Windows Platform * macOS Platform * Unix Platforms * "errno" — Standard errno system symbols * "ctypes" — A foreign function library for Python * ctypes tutorial * Loading dynamic link libraries * Accessing functions from loaded dlls * Calling functions * Fundamental data types * Calling functions, continued * Calling functions with your own custom data types * Specifying the required argument types (function prototypes) * Return types * Passing pointers (or: passing parameters by reference) * Structures and unions * Structure/union alignment and byte order * Bit fields in structures and unions * Arrays * Pointers * Type conversions * Incomplete Types * Callback functions * Accessing values exported from dlls * Surprises * Variable-sized data types * ctypes reference * Finding shared libraries * Loading shared libraries * Foreign functions * Function prototypes * Utility functions * Data types * Fundamental data types * Structured data types * Arrays and pointers Data Compression and Archiving ****************************** The modules described in this chapter support data compression with the zlib, gzip, bzip2 and lzma algorithms, and the creation of ZIP- and tar-format archives. See also Archiving operations provided by the "shutil" module. * "zlib" — Compression compatible with **gzip** * "gzip" — Support for **gzip** files * Examples of usage * Command Line Interface * Command line options * "bz2" — Support for **bzip2** compression * (De)compression of files * Incremental (de)compression * One-shot (de)compression * Examples of usage * "lzma" — Compression using the LZMA algorithm * Reading and writing compressed files * Compressing and decompressing data in memory * Miscellaneous * Specifying custom filter chains * Examples * "zipfile" — Work with ZIP archives * ZipFile Objects * Path Objects * PyZipFile Objects * ZipInfo Objects * Command-Line Interface * Command-line options * Decompression pitfalls * From file itself * File System limitations * Resources limitations * Interruption * Default behaviors of extraction * "tarfile" — Read and write tar archive files * TarFile Objects * TarInfo Objects * Extraction filters * Default named filters * Filter errors * Hints for further verification * Supporting older Python versions * Stateful extraction filter example * Command-Line Interface * Command-line options * Examples * Supported tar formats * Unicode issues "argparse" — Parser for command-line options, arguments and sub-commands ************************************************************************ New in version 3.2. **Source code:** Lib/argparse.py ====================================================================== Tutorial ^^^^^^^^ This page contains the API reference information. For a more gentle introduction to Python command-line parsing, have a look at the argparse tutorial. The "argparse" module makes it easy to write user-friendly command- line interfaces. The program defines what arguments it requires, and "argparse" will figure out how to parse those out of "sys.argv". The "argparse" module also automatically generates help and usage messages and issues errors when users give the program invalid arguments. Example ======= The following code is a Python program that takes a list of integers and produces either the sum or the max: import argparse parser = argparse.ArgumentParser(description='Process some integers.') parser.add_argument('integers', metavar='N', type=int, nargs='+', help='an integer for the accumulator') parser.add_argument('--sum', dest='accumulate', action='store_const', const=sum, default=max, help='sum the integers (default: find the max)') args = parser.parse_args() print(args.accumulate(args.integers)) Assuming the Python code above is saved into a file called "prog.py", it can be run at the command line and provides useful help messages: $ python prog.py -h usage: prog.py [-h] [--sum] N [N ...] Process some integers. positional arguments: N an integer for the accumulator optional arguments: -h, --help show this help message and exit --sum sum the integers (default: find the max) When run with the appropriate arguments, it prints either the sum or the max of the command-line integers: $ python prog.py 1 2 3 4 4 $ python prog.py 1 2 3 4 --sum 10 If invalid arguments are passed in, it will issue an error: $ python prog.py a b c usage: prog.py [-h] [--sum] N [N ...] prog.py: error: argument N: invalid int value: 'a' The following sections walk you through this example. Creating a parser ----------------- The first step in using the "argparse" is creating an "ArgumentParser" object: >>> parser = argparse.ArgumentParser(description='Process some integers.') The "ArgumentParser" object will hold all the information necessary to parse the command line into Python data types. Adding arguments ---------------- Filling an "ArgumentParser" with information about program arguments is done by making calls to the "add_argument()" method. Generally, these calls tell the "ArgumentParser" how to take the strings on the command line and turn them into objects. This information is stored and used when "parse_args()" is called. For example: >>> parser.add_argument('integers', metavar='N', type=int, nargs='+', ... help='an integer for the accumulator') >>> parser.add_argument('--sum', dest='accumulate', action='store_const', ... const=sum, default=max, ... help='sum the integers (default: find the max)') Later, calling "parse_args()" will return an object with two attributes, "integers" and "accumulate". The "integers" attribute will be a list of one or more ints, and the "accumulate" attribute will be either the "sum()" function, if "--sum" was specified at the command line, or the "max()" function if it was not. Parsing arguments ----------------- "ArgumentParser" parses arguments through the "parse_args()" method. This will inspect the command line, convert each argument to the appropriate type and then invoke the appropriate action. In most cases, this means a simple "Namespace" object will be built up from attributes parsed out of the command line: >>> parser.parse_args(['--sum', '7', '-1', '42']) Namespace(accumulate=, integers=[7, -1, 42]) In a script, "parse_args()" will typically be called with no arguments, and the "ArgumentParser" will automatically determine the command-line arguments from "sys.argv". ArgumentParser objects ====================== class argparse.ArgumentParser(prog=None, usage=None, description=None, epilog=None, parents=[], formatter_class=argparse.HelpFormatter, prefix_chars='-', fromfile_prefix_chars=None, argument_default=None, conflict_handler='error', add_help=True, allow_abbrev=True, exit_on_error=True) Create a new "ArgumentParser" object. All parameters should be passed as keyword arguments. Each parameter has its own more detailed description below, but in short they are: * prog - The name of the program (default: "sys.argv[0]") * usage - The string describing the program usage (default: generated from arguments added to parser) * description - Text to display before the argument help (default: none) * epilog - Text to display after the argument help (default: none) * parents - A list of "ArgumentParser" objects whose arguments should also be included * formatter_class - A class for customizing the help output * prefix_chars - The set of characters that prefix optional arguments (default: ‘-‘) * fromfile_prefix_chars - The set of characters that prefix files from which additional arguments should be read (default: "None") * argument_default - The global default value for arguments (default: "None") * conflict_handler - The strategy for resolving conflicting optionals (usually unnecessary) * add_help - Add a "-h/--help" option to the parser (default: "True") * allow_abbrev - Allows long options to be abbreviated if the abbreviation is unambiguous. (default: "True") * exit_on_error - Determines whether or not ArgumentParser exits with error info when an error occurs. (default: "True") Changed in version 3.5: *allow_abbrev* parameter was added. Changed in version 3.8: In previous versions, *allow_abbrev* also disabled grouping of short flags such as "-vv" to mean "-v -v". Changed in version 3.9: *exit_on_error* parameter was added. The following sections describe how each of these are used. prog ---- By default, "ArgumentParser" objects use "sys.argv[0]" to determine how to display the name of the program in help messages. This default is almost always desirable because it will make the help messages match how the program was invoked on the command line. For example, consider a file named "myprogram.py" with the following code: import argparse parser = argparse.ArgumentParser() parser.add_argument('--foo', help='foo help') args = parser.parse_args() The help for this program will display "myprogram.py" as the program name (regardless of where the program was invoked from): $ python myprogram.py --help usage: myprogram.py [-h] [--foo FOO] optional arguments: -h, --help show this help message and exit --foo FOO foo help $ cd .. $ python subdir/myprogram.py --help usage: myprogram.py [-h] [--foo FOO] optional arguments: -h, --help show this help message and exit --foo FOO foo help To change this default behavior, another value can be supplied using the "prog=" argument to "ArgumentParser": >>> parser = argparse.ArgumentParser(prog='myprogram') >>> parser.print_help() usage: myprogram [-h] optional arguments: -h, --help show this help message and exit Note that the program name, whether determined from "sys.argv[0]" or from the "prog=" argument, is available to help messages using the "%(prog)s" format specifier. >>> parser = argparse.ArgumentParser(prog='myprogram') >>> parser.add_argument('--foo', help='foo of the %(prog)s program') >>> parser.print_help() usage: myprogram [-h] [--foo FOO] optional arguments: -h, --help show this help message and exit --foo FOO foo of the myprogram program usage ----- By default, "ArgumentParser" calculates the usage message from the arguments it contains: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('--foo', nargs='?', help='foo help') >>> parser.add_argument('bar', nargs='+', help='bar help') >>> parser.print_help() usage: PROG [-h] [--foo [FOO]] bar [bar ...] positional arguments: bar bar help optional arguments: -h, --help show this help message and exit --foo [FOO] foo help The default message can be overridden with the "usage=" keyword argument: >>> parser = argparse.ArgumentParser(prog='PROG', usage='%(prog)s [options]') >>> parser.add_argument('--foo', nargs='?', help='foo help') >>> parser.add_argument('bar', nargs='+', help='bar help') >>> parser.print_help() usage: PROG [options] positional arguments: bar bar help optional arguments: -h, --help show this help message and exit --foo [FOO] foo help The "%(prog)s" format specifier is available to fill in the program name in your usage messages. description ----------- Most calls to the "ArgumentParser" constructor will use the "description=" keyword argument. This argument gives a brief description of what the program does and how it works. In help messages, the description is displayed between the command-line usage string and the help messages for the various arguments: >>> parser = argparse.ArgumentParser(description='A foo that bars') >>> parser.print_help() usage: argparse.py [-h] A foo that bars optional arguments: -h, --help show this help message and exit By default, the description will be line-wrapped so that it fits within the given space. To change this behavior, see the formatter_class argument. epilog ------ Some programs like to display additional description of the program after the description of the arguments. Such text can be specified using the "epilog=" argument to "ArgumentParser": >>> parser = argparse.ArgumentParser( ... description='A foo that bars', ... epilog="And that's how you'd foo a bar") >>> parser.print_help() usage: argparse.py [-h] A foo that bars optional arguments: -h, --help show this help message and exit And that's how you'd foo a bar As with the description argument, the "epilog=" text is by default line-wrapped, but this behavior can be adjusted with the formatter_class argument to "ArgumentParser". parents ------- Sometimes, several parsers share a common set of arguments. Rather than repeating the definitions of these arguments, a single parser with all the shared arguments and passed to "parents=" argument to "ArgumentParser" can be used. The "parents=" argument takes a list of "ArgumentParser" objects, collects all the positional and optional actions from them, and adds these actions to the "ArgumentParser" object being constructed: >>> parent_parser = argparse.ArgumentParser(add_help=False) >>> parent_parser.add_argument('--parent', type=int) >>> foo_parser = argparse.ArgumentParser(parents=[parent_parser]) >>> foo_parser.add_argument('foo') >>> foo_parser.parse_args(['--parent', '2', 'XXX']) Namespace(foo='XXX', parent=2) >>> bar_parser = argparse.ArgumentParser(parents=[parent_parser]) >>> bar_parser.add_argument('--bar') >>> bar_parser.parse_args(['--bar', 'YYY']) Namespace(bar='YYY', parent=None) Note that most parent parsers will specify "add_help=False". Otherwise, the "ArgumentParser" will see two "-h/--help" options (one in the parent and one in the child) and raise an error. Note: You must fully initialize the parsers before passing them via "parents=". If you change the parent parsers after the child parser, those changes will not be reflected in the child. formatter_class --------------- "ArgumentParser" objects allow the help formatting to be customized by specifying an alternate formatting class. Currently, there are four such classes: class argparse.RawDescriptionHelpFormatter class argparse.RawTextHelpFormatter class argparse.ArgumentDefaultsHelpFormatter class argparse.MetavarTypeHelpFormatter "RawDescriptionHelpFormatter" and "RawTextHelpFormatter" give more control over how textual descriptions are displayed. By default, "ArgumentParser" objects line-wrap the description and epilog texts in command-line help messages: >>> parser = argparse.ArgumentParser( ... prog='PROG', ... description='''this description ... was indented weird ... but that is okay''', ... epilog=''' ... likewise for this epilog whose whitespace will ... be cleaned up and whose words will be wrapped ... across a couple lines''') >>> parser.print_help() usage: PROG [-h] this description was indented weird but that is okay optional arguments: -h, --help show this help message and exit likewise for this epilog whose whitespace will be cleaned up and whose words will be wrapped across a couple lines Passing "RawDescriptionHelpFormatter" as "formatter_class=" indicates that description and epilog are already correctly formatted and should not be line-wrapped: >>> parser = argparse.ArgumentParser( ... prog='PROG', ... formatter_class=argparse.RawDescriptionHelpFormatter, ... description=textwrap.dedent('''\ ... Please do not mess up this text! ... -------------------------------- ... I have indented it ... exactly the way ... I want it ... ''')) >>> parser.print_help() usage: PROG [-h] Please do not mess up this text! -------------------------------- I have indented it exactly the way I want it optional arguments: -h, --help show this help message and exit "RawTextHelpFormatter" maintains whitespace for all sorts of help text, including argument descriptions. However, multiple new lines are replaced with one. If you wish to preserve multiple blank lines, add spaces between the newlines. "ArgumentDefaultsHelpFormatter" automatically adds information about default values to each of the argument help messages: >>> parser = argparse.ArgumentParser( ... prog='PROG', ... formatter_class=argparse.ArgumentDefaultsHelpFormatter) >>> parser.add_argument('--foo', type=int, default=42, help='FOO!') >>> parser.add_argument('bar', nargs='*', default=[1, 2, 3], help='BAR!') >>> parser.print_help() usage: PROG [-h] [--foo FOO] [bar ...] positional arguments: bar BAR! (default: [1, 2, 3]) optional arguments: -h, --help show this help message and exit --foo FOO FOO! (default: 42) "MetavarTypeHelpFormatter" uses the name of the type argument for each argument as the display name for its values (rather than using the dest as the regular formatter does): >>> parser = argparse.ArgumentParser( ... prog='PROG', ... formatter_class=argparse.MetavarTypeHelpFormatter) >>> parser.add_argument('--foo', type=int) >>> parser.add_argument('bar', type=float) >>> parser.print_help() usage: PROG [-h] [--foo int] float positional arguments: float optional arguments: -h, --help show this help message and exit --foo int prefix_chars ------------ Most command-line options will use "-" as the prefix, e.g. "-f/--foo". Parsers that need to support different or additional prefix characters, e.g. for options like "+f" or "/foo", may specify them using the "prefix_chars=" argument to the ArgumentParser constructor: >>> parser = argparse.ArgumentParser(prog='PROG', prefix_chars='-+') >>> parser.add_argument('+f') >>> parser.add_argument('++bar') >>> parser.parse_args('+f X ++bar Y'.split()) Namespace(bar='Y', f='X') The "prefix_chars=" argument defaults to "'-'". Supplying a set of characters that does not include "-" will cause "-f/--foo" options to be disallowed. fromfile_prefix_chars --------------------- Sometimes, for example when dealing with a particularly long argument list, it may make sense to keep the list of arguments in a file rather than typing it out at the command line. If the "fromfile_prefix_chars=" argument is given to the "ArgumentParser" constructor, then arguments that start with any of the specified characters will be treated as files, and will be replaced by the arguments they contain. For example: >>> with open('args.txt', 'w') as fp: ... fp.write('-f\nbar') >>> parser = argparse.ArgumentParser(fromfile_prefix_chars='@') >>> parser.add_argument('-f') >>> parser.parse_args(['-f', 'foo', '@args.txt']) Namespace(f='bar') Arguments read from a file must by default be one per line (but see also "convert_arg_line_to_args()") and are treated as if they were in the same place as the original file referencing argument on the command line. So in the example above, the expression "['-f', 'foo', '@args.txt']" is considered equivalent to the expression "['-f', 'foo', '-f', 'bar']". The "fromfile_prefix_chars=" argument defaults to "None", meaning that arguments will never be treated as file references. argument_default ---------------- Generally, argument defaults are specified either by passing a default to "add_argument()" or by calling the "set_defaults()" methods with a specific set of name-value pairs. Sometimes however, it may be useful to specify a single parser-wide default for arguments. This can be accomplished by passing the "argument_default=" keyword argument to "ArgumentParser". For example, to globally suppress attribute creation on "parse_args()" calls, we supply "argument_default=SUPPRESS": >>> parser = argparse.ArgumentParser(argument_default=argparse.SUPPRESS) >>> parser.add_argument('--foo') >>> parser.add_argument('bar', nargs='?') >>> parser.parse_args(['--foo', '1', 'BAR']) Namespace(bar='BAR', foo='1') >>> parser.parse_args([]) Namespace() allow_abbrev ------------ Normally, when you pass an argument list to the "parse_args()" method of an "ArgumentParser", it recognizes abbreviations of long options. This feature can be disabled by setting "allow_abbrev" to "False": >>> parser = argparse.ArgumentParser(prog='PROG', allow_abbrev=False) >>> parser.add_argument('--foobar', action='store_true') >>> parser.add_argument('--foonley', action='store_false') >>> parser.parse_args(['--foon']) usage: PROG [-h] [--foobar] [--foonley] PROG: error: unrecognized arguments: --foon New in version 3.5. conflict_handler ---------------- "ArgumentParser" objects do not allow two actions with the same option string. By default, "ArgumentParser" objects raise an exception if an attempt is made to create an argument with an option string that is already in use: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('-f', '--foo', help='old foo help') >>> parser.add_argument('--foo', help='new foo help') Traceback (most recent call last): .. ArgumentError: argument --foo: conflicting option string(s): --foo Sometimes (e.g. when using parents) it may be useful to simply override any older arguments with the same option string. To get this behavior, the value "'resolve'" can be supplied to the "conflict_handler=" argument of "ArgumentParser": >>> parser = argparse.ArgumentParser(prog='PROG', conflict_handler='resolve') >>> parser.add_argument('-f', '--foo', help='old foo help') >>> parser.add_argument('--foo', help='new foo help') >>> parser.print_help() usage: PROG [-h] [-f FOO] [--foo FOO] optional arguments: -h, --help show this help message and exit -f FOO old foo help --foo FOO new foo help Note that "ArgumentParser" objects only remove an action if all of its option strings are overridden. So, in the example above, the old "-f/--foo" action is retained as the "-f" action, because only the "-- foo" option string was overridden. add_help -------- By default, ArgumentParser objects add an option which simply displays the parser’s help message. For example, consider a file named "myprogram.py" containing the following code: import argparse parser = argparse.ArgumentParser() parser.add_argument('--foo', help='foo help') args = parser.parse_args() If "-h" or "--help" is supplied at the command line, the ArgumentParser help will be printed: $ python myprogram.py --help usage: myprogram.py [-h] [--foo FOO] optional arguments: -h, --help show this help message and exit --foo FOO foo help Occasionally, it may be useful to disable the addition of this help option. This can be achieved by passing "False" as the "add_help=" argument to "ArgumentParser": >>> parser = argparse.ArgumentParser(prog='PROG', add_help=False) >>> parser.add_argument('--foo', help='foo help') >>> parser.print_help() usage: PROG [--foo FOO] optional arguments: --foo FOO foo help The help option is typically "-h/--help". The exception to this is if the "prefix_chars=" is specified and does not include "-", in which case "-h" and "--help" are not valid options. In this case, the first character in "prefix_chars" is used to prefix the help options: >>> parser = argparse.ArgumentParser(prog='PROG', prefix_chars='+/') >>> parser.print_help() usage: PROG [+h] optional arguments: +h, ++help show this help message and exit exit_on_error ------------- Normally, when you pass an invalid argument list to the "parse_args()" method of an "ArgumentParser", it will exit with error info. If the user would like to catch errors manually, the feature can be enabled by setting "exit_on_error" to "False": >>> parser = argparse.ArgumentParser(exit_on_error=False) >>> parser.add_argument('--integers', type=int) _StoreAction(option_strings=['--integers'], dest='integers', nargs=None, const=None, default=None, type=, choices=None, help=None, metavar=None) >>> try: ... parser.parse_args('--integers a'.split()) ... except argparse.ArgumentError: ... print('Catching an argumentError') ... Catching an argumentError New in version 3.9. The add_argument() method ========================= ArgumentParser.add_argument(name or flags...[, action][, nargs][, const][, default][, type][, choices][, required][, help][, metavar][, dest]) Define how a single command-line argument should be parsed. Each parameter has its own more detailed description below, but in short they are: * name or flags - Either a name or a list of option strings, e.g. "foo" or "-f, --foo". * action - The basic type of action to be taken when this argument is encountered at the command line. * nargs - The number of command-line arguments that should be consumed. * const - A constant value required by some action and nargs selections. * default - The value produced if the argument is absent from the command line and if it is absent from the namespace object. * type - The type to which the command-line argument should be converted. * choices - A container of the allowable values for the argument. * required - Whether or not the command-line option may be omitted (optionals only). * help - A brief description of what the argument does. * metavar - A name for the argument in usage messages. * dest - The name of the attribute to be added to the object returned by "parse_args()". The following sections describe how each of these are used. name or flags ------------- The "add_argument()" method must know whether an optional argument, like "-f" or "--foo", or a positional argument, like a list of filenames, is expected. The first arguments passed to "add_argument()" must therefore be either a series of flags, or a simple argument name. For example, an optional argument could be created like: >>> parser.add_argument('-f', '--foo') while a positional argument could be created like: >>> parser.add_argument('bar') When "parse_args()" is called, optional arguments will be identified by the "-" prefix, and the remaining arguments will be assumed to be positional: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('-f', '--foo') >>> parser.add_argument('bar') >>> parser.parse_args(['BAR']) Namespace(bar='BAR', foo=None) >>> parser.parse_args(['BAR', '--foo', 'FOO']) Namespace(bar='BAR', foo='FOO') >>> parser.parse_args(['--foo', 'FOO']) usage: PROG [-h] [-f FOO] bar PROG: error: the following arguments are required: bar action ------ "ArgumentParser" objects associate command-line arguments with actions. These actions can do just about anything with the command- line arguments associated with them, though most actions simply add an attribute to the object returned by "parse_args()". The "action" keyword argument specifies how the command-line arguments should be handled. The supplied actions are: * "'store'" - This just stores the argument’s value. This is the default action. For example: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo') >>> parser.parse_args('--foo 1'.split()) Namespace(foo='1') * "'store_const'" - This stores the value specified by the const keyword argument. The "'store_const'" action is most commonly used with optional arguments that specify some sort of flag. For example: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', action='store_const', const=42) >>> parser.parse_args(['--foo']) Namespace(foo=42) * "'store_true'" and "'store_false'" - These are special cases of "'store_const'" used for storing the values "True" and "False" respectively. In addition, they create default values of "False" and "True" respectively. For example: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', action='store_true') >>> parser.add_argument('--bar', action='store_false') >>> parser.add_argument('--baz', action='store_false') >>> parser.parse_args('--foo --bar'.split()) Namespace(foo=True, bar=False, baz=True) * "'append'" - This stores a list, and appends each argument value to the list. This is useful to allow an option to be specified multiple times. Example usage: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', action='append') >>> parser.parse_args('--foo 1 --foo 2'.split()) Namespace(foo=['1', '2']) * "'append_const'" - This stores a list, and appends the value specified by the const keyword argument to the list. (Note that the const keyword argument defaults to "None".) The "'append_const'" action is typically useful when multiple arguments need to store constants to the same list. For example: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--str', dest='types', action='append_const', const=str) >>> parser.add_argument('--int', dest='types', action='append_const', const=int) >>> parser.parse_args('--str --int'.split()) Namespace(types=[, ]) * "'count'" - This counts the number of times a keyword argument occurs. For example, this is useful for increasing verbosity levels: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--verbose', '-v', action='count', default=0) >>> parser.parse_args(['-vvv']) Namespace(verbose=3) Note, the *default* will be "None" unless explicitly set to *0*. * "'help'" - This prints a complete help message for all the options in the current parser and then exits. By default a help action is automatically added to the parser. See "ArgumentParser" for details of how the output is created. * "'version'" - This expects a "version=" keyword argument in the "add_argument()" call, and prints version information and exits when invoked: >>> import argparse >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('--version', action='version', version='%(prog)s 2.0') >>> parser.parse_args(['--version']) PROG 2.0 * "'extend'" - This stores a list, and extends each argument value to the list. Example usage: >>> parser = argparse.ArgumentParser() >>> parser.add_argument("--foo", action="extend", nargs="+", type=str) >>> parser.parse_args(["--foo", "f1", "--foo", "f2", "f3", "f4"]) Namespace(foo=['f1', 'f2', 'f3', 'f4']) New in version 3.8. You may also specify an arbitrary action by passing an Action subclass or other object that implements the same interface. The "BooleanOptionalAction" is available in "argparse" and adds support for boolean actions such as "--foo" and "--no-foo": >>> import argparse >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', action=argparse.BooleanOptionalAction) >>> parser.parse_args(['--no-foo']) Namespace(foo=False) New in version 3.9. The recommended way to create a custom action is to extend "Action", overriding the "__call__" method and optionally the "__init__" and "format_usage" methods. An example of a custom action: >>> class FooAction(argparse.Action): ... def __init__(self, option_strings, dest, nargs=None, **kwargs): ... if nargs is not None: ... raise ValueError("nargs not allowed") ... super().__init__(option_strings, dest, **kwargs) ... def __call__(self, parser, namespace, values, option_string=None): ... print('%r %r %r' % (namespace, values, option_string)) ... setattr(namespace, self.dest, values) ... >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', action=FooAction) >>> parser.add_argument('bar', action=FooAction) >>> args = parser.parse_args('1 --foo 2'.split()) Namespace(bar=None, foo=None) '1' None Namespace(bar='1', foo=None) '2' '--foo' >>> args Namespace(bar='1', foo='2') For more details, see "Action". nargs ----- ArgumentParser objects usually associate a single command-line argument with a single action to be taken. The "nargs" keyword argument associates a different number of command-line arguments with a single action. The supported values are: * "N" (an integer). "N" arguments from the command line will be gathered together into a list. For example: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', nargs=2) >>> parser.add_argument('bar', nargs=1) >>> parser.parse_args('c --foo a b'.split()) Namespace(bar=['c'], foo=['a', 'b']) Note that "nargs=1" produces a list of one item. This is different from the default, in which the item is produced by itself. * "'?'". One argument will be consumed from the command line if possible, and produced as a single item. If no command-line argument is present, the value from default will be produced. Note that for optional arguments, there is an additional case - the option string is present but not followed by a command-line argument. In this case the value from const will be produced. Some examples to illustrate this: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', nargs='?', const='c', default='d') >>> parser.add_argument('bar', nargs='?', default='d') >>> parser.parse_args(['XX', '--foo', 'YY']) Namespace(bar='XX', foo='YY') >>> parser.parse_args(['XX', '--foo']) Namespace(bar='XX', foo='c') >>> parser.parse_args([]) Namespace(bar='d', foo='d') One of the more common uses of "nargs='?'" is to allow optional input and output files: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('infile', nargs='?', type=argparse.FileType('r'), ... default=sys.stdin) >>> parser.add_argument('outfile', nargs='?', type=argparse.FileType('w'), ... default=sys.stdout) >>> parser.parse_args(['input.txt', 'output.txt']) Namespace(infile=<_io.TextIOWrapper name='input.txt' encoding='UTF-8'>, outfile=<_io.TextIOWrapper name='output.txt' encoding='UTF-8'>) >>> parser.parse_args([]) Namespace(infile=<_io.TextIOWrapper name='' encoding='UTF-8'>, outfile=<_io.TextIOWrapper name='' encoding='UTF-8'>) * "'*'". All command-line arguments present are gathered into a list. Note that it generally doesn’t make much sense to have more than one positional argument with "nargs='*'", but multiple optional arguments with "nargs='*'" is possible. For example: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', nargs='*') >>> parser.add_argument('--bar', nargs='*') >>> parser.add_argument('baz', nargs='*') >>> parser.parse_args('a b --foo x y --bar 1 2'.split()) Namespace(bar=['1', '2'], baz=['a', 'b'], foo=['x', 'y']) * "'+'". Just like "'*'", all command-line args present are gathered into a list. Additionally, an error message will be generated if there wasn’t at least one command-line argument present. For example: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('foo', nargs='+') >>> parser.parse_args(['a', 'b']) Namespace(foo=['a', 'b']) >>> parser.parse_args([]) usage: PROG [-h] foo [foo ...] PROG: error: the following arguments are required: foo If the "nargs" keyword argument is not provided, the number of arguments consumed is determined by the action. Generally this means a single command-line argument will be consumed and a single item (not a list) will be produced. const ----- The "const" argument of "add_argument()" is used to hold constant values that are not read from the command line but are required for the various "ArgumentParser" actions. The two most common uses of it are: * When "add_argument()" is called with "action='store_const'" or "action='append_const'". These actions add the "const" value to one of the attributes of the object returned by "parse_args()". See the action description for examples. * When "add_argument()" is called with option strings (like "-f" or " --foo") and "nargs='?'". This creates an optional argument that can be followed by zero or one command-line arguments. When parsing the command line, if the option string is encountered with no command- line argument following it, the value of "const" will be assumed instead. See the nargs description for examples. With the "'store_const'" and "'append_const'" actions, the "const" keyword argument must be given. For other actions, it defaults to "None". default ------- All optional arguments and some positional arguments may be omitted at the command line. The "default" keyword argument of "add_argument()", whose value defaults to "None", specifies what value should be used if the command-line argument is not present. For optional arguments, the "default" value is used when the option string was not present at the command line: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', default=42) >>> parser.parse_args(['--foo', '2']) Namespace(foo='2') >>> parser.parse_args([]) Namespace(foo=42) If the target namespace already has an attribute set, the action *default* will not over write it: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', default=42) >>> parser.parse_args([], namespace=argparse.Namespace(foo=101)) Namespace(foo=101) If the "default" value is a string, the parser parses the value as if it were a command-line argument. In particular, the parser applies any type conversion argument, if provided, before setting the attribute on the "Namespace" return value. Otherwise, the parser uses the value as is: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--length', default='10', type=int) >>> parser.add_argument('--width', default=10.5, type=int) >>> parser.parse_args() Namespace(length=10, width=10.5) For positional arguments with nargs equal to "?" or "*", the "default" value is used when no command-line argument was present: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('foo', nargs='?', default=42) >>> parser.parse_args(['a']) Namespace(foo='a') >>> parser.parse_args([]) Namespace(foo=42) Providing "default=argparse.SUPPRESS" causes no attribute to be added if the command-line argument was not present: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', default=argparse.SUPPRESS) >>> parser.parse_args([]) Namespace() >>> parser.parse_args(['--foo', '1']) Namespace(foo='1') type ---- By default, the parser reads command-line arguments in as simple strings. However, quite often the command-line string should instead be interpreted as another type, such as a "float" or "int". The "type" keyword for "add_argument()" allows any necessary type-checking and type conversions to be performed. If the type keyword is used with the default keyword, the type converter is only applied if the default is a string. The argument to "type" can be any callable that accepts a single string. If the function raises "ArgumentTypeError", "TypeError", or "ValueError", the exception is caught and a nicely formatted error message is displayed. No other exception types are handled. Common built-in types and functions can be used as type converters: import argparse import pathlib parser = argparse.ArgumentParser() parser.add_argument('count', type=int) parser.add_argument('distance', type=float) parser.add_argument('street', type=ascii) parser.add_argument('code_point', type=ord) parser.add_argument('source_file', type=open) parser.add_argument('dest_file', type=argparse.FileType('w', encoding='latin-1')) parser.add_argument('datapath', type=pathlib.Path) User defined functions can be used as well: >>> def hyphenated(string): ... return '-'.join([word[:4] for word in string.casefold().split()]) ... >>> parser = argparse.ArgumentParser() >>> _ = parser.add_argument('short_title', type=hyphenated) >>> parser.parse_args(['"The Tale of Two Cities"']) Namespace(short_title='"the-tale-of-two-citi') The "bool()" function is not recommended as a type converter. All it does is convert empty strings to "False" and non-empty strings to "True". This is usually not what is desired. In general, the "type" keyword is a convenience that should only be used for simple conversions that can only raise one of the three supported exceptions. Anything with more interesting error-handling or resource management should be done downstream after the arguments are parsed. For example, JSON or YAML conversions have complex error cases that require better reporting than can be given by the "type" keyword. A "JSONDecodeError" would not be well formatted and a "FileNotFound" exception would not be handled at all. Even "FileType" has its limitations for use with the "type" keyword. If one argument uses *FileType* and then a subsequent argument fails, an error is reported but the file is not automatically closed. In this case, it would be better to wait until after the parser has run and then use the "with"-statement to manage the files. For type checkers that simply check against a fixed set of values, consider using the choices keyword instead. choices ------- Some command-line arguments should be selected from a restricted set of values. These can be handled by passing a container object as the *choices* keyword argument to "add_argument()". When the command line is parsed, argument values will be checked, and an error message will be displayed if the argument was not one of the acceptable values: >>> parser = argparse.ArgumentParser(prog='game.py') >>> parser.add_argument('move', choices=['rock', 'paper', 'scissors']) >>> parser.parse_args(['rock']) Namespace(move='rock') >>> parser.parse_args(['fire']) usage: game.py [-h] {rock,paper,scissors} game.py: error: argument move: invalid choice: 'fire' (choose from 'rock', 'paper', 'scissors') Note that inclusion in the *choices* container is checked after any type conversions have been performed, so the type of the objects in the *choices* container should match the type specified: >>> parser = argparse.ArgumentParser(prog='doors.py') >>> parser.add_argument('door', type=int, choices=range(1, 4)) >>> print(parser.parse_args(['3'])) Namespace(door=3) >>> parser.parse_args(['4']) usage: doors.py [-h] {1,2,3} doors.py: error: argument door: invalid choice: 4 (choose from 1, 2, 3) Any container can be passed as the *choices* value, so "list" objects, "set" objects, and custom containers are all supported. Use of "enum.Enum" is not recommended because it is difficult to control its appearance in usage, help, and error messages. Formatted choices overrides the default *metavar* which is normally derived from *dest*. This is usually what you want because the user never sees the *dest* parameter. If this display isn’t desirable (perhaps because there are many choices), just specify an explicit metavar. required -------- In general, the "argparse" module assumes that flags like "-f" and "-- bar" indicate *optional* arguments, which can always be omitted at the command line. To make an option *required*, "True" can be specified for the "required=" keyword argument to "add_argument()": >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', required=True) >>> parser.parse_args(['--foo', 'BAR']) Namespace(foo='BAR') >>> parser.parse_args([]) usage: [-h] --foo FOO : error: the following arguments are required: --foo As the example shows, if an option is marked as "required", "parse_args()" will report an error if that option is not present at the command line. Note: Required options are generally considered bad form because users expect *options* to be *optional*, and thus they should be avoided when possible. help ---- The "help" value is a string containing a brief description of the argument. When a user requests help (usually by using "-h" or "--help" at the command line), these "help" descriptions will be displayed with each argument: >>> parser = argparse.ArgumentParser(prog='frobble') >>> parser.add_argument('--foo', action='store_true', ... help='foo the bars before frobbling') >>> parser.add_argument('bar', nargs='+', ... help='one of the bars to be frobbled') >>> parser.parse_args(['-h']) usage: frobble [-h] [--foo] bar [bar ...] positional arguments: bar one of the bars to be frobbled optional arguments: -h, --help show this help message and exit --foo foo the bars before frobbling The "help" strings can include various format specifiers to avoid repetition of things like the program name or the argument default. The available specifiers include the program name, "%(prog)s" and most keyword arguments to "add_argument()", e.g. "%(default)s", "%(type)s", etc.: >>> parser = argparse.ArgumentParser(prog='frobble') >>> parser.add_argument('bar', nargs='?', type=int, default=42, ... help='the bar to %(prog)s (default: %(default)s)') >>> parser.print_help() usage: frobble [-h] [bar] positional arguments: bar the bar to frobble (default: 42) optional arguments: -h, --help show this help message and exit As the help string supports %-formatting, if you want a literal "%" to appear in the help string, you must escape it as "%%". "argparse" supports silencing the help entry for certain options, by setting the "help" value to "argparse.SUPPRESS": >>> parser = argparse.ArgumentParser(prog='frobble') >>> parser.add_argument('--foo', help=argparse.SUPPRESS) >>> parser.print_help() usage: frobble [-h] optional arguments: -h, --help show this help message and exit metavar ------- When "ArgumentParser" generates help messages, it needs some way to refer to each expected argument. By default, ArgumentParser objects use the dest value as the “name” of each object. By default, for positional argument actions, the dest value is used directly, and for optional argument actions, the dest value is uppercased. So, a single positional argument with "dest='bar'" will be referred to as "bar". A single optional argument "--foo" that should be followed by a single command-line argument will be referred to as "FOO". An example: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo') >>> parser.add_argument('bar') >>> parser.parse_args('X --foo Y'.split()) Namespace(bar='X', foo='Y') >>> parser.print_help() usage: [-h] [--foo FOO] bar positional arguments: bar optional arguments: -h, --help show this help message and exit --foo FOO An alternative name can be specified with "metavar": >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', metavar='YYY') >>> parser.add_argument('bar', metavar='XXX') >>> parser.parse_args('X --foo Y'.split()) Namespace(bar='X', foo='Y') >>> parser.print_help() usage: [-h] [--foo YYY] XXX positional arguments: XXX optional arguments: -h, --help show this help message and exit --foo YYY Note that "metavar" only changes the *displayed* name - the name of the attribute on the "parse_args()" object is still determined by the dest value. Different values of "nargs" may cause the metavar to be used multiple times. Providing a tuple to "metavar" specifies a different display for each of the arguments: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('-x', nargs=2) >>> parser.add_argument('--foo', nargs=2, metavar=('bar', 'baz')) >>> parser.print_help() usage: PROG [-h] [-x X X] [--foo bar baz] optional arguments: -h, --help show this help message and exit -x X X --foo bar baz dest ---- Most "ArgumentParser" actions add some value as an attribute of the object returned by "parse_args()". The name of this attribute is determined by the "dest" keyword argument of "add_argument()". For positional argument actions, "dest" is normally supplied as the first argument to "add_argument()": >>> parser = argparse.ArgumentParser() >>> parser.add_argument('bar') >>> parser.parse_args(['XXX']) Namespace(bar='XXX') For optional argument actions, the value of "dest" is normally inferred from the option strings. "ArgumentParser" generates the value of "dest" by taking the first long option string and stripping away the initial "--" string. If no long option strings were supplied, "dest" will be derived from the first short option string by stripping the initial "-" character. Any internal "-" characters will be converted to "_" characters to make sure the string is a valid attribute name. The examples below illustrate this behavior: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('-f', '--foo-bar', '--foo') >>> parser.add_argument('-x', '-y') >>> parser.parse_args('-f 1 -x 2'.split()) Namespace(foo_bar='1', x='2') >>> parser.parse_args('--foo 1 -y 2'.split()) Namespace(foo_bar='1', x='2') "dest" allows a custom attribute name to be provided: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', dest='bar') >>> parser.parse_args('--foo XXX'.split()) Namespace(bar='XXX') Action classes -------------- Action classes implement the Action API, a callable which returns a callable which processes arguments from the command-line. Any object which follows this API may be passed as the "action" parameter to "add_argument()". class argparse.Action(option_strings, dest, nargs=None, const=None, default=None, type=None, choices=None, required=False, help=None, metavar=None) Action objects are used by an ArgumentParser to represent the information needed to parse a single argument from one or more strings from the command line. The Action class must accept the two positional arguments plus any keyword arguments passed to "ArgumentParser.add_argument()" except for the "action" itself. Instances of Action (or return value of any callable to the "action" parameter) should have attributes “dest”, “option_strings”, “default”, “type”, “required”, “help”, etc. defined. The easiest way to ensure these attributes are defined is to call "Action.__init__". Action instances should be callable, so subclasses must override the "__call__" method, which should accept four parameters: * "parser" - The ArgumentParser object which contains this action. * "namespace" - The "Namespace" object that will be returned by "parse_args()". Most actions add an attribute to this object using "setattr()". * "values" - The associated command-line arguments, with any type conversions applied. Type conversions are specified with the type keyword argument to "add_argument()". * "option_string" - The option string that was used to invoke this action. The "option_string" argument is optional, and will be absent if the action is associated with a positional argument. The "__call__" method may perform arbitrary actions, but will typically set attributes on the "namespace" based on "dest" and "values". Action subclasses can define a "format_usage" method that takes no argument and return a string which will be used when printing the usage of the program. If such method is not provided, a sensible default will be used. The parse_args() method ======================= ArgumentParser.parse_args(args=None, namespace=None) Convert argument strings to objects and assign them as attributes of the namespace. Return the populated namespace. Previous calls to "add_argument()" determine exactly what objects are created and how they are assigned. See the documentation for "add_argument()" for details. * args - List of strings to parse. The default is taken from "sys.argv". * namespace - An object to take the attributes. The default is a new empty "Namespace" object. Option value syntax ------------------- The "parse_args()" method supports several ways of specifying the value of an option (if it takes one). In the simplest case, the option and its value are passed as two separate arguments: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('-x') >>> parser.add_argument('--foo') >>> parser.parse_args(['-x', 'X']) Namespace(foo=None, x='X') >>> parser.parse_args(['--foo', 'FOO']) Namespace(foo='FOO', x=None) For long options (options with names longer than a single character), the option and value can also be passed as a single command-line argument, using "=" to separate them: >>> parser.parse_args(['--foo=FOO']) Namespace(foo='FOO', x=None) For short options (options only one character long), the option and its value can be concatenated: >>> parser.parse_args(['-xX']) Namespace(foo=None, x='X') Several short options can be joined together, using only a single "-" prefix, as long as only the last option (or none of them) requires a value: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('-x', action='store_true') >>> parser.add_argument('-y', action='store_true') >>> parser.add_argument('-z') >>> parser.parse_args(['-xyzZ']) Namespace(x=True, y=True, z='Z') Invalid arguments ----------------- While parsing the command line, "parse_args()" checks for a variety of errors, including ambiguous options, invalid types, invalid options, wrong number of positional arguments, etc. When it encounters such an error, it exits and prints the error along with a usage message: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('--foo', type=int) >>> parser.add_argument('bar', nargs='?') >>> # invalid type >>> parser.parse_args(['--foo', 'spam']) usage: PROG [-h] [--foo FOO] [bar] PROG: error: argument --foo: invalid int value: 'spam' >>> # invalid option >>> parser.parse_args(['--bar']) usage: PROG [-h] [--foo FOO] [bar] PROG: error: no such option: --bar >>> # wrong number of arguments >>> parser.parse_args(['spam', 'badger']) usage: PROG [-h] [--foo FOO] [bar] PROG: error: extra arguments found: badger Arguments containing "-" ------------------------ The "parse_args()" method attempts to give errors whenever the user has clearly made a mistake, but some situations are inherently ambiguous. For example, the command-line argument "-1" could either be an attempt to specify an option or an attempt to provide a positional argument. The "parse_args()" method is cautious here: positional arguments may only begin with "-" if they look like negative numbers and there are no options in the parser that look like negative numbers: >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('-x') >>> parser.add_argument('foo', nargs='?') >>> # no negative number options, so -1 is a positional argument >>> parser.parse_args(['-x', '-1']) Namespace(foo=None, x='-1') >>> # no negative number options, so -1 and -5 are positional arguments >>> parser.parse_args(['-x', '-1', '-5']) Namespace(foo='-5', x='-1') >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('-1', dest='one') >>> parser.add_argument('foo', nargs='?') >>> # negative number options present, so -1 is an option >>> parser.parse_args(['-1', 'X']) Namespace(foo=None, one='X') >>> # negative number options present, so -2 is an option >>> parser.parse_args(['-2']) usage: PROG [-h] [-1 ONE] [foo] PROG: error: no such option: -2 >>> # negative number options present, so both -1s are options >>> parser.parse_args(['-1', '-1']) usage: PROG [-h] [-1 ONE] [foo] PROG: error: argument -1: expected one argument If you have positional arguments that must begin with "-" and don’t look like negative numbers, you can insert the pseudo-argument "'--'" which tells "parse_args()" that everything after that is a positional argument: >>> parser.parse_args(['--', '-f']) Namespace(foo='-f', one=None) Argument abbreviations (prefix matching) ---------------------------------------- The "parse_args()" method by default allows long options to be abbreviated to a prefix, if the abbreviation is unambiguous (the prefix matches a unique option): >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('-bacon') >>> parser.add_argument('-badger') >>> parser.parse_args('-bac MMM'.split()) Namespace(bacon='MMM', badger=None) >>> parser.parse_args('-bad WOOD'.split()) Namespace(bacon=None, badger='WOOD') >>> parser.parse_args('-ba BA'.split()) usage: PROG [-h] [-bacon BACON] [-badger BADGER] PROG: error: ambiguous option: -ba could match -badger, -bacon An error is produced for arguments that could produce more than one options. This feature can be disabled by setting allow_abbrev to "False". Beyond "sys.argv" ----------------- Sometimes it may be useful to have an ArgumentParser parse arguments other than those of "sys.argv". This can be accomplished by passing a list of strings to "parse_args()". This is useful for testing at the interactive prompt: >>> parser = argparse.ArgumentParser() >>> parser.add_argument( ... 'integers', metavar='int', type=int, choices=range(10), ... nargs='+', help='an integer in the range 0..9') >>> parser.add_argument( ... '--sum', dest='accumulate', action='store_const', const=sum, ... default=max, help='sum the integers (default: find the max)') >>> parser.parse_args(['1', '2', '3', '4']) Namespace(accumulate=, integers=[1, 2, 3, 4]) >>> parser.parse_args(['1', '2', '3', '4', '--sum']) Namespace(accumulate=, integers=[1, 2, 3, 4]) The Namespace object -------------------- class argparse.Namespace Simple class used by default by "parse_args()" to create an object holding attributes and return it. This class is deliberately simple, just an "object" subclass with a readable string representation. If you prefer to have dict-like view of the attributes, you can use the standard Python idiom, "vars()": >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo') >>> args = parser.parse_args(['--foo', 'BAR']) >>> vars(args) {'foo': 'BAR'} It may also be useful to have an "ArgumentParser" assign attributes to an already existing object, rather than a new "Namespace" object. This can be achieved by specifying the "namespace=" keyword argument: >>> class C: ... pass ... >>> c = C() >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo') >>> parser.parse_args(args=['--foo', 'BAR'], namespace=c) >>> c.foo 'BAR' Other utilities =============== Sub-commands ------------ ArgumentParser.add_subparsers([title][, description][, prog][, parser_class][, action][, option_string][, dest][, required][, help][, metavar]) Many programs split up their functionality into a number of sub- commands, for example, the "svn" program can invoke sub-commands like "svn checkout", "svn update", and "svn commit". Splitting up functionality this way can be a particularly good idea when a program performs several different functions which require different kinds of command-line arguments. "ArgumentParser" supports the creation of such sub-commands with the "add_subparsers()" method. The "add_subparsers()" method is normally called with no arguments and returns a special action object. This object has a single method, "add_parser()", which takes a command name and any "ArgumentParser" constructor arguments, and returns an "ArgumentParser" object that can be modified as usual. Description of parameters: * title - title for the sub-parser group in help output; by default “subcommands” if description is provided, otherwise uses title for positional arguments * description - description for the sub-parser group in help output, by default "None" * prog - usage information that will be displayed with sub-command help, by default the name of the program and any positional arguments before the subparser argument * parser_class - class which will be used to create sub-parser instances, by default the class of the current parser (e.g. ArgumentParser) * action - the basic type of action to be taken when this argument is encountered at the command line * dest - name of the attribute under which sub-command name will be stored; by default "None" and no value is stored * required - Whether or not a subcommand must be provided, by default "False" (added in 3.7) * help - help for sub-parser group in help output, by default "None" * metavar - string presenting available sub-commands in help; by default it is "None" and presents sub-commands in form {cmd1, cmd2, ..} Some example usage: >>> # create the top-level parser >>> parser = argparse.ArgumentParser(prog='PROG') >>> parser.add_argument('--foo', action='store_true', help='foo help') >>> subparsers = parser.add_subparsers(help='sub-command help') >>> >>> # create the parser for the "a" command >>> parser_a = subparsers.add_parser('a', help='a help') >>> parser_a.add_argument('bar', type=int, help='bar help') >>> >>> # create the parser for the "b" command >>> parser_b = subparsers.add_parser('b', help='b help') >>> parser_b.add_argument('--baz', choices='XYZ', help='baz help') >>> >>> # parse some argument lists >>> parser.parse_args(['a', '12']) Namespace(bar=12, foo=False) >>> parser.parse_args(['--foo', 'b', '--baz', 'Z']) Namespace(baz='Z', foo=True) Note that the object returned by "parse_args()" will only contain attributes for the main parser and the subparser that was selected by the command line (and not any other subparsers). So in the example above, when the "a" command is specified, only the "foo" and "bar" attributes are present, and when the "b" command is specified, only the "foo" and "baz" attributes are present. Similarly, when a help message is requested from a subparser, only the help for that particular parser will be printed. The help message will not include parent parser or sibling parser messages. (A help message for each subparser command, however, can be given by supplying the "help=" argument to "add_parser()" as above.) >>> parser.parse_args(['--help']) usage: PROG [-h] [--foo] {a,b} ... positional arguments: {a,b} sub-command help a a help b b help optional arguments: -h, --help show this help message and exit --foo foo help >>> parser.parse_args(['a', '--help']) usage: PROG a [-h] bar positional arguments: bar bar help optional arguments: -h, --help show this help message and exit >>> parser.parse_args(['b', '--help']) usage: PROG b [-h] [--baz {X,Y,Z}] optional arguments: -h, --help show this help message and exit --baz {X,Y,Z} baz help The "add_subparsers()" method also supports "title" and "description" keyword arguments. When either is present, the subparser’s commands will appear in their own group in the help output. For example: >>> parser = argparse.ArgumentParser() >>> subparsers = parser.add_subparsers(title='subcommands', ... description='valid subcommands', ... help='additional help') >>> subparsers.add_parser('foo') >>> subparsers.add_parser('bar') >>> parser.parse_args(['-h']) usage: [-h] {foo,bar} ... optional arguments: -h, --help show this help message and exit subcommands: valid subcommands {foo,bar} additional help Furthermore, "add_parser" supports an additional "aliases" argument, which allows multiple strings to refer to the same subparser. This example, like "svn", aliases "co" as a shorthand for "checkout": >>> parser = argparse.ArgumentParser() >>> subparsers = parser.add_subparsers() >>> checkout = subparsers.add_parser('checkout', aliases=['co']) >>> checkout.add_argument('foo') >>> parser.parse_args(['co', 'bar']) Namespace(foo='bar') One particularly effective way of handling sub-commands is to combine the use of the "add_subparsers()" method with calls to "set_defaults()" so that each subparser knows which Python function it should execute. For example: >>> # sub-command functions >>> def foo(args): ... print(args.x * args.y) ... >>> def bar(args): ... print('((%s))' % args.z) ... >>> # create the top-level parser >>> parser = argparse.ArgumentParser() >>> subparsers = parser.add_subparsers() >>> >>> # create the parser for the "foo" command >>> parser_foo = subparsers.add_parser('foo') >>> parser_foo.add_argument('-x', type=int, default=1) >>> parser_foo.add_argument('y', type=float) >>> parser_foo.set_defaults(func=foo) >>> >>> # create the parser for the "bar" command >>> parser_bar = subparsers.add_parser('bar') >>> parser_bar.add_argument('z') >>> parser_bar.set_defaults(func=bar) >>> >>> # parse the args and call whatever function was selected >>> args = parser.parse_args('foo 1 -x 2'.split()) >>> args.func(args) 2.0 >>> >>> # parse the args and call whatever function was selected >>> args = parser.parse_args('bar XYZYX'.split()) >>> args.func(args) ((XYZYX)) This way, you can let "parse_args()" do the job of calling the appropriate function after argument parsing is complete. Associating functions with actions like this is typically the easiest way to handle the different actions for each of your subparsers. However, if it is necessary to check the name of the subparser that was invoked, the "dest" keyword argument to the "add_subparsers()" call will work: >>> parser = argparse.ArgumentParser() >>> subparsers = parser.add_subparsers(dest='subparser_name') >>> subparser1 = subparsers.add_parser('1') >>> subparser1.add_argument('-x') >>> subparser2 = subparsers.add_parser('2') >>> subparser2.add_argument('y') >>> parser.parse_args(['2', 'frobble']) Namespace(subparser_name='2', y='frobble') Changed in version 3.7: New *required* keyword argument. FileType objects ---------------- class argparse.FileType(mode='r', bufsize=-1, encoding=None, errors=None) The "FileType" factory creates objects that can be passed to the type argument of "ArgumentParser.add_argument()". Arguments that have "FileType" objects as their type will open command-line arguments as files with the requested modes, buffer sizes, encodings and error handling (see the "open()" function for more details): >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--raw', type=argparse.FileType('wb', 0)) >>> parser.add_argument('out', type=argparse.FileType('w', encoding='UTF-8')) >>> parser.parse_args(['--raw', 'raw.dat', 'file.txt']) Namespace(out=<_io.TextIOWrapper name='file.txt' mode='w' encoding='UTF-8'>, raw=<_io.FileIO name='raw.dat' mode='wb'>) FileType objects understand the pseudo-argument "'-'" and automatically convert this into "sys.stdin" for readable "FileType" objects and "sys.stdout" for writable "FileType" objects: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('infile', type=argparse.FileType('r')) >>> parser.parse_args(['-']) Namespace(infile=<_io.TextIOWrapper name='' encoding='UTF-8'>) New in version 3.4: The *encodings* and *errors* keyword arguments. Argument groups --------------- ArgumentParser.add_argument_group(title=None, description=None) By default, "ArgumentParser" groups command-line arguments into “positional arguments” and “optional arguments” when displaying help messages. When there is a better conceptual grouping of arguments than this default one, appropriate groups can be created using the "add_argument_group()" method: >>> parser = argparse.ArgumentParser(prog='PROG', add_help=False) >>> group = parser.add_argument_group('group') >>> group.add_argument('--foo', help='foo help') >>> group.add_argument('bar', help='bar help') >>> parser.print_help() usage: PROG [--foo FOO] bar group: bar bar help --foo FOO foo help The "add_argument_group()" method returns an argument group object which has an "add_argument()" method just like a regular "ArgumentParser". When an argument is added to the group, the parser treats it just like a normal argument, but displays the argument in a separate group for help messages. The "add_argument_group()" method accepts *title* and *description* arguments which can be used to customize this display: >>> parser = argparse.ArgumentParser(prog='PROG', add_help=False) >>> group1 = parser.add_argument_group('group1', 'group1 description') >>> group1.add_argument('foo', help='foo help') >>> group2 = parser.add_argument_group('group2', 'group2 description') >>> group2.add_argument('--bar', help='bar help') >>> parser.print_help() usage: PROG [--bar BAR] foo group1: group1 description foo foo help group2: group2 description --bar BAR bar help Note that any arguments not in your user-defined groups will end up back in the usual “positional arguments” and “optional arguments” sections. Mutual exclusion ---------------- ArgumentParser.add_mutually_exclusive_group(required=False) Create a mutually exclusive group. "argparse" will make sure that only one of the arguments in the mutually exclusive group was present on the command line: >>> parser = argparse.ArgumentParser(prog='PROG') >>> group = parser.add_mutually_exclusive_group() >>> group.add_argument('--foo', action='store_true') >>> group.add_argument('--bar', action='store_false') >>> parser.parse_args(['--foo']) Namespace(bar=True, foo=True) >>> parser.parse_args(['--bar']) Namespace(bar=False, foo=False) >>> parser.parse_args(['--foo', '--bar']) usage: PROG [-h] [--foo | --bar] PROG: error: argument --bar: not allowed with argument --foo The "add_mutually_exclusive_group()" method also accepts a *required* argument, to indicate that at least one of the mutually exclusive arguments is required: >>> parser = argparse.ArgumentParser(prog='PROG') >>> group = parser.add_mutually_exclusive_group(required=True) >>> group.add_argument('--foo', action='store_true') >>> group.add_argument('--bar', action='store_false') >>> parser.parse_args([]) usage: PROG [-h] (--foo | --bar) PROG: error: one of the arguments --foo --bar is required Note that currently mutually exclusive argument groups do not support the *title* and *description* arguments of "add_argument_group()". Parser defaults --------------- ArgumentParser.set_defaults(**kwargs) Most of the time, the attributes of the object returned by "parse_args()" will be fully determined by inspecting the command- line arguments and the argument actions. "set_defaults()" allows some additional attributes that are determined without any inspection of the command line to be added: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('foo', type=int) >>> parser.set_defaults(bar=42, baz='badger') >>> parser.parse_args(['736']) Namespace(bar=42, baz='badger', foo=736) Note that parser-level defaults always override argument-level defaults: >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', default='bar') >>> parser.set_defaults(foo='spam') >>> parser.parse_args([]) Namespace(foo='spam') Parser-level defaults can be particularly useful when working with multiple parsers. See the "add_subparsers()" method for an example of this type. ArgumentParser.get_default(dest) Get the default value for a namespace attribute, as set by either "add_argument()" or by "set_defaults()": >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', default='badger') >>> parser.get_default('foo') 'badger' Printing help ------------- In most typical applications, "parse_args()" will take care of formatting and printing any usage or error messages. However, several formatting methods are available: ArgumentParser.print_usage(file=None) Print a brief description of how the "ArgumentParser" should be invoked on the command line. If *file* is "None", "sys.stdout" is assumed. ArgumentParser.print_help(file=None) Print a help message, including the program usage and information about the arguments registered with the "ArgumentParser". If *file* is "None", "sys.stdout" is assumed. There are also variants of these methods that simply return a string instead of printing it: ArgumentParser.format_usage() Return a string containing a brief description of how the "ArgumentParser" should be invoked on the command line. ArgumentParser.format_help() Return a string containing a help message, including the program usage and information about the arguments registered with the "ArgumentParser". Partial parsing --------------- ArgumentParser.parse_known_args(args=None, namespace=None) Sometimes a script may only parse a few of the command-line arguments, passing the remaining arguments on to another script or program. In these cases, the "parse_known_args()" method can be useful. It works much like "parse_args()" except that it does not produce an error when extra arguments are present. Instead, it returns a two item tuple containing the populated namespace and the list of remaining argument strings. >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo', action='store_true') >>> parser.add_argument('bar') >>> parser.parse_known_args(['--foo', '--badger', 'BAR', 'spam']) (Namespace(bar='BAR', foo=True), ['--badger', 'spam']) Warning: Prefix matching rules apply to "parse_known_args()". The parser may consume an option even if it’s just a prefix of one of its known options, instead of leaving it in the remaining arguments list. Customizing file parsing ------------------------ ArgumentParser.convert_arg_line_to_args(arg_line) Arguments that are read from a file (see the *fromfile_prefix_chars* keyword argument to the "ArgumentParser" constructor) are read one argument per line. "convert_arg_line_to_args()" can be overridden for fancier reading. This method takes a single argument *arg_line* which is a string read from the argument file. It returns a list of arguments parsed from this string. The method is called once per line read from the argument file, in order. A useful override of this method is one that treats each space- separated word as an argument. The following example demonstrates how to do this: class MyArgumentParser(argparse.ArgumentParser): def convert_arg_line_to_args(self, arg_line): return arg_line.split() Exiting methods --------------- ArgumentParser.exit(status=0, message=None) This method terminates the program, exiting with the specified *status* and, if given, it prints a *message* before that. The user can override this method to handle these steps differently: class ErrorCatchingArgumentParser(argparse.ArgumentParser): def exit(self, status=0, message=None): if status: raise Exception(f'Exiting because of an error: {message}') exit(status) ArgumentParser.error(message) This method prints a usage message including the *message* to the standard error and terminates the program with a status code of 2. Intermixed parsing ------------------ ArgumentParser.parse_intermixed_args(args=None, namespace=None) ArgumentParser.parse_known_intermixed_args(args=None, namespace=None) A number of Unix commands allow the user to intermix optional arguments with positional arguments. The "parse_intermixed_args()" and "parse_known_intermixed_args()" methods support this parsing style. These parsers do not support all the argparse features, and will raise exceptions if unsupported features are used. In particular, subparsers, "argparse.REMAINDER", and mutually exclusive groups that include both optionals and positionals are not supported. The following example shows the difference between "parse_known_args()" and "parse_intermixed_args()": the former returns "['2', '3']" as unparsed arguments, while the latter collects all the positionals into "rest". >>> parser = argparse.ArgumentParser() >>> parser.add_argument('--foo') >>> parser.add_argument('cmd') >>> parser.add_argument('rest', nargs='*', type=int) >>> parser.parse_known_args('doit 1 --foo bar 2 3'.split()) (Namespace(cmd='doit', foo='bar', rest=[1]), ['2', '3']) >>> parser.parse_intermixed_args('doit 1 --foo bar 2 3'.split()) Namespace(cmd='doit', foo='bar', rest=[1, 2, 3]) "parse_known_intermixed_args()" returns a two item tuple containing the populated namespace and the list of remaining argument strings. "parse_intermixed_args()" raises an error if there are any remaining unparsed argument strings. New in version 3.7. Upgrading optparse code ======================= Originally, the "argparse" module had attempted to maintain compatibility with "optparse". However, "optparse" was difficult to extend transparently, particularly with the changes required to support the new "nargs=" specifiers and better usage messages. When most everything in "optparse" had either been copy-pasted over or monkey-patched, it no longer seemed practical to try to maintain the backwards compatibility. The "argparse" module improves on the standard library "optparse" module in a number of ways including: * Handling positional arguments. * Supporting sub-commands. * Allowing alternative option prefixes like "+" and "/". * Handling zero-or-more and one-or-more style arguments. * Producing more informative usage messages. * Providing a much simpler interface for custom "type" and "action". A partial upgrade path from "optparse" to "argparse": * Replace all "optparse.OptionParser.add_option()" calls with "ArgumentParser.add_argument()" calls. * Replace "(options, args) = parser.parse_args()" with "args = parser.parse_args()" and add additional "ArgumentParser.add_argument()" calls for the positional arguments. Keep in mind that what was previously called "options", now in the "argparse" context is called "args". * Replace "optparse.OptionParser.disable_interspersed_args()" by using "parse_intermixed_args()" instead of "parse_args()". * Replace callback actions and the "callback_*" keyword arguments with "type" or "action" arguments. * Replace string names for "type" keyword arguments with the corresponding type objects (e.g. int, float, complex, etc). * Replace "optparse.Values" with "Namespace" and "optparse.OptionError" and "optparse.OptionValueError" with "ArgumentError". * Replace strings with implicit arguments such as "%default" or "%prog" with the standard Python syntax to use dictionaries to format strings, that is, "%(default)s" and "%(prog)s". * Replace the OptionParser constructor "version" argument with a call to "parser.add_argument('--version', action='version', version='')". "array" — Efficient arrays of numeric values ******************************************** ====================================================================== This module defines an object type which can compactly represent an array of basic values: characters, integers, floating point numbers. Arrays are sequence types and behave very much like lists, except that the type of objects stored in them is constrained. The type is specified at object creation time by using a *type code*, which is a single character. The following type codes are defined: +-------------+----------------------+---------------------+-------------------------+---------+ | Type code | C Type | Python Type | Minimum size in bytes | Notes | |=============|======================|=====================|=========================|=========| | "'b'" | signed char | int | 1 | | +-------------+----------------------+---------------------+-------------------------+---------+ | "'B'" | unsigned char | int | 1 | | +-------------+----------------------+---------------------+-------------------------+---------+ | "'u'" | wchar_t | Unicode character | 2 | (1) | +-------------+----------------------+---------------------+-------------------------+---------+ | "'h'" | signed short | int | 2 | | +-------------+----------------------+---------------------+-------------------------+---------+ | "'H'" | unsigned short | int | 2 | | +-------------+----------------------+---------------------+-------------------------+---------+ | "'i'" | signed int | int | 2 | | +-------------+----------------------+---------------------+-------------------------+---------+ | "'I'" | unsigned int | int | 2 | | +-------------+----------------------+---------------------+-------------------------+---------+ | "'l'" | signed long | int | 4 | | +-------------+----------------------+---------------------+-------------------------+---------+ | "'L'" | unsigned long | int | 4 | | +-------------+----------------------+---------------------+-------------------------+---------+ | "'q'" | signed long long | int | 8 | | +-------------+----------------------+---------------------+-------------------------+---------+ | "'Q'" | unsigned long long | int | 8 | | +-------------+----------------------+---------------------+-------------------------+---------+ | "'f'" | float | float | 4 | | +-------------+----------------------+---------------------+-------------------------+---------+ | "'d'" | double | float | 8 | | +-------------+----------------------+---------------------+-------------------------+---------+ Notes: 1. It can be 16 bits or 32 bits depending on the platform. Changed in version 3.9: "array('u')" now uses "wchar_t" as C type instead of deprecated "Py_UNICODE". This change doesn’t affect to its behavior because "Py_UNICODE" is alias of "wchar_t" since Python 3.3. Deprecated since version 3.3, will be removed in version 4.0. The actual representation of values is determined by the machine architecture (strictly speaking, by the C implementation). The actual size can be accessed through the "itemsize" attribute. The module defines the following type: class array.array(typecode[, initializer]) A new array whose items are restricted by *typecode*, and initialized from the optional *initializer* value, which must be a list, a *bytes-like object*, or iterable over elements of the appropriate type. If given a list or string, the initializer is passed to the new array’s "fromlist()", "frombytes()", or "fromunicode()" method (see below) to add initial items to the array. Otherwise, the iterable initializer is passed to the "extend()" method. Raises an auditing event "array.__new__" with arguments "typecode", "initializer". array.typecodes A string with all available type codes. Array objects support the ordinary sequence operations of indexing, slicing, concatenation, and multiplication. When using slice assignment, the assigned value must be an array object with the same type code; in all other cases, "TypeError" is raised. Array objects also implement the buffer interface, and may be used wherever *bytes- like objects* are supported. The following data items and methods are also supported: array.typecode The typecode character used to create the array. array.itemsize The length in bytes of one array item in the internal representation. array.append(x) Append a new item with value *x* to the end of the array. array.buffer_info() Return a tuple "(address, length)" giving the current memory address and the length in elements of the buffer used to hold array’s contents. The size of the memory buffer in bytes can be computed as "array.buffer_info()[1] * array.itemsize". This is occasionally useful when working with low-level (and inherently unsafe) I/O interfaces that require memory addresses, such as certain "ioctl()" operations. The returned numbers are valid as long as the array exists and no length-changing operations are applied to it. Note: When using array objects from code written in C or C++ (the only way to effectively make use of this information), it makes more sense to use the buffer interface supported by array objects. This method is maintained for backward compatibility and should be avoided in new code. The buffer interface is documented in Buffer Protocol. array.byteswap() “Byteswap” all items of the array. This is only supported for values which are 1, 2, 4, or 8 bytes in size; for other types of values, "RuntimeError" is raised. It is useful when reading data from a file written on a machine with a different byte order. array.count(x) Return the number of occurrences of *x* in the array. array.extend(iterable) Append items from *iterable* to the end of the array. If *iterable* is another array, it must have *exactly* the same type code; if not, "TypeError" will be raised. If *iterable* is not an array, it must be iterable and its elements must be the right type to be appended to the array. array.frombytes(s) Appends items from the string, interpreting the string as an array of machine values (as if it had been read from a file using the "fromfile()" method). New in version 3.2: "fromstring()" is renamed to "frombytes()" for clarity. array.fromfile(f, n) Read *n* items (as machine values) from the *file object* *f* and append them to the end of the array. If less than *n* items are available, "EOFError" is raised, but the items that were available are still inserted into the array. array.fromlist(list) Append items from the list. This is equivalent to "for x in list: a.append(x)" except that if there is a type error, the array is unchanged. array.fromunicode(s) Extends this array with data from the given unicode string. The array must be a type "'u'" array; otherwise a "ValueError" is raised. Use "array.frombytes(unicodestring.encode(enc))" to append Unicode data to an array of some other type. array.index(x) Return the smallest *i* such that *i* is the index of the first occurrence of *x* in the array. array.insert(i, x) Insert a new item with value *x* in the array before position *i*. Negative values are treated as being relative to the end of the array. array.pop([i]) Removes the item with the index *i* from the array and returns it. The optional argument defaults to "-1", so that by default the last item is removed and returned. array.remove(x) Remove the first occurrence of *x* from the array. array.reverse() Reverse the order of the items in the array. array.tobytes() Convert the array to an array of machine values and return the bytes representation (the same sequence of bytes that would be written to a file by the "tofile()" method.) New in version 3.2: "tostring()" is renamed to "tobytes()" for clarity. array.tofile(f) Write all items (as machine values) to the *file object* *f*. array.tolist() Convert the array to an ordinary list with the same items. array.tounicode() Convert the array to a unicode string. The array must be a type "'u'" array; otherwise a "ValueError" is raised. Use "array.tobytes().decode(enc)" to obtain a unicode string from an array of some other type. When an array object is printed or converted to a string, it is represented as "array(typecode, initializer)". The *initializer* is omitted if the array is empty, otherwise it is a string if the *typecode* is "'u'", otherwise it is a list of numbers. The string is guaranteed to be able to be converted back to an array with the same type and value using "eval()", so long as the "array" class has been imported using "from array import array". Examples: array('l') array('u', 'hello \u2641') array('l', [1, 2, 3, 4, 5]) array('d', [1.0, 2.0, 3.14]) See also: Module "struct" Packing and unpacking of heterogeneous binary data. Module "xdrlib" Packing and unpacking of External Data Representation (XDR) data as used in some remote procedure call systems. NumPy The NumPy package defines another array type. "ast" — Abstract Syntax Trees ***************************** **Source code:** Lib/ast.py ====================================================================== The "ast" module helps Python applications to process trees of the Python abstract syntax grammar. The abstract syntax itself might change with each Python release; this module helps to find out programmatically what the current grammar looks like. An abstract syntax tree can be generated by passing "ast.PyCF_ONLY_AST" as a flag to the "compile()" built-in function, or using the "parse()" helper provided in this module. The result will be a tree of objects whose classes all inherit from "ast.AST". An abstract syntax tree can be compiled into a Python code object using the built-in "compile()" function. Abstract Grammar ================ The abstract grammar is currently defined as follows: -- ASDL's 4 builtin types are: -- identifier, int, string, constant module Python { mod = Module(stmt* body, type_ignore* type_ignores) | Interactive(stmt* body) | Expression(expr body) | FunctionType(expr* argtypes, expr returns) stmt = FunctionDef(identifier name, arguments args, stmt* body, expr* decorator_list, expr? returns, string? type_comment) | AsyncFunctionDef(identifier name, arguments args, stmt* body, expr* decorator_list, expr? returns, string? type_comment) | ClassDef(identifier name, expr* bases, keyword* keywords, stmt* body, expr* decorator_list) | Return(expr? value) | Delete(expr* targets) | Assign(expr* targets, expr value, string? type_comment) | AugAssign(expr target, operator op, expr value) -- 'simple' indicates that we annotate simple name without parens | AnnAssign(expr target, expr annotation, expr? value, int simple) -- use 'orelse' because else is a keyword in target languages | For(expr target, expr iter, stmt* body, stmt* orelse, string? type_comment) | AsyncFor(expr target, expr iter, stmt* body, stmt* orelse, string? type_comment) | While(expr test, stmt* body, stmt* orelse) | If(expr test, stmt* body, stmt* orelse) | With(withitem* items, stmt* body, string? type_comment) | AsyncWith(withitem* items, stmt* body, string? type_comment) | Raise(expr? exc, expr? cause) | Try(stmt* body, excepthandler* handlers, stmt* orelse, stmt* finalbody) | Assert(expr test, expr? msg) | Import(alias* names) | ImportFrom(identifier? module, alias* names, int? level) | Global(identifier* names) | Nonlocal(identifier* names) | Expr(expr value) | Pass | Break | Continue -- col_offset is the byte offset in the utf8 string the parser uses attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset) -- BoolOp() can use left & right? expr = BoolOp(boolop op, expr* values) | NamedExpr(expr target, expr value) | BinOp(expr left, operator op, expr right) | UnaryOp(unaryop op, expr operand) | Lambda(arguments args, expr body) | IfExp(expr test, expr body, expr orelse) | Dict(expr* keys, expr* values) | Set(expr* elts) | ListComp(expr elt, comprehension* generators) | SetComp(expr elt, comprehension* generators) | DictComp(expr key, expr value, comprehension* generators) | GeneratorExp(expr elt, comprehension* generators) -- the grammar constrains where yield expressions can occur | Await(expr value) | Yield(expr? value) | YieldFrom(expr value) -- need sequences for compare to distinguish between -- x < 4 < 3 and (x < 4) < 3 | Compare(expr left, cmpop* ops, expr* comparators) | Call(expr func, expr* args, keyword* keywords) | FormattedValue(expr value, int? conversion, expr? format_spec) | JoinedStr(expr* values) | Constant(constant value, string? kind) -- the following expression can appear in assignment context | Attribute(expr value, identifier attr, expr_context ctx) | Subscript(expr value, expr slice, expr_context ctx) | Starred(expr value, expr_context ctx) | Name(identifier id, expr_context ctx) | List(expr* elts, expr_context ctx) | Tuple(expr* elts, expr_context ctx) -- can appear only in Subscript | Slice(expr? lower, expr? upper, expr? step) -- col_offset is the byte offset in the utf8 string the parser uses attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset) expr_context = Load | Store | Del boolop = And | Or operator = Add | Sub | Mult | MatMult | Div | Mod | Pow | LShift | RShift | BitOr | BitXor | BitAnd | FloorDiv unaryop = Invert | Not | UAdd | USub cmpop = Eq | NotEq | Lt | LtE | Gt | GtE | Is | IsNot | In | NotIn comprehension = (expr target, expr iter, expr* ifs, int is_async) excepthandler = ExceptHandler(expr? type, identifier? name, stmt* body) attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset) arguments = (arg* posonlyargs, arg* args, arg? vararg, arg* kwonlyargs, expr* kw_defaults, arg? kwarg, expr* defaults) arg = (identifier arg, expr? annotation, string? type_comment) attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset) -- keyword arguments supplied to call (NULL identifier for **kwargs) keyword = (identifier? arg, expr value) attributes (int lineno, int col_offset, int? end_lineno, int? end_col_offset) -- import name with optional 'as' alias. alias = (identifier name, identifier? asname) withitem = (expr context_expr, expr? optional_vars) type_ignore = TypeIgnore(int lineno, string tag) } Node classes ============ class ast.AST This is the base of all AST node classes. The actual node classes are derived from the "Parser/Python.asdl" file, which is reproduced below. They are defined in the "_ast" C module and re-exported in "ast". There is one class defined for each left-hand side symbol in the abstract grammar (for example, "ast.stmt" or "ast.expr"). In addition, there is one class defined for each constructor on the right-hand side; these classes inherit from the classes for the left-hand side trees. For example, "ast.BinOp" inherits from "ast.expr". For production rules with alternatives (aka “sums”), the left-hand side class is abstract: only instances of specific constructor nodes are ever created. _fields Each concrete class has an attribute "_fields" which gives the names of all child nodes. Each instance of a concrete class has one attribute for each child node, of the type as defined in the grammar. For example, "ast.BinOp" instances have an attribute "left" of type "ast.expr". If these attributes are marked as optional in the grammar (using a question mark), the value might be "None". If the attributes can have zero-or-more values (marked with an asterisk), the values are represented as Python lists. All possible attributes must be present and have valid values when compiling an AST with "compile()". lineno col_offset end_lineno end_col_offset Instances of "ast.expr" and "ast.stmt" subclasses have "lineno", "col_offset", "end_lineno", and "end_col_offset" attributes. The "lineno" and "end_lineno" are the first and last line numbers of the source text span (1-indexed so the first line is line 1), and the "col_offset" and "end_col_offset" are the corresponding UTF-8 byte offsets of the first and last tokens that generated the node. The UTF-8 offset is recorded because the parser uses UTF-8 internally. Note that the end positions are not required by the compiler and are therefore optional. The end offset is *after* the last symbol, for example one can get the source segment of a one-line expression node using "source_line[node.col_offset : node.end_col_offset]". The constructor of a class "ast.T" parses its arguments as follows: * If there are positional arguments, there must be as many as there are items in "T._fields"; they will be assigned as attributes of these names. * If there are keyword arguments, they will set the attributes of the same names to the given values. For example, to create and populate an "ast.UnaryOp" node, you could use node = ast.UnaryOp() node.op = ast.USub() node.operand = ast.Constant() node.operand.value = 5 node.operand.lineno = 0 node.operand.col_offset = 0 node.lineno = 0 node.col_offset = 0 or the more compact node = ast.UnaryOp(ast.USub(), ast.Constant(5, lineno=0, col_offset=0), lineno=0, col_offset=0) Changed in version 3.8: Class "ast.Constant" is now used for all constants. Changed in version 3.9: Simple indices are represented by their value, extended slices are represented as tuples. Deprecated since version 3.8: Old classes "ast.Num", "ast.Str", "ast.Bytes", "ast.NameConstant" and "ast.Ellipsis" are still available, but they will be removed in future Python releases. In the meantime, instantiating them will return an instance of a different class. Deprecated since version 3.9: Old classes "ast.Index" and "ast.ExtSlice" are still available, but they will be removed in future Python releases. In the meantime, instantiating them will return an instance of a different class. Note: The descriptions of the specific node classes displayed here were initially adapted from the fantastic Green Tree Snakes project and all its contributors. Literals -------- class ast.Constant(value) A constant value. The "value" attribute of the "Constant" literal contains the Python object it represents. The values represented can be simple types such as a number, string or "None", but also immutable container types (tuples and frozensets) if all of their elements are constant. >>> print(ast.dump(ast.parse('123', mode='eval'), indent=4)) Expression( body=Constant(value=123)) class ast.FormattedValue(value, conversion, format_spec) Node representing a single formatting field in an f-string. If the string contains a single formatting field and nothing else the node can be isolated otherwise it appears in "JoinedStr". * "value" is any expression node (such as a literal, a variable, or a function call). * "conversion" is an integer: * -1: no formatting * 115: "!s" string formatting * 114: "!r" repr formatting * 97: "!a" ascii formatting * "format_spec" is a "JoinedStr" node representing the formatting of the value, or "None" if no format was specified. Both "conversion" and "format_spec" can be set at the same time. class ast.JoinedStr(values) An f-string, comprising a series of "FormattedValue" and "Constant" nodes. >>> print(ast.dump(ast.parse('f"sin({a}) is {sin(a):.3}"', mode='eval'), indent=4)) Expression( body=JoinedStr( values=[ Constant(value='sin('), FormattedValue( value=Name(id='a', ctx=Load()), conversion=-1), Constant(value=') is '), FormattedValue( value=Call( func=Name(id='sin', ctx=Load()), args=[ Name(id='a', ctx=Load())], keywords=[]), conversion=-1, format_spec=JoinedStr( values=[ Constant(value='.3')]))])) class ast.List(elts, ctx) class ast.Tuple(elts, ctx) A list or tuple. "elts" holds a list of nodes representing the elements. "ctx" is "Store" if the container is an assignment target (i.e. "(x,y)=something"), and "Load" otherwise. >>> print(ast.dump(ast.parse('[1, 2, 3]', mode='eval'), indent=4)) Expression( body=List( elts=[ Constant(value=1), Constant(value=2), Constant(value=3)], ctx=Load())) >>> print(ast.dump(ast.parse('(1, 2, 3)', mode='eval'), indent=4)) Expression( body=Tuple( elts=[ Constant(value=1), Constant(value=2), Constant(value=3)], ctx=Load())) class ast.Set(elts) A set. "elts" holds a list of nodes representing the set’s elements. >>> print(ast.dump(ast.parse('{1, 2, 3}', mode='eval'), indent=4)) Expression( body=Set( elts=[ Constant(value=1), Constant(value=2), Constant(value=3)])) class ast.Dict(keys, values) A dictionary. "keys" and "values" hold lists of nodes representing the keys and the values respectively, in matching order (what would be returned when calling "dictionary.keys()" and "dictionary.values()"). When doing dictionary unpacking using dictionary literals the expression to be expanded goes in the "values" list, with a "None" at the corresponding position in "keys". >>> print(ast.dump(ast.parse('{"a":1, **d}', mode='eval'), indent=4)) Expression( body=Dict( keys=[ Constant(value='a'), None], values=[ Constant(value=1), Name(id='d', ctx=Load())])) Variables --------- class ast.Name(id, ctx) A variable name. "id" holds the name as a string, and "ctx" is one of the following types. class ast.Load class ast.Store class ast.Del Variable references can be used to load the value of a variable, to assign a new value to it, or to delete it. Variable references are given a context to distinguish these cases. >>> print(ast.dump(ast.parse('a'), indent=4)) Module( body=[ Expr( value=Name(id='a', ctx=Load()))], type_ignores=[]) >>> print(ast.dump(ast.parse('a = 1'), indent=4)) Module( body=[ Assign( targets=[ Name(id='a', ctx=Store())], value=Constant(value=1))], type_ignores=[]) >>> print(ast.dump(ast.parse('del a'), indent=4)) Module( body=[ Delete( targets=[ Name(id='a', ctx=Del())])], type_ignores=[]) class ast.Starred(value, ctx) A "*var" variable reference. "value" holds the variable, typically a "Name" node. This type must be used when building a "Call" node with "*args". >>> print(ast.dump(ast.parse('a, *b = it'), indent=4)) Module( body=[ Assign( targets=[ Tuple( elts=[ Name(id='a', ctx=Store()), Starred( value=Name(id='b', ctx=Store()), ctx=Store())], ctx=Store())], value=Name(id='it', ctx=Load()))], type_ignores=[]) Expressions ----------- class ast.Expr(value) When an expression, such as a function call, appears as a statement by itself with its return value not used or stored, it is wrapped in this container. "value" holds one of the other nodes in this section, a "Constant", a "Name", a "Lambda", a "Yield" or "YieldFrom" node. >>> print(ast.dump(ast.parse('-a'), indent=4)) Module( body=[ Expr( value=UnaryOp( op=USub(), operand=Name(id='a', ctx=Load())))], type_ignores=[]) class ast.UnaryOp(op, operand) A unary operation. "op" is the operator, and "operand" any expression node. class ast.UAdd class ast.USub class ast.Not class ast.Invert Unary operator tokens. "Not" is the "not" keyword, "Invert" is the "~" operator. >>> print(ast.dump(ast.parse('not x', mode='eval'), indent=4)) Expression( body=UnaryOp( op=Not(), operand=Name(id='x', ctx=Load()))) class ast.BinOp(left, op, right) A binary operation (like addition or division). "op" is the operator, and "left" and "right" are any expression nodes. >>> print(ast.dump(ast.parse('x + y', mode='eval'), indent=4)) Expression( body=BinOp( left=Name(id='x', ctx=Load()), op=Add(), right=Name(id='y', ctx=Load()))) class ast.Add class ast.Sub class ast.Mult class ast.Div class ast.FloorDiv class ast.Mod class ast.Pow class ast.LShift class ast.RShift class ast.BitOr class ast.BitXor class ast.BitAnd class ast.MatMult Binary operator tokens. class ast.BoolOp(op, values) A boolean operation, ‘or’ or ‘and’. "op" is "Or" or "And". "values" are the values involved. Consecutive operations with the same operator, such as "a or b or c", are collapsed into one node with several values. This doesn’t include "not", which is a "UnaryOp". >>> print(ast.dump(ast.parse('x or y', mode='eval'), indent=4)) Expression( body=BoolOp( op=Or(), values=[ Name(id='x', ctx=Load()), Name(id='y', ctx=Load())])) class ast.And class ast.Or Boolean operator tokens. class ast.Compare(left, ops, comparators) A comparison of two or more values. "left" is the first value in the comparison, "ops" the list of operators, and "comparators" the list of values after the first element in the comparison. >>> print(ast.dump(ast.parse('1 <= a < 10', mode='eval'), indent=4)) Expression( body=Compare( left=Constant(value=1), ops=[ LtE(), Lt()], comparators=[ Name(id='a', ctx=Load()), Constant(value=10)])) class ast.Eq class ast.NotEq class ast.Lt class ast.LtE class ast.Gt class ast.GtE class ast.Is class ast.IsNot class ast.In class ast.NotIn Comparison operator tokens. class ast.Call(func, args, keywords, starargs, kwargs) A function call. "func" is the function, which will often be a "Name" or "Attribute" object. Of the arguments: * "args" holds a list of the arguments passed by position. * "keywords" holds a list of "keyword" objects representing arguments passed by keyword. When creating a "Call" node, "args" and "keywords" are required, but they can be empty lists. "starargs" and "kwargs" are optional. >>> print(ast.dump(ast.parse('func(a, b=c, *d, **e)', mode='eval'), indent=4)) Expression( body=Call( func=Name(id='func', ctx=Load()), args=[ Name(id='a', ctx=Load()), Starred( value=Name(id='d', ctx=Load()), ctx=Load())], keywords=[ keyword( arg='b', value=Name(id='c', ctx=Load())), keyword( value=Name(id='e', ctx=Load()))])) class ast.keyword(arg, value) A keyword argument to a function call or class definition. "arg" is a raw string of the parameter name, "value" is a node to pass in. class ast.IfExp(test, body, orelse) An expression such as "a if b else c". Each field holds a single node, so in the following example, all three are "Name" nodes. >>> print(ast.dump(ast.parse('a if b else c', mode='eval'), indent=4)) Expression( body=IfExp( test=Name(id='b', ctx=Load()), body=Name(id='a', ctx=Load()), orelse=Name(id='c', ctx=Load()))) class ast.Attribute(value, attr, ctx) Attribute access, e.g. "d.keys". "value" is a node, typically a "Name". "attr" is a bare string giving the name of the attribute, and "ctx" is "Load", "Store" or "Del" according to how the attribute is acted on. >>> print(ast.dump(ast.parse('snake.colour', mode='eval'), indent=4)) Expression( body=Attribute( value=Name(id='snake', ctx=Load()), attr='colour', ctx=Load())) class ast.NamedExpr(target, value) A named expression. This AST node is produced by the assignment expressions operator (also known as the walrus operator). As opposed to the "Assign" node in which the first argument can be multiple nodes, in this case both "target" and "value" must be single nodes. >>> print(ast.dump(ast.parse('(x := 4)', mode='eval'), indent=4)) Expression( body=NamedExpr( target=Name(id='x', ctx=Store()), value=Constant(value=4))) Subscripting ~~~~~~~~~~~~ class ast.Subscript(value, slice, ctx) A subscript, such as "l[1]". "value" is the subscripted object (usually sequence or mapping). "slice" is an index, slice or key. It can be a "Tuple" and contain a "Slice". "ctx" is "Load", "Store" or "Del" according to the action performed with the subscript. >>> print(ast.dump(ast.parse('l[1:2, 3]', mode='eval'), indent=4)) Expression( body=Subscript( value=Name(id='l', ctx=Load()), slice=Tuple( elts=[ Slice( lower=Constant(value=1), upper=Constant(value=2)), Constant(value=3)], ctx=Load()), ctx=Load())) class ast.Slice(lower, upper, step) Regular slicing (on the form "lower:upper" or "lower:upper:step"). Can occur only inside the *slice* field of "Subscript", either directly or as an element of "Tuple". >>> print(ast.dump(ast.parse('l[1:2]', mode='eval'), indent=4)) Expression( body=Subscript( value=Name(id='l', ctx=Load()), slice=Slice( lower=Constant(value=1), upper=Constant(value=2)), ctx=Load())) Comprehensions ~~~~~~~~~~~~~~ class ast.ListComp(elt, generators) class ast.SetComp(elt, generators) class ast.GeneratorExp(elt, generators) class ast.DictComp(key, value, generators) List and set comprehensions, generator expressions, and dictionary comprehensions. "elt" (or "key" and "value") is a single node representing the part that will be evaluated for each item. "generators" is a list of "comprehension" nodes. >>> print(ast.dump(ast.parse('[x for x in numbers]', mode='eval'), indent=4)) Expression( body=ListComp( elt=Name(id='x', ctx=Load()), generators=[ comprehension( target=Name(id='x', ctx=Store()), iter=Name(id='numbers', ctx=Load()), ifs=[], is_async=0)])) >>> print(ast.dump(ast.parse('{x: x**2 for x in numbers}', mode='eval'), indent=4)) Expression( body=DictComp( key=Name(id='x', ctx=Load()), value=BinOp( left=Name(id='x', ctx=Load()), op=Pow(), right=Constant(value=2)), generators=[ comprehension( target=Name(id='x', ctx=Store()), iter=Name(id='numbers', ctx=Load()), ifs=[], is_async=0)])) >>> print(ast.dump(ast.parse('{x for x in numbers}', mode='eval'), indent=4)) Expression( body=SetComp( elt=Name(id='x', ctx=Load()), generators=[ comprehension( target=Name(id='x', ctx=Store()), iter=Name(id='numbers', ctx=Load()), ifs=[], is_async=0)])) class ast.comprehension(target, iter, ifs, is_async) One "for" clause in a comprehension. "target" is the reference to use for each element - typically a "Name" or "Tuple" node. "iter" is the object to iterate over. "ifs" is a list of test expressions: each "for" clause can have multiple "ifs". "is_async" indicates a comprehension is asynchronous (using an "async for" instead of "for"). The value is an integer (0 or 1). >>> print(ast.dump(ast.parse('[ord(c) for line in file for c in line]', mode='eval'), ... indent=4)) # Multiple comprehensions in one. Expression( body=ListComp( elt=Call( func=Name(id='ord', ctx=Load()), args=[ Name(id='c', ctx=Load())], keywords=[]), generators=[ comprehension( target=Name(id='line', ctx=Store()), iter=Name(id='file', ctx=Load()), ifs=[], is_async=0), comprehension( target=Name(id='c', ctx=Store()), iter=Name(id='line', ctx=Load()), ifs=[], is_async=0)])) >>> print(ast.dump(ast.parse('(n**2 for n in it if n>5 if n<10)', mode='eval'), ... indent=4)) # generator comprehension Expression( body=GeneratorExp( elt=BinOp( left=Name(id='n', ctx=Load()), op=Pow(), right=Constant(value=2)), generators=[ comprehension( target=Name(id='n', ctx=Store()), iter=Name(id='it', ctx=Load()), ifs=[ Compare( left=Name(id='n', ctx=Load()), ops=[ Gt()], comparators=[ Constant(value=5)]), Compare( left=Name(id='n', ctx=Load()), ops=[ Lt()], comparators=[ Constant(value=10)])], is_async=0)])) >>> print(ast.dump(ast.parse('[i async for i in soc]', mode='eval'), ... indent=4)) # Async comprehension Expression( body=ListComp( elt=Name(id='i', ctx=Load()), generators=[ comprehension( target=Name(id='i', ctx=Store()), iter=Name(id='soc', ctx=Load()), ifs=[], is_async=1)])) Statements ---------- class ast.Assign(targets, value, type_comment) An assignment. "targets" is a list of nodes, and "value" is a single node. Multiple nodes in "targets" represents assigning the same value to each. Unpacking is represented by putting a "Tuple" or "List" within "targets". type_comment "type_comment" is an optional string with the type annotation as a comment. >>> print(ast.dump(ast.parse('a = b = 1'), indent=4)) # Multiple assignment Module( body=[ Assign( targets=[ Name(id='a', ctx=Store()), Name(id='b', ctx=Store())], value=Constant(value=1))], type_ignores=[]) >>> print(ast.dump(ast.parse('a,b = c'), indent=4)) # Unpacking Module( body=[ Assign( targets=[ Tuple( elts=[ Name(id='a', ctx=Store()), Name(id='b', ctx=Store())], ctx=Store())], value=Name(id='c', ctx=Load()))], type_ignores=[]) class ast.AnnAssign(target, annotation, value, simple) An assignment with a type annotation. "target" is a single node and can be a "Name", a "Attribute" or a "Subscript". "annotation" is the annotation, such as a "Constant" or "Name" node. "value" is a single optional node. "simple" is a boolean integer set to True for a "Name" node in "target" that do not appear in between parenthesis and are hence pure names and not expressions. >>> print(ast.dump(ast.parse('c: int'), indent=4)) Module( body=[ AnnAssign( target=Name(id='c', ctx=Store()), annotation=Name(id='int', ctx=Load()), simple=1)], type_ignores=[]) >>> print(ast.dump(ast.parse('(a): int = 1'), indent=4)) # Annotation with parenthesis Module( body=[ AnnAssign( target=Name(id='a', ctx=Store()), annotation=Name(id='int', ctx=Load()), value=Constant(value=1), simple=0)], type_ignores=[]) >>> print(ast.dump(ast.parse('a.b: int'), indent=4)) # Attribute annotation Module( body=[ AnnAssign( target=Attribute( value=Name(id='a', ctx=Load()), attr='b', ctx=Store()), annotation=Name(id='int', ctx=Load()), simple=0)], type_ignores=[]) >>> print(ast.dump(ast.parse('a[1]: int'), indent=4)) # Subscript annotation Module( body=[ AnnAssign( target=Subscript( value=Name(id='a', ctx=Load()), slice=Constant(value=1), ctx=Store()), annotation=Name(id='int', ctx=Load()), simple=0)], type_ignores=[]) class ast.AugAssign(target, op, value) Augmented assignment, such as "a += 1". In the following example, "target" is a "Name" node for "x" (with the "Store" context), "op" is "Add", and "value" is a "Constant" with value for 1. The "target" attribute connot be of class "Tuple" or "List", unlike the targets of "Assign". >>> print(ast.dump(ast.parse('x += 2'), indent=4)) Module( body=[ AugAssign( target=Name(id='x', ctx=Store()), op=Add(), value=Constant(value=2))], type_ignores=[]) class ast.Raise(exc, cause) A "raise" statement. "exc" is the exception object to be raised, normally a "Call" or "Name", or "None" for a standalone "raise". "cause" is the optional part for "y" in "raise x from y". >>> print(ast.dump(ast.parse('raise x from y'), indent=4)) Module( body=[ Raise( exc=Name(id='x', ctx=Load()), cause=Name(id='y', ctx=Load()))], type_ignores=[]) class ast.Assert(test, msg) An assertion. "test" holds the condition, such as a "Compare" node. "msg" holds the failure message. >>> print(ast.dump(ast.parse('assert x,y'), indent=4)) Module( body=[ Assert( test=Name(id='x', ctx=Load()), msg=Name(id='y', ctx=Load()))], type_ignores=[]) class ast.Delete(targets) Represents a "del" statement. "targets" is a list of nodes, such as "Name", "Attribute" or "Subscript" nodes. >>> print(ast.dump(ast.parse('del x,y,z'), indent=4)) Module( body=[ Delete( targets=[ Name(id='x', ctx=Del()), Name(id='y', ctx=Del()), Name(id='z', ctx=Del())])], type_ignores=[]) class ast.Pass A "pass" statement. >>> print(ast.dump(ast.parse('pass'), indent=4)) Module( body=[ Pass()], type_ignores=[]) Other statements which are only applicable inside functions or loops are described in other sections. Imports ~~~~~~~ class ast.Import(names) An import statement. "names" is a list of "alias" nodes. >>> print(ast.dump(ast.parse('import x,y,z'), indent=4)) Module( body=[ Import( names=[ alias(name='x'), alias(name='y'), alias(name='z')])], type_ignores=[]) class ast.ImportFrom(module, names, level) Represents "from x import y". "module" is a raw string of the ‘from’ name, without any leading dots, or "None" for statements such as "from . import foo". "level" is an integer holding the level of the relative import (0 means absolute import). >>> print(ast.dump(ast.parse('from y import x,y,z'), indent=4)) Module( body=[ ImportFrom( module='y', names=[ alias(name='x'), alias(name='y'), alias(name='z')], level=0)], type_ignores=[]) class ast.alias(name, asname) Both parameters are raw strings of the names. "asname" can be "None" if the regular name is to be used. >>> print(ast.dump(ast.parse('from ..foo.bar import a as b, c'), indent=4)) Module( body=[ ImportFrom( module='foo.bar', names=[ alias(name='a', asname='b'), alias(name='c')], level=2)], type_ignores=[]) Control flow ------------ Note: Optional clauses such as "else" are stored as an empty list if they’re not present. class ast.If(test, body, orelse) An "if" statement. "test" holds a single node, such as a "Compare" node. "body" and "orelse" each hold a list of nodes. "elif" clauses don’t have a special representation in the AST, but rather appear as extra "If" nodes within the "orelse" section of the previous one. >>> print(ast.dump(ast.parse(""" ... if x: ... ... ... elif y: ... ... ... else: ... ... ... """), indent=4)) Module( body=[ If( test=Name(id='x', ctx=Load()), body=[ Expr( value=Constant(value=Ellipsis))], orelse=[ If( test=Name(id='y', ctx=Load()), body=[ Expr( value=Constant(value=Ellipsis))], orelse=[ Expr( value=Constant(value=Ellipsis))])])], type_ignores=[]) class ast.For(target, iter, body, orelse, type_comment) A "for" loop. "target" holds the variable(s) the loop assigns to, as a single "Name", "Tuple" or "List" node. "iter" holds the item to be looped over, again as a single node. "body" and "orelse" contain lists of nodes to execute. Those in "orelse" are executed if the loop finishes normally, rather than via a "break" statement. type_comment "type_comment" is an optional string with the type annotation as a comment. >>> print(ast.dump(ast.parse(""" ... for x in y: ... ... ... else: ... ... ... """), indent=4)) Module( body=[ For( target=Name(id='x', ctx=Store()), iter=Name(id='y', ctx=Load()), body=[ Expr( value=Constant(value=Ellipsis))], orelse=[ Expr( value=Constant(value=Ellipsis))])], type_ignores=[]) class ast.While(test, body, orelse) A "while" loop. "test" holds the condition, such as a "Compare" node. >> print(ast.dump(ast.parse(""" ... while x: ... ... ... else: ... ... ... """), indent=4)) Module( body=[ While( test=Name(id='x', ctx=Load()), body=[ Expr( value=Constant(value=Ellipsis))], orelse=[ Expr( value=Constant(value=Ellipsis))])], type_ignores=[]) class ast.Break class ast.Continue The "break" and "continue" statements. >>> print(ast.dump(ast.parse("""\ ... for a in b: ... if a > 5: ... break ... else: ... continue ... ... """), indent=4)) Module( body=[ For( target=Name(id='a', ctx=Store()), iter=Name(id='b', ctx=Load()), body=[ If( test=Compare( left=Name(id='a', ctx=Load()), ops=[ Gt()], comparators=[ Constant(value=5)]), body=[ Break()], orelse=[ Continue()])], orelse=[])], type_ignores=[]) class ast.Try(body, handlers, orelse, finalbody) "try" blocks. All attributes are list of nodes to execute, except for "handlers", which is a list of "ExceptHandler" nodes. >>> print(ast.dump(ast.parse(""" ... try: ... ... ... except Exception: ... ... ... except OtherException as e: ... ... ... else: ... ... ... finally: ... ... ... """), indent=4)) Module( body=[ Try( body=[ Expr( value=Constant(value=Ellipsis))], handlers=[ ExceptHandler( type=Name(id='Exception', ctx=Load()), body=[ Expr( value=Constant(value=Ellipsis))]), ExceptHandler( type=Name(id='OtherException', ctx=Load()), name='e', body=[ Expr( value=Constant(value=Ellipsis))])], orelse=[ Expr( value=Constant(value=Ellipsis))], finalbody=[ Expr( value=Constant(value=Ellipsis))])], type_ignores=[]) class ast.ExceptHandler(type, name, body) A single "except" clause. "type" is the exception type it will match, typically a "Name" node (or "None" for a catch-all "except:" clause). "name" is a raw string for the name to hold the exception, or "None" if the clause doesn’t have "as foo". "body" is a list of nodes. >>> print(ast.dump(ast.parse("""\ ... try: ... a + 1 ... except TypeError: ... pass ... """), indent=4)) Module( body=[ Try( body=[ Expr( value=BinOp( left=Name(id='a', ctx=Load()), op=Add(), right=Constant(value=1)))], handlers=[ ExceptHandler( type=Name(id='TypeError', ctx=Load()), body=[ Pass()])], orelse=[], finalbody=[])], type_ignores=[]) class ast.With(items, body, type_comment) A "with" block. "items" is a list of "withitem" nodes representing the context managers, and "body" is the indented block inside the context. type_comment "type_comment" is an optional string with the type annotation as a comment. class ast.withitem(context_expr, optional_vars) A single context manager in a "with" block. "context_expr" is the context manager, often a "Call" node. "optional_vars" is a "Name", "Tuple" or "List" for the "as foo" part, or "None" if that isn’t used. >>> print(ast.dump(ast.parse("""\ ... with a as b, c as d: ... something(b, d) ... """), indent=4)) Module( body=[ With( items=[ withitem( context_expr=Name(id='a', ctx=Load()), optional_vars=Name(id='b', ctx=Store())), withitem( context_expr=Name(id='c', ctx=Load()), optional_vars=Name(id='d', ctx=Store()))], body=[ Expr( value=Call( func=Name(id='something', ctx=Load()), args=[ Name(id='b', ctx=Load()), Name(id='d', ctx=Load())], keywords=[]))])], type_ignores=[]) Function and class definitions ------------------------------ class ast.FunctionDef(name, args, body, decorator_list, returns, type_comment) A function definition. * "name" is a raw string of the function name. * "args" is an "arguments" node. * "body" is the list of nodes inside the function. * "decorator_list" is the list of decorators to be applied, stored outermost first (i.e. the first in the list will be applied last). * "returns" is the return annotation. type_comment "type_comment" is an optional string with the type annotation as a comment. class ast.Lambda(args, body) "lambda" is a minimal function definition that can be used inside an expression. Unlike "FunctionDef", "body" holds a single node. >>> print(ast.dump(ast.parse('lambda x,y: ...'), indent=4)) Module( body=[ Expr( value=Lambda( args=arguments( posonlyargs=[], args=[ arg(arg='x'), arg(arg='y')], kwonlyargs=[], kw_defaults=[], defaults=[]), body=Constant(value=Ellipsis)))], type_ignores=[]) class ast.arguments(posonlyargs, args, vararg, kwonlyargs, kw_defaults, kwarg, defaults) The arguments for a function. * "posonlyargs", "args" and "kwonlyargs" are lists of "arg" nodes. * "vararg" and "kwarg" are single "arg" nodes, referring to the "*args, **kwargs" parameters. * "kw_defaults" is a list of default values for keyword-only arguments. If one is "None", the corresponding argument is required. * "defaults" is a list of default values for arguments that can be passed positionally. If there are fewer defaults, they correspond to the last n arguments. class ast.arg(arg, annotation, type_comment) A single argument in a list. "arg" is a raw string of the argument name, "annotation" is its annotation, such as a "Str" or "Name" node. type_comment "type_comment" is an optional string with the type annotation as a comment >>> print(ast.dump(ast.parse("""\ ... @decorator1 ... @decorator2 ... def f(a: 'annotation', b=1, c=2, *d, e, f=3, **g) -> 'return annotation': ... pass ... """), indent=4)) Module( body=[ FunctionDef( name='f', args=arguments( posonlyargs=[], args=[ arg( arg='a', annotation=Constant(value='annotation')), arg(arg='b'), arg(arg='c')], vararg=arg(arg='d'), kwonlyargs=[ arg(arg='e'), arg(arg='f')], kw_defaults=[ None, Constant(value=3)], kwarg=arg(arg='g'), defaults=[ Constant(value=1), Constant(value=2)]), body=[ Pass()], decorator_list=[ Name(id='decorator1', ctx=Load()), Name(id='decorator2', ctx=Load())], returns=Constant(value='return annotation'))], type_ignores=[]) class ast.Return(value) A "return" statement. >>> print(ast.dump(ast.parse('return 4'), indent=4)) Module( body=[ Return( value=Constant(value=4))], type_ignores=[]) class ast.Yield(value) class ast.YieldFrom(value) A "yield" or "yield from" expression. Because these are expressions, they must be wrapped in a "Expr" node if the value sent back is not used. >>> print(ast.dump(ast.parse('yield x'), indent=4)) Module( body=[ Expr( value=Yield( value=Name(id='x', ctx=Load())))], type_ignores=[]) >>> print(ast.dump(ast.parse('yield from x'), indent=4)) Module( body=[ Expr( value=YieldFrom( value=Name(id='x', ctx=Load())))], type_ignores=[]) class ast.Global(names) class ast.Nonlocal(names) "global" and "nonlocal" statements. "names" is a list of raw strings. >>> print(ast.dump(ast.parse('global x,y,z'), indent=4)) Module( body=[ Global( names=[ 'x', 'y', 'z'])], type_ignores=[]) >>> print(ast.dump(ast.parse('nonlocal x,y,z'), indent=4)) Module( body=[ Nonlocal( names=[ 'x', 'y', 'z'])], type_ignores=[]) class ast.ClassDef(name, bases, keywords, starargs, kwargs, body, decorator_list) A class definition. * "name" is a raw string for the class name * "bases" is a list of nodes for explicitly specified base classes. * "keywords" is a list of "keyword" nodes, principally for ‘metaclass’. Other keywords will be passed to the metaclass, as per PEP-3115. * "starargs" and "kwargs" are each a single node, as in a function call. starargs will be expanded to join the list of base classes, and kwargs will be passed to the metaclass. * "body" is a list of nodes representing the code within the class definition. * "decorator_list" is a list of nodes, as in "FunctionDef". >>> print(ast.dump(ast.parse("""\ ... @decorator1 ... @decorator2 ... class Foo(base1, base2, metaclass=meta): ... pass ... """), indent=4)) Module( body=[ ClassDef( name='Foo', bases=[ Name(id='base1', ctx=Load()), Name(id='base2', ctx=Load())], keywords=[ keyword( arg='metaclass', value=Name(id='meta', ctx=Load()))], body=[ Pass()], decorator_list=[ Name(id='decorator1', ctx=Load()), Name(id='decorator2', ctx=Load())])], type_ignores=[]) Async and await --------------- class ast.AsyncFunctionDef(name, args, body, decorator_list, returns, type_comment) An "async def" function definition. Has the same fields as "FunctionDef". class ast.Await(value) An "await" expression. "value" is what it waits for. Only valid in the body of an "AsyncFunctionDef". >>> print(ast.dump(ast.parse("""\ ... async def f(): ... await other_func() ... """), indent=4)) Module( body=[ AsyncFunctionDef( name='f', args=arguments( posonlyargs=[], args=[], kwonlyargs=[], kw_defaults=[], defaults=[]), body=[ Expr( value=Await( value=Call( func=Name(id='other_func', ctx=Load()), args=[], keywords=[])))], decorator_list=[])], type_ignores=[]) class ast.AsyncFor(target, iter, body, orelse, type_comment) class ast.AsyncWith(items, body, type_comment) "async for" loops and "async with" context managers. They have the same fields as "For" and "With", respectively. Only valid in the body of an "AsyncFunctionDef". Note: When a string is parsed by "ast.parse()", operator nodes (subclasses of "ast.operator", "ast.unaryop", "ast.cmpop", "ast.boolop" and "ast.expr_context") on the returned tree will be singletons. Changes to one will be reflected in all other occurrences of the same value (e.g. "ast.Add"). "ast" Helpers ============= Apart from the node classes, the "ast" module defines these utility functions and classes for traversing abstract syntax trees: ast.parse(source, filename='', mode='exec', *, type_comments=False, feature_version=None) Parse the source into an AST node. Equivalent to "compile(source, filename, mode, ast.PyCF_ONLY_AST)". If "type_comments=True" is given, the parser is modified to check and return type comments as specified by **PEP 484** and **PEP 526**. This is equivalent to adding "ast.PyCF_TYPE_COMMENTS" to the flags passed to "compile()". This will report syntax errors for misplaced type comments. Without this flag, type comments will be ignored, and the "type_comment" field on selected AST nodes will always be "None". In addition, the locations of "# type: ignore" comments will be returned as the "type_ignores" attribute of "Module" (otherwise it is always an empty list). In addition, if "mode" is "'func_type'", the input syntax is modified to correspond to **PEP 484** “signature type comments”, e.g. "(str, int) -> List[str]". Also, setting "feature_version" to a tuple "(major, minor)" will attempt to parse using that Python version’s grammar. Currently "major" must equal to "3". For example, setting "feature_version=(3, 4)" will allow the use of "async" and "await" as variable names. The lowest supported version is "(3, 4)"; the highest is "sys.version_info[0:2]". If source contains a null character (‘0’), "ValueError" is raised. Warning: Note that successfully parsing source code into an AST object doesn’t guarantee that the source code provided is valid Python code that can be executed as the compilation step can raise further "SyntaxError" exceptions. For instance, the source "return 42" generates a valid AST node for a return statement, but it cannot be compiled alone (it needs to be inside a function node).In particular, "ast.parse()" won’t do any scoping checks, which the compilation step does. Warning: It is possible to crash the Python interpreter with a sufficiently large/complex string due to stack depth limitations in Python’s AST compiler. Changed in version 3.8: Added "type_comments", "mode='func_type'" and "feature_version". ast.unparse(ast_obj) Unparse an "ast.AST" object and generate a string with code that would produce an equivalent "ast.AST" object if parsed back with "ast.parse()". Warning: The produced code string will not necessarily be equal to the original code that generated the "ast.AST" object (without any compiler optimizations, such as constant tuples/frozensets). Warning: Trying to unparse a highly complex expression would result with "RecursionError". New in version 3.9. ast.literal_eval(node_or_string) Evaluate an expression node or a string containing only a Python literal or container display. The string or node provided may only consist of the following Python literal structures: strings, bytes, numbers, tuples, lists, dicts, sets, booleans, and "None". This can be used for evaluating strings containing Python values without the need to parse the values oneself. It is not capable of evaluating arbitrarily complex expressions, for example involving operators or indexing. This function had been documented as “safe” in the past without defining what that meant. That was misleading. This is specifically designed not to execute Python code, unlike the more general "eval()". There is no namespace, no name lookups, or ability to call out. But it is not free from attack: A relatively small input can lead to memory exhaustion or to C stack exhaustion, crashing the process. There is also the possibility for excessive CPU consumption denial of service on some inputs. Calling it on untrusted data is thus not recommended. Warning: It is possible to crash the Python interpreter due to stack depth limitations in Python’s AST compiler. Changed in version 3.2: Now allows bytes and set literals. Changed in version 3.9: Now supports creating empty sets with "'set()'". ast.get_docstring(node, clean=True) Return the docstring of the given *node* (which must be a "FunctionDef", "AsyncFunctionDef", "ClassDef", or "Module" node), or "None" if it has no docstring. If *clean* is true, clean up the docstring’s indentation with "inspect.cleandoc()". Changed in version 3.5: "AsyncFunctionDef" is now supported. ast.get_source_segment(source, node, *, padded=False) Get source code segment of the *source* that generated *node*. If some location information ("lineno", "end_lineno", "col_offset", or "end_col_offset") is missing, return "None". If *padded* is "True", the first line of a multi-line statement will be padded with spaces to match its original position. New in version 3.8. ast.fix_missing_locations(node) When you compile a node tree with "compile()", the compiler expects "lineno" and "col_offset" attributes for every node that supports them. This is rather tedious to fill in for generated nodes, so this helper adds these attributes recursively where not already set, by setting them to the values of the parent node. It works recursively starting at *node*. ast.increment_lineno(node, n=1) Increment the line number and end line number of each node in the tree starting at *node* by *n*. This is useful to “move code” to a different location in a file. ast.copy_location(new_node, old_node) Copy source location ("lineno", "col_offset", "end_lineno", and "end_col_offset") from *old_node* to *new_node* if possible, and return *new_node*. ast.iter_fields(node) Yield a tuple of "(fieldname, value)" for each field in "node._fields" that is present on *node*. ast.iter_child_nodes(node) Yield all direct child nodes of *node*, that is, all fields that are nodes and all items of fields that are lists of nodes. ast.walk(node) Recursively yield all descendant nodes in the tree starting at *node* (including *node* itself), in no specified order. This is useful if you only want to modify nodes in place and don’t care about the context. class ast.NodeVisitor A node visitor base class that walks the abstract syntax tree and calls a visitor function for every node found. This function may return a value which is forwarded by the "visit()" method. This class is meant to be subclassed, with the subclass adding visitor methods. visit(node) Visit a node. The default implementation calls the method called "self.visit_*classname*" where *classname* is the name of the node class, or "generic_visit()" if that method doesn’t exist. generic_visit(node) This visitor calls "visit()" on all children of the node. Note that child nodes of nodes that have a custom visitor method won’t be visited unless the visitor calls "generic_visit()" or visits them itself. Don’t use the "NodeVisitor" if you want to apply changes to nodes during traversal. For this a special visitor exists ("NodeTransformer") that allows modifications. Deprecated since version 3.8: Methods "visit_Num()", "visit_Str()", "visit_Bytes()", "visit_NameConstant()" and "visit_Ellipsis()" are deprecated now and will not be called in future Python versions. Add the "visit_Constant()" method to handle all constant nodes. class ast.NodeTransformer A "NodeVisitor" subclass that walks the abstract syntax tree and allows modification of nodes. The "NodeTransformer" will walk the AST and use the return value of the visitor methods to replace or remove the old node. If the return value of the visitor method is "None", the node will be removed from its location, otherwise it is replaced with the return value. The return value may be the original node in which case no replacement takes place. Here is an example transformer that rewrites all occurrences of name lookups ("foo") to "data['foo']": class RewriteName(NodeTransformer): def visit_Name(self, node): return Subscript( value=Name(id='data', ctx=Load()), slice=Constant(value=node.id), ctx=node.ctx ) Keep in mind that if the node you’re operating on has child nodes you must either transform the child nodes yourself or call the "generic_visit()" method for the node first. For nodes that were part of a collection of statements (that applies to all statement nodes), the visitor may also return a list of nodes rather than just a single node. If "NodeTransformer" introduces new nodes (that weren’t part of original tree) without giving them location information (such as "lineno"), "fix_missing_locations()" should be called with the new sub-tree to recalculate the location information: tree = ast.parse('foo', mode='eval') new_tree = fix_missing_locations(RewriteName().visit(tree)) Usually you use the transformer like this: node = YourTransformer().visit(node) ast.dump(node, annotate_fields=True, include_attributes=False, *, indent=None) Return a formatted dump of the tree in *node*. This is mainly useful for debugging purposes. If *annotate_fields* is true (by default), the returned string will show the names and the values for fields. If *annotate_fields* is false, the result string will be more compact by omitting unambiguous field names. Attributes such as line numbers and column offsets are not dumped by default. If this is wanted, *include_attributes* can be set to true. If *indent* is a non-negative integer or string, then the tree will be pretty-printed with that indent level. An indent level of 0, negative, or """" will only insert newlines. "None" (the default) selects the single line representation. Using a positive integer indent indents that many spaces per level. If *indent* is a string (such as ""\t""), that string is used to indent each level. Changed in version 3.9: Added the *indent* option. Compiler Flags ============== The following flags may be passed to "compile()" in order to change effects on the compilation of a program: ast.PyCF_ALLOW_TOP_LEVEL_AWAIT Enables support for top-level "await", "async for", "async with" and async comprehensions. New in version 3.8. ast.PyCF_ONLY_AST Generates and returns an abstract syntax tree instead of returning a compiled code object. ast.PyCF_TYPE_COMMENTS Enables support for **PEP 484** and **PEP 526** style type comments ("# type: ", "# type: ignore "). New in version 3.8. Command-Line Usage ================== New in version 3.9. The "ast" module can be executed as a script from the command line. It is as simple as: python -m ast [-m ] [-a] [infile] The following options are accepted: -h, --help Show the help message and exit. -m --mode Specify what kind of code must be compiled, like the *mode* argument in "parse()". --no-type-comments Don’t parse type comments. -a, --include-attributes Include attributes such as line numbers and column offsets. -i --indent Indentation of nodes in AST (number of spaces). If "infile" is specified its contents are parsed to AST and dumped to stdout. Otherwise, the content is read from stdin. See also: Green Tree Snakes, an external documentation resource, has good details on working with Python ASTs. ASTTokens annotates Python ASTs with the positions of tokens and text in the source code that generated them. This is helpful for tools that make source code transformations. leoAst.py unifies the token-based and parse-tree-based views of python programs by inserting two-way links between tokens and ast nodes. LibCST parses code as a Concrete Syntax Tree that looks like an ast tree and keeps all formatting details. It’s useful for building automated refactoring (codemod) applications and linters. Parso is a Python parser that supports error recovery and round-trip parsing for different Python versions (in multiple Python versions). Parso is also able to list multiple syntax errors in your python file. "asynchat" — Asynchronous socket command/response handler ********************************************************* **Source code:** Lib/asynchat.py Deprecated since version 3.6: "asynchat" will be removed in Python 3.12 (see **PEP 594** for details). Please use "asyncio" instead. ====================================================================== Note: This module exists for backwards compatibility only. For new code we recommend using "asyncio". This module builds on the "asyncore" infrastructure, simplifying asynchronous clients and servers and making it easier to handle protocols whose elements are terminated by arbitrary strings, or are of variable length. "asynchat" defines the abstract class "async_chat" that you subclass, providing implementations of the "collect_incoming_data()" and "found_terminator()" methods. It uses the same asynchronous loop as "asyncore", and the two types of channel, "asyncore.dispatcher" and "asynchat.async_chat", can freely be mixed in the channel map. Typically an "asyncore.dispatcher" server channel generates new "asynchat.async_chat" channel objects as it receives incoming connection requests. class asynchat.async_chat This class is an abstract subclass of "asyncore.dispatcher". To make practical use of the code you must subclass "async_chat", providing meaningful "collect_incoming_data()" and "found_terminator()" methods. The "asyncore.dispatcher" methods can be used, although not all make sense in a message/response context. Like "asyncore.dispatcher", "async_chat" defines a set of events that are generated by an analysis of socket conditions after a "select()" call. Once the polling loop has been started the "async_chat" object’s methods are called by the event-processing framework with no action on the part of the programmer. Two class attributes can be modified, to improve performance, or possibly even to conserve memory. ac_in_buffer_size The asynchronous input buffer size (default "4096"). ac_out_buffer_size The asynchronous output buffer size (default "4096"). Unlike "asyncore.dispatcher", "async_chat" allows you to define a FIFO (first-in, first-out) queue of *producers*. A producer need have only one method, "more()", which should return data to be transmitted on the channel. The producer indicates exhaustion (*i.e.* that it contains no more data) by having its "more()" method return the empty bytes object. At this point the "async_chat" object removes the producer from the queue and starts using the next producer, if any. When the producer queue is empty the "handle_write()" method does nothing. You use the channel object’s "set_terminator()" method to describe how to recognize the end of, or an important breakpoint in, an incoming transmission from the remote endpoint. To build a functioning "async_chat" subclass your input methods "collect_incoming_data()" and "found_terminator()" must handle the data that the channel receives asynchronously. The methods are described below. async_chat.close_when_done() Pushes a "None" on to the producer queue. When this producer is popped off the queue it causes the channel to be closed. async_chat.collect_incoming_data(data) Called with *data* holding an arbitrary amount of received data. The default method, which must be overridden, raises a "NotImplementedError" exception. async_chat.discard_buffers() In emergencies this method will discard any data held in the input and/or output buffers and the producer queue. async_chat.found_terminator() Called when the incoming data stream matches the termination condition set by "set_terminator()". The default method, which must be overridden, raises a "NotImplementedError" exception. The buffered input data should be available via an instance attribute. async_chat.get_terminator() Returns the current terminator for the channel. async_chat.push(data) Pushes data on to the channel’s queue to ensure its transmission. This is all you need to do to have the channel write the data out to the network, although it is possible to use your own producers in more complex schemes to implement encryption and chunking, for example. async_chat.push_with_producer(producer) Takes a producer object and adds it to the producer queue associated with the channel. When all currently-pushed producers have been exhausted the channel will consume this producer’s data by calling its "more()" method and send the data to the remote endpoint. async_chat.set_terminator(term) Sets the terminating condition to be recognized on the channel. "term" may be any of three types of value, corresponding to three different ways to handle incoming protocol data. +-------------+-----------------------------------------------+ | term | Description | |=============|===============================================| | *string* | Will call "found_terminator()" when the | | | string is found in the input stream | +-------------+-----------------------------------------------+ | *integer* | Will call "found_terminator()" when the | | | indicated number of characters have been | | | received | +-------------+-----------------------------------------------+ | "None" | The channel continues to collect data forever | +-------------+-----------------------------------------------+ Note that any data following the terminator will be available for reading by the channel after "found_terminator()" is called. asynchat Example ================ The following partial example shows how HTTP requests can be read with "async_chat". A web server might create an "http_request_handler" object for each incoming client connection. Notice that initially the channel terminator is set to match the blank line at the end of the HTTP headers, and a flag indicates that the headers are being read. Once the headers have been read, if the request is of type POST (indicating that further data are present in the input stream) then the "Content-Length:" header is used to set a numeric terminator to read the right amount of data from the channel. The "handle_request()" method is called once all relevant input has been marshalled, after setting the channel terminator to "None" to ensure that any extraneous data sent by the web client are ignored. import asynchat class http_request_handler(asynchat.async_chat): def __init__(self, sock, addr, sessions, log): asynchat.async_chat.__init__(self, sock=sock) self.addr = addr self.sessions = sessions self.ibuffer = [] self.obuffer = b"" self.set_terminator(b"\r\n\r\n") self.reading_headers = True self.handling = False self.cgi_data = None self.log = log def collect_incoming_data(self, data): """Buffer the data""" self.ibuffer.append(data) def found_terminator(self): if self.reading_headers: self.reading_headers = False self.parse_headers(b"".join(self.ibuffer)) self.ibuffer = [] if self.op.upper() == b"POST": clen = self.headers.getheader("content-length") self.set_terminator(int(clen)) else: self.handling = True self.set_terminator(None) self.handle_request() elif not self.handling: self.set_terminator(None) # browsers sometimes over-send self.cgi_data = parse(self.headers, b"".join(self.ibuffer)) self.handling = True self.ibuffer = [] self.handle_request() High-level API Index ******************** This page lists all high-level async/await enabled asyncio APIs. Tasks ===== Utilities to run asyncio programs, create Tasks, and await on multiple things with timeouts. +----------------------------------------------------+----------------------------------------------------+ | "run()" | Create event loop, run a coroutine, close the | | | loop. | +----------------------------------------------------+----------------------------------------------------+ | "create_task()" | Start an asyncio Task. | +----------------------------------------------------+----------------------------------------------------+ | "await" "sleep()" | Sleep for a number of seconds. | +----------------------------------------------------+----------------------------------------------------+ | "await" "gather()" | Schedule and wait for things concurrently. | +----------------------------------------------------+----------------------------------------------------+ | "await" "wait_for()" | Run with a timeout. | +----------------------------------------------------+----------------------------------------------------+ | "await" "shield()" | Shield from cancellation. | +----------------------------------------------------+----------------------------------------------------+ | "await" "wait()" | Monitor for completion. | +----------------------------------------------------+----------------------------------------------------+ | "current_task()" | Return the current Task. | +----------------------------------------------------+----------------------------------------------------+ | "all_tasks()" | Return all tasks for an event loop. | +----------------------------------------------------+----------------------------------------------------+ | "Task" | Task object. | +----------------------------------------------------+----------------------------------------------------+ | "to_thread()" | Asychronously run a function in a separate OS | | | thread. | +----------------------------------------------------+----------------------------------------------------+ | "run_coroutine_threadsafe()" | Schedule a coroutine from another OS thread. | +----------------------------------------------------+----------------------------------------------------+ | "for in" "as_completed()" | Monitor for completion with a "for" loop. | +----------------------------------------------------+----------------------------------------------------+ -[ Examples ]- * Using asyncio.gather() to run things in parallel. * Using asyncio.wait_for() to enforce a timeout. * Cancellation. * Using asyncio.sleep(). * See also the main Tasks documentation page. Queues ====== Queues should be used to distribute work amongst multiple asyncio Tasks, implement connection pools, and pub/sub patterns. +----------------------------------------------------+----------------------------------------------------+ | "Queue" | A FIFO queue. | +----------------------------------------------------+----------------------------------------------------+ | "PriorityQueue" | A priority queue. | +----------------------------------------------------+----------------------------------------------------+ | "LifoQueue" | A LIFO queue. | +----------------------------------------------------+----------------------------------------------------+ -[ Examples ]- * Using asyncio.Queue to distribute workload between several Tasks. * See also the Queues documentation page. Subprocesses ============ Utilities to spawn subprocesses and run shell commands. +----------------------------------------------------+----------------------------------------------------+ | "await" "create_subprocess_exec()" | Create a subprocess. | +----------------------------------------------------+----------------------------------------------------+ | "await" "create_subprocess_shell()" | Run a shell command. | +----------------------------------------------------+----------------------------------------------------+ -[ Examples ]- * Executing a shell command. * See also the subprocess APIs documentation. Streams ======= High-level APIs to work with network IO. +----------------------------------------------------+----------------------------------------------------+ | "await" "open_connection()" | Establish a TCP connection. | +----------------------------------------------------+----------------------------------------------------+ | "await" "open_unix_connection()" | Establish a Unix socket connection. | +----------------------------------------------------+----------------------------------------------------+ | "await" "start_server()" | Start a TCP server. | +----------------------------------------------------+----------------------------------------------------+ | "await" "start_unix_server()" | Start a Unix socket server. | +----------------------------------------------------+----------------------------------------------------+ | "StreamReader" | High-level async/await object to receive network | | | data. | +----------------------------------------------------+----------------------------------------------------+ | "StreamWriter" | High-level async/await object to send network | | | data. | +----------------------------------------------------+----------------------------------------------------+ -[ Examples ]- * Example TCP client. * See also the streams APIs documentation. Synchronization =============== Threading-like synchronization primitives that can be used in Tasks. +----------------------------------------------------+----------------------------------------------------+ | "Lock" | A mutex lock. | +----------------------------------------------------+----------------------------------------------------+ | "Event" | An event object. | +----------------------------------------------------+----------------------------------------------------+ | "Condition" | A condition object. | +----------------------------------------------------+----------------------------------------------------+ | "Semaphore" | A semaphore. | +----------------------------------------------------+----------------------------------------------------+ | "BoundedSemaphore" | A bounded semaphore. | +----------------------------------------------------+----------------------------------------------------+ -[ Examples ]- * Using asyncio.Event. * See also the documentation of asyncio synchronization primitives. Exceptions ========== +----------------------------------------------------+----------------------------------------------------+ | "asyncio.TimeoutError" | Raised on timeout by functions like "wait_for()". | | | Keep in mind that "asyncio.TimeoutError" is | | | **unrelated** to the built-in "TimeoutError" | | | exception. | +----------------------------------------------------+----------------------------------------------------+ | "asyncio.CancelledError" | Raised when a Task is cancelled. See also | | | "Task.cancel()". | +----------------------------------------------------+----------------------------------------------------+ -[ Examples ]- * Handling CancelledError to run code on cancellation request. * See also the full list of asyncio-specific exceptions. Developing with asyncio *********************** Asynchronous programming is different from classic “sequential” programming. This page lists common mistakes and traps and explains how to avoid them. Debug Mode ========== By default asyncio runs in production mode. In order to ease the development asyncio has a *debug mode*. There are several ways to enable asyncio debug mode: * Setting the "PYTHONASYNCIODEBUG" environment variable to "1". * Using the Python Development Mode. * Passing "debug=True" to "asyncio.run()". * Calling "loop.set_debug()". In addition to enabling the debug mode, consider also: * setting the log level of the asyncio logger to "logging.DEBUG", for example the following snippet of code can be run at startup of the application: logging.basicConfig(level=logging.DEBUG) * configuring the "warnings" module to display "ResourceWarning" warnings. One way of doing that is by using the "-W" "default" command line option. When the debug mode is enabled: * asyncio checks for coroutines that were not awaited and logs them; this mitigates the “forgotten await” pitfall. * Many non-threadsafe asyncio APIs (such as "loop.call_soon()" and "loop.call_at()" methods) raise an exception if they are called from a wrong thread. * The execution time of the I/O selector is logged if it takes too long to perform an I/O operation. * Callbacks taking longer than 100ms are logged. The "loop.slow_callback_duration" attribute can be used to set the minimum execution duration in seconds that is considered “slow”. Concurrency and Multithreading ============================== An event loop runs in a thread (typically the main thread) and executes all callbacks and Tasks in its thread. While a Task is running in the event loop, no other Tasks can run in the same thread. When a Task executes an "await" expression, the running Task gets suspended, and the event loop executes the next Task. To schedule a *callback* from another OS thread, the "loop.call_soon_threadsafe()" method should be used. Example: loop.call_soon_threadsafe(callback, *args) Almost all asyncio objects are not thread safe, which is typically not a problem unless there is code that works with them from outside of a Task or a callback. If there’s a need for such code to call a low- level asyncio API, the "loop.call_soon_threadsafe()" method should be used, e.g.: loop.call_soon_threadsafe(fut.cancel) To schedule a coroutine object from a different OS thread, the "run_coroutine_threadsafe()" function should be used. It returns a "concurrent.futures.Future" to access the result: async def coro_func(): return await asyncio.sleep(1, 42) # Later in another OS thread: future = asyncio.run_coroutine_threadsafe(coro_func(), loop) # Wait for the result: result = future.result() To handle signals and to execute subprocesses, the event loop must be run in the main thread. The "loop.run_in_executor()" method can be used with a "concurrent.futures.ThreadPoolExecutor" to execute blocking code in a different OS thread without blocking the OS thread that the event loop runs in. There is currently no way to schedule coroutines or callbacks directly from a different process (such as one started with "multiprocessing"). The Event Loop Methods section lists APIs that can read from pipes and watch file descriptors without blocking the event loop. In addition, asyncio’s Subprocess APIs provide a way to start a process and communicate with it from the event loop. Lastly, the aforementioned "loop.run_in_executor()" method can also be used with a "concurrent.futures.ProcessPoolExecutor" to execute code in a different process. Running Blocking Code ===================== Blocking (CPU-bound) code should not be called directly. For example, if a function performs a CPU-intensive calculation for 1 second, all concurrent asyncio Tasks and IO operations would be delayed by 1 second. An executor can be used to run a task in a different thread or even in a different process to avoid blocking the OS thread with the event loop. See the "loop.run_in_executor()" method for more details. Logging ======= asyncio uses the "logging" module and all logging is performed via the ""asyncio"" logger. The default log level is "logging.INFO", which can be easily adjusted: logging.getLogger("asyncio").setLevel(logging.WARNING) Detect never-awaited coroutines =============================== When a coroutine function is called, but not awaited (e.g. "coro()" instead of "await coro()") or the coroutine is not scheduled with "asyncio.create_task()", asyncio will emit a "RuntimeWarning": import asyncio async def test(): print("never scheduled") async def main(): test() asyncio.run(main()) Output: test.py:7: RuntimeWarning: coroutine 'test' was never awaited test() Output in debug mode: test.py:7: RuntimeWarning: coroutine 'test' was never awaited Coroutine created at (most recent call last) File "../t.py", line 9, in asyncio.run(main(), debug=True) < .. > File "../t.py", line 7, in main test() test() The usual fix is to either await the coroutine or call the "asyncio.create_task()" function: async def main(): await test() Detect never-retrieved exceptions ================================= If a "Future.set_exception()" is called but the Future object is never awaited on, the exception would never be propagated to the user code. In this case, asyncio would emit a log message when the Future object is garbage collected. Example of an unhandled exception: import asyncio async def bug(): raise Exception("not consumed") async def main(): asyncio.create_task(bug()) asyncio.run(main()) Output: Task exception was never retrieved future: exception=Exception('not consumed')> Traceback (most recent call last): File "test.py", line 4, in bug raise Exception("not consumed") Exception: not consumed Enable the debug mode to get the traceback where the task was created: asyncio.run(main(), debug=True) Output in debug mode: Task exception was never retrieved future: exception=Exception('not consumed') created at asyncio/tasks.py:321> source_traceback: Object created at (most recent call last): File "../t.py", line 9, in asyncio.run(main(), debug=True) < .. > Traceback (most recent call last): File "../t.py", line 4, in bug raise Exception("not consumed") Exception: not consumed Event Loop ********** **Source code:** Lib/asyncio/events.py, Lib/asyncio/base_events.py ====================================================================== -[ Preface ]- The event loop is the core of every asyncio application. Event loops run asynchronous tasks and callbacks, perform network IO operations, and run subprocesses. Application developers should typically use the high-level asyncio functions, such as "asyncio.run()", and should rarely need to reference the loop object or call its methods. This section is intended mostly for authors of lower-level code, libraries, and frameworks, who need finer control over the event loop behavior. -[ Obtaining the Event Loop ]- The following low-level functions can be used to get, set, or create an event loop: asyncio.get_running_loop() Return the running event loop in the current OS thread. If there is no running event loop a "RuntimeError" is raised. This function can only be called from a coroutine or a callback. New in version 3.7. asyncio.get_event_loop() Get the current event loop. If there is no current event loop set in the current OS thread, the OS thread is main, and "set_event_loop()" has not yet been called, asyncio will create a new event loop and set it as the current one. Because this function has rather complex behavior (especially when custom event loop policies are in use), using the "get_running_loop()" function is preferred to "get_event_loop()" in coroutines and callbacks. Consider also using the "asyncio.run()" function instead of using lower level functions to manually create and close an event loop. asyncio.set_event_loop(loop) Set *loop* as a current event loop for the current OS thread. asyncio.new_event_loop() Create and return a new event loop object. Note that the behaviour of "get_event_loop()", "set_event_loop()", and "new_event_loop()" functions can be altered by setting a custom event loop policy. -[ Contents ]- This documentation page contains the following sections: * The Event Loop Methods section is the reference documentation of the event loop APIs; * The Callback Handles section documents the "Handle" and "TimerHandle" instances which are returned from scheduling methods such as "loop.call_soon()" and "loop.call_later()"; * The Server Objects section documents types returned from event loop methods like "loop.create_server()"; * The Event Loop Implementations section documents the "SelectorEventLoop" and "ProactorEventLoop" classes; * The Examples section showcases how to work with some event loop APIs. Event Loop Methods ================== Event loops have **low-level** APIs for the following: * Running and stopping the loop * Scheduling callbacks * Scheduling delayed callbacks * Creating Futures and Tasks * Opening network connections * Creating network servers * Transferring files * TLS Upgrade * Watching file descriptors * Working with socket objects directly * DNS * Working with pipes * Unix signals * Executing code in thread or process pools * Error Handling API * Enabling debug mode * Running Subprocesses Running and stopping the loop ----------------------------- loop.run_until_complete(future) Run until the *future* (an instance of "Future") has completed. If the argument is a coroutine object it is implicitly scheduled to run as a "asyncio.Task". Return the Future’s result or raise its exception. loop.run_forever() Run the event loop until "stop()" is called. If "stop()" is called before "run_forever()" is called, the loop will poll the I/O selector once with a timeout of zero, run all callbacks scheduled in response to I/O events (and those that were already scheduled), and then exit. If "stop()" is called while "run_forever()" is running, the loop will run the current batch of callbacks and then exit. Note that new callbacks scheduled by callbacks will not run in this case; instead, they will run the next time "run_forever()" or "run_until_complete()" is called. loop.stop() Stop the event loop. loop.is_running() Return "True" if the event loop is currently running. loop.is_closed() Return "True" if the event loop was closed. loop.close() Close the event loop. The loop must not be running when this function is called. Any pending callbacks will be discarded. This method clears all queues and shuts down the executor, but does not wait for the executor to finish. This method is idempotent and irreversible. No other methods should be called after the event loop is closed. coroutine loop.shutdown_asyncgens() Schedule all currently open *asynchronous generator* objects to close with an "aclose()" call. After calling this method, the event loop will issue a warning if a new asynchronous generator is iterated. This should be used to reliably finalize all scheduled asynchronous generators. Note that there is no need to call this function when "asyncio.run()" is used. Example: try: loop.run_forever() finally: loop.run_until_complete(loop.shutdown_asyncgens()) loop.close() New in version 3.6. coroutine loop.shutdown_default_executor() Schedule the closure of the default executor and wait for it to join all of the threads in the "ThreadPoolExecutor". After calling this method, a "RuntimeError" will be raised if "loop.run_in_executor()" is called while using the default executor. Note that there is no need to call this function when "asyncio.run()" is used. New in version 3.9. Scheduling callbacks -------------------- loop.call_soon(callback, *args, context=None) Schedule the *callback* *callback* to be called with *args* arguments at the next iteration of the event loop. Callbacks are called in the order in which they are registered. Each callback will be called exactly once. An optional keyword-only *context* argument allows specifying a custom "contextvars.Context" for the *callback* to run in. The current context is used when no *context* is provided. An instance of "asyncio.Handle" is returned, which can be used later to cancel the callback. This method is not thread-safe. loop.call_soon_threadsafe(callback, *args, context=None) A thread-safe variant of "call_soon()". Must be used to schedule callbacks *from another thread*. Raises "RuntimeError" if called on a loop that’s been closed. This can happen on a secondary thread when the main application is shutting down. See the concurrency and multithreading section of the documentation. Changed in version 3.7: The *context* keyword-only parameter was added. See **PEP 567** for more details. Note: Most "asyncio" scheduling functions don’t allow passing keyword arguments. To do that, use "functools.partial()": # will schedule "print("Hello", flush=True)" loop.call_soon( functools.partial(print, "Hello", flush=True)) Using partial objects is usually more convenient than using lambdas, as asyncio can render partial objects better in debug and error messages. Scheduling delayed callbacks ---------------------------- Event loop provides mechanisms to schedule callback functions to be called at some point in the future. Event loop uses monotonic clocks to track time. loop.call_later(delay, callback, *args, context=None) Schedule *callback* to be called after the given *delay* number of seconds (can be either an int or a float). An instance of "asyncio.TimerHandle" is returned which can be used to cancel the callback. *callback* will be called exactly once. If two callbacks are scheduled for exactly the same time, the order in which they are called is undefined. The optional positional *args* will be passed to the callback when it is called. If you want the callback to be called with keyword arguments use "functools.partial()". An optional keyword-only *context* argument allows specifying a custom "contextvars.Context" for the *callback* to run in. The current context is used when no *context* is provided. Changed in version 3.7: The *context* keyword-only parameter was added. See **PEP 567** for more details. Changed in version 3.8: In Python 3.7 and earlier with the default event loop implementation, the *delay* could not exceed one day. This has been fixed in Python 3.8. loop.call_at(when, callback, *args, context=None) Schedule *callback* to be called at the given absolute timestamp *when* (an int or a float), using the same time reference as "loop.time()". This method’s behavior is the same as "call_later()". An instance of "asyncio.TimerHandle" is returned which can be used to cancel the callback. Changed in version 3.7: The *context* keyword-only parameter was added. See **PEP 567** for more details. Changed in version 3.8: In Python 3.7 and earlier with the default event loop implementation, the difference between *when* and the current time could not exceed one day. This has been fixed in Python 3.8. loop.time() Return the current time, as a "float" value, according to the event loop’s internal monotonic clock. Note: Changed in version 3.8: In Python 3.7 and earlier timeouts (relative *delay* or absolute *when*) should not exceed one day. This has been fixed in Python 3.8. See also: The "asyncio.sleep()" function. Creating Futures and Tasks -------------------------- loop.create_future() Create an "asyncio.Future" object attached to the event loop. This is the preferred way to create Futures in asyncio. This lets third-party event loops provide alternative implementations of the Future object (with better performance or instrumentation). New in version 3.5.2. loop.create_task(coro, *, name=None) Schedule the execution of a Coroutines. Return a "Task" object. Third-party event loops can use their own subclass of "Task" for interoperability. In this case, the result type is a subclass of "Task". If the *name* argument is provided and not "None", it is set as the name of the task using "Task.set_name()". Changed in version 3.8: Added the "name" parameter. loop.set_task_factory(factory) Set a task factory that will be used by "loop.create_task()". If *factory* is "None" the default task factory will be set. Otherwise, *factory* must be a *callable* with the signature matching "(loop, coro)", where *loop* is a reference to the active event loop, and *coro* is a coroutine object. The callable must return a "asyncio.Future"-compatible object. loop.get_task_factory() Return a task factory or "None" if the default one is in use. Opening network connections --------------------------- coroutine loop.create_connection(protocol_factory, host=None, port=None, *, ssl=None, family=0, proto=0, flags=0, sock=None, local_addr=None, server_hostname=None, ssl_handshake_timeout=None, happy_eyeballs_delay=None, interleave=None) Open a streaming transport connection to a given address specified by *host* and *port*. The socket family can be either "AF_INET" or "AF_INET6" depending on *host* (or the *family* argument, if provided). The socket type will be "SOCK_STREAM". *protocol_factory* must be a callable returning an asyncio protocol implementation. This method will try to establish the connection in the background. When successful, it returns a "(transport, protocol)" pair. The chronological synopsis of the underlying operation is as follows: 1. The connection is established and a transport is created for it. 2. *protocol_factory* is called without arguments and is expected to return a protocol instance. 3. The protocol instance is coupled with the transport by calling its "connection_made()" method. 4. A "(transport, protocol)" tuple is returned on success. The created transport is an implementation-dependent bidirectional stream. Other arguments: * *ssl*: if given and not false, a SSL/TLS transport is created (by default a plain TCP transport is created). If *ssl* is a "ssl.SSLContext" object, this context is used to create the transport; if *ssl* is "True", a default context returned from "ssl.create_default_context()" is used. See also: SSL/TLS security considerations * *server_hostname* sets or overrides the hostname that the target server’s certificate will be matched against. Should only be passed if *ssl* is not "None". By default the value of the *host* argument is used. If *host* is empty, there is no default and you must pass a value for *server_hostname*. If *server_hostname* is an empty string, hostname matching is disabled (which is a serious security risk, allowing for potential man-in-the-middle attacks). * *family*, *proto*, *flags* are the optional address family, protocol and flags to be passed through to getaddrinfo() for *host* resolution. If given, these should all be integers from the corresponding "socket" module constants. * *happy_eyeballs_delay*, if given, enables Happy Eyeballs for this connection. It should be a floating-point number representing the amount of time in seconds to wait for a connection attempt to complete, before starting the next attempt in parallel. This is the “Connection Attempt Delay” as defined in **RFC 8305**. A sensible default value recommended by the RFC is "0.25" (250 milliseconds). * *interleave* controls address reordering when a host name resolves to multiple IP addresses. If "0" or unspecified, no reordering is done, and addresses are tried in the order returned by "getaddrinfo()". If a positive integer is specified, the addresses are interleaved by address family, and the given integer is interpreted as “First Address Family Count” as defined in **RFC 8305**. The default is "0" if *happy_eyeballs_delay* is not specified, and "1" if it is. * *sock*, if given, should be an existing, already connected "socket.socket" object to be used by the transport. If *sock* is given, none of *host*, *port*, *family*, *proto*, *flags*, *happy_eyeballs_delay*, *interleave* and *local_addr* should be specified. * *local_addr*, if given, is a "(local_host, local_port)" tuple used to bind the socket locally. The *local_host* and *local_port* are looked up using "getaddrinfo()", similarly to *host* and *port*. * *ssl_handshake_timeout* is (for a TLS connection) the time in seconds to wait for the TLS handshake to complete before aborting the connection. "60.0" seconds if "None" (default). New in version 3.8: Added the *happy_eyeballs_delay* and *interleave* parameters.Happy Eyeballs Algorithm: Success with Dual-Stack Hosts. When a server’s IPv4 path and protocol are working, but the server’s IPv6 path and protocol are not working, a dual-stack client application experiences significant connection delay compared to an IPv4-only client. This is undesirable because it causes the dual- stack client to have a worse user experience. This document specifies requirements for algorithms that reduce this user-visible delay and provides an algorithm.For more information: https://tools.ietf.org/html/rfc6555 New in version 3.7: The *ssl_handshake_timeout* parameter. Changed in version 3.6: The socket option "TCP_NODELAY" is set by default for all TCP connections. Changed in version 3.5: Added support for SSL/TLS in "ProactorEventLoop". See also: The "open_connection()" function is a high-level alternative API. It returns a pair of ("StreamReader", "StreamWriter") that can be used directly in async/await code. coroutine loop.create_datagram_endpoint(protocol_factory, local_addr=None, remote_addr=None, *, family=0, proto=0, flags=0, reuse_address=None, reuse_port=None, allow_broadcast=None, sock=None) Note: The parameter *reuse_address* is no longer supported, as using "SO_REUSEADDR" poses a significant security concern for UDP. Explicitly passing "reuse_address=True" will raise an exception.When multiple processes with differing UIDs assign sockets to an identical UDP socket address with "SO_REUSEADDR", incoming packets can become randomly distributed among the sockets.For supported platforms, *reuse_port* can be used as a replacement for similar functionality. With *reuse_port*, "SO_REUSEPORT" is used instead, which specifically prevents processes with differing UIDs from assigning sockets to the same socket address. Create a datagram connection. The socket family can be either "AF_INET", "AF_INET6", or "AF_UNIX", depending on *host* (or the *family* argument, if provided). The socket type will be "SOCK_DGRAM". *protocol_factory* must be a callable returning a protocol implementation. A tuple of "(transport, protocol)" is returned on success. Other arguments: * *local_addr*, if given, is a "(local_host, local_port)" tuple used to bind the socket locally. The *local_host* and *local_port* are looked up using "getaddrinfo()". * *remote_addr*, if given, is a "(remote_host, remote_port)" tuple used to connect the socket to a remote address. The *remote_host* and *remote_port* are looked up using "getaddrinfo()". * *family*, *proto*, *flags* are the optional address family, protocol and flags to be passed through to "getaddrinfo()" for *host* resolution. If given, these should all be integers from the corresponding "socket" module constants. * *reuse_port* tells the kernel to allow this endpoint to be bound to the same port as other existing endpoints are bound to, so long as they all set this flag when being created. This option is not supported on Windows and some Unixes. If the "SO_REUSEPORT" constant is not defined then this capability is unsupported. * *allow_broadcast* tells the kernel to allow this endpoint to send messages to the broadcast address. * *sock* can optionally be specified in order to use a preexisting, already connected, "socket.socket" object to be used by the transport. If specified, *local_addr* and *remote_addr* should be omitted (must be "None"). See UDP echo client protocol and UDP echo server protocol examples. Changed in version 3.4.4: The *family*, *proto*, *flags*, *reuse_address*, *reuse_port, *allow_broadcast*, and *sock* parameters were added. Changed in version 3.8.1: The *reuse_address* parameter is no longer supported due to security concerns. Changed in version 3.8: Added support for Windows. coroutine loop.create_unix_connection(protocol_factory, path=None, *, ssl=None, sock=None, server_hostname=None, ssl_handshake_timeout=None) Create a Unix connection. The socket family will be "AF_UNIX"; socket type will be "SOCK_STREAM". A tuple of "(transport, protocol)" is returned on success. *path* is the name of a Unix domain socket and is required, unless a *sock* parameter is specified. Abstract Unix sockets, "str", "bytes", and "Path" paths are supported. See the documentation of the "loop.create_connection()" method for information about arguments to this method. Availability: Unix. New in version 3.7: The *ssl_handshake_timeout* parameter. Changed in version 3.7: The *path* parameter can now be a *path- like object*. Creating network servers ------------------------ coroutine loop.create_server(protocol_factory, host=None, port=None, *, family=socket.AF_UNSPEC, flags=socket.AI_PASSIVE, sock=None, backlog=100, ssl=None, reuse_address=None, reuse_port=None, ssl_handshake_timeout=None, start_serving=True) Create a TCP server (socket type "SOCK_STREAM") listening on *port* of the *host* address. Returns a "Server" object. Arguments: * *protocol_factory* must be a callable returning a protocol implementation. * The *host* parameter can be set to several types which determine where the server would be listening: * If *host* is a string, the TCP server is bound to a single network interface specified by *host*. * If *host* is a sequence of strings, the TCP server is bound to all network interfaces specified by the sequence. * If *host* is an empty string or "None", all interfaces are assumed and a list of multiple sockets will be returned (most likely one for IPv4 and another one for IPv6). * The *port* parameter can be set to specify which port the server should listen on. If "0" or "None" (the default), a random unused port will be selected (note that if *host* resolves to multiple network interfaces, a different random port will be selected for each interface). * *family* can be set to either "socket.AF_INET" or "AF_INET6" to force the socket to use IPv4 or IPv6. If not set, the *family* will be determined from host name (defaults to "AF_UNSPEC"). * *flags* is a bitmask for "getaddrinfo()". * *sock* can optionally be specified in order to use a preexisting socket object. If specified, *host* and *port* must not be specified. * *backlog* is the maximum number of queued connections passed to "listen()" (defaults to 100). * *ssl* can be set to an "SSLContext" instance to enable TLS over the accepted connections. * *reuse_address* tells the kernel to reuse a local socket in "TIME_WAIT" state, without waiting for its natural timeout to expire. If not specified will automatically be set to "True" on Unix. * *reuse_port* tells the kernel to allow this endpoint to be bound to the same port as other existing endpoints are bound to, so long as they all set this flag when being created. This option is not supported on Windows. * *ssl_handshake_timeout* is (for a TLS server) the time in seconds to wait for the TLS handshake to complete before aborting the connection. "60.0" seconds if "None" (default). * *start_serving* set to "True" (the default) causes the created server to start accepting connections immediately. When set to "False", the user should await on "Server.start_serving()" or "Server.serve_forever()" to make the server to start accepting connections. New in version 3.7: Added *ssl_handshake_timeout* and *start_serving* parameters. Changed in version 3.6: The socket option "TCP_NODELAY" is set by default for all TCP connections. Changed in version 3.5: Added support for SSL/TLS in "ProactorEventLoop". Changed in version 3.5.1: The *host* parameter can be a sequence of strings. See also: The "start_server()" function is a higher-level alternative API that returns a pair of "StreamReader" and "StreamWriter" that can be used in an async/await code. coroutine loop.create_unix_server(protocol_factory, path=None, *, sock=None, backlog=100, ssl=None, ssl_handshake_timeout=None, start_serving=True) Similar to "loop.create_server()" but works with the "AF_UNIX" socket family. *path* is the name of a Unix domain socket, and is required, unless a *sock* argument is provided. Abstract Unix sockets, "str", "bytes", and "Path" paths are supported. See the documentation of the "loop.create_server()" method for information about arguments to this method. Availability: Unix. New in version 3.7: The *ssl_handshake_timeout* and *start_serving* parameters. Changed in version 3.7: The *path* parameter can now be a "Path" object. coroutine loop.connect_accepted_socket(protocol_factory, sock, *, ssl=None, ssl_handshake_timeout=None) Wrap an already accepted connection into a transport/protocol pair. This method can be used by servers that accept connections outside of asyncio but that use asyncio to handle them. Parameters: * *protocol_factory* must be a callable returning a protocol implementation. * *sock* is a preexisting socket object returned from "socket.accept". * *ssl* can be set to an "SSLContext" to enable SSL over the accepted connections. * *ssl_handshake_timeout* is (for an SSL connection) the time in seconds to wait for the SSL handshake to complete before aborting the connection. "60.0" seconds if "None" (default). Returns a "(transport, protocol)" pair. New in version 3.7: The *ssl_handshake_timeout* parameter. New in version 3.5.3. Transferring files ------------------ coroutine loop.sendfile(transport, file, offset=0, count=None, *, fallback=True) Send a *file* over a *transport*. Return the total number of bytes sent. The method uses high-performance "os.sendfile()" if available. *file* must be a regular file object opened in binary mode. *offset* tells from where to start reading the file. If specified, *count* is the total number of bytes to transmit as opposed to sending the file until EOF is reached. File position is always updated, even when this method raises an error, and "file.tell()" can be used to obtain the actual number of bytes sent. *fallback* set to "True" makes asyncio to manually read and send the file when the platform does not support the sendfile system call (e.g. Windows or SSL socket on Unix). Raise "SendfileNotAvailableError" if the system does not support the *sendfile* syscall and *fallback* is "False". New in version 3.7. TLS Upgrade ----------- coroutine loop.start_tls(transport, protocol, sslcontext, *, server_side=False, server_hostname=None, ssl_handshake_timeout=None) Upgrade an existing transport-based connection to TLS. Return a new transport instance, that the *protocol* must start using immediately after the *await*. The *transport* instance passed to the *start_tls* method should never be used again. Parameters: * *transport* and *protocol* instances that methods like "create_server()" and "create_connection()" return. * *sslcontext*: a configured instance of "SSLContext". * *server_side* pass "True" when a server-side connection is being upgraded (like the one created by "create_server()"). * *server_hostname*: sets or overrides the host name that the target server’s certificate will be matched against. * *ssl_handshake_timeout* is (for a TLS connection) the time in seconds to wait for the TLS handshake to complete before aborting the connection. "60.0" seconds if "None" (default). New in version 3.7. Watching file descriptors ------------------------- loop.add_reader(fd, callback, *args) Start monitoring the *fd* file descriptor for read availability and invoke *callback* with the specified arguments once *fd* is available for reading. loop.remove_reader(fd) Stop monitoring the *fd* file descriptor for read availability. loop.add_writer(fd, callback, *args) Start monitoring the *fd* file descriptor for write availability and invoke *callback* with the specified arguments once *fd* is available for writing. Use "functools.partial()" to pass keyword arguments to *callback*. loop.remove_writer(fd) Stop monitoring the *fd* file descriptor for write availability. See also Platform Support section for some limitations of these methods. Working with socket objects directly ------------------------------------ In general, protocol implementations that use transport-based APIs such as "loop.create_connection()" and "loop.create_server()" are faster than implementations that work with sockets directly. However, there are some use cases when performance is not critical, and working with "socket" objects directly is more convenient. coroutine loop.sock_recv(sock, nbytes) Receive up to *nbytes* from *sock*. Asynchronous version of "socket.recv()". Return the received data as a bytes object. *sock* must be a non-blocking socket. Changed in version 3.7: Even though this method was always documented as a coroutine method, releases before Python 3.7 returned a "Future". Since Python 3.7 this is an "async def" method. coroutine loop.sock_recv_into(sock, buf) Receive data from *sock* into the *buf* buffer. Modeled after the blocking "socket.recv_into()" method. Return the number of bytes written to the buffer. *sock* must be a non-blocking socket. New in version 3.7. coroutine loop.sock_sendall(sock, data) Send *data* to the *sock* socket. Asynchronous version of "socket.sendall()". This method continues to send to the socket until either all data in *data* has been sent or an error occurs. "None" is returned on success. On error, an exception is raised. Additionally, there is no way to determine how much data, if any, was successfully processed by the receiving end of the connection. *sock* must be a non-blocking socket. Changed in version 3.7: Even though the method was always documented as a coroutine method, before Python 3.7 it returned an "Future". Since Python 3.7, this is an "async def" method. coroutine loop.sock_connect(sock, address) Connect *sock* to a remote socket at *address*. Asynchronous version of "socket.connect()". *sock* must be a non-blocking socket. Changed in version 3.5.2: "address" no longer needs to be resolved. "sock_connect" will try to check if the *address* is already resolved by calling "socket.inet_pton()". If not, "loop.getaddrinfo()" will be used to resolve the *address*. See also: "loop.create_connection()" and "asyncio.open_connection()". coroutine loop.sock_accept(sock) Accept a connection. Modeled after the blocking "socket.accept()" method. The socket must be bound to an address and listening for connections. The return value is a pair "(conn, address)" where *conn* is a *new* socket object usable to send and receive data on the connection, and *address* is the address bound to the socket on the other end of the connection. *sock* must be a non-blocking socket. Changed in version 3.7: Even though the method was always documented as a coroutine method, before Python 3.7 it returned a "Future". Since Python 3.7, this is an "async def" method. See also: "loop.create_server()" and "start_server()". coroutine loop.sock_sendfile(sock, file, offset=0, count=None, *, fallback=True) Send a file using high-performance "os.sendfile" if possible. Return the total number of bytes sent. Asynchronous version of "socket.sendfile()". *sock* must be a non-blocking "socket.SOCK_STREAM" "socket". *file* must be a regular file object open in binary mode. *offset* tells from where to start reading the file. If specified, *count* is the total number of bytes to transmit as opposed to sending the file until EOF is reached. File position is always updated, even when this method raises an error, and "file.tell()" can be used to obtain the actual number of bytes sent. *fallback*, when set to "True", makes asyncio manually read and send the file when the platform does not support the sendfile syscall (e.g. Windows or SSL socket on Unix). Raise "SendfileNotAvailableError" if the system does not support *sendfile* syscall and *fallback* is "False". *sock* must be a non-blocking socket. New in version 3.7. DNS --- coroutine loop.getaddrinfo(host, port, *, family=0, type=0, proto=0, flags=0) Asynchronous version of "socket.getaddrinfo()". coroutine loop.getnameinfo(sockaddr, flags=0) Asynchronous version of "socket.getnameinfo()". Changed in version 3.7: Both *getaddrinfo* and *getnameinfo* methods were always documented to return a coroutine, but prior to Python 3.7 they were, in fact, returning "asyncio.Future" objects. Starting with Python 3.7 both methods are coroutines. Working with pipes ------------------ coroutine loop.connect_read_pipe(protocol_factory, pipe) Register the read end of *pipe* in the event loop. *protocol_factory* must be a callable returning an asyncio protocol implementation. *pipe* is a *file-like object*. Return pair "(transport, protocol)", where *transport* supports the "ReadTransport" interface and *protocol* is an object instantiated by the *protocol_factory*. With "SelectorEventLoop" event loop, the *pipe* is set to non- blocking mode. coroutine loop.connect_write_pipe(protocol_factory, pipe) Register the write end of *pipe* in the event loop. *protocol_factory* must be a callable returning an asyncio protocol implementation. *pipe* is *file-like object*. Return pair "(transport, protocol)", where *transport* supports "WriteTransport" interface and *protocol* is an object instantiated by the *protocol_factory*. With "SelectorEventLoop" event loop, the *pipe* is set to non- blocking mode. Note: "SelectorEventLoop" does not support the above methods on Windows. Use "ProactorEventLoop" instead for Windows. See also: The "loop.subprocess_exec()" and "loop.subprocess_shell()" methods. Unix signals ------------ loop.add_signal_handler(signum, callback, *args) Set *callback* as the handler for the *signum* signal. The callback will be invoked by *loop*, along with other queued callbacks and runnable coroutines of that event loop. Unlike signal handlers registered using "signal.signal()", a callback registered with this function is allowed to interact with the event loop. Raise "ValueError" if the signal number is invalid or uncatchable. Raise "RuntimeError" if there is a problem setting up the handler. Use "functools.partial()" to pass keyword arguments to *callback*. Like "signal.signal()", this function must be invoked in the main thread. loop.remove_signal_handler(sig) Remove the handler for the *sig* signal. Return "True" if the signal handler was removed, or "False" if no handler was set for the given signal. Availability: Unix. See also: The "signal" module. Executing code in thread or process pools ----------------------------------------- awaitable loop.run_in_executor(executor, func, *args) Arrange for *func* to be called in the specified executor. The *executor* argument should be an "concurrent.futures.Executor" instance. The default executor is used if *executor* is "None". Example: import asyncio import concurrent.futures def blocking_io(): # File operations (such as logging) can block the # event loop: run them in a thread pool. with open('/dev/urandom', 'rb') as f: return f.read(100) def cpu_bound(): # CPU-bound operations will block the event loop: # in general it is preferable to run them in a # process pool. return sum(i * i for i in range(10 ** 7)) async def main(): loop = asyncio.get_running_loop() ## Options: # 1. Run in the default loop's executor: result = await loop.run_in_executor( None, blocking_io) print('default thread pool', result) # 2. Run in a custom thread pool: with concurrent.futures.ThreadPoolExecutor() as pool: result = await loop.run_in_executor( pool, blocking_io) print('custom thread pool', result) # 3. Run in a custom process pool: with concurrent.futures.ProcessPoolExecutor() as pool: result = await loop.run_in_executor( pool, cpu_bound) print('custom process pool', result) asyncio.run(main()) This method returns a "asyncio.Future" object. Use "functools.partial()" to pass keyword arguments to *func*. Changed in version 3.5.3: "loop.run_in_executor()" no longer configures the "max_workers" of the thread pool executor it creates, instead leaving it up to the thread pool executor ("ThreadPoolExecutor") to set the default. loop.set_default_executor(executor) Set *executor* as the default executor used by "run_in_executor()". *executor* should be an instance of "ThreadPoolExecutor". Deprecated since version 3.8: Using an executor that is not an instance of "ThreadPoolExecutor" is deprecated and will trigger an error in Python 3.9. *executor* must be an instance of "concurrent.futures.ThreadPoolExecutor". Error Handling API ------------------ Allows customizing how exceptions are handled in the event loop. loop.set_exception_handler(handler) Set *handler* as the new event loop exception handler. If *handler* is "None", the default exception handler will be set. Otherwise, *handler* must be a callable with the signature matching "(loop, context)", where "loop" is a reference to the active event loop, and "context" is a "dict" object containing the details of the exception (see "call_exception_handler()" documentation for details about context). loop.get_exception_handler() Return the current exception handler, or "None" if no custom exception handler was set. New in version 3.5.2. loop.default_exception_handler(context) Default exception handler. This is called when an exception occurs and no exception handler is set. This can be called by a custom exception handler that wants to defer to the default handler behavior. *context* parameter has the same meaning as in "call_exception_handler()". loop.call_exception_handler(context) Call the current event loop exception handler. *context* is a "dict" object containing the following keys (new keys may be introduced in future Python versions): * ‘message’: Error message; * ‘exception’ (optional): Exception object; * ‘future’ (optional): "asyncio.Future" instance; * ‘task’ (optional): "asyncio.Task" instance; * ‘handle’ (optional): "asyncio.Handle" instance; * ‘protocol’ (optional): Protocol instance; * ‘transport’ (optional): Transport instance; * ‘socket’ (optional): "socket.socket" instance; * ‘asyncgen’ (optional): Asynchronous generator that caused the exception. Note: This method should not be overloaded in subclassed event loops. For custom exception handling, use the "set_exception_handler()" method. Enabling debug mode ------------------- loop.get_debug() Get the debug mode ("bool") of the event loop. The default value is "True" if the environment variable "PYTHONASYNCIODEBUG" is set to a non-empty string, "False" otherwise. loop.set_debug(enabled: bool) Set the debug mode of the event loop. Changed in version 3.7: The new Python Development Mode can now also be used to enable the debug mode. See also: The debug mode of asyncio. Running Subprocesses -------------------- Methods described in this subsections are low-level. In regular async/await code consider using the high-level "asyncio.create_subprocess_shell()" and "asyncio.create_subprocess_exec()" convenience functions instead. Note: On Windows, the default event loop "ProactorEventLoop" supports subprocesses, whereas "SelectorEventLoop" does not. See Subprocess Support on Windows for details. coroutine loop.subprocess_exec(protocol_factory, *args, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, **kwargs) Create a subprocess from one or more string arguments specified by *args*. *args* must be a list of strings represented by: * "str"; * or "bytes", encoded to the filesystem encoding. The first string specifies the program executable, and the remaining strings specify the arguments. Together, string arguments form the "argv" of the program. This is similar to the standard library "subprocess.Popen" class called with "shell=False" and the list of strings passed as the first argument; however, where "Popen" takes a single argument which is list of strings, *subprocess_exec* takes multiple string arguments. The *protocol_factory* must be a callable returning a subclass of the "asyncio.SubprocessProtocol" class. Other parameters: * *stdin* can be any of these: * a file-like object representing a pipe to be connected to the subprocess’s standard input stream using "connect_write_pipe()" * the "subprocess.PIPE" constant (default) which will create a new pipe and connect it, * the value "None" which will make the subprocess inherit the file descriptor from this process * the "subprocess.DEVNULL" constant which indicates that the special "os.devnull" file will be used * *stdout* can be any of these: * a file-like object representing a pipe to be connected to the subprocess’s standard output stream using "connect_write_pipe()" * the "subprocess.PIPE" constant (default) which will create a new pipe and connect it, * the value "None" which will make the subprocess inherit the file descriptor from this process * the "subprocess.DEVNULL" constant which indicates that the special "os.devnull" file will be used * *stderr* can be any of these: * a file-like object representing a pipe to be connected to the subprocess’s standard error stream using "connect_write_pipe()" * the "subprocess.PIPE" constant (default) which will create a new pipe and connect it, * the value "None" which will make the subprocess inherit the file descriptor from this process * the "subprocess.DEVNULL" constant which indicates that the special "os.devnull" file will be used * the "subprocess.STDOUT" constant which will connect the standard error stream to the process’ standard output stream * All other keyword arguments are passed to "subprocess.Popen" without interpretation, except for *bufsize*, *universal_newlines*, *shell*, *text*, *encoding* and *errors*, which should not be specified at all. The "asyncio" subprocess API does not support decoding the streams as text. "bytes.decode()" can be used to convert the bytes returned from the stream to text. See the constructor of the "subprocess.Popen" class for documentation on other arguments. Returns a pair of "(transport, protocol)", where *transport* conforms to the "asyncio.SubprocessTransport" base class and *protocol* is an object instantiated by the *protocol_factory*. coroutine loop.subprocess_shell(protocol_factory, cmd, *, stdin=subprocess.PIPE, stdout=subprocess.PIPE, stderr=subprocess.PIPE, **kwargs) Create a subprocess from *cmd*, which can be a "str" or a "bytes" string encoded to the filesystem encoding, using the platform’s “shell” syntax. This is similar to the standard library "subprocess.Popen" class called with "shell=True". The *protocol_factory* must be a callable returning a subclass of the "SubprocessProtocol" class. See "subprocess_exec()" for more details about the remaining arguments. Returns a pair of "(transport, protocol)", where *transport* conforms to the "SubprocessTransport" base class and *protocol* is an object instantiated by the *protocol_factory*. Note: It is the application’s responsibility to ensure that all whitespace and special characters are quoted appropriately to avoid shell injection vulnerabilities. The "shlex.quote()" function can be used to properly escape whitespace and special characters in strings that are going to be used to construct shell commands. Callback Handles ================ class asyncio.Handle A callback wrapper object returned by "loop.call_soon()", "loop.call_soon_threadsafe()". cancel() Cancel the callback. If the callback has already been canceled or executed, this method has no effect. cancelled() Return "True" if the callback was cancelled. New in version 3.7. class asyncio.TimerHandle A callback wrapper object returned by "loop.call_later()", and "loop.call_at()". This class is a subclass of "Handle". when() Return a scheduled callback time as "float" seconds. The time is an absolute timestamp, using the same time reference as "loop.time()". New in version 3.7. Server Objects ============== Server objects are created by "loop.create_server()", "loop.create_unix_server()", "start_server()", and "start_unix_server()" functions. Do not instantiate the class directly. class asyncio.Server *Server* objects are asynchronous context managers. When used in an "async with" statement, it’s guaranteed that the Server object is closed and not accepting new connections when the "async with" statement is completed: srv = await loop.create_server(...) async with srv: # some code # At this point, srv is closed and no longer accepts new connections. Changed in version 3.7: Server object is an asynchronous context manager since Python 3.7. close() Stop serving: close listening sockets and set the "sockets" attribute to "None". The sockets that represent existing incoming client connections are left open. The server is closed asynchronously, use the "wait_closed()" coroutine to wait until the server is closed. get_loop() Return the event loop associated with the server object. New in version 3.7. coroutine start_serving() Start accepting connections. This method is idempotent, so it can be called when the server is already being serving. The *start_serving* keyword-only parameter to "loop.create_server()" and "asyncio.start_server()" allows creating a Server object that is not accepting connections initially. In this case "Server.start_serving()", or "Server.serve_forever()" can be used to make the Server start accepting connections. New in version 3.7. coroutine serve_forever() Start accepting connections until the coroutine is cancelled. Cancellation of "serve_forever" task causes the server to be closed. This method can be called if the server is already accepting connections. Only one "serve_forever" task can exist per one *Server* object. Example: async def client_connected(reader, writer): # Communicate with the client with # reader/writer streams. For example: await reader.readline() async def main(host, port): srv = await asyncio.start_server( client_connected, host, port) await srv.serve_forever() asyncio.run(main('127.0.0.1', 0)) New in version 3.7. is_serving() Return "True" if the server is accepting new connections. New in version 3.7. coroutine wait_closed() Wait until the "close()" method completes. sockets List of "socket.socket" objects the server is listening on. Changed in version 3.7: Prior to Python 3.7 "Server.sockets" used to return an internal list of server sockets directly. In 3.7 a copy of that list is returned. Event Loop Implementations ========================== asyncio ships with two different event loop implementations: "SelectorEventLoop" and "ProactorEventLoop". By default asyncio is configured to use "SelectorEventLoop" on Unix and "ProactorEventLoop" on Windows. class asyncio.SelectorEventLoop An event loop based on the "selectors" module. Uses the most efficient *selector* available for the given platform. It is also possible to manually configure the exact selector implementation to be used: import asyncio import selectors selector = selectors.SelectSelector() loop = asyncio.SelectorEventLoop(selector) asyncio.set_event_loop(loop) Availability: Unix, Windows. class asyncio.ProactorEventLoop An event loop for Windows that uses “I/O Completion Ports” (IOCP). Availability: Windows. See also: MSDN documentation on I/O Completion Ports. class asyncio.AbstractEventLoop Abstract base class for asyncio-compliant event loops. The Event Loop Methods section lists all methods that an alternative implementation of "AbstractEventLoop" should have defined. Examples ======== Note that all examples in this section **purposefully** show how to use the low-level event loop APIs, such as "loop.run_forever()" and "loop.call_soon()". Modern asyncio applications rarely need to be written this way; consider using the high-level functions like "asyncio.run()". Hello World with call_soon() ---------------------------- An example using the "loop.call_soon()" method to schedule a callback. The callback displays ""Hello World"" and then stops the event loop: import asyncio def hello_world(loop): """A callback to print 'Hello World' and stop the event loop""" print('Hello World') loop.stop() loop = asyncio.get_event_loop() # Schedule a call to hello_world() loop.call_soon(hello_world, loop) # Blocking call interrupted by loop.stop() try: loop.run_forever() finally: loop.close() See also: A similar Hello World example created with a coroutine and the "run()" function. Display the current date with call_later() ------------------------------------------ An example of a callback displaying the current date every second. The callback uses the "loop.call_later()" method to reschedule itself after 5 seconds, and then stops the event loop: import asyncio import datetime def display_date(end_time, loop): print(datetime.datetime.now()) if (loop.time() + 1.0) < end_time: loop.call_later(1, display_date, end_time, loop) else: loop.stop() loop = asyncio.get_event_loop() # Schedule the first call to display_date() end_time = loop.time() + 5.0 loop.call_soon(display_date, end_time, loop) # Blocking call interrupted by loop.stop() try: loop.run_forever() finally: loop.close() See also: A similar current date example created with a coroutine and the "run()" function. Watch a file descriptor for read events --------------------------------------- Wait until a file descriptor received some data using the "loop.add_reader()" method and then close the event loop: import asyncio from socket import socketpair # Create a pair of connected file descriptors rsock, wsock = socketpair() loop = asyncio.get_event_loop() def reader(): data = rsock.recv(100) print("Received:", data.decode()) # We are done: unregister the file descriptor loop.remove_reader(rsock) # Stop the event loop loop.stop() # Register the file descriptor for read event loop.add_reader(rsock, reader) # Simulate the reception of data from the network loop.call_soon(wsock.send, 'abc'.encode()) try: # Run the event loop loop.run_forever() finally: # We are done. Close sockets and the event loop. rsock.close() wsock.close() loop.close() See also: * A similar example using transports, protocols, and the "loop.create_connection()" method. * Another similar example using the high-level "asyncio.open_connection()" function and streams. Set signal handlers for SIGINT and SIGTERM ------------------------------------------ (This "signals" example only works on Unix.) Register handlers for signals "SIGINT" and "SIGTERM" using the "loop.add_signal_handler()" method: import asyncio import functools import os import signal def ask_exit(signame, loop): print("got signal %s: exit" % signame) loop.stop() async def main(): loop = asyncio.get_running_loop() for signame in {'SIGINT', 'SIGTERM'}: loop.add_signal_handler( getattr(signal, signame), functools.partial(ask_exit, signame, loop)) await asyncio.sleep(3600) print("Event loop running for 1 hour, press Ctrl+C to interrupt.") print(f"pid {os.getpid()}: send SIGINT or SIGTERM to exit.") asyncio.run(main()) Exceptions ********** **Source code:** Lib/asyncio/exceptions.py ====================================================================== exception asyncio.TimeoutError The operation has exceeded the given deadline. Important: This exception is different from the builtin "TimeoutError" exception. exception asyncio.CancelledError The operation has been cancelled. This exception can be caught to perform custom operations when asyncio Tasks are cancelled. In almost all situations the exception must be re-raised. Changed in version 3.8: "CancelledError" is now a subclass of "BaseException". exception asyncio.InvalidStateError Invalid internal state of "Task" or "Future". Can be raised in situations like setting a result value for a *Future* object that already has a result value set. exception asyncio.SendfileNotAvailableError The “sendfile” syscall is not available for the given socket or file type. A subclass of "RuntimeError". exception asyncio.IncompleteReadError The requested read operation did not complete fully. Raised by the asyncio stream APIs. This exception is a subclass of "EOFError". expected The total number ("int") of expected bytes. partial A string of "bytes" read before the end of stream was reached. exception asyncio.LimitOverrunError Reached the buffer size limit while looking for a separator. Raised by the asyncio stream APIs. consumed The total number of to be consumed bytes. Futures ******* **Source code:** Lib/asyncio/futures.py, Lib/asyncio/base_futures.py ====================================================================== *Future* objects are used to bridge **low-level callback-based code** with high-level async/await code. Future Functions ================ asyncio.isfuture(obj) Return "True" if *obj* is either of: * an instance of "asyncio.Future", * an instance of "asyncio.Task", * a Future-like object with a "_asyncio_future_blocking" attribute. New in version 3.5. asyncio.ensure_future(obj, *, loop=None) Return: * *obj* argument as is, if *obj* is a "Future", a "Task", or a Future-like object ("isfuture()" is used for the test.) * a "Task" object wrapping *obj*, if *obj* is a coroutine ("iscoroutine()" is used for the test); in this case the coroutine will be scheduled by "ensure_future()". * a "Task" object that would await on *obj*, if *obj* is an awaitable ("inspect.isawaitable()" is used for the test.) If *obj* is neither of the above a "TypeError" is raised. Important: See also the "create_task()" function which is the preferred way for creating new Tasks.Save a reference to the result of this function, to avoid a task disappearing mid execution. Changed in version 3.5.1: The function accepts any *awaitable* object. asyncio.wrap_future(future, *, loop=None) Wrap a "concurrent.futures.Future" object in a "asyncio.Future" object. Future Object ============= class asyncio.Future(*, loop=None) A Future represents an eventual result of an asynchronous operation. Not thread-safe. Future is an *awaitable* object. Coroutines can await on Future objects until they either have a result or an exception set, or until they are cancelled. Typically Futures are used to enable low-level callback-based code (e.g. in protocols implemented using asyncio transports) to interoperate with high-level async/await code. The rule of thumb is to never expose Future objects in user-facing APIs, and the recommended way to create a Future object is to call "loop.create_future()". This way alternative event loop implementations can inject their own optimized implementations of a Future object. Changed in version 3.7: Added support for the "contextvars" module. result() Return the result of the Future. If the Future is *done* and has a result set by the "set_result()" method, the result value is returned. If the Future is *done* and has an exception set by the "set_exception()" method, this method raises the exception. If the Future has been *cancelled*, this method raises a "CancelledError" exception. If the Future’s result isn’t yet available, this method raises a "InvalidStateError" exception. set_result(result) Mark the Future as *done* and set its result. Raises a "InvalidStateError" error if the Future is already *done*. set_exception(exception) Mark the Future as *done* and set an exception. Raises a "InvalidStateError" error if the Future is already *done*. done() Return "True" if the Future is *done*. A Future is *done* if it was *cancelled* or if it has a result or an exception set with "set_result()" or "set_exception()" calls. cancelled() Return "True" if the Future was *cancelled*. The method is usually used to check if a Future is not *cancelled* before setting a result or an exception for it: if not fut.cancelled(): fut.set_result(42) add_done_callback(callback, *, context=None) Add a callback to be run when the Future is *done*. The *callback* is called with the Future object as its only argument. If the Future is already *done* when this method is called, the callback is scheduled with "loop.call_soon()". An optional keyword-only *context* argument allows specifying a custom "contextvars.Context" for the *callback* to run in. The current context is used when no *context* is provided. "functools.partial()" can be used to pass parameters to the callback, e.g.: # Call 'print("Future:", fut)' when "fut" is done. fut.add_done_callback( functools.partial(print, "Future:")) Changed in version 3.7: The *context* keyword-only parameter was added. See **PEP 567** for more details. remove_done_callback(callback) Remove *callback* from the callbacks list. Returns the number of callbacks removed, which is typically 1, unless a callback was added more than once. cancel(msg=None) Cancel the Future and schedule callbacks. If the Future is already *done* or *cancelled*, return "False". Otherwise, change the Future’s state to *cancelled*, schedule the callbacks, and return "True". Changed in version 3.9: Added the "msg" parameter. exception() Return the exception that was set on this Future. The exception (or "None" if no exception was set) is returned only if the Future is *done*. If the Future has been *cancelled*, this method raises a "CancelledError" exception. If the Future isn’t *done* yet, this method raises an "InvalidStateError" exception. get_loop() Return the event loop the Future object is bound to. New in version 3.7. This example creates a Future object, creates and schedules an asynchronous Task to set result for the Future, and waits until the Future has a result: async def set_after(fut, delay, value): # Sleep for *delay* seconds. await asyncio.sleep(delay) # Set *value* as a result of *fut* Future. fut.set_result(value) async def main(): # Get the current event loop. loop = asyncio.get_running_loop() # Create a new Future object. fut = loop.create_future() # Run "set_after()" coroutine in a parallel Task. # We are using the low-level "loop.create_task()" API here because # we already have a reference to the event loop at hand. # Otherwise we could have just used "asyncio.create_task()". loop.create_task( set_after(fut, 1, '... world')) print('hello ...') # Wait until *fut* has a result (1 second) and print it. print(await fut) asyncio.run(main()) Important: The Future object was designed to mimic "concurrent.futures.Future". Key differences include: * unlike asyncio Futures, "concurrent.futures.Future" instances cannot be awaited. * "asyncio.Future.result()" and "asyncio.Future.exception()" do not accept the *timeout* argument. * "asyncio.Future.result()" and "asyncio.Future.exception()" raise an "InvalidStateError" exception when the Future is not *done*. * Callbacks registered with "asyncio.Future.add_done_callback()" are not called immediately. They are scheduled with "loop.call_soon()" instead. * asyncio Future is not compatible with the "concurrent.futures.wait()" and "concurrent.futures.as_completed()" functions. * "asyncio.Future.cancel()" accepts an optional "msg" argument, but "concurrent.futures.cancel()" does not. Low-level API Index ******************* This page lists all low-level asyncio APIs. Obtaining the Event Loop ======================== +----------------------------------------------------+----------------------------------------------------+ | "asyncio.get_running_loop()" | The **preferred** function to get the running | | | event loop. | +----------------------------------------------------+----------------------------------------------------+ | "asyncio.get_event_loop()" | Get an event loop instance (current or via the | | | policy). | +----------------------------------------------------+----------------------------------------------------+ | "asyncio.set_event_loop()" | Set the event loop as current via the current | | | policy. | +----------------------------------------------------+----------------------------------------------------+ | "asyncio.new_event_loop()" | Create a new event loop. | +----------------------------------------------------+----------------------------------------------------+ -[ Examples ]- * Using asyncio.get_running_loop(). Event Loop Methods ================== See also the main documentation section about the event loop methods. -[ Lifecycle ]- +----------------------------------------------------+----------------------------------------------------+ | "loop.run_until_complete()" | Run a Future/Task/awaitable until complete. | +----------------------------------------------------+----------------------------------------------------+ | "loop.run_forever()" | Run the event loop forever. | +----------------------------------------------------+----------------------------------------------------+ | "loop.stop()" | Stop the event loop. | +----------------------------------------------------+----------------------------------------------------+ | "loop.close()" | Close the event loop. | +----------------------------------------------------+----------------------------------------------------+ | "loop.is_running()" | Return "True" if the event loop is running. | +----------------------------------------------------+----------------------------------------------------+ | "loop.is_closed()" | Return "True" if the event loop is closed. | +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.shutdown_asyncgens()" | Close asynchronous generators. | +----------------------------------------------------+----------------------------------------------------+ -[ Debugging ]- +----------------------------------------------------+----------------------------------------------------+ | "loop.set_debug()" | Enable or disable the debug mode. | +----------------------------------------------------+----------------------------------------------------+ | "loop.get_debug()" | Get the current debug mode. | +----------------------------------------------------+----------------------------------------------------+ -[ Scheduling Callbacks ]- +----------------------------------------------------+----------------------------------------------------+ | "loop.call_soon()" | Invoke a callback soon. | +----------------------------------------------------+----------------------------------------------------+ | "loop.call_soon_threadsafe()" | A thread-safe variant of "loop.call_soon()". | +----------------------------------------------------+----------------------------------------------------+ | "loop.call_later()" | Invoke a callback *after* the given time. | +----------------------------------------------------+----------------------------------------------------+ | "loop.call_at()" | Invoke a callback *at* the given time. | +----------------------------------------------------+----------------------------------------------------+ -[ Thread/Process Pool ]- +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.run_in_executor()" | Run a CPU-bound or other blocking function in a | | | "concurrent.futures" executor. | +----------------------------------------------------+----------------------------------------------------+ | "loop.set_default_executor()" | Set the default executor for | | | "loop.run_in_executor()". | +----------------------------------------------------+----------------------------------------------------+ -[ Tasks and Futures ]- +----------------------------------------------------+----------------------------------------------------+ | "loop.create_future()" | Create a "Future" object. | +----------------------------------------------------+----------------------------------------------------+ | "loop.create_task()" | Schedule coroutine as a "Task". | +----------------------------------------------------+----------------------------------------------------+ | "loop.set_task_factory()" | Set a factory used by "loop.create_task()" to | | | create "Tasks". | +----------------------------------------------------+----------------------------------------------------+ | "loop.get_task_factory()" | Get the factory "loop.create_task()" uses to | | | create "Tasks". | +----------------------------------------------------+----------------------------------------------------+ -[ DNS ]- +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.getaddrinfo()" | Asynchronous version of "socket.getaddrinfo()". | +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.getnameinfo()" | Asynchronous version of "socket.getnameinfo()". | +----------------------------------------------------+----------------------------------------------------+ -[ Networking and IPC ]- +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.create_connection()" | Open a TCP connection. | +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.create_server()" | Create a TCP server. | +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.create_unix_connection()" | Open a Unix socket connection. | +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.create_unix_server()" | Create a Unix socket server. | +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.connect_accepted_socket()" | Wrap a "socket" into a "(transport, protocol)" | | | pair. | +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.create_datagram_endpoint()" | Open a datagram (UDP) connection. | +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.sendfile()" | Send a file over a transport. | +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.start_tls()" | Upgrade an existing connection to TLS. | +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.connect_read_pipe()" | Wrap a read end of a pipe into a "(transport, | | | protocol)" pair. | +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.connect_write_pipe()" | Wrap a write end of a pipe into a "(transport, | | | protocol)" pair. | +----------------------------------------------------+----------------------------------------------------+ -[ Sockets ]- +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.sock_recv()" | Receive data from the "socket". | +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.sock_recv_into()" | Receive data from the "socket" into a buffer. | +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.sock_sendall()" | Send data to the "socket". | +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.sock_connect()" | Connect the "socket". | +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.sock_accept()" | Accept a "socket" connection. | +----------------------------------------------------+----------------------------------------------------+ | "await" "loop.sock_sendfile()" | Send a file over the "socket". | +----------------------------------------------------+----------------------------------------------------+ | "loop.add_reader()" | Start watching a file descriptor for read | | | availability. | +----------------------------------------------------+----------------------------------------------------+ | "loop.remove_reader()" | Stop watching a file descriptor for read | | | availability. | +----------------------------------------------------+----------------------------------------------------+ | "loop.add_writer()" | Start watching a file descriptor for write | | | availability. | +----------------------------------------------------+----------------------------------------------------+ | "loop.remove_writer()" | Stop watching a file descriptor for write | | | availability. | +----------------------------------------------------+----------------------------------------------------+ -[ Unix Signals ]- +----------------------------------------------------+----------------------------------------------------+ | "loop.add_signal_handler()" | Add a handler for a "signal". | +----------------------------------------------------+----------------------------------------------------+ | "loop.remove_signal_handler()" | Remove a handler for a "signal". | +----------------------------------------------------+----------------------------------------------------+ -[ Subprocesses ]- +----------------------------------------------------+----------------------------------------------------+ | "loop.subprocess_exec()" | Spawn a subprocess. | +----------------------------------------------------+----------------------------------------------------+ | "loop.subprocess_shell()" | Spawn a subprocess from a shell command. | +----------------------------------------------------+----------------------------------------------------+ -[ Error Handling ]- +----------------------------------------------------+----------------------------------------------------+ | "loop.call_exception_handler()" | Call the exception handler. | +----------------------------------------------------+----------------------------------------------------+ | "loop.set_exception_handler()" | Set a new exception handler. | +----------------------------------------------------+----------------------------------------------------+ | "loop.get_exception_handler()" | Get the current exception handler. | +----------------------------------------------------+----------------------------------------------------+ | "loop.default_exception_handler()" | The default exception handler implementation. | +----------------------------------------------------+----------------------------------------------------+ -[ Examples ]- * Using asyncio.get_event_loop() and loop.run_forever(). * Using loop.call_later(). * Using "loop.create_connection()" to implement an echo-client. * Using "loop.create_connection()" to connect a socket. * Using add_reader() to watch an FD for read events. * Using loop.add_signal_handler(). * Using loop.subprocess_exec(). Transports ========== All transports implement the following methods: +----------------------------------------------------+----------------------------------------------------+ | "transport.close()" | Close the transport. | +----------------------------------------------------+----------------------------------------------------+ | "transport.is_closing()" | Return "True" if the transport is closing or is | | | closed. | +----------------------------------------------------+----------------------------------------------------+ | "transport.get_extra_info()" | Request for information about the transport. | +----------------------------------------------------+----------------------------------------------------+ | "transport.set_protocol()" | Set a new protocol. | +----------------------------------------------------+----------------------------------------------------+ | "transport.get_protocol()" | Return the current protocol. | +----------------------------------------------------+----------------------------------------------------+ Transports that can receive data (TCP and Unix connections, pipes, etc). Returned from methods like "loop.create_connection()", "loop.create_unix_connection()", "loop.connect_read_pipe()", etc: -[ Read Transports ]- +----------------------------------------------------+----------------------------------------------------+ | "transport.is_reading()" | Return "True" if the transport is receiving. | +----------------------------------------------------+----------------------------------------------------+ | "transport.pause_reading()" | Pause receiving. | +----------------------------------------------------+----------------------------------------------------+ | "transport.resume_reading()" | Resume receiving. | +----------------------------------------------------+----------------------------------------------------+ Transports that can Send data (TCP and Unix connections, pipes, etc). Returned from methods like "loop.create_connection()", "loop.create_unix_connection()", "loop.connect_write_pipe()", etc: -[ Write Transports ]- +----------------------------------------------------+----------------------------------------------------+ | "transport.write()" | Write data to the transport. | +----------------------------------------------------+----------------------------------------------------+ | "transport.writelines()" | Write buffers to the transport. | +----------------------------------------------------+----------------------------------------------------+ | "transport.can_write_eof()" | Return "True" if the transport supports sending | | | EOF. | +----------------------------------------------------+----------------------------------------------------+ | "transport.write_eof()" | Close and send EOF after flushing buffered data. | +----------------------------------------------------+----------------------------------------------------+ | "transport.abort()" | Close the transport immediately. | +----------------------------------------------------+----------------------------------------------------+ | "transport.get_write_buffer_size()" | Return high and low water marks for write flow | | | control. | +----------------------------------------------------+----------------------------------------------------+ | "transport.set_write_buffer_limits()" | Set new high and low water marks for write flow | | | control. | +----------------------------------------------------+----------------------------------------------------+ Transports returned by "loop.create_datagram_endpoint()": -[ Datagram Transports ]- +----------------------------------------------------+----------------------------------------------------+ | "transport.sendto()" | Send data to the remote peer. | +----------------------------------------------------+----------------------------------------------------+ | "transport.abort()" | Close the transport immediately. | +----------------------------------------------------+----------------------------------------------------+ Low-level transport abstraction over subprocesses. Returned by "loop.subprocess_exec()" and "loop.subprocess_shell()": -[ Subprocess Transports ]- +----------------------------------------------------+----------------------------------------------------+ | "transport.get_pid()" | Return the subprocess process id. | +----------------------------------------------------+----------------------------------------------------+ | "transport.get_pipe_transport()" | Return the transport for the requested | | | communication pipe (*stdin*, *stdout*, or | | | *stderr*). | +----------------------------------------------------+----------------------------------------------------+ | "transport.get_returncode()" | Return the subprocess return code. | +----------------------------------------------------+----------------------------------------------------+ | "transport.kill()" | Kill the subprocess. | +----------------------------------------------------+----------------------------------------------------+ | "transport.send_signal()" | Send a signal to the subprocess. | +----------------------------------------------------+----------------------------------------------------+ | "transport.terminate()" | Stop the subprocess. | +----------------------------------------------------+----------------------------------------------------+ | "transport.close()" | Kill the subprocess and close all pipes. | +----------------------------------------------------+----------------------------------------------------+ Protocols ========= Protocol classes can implement the following **callback methods**: +----------------------------------------------------+----------------------------------------------------+ | "callback" "connection_made()" | Called when a connection is made. | +----------------------------------------------------+----------------------------------------------------+ | "callback" "connection_lost()" | Called when the connection is lost or closed. | +----------------------------------------------------+----------------------------------------------------+ | "callback" "pause_writing()" | Called when the transport’s buffer goes over the | | | high water mark. | +----------------------------------------------------+----------------------------------------------------+ | "callback" "resume_writing()" | Called when the transport’s buffer drains below | | | the low water mark. | +----------------------------------------------------+----------------------------------------------------+ -[ Streaming Protocols (TCP, Unix Sockets, Pipes) ]- +----------------------------------------------------+----------------------------------------------------+ | "callback" "data_received()" | Called when some data is received. | +----------------------------------------------------+----------------------------------------------------+ | "callback" "eof_received()" | Called when an EOF is received. | +----------------------------------------------------+----------------------------------------------------+ -[ Buffered Streaming Protocols ]- +----------------------------------------------------+----------------------------------------------------+ | "callback" "get_buffer()" | Called to allocate a new receive buffer. | +----------------------------------------------------+----------------------------------------------------+ | "callback" "buffer_updated()" | Called when the buffer was updated with the | | | received data. | +----------------------------------------------------+----------------------------------------------------+ | "callback" "eof_received()" | Called when an EOF is received. | +----------------------------------------------------+----------------------------------------------------+ -[ Datagram Protocols ]- +----------------------------------------------------+----------------------------------------------------+ | "callback" "datagram_received()" | Called when a datagram is received. | +----------------------------------------------------+----------------------------------------------------+ | "callback" "error_received()" | Called when a previous send or receive operation | | | raises an "OSError". | +----------------------------------------------------+----------------------------------------------------+ -[ Subprocess Protocols ]- +----------------------------------------------------+----------------------------------------------------+ | "callback" "pipe_data_received()" | Called when the child process writes data into its | | | *stdout* or *stderr* pipe. | +----------------------------------------------------+----------------------------------------------------+ | "callback" "pipe_connection_lost()" | Called when one of the pipes communicating with | | | the child process is closed. | +----------------------------------------------------+----------------------------------------------------+ | "callback" "process_exited()" | Called when the child process has exited. | +----------------------------------------------------+----------------------------------------------------+ Event Loop Policies =================== Policies is a low-level mechanism to alter the behavior of functions like "asyncio.get_event_loop()". See also the main policies section for more details. -[ Accessing Policies ]- +----------------------------------------------------+----------------------------------------------------+ | "asyncio.get_event_loop_policy()" | Return the current process-wide policy. | +----------------------------------------------------+----------------------------------------------------+ | "asyncio.set_event_loop_policy()" | Set a new process-wide policy. | +----------------------------------------------------+----------------------------------------------------+ | "AbstractEventLoopPolicy" | Base class for policy objects. | +----------------------------------------------------+----------------------------------------------------+ Platform Support **************** The "asyncio" module is designed to be portable, but some platforms have subtle differences and limitations due to the platforms’ underlying architecture and capabilities. All Platforms ============= * "loop.add_reader()" and "loop.add_writer()" cannot be used to monitor file I/O. Windows ======= **Source code:** Lib/asyncio/proactor_events.py, Lib/asyncio/windows_events.py, Lib/asyncio/windows_utils.py ====================================================================== Changed in version 3.8: On Windows, "ProactorEventLoop" is now the default event loop. All event loops on Windows do not support the following methods: * "loop.create_unix_connection()" and "loop.create_unix_server()" are not supported. The "socket.AF_UNIX" socket family is specific to Unix. * "loop.add_signal_handler()" and "loop.remove_signal_handler()" are not supported. "SelectorEventLoop" has the following limitations: * "SelectSelector" is used to wait on socket events: it supports sockets and is limited to 512 sockets. * "loop.add_reader()" and "loop.add_writer()" only accept socket handles (e.g. pipe file descriptors are not supported). * Pipes are not supported, so the "loop.connect_read_pipe()" and "loop.connect_write_pipe()" methods are not implemented. * Subprocesses are not supported, i.e. "loop.subprocess_exec()" and "loop.subprocess_shell()" methods are not implemented. "ProactorEventLoop" has the following limitations: * The "loop.add_reader()" and "loop.add_writer()" methods are not supported. The resolution of the monotonic clock on Windows is usually around 15.6 msec. The best resolution is 0.5 msec. The resolution depends on the hardware (availability of HPET) and on the Windows configuration. Subprocess Support on Windows ----------------------------- On Windows, the default event loop "ProactorEventLoop" supports subprocesses, whereas "SelectorEventLoop" does not. The "policy.set_child_watcher()" function is also not supported, as "ProactorEventLoop" has a different mechanism to watch child processes. macOS ===== Modern macOS versions are fully supported. -[ macOS <= 10.8 ]- On macOS 10.6, 10.7 and 10.8, the default event loop uses "selectors.KqueueSelector", which does not support character devices on these versions. The "SelectorEventLoop" can be manually configured to use "SelectSelector" or "PollSelector" to support character devices on these older versions of macOS. Example: import asyncio import selectors selector = selectors.SelectSelector() loop = asyncio.SelectorEventLoop(selector) asyncio.set_event_loop(loop) Policies ******** An event loop policy is a global per-process object that controls the management of the event loop. Each event loop has a default policy, which can be changed and customized using the policy API. A policy defines the notion of *context* and manages a separate event loop per context. The default policy defines *context* to be the current thread. By using a custom event loop policy, the behavior of "get_event_loop()", "set_event_loop()", and "new_event_loop()" functions can be customized. Policy objects should implement the APIs defined in the "AbstractEventLoopPolicy" abstract base class. Getting and Setting the Policy ============================== The following functions can be used to get and set the policy for the current process: asyncio.get_event_loop_policy() Return the current process-wide policy. asyncio.set_event_loop_policy(policy) Set the current process-wide policy to *policy*. If *policy* is set to "None", the default policy is restored. Policy Objects ============== The abstract event loop policy base class is defined as follows: class asyncio.AbstractEventLoopPolicy An abstract base class for asyncio policies. get_event_loop() Get the event loop for the current context. Return an event loop object implementing the "AbstractEventLoop" interface. This method should never return "None". Changed in version 3.6. set_event_loop(loop) Set the event loop for the current context to *loop*. new_event_loop() Create and return a new event loop object. This method should never return "None". get_child_watcher() Get a child process watcher object. Return a watcher object implementing the "AbstractChildWatcher" interface. This function is Unix specific. set_child_watcher(watcher) Set the current child process watcher to *watcher*. This function is Unix specific. asyncio ships with the following built-in policies: class asyncio.DefaultEventLoopPolicy The default asyncio policy. Uses "SelectorEventLoop" on Unix and "ProactorEventLoop" on Windows. There is no need to install the default policy manually. asyncio is configured to use the default policy automatically. Changed in version 3.8: On Windows, "ProactorEventLoop" is now used by default. class asyncio.WindowsSelectorEventLoopPolicy An alternative event loop policy that uses the "SelectorEventLoop" event loop implementation. Availability: Windows. class asyncio.WindowsProactorEventLoopPolicy An alternative event loop policy that uses the "ProactorEventLoop" event loop implementation. Availability: Windows. Process Watchers ================ A process watcher allows customization of how an event loop monitors child processes on Unix. Specifically, the event loop needs to know when a child process has exited. In asyncio, child processes are created with "create_subprocess_exec()" and "loop.subprocess_exec()" functions. asyncio defines the "AbstractChildWatcher" abstract base class, which child watchers should implement, and has four different implementations: "ThreadedChildWatcher" (configured to be used by default), "MultiLoopChildWatcher", "SafeChildWatcher", and "FastChildWatcher". See also the Subprocess and Threads section. The following two functions can be used to customize the child process watcher implementation used by the asyncio event loop: asyncio.get_child_watcher() Return the current child watcher for the current policy. asyncio.set_child_watcher(watcher) Set the current child watcher to *watcher* for the current policy. *watcher* must implement methods defined in the "AbstractChildWatcher" base class. Note: Third-party event loops implementations might not support custom child watchers. For such event loops, using "set_child_watcher()" might be prohibited or have no effect. class asyncio.AbstractChildWatcher add_child_handler(pid, callback, *args) Register a new child handler. Arrange for "callback(pid, returncode, *args)" to be called when a process with PID equal to *pid* terminates. Specifying another callback for the same process replaces the previous handler. The *callback* callable must be thread-safe. remove_child_handler(pid) Removes the handler for process with PID equal to *pid*. The function returns "True" if the handler was successfully removed, "False" if there was nothing to remove. attach_loop(loop) Attach the watcher to an event loop. If the watcher was previously attached to an event loop, then it is first detached before attaching to the new loop. Note: loop may be "None". is_active() Return "True" if the watcher is ready to use. Spawning a subprocess with *inactive* current child watcher raises "RuntimeError". New in version 3.8. close() Close the watcher. This method has to be called to ensure that underlying resources are cleaned-up. class asyncio.ThreadedChildWatcher This implementation starts a new waiting thread for every subprocess spawn. It works reliably even when the asyncio event loop is run in a non- main OS thread. There is no noticeable overhead when handling a big number of children (*O(1)* each time a child terminates), but starting a thread per process requires extra memory. This watcher is used by default. New in version 3.8. class asyncio.MultiLoopChildWatcher This implementation registers a "SIGCHLD" signal handler on instantiation. That can break third-party code that installs a custom handler for "SIGCHLD" signal. The watcher avoids disrupting other code spawning processes by polling every process explicitly on a "SIGCHLD" signal. There is no limitation for running subprocesses from different threads once the watcher is installed. The solution is safe but it has a significant overhead when handling a big number of processes (*O(n)* each time a "SIGCHLD" is received). New in version 3.8. class asyncio.SafeChildWatcher This implementation uses active event loop from the main thread to handle "SIGCHLD" signal. If the main thread has no running event loop another thread cannot spawn a subprocess ("RuntimeError" is raised). The watcher avoids disrupting other code spawning processes by polling every process explicitly on a "SIGCHLD" signal. This solution is as safe as "MultiLoopChildWatcher" and has the same *O(N)* complexity but requires a running event loop in the main thread to work. class asyncio.FastChildWatcher This implementation reaps every terminated processes by calling "os.waitpid(-1)" directly, possibly breaking other code spawning processes and waiting for their termination. There is no noticeable overhead when handling a big number of children (*O(1)* each time a child terminates). This solution requires a running event loop in the main thread to work, as "SafeChildWatcher". class asyncio.PidfdChildWatcher This implementation polls process file descriptors (pidfds) to await child process termination. In some respects, "PidfdChildWatcher" is a “Goldilocks” child watcher implementation. It doesn’t require signals or threads, doesn’t interfere with any processes launched outside the event loop, and scales linearly with the number of subprocesses launched by the event loop. The main disadvantage is that pidfds are specific to Linux, and only work on recent (5.3+) kernels. New in version 3.9. Custom Policies =============== To implement a new event loop policy, it is recommended to subclass "DefaultEventLoopPolicy" and override the methods for which custom behavior is wanted, e.g.: class MyEventLoopPolicy(asyncio.DefaultEventLoopPolicy): def get_event_loop(self): """Get the event loop. This may be None or an instance of EventLoop. """ loop = super().get_event_loop() # Do something with loop ... return loop asyncio.set_event_loop_policy(MyEventLoopPolicy()) Transports and Protocols ************************ -[ Preface ]- Transports and Protocols are used by the **low-level** event loop APIs such as "loop.create_connection()". They use callback-based programming style and enable high-performance implementations of network or IPC protocols (e.g. HTTP). Essentially, transports and protocols should only be used in libraries and frameworks and never in high-level asyncio applications. This documentation page covers both Transports and Protocols. -[ Introduction ]- At the highest level, the transport is concerned with *how* bytes are transmitted, while the protocol determines *which* bytes to transmit (and to some extent when). A different way of saying the same thing: a transport is an abstraction for a socket (or similar I/O endpoint) while a protocol is an abstraction for an application, from the transport’s point of view. Yet another view is the transport and protocol interfaces together define an abstract interface for using network I/O and interprocess I/O. There is always a 1:1 relationship between transport and protocol objects: the protocol calls transport methods to send data, while the transport calls protocol methods to pass it data that has been received. Most of connection oriented event loop methods (such as "loop.create_connection()") usually accept a *protocol_factory* argument used to create a *Protocol* object for an accepted connection, represented by a *Transport* object. Such methods usually return a tuple of "(transport, protocol)". -[ Contents ]- This documentation page contains the following sections: * The Transports section documents asyncio "BaseTransport", "ReadTransport", "WriteTransport", "Transport", "DatagramTransport", and "SubprocessTransport" classes. * The Protocols section documents asyncio "BaseProtocol", "Protocol", "BufferedProtocol", "DatagramProtocol", and "SubprocessProtocol" classes. * The Examples section showcases how to work with transports, protocols, and low-level event loop APIs. Transports ========== **Source code:** Lib/asyncio/transports.py ====================================================================== Transports are classes provided by "asyncio" in order to abstract various kinds of communication channels. Transport objects are always instantiated by an asyncio event loop. asyncio implements transports for TCP, UDP, SSL, and subprocess pipes. The methods available on a transport depend on the transport’s kind. The transport classes are not thread safe. Transports Hierarchy -------------------- class asyncio.BaseTransport Base class for all transports. Contains methods that all asyncio transports share. class asyncio.WriteTransport(BaseTransport) A base transport for write-only connections. Instances of the *WriteTransport* class are returned from the "loop.connect_write_pipe()" event loop method and are also used by subprocess-related methods like "loop.subprocess_exec()". class asyncio.ReadTransport(BaseTransport) A base transport for read-only connections. Instances of the *ReadTransport* class are returned from the "loop.connect_read_pipe()" event loop method and are also used by subprocess-related methods like "loop.subprocess_exec()". class asyncio.Transport(WriteTransport, ReadTransport) Interface representing a bidirectional transport, such as a TCP connection. The user does not instantiate a transport directly; they call a utility function, passing it a protocol factory and other information necessary to create the transport and protocol. Instances of the *Transport* class are returned from or used by event loop methods like "loop.create_connection()", "loop.create_unix_connection()", "loop.create_server()", "loop.sendfile()", etc. class asyncio.DatagramTransport(BaseTransport) A transport for datagram (UDP) connections. Instances of the *DatagramTransport* class are returned from the "loop.create_datagram_endpoint()" event loop method. class asyncio.SubprocessTransport(BaseTransport) An abstraction to represent a connection between a parent and its child OS process. Instances of the *SubprocessTransport* class are returned from event loop methods "loop.subprocess_shell()" and "loop.subprocess_exec()". Base Transport -------------- BaseTransport.close() Close the transport. If the transport has a buffer for outgoing data, buffered data will be flushed asynchronously. No more data will be received. After all buffered data is flushed, the protocol’s "protocol.connection_lost()" method will be called with "None" as its argument. BaseTransport.is_closing() Return "True" if the transport is closing or is closed. BaseTransport.get_extra_info(name, default=None) Return information about the transport or underlying resources it uses. *name* is a string representing the piece of transport-specific information to get. *default* is the value to return if the information is not available, or if the transport does not support querying it with the given third-party event loop implementation or on the current platform. For example, the following code attempts to get the underlying socket object of the transport: sock = transport.get_extra_info('socket') if sock is not None: print(sock.getsockopt(...)) Categories of information that can be queried on some transports: * socket: * "'peername'": the remote address to which the socket is connected, result of "socket.socket.getpeername()" ("None" on error) * "'socket'": "socket.socket" instance * "'sockname'": the socket’s own address, result of "socket.socket.getsockname()" * SSL socket: * "'compression'": the compression algorithm being used as a string, or "None" if the connection isn’t compressed; result of "ssl.SSLSocket.compression()" * "'cipher'": a three-value tuple containing the name of the cipher being used, the version of the SSL protocol that defines its use, and the number of secret bits being used; result of "ssl.SSLSocket.cipher()" * "'peercert'": peer certificate; result of "ssl.SSLSocket.getpeercert()" * "'sslcontext'": "ssl.SSLContext" instance * "'ssl_object'": "ssl.SSLObject" or "ssl.SSLSocket" instance * pipe: * "'pipe'": pipe object * subprocess: * "'subprocess'": "subprocess.Popen" instance BaseTransport.set_protocol(protocol) Set a new protocol. Switching protocol should only be done when both protocols are documented to support the switch. BaseTransport.get_protocol() Return the current protocol. Read-only Transports -------------------- ReadTransport.is_reading() Return "True" if the transport is receiving new data. New in version 3.7. ReadTransport.pause_reading() Pause the receiving end of the transport. No data will be passed to the protocol’s "protocol.data_received()" method until "resume_reading()" is called. Changed in version 3.7: The method is idempotent, i.e. it can be called when the transport is already paused or closed. ReadTransport.resume_reading() Resume the receiving end. The protocol’s "protocol.data_received()" method will be called once again if some data is available for reading. Changed in version 3.7: The method is idempotent, i.e. it can be called when the transport is already reading. Write-only Transports --------------------- WriteTransport.abort() Close the transport immediately, without waiting for pending operations to complete. Buffered data will be lost. No more data will be received. The protocol’s "protocol.connection_lost()" method will eventually be called with "None" as its argument. WriteTransport.can_write_eof() Return "True" if the transport supports "write_eof()", "False" if not. WriteTransport.get_write_buffer_size() Return the current size of the output buffer used by the transport. WriteTransport.get_write_buffer_limits() Get the *high* and *low* watermarks for write flow control. Return a tuple "(low, high)" where *low* and *high* are positive number of bytes. Use "set_write_buffer_limits()" to set the limits. New in version 3.4.2. WriteTransport.set_write_buffer_limits(high=None, low=None) Set the *high* and *low* watermarks for write flow control. These two values (measured in number of bytes) control when the protocol’s "protocol.pause_writing()" and "protocol.resume_writing()" methods are called. If specified, the low watermark must be less than or equal to the high watermark. Neither *high* nor *low* can be negative. "pause_writing()" is called when the buffer size becomes greater than or equal to the *high* value. If writing has been paused, "resume_writing()" is called when the buffer size becomes less than or equal to the *low* value. The defaults are implementation-specific. If only the high watermark is given, the low watermark defaults to an implementation-specific value less than or equal to the high watermark. Setting *high* to zero forces *low* to zero as well, and causes "pause_writing()" to be called whenever the buffer becomes non-empty. Setting *low* to zero causes "resume_writing()" to be called only once the buffer is empty. Use of zero for either limit is generally sub-optimal as it reduces opportunities for doing I/O and computation concurrently. Use "get_write_buffer_limits()" to get the limits. WriteTransport.write(data) Write some *data* bytes to the transport. This method does not block; it buffers the data and arranges for it to be sent out asynchronously. WriteTransport.writelines(list_of_data) Write a list (or any iterable) of data bytes to the transport. This is functionally equivalent to calling "write()" on each element yielded by the iterable, but may be implemented more efficiently. WriteTransport.write_eof() Close the write end of the transport after flushing all buffered data. Data may still be received. This method can raise "NotImplementedError" if the transport (e.g. SSL) doesn’t support half-closed connections. Datagram Transports ------------------- DatagramTransport.sendto(data, addr=None) Send the *data* bytes to the remote peer given by *addr* (a transport-dependent target address). If *addr* is "None", the data is sent to the target address given on transport creation. This method does not block; it buffers the data and arranges for it to be sent out asynchronously. DatagramTransport.abort() Close the transport immediately, without waiting for pending operations to complete. Buffered data will be lost. No more data will be received. The protocol’s "protocol.connection_lost()" method will eventually be called with "None" as its argument. Subprocess Transports --------------------- SubprocessTransport.get_pid() Return the subprocess process id as an integer. SubprocessTransport.get_pipe_transport(fd) Return the transport for the communication pipe corresponding to the integer file descriptor *fd*: * "0": readable streaming transport of the standard input (*stdin*), or "None" if the subprocess was not created with "stdin=PIPE" * "1": writable streaming transport of the standard output (*stdout*), or "None" if the subprocess was not created with "stdout=PIPE" * "2": writable streaming transport of the standard error (*stderr*), or "None" if the subprocess was not created with "stderr=PIPE" * other *fd*: "None" SubprocessTransport.get_returncode() Return the subprocess return code as an integer or "None" if it hasn’t returned, which is similar to the "subprocess.Popen.returncode" attribute. SubprocessTransport.kill() Kill the subprocess. On POSIX systems, the function sends SIGKILL to the subprocess. On Windows, this method is an alias for "terminate()". See also "subprocess.Popen.kill()". SubprocessTransport.send_signal(signal) Send the *signal* number to the subprocess, as in "subprocess.Popen.send_signal()". SubprocessTransport.terminate() Stop the subprocess. On POSIX systems, this method sends SIGTERM to the subprocess. On Windows, the Windows API function TerminateProcess() is called to stop the subprocess. See also "subprocess.Popen.terminate()". SubprocessTransport.close() Kill the subprocess by calling the "kill()" method. If the subprocess hasn’t returned yet, and close transports of *stdin*, *stdout*, and *stderr* pipes. Protocols ========= **Source code:** Lib/asyncio/protocols.py ====================================================================== asyncio provides a set of abstract base classes that should be used to implement network protocols. Those classes are meant to be used together with transports. Subclasses of abstract base protocol classes may implement some or all methods. All these methods are callbacks: they are called by transports on certain events, for example when some data is received. A base protocol method should be called by the corresponding transport. Base Protocols -------------- class asyncio.BaseProtocol Base protocol with methods that all protocols share. class asyncio.Protocol(BaseProtocol) The base class for implementing streaming protocols (TCP, Unix sockets, etc). class asyncio.BufferedProtocol(BaseProtocol) A base class for implementing streaming protocols with manual control of the receive buffer. class asyncio.DatagramProtocol(BaseProtocol) The base class for implementing datagram (UDP) protocols. class asyncio.SubprocessProtocol(BaseProtocol) The base class for implementing protocols communicating with child processes (unidirectional pipes). Base Protocol ------------- All asyncio protocols can implement Base Protocol callbacks. -[ Connection Callbacks ]- Connection callbacks are called on all protocols, exactly once per a successful connection. All other protocol callbacks can only be called between those two methods. BaseProtocol.connection_made(transport) Called when a connection is made. The *transport* argument is the transport representing the connection. The protocol is responsible for storing the reference to its transport. BaseProtocol.connection_lost(exc) Called when the connection is lost or closed. The argument is either an exception object or "None". The latter means a regular EOF is received, or the connection was aborted or closed by this side of the connection. -[ Flow Control Callbacks ]- Flow control callbacks can be called by transports to pause or resume writing performed by the protocol. See the documentation of the "set_write_buffer_limits()" method for more details. BaseProtocol.pause_writing() Called when the transport’s buffer goes over the high watermark. BaseProtocol.resume_writing() Called when the transport’s buffer drains below the low watermark. If the buffer size equals the high watermark, "pause_writing()" is not called: the buffer size must go strictly over. Conversely, "resume_writing()" is called when the buffer size is equal or lower than the low watermark. These end conditions are important to ensure that things go as expected when either mark is zero. Streaming Protocols ------------------- Event methods, such as "loop.create_server()", "loop.create_unix_server()", "loop.create_connection()", "loop.create_unix_connection()", "loop.connect_accepted_socket()", "loop.connect_read_pipe()", and "loop.connect_write_pipe()" accept factories that return streaming protocols. Protocol.data_received(data) Called when some data is received. *data* is a non-empty bytes object containing the incoming data. Whether the data is buffered, chunked or reassembled depends on the transport. In general, you shouldn’t rely on specific semantics and instead make your parsing generic and flexible. However, data is always received in the correct order. The method can be called an arbitrary number of times while a connection is open. However, "protocol.eof_received()" is called at most once. Once *eof_received()* is called, "data_received()" is not called anymore. Protocol.eof_received() Called when the other end signals it won’t send any more data (for example by calling "transport.write_eof()", if the other end also uses asyncio). This method may return a false value (including "None"), in which case the transport will close itself. Conversely, if this method returns a true value, the protocol used determines whether to close the transport. Since the default implementation returns "None", it implicitly closes the connection. Some transports, including SSL, don’t support half-closed connections, in which case returning true from this method will result in the connection being closed. State machine: start -> connection_made [-> data_received]* [-> eof_received]? -> connection_lost -> end Buffered Streaming Protocols ---------------------------- New in version 3.7. Buffered Protocols can be used with any event loop method that supports Streaming Protocols. "BufferedProtocol" implementations allow explicit manual allocation and control of the receive buffer. Event loops can then use the buffer provided by the protocol to avoid unnecessary data copies. This can result in noticeable performance improvement for protocols that receive big amounts of data. Sophisticated protocol implementations can significantly reduce the number of buffer allocations. The following callbacks are called on "BufferedProtocol" instances: BufferedProtocol.get_buffer(sizehint) Called to allocate a new receive buffer. *sizehint* is the recommended minimum size for the returned buffer. It is acceptable to return smaller or larger buffers than what *sizehint* suggests. When set to -1, the buffer size can be arbitrary. It is an error to return a buffer with a zero size. "get_buffer()" must return an object implementing the buffer protocol. BufferedProtocol.buffer_updated(nbytes) Called when the buffer was updated with the received data. *nbytes* is the total number of bytes that were written to the buffer. BufferedProtocol.eof_received() See the documentation of the "protocol.eof_received()" method. "get_buffer()" can be called an arbitrary number of times during a connection. However, "protocol.eof_received()" is called at most once and, if called, "get_buffer()" and "buffer_updated()" won’t be called after it. State machine: start -> connection_made [-> get_buffer [-> buffer_updated]? ]* [-> eof_received]? -> connection_lost -> end Datagram Protocols ------------------ Datagram Protocol instances should be constructed by protocol factories passed to the "loop.create_datagram_endpoint()" method. DatagramProtocol.datagram_received(data, addr) Called when a datagram is received. *data* is a bytes object containing the incoming data. *addr* is the address of the peer sending the data; the exact format depends on the transport. DatagramProtocol.error_received(exc) Called when a previous send or receive operation raises an "OSError". *exc* is the "OSError" instance. This method is called in rare conditions, when the transport (e.g. UDP) detects that a datagram could not be delivered to its recipient. In many conditions though, undeliverable datagrams will be silently dropped. Note: On BSD systems (macOS, FreeBSD, etc.) flow control is not supported for datagram protocols, because there is no reliable way to detect send failures caused by writing too many packets.The socket always appears ‘ready’ and excess packets are dropped. An "OSError" with "errno" set to "errno.ENOBUFS" may or may not be raised; if it is raised, it will be reported to "DatagramProtocol.error_received()" but otherwise ignored. Subprocess Protocols -------------------- Subprocess Protocol instances should be constructed by protocol factories passed to the "loop.subprocess_exec()" and "loop.subprocess_shell()" methods. SubprocessProtocol.pipe_data_received(fd, data) Called when the child process writes data into its stdout or stderr pipe. *fd* is the integer file descriptor of the pipe. *data* is a non-empty bytes object containing the received data. SubprocessProtocol.pipe_connection_lost(fd, exc) Called when one of the pipes communicating with the child process is closed. *fd* is the integer file descriptor that was closed. SubprocessProtocol.process_exited() Called when the child process has exited. Examples ======== TCP Echo Server --------------- Create a TCP echo server using the "loop.create_server()" method, send back received data, and close the connection: import asyncio class EchoServerProtocol(asyncio.Protocol): def connection_made(self, transport): peername = transport.get_extra_info('peername') print('Connection from {}'.format(peername)) self.transport = transport def data_received(self, data): message = data.decode() print('Data received: {!r}'.format(message)) print('Send: {!r}'.format(message)) self.transport.write(data) print('Close the client socket') self.transport.close() async def main(): # Get a reference to the event loop as we plan to use # low-level APIs. loop = asyncio.get_running_loop() server = await loop.create_server( lambda: EchoServerProtocol(), '127.0.0.1', 8888) async with server: await server.serve_forever() asyncio.run(main()) See also: The TCP echo server using streams example uses the high-level "asyncio.start_server()" function. TCP Echo Client --------------- A TCP echo client using the "loop.create_connection()" method, sends data, and waits until the connection is closed: import asyncio class EchoClientProtocol(asyncio.Protocol): def __init__(self, message, on_con_lost): self.message = message self.on_con_lost = on_con_lost def connection_made(self, transport): transport.write(self.message.encode()) print('Data sent: {!r}'.format(self.message)) def data_received(self, data): print('Data received: {!r}'.format(data.decode())) def connection_lost(self, exc): print('The server closed the connection') self.on_con_lost.set_result(True) async def main(): # Get a reference to the event loop as we plan to use # low-level APIs. loop = asyncio.get_running_loop() on_con_lost = loop.create_future() message = 'Hello World!' transport, protocol = await loop.create_connection( lambda: EchoClientProtocol(message, on_con_lost), '127.0.0.1', 8888) # Wait until the protocol signals that the connection # is lost and close the transport. try: await on_con_lost finally: transport.close() asyncio.run(main()) See also: The TCP echo client using streams example uses the high-level "asyncio.open_connection()" function. UDP Echo Server --------------- A UDP echo server, using the "loop.create_datagram_endpoint()" method, sends back received data: import asyncio class EchoServerProtocol: def connection_made(self, transport): self.transport = transport def datagram_received(self, data, addr): message = data.decode() print('Received %r from %s' % (message, addr)) print('Send %r to %s' % (message, addr)) self.transport.sendto(data, addr) async def main(): print("Starting UDP server") # Get a reference to the event loop as we plan to use # low-level APIs. loop = asyncio.get_running_loop() # One protocol instance will be created to serve all # client requests. transport, protocol = await loop.create_datagram_endpoint( lambda: EchoServerProtocol(), local_addr=('127.0.0.1', 9999)) try: await asyncio.sleep(3600) # Serve for 1 hour. finally: transport.close() asyncio.run(main()) UDP Echo Client --------------- A UDP echo client, using the "loop.create_datagram_endpoint()" method, sends data and closes the transport when it receives the answer: import asyncio class EchoClientProtocol: def __init__(self, message, on_con_lost): self.message = message self.on_con_lost = on_con_lost self.transport = None def connection_made(self, transport): self.transport = transport print('Send:', self.message) self.transport.sendto(self.message.encode()) def datagram_received(self, data, addr): print("Received:", data.decode()) print("Close the socket") self.transport.close() def error_received(self, exc): print('Error received:', exc) def connection_lost(self, exc): print("Connection closed") self.on_con_lost.set_result(True) async def main(): # Get a reference to the event loop as we plan to use # low-level APIs. loop = asyncio.get_running_loop() on_con_lost = loop.create_future() message = "Hello World!" transport, protocol = await loop.create_datagram_endpoint( lambda: EchoClientProtocol(message, on_con_lost), remote_addr=('127.0.0.1', 9999)) try: await on_con_lost finally: transport.close() asyncio.run(main()) Connecting Existing Sockets --------------------------- Wait until a socket receives data using the "loop.create_connection()" method with a protocol: import asyncio import socket class MyProtocol(asyncio.Protocol): def __init__(self, on_con_lost): self.transport = None self.on_con_lost = on_con_lost def connection_made(self, transport): self.transport = transport def data_received(self, data): print("Received:", data.decode()) # We are done: close the transport; # connection_lost() will be called automatically. self.transport.close() def connection_lost(self, exc): # The socket has been closed self.on_con_lost.set_result(True) async def main(): # Get a reference to the event loop as we plan to use # low-level APIs. loop = asyncio.get_running_loop() on_con_lost = loop.create_future() # Create a pair of connected sockets rsock, wsock = socket.socketpair() # Register the socket to wait for data. transport, protocol = await loop.create_connection( lambda: MyProtocol(on_con_lost), sock=rsock) # Simulate the reception of data from the network. loop.call_soon(wsock.send, 'abc'.encode()) try: await protocol.on_con_lost finally: transport.close() wsock.close() asyncio.run(main()) See also: The watch a file descriptor for read events example uses the low- level "loop.add_reader()" method to register an FD. The register an open socket to wait for data using streams example uses high-level streams created by the "open_connection()" function in a coroutine. loop.subprocess_exec() and SubprocessProtocol --------------------------------------------- An example of a subprocess protocol used to get the output of a subprocess and to wait for the subprocess exit. The subprocess is created by the "loop.subprocess_exec()" method: import asyncio import sys class DateProtocol(asyncio.SubprocessProtocol): def __init__(self, exit_future): self.exit_future = exit_future self.output = bytearray() def pipe_data_received(self, fd, data): self.output.extend(data) def process_exited(self): self.exit_future.set_result(True) async def get_date(): # Get a reference to the event loop as we plan to use # low-level APIs. loop = asyncio.get_running_loop() code = 'import datetime; print(datetime.datetime.now())' exit_future = asyncio.Future(loop=loop) # Create the subprocess controlled by DateProtocol; # redirect the standard output into a pipe. transport, protocol = await loop.subprocess_exec( lambda: DateProtocol(exit_future), sys.executable, '-c', code, stdin=None, stderr=None) # Wait for the subprocess exit using the process_exited() # method of the protocol. await exit_future # Close the stdout pipe. transport.close() # Read the output which was collected by the # pipe_data_received() method of the protocol. data = bytes(protocol.output) return data.decode('ascii').rstrip() date = asyncio.run(get_date()) print(f"Current date: {date}") See also the same example written using high-level APIs. Queues ****** **Source code:** Lib/asyncio/queues.py ====================================================================== asyncio queues are designed to be similar to classes of the "queue" module. Although asyncio queues are not thread-safe, they are designed to be used specifically in async/await code. Note that methods of asyncio queues don’t have a *timeout* parameter; use "asyncio.wait_for()" function to do queue operations with a timeout. See also the Examples section below. Queue ===== class asyncio.Queue(maxsize=0, *, loop=None) A first in, first out (FIFO) queue. If *maxsize* is less than or equal to zero, the queue size is infinite. If it is an integer greater than "0", then "await put()" blocks when the queue reaches *maxsize* until an item is removed by "get()". Unlike the standard library threading "queue", the size of the queue is always known and can be returned by calling the "qsize()" method. Deprecated since version 3.8, will be removed in version 3.10: The *loop* parameter. This class is not thread safe. maxsize Number of items allowed in the queue. empty() Return "True" if the queue is empty, "False" otherwise. full() Return "True" if there are "maxsize" items in the queue. If the queue was initialized with "maxsize=0" (the default), then "full()" never returns "True". coroutine get() Remove and return an item from the queue. If queue is empty, wait until an item is available. get_nowait() Return an item if one is immediately available, else raise "QueueEmpty". coroutine join() Block until all items in the queue have been received and processed. The count of unfinished tasks goes up whenever an item is added to the queue. The count goes down whenever a consumer coroutine calls "task_done()" to indicate that the item was retrieved and all work on it is complete. When the count of unfinished tasks drops to zero, "join()" unblocks. coroutine put(item) Put an item into the queue. If the queue is full, wait until a free slot is available before adding the item. put_nowait(item) Put an item into the queue without blocking. If no free slot is immediately available, raise "QueueFull". qsize() Return the number of items in the queue. task_done() Indicate that a formerly enqueued task is complete. Used by queue consumers. For each "get()" used to fetch a task, a subsequent call to "task_done()" tells the queue that the processing on the task is complete. If a "join()" is currently blocking, it will resume when all items have been processed (meaning that a "task_done()" call was received for every item that had been "put()" into the queue). Raises "ValueError" if called more times than there were items placed in the queue. Priority Queue ============== class asyncio.PriorityQueue A variant of "Queue"; retrieves entries in priority order (lowest first). Entries are typically tuples of the form "(priority_number, data)". LIFO Queue ========== class asyncio.LifoQueue A variant of "Queue" that retrieves most recently added entries first (last in, first out). Exceptions ========== exception asyncio.QueueEmpty This exception is raised when the "get_nowait()" method is called on an empty queue. exception asyncio.QueueFull Exception raised when the "put_nowait()" method is called on a queue that has reached its *maxsize*. Examples ======== Queues can be used to distribute workload between several concurrent tasks: import asyncio import random import time async def worker(name, queue): while True: # Get a "work item" out of the queue. sleep_for = await queue.get() # Sleep for the "sleep_for" seconds. await asyncio.sleep(sleep_for) # Notify the queue that the "work item" has been processed. queue.task_done() print(f'{name} has slept for {sleep_for:.2f} seconds') async def main(): # Create a queue that we will use to store our "workload". queue = asyncio.Queue() # Generate random timings and put them into the queue. total_sleep_time = 0 for _ in range(20): sleep_for = random.uniform(0.05, 1.0) total_sleep_time += sleep_for queue.put_nowait(sleep_for) # Create three worker tasks to process the queue concurrently. tasks = [] for i in range(3): task = asyncio.create_task(worker(f'worker-{i}', queue)) tasks.append(task) # Wait until the queue is fully processed. started_at = time.monotonic() await queue.join() total_slept_for = time.monotonic() - started_at # Cancel our worker tasks. for task in tasks: task.cancel() # Wait until all worker tasks are cancelled. await asyncio.gather(*tasks, return_exceptions=True) print('====') print(f'3 workers slept in parallel for {total_slept_for:.2f} seconds') print(f'total expected sleep time: {total_sleep_time:.2f} seconds') asyncio.run(main()) Streams ******* **Source code:** Lib/asyncio/streams.py ====================================================================== Streams are high-level async/await-ready primitives to work with network connections. Streams allow sending and receiving data without using callbacks or low-level protocols and transports. Here is an example of a TCP echo client written using asyncio streams: import asyncio async def tcp_echo_client(message): reader, writer = await asyncio.open_connection( '127.0.0.1', 8888) print(f'Send: {message!r}') writer.write(message.encode()) await writer.drain() data = await reader.read(100) print(f'Received: {data.decode()!r}') print('Close the connection') writer.close() await writer.wait_closed() asyncio.run(tcp_echo_client('Hello World!')) See also the Examples section below. -[ Stream Functions ]- The following top-level asyncio functions can be used to create and work with streams: coroutine asyncio.open_connection(host=None, port=None, *, loop=None, limit=None, ssl=None, family=0, proto=0, flags=0, sock=None, local_addr=None, server_hostname=None, ssl_handshake_timeout=None, happy_eyeballs_delay=None, interleave=None) Establish a network connection and return a pair of "(reader, writer)" objects. The returned *reader* and *writer* objects are instances of "StreamReader" and "StreamWriter" classes. The *loop* argument is optional and can always be determined automatically when this function is awaited from a coroutine. *limit* determines the buffer size limit used by the returned "StreamReader" instance. By default the *limit* is set to 64 KiB. The rest of the arguments are passed directly to "loop.create_connection()". New in version 3.7: The *ssl_handshake_timeout* parameter. New in version 3.8: Added *happy_eyeballs_delay* and *interleave* parameters. coroutine asyncio.start_server(client_connected_cb, host=None, port=None, *, loop=None, limit=None, family=socket.AF_UNSPEC, flags=socket.AI_PASSIVE, sock=None, backlog=100, ssl=None, reuse_address=None, reuse_port=None, ssl_handshake_timeout=None, start_serving=True) Start a socket server. The *client_connected_cb* callback is called whenever a new client connection is established. It receives a "(reader, writer)" pair as two arguments, instances of the "StreamReader" and "StreamWriter" classes. *client_connected_cb* can be a plain callable or a coroutine function; if it is a coroutine function, it will be automatically scheduled as a "Task". The *loop* argument is optional and can always be determined automatically when this method is awaited from a coroutine. *limit* determines the buffer size limit used by the returned "StreamReader" instance. By default the *limit* is set to 64 KiB. The rest of the arguments are passed directly to "loop.create_server()". New in version 3.7: The *ssl_handshake_timeout* and *start_serving* parameters. -[ Unix Sockets ]- coroutine asyncio.open_unix_connection(path=None, *, loop=None, limit=None, ssl=None, sock=None, server_hostname=None, ssl_handshake_timeout=None) Establish a Unix socket connection and return a pair of "(reader, writer)". Similar to "open_connection()" but operates on Unix sockets. See also the documentation of "loop.create_unix_connection()". Availability: Unix. New in version 3.7: The *ssl_handshake_timeout* parameter. Changed in version 3.7: The *path* parameter can now be a *path- like object* coroutine asyncio.start_unix_server(client_connected_cb, path=None, *, loop=None, limit=None, sock=None, backlog=100, ssl=None, ssl_handshake_timeout=None, start_serving=True) Start a Unix socket server. Similar to "start_server()" but works with Unix sockets. See also the documentation of "loop.create_unix_server()". Availability: Unix. New in version 3.7: The *ssl_handshake_timeout* and *start_serving* parameters. Changed in version 3.7: The *path* parameter can now be a *path- like object*. StreamReader ============ class asyncio.StreamReader Represents a reader object that provides APIs to read data from the IO stream. It is not recommended to instantiate *StreamReader* objects directly; use "open_connection()" and "start_server()" instead. coroutine read(n=-1) Read up to *n* bytes. If *n* is not provided, or set to "-1", read until EOF and return all read bytes. If EOF was received and the internal buffer is empty, return an empty "bytes" object. coroutine readline() Read one line, where “line” is a sequence of bytes ending with "\n". If EOF is received and "\n" was not found, the method returns partially read data. If EOF is received and the internal buffer is empty, return an empty "bytes" object. coroutine readexactly(n) Read exactly *n* bytes. Raise an "IncompleteReadError" if EOF is reached before *n* can be read. Use the "IncompleteReadError.partial" attribute to get the partially read data. coroutine readuntil(separator=b'\n') Read data from the stream until *separator* is found. On success, the data and separator will be removed from the internal buffer (consumed). Returned data will include the separator at the end. If the amount of data read exceeds the configured stream limit, a "LimitOverrunError" exception is raised, and the data is left in the internal buffer and can be read again. If EOF is reached before the complete separator is found, an "IncompleteReadError" exception is raised, and the internal buffer is reset. The "IncompleteReadError.partial" attribute may contain a portion of the separator. New in version 3.5.2. at_eof() Return "True" if the buffer is empty and "feed_eof()" was called. StreamWriter ============ class asyncio.StreamWriter Represents a writer object that provides APIs to write data to the IO stream. It is not recommended to instantiate *StreamWriter* objects directly; use "open_connection()" and "start_server()" instead. write(data) The method attempts to write the *data* to the underlying socket immediately. If that fails, the data is queued in an internal write buffer until it can be sent. The method should be used along with the "drain()" method: stream.write(data) await stream.drain() writelines(data) The method writes a list (or any iterable) of bytes to the underlying socket immediately. If that fails, the data is queued in an internal write buffer until it can be sent. The method should be used along with the "drain()" method: stream.writelines(lines) await stream.drain() close() The method closes the stream and the underlying socket. The method should be used along with the "wait_closed()" method: stream.close() await stream.wait_closed() can_write_eof() Return "True" if the underlying transport supports the "write_eof()" method, "False" otherwise. write_eof() Close the write end of the stream after the buffered write data is flushed. transport Return the underlying asyncio transport. get_extra_info(name, default=None) Access optional transport information; see "BaseTransport.get_extra_info()" for details. coroutine drain() Wait until it is appropriate to resume writing to the stream. Example: writer.write(data) await writer.drain() This is a flow control method that interacts with the underlying IO write buffer. When the size of the buffer reaches the high watermark, *drain()* blocks until the size of the buffer is drained down to the low watermark and writing can be resumed. When there is nothing to wait for, the "drain()" returns immediately. is_closing() Return "True" if the stream is closed or in the process of being closed. New in version 3.7. coroutine wait_closed() Wait until the stream is closed. Should be called after "close()" to wait until the underlying connection is closed. New in version 3.7. Examples ======== TCP echo client using streams ----------------------------- TCP echo client using the "asyncio.open_connection()" function: import asyncio async def tcp_echo_client(message): reader, writer = await asyncio.open_connection( '127.0.0.1', 8888) print(f'Send: {message!r}') writer.write(message.encode()) data = await reader.read(100) print(f'Received: {data.decode()!r}') print('Close the connection') writer.close() asyncio.run(tcp_echo_client('Hello World!')) See also: The TCP echo client protocol example uses the low-level "loop.create_connection()" method. TCP echo server using streams ----------------------------- TCP echo server using the "asyncio.start_server()" function: import asyncio async def handle_echo(reader, writer): data = await reader.read(100) message = data.decode() addr = writer.get_extra_info('peername') print(f"Received {message!r} from {addr!r}") print(f"Send: {message!r}") writer.write(data) await writer.drain() print("Close the connection") writer.close() async def main(): server = await asyncio.start_server( handle_echo, '127.0.0.1', 8888) addrs = ', '.join(str(sock.getsockname()) for sock in server.sockets) print(f'Serving on {addrs}') async with server: await server.serve_forever() asyncio.run(main()) See also: The TCP echo server protocol example uses the "loop.create_server()" method. Get HTTP headers ---------------- Simple example querying HTTP headers of the URL passed on the command line: import asyncio import urllib.parse import sys async def print_http_headers(url): url = urllib.parse.urlsplit(url) if url.scheme == 'https': reader, writer = await asyncio.open_connection( url.hostname, 443, ssl=True) else: reader, writer = await asyncio.open_connection( url.hostname, 80) query = ( f"HEAD {url.path or '/'} HTTP/1.0\r\n" f"Host: {url.hostname}\r\n" f"\r\n" ) writer.write(query.encode('latin-1')) while True: line = await reader.readline() if not line: break line = line.decode('latin1').rstrip() if line: print(f'HTTP header> {line}') # Ignore the body, close the socket writer.close() url = sys.argv[1] asyncio.run(print_http_headers(url)) Usage: python example.py http://example.com/path/page.html or with HTTPS: python example.py https://example.com/path/page.html Register an open socket to wait for data using streams ------------------------------------------------------ Coroutine waiting until a socket receives data using the "open_connection()" function: import asyncio import socket async def wait_for_data(): # Get a reference to the current event loop because # we want to access low-level APIs. loop = asyncio.get_running_loop() # Create a pair of connected sockets. rsock, wsock = socket.socketpair() # Register the open socket to wait for data. reader, writer = await asyncio.open_connection(sock=rsock) # Simulate the reception of data from the network loop.call_soon(wsock.send, 'abc'.encode()) # Wait for data data = await reader.read(100) # Got data, we are done: close the socket print("Received:", data.decode()) writer.close() # Close the second socket wsock.close() asyncio.run(wait_for_data()) See also: The register an open socket to wait for data using a protocol example uses a low-level protocol and the "loop.create_connection()" method. The watch a file descriptor for read events example uses the low- level "loop.add_reader()" method to watch a file descriptor. Subprocesses ************ **Source code:** Lib/asyncio/subprocess.py, Lib/asyncio/base_subprocess.py ====================================================================== This section describes high-level async/await asyncio APIs to create and manage subprocesses. Here’s an example of how asyncio can run a shell command and obtain its result: import asyncio async def run(cmd): proc = await asyncio.create_subprocess_shell( cmd, stdout=asyncio.subprocess.PIPE, stderr=asyncio.subprocess.PIPE) stdout, stderr = await proc.communicate() print(f'[{cmd!r} exited with {proc.returncode}]') if stdout: print(f'[stdout]\n{stdout.decode()}') if stderr: print(f'[stderr]\n{stderr.decode()}') asyncio.run(run('ls /zzz')) will print: ['ls /zzz' exited with 1] [stderr] ls: /zzz: No such file or directory Because all asyncio subprocess functions are asynchronous and asyncio provides many tools to work with such functions, it is easy to execute and monitor multiple subprocesses in parallel. It is indeed trivial to modify the above example to run several commands simultaneously: async def main(): await asyncio.gather( run('ls /zzz'), run('sleep 1; echo "hello"')) asyncio.run(main()) See also the Examples subsection. Creating Subprocesses ===================== coroutine asyncio.create_subprocess_exec(program, *args, stdin=None, stdout=None, stderr=None, loop=None, limit=None, **kwds) Create a subprocess. The *limit* argument sets the buffer limit for "StreamReader" wrappers for "Process.stdout" and "Process.stderr" (if "subprocess.PIPE" is passed to *stdout* and *stderr* arguments). Return a "Process" instance. See the documentation of "loop.subprocess_exec()" for other parameters. Deprecated since version 3.8, will be removed in version 3.10: The *loop* parameter. coroutine asyncio.create_subprocess_shell(cmd, stdin=None, stdout=None, stderr=None, loop=None, limit=None, **kwds) Run the *cmd* shell command. The *limit* argument sets the buffer limit for "StreamReader" wrappers for "Process.stdout" and "Process.stderr" (if "subprocess.PIPE" is passed to *stdout* and *stderr* arguments). Return a "Process" instance. See the documentation of "loop.subprocess_shell()" for other parameters. Important: It is the application’s responsibility to ensure that all whitespace and special characters are quoted appropriately to avoid shell injection vulnerabilities. The "shlex.quote()" function can be used to properly escape whitespace and special shell characters in strings that are going to be used to construct shell commands. Deprecated since version 3.8, will be removed in version 3.10: The *loop* parameter. Note: Subprocesses are available for Windows if a "ProactorEventLoop" is used. See Subprocess Support on Windows for details. See also: asyncio also has the following *low-level* APIs to work with subprocesses: "loop.subprocess_exec()", "loop.subprocess_shell()", "loop.connect_read_pipe()", "loop.connect_write_pipe()", as well as the Subprocess Transports and Subprocess Protocols. Constants ========= asyncio.subprocess.PIPE Can be passed to the *stdin*, *stdout* or *stderr* parameters. If *PIPE* is passed to *stdin* argument, the "Process.stdin" attribute will point to a "StreamWriter" instance. If *PIPE* is passed to *stdout* or *stderr* arguments, the "Process.stdout" and "Process.stderr" attributes will point to "StreamReader" instances. asyncio.subprocess.STDOUT Special value that can be used as the *stderr* argument and indicates that standard error should be redirected into standard output. asyncio.subprocess.DEVNULL Special value that can be used as the *stdin*, *stdout* or *stderr* argument to process creation functions. It indicates that the special file "os.devnull" will be used for the corresponding subprocess stream. Interacting with Subprocesses ============================= Both "create_subprocess_exec()" and "create_subprocess_shell()" functions return instances of the *Process* class. *Process* is a high-level wrapper that allows communicating with subprocesses and watching for their completion. class asyncio.subprocess.Process An object that wraps OS processes created by the "create_subprocess_exec()" and "create_subprocess_shell()" functions. This class is designed to have a similar API to the "subprocess.Popen" class, but there are some notable differences: * unlike Popen, Process instances do not have an equivalent to the "poll()" method; * the "communicate()" and "wait()" methods don’t have a *timeout* parameter: use the "wait_for()" function; * the "Process.wait()" method is asynchronous, whereas "subprocess.Popen.wait()" method is implemented as a blocking busy loop; * the *universal_newlines* parameter is not supported. This class is not thread safe. See also the Subprocess and Threads section. coroutine wait() Wait for the child process to terminate. Set and return the "returncode" attribute. Note: This method can deadlock when using "stdout=PIPE" or "stderr=PIPE" and the child process generates so much output that it blocks waiting for the OS pipe buffer to accept more data. Use the "communicate()" method when using pipes to avoid this condition. coroutine communicate(input=None) Interact with process: 1. send data to *stdin* (if *input* is not "None"); 2. read data from *stdout* and *stderr*, until EOF is reached; 3. wait for process to terminate. The optional *input* argument is the data ("bytes" object) that will be sent to the child process. Return a tuple "(stdout_data, stderr_data)". If either "BrokenPipeError" or "ConnectionResetError" exception is raised when writing *input* into *stdin*, the exception is ignored. This condition occurs when the process exits before all data are written into *stdin*. If it is desired to send data to the process’ *stdin*, the process needs to be created with "stdin=PIPE". Similarly, to get anything other than "None" in the result tuple, the process has to be created with "stdout=PIPE" and/or "stderr=PIPE" arguments. Note, that the data read is buffered in memory, so do not use this method if the data size is large or unlimited. send_signal(signal) Sends the signal *signal* to the child process. Note: On Windows, "SIGTERM" is an alias for "terminate()". "CTRL_C_EVENT" and "CTRL_BREAK_EVENT" can be sent to processes started with a *creationflags* parameter which includes "CREATE_NEW_PROCESS_GROUP". terminate() Stop the child process. On POSIX systems this method sends "signal.SIGTERM" to the child process. On Windows the Win32 API function "TerminateProcess()" is called to stop the child process. kill() Kill the child process. On POSIX systems this method sends "SIGKILL" to the child process. On Windows this method is an alias for "terminate()". stdin Standard input stream ("StreamWriter") or "None" if the process was created with "stdin=None". stdout Standard output stream ("StreamReader") or "None" if the process was created with "stdout=None". stderr Standard error stream ("StreamReader") or "None" if the process was created with "stderr=None". Warning: Use the "communicate()" method rather than "process.stdin.write()", "await process.stdout.read()" or "await process.stderr.read()". This avoids deadlocks due to streams pausing reading or writing and blocking the child process. pid Process identification number (PID). Note that for processes created by the "create_subprocess_shell()" function, this attribute is the PID of the spawned shell. returncode Return code of the process when it exits. A "None" value indicates that the process has not terminated yet. A negative value "-N" indicates that the child was terminated by signal "N" (POSIX only). Subprocess and Threads ---------------------- Standard asyncio event loop supports running subprocesses from different threads by default. On Windows subprocesses are provided by "ProactorEventLoop" only (default), "SelectorEventLoop" has no subprocess support. On UNIX *child watchers* are used for subprocess finish waiting, see Process Watchers for more info. Changed in version 3.8: UNIX switched to use "ThreadedChildWatcher" for spawning subprocesses from different threads without any limitation.Spawning a subprocess with *inactive* current child watcher raises "RuntimeError". Note that alternative event loop implementations might have own limitations; please refer to their documentation. See also: The Concurrency and multithreading in asyncio section. Examples -------- An example using the "Process" class to control a subprocess and the "StreamReader" class to read from its standard output. The subprocess is created by the "create_subprocess_exec()" function: import asyncio import sys async def get_date(): code = 'import datetime; print(datetime.datetime.now())' # Create the subprocess; redirect the standard output # into a pipe. proc = await asyncio.create_subprocess_exec( sys.executable, '-c', code, stdout=asyncio.subprocess.PIPE) # Read one line of output. data = await proc.stdout.readline() line = data.decode('ascii').rstrip() # Wait for the subprocess exit. await proc.wait() return line date = asyncio.run(get_date()) print(f"Current date: {date}") See also the same example written using low-level APIs. Synchronization Primitives ************************** **Source code:** Lib/asyncio/locks.py ====================================================================== asyncio synchronization primitives are designed to be similar to those of the "threading" module with two important caveats: * asyncio primitives are not thread-safe, therefore they should not be used for OS thread synchronization (use "threading" for that); * methods of these synchronization primitives do not accept the *timeout* argument; use the "asyncio.wait_for()" function to perform operations with timeouts. asyncio has the following basic synchronization primitives: * "Lock" * "Event" * "Condition" * "Semaphore" * "BoundedSemaphore" ====================================================================== Lock ==== class asyncio.Lock(*, loop=None) Implements a mutex lock for asyncio tasks. Not thread-safe. An asyncio lock can be used to guarantee exclusive access to a shared resource. The preferred way to use a Lock is an "async with" statement: lock = asyncio.Lock() # ... later async with lock: # access shared state which is equivalent to: lock = asyncio.Lock() # ... later await lock.acquire() try: # access shared state finally: lock.release() Deprecated since version 3.8, will be removed in version 3.10: The *loop* parameter. coroutine acquire() Acquire the lock. This method waits until the lock is *unlocked*, sets it to *locked* and returns "True". When more than one coroutine is blocked in "acquire()" waiting for the lock to be unlocked, only one coroutine eventually proceeds. Acquiring a lock is *fair*: the coroutine that proceeds will be the first coroutine that started waiting on the lock. release() Release the lock. When the lock is *locked*, reset it to *unlocked* and return. If the lock is *unlocked*, a "RuntimeError" is raised. locked() Return "True" if the lock is *locked*. Event ===== class asyncio.Event(*, loop=None) An event object. Not thread-safe. An asyncio event can be used to notify multiple asyncio tasks that some event has happened. An Event object manages an internal flag that can be set to *true* with the "set()" method and reset to *false* with the "clear()" method. The "wait()" method blocks until the flag is set to *true*. The flag is set to *false* initially. Deprecated since version 3.8, will be removed in version 3.10: The *loop* parameter. Example: async def waiter(event): print('waiting for it ...') await event.wait() print('... got it!') async def main(): # Create an Event object. event = asyncio.Event() # Spawn a Task to wait until 'event' is set. waiter_task = asyncio.create_task(waiter(event)) # Sleep for 1 second and set the event. await asyncio.sleep(1) event.set() # Wait until the waiter task is finished. await waiter_task asyncio.run(main()) coroutine wait() Wait until the event is set. If the event is set, return "True" immediately. Otherwise block until another task calls "set()". set() Set the event. All tasks waiting for event to be set will be immediately awakened. clear() Clear (unset) the event. Tasks awaiting on "wait()" will now block until the "set()" method is called again. is_set() Return "True" if the event is set. Condition ========= class asyncio.Condition(lock=None, *, loop=None) A Condition object. Not thread-safe. An asyncio condition primitive can be used by a task to wait for some event to happen and then get exclusive access to a shared resource. In essence, a Condition object combines the functionality of an "Event" and a "Lock". It is possible to have multiple Condition objects share one Lock, which allows coordinating exclusive access to a shared resource between different tasks interested in particular states of that shared resource. The optional *lock* argument must be a "Lock" object or "None". In the latter case a new Lock object is created automatically. Deprecated since version 3.8, will be removed in version 3.10: The *loop* parameter. The preferred way to use a Condition is an "async with" statement: cond = asyncio.Condition() # ... later async with cond: await cond.wait() which is equivalent to: cond = asyncio.Condition() # ... later await cond.acquire() try: await cond.wait() finally: cond.release() coroutine acquire() Acquire the underlying lock. This method waits until the underlying lock is *unlocked*, sets it to *locked* and returns "True". notify(n=1) Wake up at most *n* tasks (1 by default) waiting on this condition. The method is no-op if no tasks are waiting. The lock must be acquired before this method is called and released shortly after. If called with an *unlocked* lock a "RuntimeError" error is raised. locked() Return "True" if the underlying lock is acquired. notify_all() Wake up all tasks waiting on this condition. This method acts like "notify()", but wakes up all waiting tasks. The lock must be acquired before this method is called and released shortly after. If called with an *unlocked* lock a "RuntimeError" error is raised. release() Release the underlying lock. When invoked on an unlocked lock, a "RuntimeError" is raised. coroutine wait() Wait until notified. If the calling task has not acquired the lock when this method is called, a "RuntimeError" is raised. This method releases the underlying lock, and then blocks until it is awakened by a "notify()" or "notify_all()" call. Once awakened, the Condition re-acquires its lock and this method returns "True". coroutine wait_for(predicate) Wait until a predicate becomes *true*. The predicate must be a callable which result will be interpreted as a boolean value. The final value is the return value. Semaphore ========= class asyncio.Semaphore(value=1, *, loop=None) A Semaphore object. Not thread-safe. A semaphore manages an internal counter which is decremented by each "acquire()" call and incremented by each "release()" call. The counter can never go below zero; when "acquire()" finds that it is zero, it blocks, waiting until some task calls "release()". The optional *value* argument gives the initial value for the internal counter ("1" by default). If the given value is less than "0" a "ValueError" is raised. Deprecated since version 3.8, will be removed in version 3.10: The *loop* parameter. The preferred way to use a Semaphore is an "async with" statement: sem = asyncio.Semaphore(10) # ... later async with sem: # work with shared resource which is equivalent to: sem = asyncio.Semaphore(10) # ... later await sem.acquire() try: # work with shared resource finally: sem.release() coroutine acquire() Acquire a semaphore. If the internal counter is greater than zero, decrement it by one and return "True" immediately. If it is zero, wait until a "release()" is called and return "True". locked() Returns "True" if semaphore can not be acquired immediately. release() Release a semaphore, incrementing the internal counter by one. Can wake up a task waiting to acquire the semaphore. Unlike "BoundedSemaphore", "Semaphore" allows making more "release()" calls than "acquire()" calls. BoundedSemaphore ================ class asyncio.BoundedSemaphore(value=1, *, loop=None) A bounded semaphore object. Not thread-safe. Bounded Semaphore is a version of "Semaphore" that raises a "ValueError" in "release()" if it increases the internal counter above the initial *value*. Deprecated since version 3.8, will be removed in version 3.10: The *loop* parameter. ====================================================================== Changed in version 3.9: Acquiring a lock using "await lock" or "yield from lock" and/or "with" statement ("with await lock", "with (yield from lock)") was removed. Use "async with lock" instead. Coroutines and Tasks ******************** This section outlines high-level asyncio APIs to work with coroutines and Tasks. * Coroutines * Awaitables * Running an asyncio Program * Creating Tasks * Sleeping * Running Tasks Concurrently * Shielding From Cancellation * Timeouts * Waiting Primitives * Running in Threads * Scheduling From Other Threads * Introspection * Task Object * Generator-based Coroutines Coroutines ========== *Coroutines* declared with the async/await syntax is the preferred way of writing asyncio applications. For example, the following snippet of code prints “hello”, waits 1 second, and then prints “world”: >>> import asyncio >>> async def main(): ... print('hello') ... await asyncio.sleep(1) ... print('world') >>> asyncio.run(main()) hello world Note that simply calling a coroutine will not schedule it to be executed: >>> main() To actually run a coroutine, asyncio provides three main mechanisms: * The "asyncio.run()" function to run the top-level entry point “main()” function (see the above example.) * Awaiting on a coroutine. The following snippet of code will print “hello” after waiting for 1 second, and then print “world” after waiting for *another* 2 seconds: import asyncio import time async def say_after(delay, what): await asyncio.sleep(delay) print(what) async def main(): print(f"started at {time.strftime('%X')}") await say_after(1, 'hello') await say_after(2, 'world') print(f"finished at {time.strftime('%X')}") asyncio.run(main()) Expected output: started at 17:13:52 hello world finished at 17:13:55 * The "asyncio.create_task()" function to run coroutines concurrently as asyncio "Tasks". Let’s modify the above example and run two "say_after" coroutines *concurrently*: async def main(): task1 = asyncio.create_task( say_after(1, 'hello')) task2 = asyncio.create_task( say_after(2, 'world')) print(f"started at {time.strftime('%X')}") # Wait until both tasks are completed (should take # around 2 seconds.) await task1 await task2 print(f"finished at {time.strftime('%X')}") Note that expected output now shows that the snippet runs 1 second faster than before: started at 17:14:32 hello world finished at 17:14:34 Awaitables ========== We say that an object is an **awaitable** object if it can be used in an "await" expression. Many asyncio APIs are designed to accept awaitables. There are three main types of *awaitable* objects: **coroutines**, **Tasks**, and **Futures**. -[ Coroutines ]- Python coroutines are *awaitables* and therefore can be awaited from other coroutines: import asyncio async def nested(): return 42 async def main(): # Nothing happens if we just call "nested()". # A coroutine object is created but not awaited, # so it *won't run at all*. nested() # Let's do it differently now and await it: print(await nested()) # will print "42". asyncio.run(main()) Important: In this documentation the term “coroutine” can be used for two closely related concepts: * a *coroutine function*: an "async def" function; * a *coroutine object*: an object returned by calling a *coroutine function*. asyncio also supports legacy generator-based coroutines. -[ Tasks ]- *Tasks* are used to schedule coroutines *concurrently*. When a coroutine is wrapped into a *Task* with functions like "asyncio.create_task()" the coroutine is automatically scheduled to run soon: import asyncio async def nested(): return 42 async def main(): # Schedule nested() to run soon concurrently # with "main()". task = asyncio.create_task(nested()) # "task" can now be used to cancel "nested()", or # can simply be awaited to wait until it is complete: await task asyncio.run(main()) -[ Futures ]- A "Future" is a special **low-level** awaitable object that represents an **eventual result** of an asynchronous operation. When a Future object is *awaited* it means that the coroutine will wait until the Future is resolved in some other place. Future objects in asyncio are needed to allow callback-based code to be used with async/await. Normally **there is no need** to create Future objects at the application level code. Future objects, sometimes exposed by libraries and some asyncio APIs, can be awaited: async def main(): await function_that_returns_a_future_object() # this is also valid: await asyncio.gather( function_that_returns_a_future_object(), some_python_coroutine() ) A good example of a low-level function that returns a Future object is "loop.run_in_executor()". Running an asyncio Program ========================== asyncio.run(coro, *, debug=False) Execute the *coroutine* *coro* and return the result. This function runs the passed coroutine, taking care of managing the asyncio event loop, *finalizing asynchronous generators*, and closing the threadpool. This function cannot be called when another asyncio event loop is running in the same thread. If *debug* is "True", the event loop will be run in debug mode. This function always creates a new event loop and closes it at the end. It should be used as a main entry point for asyncio programs, and should ideally only be called once. Example: async def main(): await asyncio.sleep(1) print('hello') asyncio.run(main()) New in version 3.7. Changed in version 3.9: Updated to use "loop.shutdown_default_executor()". Note: The source code for "asyncio.run()" can be found in Lib/asyncio/runners.py. Creating Tasks ============== asyncio.create_task(coro, *, name=None) Wrap the *coro* coroutine into a "Task" and schedule its execution. Return the Task object. If *name* is not "None", it is set as the name of the task using "Task.set_name()". The task is executed in the loop returned by "get_running_loop()", "RuntimeError" is raised if there is no running loop in current thread. Important: Save a reference to the result of this function, to avoid a task disappearing mid execution. New in version 3.7. Changed in version 3.8: Added the "name" parameter. Sleeping ======== coroutine asyncio.sleep(delay, result=None, *, loop=None) Block for *delay* seconds. If *result* is provided, it is returned to the caller when the coroutine completes. "sleep()" always suspends the current task, allowing other tasks to run. Setting the delay to 0 provides an optimized path to allow other tasks to run. This can be used by long-running functions to avoid blocking the event loop for the full duration of the function call. Deprecated since version 3.8, will be removed in version 3.10: The *loop* parameter. Example of coroutine displaying the current date every second for 5 seconds: import asyncio import datetime async def display_date(): loop = asyncio.get_running_loop() end_time = loop.time() + 5.0 while True: print(datetime.datetime.now()) if (loop.time() + 1.0) >= end_time: break await asyncio.sleep(1) asyncio.run(display_date()) Running Tasks Concurrently ========================== awaitable asyncio.gather(*aws, loop=None, return_exceptions=False) Run awaitable objects in the *aws* sequence *concurrently*. If any awaitable in *aws* is a coroutine, it is automatically scheduled as a Task. If all awaitables are completed successfully, the result is an aggregate list of returned values. The order of result values corresponds to the order of awaitables in *aws*. If *return_exceptions* is "False" (default), the first raised exception is immediately propagated to the task that awaits on "gather()". Other awaitables in the *aws* sequence **won’t be cancelled** and will continue to run. If *return_exceptions* is "True", exceptions are treated the same as successful results, and aggregated in the result list. If "gather()" is *cancelled*, all submitted awaitables (that have not completed yet) are also *cancelled*. If any Task or Future from the *aws* sequence is *cancelled*, it is treated as if it raised "CancelledError" – the "gather()" call is **not** cancelled in this case. This is to prevent the cancellation of one submitted Task/Future to cause other Tasks/Futures to be cancelled. Deprecated since version 3.8, will be removed in version 3.10: The *loop* parameter. Example: import asyncio async def factorial(name, number): f = 1 for i in range(2, number + 1): print(f"Task {name}: Compute factorial({number}), currently i={i}...") await asyncio.sleep(1) f *= i print(f"Task {name}: factorial({number}) = {f}") return f async def main(): # Schedule three calls *concurrently*: L = await asyncio.gather( factorial("A", 2), factorial("B", 3), factorial("C", 4), ) print(L) asyncio.run(main()) # Expected output: # # Task A: Compute factorial(2), currently i=2... # Task B: Compute factorial(3), currently i=2... # Task C: Compute factorial(4), currently i=2... # Task A: factorial(2) = 2 # Task B: Compute factorial(3), currently i=3... # Task C: Compute factorial(4), currently i=3... # Task B: factorial(3) = 6 # Task C: Compute factorial(4), currently i=4... # Task C: factorial(4) = 24 # [2, 6, 24] Note: If *return_exceptions* is False, cancelling gather() after it has been marked done won’t cancel any submitted awaitables. For instance, gather can be marked done after propagating an exception to the caller, therefore, calling "gather.cancel()" after catching an exception (raised by one of the awaitables) from gather won’t cancel any other awaitables. Changed in version 3.7: If the *gather* itself is cancelled, the cancellation is propagated regardless of *return_exceptions*. Shielding From Cancellation =========================== awaitable asyncio.shield(aw, *, loop=None) Protect an awaitable object from being "cancelled". If *aw* is a coroutine it is automatically scheduled as a Task. The statement: res = await shield(something()) is equivalent to: res = await something() *except* that if the coroutine containing it is cancelled, the Task running in "something()" is not cancelled. From the point of view of "something()", the cancellation did not happen. Although its caller is still cancelled, so the “await” expression still raises a "CancelledError". If "something()" is cancelled by other means (i.e. from within itself) that would also cancel "shield()". If it is desired to completely ignore cancellation (not recommended) the "shield()" function should be combined with a try/except clause, as follows: try: res = await shield(something()) except CancelledError: res = None Deprecated since version 3.8, will be removed in version 3.10: The *loop* parameter. Timeouts ======== coroutine asyncio.wait_for(aw, timeout, *, loop=None) Wait for the *aw* awaitable to complete with a timeout. If *aw* is a coroutine it is automatically scheduled as a Task. *timeout* can either be "None" or a float or int number of seconds to wait for. If *timeout* is "None", block until the future completes. If a timeout occurs, it cancels the task and raises "asyncio.TimeoutError". To avoid the task "cancellation", wrap it in "shield()". The function will wait until the future is actually cancelled, so the total wait time may exceed the *timeout*. If an exception happens during cancellation, it is propagated. If the wait is cancelled, the future *aw* is also cancelled. Deprecated since version 3.8, will be removed in version 3.10: The *loop* parameter. Example: async def eternity(): # Sleep for one hour await asyncio.sleep(3600) print('yay!') async def main(): # Wait for at most 1 second try: await asyncio.wait_for(eternity(), timeout=1.0) except asyncio.TimeoutError: print('timeout!') asyncio.run(main()) # Expected output: # # timeout! Changed in version 3.7: When *aw* is cancelled due to a timeout, "wait_for" waits for *aw* to be cancelled. Previously, it raised "asyncio.TimeoutError" immediately. Waiting Primitives ================== coroutine asyncio.wait(aws, *, loop=None, timeout=None, return_when=ALL_COMPLETED) Run awaitable objects in the *aws* iterable concurrently and block until the condition specified by *return_when*. The *aws* iterable must not be empty. Returns two sets of Tasks/Futures: "(done, pending)". Usage: done, pending = await asyncio.wait(aws) *timeout* (a float or int), if specified, can be used to control the maximum number of seconds to wait before returning. Note that this function does not raise "asyncio.TimeoutError". Futures or Tasks that aren’t done when the timeout occurs are simply returned in the second set. *return_when* indicates when this function should return. It must be one of the following constants: +-------------------------------+------------------------------------------+ | Constant | Description | |===============================|==========================================| | "FIRST_COMPLETED" | The function will return when any future | | | finishes or is cancelled. | +-------------------------------+------------------------------------------+ | "FIRST_EXCEPTION" | The function will return when any future | | | finishes by raising an exception. If no | | | future raises an exception then it is | | | equivalent to "ALL_COMPLETED". | +-------------------------------+------------------------------------------+ | "ALL_COMPLETED" | The function will return when all | | | futures finish or are cancelled. | +-------------------------------+------------------------------------------+ Unlike "wait_for()", "wait()" does not cancel the futures when a timeout occurs. Deprecated since version 3.8: If any awaitable in *aws* is a coroutine, it is automatically scheduled as a Task. Passing coroutines objects to "wait()" directly is deprecated as it leads to confusing behavior. Deprecated since version 3.8, will be removed in version 3.10: The *loop* parameter. Note: "wait()" schedules coroutines as Tasks automatically and later returns those implicitly created Task objects in "(done, pending)" sets. Therefore the following code won’t work as expected: async def foo(): return 42 coro = foo() done, pending = await asyncio.wait({coro}) if coro in done: # This branch will never be run! Here is how the above snippet can be fixed: async def foo(): return 42 task = asyncio.create_task(foo()) done, pending = await asyncio.wait({task}) if task in done: # Everything will work as expected now. Deprecated since version 3.8, will be removed in version 3.11: Passing coroutine objects to "wait()" directly is deprecated. asyncio.as_completed(aws, *, loop=None, timeout=None) Run awaitable objects in the *aws* iterable concurrently. Return an iterator of coroutines. Each coroutine returned can be awaited to get the earliest next result from the iterable of the remaining awaitables. Raises "asyncio.TimeoutError" if the timeout occurs before all Futures are done. Deprecated since version 3.8, will be removed in version 3.10: The *loop* parameter. Example: for coro in as_completed(aws): earliest_result = await coro # ... Running in Threads ================== coroutine asyncio.to_thread(func, /, *args, **kwargs) Asynchronously run function *func* in a separate thread. Any *args and **kwargs supplied for this function are directly passed to *func*. Also, the current "contextvars.Context" is propagated, allowing context variables from the event loop thread to be accessed in the separate thread. Return a coroutine that can be awaited to get the eventual result of *func*. This coroutine function is primarily intended to be used for executing IO-bound functions/methods that would otherwise block the event loop if they were ran in the main thread. For example: def blocking_io(): print(f"start blocking_io at {time.strftime('%X')}") # Note that time.sleep() can be replaced with any blocking # IO-bound operation, such as file operations. time.sleep(1) print(f"blocking_io complete at {time.strftime('%X')}") async def main(): print(f"started main at {time.strftime('%X')}") await asyncio.gather( asyncio.to_thread(blocking_io), asyncio.sleep(1)) print(f"finished main at {time.strftime('%X')}") asyncio.run(main()) # Expected output: # # started main at 19:50:53 # start blocking_io at 19:50:53 # blocking_io complete at 19:50:54 # finished main at 19:50:54 Directly calling *blocking_io()* in any coroutine would block the event loop for its duration, resulting in an additional 1 second of run time. Instead, by using *asyncio.to_thread()*, we can run it in a separate thread without blocking the event loop. Note: Due to the *GIL*, *asyncio.to_thread()* can typically only be used to make IO-bound functions non-blocking. However, for extension modules that release the GIL or alternative Python implementations that don’t have one, *asyncio.to_thread()* can also be used for CPU-bound functions. New in version 3.9. Scheduling From Other Threads ============================= asyncio.run_coroutine_threadsafe(coro, loop) Submit a coroutine to the given event loop. Thread-safe. Return a "concurrent.futures.Future" to wait for the result from another OS thread. This function is meant to be called from a different OS thread than the one where the event loop is running. Example: # Create a coroutine coro = asyncio.sleep(1, result=3) # Submit the coroutine to a given loop future = asyncio.run_coroutine_threadsafe(coro, loop) # Wait for the result with an optional timeout argument assert future.result(timeout) == 3 If an exception is raised in the coroutine, the returned Future will be notified. It can also be used to cancel the task in the event loop: try: result = future.result(timeout) except asyncio.TimeoutError: print('The coroutine took too long, cancelling the task...') future.cancel() except Exception as exc: print(f'The coroutine raised an exception: {exc!r}') else: print(f'The coroutine returned: {result!r}') See the concurrency and multithreading section of the documentation. Unlike other asyncio functions this function requires the *loop* argument to be passed explicitly. New in version 3.5.1. Introspection ============= asyncio.current_task(loop=None) Return the currently running "Task" instance, or "None" if no task is running. If *loop* is "None" "get_running_loop()" is used to get the current loop. New in version 3.7. asyncio.all_tasks(loop=None) Return a set of not yet finished "Task" objects run by the loop. If *loop* is "None", "get_running_loop()" is used for getting current loop. New in version 3.7. Task Object =========== class asyncio.Task(coro, *, loop=None, name=None) A "Future-like" object that runs a Python coroutine. Not thread- safe. Tasks are used to run coroutines in event loops. If a coroutine awaits on a Future, the Task suspends the execution of the coroutine and waits for the completion of the Future. When the Future is *done*, the execution of the wrapped coroutine resumes. Event loops use cooperative scheduling: an event loop runs one Task at a time. While a Task awaits for the completion of a Future, the event loop runs other Tasks, callbacks, or performs IO operations. Use the high-level "asyncio.create_task()" function to create Tasks, or the low-level "loop.create_task()" or "ensure_future()" functions. Manual instantiation of Tasks is discouraged. To cancel a running Task use the "cancel()" method. Calling it will cause the Task to throw a "CancelledError" exception into the wrapped coroutine. If a coroutine is awaiting on a Future object during cancellation, the Future object will be cancelled. "cancelled()" can be used to check if the Task was cancelled. The method returns "True" if the wrapped coroutine did not suppress the "CancelledError" exception and was actually cancelled. "asyncio.Task" inherits from "Future" all of its APIs except "Future.set_result()" and "Future.set_exception()". Tasks support the "contextvars" module. When a Task is created it copies the current context and later runs its coroutine in the copied context. Changed in version 3.7: Added support for the "contextvars" module. Changed in version 3.8: Added the "name" parameter. Deprecated since version 3.8, will be removed in version 3.10: The *loop* parameter. cancel(msg=None) Request the Task to be cancelled. This arranges for a "CancelledError" exception to be thrown into the wrapped coroutine on the next cycle of the event loop. The coroutine then has a chance to clean up or even deny the request by suppressing the exception with a "try" … … "except CancelledError" … "finally" block. Therefore, unlike "Future.cancel()", "Task.cancel()" does not guarantee that the Task will be cancelled, although suppressing cancellation completely is not common and is actively discouraged. Changed in version 3.9: Added the "msg" parameter. The following example illustrates how coroutines can intercept the cancellation request: async def cancel_me(): print('cancel_me(): before sleep') try: # Wait for 1 hour await asyncio.sleep(3600) except asyncio.CancelledError: print('cancel_me(): cancel sleep') raise finally: print('cancel_me(): after sleep') async def main(): # Create a "cancel_me" Task task = asyncio.create_task(cancel_me()) # Wait for 1 second await asyncio.sleep(1) task.cancel() try: await task except asyncio.CancelledError: print("main(): cancel_me is cancelled now") asyncio.run(main()) # Expected output: # # cancel_me(): before sleep # cancel_me(): cancel sleep # cancel_me(): after sleep # main(): cancel_me is cancelled now cancelled() Return "True" if the Task is *cancelled*. The Task is *cancelled* when the cancellation was requested with "cancel()" and the wrapped coroutine propagated the "CancelledError" exception thrown into it. done() Return "True" if the Task is *done*. A Task is *done* when the wrapped coroutine either returned a value, raised an exception, or the Task was cancelled. result() Return the result of the Task. If the Task is *done*, the result of the wrapped coroutine is returned (or if the coroutine raised an exception, that exception is re-raised.) If the Task has been *cancelled*, this method raises a "CancelledError" exception. If the Task’s result isn’t yet available, this method raises a "InvalidStateError" exception. exception() Return the exception of the Task. If the wrapped coroutine raised an exception that exception is returned. If the wrapped coroutine returned normally this method returns "None". If the Task has been *cancelled*, this method raises a "CancelledError" exception. If the Task isn’t *done* yet, this method raises an "InvalidStateError" exception. add_done_callback(callback, *, context=None) Add a callback to be run when the Task is *done*. This method should only be used in low-level callback-based code. See the documentation of "Future.add_done_callback()" for more details. remove_done_callback(callback) Remove *callback* from the callbacks list. This method should only be used in low-level callback-based code. See the documentation of "Future.remove_done_callback()" for more details. get_stack(*, limit=None) Return the list of stack frames for this Task. If the wrapped coroutine is not done, this returns the stack where it is suspended. If the coroutine has completed successfully or was cancelled, this returns an empty list. If the coroutine was terminated by an exception, this returns the list of traceback frames. The frames are always ordered from oldest to newest. Only one stack frame is returned for a suspended coroutine. The optional *limit* argument sets the maximum number of frames to return; by default all available frames are returned. The ordering of the returned list differs depending on whether a stack or a traceback is returned: the newest frames of a stack are returned, but the oldest frames of a traceback are returned. (This matches the behavior of the traceback module.) print_stack(*, limit=None, file=None) Print the stack or traceback for this Task. This produces output similar to that of the traceback module for the frames retrieved by "get_stack()". The *limit* argument is passed to "get_stack()" directly. The *file* argument is an I/O stream to which the output is written; by default output is written to "sys.stderr". get_coro() Return the coroutine object wrapped by the "Task". New in version 3.8. get_name() Return the name of the Task. If no name has been explicitly assigned to the Task, the default asyncio Task implementation generates a default name during instantiation. New in version 3.8. set_name(value) Set the name of the Task. The *value* argument can be any object, which is then converted to a string. In the default Task implementation, the name will be visible in the "repr()" output of a task object. New in version 3.8. Generator-based Coroutines ========================== Note: Support for generator-based coroutines is **deprecated** and is removed in Python 3.11. Generator-based coroutines predate async/await syntax. They are Python generators that use "yield from" expressions to await on Futures and other coroutines. Generator-based coroutines should be decorated with "@asyncio.coroutine", although this is not enforced. @asyncio.coroutine Decorator to mark generator-based coroutines. This decorator enables legacy generator-based coroutines to be compatible with async/await code: @asyncio.coroutine def old_style_coroutine(): yield from asyncio.sleep(1) async def main(): await old_style_coroutine() This decorator should not be used for "async def" coroutines. Deprecated since version 3.8, will be removed in version 3.11: Use "async def" instead. asyncio.iscoroutine(obj) Return "True" if *obj* is a coroutine object. This method is different from "inspect.iscoroutine()" because it returns "True" for generator-based coroutines. asyncio.iscoroutinefunction(func) Return "True" if *func* is a coroutine function. This method is different from "inspect.iscoroutinefunction()" because it returns "True" for generator-based coroutine functions decorated with "@coroutine". "asyncio" — Asynchronous I/O **************************** ====================================================================== Hello World! ^^^^^^^^^^^^ import asyncio async def main(): print('Hello ...') await asyncio.sleep(1) print('... World!') asyncio.run(main()) asyncio is a library to write **concurrent** code using the **async/await** syntax. asyncio is used as a foundation for multiple Python asynchronous frameworks that provide high-performance network and web-servers, database connection libraries, distributed task queues, etc. asyncio is often a perfect fit for IO-bound and high-level **structured** network code. asyncio provides a set of **high-level** APIs to: * run Python coroutines concurrently and have full control over their execution; * perform network IO and IPC; * control subprocesses; * distribute tasks via queues; * synchronize concurrent code; Additionally, there are **low-level** APIs for *library and framework developers* to: * create and manage event loops, which provide asynchronous APIs for "networking", running "subprocesses", handling "OS signals", etc; * implement efficient protocols using transports; * bridge callback-based libraries and code with async/await syntax. -[ asyncio REPL ]- You can experiment with an "asyncio" concurrent context in the REPL: $ python -m asyncio asyncio REPL ... Use "await" directly instead of "asyncio.run()". Type "help", "copyright", "credits" or "license" for more information. >>> import asyncio >>> await asyncio.sleep(10, result='hello') 'hello' Raises an auditing event "cpython.run_stdin" with no arguments. Changed in version 3.9.20: (also 3.8.20) Emits audit events. -[ Reference ]- High-level APIs * Coroutines and Tasks * Streams * Synchronization Primitives * Subprocesses * Queues * Exceptions Low-level APIs * Event Loop * Futures * Transports and Protocols * Policies * Platform Support Guides and Tutorials * High-level API Index * Low-level API Index * Developing with asyncio Note: The source code for asyncio can be found in Lib/asyncio/. "asyncore" — Asynchronous socket handler **************************************** **Source code:** Lib/asyncore.py Deprecated since version 3.6: "asyncore" will be removed in Python 3.12 (see **PEP 594** for details). Please use "asyncio" instead. ====================================================================== Note: This module exists for backwards compatibility only. For new code we recommend using "asyncio". This module provides the basic infrastructure for writing asynchronous socket service clients and servers. There are only two ways to have a program on a single processor do “more than one thing at a time.” Multi-threaded programming is the simplest and most popular way to do it, but there is another very different technique, that lets you have nearly all the advantages of multi-threading, without actually using multiple threads. It’s really only practical if your program is largely I/O bound. If your program is processor bound, then pre-emptive scheduled threads are probably what you really need. Network servers are rarely processor bound, however. If your operating system supports the "select()" system call in its I/O library (and nearly all do), then you can use it to juggle multiple communication channels at once; doing other work while your I/O is taking place in the “background.” Although this strategy can seem strange and complex, especially at first, it is in many ways easier to understand and control than multi-threaded programming. The "asyncore" module solves many of the difficult problems for you, making the task of building sophisticated high-performance network servers and clients a snap. For “conversational” applications and protocols the companion "asynchat" module is invaluable. The basic idea behind both modules is to create one or more network *channels*, instances of class "asyncore.dispatcher" and "asynchat.async_chat". Creating the channels adds them to a global map, used by the "loop()" function if you do not provide it with your own *map*. Once the initial channel(s) is(are) created, calling the "loop()" function activates channel service, which continues until the last channel (including any that have been added to the map during asynchronous service) is closed. asyncore.loop([timeout[, use_poll[, map[, count]]]]) Enter a polling loop that terminates after count passes or all open channels have been closed. All arguments are optional. The *count* parameter defaults to "None", resulting in the loop terminating only when all channels have been closed. The *timeout* argument sets the timeout parameter for the appropriate "select()" or "poll()" call, measured in seconds; the default is 30 seconds. The *use_poll* parameter, if true, indicates that "poll()" should be used in preference to "select()" (the default is "False"). The *map* parameter is a dictionary whose items are the channels to watch. As channels are closed they are deleted from their map. If *map* is omitted, a global map is used. Channels (instances of "asyncore.dispatcher", "asynchat.async_chat" and subclasses thereof) can freely be mixed in the map. class asyncore.dispatcher The "dispatcher" class is a thin wrapper around a low-level socket object. To make it more useful, it has a few methods for event- handling which are called from the asynchronous loop. Otherwise, it can be treated as a normal non-blocking socket object. The firing of low-level events at certain times or in certain connection states tells the asynchronous loop that certain higher- level events have taken place. For example, if we have asked for a socket to connect to another host, we know that the connection has been made when the socket becomes writable for the first time (at this point you know that you may write to it with the expectation of success). The implied higher-level events are: +------------------------+------------------------------------------+ | Event | Description | |========================|==========================================| | "handle_connect()" | Implied by the first read or write event | +------------------------+------------------------------------------+ | "handle_close()" | Implied by a read event with no data | | | available | +------------------------+------------------------------------------+ | "handle_accepted()" | Implied by a read event on a listening | | | socket | +------------------------+------------------------------------------+ During asynchronous processing, each mapped channel’s "readable()" and "writable()" methods are used to determine whether the channel’s socket should be added to the list of channels "select()"ed or "poll()"ed for read and write events. Thus, the set of channel events is larger than the basic socket events. The full set of methods that can be overridden in your subclass follows: handle_read() Called when the asynchronous loop detects that a "read()" call on the channel’s socket will succeed. handle_write() Called when the asynchronous loop detects that a writable socket can be written. Often this method will implement the necessary buffering for performance. For example: def handle_write(self): sent = self.send(self.buffer) self.buffer = self.buffer[sent:] handle_expt() Called when there is out of band (OOB) data for a socket connection. This will almost never happen, as OOB is tenuously supported and rarely used. handle_connect() Called when the active opener’s socket actually makes a connection. Might send a “welcome” banner, or initiate a protocol negotiation with the remote endpoint, for example. handle_close() Called when the socket is closed. handle_error() Called when an exception is raised and not otherwise handled. The default version prints a condensed traceback. handle_accept() Called on listening channels (passive openers) when a connection can be established with a new remote endpoint that has issued a "connect()" call for the local endpoint. Deprecated in version 3.2; use "handle_accepted()" instead. Deprecated since version 3.2. handle_accepted(sock, addr) Called on listening channels (passive openers) when a connection has been established with a new remote endpoint that has issued a "connect()" call for the local endpoint. *sock* is a *new* socket object usable to send and receive data on the connection, and *addr* is the address bound to the socket on the other end of the connection. New in version 3.2. readable() Called each time around the asynchronous loop to determine whether a channel’s socket should be added to the list on which read events can occur. The default method simply returns "True", indicating that by default, all channels will be interested in read events. writable() Called each time around the asynchronous loop to determine whether a channel’s socket should be added to the list on which write events can occur. The default method simply returns "True", indicating that by default, all channels will be interested in write events. In addition, each channel delegates or extends many of the socket methods. Most of these are nearly identical to their socket partners. create_socket(family=socket.AF_INET, type=socket.SOCK_STREAM) This is identical to the creation of a normal socket, and will use the same options for creation. Refer to the "socket" documentation for information on creating sockets. Changed in version 3.3: *family* and *type* arguments can be omitted. connect(address) As with the normal socket object, *address* is a tuple with the first element the host to connect to, and the second the port number. send(data) Send *data* to the remote end-point of the socket. recv(buffer_size) Read at most *buffer_size* bytes from the socket’s remote end- point. An empty bytes object implies that the channel has been closed from the other end. Note that "recv()" may raise "BlockingIOError" , even though "select.select()" or "select.poll()" has reported the socket ready for reading. listen(backlog) Listen for connections made to the socket. The *backlog* argument specifies the maximum number of queued connections and should be at least 1; the maximum value is system-dependent (usually 5). bind(address) Bind the socket to *address*. The socket must not already be bound. (The format of *address* depends on the address family — refer to the "socket" documentation for more information.) To mark the socket as re-usable (setting the "SO_REUSEADDR" option), call the "dispatcher" object’s "set_reuse_addr()" method. accept() Accept a connection. The socket must be bound to an address and listening for connections. The return value can be either "None" or a pair "(conn, address)" where *conn* is a *new* socket object usable to send and receive data on the connection, and *address* is the address bound to the socket on the other end of the connection. When "None" is returned it means the connection didn’t take place, in which case the server should just ignore this event and keep listening for further incoming connections. close() Close the socket. All future operations on the socket object will fail. The remote end-point will receive no more data (after queued data is flushed). Sockets are automatically closed when they are garbage-collected. class asyncore.dispatcher_with_send A "dispatcher" subclass which adds simple buffered output capability, useful for simple clients. For more sophisticated usage use "asynchat.async_chat". class asyncore.file_dispatcher A file_dispatcher takes a file descriptor or *file object* along with an optional map argument and wraps it for use with the "poll()" or "loop()" functions. If provided a file object or anything with a "fileno()" method, that method will be called and passed to the "file_wrapper" constructor. Availability: Unix. class asyncore.file_wrapper A file_wrapper takes an integer file descriptor and calls "os.dup()" to duplicate the handle so that the original handle may be closed independently of the file_wrapper. This class implements sufficient methods to emulate a socket for use by the "file_dispatcher" class. Availability: Unix. asyncore Example basic HTTP client ================================== Here is a very basic HTTP client that uses the "dispatcher" class to implement its socket handling: import asyncore class HTTPClient(asyncore.dispatcher): def __init__(self, host, path): asyncore.dispatcher.__init__(self) self.create_socket() self.connect( (host, 80) ) self.buffer = bytes('GET %s HTTP/1.0\r\nHost: %s\r\n\r\n' % (path, host), 'ascii') def handle_connect(self): pass def handle_close(self): self.close() def handle_read(self): print(self.recv(8192)) def writable(self): return (len(self.buffer) > 0) def handle_write(self): sent = self.send(self.buffer) self.buffer = self.buffer[sent:] client = HTTPClient('www.python.org', '/') asyncore.loop() asyncore Example basic echo server ================================== Here is a basic echo server that uses the "dispatcher" class to accept connections and dispatches the incoming connections to a handler: import asyncore class EchoHandler(asyncore.dispatcher_with_send): def handle_read(self): data = self.recv(8192) if data: self.send(data) class EchoServer(asyncore.dispatcher): def __init__(self, host, port): asyncore.dispatcher.__init__(self) self.create_socket() self.set_reuse_addr() self.bind((host, port)) self.listen(5) def handle_accepted(self, sock, addr): print('Incoming connection from %s' % repr(addr)) handler = EchoHandler(sock) server = EchoServer('localhost', 8080) asyncore.loop() "atexit" — Exit handlers ************************ ====================================================================== The "atexit" module defines functions to register and unregister cleanup functions. Functions thus registered are automatically executed upon normal interpreter termination. "atexit" runs these functions in the *reverse* order in which they were registered; if you register "A", "B", and "C", at interpreter termination time they will be run in the order "C", "B", "A". **Note:** The functions registered via this module are not called when the program is killed by a signal not handled by Python, when a Python fatal internal error is detected, or when "os._exit()" is called. Changed in version 3.7: When used with C-API subinterpreters, registered functions are local to the interpreter they were registered in. atexit.register(func, *args, **kwargs) Register *func* as a function to be executed at termination. Any optional arguments that are to be passed to *func* must be passed as arguments to "register()". It is possible to register the same function and arguments more than once. At normal program termination (for instance, if "sys.exit()" is called or the main module’s execution completes), all functions registered are called in last in, first out order. The assumption is that lower level modules will normally be imported before higher level modules and thus must be cleaned up later. If an exception is raised during execution of the exit handlers, a traceback is printed (unless "SystemExit" is raised) and the exception information is saved. After all exit handlers have had a chance to run, the last exception to be raised is re-raised. This function returns *func*, which makes it possible to use it as a decorator. atexit.unregister(func) Remove *func* from the list of functions to be run at interpreter shutdown. "unregister()" silently does nothing if *func* was not previously registered. If *func* has been registered more than once, every occurrence of that function in the "atexit" call stack will be removed. Equality comparisons ("==") are used internally during unregistration, so function references do not need to have matching identities. See also: Module "readline" Useful example of "atexit" to read and write "readline" history files. "atexit" Example ================ The following simple example demonstrates how a module can initialize a counter from a file when it is imported and save the counter’s updated value automatically when the program terminates without relying on the application making an explicit call into this module at termination. try: with open('counterfile') as infile: _count = int(infile.read()) except FileNotFoundError: _count = 0 def incrcounter(n): global _count _count = _count + n def savecounter(): with open('counterfile', 'w') as outfile: outfile.write('%d' % _count) import atexit atexit.register(savecounter) Positional and keyword arguments may also be passed to "register()" to be passed along to the registered function when it is called: def goodbye(name, adjective): print('Goodbye %s, it was %s to meet you.' % (name, adjective)) import atexit atexit.register(goodbye, 'Donny', 'nice') # or: atexit.register(goodbye, adjective='nice', name='Donny') Usage as a *decorator*: import atexit @atexit.register def goodbye(): print('You are now leaving the Python sector.') This only works with functions that can be called without arguments. "audioop" — Manipulate raw audio data ************************************* Deprecated since version 3.11: The "audioop" module is deprecated (see **PEP 594** for details). ====================================================================== The "audioop" module contains some useful operations on sound fragments. It operates on sound fragments consisting of signed integer samples 8, 16, 24 or 32 bits wide, stored in *bytes-like objects*. All scalar items are integers, unless specified otherwise. Changed in version 3.4: Support for 24-bit samples was added. All functions now accept any *bytes-like object*. String input now results in an immediate error. This module provides support for a-LAW, u-LAW and Intel/DVI ADPCM encodings. A few of the more complicated operations only take 16-bit samples, otherwise the sample size (in bytes) is always a parameter of the operation. The module defines the following variables and functions: exception audioop.error This exception is raised on all errors, such as unknown number of bytes per sample, etc. audioop.add(fragment1, fragment2, width) Return a fragment which is the addition of the two samples passed as parameters. *width* is the sample width in bytes, either "1", "2", "3" or "4". Both fragments should have the same length. Samples are truncated in case of overflow. audioop.adpcm2lin(adpcmfragment, width, state) Decode an Intel/DVI ADPCM coded fragment to a linear fragment. See the description of "lin2adpcm()" for details on ADPCM coding. Return a tuple "(sample, newstate)" where the sample has the width specified in *width*. audioop.alaw2lin(fragment, width) Convert sound fragments in a-LAW encoding to linearly encoded sound fragments. a-LAW encoding always uses 8 bits samples, so *width* refers only to the sample width of the output fragment here. audioop.avg(fragment, width) Return the average over all samples in the fragment. audioop.avgpp(fragment, width) Return the average peak-peak value over all samples in the fragment. No filtering is done, so the usefulness of this routine is questionable. audioop.bias(fragment, width, bias) Return a fragment that is the original fragment with a bias added to each sample. Samples wrap around in case of overflow. audioop.byteswap(fragment, width) “Byteswap” all samples in a fragment and returns the modified fragment. Converts big-endian samples to little-endian and vice versa. New in version 3.4. audioop.cross(fragment, width) Return the number of zero crossings in the fragment passed as an argument. audioop.findfactor(fragment, reference) Return a factor *F* such that "rms(add(fragment, mul(reference, -F)))" is minimal, i.e., return the factor with which you should multiply *reference* to make it match as well as possible to *fragment*. The fragments should both contain 2-byte samples. The time taken by this routine is proportional to "len(fragment)". audioop.findfit(fragment, reference) Try to match *reference* as well as possible to a portion of *fragment* (which should be the longer fragment). This is (conceptually) done by taking slices out of *fragment*, using "findfactor()" to compute the best match, and minimizing the result. The fragments should both contain 2-byte samples. Return a tuple "(offset, factor)" where *offset* is the (integer) offset into *fragment* where the optimal match started and *factor* is the (floating-point) factor as per "findfactor()". audioop.findmax(fragment, length) Search *fragment* for a slice of length *length* samples (not bytes!) with maximum energy, i.e., return *i* for which "rms(fragment[i*2:(i+length)*2])" is maximal. The fragments should both contain 2-byte samples. The routine takes time proportional to "len(fragment)". audioop.getsample(fragment, width, index) Return the value of sample *index* from the fragment. audioop.lin2adpcm(fragment, width, state) Convert samples to 4 bit Intel/DVI ADPCM encoding. ADPCM coding is an adaptive coding scheme, whereby each 4 bit number is the difference between one sample and the next, divided by a (varying) step. The Intel/DVI ADPCM algorithm has been selected for use by the IMA, so it may well become a standard. *state* is a tuple containing the state of the coder. The coder returns a tuple "(adpcmfrag, newstate)", and the *newstate* should be passed to the next call of "lin2adpcm()". In the initial call, "None" can be passed as the state. *adpcmfrag* is the ADPCM coded fragment packed 2 4-bit values per byte. audioop.lin2alaw(fragment, width) Convert samples in the audio fragment to a-LAW encoding and return this as a bytes object. a-LAW is an audio encoding format whereby you get a dynamic range of about 13 bits using only 8 bit samples. It is used by the Sun audio hardware, among others. audioop.lin2lin(fragment, width, newwidth) Convert samples between 1-, 2-, 3- and 4-byte formats. Note: In some audio formats, such as .WAV files, 16, 24 and 32 bit samples are signed, but 8 bit samples are unsigned. So when converting to 8 bit wide samples for these formats, you need to also add 128 to the result: new_frames = audioop.lin2lin(frames, old_width, 1) new_frames = audioop.bias(new_frames, 1, 128) The same, in reverse, has to be applied when converting from 8 to 16, 24 or 32 bit width samples. audioop.lin2ulaw(fragment, width) Convert samples in the audio fragment to u-LAW encoding and return this as a bytes object. u-LAW is an audio encoding format whereby you get a dynamic range of about 14 bits using only 8 bit samples. It is used by the Sun audio hardware, among others. audioop.max(fragment, width) Return the maximum of the *absolute value* of all samples in a fragment. audioop.maxpp(fragment, width) Return the maximum peak-peak value in the sound fragment. audioop.minmax(fragment, width) Return a tuple consisting of the minimum and maximum values of all samples in the sound fragment. audioop.mul(fragment, width, factor) Return a fragment that has all samples in the original fragment multiplied by the floating-point value *factor*. Samples are truncated in case of overflow. audioop.ratecv(fragment, width, nchannels, inrate, outrate, state[, weightA[, weightB]]) Convert the frame rate of the input fragment. *state* is a tuple containing the state of the converter. The converter returns a tuple "(newfragment, newstate)", and *newstate* should be passed to the next call of "ratecv()". The initial call should pass "None" as the state. The *weightA* and *weightB* arguments are parameters for a simple digital filter and default to "1" and "0" respectively. audioop.reverse(fragment, width) Reverse the samples in a fragment and returns the modified fragment. audioop.rms(fragment, width) Return the root-mean-square of the fragment, i.e. "sqrt(sum(S_i^2)/n)". This is a measure of the power in an audio signal. audioop.tomono(fragment, width, lfactor, rfactor) Convert a stereo fragment to a mono fragment. The left channel is multiplied by *lfactor* and the right channel by *rfactor* before adding the two channels to give a mono signal. audioop.tostereo(fragment, width, lfactor, rfactor) Generate a stereo fragment from a mono fragment. Each pair of samples in the stereo fragment are computed from the mono sample, whereby left channel samples are multiplied by *lfactor* and right channel samples by *rfactor*. audioop.ulaw2lin(fragment, width) Convert sound fragments in u-LAW encoding to linearly encoded sound fragments. u-LAW encoding always uses 8 bits samples, so *width* refers only to the sample width of the output fragment here. Note that operations such as "mul()" or "max()" make no distinction between mono and stereo fragments, i.e. all samples are treated equal. If this is a problem the stereo fragment should be split into two mono fragments first and recombined later. Here is an example of how to do that: def mul_stereo(sample, width, lfactor, rfactor): lsample = audioop.tomono(sample, width, 1, 0) rsample = audioop.tomono(sample, width, 0, 1) lsample = audioop.mul(lsample, width, lfactor) rsample = audioop.mul(rsample, width, rfactor) lsample = audioop.tostereo(lsample, width, 1, 0) rsample = audioop.tostereo(rsample, width, 0, 1) return audioop.add(lsample, rsample, width) If you use the ADPCM coder to build network packets and you want your protocol to be stateless (i.e. to be able to tolerate packet loss) you should not only transmit the data but also the state. Note that you should send the *initial* state (the one you passed to "lin2adpcm()") along to the decoder, not the final state (as returned by the coder). If you want to use "struct.Struct" to store the state in binary you can code the first element (the predicted value) in 16 bits and the second (the delta index) in 8. The ADPCM coders have never been tried against other ADPCM coders, only against themselves. It could well be that I misinterpreted the standards in which case they will not be interoperable with the respective standards. The "find*()" routines might look a bit funny at first sight. They are primarily meant to do echo cancellation. A reasonably fast way to do this is to pick the most energetic piece of the output sample, locate that in the input sample and subtract the whole output sample from the input sample: def echocancel(outputdata, inputdata): pos = audioop.findmax(outputdata, 800) # one tenth second out_test = outputdata[pos*2:] in_test = inputdata[pos*2:] ipos, factor = audioop.findfit(in_test, out_test) # Optional (for better cancellation): # factor = audioop.findfactor(in_test[ipos*2:ipos*2+len(out_test)], # out_test) prefill = '\0'*(pos+ipos)*2 postfill = '\0'*(len(inputdata)-len(prefill)-len(outputdata)) outputdata = prefill + audioop.mul(outputdata, 2, -factor) + postfill return audioop.add(inputdata, outputdata, 2) Audit events table ****************** This table contains all events raised by "sys.audit()" or "PySys_Audit()" calls throughout the CPython runtime and the standard library. These calls were added in 3.8.0 or later (see **PEP 578**). See "sys.addaudithook()" and "PySys_AddAuditHook()" for information on handling these events. **CPython implementation detail:** This table is generated from the CPython documentation, and may not represent events raised by other implementations. See your runtime specific documentation for actual events raised. +--------------------------------+---------------------------------------------------------+-----------------+ | Audit event | Arguments | References | +--------------------------------+---------------------------------------------------------+-----------------+ The following events are raised internally and do not correspond to any public API of CPython: +----------------------------+---------------------------------------------+ | Audit event | Arguments | |============================|=============================================| | _winapi.CreateFile | "file_name", "desired_access", | | | "share_mode", "creation_disposition", | | | "flags_and_attributes" | +----------------------------+---------------------------------------------+ | _winapi.CreateJunction | "src_path", "dst_path" | +----------------------------+---------------------------------------------+ | _winapi.CreateNamedPipe | "name", "open_mode", "pipe_mode" | +----------------------------+---------------------------------------------+ | _winapi.CreatePipe | | +----------------------------+---------------------------------------------+ | _winapi.CreateProcess | "application_name", "command_line", | | | "current_directory" | +----------------------------+---------------------------------------------+ | _winapi.OpenProcess | "process_id", "desired_access" | +----------------------------+---------------------------------------------+ | _winapi.TerminateProcess | "handle", "exit_code" | +----------------------------+---------------------------------------------+ | ctypes.PyObj_FromPtr | "obj" | +----------------------------+---------------------------------------------+ "base64" — Base16, Base32, Base64, Base85 Data Encodings ******************************************************** **Source code:** Lib/base64.py ====================================================================== This module provides functions for encoding binary data to printable ASCII characters and decoding such encodings back to binary data. It provides encoding and decoding functions for the encodings specified in **RFC 3548**, which defines the Base16, Base32, and Base64 algorithms, and for the de-facto standard Ascii85 and Base85 encodings. The **RFC 3548** encodings are suitable for encoding binary data so that it can safely sent by email, used as parts of URLs, or included as part of an HTTP POST request. The encoding algorithm is not the same as the **uuencode** program. There are two interfaces provided by this module. The modern interface supports encoding *bytes-like objects* to ASCII "bytes", and decoding *bytes-like objects* or strings containing ASCII to "bytes". Both base-64 alphabets defined in **RFC 3548** (normal, and URL- and filesystem-safe) are supported. The legacy interface does not support decoding from strings, but it does provide functions for encoding and decoding to and from *file objects*. It only supports the Base64 standard alphabet, and it adds newlines every 76 characters as per **RFC 2045**. Note that if you are looking for **RFC 2045** support you probably want to be looking at the "email" package instead. Changed in version 3.3: ASCII-only Unicode strings are now accepted by the decoding functions of the modern interface. Changed in version 3.4: Any *bytes-like objects* are now accepted by all encoding and decoding functions in this module. Ascii85/Base85 support added. The modern interface provides: base64.b64encode(s, altchars=None) Encode the *bytes-like object* *s* using Base64 and return the encoded "bytes". Optional *altchars* must be a *bytes-like object* of at least length 2 (additional characters are ignored) which specifies an alternative alphabet for the "+" and "/" characters. This allows an application to e.g. generate URL or filesystem safe Base64 strings. The default is "None", for which the standard Base64 alphabet is used. base64.b64decode(s, altchars=None, validate=False) Decode the Base64 encoded *bytes-like object* or ASCII string *s* and return the decoded "bytes". Optional *altchars* must be a *bytes-like object* or ASCII string of at least length 2 (additional characters are ignored) which specifies the alternative alphabet used instead of the "+" and "/" characters. A "binascii.Error" exception is raised if *s* is incorrectly padded. If *validate* is "False" (the default), characters that are neither in the normal base-64 alphabet nor the alternative alphabet are discarded prior to the padding check. If *validate* is "True", these non-alphabet characters in the input result in a "binascii.Error". base64.standard_b64encode(s) Encode *bytes-like object* *s* using the standard Base64 alphabet and return the encoded "bytes". base64.standard_b64decode(s) Decode *bytes-like object* or ASCII string *s* using the standard Base64 alphabet and return the decoded "bytes". base64.urlsafe_b64encode(s) Encode *bytes-like object* *s* using the URL- and filesystem-safe alphabet, which substitutes "-" instead of "+" and "_" instead of "/" in the standard Base64 alphabet, and return the encoded "bytes". The result can still contain "=". base64.urlsafe_b64decode(s) Decode *bytes-like object* or ASCII string *s* using the URL- and filesystem-safe alphabet, which substitutes "-" instead of "+" and "_" instead of "/" in the standard Base64 alphabet, and return the decoded "bytes". base64.b32encode(s) Encode the *bytes-like object* *s* using Base32 and return the encoded "bytes". base64.b32decode(s, casefold=False, map01=None) Decode the Base32 encoded *bytes-like object* or ASCII string *s* and return the decoded "bytes". Optional *casefold* is a flag specifying whether a lowercase alphabet is acceptable as input. For security purposes, the default is "False". **RFC 3548** allows for optional mapping of the digit 0 (zero) to the letter O (oh), and for optional mapping of the digit 1 (one) to either the letter I (eye) or letter L (el). The optional argument *map01* when not "None", specifies which letter the digit 1 should be mapped to (when *map01* is not "None", the digit 0 is always mapped to the letter O). For security purposes the default is "None", so that 0 and 1 are not allowed in the input. A "binascii.Error" is raised if *s* is incorrectly padded or if there are non-alphabet characters present in the input. base64.b16encode(s) Encode the *bytes-like object* *s* using Base16 and return the encoded "bytes". base64.b16decode(s, casefold=False) Decode the Base16 encoded *bytes-like object* or ASCII string *s* and return the decoded "bytes". Optional *casefold* is a flag specifying whether a lowercase alphabet is acceptable as input. For security purposes, the default is "False". A "binascii.Error" is raised if *s* is incorrectly padded or if there are non-alphabet characters present in the input. base64.a85encode(b, *, foldspaces=False, wrapcol=0, pad=False, adobe=False) Encode the *bytes-like object* *b* using Ascii85 and return the encoded "bytes". *foldspaces* is an optional flag that uses the special short sequence ‘y’ instead of 4 consecutive spaces (ASCII 0x20) as supported by ‘btoa’. This feature is not supported by the “standard” Ascii85 encoding. *wrapcol* controls whether the output should have newline ("b'\n'") characters added to it. If this is non-zero, each output line will be at most this many characters long. *pad* controls whether the input is padded to a multiple of 4 before encoding. Note that the "btoa" implementation always pads. *adobe* controls whether the encoded byte sequence is framed with "<~" and "~>", which is used by the Adobe implementation. New in version 3.4. base64.a85decode(b, *, foldspaces=False, adobe=False, ignorechars=b' \t\n\r\v') Decode the Ascii85 encoded *bytes-like object* or ASCII string *b* and return the decoded "bytes". *foldspaces* is a flag that specifies whether the ‘y’ short sequence should be accepted as shorthand for 4 consecutive spaces (ASCII 0x20). This feature is not supported by the “standard” Ascii85 encoding. *adobe* controls whether the input sequence is in Adobe Ascii85 format (i.e. is framed with <~ and ~>). *ignorechars* should be a *bytes-like object* or ASCII string containing characters to ignore from the input. This should only contain whitespace characters, and by default contains all whitespace characters in ASCII. New in version 3.4. base64.b85encode(b, pad=False) Encode the *bytes-like object* *b* using base85 (as used in e.g. git-style binary diffs) and return the encoded "bytes". If *pad* is true, the input is padded with "b'\0'" so its length is a multiple of 4 bytes before encoding. New in version 3.4. base64.b85decode(b) Decode the base85-encoded *bytes-like object* or ASCII string *b* and return the decoded "bytes". Padding is implicitly removed, if necessary. New in version 3.4. The legacy interface: base64.decode(input, output) Decode the contents of the binary *input* file and write the resulting binary data to the *output* file. *input* and *output* must be *file objects*. *input* will be read until "input.readline()" returns an empty bytes object. base64.decodebytes(s) Decode the *bytes-like object* *s*, which must contain one or more lines of base64 encoded data, and return the decoded "bytes". New in version 3.1. base64.encode(input, output) Encode the contents of the binary *input* file and write the resulting base64 encoded data to the *output* file. *input* and *output* must be *file objects*. *input* will be read until "input.read()" returns an empty bytes object. "encode()" inserts a newline character ("b'\n'") after every 76 bytes of the output, as well as ensuring that the output always ends with a newline, as per **RFC 2045** (MIME). base64.encodebytes(s) Encode the *bytes-like object* *s*, which can contain arbitrary binary data, and return "bytes" containing the base64-encoded data, with newlines ("b'\n'") inserted after every 76 bytes of output, and ensuring that there is a trailing newline, as per **RFC 2045** (MIME). New in version 3.1. An example usage of the module: >>> import base64 >>> encoded = base64.b64encode(b'data to be encoded') >>> encoded b'ZGF0YSB0byBiZSBlbmNvZGVk' >>> data = base64.b64decode(encoded) >>> data b'data to be encoded' See also: Module "binascii" Support module containing ASCII-to-binary and binary-to-ASCII conversions. **RFC 1521** - MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies Section 5.2, “Base64 Content-Transfer-Encoding,” provides the definition of the base64 encoding. "bdb" — Debugger framework ************************** **Source code:** Lib/bdb.py ====================================================================== The "bdb" module handles basic debugger functions, like setting breakpoints or managing execution via the debugger. The following exception is defined: exception bdb.BdbQuit Exception raised by the "Bdb" class for quitting the debugger. The "bdb" module also defines two classes: class bdb.Breakpoint(self, file, line, temporary=0, cond=None, funcname=None) This class implements temporary breakpoints, ignore counts, disabling and (re-)enabling, and conditionals. Breakpoints are indexed by number through a list called "bpbynumber" and by "(file, line)" pairs through "bplist". The former points to a single instance of class "Breakpoint". The latter points to a list of such instances since there may be more than one breakpoint per line. When creating a breakpoint, its associated filename should be in canonical form. If a *funcname* is defined, a breakpoint hit will be counted when the first line of that function is executed. A conditional breakpoint always counts a hit. "Breakpoint" instances have the following methods: deleteMe() Delete the breakpoint from the list associated to a file/line. If it is the last breakpoint in that position, it also deletes the entry for the file/line. enable() Mark the breakpoint as enabled. disable() Mark the breakpoint as disabled. bpformat() Return a string with all the information about the breakpoint, nicely formatted: * The breakpoint number. * If it is temporary or not. * Its file,line position. * The condition that causes a break. * If it must be ignored the next N times. * The breakpoint hit count. New in version 3.2. bpprint(out=None) Print the output of "bpformat()" to the file *out*, or if it is "None", to standard output. class bdb.Bdb(skip=None) The "Bdb" class acts as a generic Python debugger base class. This class takes care of the details of the trace facility; a derived class should implement user interaction. The standard debugger class ("pdb.Pdb") is an example. The *skip* argument, if given, must be an iterable of glob-style module name patterns. The debugger will not step into frames that originate in a module that matches one of these patterns. Whether a frame is considered to originate in a certain module is determined by the "__name__" in the frame globals. New in version 3.1: The *skip* argument. The following methods of "Bdb" normally don’t need to be overridden. canonic(filename) Auxiliary method for getting a filename in a canonical form, that is, as a case-normalized (on case-insensitive filesystems) absolute path, stripped of surrounding angle brackets. reset() Set the "botframe", "stopframe", "returnframe" and "quitting" attributes with values ready to start debugging. trace_dispatch(frame, event, arg) This function is installed as the trace function of debugged frames. Its return value is the new trace function (in most cases, that is, itself). The default implementation decides how to dispatch a frame, depending on the type of event (passed as a string) that is about to be executed. *event* can be one of the following: * ""line"": A new line of code is going to be executed. * ""call"": A function is about to be called, or another code block entered. * ""return"": A function or other code block is about to return. * ""exception"": An exception has occurred. * ""c_call"": A C function is about to be called. * ""c_return"": A C function has returned. * ""c_exception"": A C function has raised an exception. For the Python events, specialized functions (see below) are called. For the C events, no action is taken. The *arg* parameter depends on the previous event. See the documentation for "sys.settrace()" for more information on the trace function. For more information on code and frame objects, refer to The standard type hierarchy. dispatch_line(frame) If the debugger should stop on the current line, invoke the "user_line()" method (which should be overridden in subclasses). Raise a "BdbQuit" exception if the "Bdb.quitting" flag is set (which can be set from "user_line()"). Return a reference to the "trace_dispatch()" method for further tracing in that scope. dispatch_call(frame, arg) If the debugger should stop on this function call, invoke the "user_call()" method (which should be overridden in subclasses). Raise a "BdbQuit" exception if the "Bdb.quitting" flag is set (which can be set from "user_call()"). Return a reference to the "trace_dispatch()" method for further tracing in that scope. dispatch_return(frame, arg) If the debugger should stop on this function return, invoke the "user_return()" method (which should be overridden in subclasses). Raise a "BdbQuit" exception if the "Bdb.quitting" flag is set (which can be set from "user_return()"). Return a reference to the "trace_dispatch()" method for further tracing in that scope. dispatch_exception(frame, arg) If the debugger should stop at this exception, invokes the "user_exception()" method (which should be overridden in subclasses). Raise a "BdbQuit" exception if the "Bdb.quitting" flag is set (which can be set from "user_exception()"). Return a reference to the "trace_dispatch()" method for further tracing in that scope. Normally derived classes don’t override the following methods, but they may if they want to redefine the definition of stopping and breakpoints. stop_here(frame) This method checks if the *frame* is somewhere below "botframe" in the call stack. "botframe" is the frame in which debugging started. break_here(frame) This method checks if there is a breakpoint in the filename and line belonging to *frame* or, at least, in the current function. If the breakpoint is a temporary one, this method deletes it. break_anywhere(frame) This method checks if there is a breakpoint in the filename of the current frame. Derived classes should override these methods to gain control over debugger operation. user_call(frame, argument_list) This method is called from "dispatch_call()" when there is the possibility that a break might be necessary anywhere inside the called function. user_line(frame) This method is called from "dispatch_line()" when either "stop_here()" or "break_here()" yields "True". user_return(frame, return_value) This method is called from "dispatch_return()" when "stop_here()" yields "True". user_exception(frame, exc_info) This method is called from "dispatch_exception()" when "stop_here()" yields "True". do_clear(arg) Handle how a breakpoint must be removed when it is a temporary one. This method must be implemented by derived classes. Derived classes and clients can call the following methods to affect the stepping state. set_step() Stop after one line of code. set_next(frame) Stop on the next line in or below the given frame. set_return(frame) Stop when returning from the given frame. set_until(frame) Stop when the line with the line no greater than the current one is reached or when returning from current frame. set_trace([frame]) Start debugging from *frame*. If *frame* is not specified, debugging starts from caller’s frame. set_continue() Stop only at breakpoints or when finished. If there are no breakpoints, set the system trace function to "None". set_quit() Set the "quitting" attribute to "True". This raises "BdbQuit" in the next call to one of the "dispatch_*()" methods. Derived classes and clients can call the following methods to manipulate breakpoints. These methods return a string containing an error message if something went wrong, or "None" if all is well. set_break(filename, lineno, temporary=0, cond, funcname) Set a new breakpoint. If the *lineno* line doesn’t exist for the *filename* passed as argument, return an error message. The *filename* should be in canonical form, as described in the "canonic()" method. clear_break(filename, lineno) Delete the breakpoints in *filename* and *lineno*. If none were set, an error message is returned. clear_bpbynumber(arg) Delete the breakpoint which has the index *arg* in the "Breakpoint.bpbynumber". If *arg* is not numeric or out of range, return an error message. clear_all_file_breaks(filename) Delete all breakpoints in *filename*. If none were set, an error message is returned. clear_all_breaks() Delete all existing breakpoints. get_bpbynumber(arg) Return a breakpoint specified by the given number. If *arg* is a string, it will be converted to a number. If *arg* is a non- numeric string, if the given breakpoint never existed or has been deleted, a "ValueError" is raised. New in version 3.2. get_break(filename, lineno) Check if there is a breakpoint for *lineno* of *filename*. get_breaks(filename, lineno) Return all breakpoints for *lineno* in *filename*, or an empty list if none are set. get_file_breaks(filename) Return all breakpoints in *filename*, or an empty list if none are set. get_all_breaks() Return all breakpoints that are set. Derived classes and clients can call the following methods to get a data structure representing a stack trace. get_stack(f, t) Get a list of records for a frame and all higher (calling) and lower frames, and the size of the higher part. format_stack_entry(frame_lineno, lprefix=': ') Return a string with information about a stack entry, identified by a "(frame, lineno)" tuple: * The canonical form of the filename which contains the frame. * The function name, or """". * The input arguments. * The return value. * The line of code (if it exists). The following two methods can be called by clients to use a debugger to debug a *statement*, given as a string. run(cmd, globals=None, locals=None) Debug a statement executed via the "exec()" function. *globals* defaults to "__main__.__dict__", *locals* defaults to *globals*. runeval(expr, globals=None, locals=None) Debug an expression executed via the "eval()" function. *globals* and *locals* have the same meaning as in "run()". runctx(cmd, globals, locals) For backwards compatibility. Calls the "run()" method. runcall(func, /, *args, **kwds) Debug a single function call, and return its result. Finally, the module defines the following functions: bdb.checkfuncname(b, frame) Check whether we should break here, depending on the way the breakpoint *b* was set. If it was set via line number, it checks if "b.line" is the same as the one in the frame also passed as argument. If the breakpoint was set via function name, we have to check we are in the right frame (the right function) and if we are in its first executable line. bdb.effective(file, line, frame) Determine if there is an effective (active) breakpoint at this line of code. Return a tuple of the breakpoint and a boolean that indicates if it is ok to delete a temporary breakpoint. Return "(None, None)" if there is no matching breakpoint. bdb.set_trace() Start debugging with a "Bdb" instance from caller’s frame. Binary Data Services ******************** The modules described in this chapter provide some basic services operations for manipulation of binary data. Other operations on binary data, specifically in relation to file formats and network protocols, are described in the relevant sections. Some libraries described under Text Processing Services also work with either ASCII-compatible binary formats (for example, "re") or all binary data (for example, "difflib"). In addition, see the documentation for Python’s built-in binary data types in Binary Sequence Types — bytes, bytearray, memoryview. * "struct" — Interpret bytes as packed binary data * Functions and Exceptions * Format Strings * Byte Order, Size, and Alignment * Format Characters * Examples * Classes * "codecs" — Codec registry and base classes * Codec Base Classes * Error Handlers * Stateless Encoding and Decoding * Incremental Encoding and Decoding * IncrementalEncoder Objects * IncrementalDecoder Objects * Stream Encoding and Decoding * StreamWriter Objects * StreamReader Objects * StreamReaderWriter Objects * StreamRecoder Objects * Encodings and Unicode * Standard Encodings * Python Specific Encodings * Text Encodings * Binary Transforms * Text Transforms * "encodings.idna" — Internationalized Domain Names in Applications * "encodings.mbcs" — Windows ANSI codepage * "encodings.utf_8_sig" — UTF-8 codec with BOM signature "binascii" — Convert between binary and ASCII ********************************************* ====================================================================== The "binascii" module contains a number of methods to convert between binary and various ASCII-encoded binary representations. Normally, you will not use these functions directly but use wrapper modules like "uu", "base64", or "binhex" instead. The "binascii" module contains low-level functions written in C for greater speed that are used by the higher-level modules. Note: "a2b_*" functions accept Unicode strings containing only ASCII characters. Other functions only accept *bytes-like objects* (such as "bytes", "bytearray" and other objects that support the buffer protocol). Changed in version 3.3: ASCII-only unicode strings are now accepted by the "a2b_*" functions. The "binascii" module defines the following functions: binascii.a2b_uu(string) Convert a single line of uuencoded data back to binary and return the binary data. Lines normally contain 45 (binary) bytes, except for the last line. Line data may be followed by whitespace. binascii.b2a_uu(data, *, backtick=False) Convert binary data to a line of ASCII characters, the return value is the converted line, including a newline char. The length of *data* should be at most 45. If *backtick* is true, zeros are represented by "'`'" instead of spaces. Changed in version 3.7: Added the *backtick* parameter. binascii.a2b_base64(string) Convert a block of base64 data back to binary and return the binary data. More than one line may be passed at a time. binascii.b2a_base64(data, *, newline=True) Convert binary data to a line of ASCII characters in base64 coding. The return value is the converted line, including a newline char if *newline* is true. The output of this function conforms to **RFC 3548**. Changed in version 3.6: Added the *newline* parameter. binascii.a2b_qp(data, header=False) Convert a block of quoted-printable data back to binary and return the binary data. More than one line may be passed at a time. If the optional argument *header* is present and true, underscores will be decoded as spaces. binascii.b2a_qp(data, quotetabs=False, istext=True, header=False) Convert binary data to a line(s) of ASCII characters in quoted- printable encoding. The return value is the converted line(s). If the optional argument *quotetabs* is present and true, all tabs and spaces will be encoded. If the optional argument *istext* is present and true, newlines are not encoded but trailing whitespace will be encoded. If the optional argument *header* is present and true, spaces will be encoded as underscores per **RFC 1522**. If the optional argument *header* is present and false, newline characters will be encoded as well; otherwise linefeed conversion might corrupt the binary data stream. binascii.a2b_hqx(string) Convert binhex4 formatted ASCII data to binary, without doing RLE- decompression. The string should contain a complete number of binary bytes, or (in case of the last portion of the binhex4 data) have the remaining bits zero. Deprecated since version 3.9. binascii.rledecode_hqx(data) Perform RLE-decompression on the data, as per the binhex4 standard. The algorithm uses "0x90" after a byte as a repeat indicator, followed by a count. A count of "0" specifies a byte value of "0x90". The routine returns the decompressed data, unless data input data ends in an orphaned repeat indicator, in which case the "Incomplete" exception is raised. Changed in version 3.2: Accept only bytestring or bytearray objects as input. Deprecated since version 3.9. binascii.rlecode_hqx(data) Perform binhex4 style RLE-compression on *data* and return the result. Deprecated since version 3.9. binascii.b2a_hqx(data) Perform hexbin4 binary-to-ASCII translation and return the resulting string. The argument should already be RLE-coded, and have a length divisible by 3 (except possibly the last fragment). Deprecated since version 3.9. binascii.crc_hqx(data, value) Compute a 16-bit CRC value of *data*, starting with *value* as the initial CRC, and return the result. This uses the CRC-CCITT polynomial *x*^16 + *x*^12 + *x*^5 + 1, often represented as 0x1021. This CRC is used in the binhex4 format. binascii.crc32(data[, value]) Compute CRC-32, the unsigned 32-bit checksum of *data*, starting with an initial CRC of *value*. The default initial CRC is zero. The algorithm is consistent with the ZIP file checksum. Since the algorithm is designed for use as a checksum algorithm, it is not suitable for use as a general hash algorithm. Use as follows: print(binascii.crc32(b"hello world")) # Or, in two pieces: crc = binascii.crc32(b"hello") crc = binascii.crc32(b" world", crc) print('crc32 = {:#010x}'.format(crc)) Changed in version 3.0: The result is always unsigned. To generate the same numeric value when using Python 2 or earlier, use "crc32(data) & 0xffffffff". binascii.b2a_hex(data[, sep[, bytes_per_sep=1]]) binascii.hexlify(data[, sep[, bytes_per_sep=1]]) Return the hexadecimal representation of the binary *data*. Every byte of *data* is converted into the corresponding 2-digit hex representation. The returned bytes object is therefore twice as long as the length of *data*. Similar functionality (but returning a text string) is also conveniently accessible using the "bytes.hex()" method. If *sep* is specified, it must be a single character str or bytes object. It will be inserted in the output after every *bytes_per_sep* input bytes. Separator placement is counted from the right end of the output by default, if you wish to count from the left, supply a negative *bytes_per_sep* value. >>> import binascii >>> binascii.b2a_hex(b'\xb9\x01\xef') b'b901ef' >>> binascii.hexlify(b'\xb9\x01\xef', '-') b'b9-01-ef' >>> binascii.b2a_hex(b'\xb9\x01\xef', b'_', 2) b'b9_01ef' >>> binascii.b2a_hex(b'\xb9\x01\xef', b' ', -2) b'b901 ef' Changed in version 3.8: The *sep* and *bytes_per_sep* parameters were added. binascii.a2b_hex(hexstr) binascii.unhexlify(hexstr) Return the binary data represented by the hexadecimal string *hexstr*. This function is the inverse of "b2a_hex()". *hexstr* must contain an even number of hexadecimal digits (which can be upper or lower case), otherwise an "Error" exception is raised. Similar functionality (accepting only text string arguments, but more liberal towards whitespace) is also accessible using the "bytes.fromhex()" class method. exception binascii.Error Exception raised on errors. These are usually programming errors. exception binascii.Incomplete Exception raised on incomplete data. These are usually not programming errors, but may be handled by reading a little more data and trying again. See also: Module "base64" Support for RFC compliant base64-style encoding in base 16, 32, 64, and 85. Module "binhex" Support for the binhex format used on the Macintosh. Module "uu" Support for UU encoding used on Unix. Module "quopri" Support for quoted-printable encoding used in MIME email messages. "binhex" — Encode and decode binhex4 files ****************************************** **Source code:** Lib/binhex.py Deprecated since version 3.9. ====================================================================== This module encodes and decodes files in binhex4 format, a format allowing representation of Macintosh files in ASCII. Only the data fork is handled. The "binhex" module defines the following functions: binhex.binhex(input, output) Convert a binary file with filename *input* to binhex file *output*. The *output* parameter can either be a filename or a file-like object (any object supporting a "write()" and "close()" method). binhex.hexbin(input, output) Decode a binhex file *input*. *input* may be a filename or a file- like object supporting "read()" and "close()" methods. The resulting file is written to a file named *output*, unless the argument is "None" in which case the output filename is read from the binhex file. The following exception is also defined: exception binhex.Error Exception raised when something can’t be encoded using the binhex format (for example, a filename is too long to fit in the filename field), or when input is not properly encoded binhex data. See also: Module "binascii" Support module containing ASCII-to-binary and binary-to-ASCII conversions. Notes ===== There is an alternative, more powerful interface to the coder and decoder, see the source for details. If you code or decode textfiles on non-Macintosh platforms they will still use the old Macintosh newline convention (carriage-return as end of line). "bisect" — Array bisection algorithm ************************************ **Source code:** Lib/bisect.py ====================================================================== This module provides support for maintaining a list in sorted order without having to sort the list after each insertion. For long lists of items with expensive comparison operations, this can be an improvement over the more common approach. The module is called "bisect" because it uses a basic bisection algorithm to do its work. The source code may be most useful as a working example of the algorithm (the boundary conditions are already right!). The following functions are provided: bisect.bisect_left(a, x, lo=0, hi=len(a)) Locate the insertion point for *x* in *a* to maintain sorted order. The parameters *lo* and *hi* may be used to specify a subset of the list which should be considered; by default the entire list is used. If *x* is already present in *a*, the insertion point will be before (to the left of) any existing entries. The return value is suitable for use as the first parameter to "list.insert()" assuming that *a* is already sorted. The returned insertion point *i* partitions the array *a* into two halves so that "all(val < x for val in a[lo:i])" for the left side and "all(val >= x for val in a[i:hi])" for the right side. bisect.bisect_right(a, x, lo=0, hi=len(a)) bisect.bisect(a, x, lo=0, hi=len(a)) Similar to "bisect_left()", but returns an insertion point which comes after (to the right of) any existing entries of *x* in *a*. The returned insertion point *i* partitions the array *a* into two halves so that "all(val <= x for val in a[lo:i])" for the left side and "all(val > x for val in a[i:hi])" for the right side. bisect.insort_left(a, x, lo=0, hi=len(a)) Insert *x* in *a* in sorted order. This is equivalent to "a.insert(bisect.bisect_left(a, x, lo, hi), x)" assuming that *a* is already sorted. Keep in mind that the O(log n) search is dominated by the slow O(n) insertion step. bisect.insort_right(a, x, lo=0, hi=len(a)) bisect.insort(a, x, lo=0, hi=len(a)) Similar to "insort_left()", but inserting *x* in *a* after any existing entries of *x*. See also: SortedCollection recipe that uses bisect to build a full-featured collection class with straight-forward search methods and support for a key-function. The keys are precomputed to save unnecessary calls to the key function during searches. Searching Sorted Lists ====================== The above "bisect()" functions are useful for finding insertion points but can be tricky or awkward to use for common searching tasks. The following five functions show how to transform them into the standard lookups for sorted lists: def index(a, x): 'Locate the leftmost value exactly equal to x' i = bisect_left(a, x) if i != len(a) and a[i] == x: return i raise ValueError def find_lt(a, x): 'Find rightmost value less than x' i = bisect_left(a, x) if i: return a[i-1] raise ValueError def find_le(a, x): 'Find rightmost value less than or equal to x' i = bisect_right(a, x) if i: return a[i-1] raise ValueError def find_gt(a, x): 'Find leftmost value greater than x' i = bisect_right(a, x) if i != len(a): return a[i] raise ValueError def find_ge(a, x): 'Find leftmost item greater than or equal to x' i = bisect_left(a, x) if i != len(a): return a[i] raise ValueError Other Examples ============== The "bisect()" function can be useful for numeric table lookups. This example uses "bisect()" to look up a letter grade for an exam score (say) based on a set of ordered numeric breakpoints: 90 and up is an ‘A’, 80 to 89 is a ‘B’, and so on: >>> def grade(score, breakpoints=[60, 70, 80, 90], grades='FDCBA'): ... i = bisect(breakpoints, score) ... return grades[i] ... >>> [grade(score) for score in [33, 99, 77, 70, 89, 90, 100]] ['F', 'A', 'C', 'C', 'B', 'A', 'A'] Unlike the "sorted()" function, it does not make sense for the "bisect()" functions to have *key* or *reversed* arguments because that would lead to an inefficient design (successive calls to bisect functions would not “remember” all of the previous key lookups). Instead, it is better to search a list of precomputed keys to find the index of the record in question: >>> data = [('red', 5), ('blue', 1), ('yellow', 8), ('black', 0)] >>> data.sort(key=lambda r: r[1]) >>> keys = [r[1] for r in data] # precomputed list of keys >>> data[bisect_left(keys, 0)] ('black', 0) >>> data[bisect_left(keys, 1)] ('blue', 1) >>> data[bisect_left(keys, 5)] ('red', 5) >>> data[bisect_left(keys, 8)] ('yellow', 8) "builtins" — Built-in objects ***************************** ====================================================================== This module provides direct access to all ‘built-in’ identifiers of Python; for example, "builtins.open" is the full name for the built-in function "open()". See Built-in Functions and Built-in Constants for documentation. This module is not normally accessed explicitly by most applications, but can be useful in modules that provide objects with the same name as a built-in value, but in which the built-in of that name is also needed. For example, in a module that wants to implement an "open()" function that wraps the built-in "open()", this module can be used directly: import builtins def open(path): f = builtins.open(path, 'r') return UpperCaser(f) class UpperCaser: '''Wrapper around a file that converts output to upper-case.''' def __init__(self, f): self._f = f def read(self, count=-1): return self._f.read(count).upper() # ... As an implementation detail, most modules have the name "__builtins__" made available as part of their globals. The value of "__builtins__" is normally either this module or the value of this module’s "__dict__" attribute. Since this is an implementation detail, it may not be used by alternate implementations of Python. "bz2" — Support for **bzip2** compression ***************************************** **Source code:** Lib/bz2.py ====================================================================== This module provides a comprehensive interface for compressing and decompressing data using the bzip2 compression algorithm. The "bz2" module contains: * The "open()" function and "BZ2File" class for reading and writing compressed files. * The "BZ2Compressor" and "BZ2Decompressor" classes for incremental (de)compression. * The "compress()" and "decompress()" functions for one-shot (de)compression. All of the classes in this module may safely be accessed from multiple threads. (De)compression of files ======================== bz2.open(filename, mode='rb', compresslevel=9, encoding=None, errors=None, newline=None) Open a bzip2-compressed file in binary or text mode, returning a *file object*. As with the constructor for "BZ2File", the *filename* argument can be an actual filename (a "str" or "bytes" object), or an existing file object to read from or write to. The *mode* argument can be any of "'r'", "'rb'", "'w'", "'wb'", "'x'", "'xb'", "'a'" or "'ab'" for binary mode, or "'rt'", "'wt'", "'xt'", or "'at'" for text mode. The default is "'rb'". The *compresslevel* argument is an integer from 1 to 9, as for the "BZ2File" constructor. For binary mode, this function is equivalent to the "BZ2File" constructor: "BZ2File(filename, mode, compresslevel=compresslevel)". In this case, the *encoding*, *errors* and *newline* arguments must not be provided. For text mode, a "BZ2File" object is created, and wrapped in an "io.TextIOWrapper" instance with the specified encoding, error handling behavior, and line ending(s). New in version 3.3. Changed in version 3.4: The "'x'" (exclusive creation) mode was added. Changed in version 3.6: Accepts a *path-like object*. class bz2.BZ2File(filename, mode='r', *, compresslevel=9) Open a bzip2-compressed file in binary mode. If *filename* is a "str" or "bytes" object, open the named file directly. Otherwise, *filename* should be a *file object*, which will be used to read or write the compressed data. The *mode* argument can be either "'r'" for reading (default), "'w'" for overwriting, "'x'" for exclusive creation, or "'a'" for appending. These can equivalently be given as "'rb'", "'wb'", "'xb'" and "'ab'" respectively. If *filename* is a file object (rather than an actual file name), a mode of "'w'" does not truncate the file, and is instead equivalent to "'a'". If *mode* is "'w'" or "'a'", *compresslevel* can be an integer between "1" and "9" specifying the level of compression: "1" produces the least compression, and "9" (default) produces the most compression. If *mode* is "'r'", the input file may be the concatenation of multiple compressed streams. "BZ2File" provides all of the members specified by the "io.BufferedIOBase", except for "detach()" and "truncate()". Iteration and the "with" statement are supported. "BZ2File" also provides the following method: peek([n]) Return buffered data without advancing the file position. At least one byte of data will be returned (unless at EOF). The exact number of bytes returned is unspecified. Note: While calling "peek()" does not change the file position of the "BZ2File", it may change the position of the underlying file object (e.g. if the "BZ2File" was constructed by passing a file object for *filename*). New in version 3.3. Changed in version 3.1: Support for the "with" statement was added. Changed in version 3.3: The "fileno()", "readable()", "seekable()", "writable()", "read1()" and "readinto()" methods were added. Changed in version 3.3: Support was added for *filename* being a *file object* instead of an actual filename. Changed in version 3.3: The "'a'" (append) mode was added, along with support for reading multi-stream files. Changed in version 3.4: The "'x'" (exclusive creation) mode was added. Changed in version 3.5: The "read()" method now accepts an argument of "None". Changed in version 3.6: Accepts a *path-like object*. Changed in version 3.9: The *buffering* parameter has been removed. It was ignored and deprecated since Python 3.0. Pass an open file object to control how the file is opened.The *compresslevel* parameter became keyword-only. Incremental (de)compression =========================== class bz2.BZ2Compressor(compresslevel=9) Create a new compressor object. This object may be used to compress data incrementally. For one-shot compression, use the "compress()" function instead. *compresslevel*, if given, must be an integer between "1" and "9". The default is "9". compress(data) Provide data to the compressor object. Returns a chunk of compressed data if possible, or an empty byte string otherwise. When you have finished providing data to the compressor, call the "flush()" method to finish the compression process. flush() Finish the compression process. Returns the compressed data left in internal buffers. The compressor object may not be used after this method has been called. class bz2.BZ2Decompressor Create a new decompressor object. This object may be used to decompress data incrementally. For one-shot compression, use the "decompress()" function instead. Note: This class does not transparently handle inputs containing multiple compressed streams, unlike "decompress()" and "BZ2File". If you need to decompress a multi-stream input with "BZ2Decompressor", you must use a new decompressor for each stream. decompress(data, max_length=-1) Decompress *data* (a *bytes-like object*), returning uncompressed data as bytes. Some of *data* may be buffered internally, for use in later calls to "decompress()". The returned data should be concatenated with the output of any previous calls to "decompress()". If *max_length* is nonnegative, returns at most *max_length* bytes of decompressed data. If this limit is reached and further output can be produced, the "needs_input" attribute will be set to "False". In this case, the next call to "decompress()" may provide *data* as "b''" to obtain more of the output. If all of the input data was decompressed and returned (either because this was less than *max_length* bytes, or because *max_length* was negative), the "needs_input" attribute will be set to "True". Attempting to decompress data after the end of stream is reached raises an *EOFError*. Any data found after the end of the stream is ignored and saved in the "unused_data" attribute. Changed in version 3.5: Added the *max_length* parameter. eof "True" if the end-of-stream marker has been reached. New in version 3.3. unused_data Data found after the end of the compressed stream. If this attribute is accessed before the end of the stream has been reached, its value will be "b''". needs_input "False" if the "decompress()" method can provide more decompressed data before requiring new uncompressed input. New in version 3.5. One-shot (de)compression ======================== bz2.compress(data, compresslevel=9) Compress *data*, a *bytes-like object*. *compresslevel*, if given, must be an integer between "1" and "9". The default is "9". For incremental compression, use a "BZ2Compressor" instead. bz2.decompress(data) Decompress *data*, a *bytes-like object*. If *data* is the concatenation of multiple compressed streams, decompress all of the streams. For incremental decompression, use a "BZ2Decompressor" instead. Changed in version 3.3: Support for multi-stream inputs was added. Examples of usage ================= Below are some examples of typical usage of the "bz2" module. Using "compress()" and "decompress()" to demonstrate round-trip compression: >>> import bz2 >>> data = b"""\ ... Donec rhoncus quis sapien sit amet molestie. Fusce scelerisque vel augue ... nec ullamcorper. Nam rutrum pretium placerat. Aliquam vel tristique lorem, ... sit amet cursus ante. In interdum laoreet mi, sit amet ultrices purus ... pulvinar a. Nam gravida euismod magna, non varius justo tincidunt feugiat. ... Aliquam pharetra lacus non risus vehicula rutrum. Maecenas aliquam leo ... felis. Pellentesque semper nunc sit amet nibh ullamcorper, ac elementum ... dolor luctus. Curabitur lacinia mi ornare consectetur vestibulum.""" >>> c = bz2.compress(data) >>> len(data) / len(c) # Data compression ratio 1.513595166163142 >>> d = bz2.decompress(c) >>> data == d # Check equality to original object after round-trip True Using "BZ2Compressor" for incremental compression: >>> import bz2 >>> def gen_data(chunks=10, chunksize=1000): ... """Yield incremental blocks of chunksize bytes.""" ... for _ in range(chunks): ... yield b"z" * chunksize ... >>> comp = bz2.BZ2Compressor() >>> out = b"" >>> for chunk in gen_data(): ... # Provide data to the compressor object ... out = out + comp.compress(chunk) ... >>> # Finish the compression process. Call this once you have >>> # finished providing data to the compressor. >>> out = out + comp.flush() The example above uses a very “nonrandom” stream of data (a stream of *b”z”* chunks). Random data tends to compress poorly, while ordered, repetitive data usually yields a high compression ratio. Writing and reading a bzip2-compressed file in binary mode: >>> import bz2 >>> data = b"""\ ... Donec rhoncus quis sapien sit amet molestie. Fusce scelerisque vel augue ... nec ullamcorper. Nam rutrum pretium placerat. Aliquam vel tristique lorem, ... sit amet cursus ante. In interdum laoreet mi, sit amet ultrices purus ... pulvinar a. Nam gravida euismod magna, non varius justo tincidunt feugiat. ... Aliquam pharetra lacus non risus vehicula rutrum. Maecenas aliquam leo ... felis. Pellentesque semper nunc sit amet nibh ullamcorper, ac elementum ... dolor luctus. Curabitur lacinia mi ornare consectetur vestibulum.""" >>> with bz2.open("myfile.bz2", "wb") as f: ... # Write compressed data to file ... unused = f.write(data) >>> with bz2.open("myfile.bz2", "rb") as f: ... # Decompress data from file ... content = f.read() >>> content == data # Check equality to original object after round-trip True "calendar" — General calendar-related functions *********************************************** **Source code:** Lib/calendar.py ====================================================================== This module allows you to output calendars like the Unix **cal** program, and provides additional useful functions related to the calendar. By default, these calendars have Monday as the first day of the week, and Sunday as the last (the European convention). Use "setfirstweekday()" to set the first day of the week to Sunday (6) or to any other weekday. Parameters that specify dates are given as integers. For related functionality, see also the "datetime" and "time" modules. The functions and classes defined in this module use an idealized calendar, the current Gregorian calendar extended indefinitely in both directions. This matches the definition of the “proleptic Gregorian” calendar in Dershowitz and Reingold’s book “Calendrical Calculations”, where it’s the base calendar for all computations. Zero and negative years are interpreted as prescribed by the ISO 8601 standard. Year 0 is 1 BC, year -1 is 2 BC, and so on. class calendar.Calendar(firstweekday=0) Creates a "Calendar" object. *firstweekday* is an integer specifying the first day of the week. "0" is Monday (the default), "6" is Sunday. A "Calendar" object provides several methods that can be used for preparing the calendar data for formatting. This class doesn’t do any formatting itself. This is the job of subclasses. "Calendar" instances have the following methods: iterweekdays() Return an iterator for the week day numbers that will be used for one week. The first value from the iterator will be the same as the value of the "firstweekday" property. itermonthdates(year, month) Return an iterator for the month *month* (1–12) in the year *year*. This iterator will return all days (as "datetime.date" objects) for the month and all days before the start of the month or after the end of the month that are required to get a complete week. itermonthdays(year, month) Return an iterator for the month *month* in the year *year* similar to "itermonthdates()", but not restricted by the "datetime.date" range. Days returned will simply be day of the month numbers. For the days outside of the specified month, the day number is "0". itermonthdays2(year, month) Return an iterator for the month *month* in the year *year* similar to "itermonthdates()", but not restricted by the "datetime.date" range. Days returned will be tuples consisting of a day of the month number and a week day number. itermonthdays3(year, month) Return an iterator for the month *month* in the year *year* similar to "itermonthdates()", but not restricted by the "datetime.date" range. Days returned will be tuples consisting of a year, a month and a day of the month numbers. New in version 3.7. itermonthdays4(year, month) Return an iterator for the month *month* in the year *year* similar to "itermonthdates()", but not restricted by the "datetime.date" range. Days returned will be tuples consisting of a year, a month, a day of the month, and a day of the week numbers. New in version 3.7. monthdatescalendar(year, month) Return a list of the weeks in the month *month* of the *year* as full weeks. Weeks are lists of seven "datetime.date" objects. monthdays2calendar(year, month) Return a list of the weeks in the month *month* of the *year* as full weeks. Weeks are lists of seven tuples of day numbers and weekday numbers. monthdayscalendar(year, month) Return a list of the weeks in the month *month* of the *year* as full weeks. Weeks are lists of seven day numbers. yeardatescalendar(year, width=3) Return the data for the specified year ready for formatting. The return value is a list of month rows. Each month row contains up to *width* months (defaulting to 3). Each month contains between 4 and 6 weeks and each week contains 1–7 days. Days are "datetime.date" objects. yeardays2calendar(year, width=3) Return the data for the specified year ready for formatting (similar to "yeardatescalendar()"). Entries in the week lists are tuples of day numbers and weekday numbers. Day numbers outside this month are zero. yeardayscalendar(year, width=3) Return the data for the specified year ready for formatting (similar to "yeardatescalendar()"). Entries in the week lists are day numbers. Day numbers outside this month are zero. class calendar.TextCalendar(firstweekday=0) This class can be used to generate plain text calendars. "TextCalendar" instances have the following methods: formatmonth(theyear, themonth, w=0, l=0) Return a month’s calendar in a multi-line string. If *w* is provided, it specifies the width of the date columns, which are centered. If *l* is given, it specifies the number of lines that each week will use. Depends on the first weekday as specified in the constructor or set by the "setfirstweekday()" method. prmonth(theyear, themonth, w=0, l=0) Print a month’s calendar as returned by "formatmonth()". formatyear(theyear, w=2, l=1, c=6, m=3) Return a *m*-column calendar for an entire year as a multi-line string. Optional parameters *w*, *l*, and *c* are for date column width, lines per week, and number of spaces between month columns, respectively. Depends on the first weekday as specified in the constructor or set by the "setfirstweekday()" method. The earliest year for which a calendar can be generated is platform-dependent. pryear(theyear, w=2, l=1, c=6, m=3) Print the calendar for an entire year as returned by "formatyear()". class calendar.HTMLCalendar(firstweekday=0) This class can be used to generate HTML calendars. "HTMLCalendar" instances have the following methods: formatmonth(theyear, themonth, withyear=True) Return a month’s calendar as an HTML table. If *withyear* is true the year will be included in the header, otherwise just the month name will be used. formatyear(theyear, width=3) Return a year’s calendar as an HTML table. *width* (defaulting to 3) specifies the number of months per row. formatyearpage(theyear, width=3, css='calendar.css', encoding=None) Return a year’s calendar as a complete HTML page. *width* (defaulting to 3) specifies the number of months per row. *css* is the name for the cascading style sheet to be used. "None" can be passed if no style sheet should be used. *encoding* specifies the encoding to be used for the output (defaulting to the system default encoding). "HTMLCalendar" has the following attributes you can override to customize the CSS classes used by the calendar: cssclasses A list of CSS classes used for each weekday. The default class list is: cssclasses = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"] more styles can be added for each day: cssclasses = ["mon text-bold", "tue", "wed", "thu", "fri", "sat", "sun red"] Note that the length of this list must be seven items. cssclass_noday The CSS class for a weekday occurring in the previous or coming month. New in version 3.7. cssclasses_weekday_head A list of CSS classes used for weekday names in the header row. The default is the same as "cssclasses". New in version 3.7. cssclass_month_head The month’s head CSS class (used by "formatmonthname()"). The default value is ""month"". New in version 3.7. cssclass_month The CSS class for the whole month’s table (used by "formatmonth()"). The default value is ""month"". New in version 3.7. cssclass_year The CSS class for the whole year’s table of tables (used by "formatyear()"). The default value is ""year"". New in version 3.7. cssclass_year_head The CSS class for the table head for the whole year (used by "formatyear()"). The default value is ""year"". New in version 3.7. Note that although the naming for the above described class attributes is singular (e.g. "cssclass_month" "cssclass_noday"), one can replace the single CSS class with a space separated list of CSS classes, for example: "text-bold text-red" Here is an example how "HTMLCalendar" can be customized: class CustomHTMLCal(calendar.HTMLCalendar): cssclasses = [style + " text-nowrap" for style in calendar.HTMLCalendar.cssclasses] cssclass_month_head = "text-center month-head" cssclass_month = "text-center month" cssclass_year = "text-italic lead" class calendar.LocaleTextCalendar(firstweekday=0, locale=None) This subclass of "TextCalendar" can be passed a locale name in the constructor and will return month and weekday names in the specified locale. If this locale includes an encoding all strings containing month and weekday names will be returned as unicode. class calendar.LocaleHTMLCalendar(firstweekday=0, locale=None) This subclass of "HTMLCalendar" can be passed a locale name in the constructor and will return month and weekday names in the specified locale. If this locale includes an encoding all strings containing month and weekday names will be returned as unicode. Note: The "formatweekday()" and "formatmonthname()" methods of these two classes temporarily change the current locale to the given *locale*. Because the current locale is a process-wide setting, they are not thread-safe. For simple text calendars this module provides the following functions. calendar.setfirstweekday(weekday) Sets the weekday ("0" is Monday, "6" is Sunday) to start each week. The values "MONDAY", "TUESDAY", "WEDNESDAY", "THURSDAY", "FRIDAY", "SATURDAY", and "SUNDAY" are provided for convenience. For example, to set the first weekday to Sunday: import calendar calendar.setfirstweekday(calendar.SUNDAY) calendar.firstweekday() Returns the current setting for the weekday to start each week. calendar.isleap(year) Returns "True" if *year* is a leap year, otherwise "False". calendar.leapdays(y1, y2) Returns the number of leap years in the range from *y1* to *y2* (exclusive), where *y1* and *y2* are years. This function works for ranges spanning a century change. calendar.weekday(year, month, day) Returns the day of the week ("0" is Monday) for *year* ("1970"–…), *month* ("1"–"12"), *day* ("1"–"31"). calendar.weekheader(n) Return a header containing abbreviated weekday names. *n* specifies the width in characters for one weekday. calendar.monthrange(year, month) Returns weekday of first day of the month and number of days in month, for the specified *year* and *month*. calendar.monthcalendar(year, month) Returns a matrix representing a month’s calendar. Each row represents a week; days outside of the month are represented by zeros. Each week begins with Monday unless set by "setfirstweekday()". calendar.prmonth(theyear, themonth, w=0, l=0) Prints a month’s calendar as returned by "month()". calendar.month(theyear, themonth, w=0, l=0) Returns a month’s calendar in a multi-line string using the "formatmonth()" of the "TextCalendar" class. calendar.prcal(year, w=0, l=0, c=6, m=3) Prints the calendar for an entire year as returned by "calendar()". calendar.calendar(year, w=2, l=1, c=6, m=3) Returns a 3-column calendar for an entire year as a multi-line string using the "formatyear()" of the "TextCalendar" class. calendar.timegm(tuple) An unrelated but handy function that takes a time tuple such as returned by the "gmtime()" function in the "time" module, and returns the corresponding Unix timestamp value, assuming an epoch of 1970, and the POSIX encoding. In fact, "time.gmtime()" and "timegm()" are each others’ inverse. The "calendar" module exports the following data attributes: calendar.day_name An array that represents the days of the week in the current locale. calendar.day_abbr An array that represents the abbreviated days of the week in the current locale. calendar.month_name An array that represents the months of the year in the current locale. This follows normal convention of January being month number 1, so it has a length of 13 and "month_name[0]" is the empty string. calendar.month_abbr An array that represents the abbreviated months of the year in the current locale. This follows normal convention of January being month number 1, so it has a length of 13 and "month_abbr[0]" is the empty string. See also: Module "datetime" Object-oriented interface to dates and times with similar functionality to the "time" module. Module "time" Low-level time related functions. "cgi" — Common Gateway Interface support **************************************** **Source code:** Lib/cgi.py Deprecated since version 3.11: The "cgi" module is deprecated (see **PEP 594** for details and alternatives). ====================================================================== Support module for Common Gateway Interface (CGI) scripts. This module defines a number of utilities for use by CGI scripts written in Python. Introduction ============ A CGI script is invoked by an HTTP server, usually to process user input submitted through an HTML "
" or "" element. Most often, CGI scripts live in the server’s special "cgi-bin" directory. The HTTP server places all sorts of information about the request (such as the client’s hostname, the requested URL, the query string, and lots of other goodies) in the script’s shell environment, executes the script, and sends the script’s output back to the client. The script’s input is connected to the client too, and sometimes the form data is read this way; at other times the form data is passed via the “query string” part of the URL. This module is intended to take care of the different cases and provide a simpler interface to the Python script. It also provides a number of utilities that help in debugging scripts, and the latest addition is support for file uploads from a form (if your browser supports it). The output of a CGI script should consist of two sections, separated by a blank line. The first section contains a number of headers, telling the client what kind of data is following. Python code to generate a minimal header section looks like this: print("Content-Type: text/html") # HTML is following print() # blank line, end of headers The second section is usually HTML, which allows the client software to display nicely formatted text with header, in-line images, etc. Here’s Python code that prints a simple piece of HTML: print("CGI script output") print("

This is my first CGI script

") print("Hello, world!") Using the cgi module ==================== Begin by writing "import cgi". When you write a new script, consider adding these lines: import cgitb cgitb.enable() This activates a special exception handler that will display detailed reports in the Web browser if any errors occur. If you’d rather not show the guts of your program to users of your script, you can have the reports saved to files instead, with code like this: import cgitb cgitb.enable(display=0, logdir="/path/to/logdir") It’s very helpful to use this feature during script development. The reports produced by "cgitb" provide information that can save you a lot of time in tracking down bugs. You can always remove the "cgitb" line later when you have tested your script and are confident that it works correctly. To get at submitted form data, use the "FieldStorage" class. If the form contains non-ASCII characters, use the *encoding* keyword parameter set to the value of the encoding defined for the document. It is usually contained in the META tag in the HEAD section of the HTML document or by the *Content-Type* header. This reads the form contents from the standard input or the environment (depending on the value of various environment variables set according to the CGI standard). Since it may consume standard input, it should be instantiated only once. The "FieldStorage" instance can be indexed like a Python dictionary. It allows membership testing with the "in" operator, and also supports the standard dictionary method "keys()" and the built-in function "len()". Form fields containing empty strings are ignored and do not appear in the dictionary; to keep such values, provide a true value for the optional *keep_blank_values* keyword parameter when creating the "FieldStorage" instance. For instance, the following code (which assumes that the *Content- Type* header and blank line have already been printed) checks that the fields "name" and "addr" are both set to a non-empty string: form = cgi.FieldStorage() if "name" not in form or "addr" not in form: print("

Error

") print("Please fill in the name and addr fields.") return print("

name:", form["name"].value) print("

addr:", form["addr"].value) ...further form processing here... Here the fields, accessed through "form[key]", are themselves instances of "FieldStorage" (or "MiniFieldStorage", depending on the form encoding). The "value" attribute of the instance yields the string value of the field. The "getvalue()" method returns this string value directly; it also accepts an optional second argument as a default to return if the requested key is not present. If the submitted form data contains more than one field with the same name, the object retrieved by "form[key]" is not a "FieldStorage" or "MiniFieldStorage" instance but a list of such instances. Similarly, in this situation, "form.getvalue(key)" would return a list of strings. If you expect this possibility (when your HTML form contains multiple fields with the same name), use the "getlist()" method, which always returns a list of values (so that you do not need to special- case the single item case). For example, this code concatenates any number of username fields, separated by commas: value = form.getlist("username") usernames = ",".join(value) If a field represents an uploaded file, accessing the value via the "value" attribute or the "getvalue()" method reads the entire file in memory as bytes. This may not be what you want. You can test for an uploaded file by testing either the "filename" attribute or the "file" attribute. You can then read the data from the "file" attribute before it is automatically closed as part of the garbage collection of the "FieldStorage" instance (the "read()" and "readline()" methods will return bytes): fileitem = form["userfile"] if fileitem.file: # It's an uploaded file; count lines linecount = 0 while True: line = fileitem.file.readline() if not line: break linecount = linecount + 1 "FieldStorage" objects also support being used in a "with" statement, which will automatically close them when done. If an error is encountered when obtaining the contents of an uploaded file (for example, when the user interrupts the form submission by clicking on a Back or Cancel button) the "done" attribute of the object for the field will be set to the value -1. The file upload draft standard entertains the possibility of uploading multiple files from one field (using a recursive *multipart/** encoding). When this occurs, the item will be a dictionary-like "FieldStorage" item. This can be determined by testing its "type" attribute, which should be *multipart/form-data* (or perhaps another MIME type matching *multipart/**). In this case, it can be iterated over recursively just like the top-level form object. When a form is submitted in the “old” format (as the query string or as a single data part of type *application/x-www-form-urlencoded*), the items will actually be instances of the class "MiniFieldStorage". In this case, the "list", "file", and "filename" attributes are always "None". A form submitted via POST that also has a query string will contain both "FieldStorage" and "MiniFieldStorage" items. Changed in version 3.4: The "file" attribute is automatically closed upon the garbage collection of the creating "FieldStorage" instance. Changed in version 3.5: Added support for the context management protocol to the "FieldStorage" class. Higher Level Interface ====================== The previous section explains how to read CGI form data using the "FieldStorage" class. This section describes a higher level interface which was added to this class to allow one to do it in a more readable and intuitive way. The interface doesn’t make the techniques described in previous sections obsolete — they are still useful to process file uploads efficiently, for example. The interface consists of two simple methods. Using the methods you can process form data in a generic way, without the need to worry whether only one or more values were posted under one name. In the previous section, you learned to write following code anytime you expected a user to post more than one value under one name: item = form.getvalue("item") if isinstance(item, list): # The user is requesting more than one item. else: # The user is requesting only one item. This situation is common for example when a form contains a group of multiple checkboxes with the same name: In most situations, however, there’s only one form control with a particular name in a form and then you expect and need only one value associated with this name. So you write a script containing for example this code: user = form.getvalue("user").upper() The problem with the code is that you should never expect that a client will provide valid input to your scripts. For example, if a curious user appends another "user=foo" pair to the query string, then the script would crash, because in this situation the "getvalue("user")" method call returns a list instead of a string. Calling the "upper()" method on a list is not valid (since lists do not have a method of this name) and results in an "AttributeError" exception. Therefore, the appropriate way to read form data values was to always use the code which checks whether the obtained value is a single value or a list of values. That’s annoying and leads to less readable scripts. A more convenient approach is to use the methods "getfirst()" and "getlist()" provided by this higher level interface. FieldStorage.getfirst(name, default=None) This method always returns only one value associated with form field *name*. The method returns only the first value in case that more values were posted under such name. Please note that the order in which the values are received may vary from browser to browser and should not be counted on. [1] If no such form field or value exists then the method returns the value specified by the optional parameter *default*. This parameter defaults to "None" if not specified. FieldStorage.getlist(name) This method always returns a list of values associated with form field *name*. The method returns an empty list if no such form field or value exists for *name*. It returns a list consisting of one item if only one such value exists. Using these methods you can write nice compact code: import cgi form = cgi.FieldStorage() user = form.getfirst("user", "").upper() # This way it's safe. for item in form.getlist("item"): do_something(item) Functions ========= These are useful if you want more control, or if you want to employ some of the algorithms implemented in this module in other circumstances. cgi.parse(fp=None, environ=os.environ, keep_blank_values=False, strict_parsing=False, separator="&") Parse a query in the environment or from a file (the file defaults to "sys.stdin"). The *keep_blank_values*, *strict_parsing* and *separator* parameters are passed to "urllib.parse.parse_qs()" unchanged. cgi.parse_multipart(fp, pdict, encoding="utf-8", errors="replace", separator="&") Parse input of type *multipart/form-data* (for file uploads). Arguments are *fp* for the input file, *pdict* for a dictionary containing other parameters in the *Content-Type* header, and *encoding*, the request encoding. Returns a dictionary just like "urllib.parse.parse_qs()": keys are the field names, each value is a list of values for that field. For non-file fields, the value is a list of strings. This is easy to use but not much good if you are expecting megabytes to be uploaded — in that case, use the "FieldStorage" class instead which is much more flexible. Changed in version 3.7: Added the *encoding* and *errors* parameters. For non-file fields, the value is now a list of strings, not bytes. Changed in version 3.9.2: Added the *separator* parameter. cgi.parse_header(string) Parse a MIME header (such as *Content-Type*) into a main value and a dictionary of parameters. cgi.test() Robust test CGI script, usable as main program. Writes minimal HTTP headers and formats all information provided to the script in HTML format. cgi.print_environ() Format the shell environment in HTML. cgi.print_form(form) Format a form in HTML. cgi.print_directory() Format the current directory in HTML. cgi.print_environ_usage() Print a list of useful (used by CGI) environment variables in HTML. Caring about security ===================== There’s one important rule: if you invoke an external program (via "os.system()", "os.popen()" or other functions with similar functionality), make very sure you don’t pass arbitrary strings received from the client to the shell. This is a well-known security hole whereby clever hackers anywhere on the Web can exploit a gullible CGI script to invoke arbitrary shell commands. Even parts of the URL or field names cannot be trusted, since the request doesn’t have to come from your form! To be on the safe side, if you must pass a string gotten from a form to a shell command, you should make sure the string contains only alphanumeric characters, dashes, underscores, and periods. Installing your CGI script on a Unix system =========================================== Read the documentation for your HTTP server and check with your local system administrator to find the directory where CGI scripts should be installed; usually this is in a directory "cgi-bin" in the server tree. Make sure that your script is readable and executable by “others”; the Unix file mode should be "0o755" octal (use "chmod 0755 filename"). Make sure that the first line of the script contains "#!" starting in column 1 followed by the pathname of the Python interpreter, for instance: #!/usr/local/bin/python Make sure the Python interpreter exists and is executable by “others”. Make sure that any files your script needs to read or write are readable or writable, respectively, by “others” — their mode should be "0o644" for readable and "0o666" for writable. This is because, for security reasons, the HTTP server executes your script as user “nobody”, without any special privileges. It can only read (write, execute) files that everybody can read (write, execute). The current directory at execution time is also different (it is usually the server’s cgi-bin directory) and the set of environment variables is also different from what you get when you log in. In particular, don’t count on the shell’s search path for executables ("PATH") or the Python module search path ("PYTHONPATH") to be set to anything interesting. If you need to load modules from a directory which is not on Python’s default module search path, you can change the path in your script, before importing other modules. For example: import sys sys.path.insert(0, "/usr/home/joe/lib/python") sys.path.insert(0, "/usr/local/lib/python") (This way, the directory inserted last will be searched first!) Instructions for non-Unix systems will vary; check your HTTP server’s documentation (it will usually have a section on CGI scripts). Testing your CGI script ======================= Unfortunately, a CGI script will generally not run when you try it from the command line, and a script that works perfectly from the command line may fail mysteriously when run from the server. There’s one reason why you should still test your script from the command line: if it contains a syntax error, the Python interpreter won’t execute it at all, and the HTTP server will most likely send a cryptic error to the client. Assuming your script has no syntax errors, yet it does not work, you have no choice but to read the next section. Debugging CGI scripts ===================== First of all, check for trivial installation errors — reading the section above on installing your CGI script carefully can save you a lot of time. If you wonder whether you have understood the installation procedure correctly, try installing a copy of this module file ("cgi.py") as a CGI script. When invoked as a script, the file will dump its environment and the contents of the form in HTML format. Give it the right mode etc., and send it a request. If it’s installed in the standard "cgi-bin" directory, it should be possible to send it a request by entering a URL into your browser of the form: http://yourhostname/cgi-bin/cgi.py?name=Joe+Blow&addr=At+Home If this gives an error of type 404, the server cannot find the script – perhaps you need to install it in a different directory. If it gives another error, there’s an installation problem that you should fix before trying to go any further. If you get a nicely formatted listing of the environment and form content (in this example, the fields should be listed as “addr” with value “At Home” and “name” with value “Joe Blow”), the "cgi.py" script has been installed correctly. If you follow the same procedure for your own script, you should now be able to debug it. The next step could be to call the "cgi" module’s "test()" function from your script: replace its main code with the single statement cgi.test() This should produce the same results as those gotten from installing the "cgi.py" file itself. When an ordinary Python script raises an unhandled exception (for whatever reason: of a typo in a module name, a file that can’t be opened, etc.), the Python interpreter prints a nice traceback and exits. While the Python interpreter will still do this when your CGI script raises an exception, most likely the traceback will end up in one of the HTTP server’s log files, or be discarded altogether. Fortunately, once you have managed to get your script to execute *some* code, you can easily send tracebacks to the Web browser using the "cgitb" module. If you haven’t done so already, just add the lines: import cgitb cgitb.enable() to the top of your script. Then try running it again; when a problem occurs, you should see a detailed report that will likely make apparent the cause of the crash. If you suspect that there may be a problem in importing the "cgitb" module, you can use an even more robust approach (which only uses built-in modules): import sys sys.stderr = sys.stdout print("Content-Type: text/plain") print() ...your code here... This relies on the Python interpreter to print the traceback. The content type of the output is set to plain text, which disables all HTML processing. If your script works, the raw HTML will be displayed by your client. If it raises an exception, most likely after the first two lines have been printed, a traceback will be displayed. Because no HTML interpretation is going on, the traceback will be readable. Common problems and solutions ============================= * Most HTTP servers buffer the output from CGI scripts until the script is completed. This means that it is not possible to display a progress report on the client’s display while the script is running. * Check the installation instructions above. * Check the HTTP server’s log files. ("tail -f logfile" in a separate window may be useful!) * Always check a script for syntax errors first, by doing something like "python script.py". * If your script does not have any syntax errors, try adding "import cgitb; cgitb.enable()" to the top of the script. * When invoking external programs, make sure they can be found. Usually, this means using absolute path names — "PATH" is usually not set to a very useful value in a CGI script. * When reading or writing external files, make sure they can be read or written by the userid under which your CGI script will be running: this is typically the userid under which the web server is running, or some explicitly specified userid for a web server’s "suexec" feature. * Don’t try to give a CGI script a set-uid mode. This doesn’t work on most systems, and is a security liability as well. -[ Footnotes ]- [1] Note that some recent versions of the HTML specification do state what order the field values should be supplied in, but knowing whether a request was received from a conforming browser, or even from a browser at all, is tedious and error-prone. "cgitb" — Traceback manager for CGI scripts ******************************************* **Source code:** Lib/cgitb.py Deprecated since version 3.11: The "cgitb" module is deprecated (see **PEP 594** for details). ====================================================================== The "cgitb" module provides a special exception handler for Python scripts. (Its name is a bit misleading. It was originally designed to display extensive traceback information in HTML for CGI scripts. It was later generalized to also display this information in plain text.) After this module is activated, if an uncaught exception occurs, a detailed, formatted report will be displayed. The report includes a traceback showing excerpts of the source code for each level, as well as the values of the arguments and local variables to currently running functions, to help you debug the problem. Optionally, you can save this information to a file instead of sending it to the browser. To enable this feature, simply add this to the top of your CGI script: import cgitb cgitb.enable() The options to the "enable()" function control whether the report is displayed in the browser and whether the report is logged to a file for later analysis. cgitb.enable(display=1, logdir=None, context=5, format="html") This function causes the "cgitb" module to take over the interpreter’s default handling for exceptions by setting the value of "sys.excepthook". The optional argument *display* defaults to "1" and can be set to "0" to suppress sending the traceback to the browser. If the argument *logdir* is present, the traceback reports are written to files. The value of *logdir* should be a directory where these files will be placed. The optional argument *context* is the number of lines of context to display around the current line of source code in the traceback; this defaults to "5". If the optional argument *format* is ""html"", the output is formatted as HTML. Any other value forces plain text output. The default value is ""html"". cgitb.text(info, context=5) This function handles the exception described by *info* (a 3-tuple containing the result of "sys.exc_info()"), formatting its traceback as text and returning the result as a string. The optional argument *context* is the number of lines of context to display around the current line of source code in the traceback; this defaults to "5". cgitb.html(info, context=5) This function handles the exception described by *info* (a 3-tuple containing the result of "sys.exc_info()"), formatting its traceback as HTML and returning the result as a string. The optional argument *context* is the number of lines of context to display around the current line of source code in the traceback; this defaults to "5". cgitb.handler(info=None) This function handles an exception using the default settings (that is, show a report in the browser, but don’t log to a file). This can be used when you’ve caught an exception and want to report it using "cgitb". The optional *info* argument should be a 3-tuple containing an exception type, exception value, and traceback object, exactly like the tuple returned by "sys.exc_info()". If the *info* argument is not supplied, the current exception is obtained from "sys.exc_info()". "chunk" — Read IFF chunked data ******************************* **Source code:** Lib/chunk.py Deprecated since version 3.11: The "chunk" module is deprecated (see **PEP 594** for details). ====================================================================== This module provides an interface for reading files that use EA IFF 85 chunks. [1] This format is used in at least the Audio Interchange File Format (AIFF/AIFF-C) and the Real Media File Format (RMFF). The WAVE audio file format is closely related and can also be read using this module. A chunk has the following structure: +-----------+----------+---------------------------------+ | Offset | Length | Contents | |===========|==========|=================================| | 0 | 4 | Chunk ID | +-----------+----------+---------------------------------+ | 4 | 4 | Size of chunk in big-endian | | | | byte order, not including the | | | | header | +-----------+----------+---------------------------------+ | 8 | *n* | Data bytes, where *n* is the | | | | size given in the preceding | | | | field | +-----------+----------+---------------------------------+ | 8 + *n* | 0 or 1 | Pad byte needed if *n* is odd | | | | and chunk alignment is used | +-----------+----------+---------------------------------+ The ID is a 4-byte string which identifies the type of chunk. The size field (a 32-bit value, encoded using big-endian byte order) gives the size of the chunk data, not including the 8-byte header. Usually an IFF-type file consists of one or more chunks. The proposed usage of the "Chunk" class defined here is to instantiate an instance at the start of each chunk and read from the instance until it reaches the end, after which a new instance can be instantiated. At the end of the file, creating a new instance will fail with an "EOFError" exception. class chunk.Chunk(file, align=True, bigendian=True, inclheader=False) Class which represents a chunk. The *file* argument is expected to be a file-like object. An instance of this class is specifically allowed. The only method that is needed is "read()". If the methods "seek()" and "tell()" are present and don’t raise an exception, they are also used. If these methods are present and raise an exception, they are expected to not have altered the object. If the optional argument *align* is true, chunks are assumed to be aligned on 2-byte boundaries. If *align* is false, no alignment is assumed. The default value is true. If the optional argument *bigendian* is false, the chunk size is assumed to be in little-endian order. This is needed for WAVE audio files. The default value is true. If the optional argument *inclheader* is true, the size given in the chunk header includes the size of the header. The default value is false. A "Chunk" object supports the following methods: getname() Returns the name (ID) of the chunk. This is the first 4 bytes of the chunk. getsize() Returns the size of the chunk. close() Close and skip to the end of the chunk. This does not close the underlying file. The remaining methods will raise "OSError" if called after the "close()" method has been called. Before Python 3.3, they used to raise "IOError", now an alias of "OSError". isatty() Returns "False". seek(pos, whence=0) Set the chunk’s current position. The *whence* argument is optional and defaults to "0" (absolute file positioning); other values are "1" (seek relative to the current position) and "2" (seek relative to the file’s end). There is no return value. If the underlying file does not allow seek, only forward seeks are allowed. tell() Return the current position into the chunk. read(size=-1) Read at most *size* bytes from the chunk (less if the read hits the end of the chunk before obtaining *size* bytes). If the *size* argument is negative or omitted, read all data until the end of the chunk. An empty bytes object is returned when the end of the chunk is encountered immediately. skip() Skip to the end of the chunk. All further calls to "read()" for the chunk will return "b''". If you are not interested in the contents of the chunk, this method should be called so that the file points to the start of the next chunk. -[ Footnotes ]- [1] “EA IFF 85” Standard for Interchange Format Files, Jerry Morrison, Electronic Arts, January 1985. "cmath" — Mathematical functions for complex numbers **************************************************** ====================================================================== This module provides access to mathematical functions for complex numbers. The functions in this module accept integers, floating-point numbers or complex numbers as arguments. They will also accept any Python object that has either a "__complex__()" or a "__float__()" method: these methods are used to convert the object to a complex or floating-point number, respectively, and the function is then applied to the result of the conversion. Note: On platforms with hardware and system-level support for signed zeros, functions involving branch cuts are continuous on *both* sides of the branch cut: the sign of the zero distinguishes one side of the branch cut from the other. On platforms that do not support signed zeros the continuity is as specified below. Conversions to and from polar coordinates ========================================= A Python complex number "z" is stored internally using *rectangular* or *Cartesian* coordinates. It is completely determined by its *real part* "z.real" and its *imaginary part* "z.imag". In other words: z == z.real + z.imag*1j *Polar coordinates* give an alternative way to represent a complex number. In polar coordinates, a complex number *z* is defined by the modulus *r* and the phase angle *phi*. The modulus *r* is the distance from *z* to the origin, while the phase *phi* is the counterclockwise angle, measured in radians, from the positive x-axis to the line segment that joins the origin to *z*. The following functions can be used to convert from the native rectangular coordinates to polar coordinates and back. cmath.phase(x) Return the phase of *x* (also known as the *argument* of *x*), as a float. "phase(x)" is equivalent to "math.atan2(x.imag, x.real)". The result lies in the range [-*π*, *π*], and the branch cut for this operation lies along the negative real axis, continuous from above. On systems with support for signed zeros (which includes most systems in current use), this means that the sign of the result is the same as the sign of "x.imag", even when "x.imag" is zero: >>> phase(complex(-1.0, 0.0)) 3.141592653589793 >>> phase(complex(-1.0, -0.0)) -3.141592653589793 Note: The modulus (absolute value) of a complex number *x* can be computed using the built-in "abs()" function. There is no separate "cmath" module function for this operation. cmath.polar(x) Return the representation of *x* in polar coordinates. Returns a pair "(r, phi)" where *r* is the modulus of *x* and phi is the phase of *x*. "polar(x)" is equivalent to "(abs(x), phase(x))". cmath.rect(r, phi) Return the complex number *x* with polar coordinates *r* and *phi*. Equivalent to "r * (math.cos(phi) + math.sin(phi)*1j)". Power and logarithmic functions =============================== cmath.exp(x) Return *e* raised to the power *x*, where *e* is the base of natural logarithms. cmath.log(x[, base]) Returns the logarithm of *x* to the given *base*. If the *base* is not specified, returns the natural logarithm of *x*. There is one branch cut, from 0 along the negative real axis to -∞, continuous from above. cmath.log10(x) Return the base-10 logarithm of *x*. This has the same branch cut as "log()". cmath.sqrt(x) Return the square root of *x*. This has the same branch cut as "log()". Trigonometric functions ======================= cmath.acos(x) Return the arc cosine of *x*. There are two branch cuts: One extends right from 1 along the real axis to ∞, continuous from below. The other extends left from -1 along the real axis to -∞, continuous from above. cmath.asin(x) Return the arc sine of *x*. This has the same branch cuts as "acos()". cmath.atan(x) Return the arc tangent of *x*. There are two branch cuts: One extends from "1j" along the imaginary axis to "∞j", continuous from the right. The other extends from "-1j" along the imaginary axis to "-∞j", continuous from the left. cmath.cos(x) Return the cosine of *x*. cmath.sin(x) Return the sine of *x*. cmath.tan(x) Return the tangent of *x*. Hyperbolic functions ==================== cmath.acosh(x) Return the inverse hyperbolic cosine of *x*. There is one branch cut, extending left from 1 along the real axis to -∞, continuous from above. cmath.asinh(x) Return the inverse hyperbolic sine of *x*. There are two branch cuts: One extends from "1j" along the imaginary axis to "∞j", continuous from the right. The other extends from "-1j" along the imaginary axis to "-∞j", continuous from the left. cmath.atanh(x) Return the inverse hyperbolic tangent of *x*. There are two branch cuts: One extends from "1" along the real axis to "∞", continuous from below. The other extends from "-1" along the real axis to "-∞", continuous from above. cmath.cosh(x) Return the hyperbolic cosine of *x*. cmath.sinh(x) Return the hyperbolic sine of *x*. cmath.tanh(x) Return the hyperbolic tangent of *x*. Classification functions ======================== cmath.isfinite(x) Return "True" if both the real and imaginary parts of *x* are finite, and "False" otherwise. New in version 3.2. cmath.isinf(x) Return "True" if either the real or the imaginary part of *x* is an infinity, and "False" otherwise. cmath.isnan(x) Return "True" if either the real or the imaginary part of *x* is a NaN, and "False" otherwise. cmath.isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0) Return "True" if the values *a* and *b* are close to each other and "False" otherwise. Whether or not two values are considered close is determined according to given absolute and relative tolerances. *rel_tol* is the relative tolerance – it is the maximum allowed difference between *a* and *b*, relative to the larger absolute value of *a* or *b*. For example, to set a tolerance of 5%, pass "rel_tol=0.05". The default tolerance is "1e-09", which assures that the two values are the same within about 9 decimal digits. *rel_tol* must be greater than zero. *abs_tol* is the minimum absolute tolerance – useful for comparisons near zero. *abs_tol* must be at least zero. If no errors occur, the result will be: "abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)". The IEEE 754 special values of "NaN", "inf", and "-inf" will be handled according to IEEE rules. Specifically, "NaN" is not considered close to any other value, including "NaN". "inf" and "-inf" are only considered close to themselves. New in version 3.5. See also: **PEP 485** – A function for testing approximate equality Constants ========= cmath.pi The mathematical constant *π*, as a float. cmath.e The mathematical constant *e*, as a float. cmath.tau The mathematical constant *τ*, as a float. New in version 3.6. cmath.inf Floating-point positive infinity. Equivalent to "float('inf')". New in version 3.6. cmath.infj Complex number with zero real part and positive infinity imaginary part. Equivalent to "complex(0.0, float('inf'))". New in version 3.6. cmath.nan A floating-point “not a number” (NaN) value. Equivalent to "float('nan')". New in version 3.6. cmath.nanj Complex number with zero real part and NaN imaginary part. Equivalent to "complex(0.0, float('nan'))". New in version 3.6. Note that the selection of functions is similar, but not identical, to that in module "math". The reason for having two modules is that some users aren’t interested in complex numbers, and perhaps don’t even know what they are. They would rather have "math.sqrt(-1)" raise an exception than return a complex number. Also note that the functions defined in "cmath" always return a complex number, even if the answer can be expressed as a real number (in which case the complex number has an imaginary part of zero). A note on branch cuts: They are curves along which the given function fails to be continuous. They are a necessary feature of many complex functions. It is assumed that if you need to compute with complex functions, you will understand about branch cuts. Consult almost any (not too elementary) book on complex variables for enlightenment. For information of the proper choice of branch cuts for numerical purposes, a good reference should be the following: See also: Kahan, W: Branch cuts for complex elementary functions; or, Much ado about nothing’s sign bit. In Iserles, A., and Powell, M. (eds.), The state of the art in numerical analysis. Clarendon Press (1987) pp165–211. "cmd" — Support for line-oriented command interpreters ****************************************************** **Source code:** Lib/cmd.py ====================================================================== The "Cmd" class provides a simple framework for writing line-oriented command interpreters. These are often useful for test harnesses, administrative tools, and prototypes that will later be wrapped in a more sophisticated interface. class cmd.Cmd(completekey='tab', stdin=None, stdout=None) A "Cmd" instance or subclass instance is a line-oriented interpreter framework. There is no good reason to instantiate "Cmd" itself; rather, it’s useful as a superclass of an interpreter class you define yourself in order to inherit "Cmd"’s methods and encapsulate action methods. The optional argument *completekey* is the "readline" name of a completion key; it defaults to "Tab". If *completekey* is not "None" and "readline" is available, command completion is done automatically. The optional arguments *stdin* and *stdout* specify the input and output file objects that the Cmd instance or subclass instance will use for input and output. If not specified, they will default to "sys.stdin" and "sys.stdout". If you want a given *stdin* to be used, make sure to set the instance’s "use_rawinput" attribute to "False", otherwise *stdin* will be ignored. Cmd Objects =========== A "Cmd" instance has the following methods: Cmd.cmdloop(intro=None) Repeatedly issue a prompt, accept input, parse an initial prefix off the received input, and dispatch to action methods, passing them the remainder of the line as argument. The optional argument is a banner or intro string to be issued before the first prompt (this overrides the "intro" class attribute). If the "readline" module is loaded, input will automatically inherit **bash**-like history-list editing (e.g. "Control-P" scrolls back to the last command, "Control-N" forward to the next one, "Control-F" moves the cursor to the right non-destructively, "Control-B" moves the cursor to the left non-destructively, etc.). An end-of-file on input is passed back as the string "'EOF'". An interpreter instance will recognize a command name "foo" if and only if it has a method "do_foo()". As a special case, a line beginning with the character "'?'" is dispatched to the method "do_help()". As another special case, a line beginning with the character "'!'" is dispatched to the method "do_shell()" (if such a method is defined). This method will return when the "postcmd()" method returns a true value. The *stop* argument to "postcmd()" is the return value from the command’s corresponding "do_*()" method. If completion is enabled, completing commands will be done automatically, and completing of commands args is done by calling "complete_foo()" with arguments *text*, *line*, *begidx*, and *endidx*. *text* is the string prefix we are attempting to match: all returned matches must begin with it. *line* is the current input line with leading whitespace removed, *begidx* and *endidx* are the beginning and ending indexes of the prefix text, which could be used to provide different completion depending upon which position the argument is in. All subclasses of "Cmd" inherit a predefined "do_help()". This method, called with an argument "'bar'", invokes the corresponding method "help_bar()", and if that is not present, prints the docstring of "do_bar()", if available. With no argument, "do_help()" lists all available help topics (that is, all commands with corresponding "help_*()" methods or commands that have docstrings), and also lists any undocumented commands. Cmd.onecmd(str) Interpret the argument as though it had been typed in response to the prompt. This may be overridden, but should not normally need to be; see the "precmd()" and "postcmd()" methods for useful execution hooks. The return value is a flag indicating whether interpretation of commands by the interpreter should stop. If there is a "do_*()" method for the command *str*, the return value of that method is returned, otherwise the return value from the "default()" method is returned. Cmd.emptyline() Method called when an empty line is entered in response to the prompt. If this method is not overridden, it repeats the last nonempty command entered. Cmd.default(line) Method called on an input line when the command prefix is not recognized. If this method is not overridden, it prints an error message and returns. Cmd.completedefault(text, line, begidx, endidx) Method called to complete an input line when no command-specific "complete_*()" method is available. By default, it returns an empty list. Cmd.precmd(line) Hook method executed just before the command line *line* is interpreted, but after the input prompt is generated and issued. This method is a stub in "Cmd"; it exists to be overridden by subclasses. The return value is used as the command which will be executed by the "onecmd()" method; the "precmd()" implementation may re-write the command or simply return *line* unchanged. Cmd.postcmd(stop, line) Hook method executed just after a command dispatch is finished. This method is a stub in "Cmd"; it exists to be overridden by subclasses. *line* is the command line which was executed, and *stop* is a flag which indicates whether execution will be terminated after the call to "postcmd()"; this will be the return value of the "onecmd()" method. The return value of this method will be used as the new value for the internal flag which corresponds to *stop*; returning false will cause interpretation to continue. Cmd.preloop() Hook method executed once when "cmdloop()" is called. This method is a stub in "Cmd"; it exists to be overridden by subclasses. Cmd.postloop() Hook method executed once when "cmdloop()" is about to return. This method is a stub in "Cmd"; it exists to be overridden by subclasses. Instances of "Cmd" subclasses have some public instance variables: Cmd.prompt The prompt issued to solicit input. Cmd.identchars The string of characters accepted for the command prefix. Cmd.lastcmd The last nonempty command prefix seen. Cmd.cmdqueue A list of queued input lines. The cmdqueue list is checked in "cmdloop()" when new input is needed; if it is nonempty, its elements will be processed in order, as if entered at the prompt. Cmd.intro A string to issue as an intro or banner. May be overridden by giving the "cmdloop()" method an argument. Cmd.doc_header The header to issue if the help output has a section for documented commands. Cmd.misc_header The header to issue if the help output has a section for miscellaneous help topics (that is, there are "help_*()" methods without corresponding "do_*()" methods). Cmd.undoc_header The header to issue if the help output has a section for undocumented commands (that is, there are "do_*()" methods without corresponding "help_*()" methods). Cmd.ruler The character used to draw separator lines under the help-message headers. If empty, no ruler line is drawn. It defaults to "'='". Cmd.use_rawinput A flag, defaulting to true. If true, "cmdloop()" uses "input()" to display a prompt and read the next command; if false, "sys.stdout.write()" and "sys.stdin.readline()" are used. (This means that by importing "readline", on systems that support it, the interpreter will automatically support **Emacs**-like line editing and command-history keystrokes.) Cmd Example =========== The "cmd" module is mainly useful for building custom shells that let a user work with a program interactively. This section presents a simple example of how to build a shell around a few of the commands in the "turtle" module. Basic turtle commands such as "forward()" are added to a "Cmd" subclass with method named "do_forward()". The argument is converted to a number and dispatched to the turtle module. The docstring is used in the help utility provided by the shell. The example also includes a basic record and playback facility implemented with the "precmd()" method which is responsible for converting the input to lowercase and writing the commands to a file. The "do_playback()" method reads the file and adds the recorded commands to the "cmdqueue" for immediate playback: import cmd, sys from turtle import * class TurtleShell(cmd.Cmd): intro = 'Welcome to the turtle shell. Type help or ? to list commands.\n' prompt = '(turtle) ' file = None # ----- basic turtle commands ----- def do_forward(self, arg): 'Move the turtle forward by the specified distance: FORWARD 10' forward(*parse(arg)) def do_right(self, arg): 'Turn turtle right by given number of degrees: RIGHT 20' right(*parse(arg)) def do_left(self, arg): 'Turn turtle left by given number of degrees: LEFT 90' left(*parse(arg)) def do_goto(self, arg): 'Move turtle to an absolute position with changing orientation. GOTO 100 200' goto(*parse(arg)) def do_home(self, arg): 'Return turtle to the home position: HOME' home() def do_circle(self, arg): 'Draw circle with given radius an options extent and steps: CIRCLE 50' circle(*parse(arg)) def do_position(self, arg): 'Print the current turtle position: POSITION' print('Current position is %d %d\n' % position()) def do_heading(self, arg): 'Print the current turtle heading in degrees: HEADING' print('Current heading is %d\n' % (heading(),)) def do_color(self, arg): 'Set the color: COLOR BLUE' color(arg.lower()) def do_undo(self, arg): 'Undo (repeatedly) the last turtle action(s): UNDO' def do_reset(self, arg): 'Clear the screen and return turtle to center: RESET' reset() def do_bye(self, arg): 'Stop recording, close the turtle window, and exit: BYE' print('Thank you for using Turtle') self.close() bye() return True # ----- record and playback ----- def do_record(self, arg): 'Save future commands to filename: RECORD rose.cmd' self.file = open(arg, 'w') def do_playback(self, arg): 'Playback commands from a file: PLAYBACK rose.cmd' self.close() with open(arg) as f: self.cmdqueue.extend(f.read().splitlines()) def precmd(self, line): line = line.lower() if self.file and 'playback' not in line: print(line, file=self.file) return line def close(self): if self.file: self.file.close() self.file = None def parse(arg): 'Convert a series of zero or more numbers to an argument tuple' return tuple(map(int, arg.split())) if __name__ == '__main__': TurtleShell().cmdloop() Here is a sample session with the turtle shell showing the help functions, using blank lines to repeat commands, and the simple record and playback facility: Welcome to the turtle shell. Type help or ? to list commands. (turtle) ? Documented commands (type help ): ======================================== bye color goto home playback record right circle forward heading left position reset undo (turtle) help forward Move the turtle forward by the specified distance: FORWARD 10 (turtle) record spiral.cmd (turtle) position Current position is 0 0 (turtle) heading Current heading is 0 (turtle) reset (turtle) circle 20 (turtle) right 30 (turtle) circle 40 (turtle) right 30 (turtle) circle 60 (turtle) right 30 (turtle) circle 80 (turtle) right 30 (turtle) circle 100 (turtle) right 30 (turtle) circle 120 (turtle) right 30 (turtle) circle 120 (turtle) heading Current heading is 180 (turtle) forward 100 (turtle) (turtle) right 90 (turtle) forward 100 (turtle) (turtle) right 90 (turtle) forward 400 (turtle) right 90 (turtle) forward 500 (turtle) right 90 (turtle) forward 400 (turtle) right 90 (turtle) forward 300 (turtle) playback spiral.cmd Current position is 0 0 Current heading is 0 Current heading is 180 (turtle) bye Thank you for using Turtle "code" — Interpreter base classes ********************************* **Source code:** Lib/code.py ====================================================================== The "code" module provides facilities to implement read-eval-print loops in Python. Two classes and convenience functions are included which can be used to build applications which provide an interactive interpreter prompt. class code.InteractiveInterpreter(locals=None) This class deals with parsing and interpreter state (the user’s namespace); it does not deal with input buffering or prompting or input file naming (the filename is always passed in explicitly). The optional *locals* argument specifies the dictionary in which code will be executed; it defaults to a newly created dictionary with key "'__name__'" set to "'__console__'" and key "'__doc__'" set to "None". class code.InteractiveConsole(locals=None, filename="") Closely emulate the behavior of the interactive Python interpreter. This class builds on "InteractiveInterpreter" and adds prompting using the familiar "sys.ps1" and "sys.ps2", and input buffering. code.interact(banner=None, readfunc=None, local=None, exitmsg=None) Convenience function to run a read-eval-print loop. This creates a new instance of "InteractiveConsole" and sets *readfunc* to be used as the "InteractiveConsole.raw_input()" method, if provided. If *local* is provided, it is passed to the "InteractiveConsole" constructor for use as the default namespace for the interpreter loop. The "interact()" method of the instance is then run with *banner* and *exitmsg* passed as the banner and exit message to use, if provided. The console object is discarded after use. Changed in version 3.6: Added *exitmsg* parameter. code.compile_command(source, filename="", symbol="single") This function is useful for programs that want to emulate Python’s interpreter main loop (a.k.a. the read-eval-print loop). The tricky part is to determine when the user has entered an incomplete command that can be completed by entering more text (as opposed to a complete command or a syntax error). This function *almost* always makes the same decision as the real interpreter main loop. *source* is the source string; *filename* is the optional filename from which source was read, defaulting to "''"; and *symbol* is the optional grammar start symbol, which should be "'single'" (the default), "'eval'" or "'exec'". Returns a code object (the same as "compile(source, filename, symbol)") if the command is complete and valid; "None" if the command is incomplete; raises "SyntaxError" if the command is complete and contains a syntax error, or raises "OverflowError" or "ValueError" if the command contains an invalid literal. Interactive Interpreter Objects =============================== InteractiveInterpreter.runsource(source, filename="", symbol="single") Compile and run some source in the interpreter. Arguments are the same as for "compile_command()"; the default for *filename* is "''", and for *symbol* is "'single'". One of several things can happen: * The input is incorrect; "compile_command()" raised an exception ("SyntaxError" or "OverflowError"). A syntax traceback will be printed by calling the "showsyntaxerror()" method. "runsource()" returns "False". * The input is incomplete, and more input is required; "compile_command()" returned "None". "runsource()" returns "True". * The input is complete; "compile_command()" returned a code object. The code is executed by calling the "runcode()" (which also handles run-time exceptions, except for "SystemExit"). "runsource()" returns "False". The return value can be used to decide whether to use "sys.ps1" or "sys.ps2" to prompt the next line. InteractiveInterpreter.runcode(code) Execute a code object. When an exception occurs, "showtraceback()" is called to display a traceback. All exceptions are caught except "SystemExit", which is allowed to propagate. A note about "KeyboardInterrupt": this exception may occur elsewhere in this code, and may not always be caught. The caller should be prepared to deal with it. InteractiveInterpreter.showsyntaxerror(filename=None) Display the syntax error that just occurred. This does not display a stack trace because there isn’t one for syntax errors. If *filename* is given, it is stuffed into the exception instead of the default filename provided by Python’s parser, because it always uses "''" when reading from a string. The output is written by the "write()" method. InteractiveInterpreter.showtraceback() Display the exception that just occurred. We remove the first stack item because it is within the interpreter object implementation. The output is written by the "write()" method. Changed in version 3.5: The full chained traceback is displayed instead of just the primary traceback. InteractiveInterpreter.write(data) Write a string to the standard error stream ("sys.stderr"). Derived classes should override this to provide the appropriate output handling as needed. Interactive Console Objects =========================== The "InteractiveConsole" class is a subclass of "InteractiveInterpreter", and so offers all the methods of the interpreter objects as well as the following additions. InteractiveConsole.interact(banner=None, exitmsg=None) Closely emulate the interactive Python console. The optional *banner* argument specify the banner to print before the first interaction; by default it prints a banner similar to the one printed by the standard Python interpreter, followed by the class name of the console object in parentheses (so as not to confuse this with the real interpreter – since it’s so close!). The optional *exitmsg* argument specifies an exit message printed when exiting. Pass the empty string to suppress the exit message. If *exitmsg* is not given or "None", a default message is printed. Changed in version 3.4: To suppress printing any banner, pass an empty string. Changed in version 3.6: Print an exit message when exiting. InteractiveConsole.push(line) Push a line of source text to the interpreter. The line should not have a trailing newline; it may have internal newlines. The line is appended to a buffer and the interpreter’s "runsource()" method is called with the concatenated contents of the buffer as source. If this indicates that the command was executed or invalid, the buffer is reset; otherwise, the command is incomplete, and the buffer is left as it was after the line was appended. The return value is "True" if more input is required, "False" if the line was dealt with in some way (this is the same as "runsource()"). InteractiveConsole.resetbuffer() Remove any unhandled source text from the input buffer. InteractiveConsole.raw_input(prompt="") Write a prompt and read a line. The returned line does not include the trailing newline. When the user enters the EOF key sequence, "EOFError" is raised. The base implementation reads from "sys.stdin"; a subclass may replace this with a different implementation. "codeop" — Compile Python code ****************************** **Source code:** Lib/codeop.py ====================================================================== The "codeop" module provides utilities upon which the Python read- eval-print loop can be emulated, as is done in the "code" module. As a result, you probably don’t want to use the module directly; if you want to include such a loop in your program you probably want to use the "code" module instead. There are two parts to this job: 1. Being able to tell if a line of input completes a Python statement: in short, telling whether to print ‘">>>"’ or ‘"..."’ next. 2. Remembering which future statements the user has entered, so subsequent input can be compiled with these in effect. The "codeop" module provides a way of doing each of these things, and a way of doing them both. To do just the former: codeop.compile_command(source, filename="", symbol="single") Tries to compile *source*, which should be a string of Python code and return a code object if *source* is valid Python code. In that case, the filename attribute of the code object will be *filename*, which defaults to "''". Returns "None" if *source* is *not* valid Python code, but is a prefix of valid Python code. If there is a problem with *source*, an exception will be raised. "SyntaxError" is raised if there is invalid Python syntax, and "OverflowError" or "ValueError" if there is an invalid literal. The *symbol* argument determines whether *source* is compiled as a statement ("'single'", the default), as a sequence of statements ("'exec'") or as an *expression* ("'eval'"). Any other value will cause "ValueError" to be raised. Note: It is possible (but not likely) that the parser stops parsing with a successful outcome before reaching the end of the source; in this case, trailing symbols may be ignored instead of causing an error. For example, a backslash followed by two newlines may be followed by arbitrary garbage. This will be fixed once the API for the parser is better. class codeop.Compile Instances of this class have "__call__()" methods identical in signature to the built-in function "compile()", but with the difference that if the instance compiles program text containing a "__future__" statement, the instance ‘remembers’ and compiles all subsequent program texts with the statement in force. class codeop.CommandCompiler Instances of this class have "__call__()" methods identical in signature to "compile_command()"; the difference is that if the instance compiles program text containing a "__future__" statement, the instance ‘remembers’ and compiles all subsequent program texts with the statement in force. "collections.abc" — Abstract Base Classes for Containers ******************************************************** New in version 3.3: Formerly, this module was part of the "collections" module. **Source code:** Lib/_collections_abc.py ====================================================================== This module provides *abstract base classes* that can be used to test whether a class provides a particular interface; for example, whether it is hashable or whether it is a mapping. New in version 3.9: These abstract classes now support "[]". See Generic Alias Type and **PEP 585**. Collections Abstract Base Classes ================================= The collections module offers the following *ABCs*: +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | ABC | Inherits from | Abstract Methods | Mixin Methods | |============================|========================|=========================|======================================================| | "Container" | | "__contains__" | | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Hashable" | | "__hash__" | | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Iterable" | | "__iter__" | | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Iterator" | "Iterable" | "__next__" | "__iter__" | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Reversible" | "Iterable" | "__reversed__" | | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Generator" | "Iterator" | "send", "throw" | "close", "__iter__", "__next__" | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Sized" | | "__len__" | | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Callable" | | "__call__" | | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Collection" | "Sized", "Iterable", | "__contains__", | | | | "Container" | "__iter__", "__len__" | | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Sequence" | "Reversible", | "__getitem__", | "__contains__", "__iter__", "__reversed__", "index", | | | "Collection" | "__len__" | and "count" | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "MutableSequence" | "Sequence" | "__getitem__", | Inherited "Sequence" methods and "append", | | | | "__setitem__", | "reverse", "extend", "pop", "remove", and "__iadd__" | | | | "__delitem__", | | | | | "__len__", "insert" | | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "ByteString" | "Sequence" | "__getitem__", | Inherited "Sequence" methods | | | | "__len__" | | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Set" | "Collection" | "__contains__", | "__le__", "__lt__", "__eq__", "__ne__", "__gt__", | | | | "__iter__", "__len__" | "__ge__", "__and__", "__or__", "__sub__", "__xor__", | | | | | and "isdisjoint" | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "MutableSet" | "Set" | "__contains__", | Inherited "Set" methods and "clear", "pop", | | | | "__iter__", "__len__", | "remove", "__ior__", "__iand__", "__ixor__", and | | | | "add", "discard" | "__isub__" | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Mapping" | "Collection" | "__getitem__", | "__contains__", "keys", "items", "values", "get", | | | | "__iter__", "__len__" | "__eq__", and "__ne__" | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "MutableMapping" | "Mapping" | "__getitem__", | Inherited "Mapping" methods and "pop", "popitem", | | | | "__setitem__", | "clear", "update", and "setdefault" | | | | "__delitem__", | | | | | "__iter__", "__len__" | | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "MappingView" | "Sized" | | "__len__" | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "ItemsView" | "MappingView", "Set" | | "__contains__", "__iter__" | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "KeysView" | "MappingView", "Set" | | "__contains__", "__iter__" | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "ValuesView" | "MappingView", | | "__contains__", "__iter__" | | | "Collection" | | | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Awaitable" | | "__await__" | | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "Coroutine" | "Awaitable" | "send", "throw" | "close" | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "AsyncIterable" | | "__aiter__" | | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "AsyncIterator" | "AsyncIterable" | "__anext__" | "__aiter__" | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ | "AsyncGenerator" | "AsyncIterator" | "asend", "athrow" | "aclose", "__aiter__", "__anext__" | +----------------------------+------------------------+-------------------------+------------------------------------------------------+ class collections.abc.Container ABC for classes that provide the "__contains__()" method. class collections.abc.Hashable ABC for classes that provide the "__hash__()" method. class collections.abc.Sized ABC for classes that provide the "__len__()" method. class collections.abc.Callable ABC for classes that provide the "__call__()" method. class collections.abc.Iterable ABC for classes that provide the "__iter__()" method. Checking "isinstance(obj, Iterable)" detects classes that are registered as "Iterable" or that have an "__iter__()" method, but it does not detect classes that iterate with the "__getitem__()" method. The only reliable way to determine whether an object is *iterable* is to call "iter(obj)". class collections.abc.Collection ABC for sized iterable container classes. New in version 3.6. class collections.abc.Iterator ABC for classes that provide the "__iter__()" and "__next__()" methods. See also the definition of *iterator*. class collections.abc.Reversible ABC for iterable classes that also provide the "__reversed__()" method. New in version 3.6. class collections.abc.Generator ABC for generator classes that implement the protocol defined in **PEP 342** that extends iterators with the "send()", "throw()" and "close()" methods. See also the definition of *generator*. New in version 3.5. class collections.abc.Sequence class collections.abc.MutableSequence class collections.abc.ByteString ABCs for read-only and mutable *sequences*. Implementation note: Some of the mixin methods, such as "__iter__()", "__reversed__()" and "index()", make repeated calls to the underlying "__getitem__()" method. Consequently, if "__getitem__()" is implemented with constant access speed, the mixin methods will have linear performance; however, if the underlying method is linear (as it would be with a linked list), the mixins will have quadratic performance and will likely need to be overridden. Changed in version 3.5: The index() method added support for *stop* and *start* arguments. class collections.abc.Set class collections.abc.MutableSet ABCs for read-only and mutable sets. class collections.abc.Mapping class collections.abc.MutableMapping ABCs for read-only and mutable *mappings*. class collections.abc.MappingView class collections.abc.ItemsView class collections.abc.KeysView class collections.abc.ValuesView ABCs for mapping, items, keys, and values *views*. class collections.abc.Awaitable ABC for *awaitable* objects, which can be used in "await" expressions. Custom implementations must provide the "__await__()" method. *Coroutine* objects and instances of the "Coroutine" ABC are all instances of this ABC. Note: In CPython, generator-based coroutines (generators decorated with "types.coroutine()" or "asyncio.coroutine()") are *awaitables*, even though they do not have an "__await__()" method. Using "isinstance(gencoro, Awaitable)" for them will return "False". Use "inspect.isawaitable()" to detect them. New in version 3.5. class collections.abc.Coroutine ABC for coroutine compatible classes. These implement the following methods, defined in Coroutine Objects: "send()", "throw()", and "close()". Custom implementations must also implement "__await__()". All "Coroutine" instances are also instances of "Awaitable". See also the definition of *coroutine*. Note: In CPython, generator-based coroutines (generators decorated with "types.coroutine()" or "asyncio.coroutine()") are *awaitables*, even though they do not have an "__await__()" method. Using "isinstance(gencoro, Coroutine)" for them will return "False". Use "inspect.isawaitable()" to detect them. New in version 3.5. class collections.abc.AsyncIterable ABC for classes that provide "__aiter__" method. See also the definition of *asynchronous iterable*. New in version 3.5. class collections.abc.AsyncIterator ABC for classes that provide "__aiter__" and "__anext__" methods. See also the definition of *asynchronous iterator*. New in version 3.5. class collections.abc.AsyncGenerator ABC for asynchronous generator classes that implement the protocol defined in **PEP 525** and **PEP 492**. New in version 3.6. These ABCs allow us to ask classes or instances if they provide particular functionality, for example: size = None if isinstance(myvar, collections.abc.Sized): size = len(myvar) Several of the ABCs are also useful as mixins that make it easier to develop classes supporting container APIs. For example, to write a class supporting the full "Set" API, it is only necessary to supply the three underlying abstract methods: "__contains__()", "__iter__()", and "__len__()". The ABC supplies the remaining methods such as "__and__()" and "isdisjoint()": class ListBasedSet(collections.abc.Set): ''' Alternate set implementation favoring space over speed and not requiring the set elements to be hashable. ''' def __init__(self, iterable): self.elements = lst = [] for value in iterable: if value not in lst: lst.append(value) def __iter__(self): return iter(self.elements) def __contains__(self, value): return value in self.elements def __len__(self): return len(self.elements) s1 = ListBasedSet('abcdef') s2 = ListBasedSet('defghi') overlap = s1 & s2 # The __and__() method is supported automatically Notes on using "Set" and "MutableSet" as a mixin: 1. Since some set operations create new sets, the default mixin methods need a way to create new instances from an iterable. The class constructor is assumed to have a signature in the form "ClassName(iterable)". That assumption is factored-out to an internal classmethod called "_from_iterable()" which calls "cls(iterable)" to produce a new set. If the "Set" mixin is being used in a class with a different constructor signature, you will need to override "_from_iterable()" with a classmethod or regular method that can construct new instances from an iterable argument. 2. To override the comparisons (presumably for speed, as the semantics are fixed), redefine "__le__()" and "__ge__()", then the other operations will automatically follow suit. 3. The "Set" mixin provides a "_hash()" method to compute a hash value for the set; however, "__hash__()" is not defined because not all sets are hashable or immutable. To add set hashability using mixins, inherit from both "Set()" and "Hashable()", then define "__hash__ = Set._hash". See also: * OrderedSet recipe for an example built on "MutableSet". * For more about ABCs, see the "abc" module and **PEP 3119**. "collections" — Container datatypes *********************************** **Source code:** Lib/collections/__init__.py ====================================================================== This module implements specialized container datatypes providing alternatives to Python’s general purpose built-in containers, "dict", "list", "set", and "tuple". +-----------------------+----------------------------------------------------------------------+ | "namedtuple()" | factory function for creating tuple subclasses with named fields | +-----------------------+----------------------------------------------------------------------+ | "deque" | list-like container with fast appends and pops on either end | +-----------------------+----------------------------------------------------------------------+ | "ChainMap" | dict-like class for creating a single view of multiple mappings | +-----------------------+----------------------------------------------------------------------+ | "Counter" | dict subclass for counting hashable objects | +-----------------------+----------------------------------------------------------------------+ | "OrderedDict" | dict subclass that remembers the order entries were added | +-----------------------+----------------------------------------------------------------------+ | "defaultdict" | dict subclass that calls a factory function to supply missing values | +-----------------------+----------------------------------------------------------------------+ | "UserDict" | wrapper around dictionary objects for easier dict subclassing | +-----------------------+----------------------------------------------------------------------+ | "UserList" | wrapper around list objects for easier list subclassing | +-----------------------+----------------------------------------------------------------------+ | "UserString" | wrapper around string objects for easier string subclassing | +-----------------------+----------------------------------------------------------------------+ Deprecated since version 3.3, will be removed in version 3.10: Moved Collections Abstract Base Classes to the "collections.abc" module. For backwards compatibility, they continue to be visible in this module through Python 3.9. "ChainMap" objects ================== New in version 3.3. A "ChainMap" class is provided for quickly linking a number of mappings so they can be treated as a single unit. It is often much faster than creating a new dictionary and running multiple "update()" calls. The class can be used to simulate nested scopes and is useful in templating. class collections.ChainMap(*maps) A "ChainMap" groups multiple dicts or other mappings together to create a single, updateable view. If no *maps* are specified, a single empty dictionary is provided so that a new chain always has at least one mapping. The underlying mappings are stored in a list. That list is public and can be accessed or updated using the *maps* attribute. There is no other state. Lookups search the underlying mappings successively until a key is found. In contrast, writes, updates, and deletions only operate on the first mapping. A "ChainMap" incorporates the underlying mappings by reference. So, if one of the underlying mappings gets updated, those changes will be reflected in "ChainMap". All of the usual dictionary methods are supported. In addition, there is a *maps* attribute, a method for creating new subcontexts, and a property for accessing all but the first mapping: maps A user updateable list of mappings. The list is ordered from first-searched to last-searched. It is the only stored state and can be modified to change which mappings are searched. The list should always contain at least one mapping. new_child(m=None) Returns a new "ChainMap" containing a new map followed by all of the maps in the current instance. If "m" is specified, it becomes the new map at the front of the list of mappings; if not specified, an empty dict is used, so that a call to "d.new_child()" is equivalent to: "ChainMap({}, *d.maps)". This method is used for creating subcontexts that can be updated without altering values in any of the parent mappings. Changed in version 3.4: The optional "m" parameter was added. parents Property returning a new "ChainMap" containing all of the maps in the current instance except the first one. This is useful for skipping the first map in the search. Use cases are similar to those for the "nonlocal" keyword used in *nested scopes*. The use cases also parallel those for the built-in "super()" function. A reference to "d.parents" is equivalent to: "ChainMap(*d.maps[1:])". Note, the iteration order of a "ChainMap()" is determined by scanning the mappings last to first: >>> baseline = {'music': 'bach', 'art': 'rembrandt'} >>> adjustments = {'art': 'van gogh', 'opera': 'carmen'} >>> list(ChainMap(adjustments, baseline)) ['music', 'art', 'opera'] This gives the same ordering as a series of "dict.update()" calls starting with the last mapping: >>> combined = baseline.copy() >>> combined.update(adjustments) >>> list(combined) ['music', 'art', 'opera'] Changed in version 3.9: Added support for "|" and "|=" operators, specified in **PEP 584**. See also: * The MultiContext class in the Enthought CodeTools package has options to support writing to any mapping in the chain. * Django’s Context class for templating is a read-only chain of mappings. It also features pushing and popping of contexts similar to the "new_child()" method and the "parents" property. * The Nested Contexts recipe has options to control whether writes and other mutations apply only to the first mapping or to any mapping in the chain. * A greatly simplified read-only version of Chainmap. "ChainMap" Examples and Recipes ------------------------------- This section shows various approaches to working with chained maps. Example of simulating Python’s internal lookup chain: import builtins pylookup = ChainMap(locals(), globals(), vars(builtins)) Example of letting user specified command-line arguments take precedence over environment variables which in turn take precedence over default values: import os, argparse defaults = {'color': 'red', 'user': 'guest'} parser = argparse.ArgumentParser() parser.add_argument('-u', '--user') parser.add_argument('-c', '--color') namespace = parser.parse_args() command_line_args = {k: v for k, v in vars(namespace).items() if v is not None} combined = ChainMap(command_line_args, os.environ, defaults) print(combined['color']) print(combined['user']) Example patterns for using the "ChainMap" class to simulate nested contexts: c = ChainMap() # Create root context d = c.new_child() # Create nested child context e = c.new_child() # Child of c, independent from d e.maps[0] # Current context dictionary -- like Python's locals() e.maps[-1] # Root context -- like Python's globals() e.parents # Enclosing context chain -- like Python's nonlocals d['x'] = 1 # Set value in current context d['x'] # Get first key in the chain of contexts del d['x'] # Delete from current context list(d) # All nested values k in d # Check all nested values len(d) # Number of nested values d.items() # All nested items dict(d) # Flatten into a regular dictionary The "ChainMap" class only makes updates (writes and deletions) to the first mapping in the chain while lookups will search the full chain. However, if deep writes and deletions are desired, it is easy to make a subclass that updates keys found deeper in the chain: class DeepChainMap(ChainMap): 'Variant of ChainMap that allows direct updates to inner scopes' def __setitem__(self, key, value): for mapping in self.maps: if key in mapping: mapping[key] = value return self.maps[0][key] = value def __delitem__(self, key): for mapping in self.maps: if key in mapping: del mapping[key] return raise KeyError(key) >>> d = DeepChainMap({'zebra': 'black'}, {'elephant': 'blue'}, {'lion': 'yellow'}) >>> d['lion'] = 'orange' # update an existing key two levels down >>> d['snake'] = 'red' # new keys get added to the topmost dict >>> del d['elephant'] # remove an existing key one level down >>> d # display result DeepChainMap({'zebra': 'black', 'snake': 'red'}, {}, {'lion': 'orange'}) "Counter" objects ================= A counter tool is provided to support convenient and rapid tallies. For example: >>> # Tally occurrences of words in a list >>> cnt = Counter() >>> for word in ['red', 'blue', 'red', 'green', 'blue', 'blue']: ... cnt[word] += 1 >>> cnt Counter({'blue': 3, 'red': 2, 'green': 1}) >>> # Find the ten most common words in Hamlet >>> import re >>> words = re.findall(r'\w+', open('hamlet.txt').read().lower()) >>> Counter(words).most_common(10) [('the', 1143), ('and', 966), ('to', 762), ('of', 669), ('i', 631), ('you', 554), ('a', 546), ('my', 514), ('hamlet', 471), ('in', 451)] class collections.Counter([iterable-or-mapping]) A "Counter" is a "dict" subclass for counting hashable objects. It is a collection where elements are stored as dictionary keys and their counts are stored as dictionary values. Counts are allowed to be any integer value including zero or negative counts. The "Counter" class is similar to bags or multisets in other languages. Elements are counted from an *iterable* or initialized from another *mapping* (or counter): >>> c = Counter() # a new, empty counter >>> c = Counter('gallahad') # a new counter from an iterable >>> c = Counter({'red': 4, 'blue': 2}) # a new counter from a mapping >>> c = Counter(cats=4, dogs=8) # a new counter from keyword args Counter objects have a dictionary interface except that they return a zero count for missing items instead of raising a "KeyError": >>> c = Counter(['eggs', 'ham']) >>> c['bacon'] # count of a missing element is zero 0 Setting a count to zero does not remove an element from a counter. Use "del" to remove it entirely: >>> c['sausage'] = 0 # counter entry with a zero count >>> del c['sausage'] # del actually removes the entry New in version 3.1. Changed in version 3.7: As a "dict" subclass, "Counter" Inherited the capability to remember insertion order. Math operations on *Counter* objects also preserve order. Results are ordered according to when an element is first encountered in the left operand and then by the order encountered in the right operand. Counter objects support additional methods beyond those available for all dictionaries: elements() Return an iterator over elements repeating each as many times as its count. Elements are returned in the order first encountered. If an element’s count is less than one, "elements()" will ignore it. >>> c = Counter(a=4, b=2, c=0, d=-2) >>> sorted(c.elements()) ['a', 'a', 'a', 'a', 'b', 'b'] most_common([n]) Return a list of the *n* most common elements and their counts from the most common to the least. If *n* is omitted or "None", "most_common()" returns *all* elements in the counter. Elements with equal counts are ordered in the order first encountered: >>> Counter('abracadabra').most_common(3) [('a', 5), ('b', 2), ('r', 2)] subtract([iterable-or-mapping]) Elements are subtracted from an *iterable* or from another *mapping* (or counter). Like "dict.update()" but subtracts counts instead of replacing them. Both inputs and outputs may be zero or negative. >>> c = Counter(a=4, b=2, c=0, d=-2) >>> d = Counter(a=1, b=2, c=3, d=4) >>> c.subtract(d) >>> c Counter({'a': 3, 'b': 0, 'c': -3, 'd': -6}) New in version 3.2. The usual dictionary methods are available for "Counter" objects except for two which work differently for counters. fromkeys(iterable) This class method is not implemented for "Counter" objects. update([iterable-or-mapping]) Elements are counted from an *iterable* or added-in from another *mapping* (or counter). Like "dict.update()" but adds counts instead of replacing them. Also, the *iterable* is expected to be a sequence of elements, not a sequence of "(key, value)" pairs. Common patterns for working with "Counter" objects: sum(c.values()) # total of all counts c.clear() # reset all counts list(c) # list unique elements set(c) # convert to a set dict(c) # convert to a regular dictionary c.items() # convert to a list of (elem, cnt) pairs Counter(dict(list_of_pairs)) # convert from a list of (elem, cnt) pairs c.most_common()[:-n-1:-1] # n least common elements +c # remove zero and negative counts Several mathematical operations are provided for combining "Counter" objects to produce multisets (counters that have counts greater than zero). Addition and subtraction combine counters by adding or subtracting the counts of corresponding elements. Intersection and union return the minimum and maximum of corresponding counts. Each operation can accept inputs with signed counts, but the output will exclude results with counts of zero or less. >>> c = Counter(a=3, b=1) >>> d = Counter(a=1, b=2) >>> c + d # add two counters together: c[x] + d[x] Counter({'a': 4, 'b': 3}) >>> c - d # subtract (keeping only positive counts) Counter({'a': 2}) >>> c & d # intersection: min(c[x], d[x]) Counter({'a': 1, 'b': 1}) >>> c | d # union: max(c[x], d[x]) Counter({'a': 3, 'b': 2}) Unary addition and subtraction are shortcuts for adding an empty counter or subtracting from an empty counter. >>> c = Counter(a=2, b=-4) >>> +c Counter({'a': 2}) >>> -c Counter({'b': 4}) New in version 3.3: Added support for unary plus, unary minus, and in- place multiset operations. Note: Counters were primarily designed to work with positive integers to represent running counts; however, care was taken to not unnecessarily preclude use cases needing other types or negative values. To help with those use cases, this section documents the minimum range and type restrictions. * The "Counter" class itself is a dictionary subclass with no restrictions on its keys and values. The values are intended to be numbers representing counts, but you *could* store anything in the value field. * The "most_common()" method requires only that the values be orderable. * For in-place operations such as "c[key] += 1", the value type need only support addition and subtraction. So fractions, floats, and decimals would work and negative values are supported. The same is also true for "update()" and "subtract()" which allow negative and zero values for both inputs and outputs. * The multiset methods are designed only for use cases with positive values. The inputs may be negative or zero, but only outputs with positive values are created. There are no type restrictions, but the value type needs to support addition, subtraction, and comparison. * The "elements()" method requires integer counts. It ignores zero and negative counts. See also: * Bag class in Smalltalk. * Wikipedia entry for Multisets. * C++ multisets tutorial with examples. * For mathematical operations on multisets and their use cases, see *Knuth, Donald. The Art of Computer Programming Volume II, Section 4.6.3, Exercise 19*. * To enumerate all distinct multisets of a given size over a given set of elements, see "itertools.combinations_with_replacement()": map(Counter, combinations_with_replacement('ABC', 2)) # --> AA AB AC BB BC CC "deque" objects =============== class collections.deque([iterable[, maxlen]]) Returns a new deque object initialized left-to-right (using "append()") with data from *iterable*. If *iterable* is not specified, the new deque is empty. Deques are a generalization of stacks and queues (the name is pronounced “deck” and is short for “double-ended queue”). Deques support thread-safe, memory efficient appends and pops from either side of the deque with approximately the same O(1) performance in either direction. Though "list" objects support similar operations, they are optimized for fast fixed-length operations and incur O(n) memory movement costs for "pop(0)" and "insert(0, v)" operations which change both the size and position of the underlying data representation. If *maxlen* is not specified or is "None", deques may grow to an arbitrary length. Otherwise, the deque is bounded to the specified maximum length. Once a bounded length deque is full, when new items are added, a corresponding number of items are discarded from the opposite end. Bounded length deques provide functionality similar to the "tail" filter in Unix. They are also useful for tracking transactions and other pools of data where only the most recent activity is of interest. Deque objects support the following methods: append(x) Add *x* to the right side of the deque. appendleft(x) Add *x* to the left side of the deque. clear() Remove all elements from the deque leaving it with length 0. copy() Create a shallow copy of the deque. New in version 3.5. count(x) Count the number of deque elements equal to *x*. New in version 3.2. extend(iterable) Extend the right side of the deque by appending elements from the iterable argument. extendleft(iterable) Extend the left side of the deque by appending elements from *iterable*. Note, the series of left appends results in reversing the order of elements in the iterable argument. index(x[, start[, stop]]) Return the position of *x* in the deque (at or after index *start* and before index *stop*). Returns the first match or raises "ValueError" if not found. New in version 3.5. insert(i, x) Insert *x* into the deque at position *i*. If the insertion would cause a bounded deque to grow beyond *maxlen*, an "IndexError" is raised. New in version 3.5. pop() Remove and return an element from the right side of the deque. If no elements are present, raises an "IndexError". popleft() Remove and return an element from the left side of the deque. If no elements are present, raises an "IndexError". remove(value) Remove the first occurrence of *value*. If not found, raises a "ValueError". reverse() Reverse the elements of the deque in-place and then return "None". New in version 3.2. rotate(n=1) Rotate the deque *n* steps to the right. If *n* is negative, rotate to the left. When the deque is not empty, rotating one step to the right is equivalent to "d.appendleft(d.pop())", and rotating one step to the left is equivalent to "d.append(d.popleft())". Deque objects also provide one read-only attribute: maxlen Maximum size of a deque or "None" if unbounded. New in version 3.1. In addition to the above, deques support iteration, pickling, "len(d)", "reversed(d)", "copy.copy(d)", "copy.deepcopy(d)", membership testing with the "in" operator, and subscript references such as "d[0]" to access the first element. Indexed access is O(1) at both ends but slows to O(n) in the middle. For fast random access, use lists instead. Starting in version 3.5, deques support "__add__()", "__mul__()", and "__imul__()". Example: >>> from collections import deque >>> d = deque('ghi') # make a new deque with three items >>> for elem in d: # iterate over the deque's elements ... print(elem.upper()) G H I >>> d.append('j') # add a new entry to the right side >>> d.appendleft('f') # add a new entry to the left side >>> d # show the representation of the deque deque(['f', 'g', 'h', 'i', 'j']) >>> d.pop() # return and remove the rightmost item 'j' >>> d.popleft() # return and remove the leftmost item 'f' >>> list(d) # list the contents of the deque ['g', 'h', 'i'] >>> d[0] # peek at leftmost item 'g' >>> d[-1] # peek at rightmost item 'i' >>> list(reversed(d)) # list the contents of a deque in reverse ['i', 'h', 'g'] >>> 'h' in d # search the deque True >>> d.extend('jkl') # add multiple elements at once >>> d deque(['g', 'h', 'i', 'j', 'k', 'l']) >>> d.rotate(1) # right rotation >>> d deque(['l', 'g', 'h', 'i', 'j', 'k']) >>> d.rotate(-1) # left rotation >>> d deque(['g', 'h', 'i', 'j', 'k', 'l']) >>> deque(reversed(d)) # make a new deque in reverse order deque(['l', 'k', 'j', 'i', 'h', 'g']) >>> d.clear() # empty the deque >>> d.pop() # cannot pop from an empty deque Traceback (most recent call last): File "", line 1, in -toplevel- d.pop() IndexError: pop from an empty deque >>> d.extendleft('abc') # extendleft() reverses the input order >>> d deque(['c', 'b', 'a']) "deque" Recipes --------------- This section shows various approaches to working with deques. Bounded length deques provide functionality similar to the "tail" filter in Unix: def tail(filename, n=10): 'Return the last n lines of a file' with open(filename) as f: return deque(f, n) Another approach to using deques is to maintain a sequence of recently added elements by appending to the right and popping to the left: def moving_average(iterable, n=3): # moving_average([40, 30, 50, 46, 39, 44]) --> 40.0 42.0 45.0 43.0 # http://en.wikipedia.org/wiki/Moving_average it = iter(iterable) d = deque(itertools.islice(it, n-1)) d.appendleft(0) s = sum(d) for elem in it: s += elem - d.popleft() d.append(elem) yield s / n A round-robin scheduler can be implemented with input iterators stored in a "deque". Values are yielded from the active iterator in position zero. If that iterator is exhausted, it can be removed with "popleft()"; otherwise, it can be cycled back to the end with the "rotate()" method: def roundrobin(*iterables): "roundrobin('ABC', 'D', 'EF') --> A D E B F C" iterators = deque(map(iter, iterables)) while iterators: try: while True: yield next(iterators[0]) iterators.rotate(-1) except StopIteration: # Remove an exhausted iterator. iterators.popleft() The "rotate()" method provides a way to implement "deque" slicing and deletion. For example, a pure Python implementation of "del d[n]" relies on the "rotate()" method to position elements to be popped: def delete_nth(d, n): d.rotate(-n) d.popleft() d.rotate(n) To implement "deque" slicing, use a similar approach applying "rotate()" to bring a target element to the left side of the deque. Remove old entries with "popleft()", add new entries with "extend()", and then reverse the rotation. With minor variations on that approach, it is easy to implement Forth style stack manipulations such as "dup", "drop", "swap", "over", "pick", "rot", and "roll". "defaultdict" objects ===================== class collections.defaultdict(default_factory=None, /[, ...]) Return a new dictionary-like object. "defaultdict" is a subclass of the built-in "dict" class. It overrides one method and adds one writable instance variable. The remaining functionality is the same as for the "dict" class and is not documented here. The first argument provides the initial value for the "default_factory" attribute; it defaults to "None". All remaining arguments are treated the same as if they were passed to the "dict" constructor, including keyword arguments. "defaultdict" objects support the following method in addition to the standard "dict" operations: __missing__(key) If the "default_factory" attribute is "None", this raises a "KeyError" exception with the *key* as argument. If "default_factory" is not "None", it is called without arguments to provide a default value for the given *key*, this value is inserted in the dictionary for the *key*, and returned. If calling "default_factory" raises an exception this exception is propagated unchanged. This method is called by the "__getitem__()" method of the "dict" class when the requested key is not found; whatever it returns or raises is then returned or raised by "__getitem__()". Note that "__missing__()" is *not* called for any operations besides "__getitem__()". This means that "get()" will, like normal dictionaries, return "None" as a default rather than using "default_factory". "defaultdict" objects support the following instance variable: default_factory This attribute is used by the "__missing__()" method; it is initialized from the first argument to the constructor, if present, or to "None", if absent. Changed in version 3.9: Added merge ("|") and update ("|=") operators, specified in **PEP 584**. "defaultdict" Examples ---------------------- Using "list" as the "default_factory", it is easy to group a sequence of key-value pairs into a dictionary of lists: >>> s = [('yellow', 1), ('blue', 2), ('yellow', 3), ('blue', 4), ('red', 1)] >>> d = defaultdict(list) >>> for k, v in s: ... d[k].append(v) ... >>> sorted(d.items()) [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])] When each key is encountered for the first time, it is not already in the mapping; so an entry is automatically created using the "default_factory" function which returns an empty "list". The "list.append()" operation then attaches the value to the new list. When keys are encountered again, the look-up proceeds normally (returning the list for that key) and the "list.append()" operation adds another value to the list. This technique is simpler and faster than an equivalent technique using "dict.setdefault()": >>> d = {} >>> for k, v in s: ... d.setdefault(k, []).append(v) ... >>> sorted(d.items()) [('blue', [2, 4]), ('red', [1]), ('yellow', [1, 3])] Setting the "default_factory" to "int" makes the "defaultdict" useful for counting (like a bag or multiset in other languages): >>> s = 'mississippi' >>> d = defaultdict(int) >>> for k in s: ... d[k] += 1 ... >>> sorted(d.items()) [('i', 4), ('m', 1), ('p', 2), ('s', 4)] When a letter is first encountered, it is missing from the mapping, so the "default_factory" function calls "int()" to supply a default count of zero. The increment operation then builds up the count for each letter. The function "int()" which always returns zero is just a special case of constant functions. A faster and more flexible way to create constant functions is to use a lambda function which can supply any constant value (not just zero): >>> def constant_factory(value): ... return lambda: value >>> d = defaultdict(constant_factory('')) >>> d.update(name='John', action='ran') >>> '%(name)s %(action)s to %(object)s' % d 'John ran to ' Setting the "default_factory" to "set" makes the "defaultdict" useful for building a dictionary of sets: >>> s = [('red', 1), ('blue', 2), ('red', 3), ('blue', 4), ('red', 1), ('blue', 4)] >>> d = defaultdict(set) >>> for k, v in s: ... d[k].add(v) ... >>> sorted(d.items()) [('blue', {2, 4}), ('red', {1, 3})] "namedtuple()" Factory Function for Tuples with Named Fields ============================================================ Named tuples assign meaning to each position in a tuple and allow for more readable, self-documenting code. They can be used wherever regular tuples are used, and they add the ability to access fields by name instead of position index. collections.namedtuple(typename, field_names, *, rename=False, defaults=None, module=None) Returns a new tuple subclass named *typename*. The new subclass is used to create tuple-like objects that have fields accessible by attribute lookup as well as being indexable and iterable. Instances of the subclass also have a helpful docstring (with typename and field_names) and a helpful "__repr__()" method which lists the tuple contents in a "name=value" format. The *field_names* are a sequence of strings such as "['x', 'y']". Alternatively, *field_names* can be a single string with each fieldname separated by whitespace and/or commas, for example "'x y'" or "'x, y'". Any valid Python identifier may be used for a fieldname except for names starting with an underscore. Valid identifiers consist of letters, digits, and underscores but do not start with a digit or underscore and cannot be a "keyword" such as *class*, *for*, *return*, *global*, *pass*, or *raise*. If *rename* is true, invalid fieldnames are automatically replaced with positional names. For example, "['abc', 'def', 'ghi', 'abc']" is converted to "['abc', '_1', 'ghi', '_3']", eliminating the keyword "def" and the duplicate fieldname "abc". *defaults* can be "None" or an *iterable* of default values. Since fields with a default value must come after any fields without a default, the *defaults* are applied to the rightmost parameters. For example, if the fieldnames are "['x', 'y', 'z']" and the defaults are "(1, 2)", then "x" will be a required argument, "y" will default to "1", and "z" will default to "2". If *module* is defined, the "__module__" attribute of the named tuple is set to that value. Named tuple instances do not have per-instance dictionaries, so they are lightweight and require no more memory than regular tuples. To support pickling, the named tuple class should be assigned to a variable that matches *typename*. Changed in version 3.1: Added support for *rename*. Changed in version 3.6: The *verbose* and *rename* parameters became keyword-only arguments. Changed in version 3.6: Added the *module* parameter. Changed in version 3.7: Removed the *verbose* parameter and the "_source" attribute. Changed in version 3.7: Added the *defaults* parameter and the "_field_defaults" attribute. >>> # Basic example >>> Point = namedtuple('Point', ['x', 'y']) >>> p = Point(11, y=22) # instantiate with positional or keyword arguments >>> p[0] + p[1] # indexable like the plain tuple (11, 22) 33 >>> x, y = p # unpack like a regular tuple >>> x, y (11, 22) >>> p.x + p.y # fields also accessible by name 33 >>> p # readable __repr__ with a name=value style Point(x=11, y=22) Named tuples are especially useful for assigning field names to result tuples returned by the "csv" or "sqlite3" modules: EmployeeRecord = namedtuple('EmployeeRecord', 'name, age, title, department, paygrade') import csv for emp in map(EmployeeRecord._make, csv.reader(open("employees.csv", "rb"))): print(emp.name, emp.title) import sqlite3 conn = sqlite3.connect('/companydata') cursor = conn.cursor() cursor.execute('SELECT name, age, title, department, paygrade FROM employees') for emp in map(EmployeeRecord._make, cursor.fetchall()): print(emp.name, emp.title) In addition to the methods inherited from tuples, named tuples support three additional methods and two attributes. To prevent conflicts with field names, the method and attribute names start with an underscore. classmethod somenamedtuple._make(iterable) Class method that makes a new instance from an existing sequence or iterable. >>> t = [11, 22] >>> Point._make(t) Point(x=11, y=22) somenamedtuple._asdict() Return a new "dict" which maps field names to their corresponding values: >>> p = Point(x=11, y=22) >>> p._asdict() {'x': 11, 'y': 22} Changed in version 3.1: Returns an "OrderedDict" instead of a regular "dict". Changed in version 3.8: Returns a regular "dict" instead of an "OrderedDict". As of Python 3.7, regular dicts are guaranteed to be ordered. If the extra features of "OrderedDict" are required, the suggested remediation is to cast the result to the desired type: "OrderedDict(nt._asdict())". somenamedtuple._replace(**kwargs) Return a new instance of the named tuple replacing specified fields with new values: >>> p = Point(x=11, y=22) >>> p._replace(x=33) Point(x=33, y=22) >>> for partnum, record in inventory.items(): ... inventory[partnum] = record._replace(price=newprices[partnum], timestamp=time.now()) somenamedtuple._fields Tuple of strings listing the field names. Useful for introspection and for creating new named tuple types from existing named tuples. >>> p._fields # view the field names ('x', 'y') >>> Color = namedtuple('Color', 'red green blue') >>> Pixel = namedtuple('Pixel', Point._fields + Color._fields) >>> Pixel(11, 22, 128, 255, 0) Pixel(x=11, y=22, red=128, green=255, blue=0) somenamedtuple._field_defaults Dictionary mapping field names to default values. >>> Account = namedtuple('Account', ['type', 'balance'], defaults=[0]) >>> Account._field_defaults {'balance': 0} >>> Account('premium') Account(type='premium', balance=0) To retrieve a field whose name is stored in a string, use the "getattr()" function: >>> getattr(p, 'x') 11 To convert a dictionary to a named tuple, use the double-star-operator (as described in Unpacking Argument Lists): >>> d = {'x': 11, 'y': 22} >>> Point(**d) Point(x=11, y=22) Since a named tuple is a regular Python class, it is easy to add or change functionality with a subclass. Here is how to add a calculated field and a fixed-width print format: >>> class Point(namedtuple('Point', ['x', 'y'])): ... __slots__ = () ... @property ... def hypot(self): ... return (self.x ** 2 + self.y ** 2) ** 0.5 ... def __str__(self): ... return 'Point: x=%6.3f y=%6.3f hypot=%6.3f' % (self.x, self.y, self.hypot) >>> for p in Point(3, 4), Point(14, 5/7): ... print(p) Point: x= 3.000 y= 4.000 hypot= 5.000 Point: x=14.000 y= 0.714 hypot=14.018 The subclass shown above sets "__slots__" to an empty tuple. This helps keep memory requirements low by preventing the creation of instance dictionaries. Subclassing is not useful for adding new, stored fields. Instead, simply create a new named tuple type from the "_fields" attribute: >>> Point3D = namedtuple('Point3D', Point._fields + ('z',)) Docstrings can be customized by making direct assignments to the "__doc__" fields: >>> Book = namedtuple('Book', ['id', 'title', 'authors']) >>> Book.__doc__ += ': Hardcover book in active collection' >>> Book.id.__doc__ = '13-digit ISBN' >>> Book.title.__doc__ = 'Title of first printing' >>> Book.authors.__doc__ = 'List of authors sorted by last name' Changed in version 3.5: Property docstrings became writeable. See also: * See "typing.NamedTuple" for a way to add type hints for named tuples. It also provides an elegant notation using the "class" keyword: class Component(NamedTuple): part_number: int weight: float description: Optional[str] = None * See "types.SimpleNamespace()" for a mutable namespace based on an underlying dictionary instead of a tuple. * The "dataclasses" module provides a decorator and functions for automatically adding generated special methods to user-defined classes. "OrderedDict" objects ===================== Ordered dictionaries are just like regular dictionaries but have some extra capabilities relating to ordering operations. They have become less important now that the built-in "dict" class gained the ability to remember insertion order (this new behavior became guaranteed in Python 3.7). Some differences from "dict" still remain: * The regular "dict" was designed to be very good at mapping operations. Tracking insertion order was secondary. * The "OrderedDict" was designed to be good at reordering operations. Space efficiency, iteration speed, and the performance of update operations were secondary. * Algorithmically, "OrderedDict" can handle frequent reordering operations better than "dict". This makes it suitable for tracking recent accesses (for example in an LRU cache). * The equality operation for "OrderedDict" checks for matching order. * The "popitem()" method of "OrderedDict" has a different signature. It accepts an optional argument to specify which item is popped. * "OrderedDict" has a "move_to_end()" method to efficiently reposition an element to an endpoint. * Until Python 3.8, "dict" lacked a "__reversed__()" method. class collections.OrderedDict([items]) Return an instance of a "dict" subclass that has methods specialized for rearranging dictionary order. New in version 3.1. popitem(last=True) The "popitem()" method for ordered dictionaries returns and removes a (key, value) pair. The pairs are returned in LIFO (last-in, first-out) order if *last* is true or FIFO (first-in, first-out) order if false. move_to_end(key, last=True) Move an existing *key* to either end of an ordered dictionary. The item is moved to the right end if *last* is true (the default) or to the beginning if *last* is false. Raises "KeyError" if the *key* does not exist: >>> d = OrderedDict.fromkeys('abcde') >>> d.move_to_end('b') >>> ''.join(d.keys()) 'acdeb' >>> d.move_to_end('b', last=False) >>> ''.join(d.keys()) 'bacde' New in version 3.2. In addition to the usual mapping methods, ordered dictionaries also support reverse iteration using "reversed()". Equality tests between "OrderedDict" objects are order-sensitive and are implemented as "list(od1.items())==list(od2.items())". Equality tests between "OrderedDict" objects and other "Mapping" objects are order-insensitive like regular dictionaries. This allows "OrderedDict" objects to be substituted anywhere a regular dictionary is used. Changed in version 3.5: The items, keys, and values *views* of "OrderedDict" now support reverse iteration using "reversed()". Changed in version 3.6: With the acceptance of **PEP 468**, order is retained for keyword arguments passed to the "OrderedDict" constructor and its "update()" method. Changed in version 3.9: Added merge ("|") and update ("|=") operators, specified in **PEP 584**. "OrderedDict" Examples and Recipes ---------------------------------- It is straightforward to create an ordered dictionary variant that remembers the order the keys were *last* inserted. If a new entry overwrites an existing entry, the original insertion position is changed and moved to the end: class LastUpdatedOrderedDict(OrderedDict): 'Store items in the order the keys were last added' def __setitem__(self, key, value): super().__setitem__(key, value) self.move_to_end(key) An "OrderedDict" would also be useful for implementing variants of "functools.lru_cache()": class LRU: def __init__(self, func, maxsize=128): self.func = func self.maxsize = maxsize self.cache = OrderedDict() def __call__(self, *args): if args in self.cache: value = self.cache[args] self.cache.move_to_end(args) return value value = self.func(*args) if len(self.cache) >= self.maxsize: self.cache.popitem(False) self.cache[args] = value return value "UserDict" objects ================== The class, "UserDict" acts as a wrapper around dictionary objects. The need for this class has been partially supplanted by the ability to subclass directly from "dict"; however, this class can be easier to work with because the underlying dictionary is accessible as an attribute. class collections.UserDict([initialdata]) Class that simulates a dictionary. The instance’s contents are kept in a regular dictionary, which is accessible via the "data" attribute of "UserDict" instances. If *initialdata* is provided, "data" is initialized with its contents; note that a reference to *initialdata* will not be kept, allowing it to be used for other purposes. In addition to supporting the methods and operations of mappings, "UserDict" instances provide the following attribute: data A real dictionary used to store the contents of the "UserDict" class. "UserList" objects ================== This class acts as a wrapper around list objects. It is a useful base class for your own list-like classes which can inherit from them and override existing methods or add new ones. In this way, one can add new behaviors to lists. The need for this class has been partially supplanted by the ability to subclass directly from "list"; however, this class can be easier to work with because the underlying list is accessible as an attribute. class collections.UserList([list]) Class that simulates a list. The instance’s contents are kept in a regular list, which is accessible via the "data" attribute of "UserList" instances. The instance’s contents are initially set to a copy of *list*, defaulting to the empty list "[]". *list* can be any iterable, for example a real Python list or a "UserList" object. In addition to supporting the methods and operations of mutable sequences, "UserList" instances provide the following attribute: data A real "list" object used to store the contents of the "UserList" class. **Subclassing requirements:** Subclasses of "UserList" are expected to offer a constructor which can be called with either no arguments or one argument. List operations which return a new sequence attempt to create an instance of the actual implementation class. To do so, it assumes that the constructor can be called with a single parameter, which is a sequence object used as a data source. If a derived class does not wish to comply with this requirement, all of the special methods supported by this class will need to be overridden; please consult the sources for information about the methods which need to be provided in that case. "UserString" objects ==================== The class, "UserString" acts as a wrapper around string objects. The need for this class has been partially supplanted by the ability to subclass directly from "str"; however, this class can be easier to work with because the underlying string is accessible as an attribute. class collections.UserString(seq) Class that simulates a string object. The instance’s content is kept in a regular string object, which is accessible via the "data" attribute of "UserString" instances. The instance’s contents are initially set to a copy of *seq*. The *seq* argument can be any object which can be converted into a string using the built-in "str()" function. In addition to supporting the methods and operations of strings, "UserString" instances provide the following attribute: data A real "str" object used to store the contents of the "UserString" class. Changed in version 3.5: New methods "__getnewargs__", "__rmod__", "casefold", "format_map", "isprintable", and "maketrans". "colorsys" — Conversions between color systems ********************************************** **Source code:** Lib/colorsys.py ====================================================================== The "colorsys" module defines bidirectional conversions of color values between colors expressed in the RGB (Red Green Blue) color space used in computer monitors and three other coordinate systems: YIQ, HLS (Hue Lightness Saturation) and HSV (Hue Saturation Value). Coordinates in all of these color spaces are floating point values. In the YIQ space, the Y coordinate is between 0 and 1, but the I and Q coordinates can be positive or negative. In all other spaces, the coordinates are all between 0 and 1. See also: More information about color spaces can be found at https://poynton.ca/ColorFAQ.html and https://www.cambridgeincolour.com/tutorials/color-spaces.htm. The "colorsys" module defines the following functions: colorsys.rgb_to_yiq(r, g, b) Convert the color from RGB coordinates to YIQ coordinates. colorsys.yiq_to_rgb(y, i, q) Convert the color from YIQ coordinates to RGB coordinates. colorsys.rgb_to_hls(r, g, b) Convert the color from RGB coordinates to HLS coordinates. colorsys.hls_to_rgb(h, l, s) Convert the color from HLS coordinates to RGB coordinates. colorsys.rgb_to_hsv(r, g, b) Convert the color from RGB coordinates to HSV coordinates. colorsys.hsv_to_rgb(h, s, v) Convert the color from HSV coordinates to RGB coordinates. Example: >>> import colorsys >>> colorsys.rgb_to_hsv(0.2, 0.4, 0.4) (0.5, 0.5, 0.4) >>> colorsys.hsv_to_rgb(0.5, 0.5, 0.4) (0.2, 0.4, 0.4) "compileall" — Byte-compile Python libraries ******************************************** **Source code:** Lib/compileall.py ====================================================================== This module provides some utility functions to support installing Python libraries. These functions compile Python source files in a directory tree. This module can be used to create the cached byte-code files at library installation time, which makes them available for use even by users who don’t have write permission to the library directories. Command-line use ================ This module can work as a script (using **python -m compileall**) to compile Python sources. directory ... file ... Positional arguments are files to compile or directories that contain source files, traversed recursively. If no argument is given, behave as if the command line was "-l ". -l Do not recurse into subdirectories, only compile source code files directly contained in the named or implied directories. -f Force rebuild even if timestamps are up-to-date. -q Do not print the list of files compiled. If passed once, error messages will still be printed. If passed twice ("-qq"), all output is suppressed. -d destdir Directory prepended to the path to each file being compiled. This will appear in compilation time tracebacks, and is also compiled in to the byte-code file, where it will be used in tracebacks and other messages in cases where the source file does not exist at the time the byte-code file is executed. -s strip_prefix -p prepend_prefix Remove ("-s") or append ("-p") the given prefix of paths recorded in the ".pyc" files. Cannot be combined with "-d". -x regex regex is used to search the full path to each file considered for compilation, and if the regex produces a match, the file is skipped. -i list Read the file "list" and add each line that it contains to the list of files and directories to compile. If "list" is "-", read lines from "stdin". -b Write the byte-code files to their legacy locations and names, which may overwrite byte-code files created by another version of Python. The default is to write files to their **PEP 3147** locations and names, which allows byte-code files from multiple versions of Python to coexist. -r Control the maximum recursion level for subdirectories. If this is given, then "-l" option will not be taken into account. **python -m compileall -r 0** is equivalent to **python -m compileall -l**. -j N Use *N* workers to compile the files within the given directory. If "0" is used, then the result of "os.cpu_count()" will be used. --invalidation-mode [timestamp|checked-hash|unchecked-hash] Control how the generated byte-code files are invalidated at runtime. The "timestamp" value, means that ".pyc" files with the source timestamp and size embedded will be generated. The "checked- hash" and "unchecked-hash" values cause hash-based pycs to be generated. Hash-based pycs embed a hash of the source file contents rather than a timestamp. See Cached bytecode invalidation for more information on how Python validates bytecode cache files at runtime. The default is "timestamp" if the "SOURCE_DATE_EPOCH" environment variable is not set, and "checked-hash" if the "SOURCE_DATE_EPOCH" environment variable is set. -o level Compile with the given optimization level. May be used multiple times to compile for multiple levels at a time (for example, "compileall -o 1 -o 2"). -e dir Ignore symlinks pointing outside the given directory. --hardlink-dupes If two ".pyc" files with different optimization level have the same content, use hard links to consolidate duplicate files. Changed in version 3.2: Added the "-i", "-b" and "-h" options. Changed in version 3.5: Added the "-j", "-r", and "-qq" options. "-q" option was changed to a multilevel value. "-b" will always produce a byte-code file ending in ".pyc", never ".pyo". Changed in version 3.7: Added the "--invalidation-mode" option. Changed in version 3.9: Added the "-s", "-p", "-e" and "--hardlink- dupes" options. Raised the default recursion limit from 10 to "sys.getrecursionlimit()". Added the possibility to specify the "-o" option multiple times. There is no command-line option to control the optimization level used by the "compile()" function, because the Python interpreter itself already provides the option: **python -O -m compileall**. Similarly, the "compile()" function respects the "sys.pycache_prefix" setting. The generated bytecode cache will only be useful if "compile()" is run with the same "sys.pycache_prefix" (if any) that will be used at runtime. Public functions ================ compileall.compile_dir(dir, maxlevels=sys.getrecursionlimit(), ddir=None, force=False, rx=None, quiet=0, legacy=False, optimize=-1, workers=1, invalidation_mode=None, *, stripdir=None, prependdir=None, limit_sl_dest=None, hardlink_dupes=False) Recursively descend the directory tree named by *dir*, compiling all ".py" files along the way. Return a true value if all the files compiled successfully, and a false value otherwise. The *maxlevels* parameter is used to limit the depth of the recursion; it defaults to "sys.getrecursionlimit()". If *ddir* is given, it is prepended to the path to each file being compiled for use in compilation time tracebacks, and is also compiled in to the byte-code file, where it will be used in tracebacks and other messages in cases where the source file does not exist at the time the byte-code file is executed. If *force* is true, modules are re-compiled even if the timestamps are up to date. If *rx* is given, its "search" method is called on the complete path to each file considered for compilation, and if it returns a true value, the file is skipped. This can be used to exclude files matching a regular expression, given as a re.Pattern object. If *quiet* is "False" or "0" (the default), the filenames and other information are printed to standard out. Set to "1", only errors are printed. Set to "2", all output is suppressed. If *legacy* is true, byte-code files are written to their legacy locations and names, which may overwrite byte-code files created by another version of Python. The default is to write files to their **PEP 3147** locations and names, which allows byte-code files from multiple versions of Python to coexist. *optimize* specifies the optimization level for the compiler. It is passed to the built-in "compile()" function. Accepts also a sequence of optimization levels which lead to multiple compilations of one ".py" file in one call. The argument *workers* specifies how many workers are used to compile files in parallel. The default is to not use multiple workers. If the platform can’t use multiple workers and *workers* argument is given, then sequential compilation will be used as a fallback. If *workers* is 0, the number of cores in the system is used. If *workers* is lower than "0", a "ValueError" will be raised. *invalidation_mode* should be a member of the "py_compile.PycInvalidationMode" enum and controls how the generated pycs are invalidated at runtime. The *stripdir*, *prependdir* and *limit_sl_dest* arguments correspond to the "-s", "-p" and "-e" options described above. They may be specified as "str", "bytes" or "os.PathLike". If *hardlink_dupes* is true and two ".pyc" files with different optimization level have the same content, use hard links to consolidate duplicate files. Changed in version 3.2: Added the *legacy* and *optimize* parameter. Changed in version 3.5: Added the *workers* parameter. Changed in version 3.5: *quiet* parameter was changed to a multilevel value. Changed in version 3.5: The *legacy* parameter only writes out ".pyc" files, not ".pyo" files no matter what the value of *optimize* is. Changed in version 3.6: Accepts a *path-like object*. Changed in version 3.7: The *invalidation_mode* parameter was added. Changed in version 3.7.2: The *invalidation_mode* parameter’s default value is updated to None. Changed in version 3.8: Setting *workers* to 0 now chooses the optimal number of cores. Changed in version 3.9: Added *stripdir*, *prependdir*, *limit_sl_dest* and *hardlink_dupes* arguments. Default value of *maxlevels* was changed from "10" to "sys.getrecursionlimit()" compileall.compile_file(fullname, ddir=None, force=False, rx=None, quiet=0, legacy=False, optimize=-1, invalidation_mode=None, *, stripdir=None, prependdir=None, limit_sl_dest=None, hardlink_dupes=False) Compile the file with path *fullname*. Return a true value if the file compiled successfully, and a false value otherwise. If *ddir* is given, it is prepended to the path to the file being compiled for use in compilation time tracebacks, and is also compiled in to the byte-code file, where it will be used in tracebacks and other messages in cases where the source file does not exist at the time the byte-code file is executed. If *rx* is given, its "search" method is passed the full path name to the file being compiled, and if it returns a true value, the file is not compiled and "True" is returned. This can be used to exclude files matching a regular expression, given as a re.Pattern object. If *quiet* is "False" or "0" (the default), the filenames and other information are printed to standard out. Set to "1", only errors are printed. Set to "2", all output is suppressed. If *legacy* is true, byte-code files are written to their legacy locations and names, which may overwrite byte-code files created by another version of Python. The default is to write files to their **PEP 3147** locations and names, which allows byte-code files from multiple versions of Python to coexist. *optimize* specifies the optimization level for the compiler. It is passed to the built-in "compile()" function. Accepts also a sequence of optimization levels which lead to multiple compilations of one ".py" file in one call. *invalidation_mode* should be a member of the "py_compile.PycInvalidationMode" enum and controls how the generated pycs are invalidated at runtime. The *stripdir*, *prependdir* and *limit_sl_dest* arguments correspond to the "-s", "-p" and "-e" options described above. They may be specified as "str", "bytes" or "os.PathLike". If *hardlink_dupes* is true and two ".pyc" files with different optimization level have the same content, use hard links to consolidate duplicate files. New in version 3.2. Changed in version 3.5: *quiet* parameter was changed to a multilevel value. Changed in version 3.5: The *legacy* parameter only writes out ".pyc" files, not ".pyo" files no matter what the value of *optimize* is. Changed in version 3.7: The *invalidation_mode* parameter was added. Changed in version 3.7.2: The *invalidation_mode* parameter’s default value is updated to None. Changed in version 3.9: Added *stripdir*, *prependdir*, *limit_sl_dest* and *hardlink_dupes* arguments. compileall.compile_path(skip_curdir=True, maxlevels=0, force=False, quiet=0, legacy=False, optimize=-1, invalidation_mode=None) Byte-compile all the ".py" files found along "sys.path". Return a true value if all the files compiled successfully, and a false value otherwise. If *skip_curdir* is true (the default), the current directory is not included in the search. All other parameters are passed to the "compile_dir()" function. Note that unlike the other compile functions, "maxlevels" defaults to "0". Changed in version 3.2: Added the *legacy* and *optimize* parameter. Changed in version 3.5: *quiet* parameter was changed to a multilevel value. Changed in version 3.5: The *legacy* parameter only writes out ".pyc" files, not ".pyo" files no matter what the value of *optimize* is. Changed in version 3.7: The *invalidation_mode* parameter was added. Changed in version 3.7.2: The *invalidation_mode* parameter’s default value is updated to None. To force a recompile of all the ".py" files in the "Lib/" subdirectory and all its subdirectories: import compileall compileall.compile_dir('Lib/', force=True) # Perform same compilation, excluding files in .svn directories. import re compileall.compile_dir('Lib/', rx=re.compile(r'[/\\][.]svn'), force=True) # pathlib.Path objects can also be used. import pathlib compileall.compile_dir(pathlib.Path('Lib/'), force=True) See also: Module "py_compile" Byte-compile a single source file. Concurrent Execution ******************** The modules described in this chapter provide support for concurrent execution of code. The appropriate choice of tool will depend on the task to be executed (CPU bound vs IO bound) and preferred style of development (event driven cooperative multitasking vs preemptive multitasking). Here’s an overview: * "threading" — Thread-based parallelism * Thread-Local Data * Thread Objects * Lock Objects * RLock Objects * Condition Objects * Semaphore Objects * "Semaphore" Example * Event Objects * Timer Objects * Barrier Objects * Using locks, conditions, and semaphores in the "with" statement * "multiprocessing" — Process-based parallelism * Introduction * The "Process" class * Contexts and start methods * Exchanging objects between processes * Synchronization between processes * Sharing state between processes * Using a pool of workers * Reference * "Process" and exceptions * Pipes and Queues * Miscellaneous * Connection Objects * Synchronization primitives * Shared "ctypes" Objects * The "multiprocessing.sharedctypes" module * Managers * Customized managers * Using a remote manager * Proxy Objects * Cleanup * Process Pools * Listeners and Clients * Address Formats * Authentication keys * Logging * The "multiprocessing.dummy" module * Programming guidelines * All start methods * The *spawn* and *forkserver* start methods * Examples * "multiprocessing.shared_memory" — Provides shared memory for direct access across processes * The "concurrent" package * "concurrent.futures" — Launching parallel tasks * Executor Objects * ThreadPoolExecutor * ThreadPoolExecutor Example * ProcessPoolExecutor * ProcessPoolExecutor Example * Future Objects * Module Functions * Exception classes * "subprocess" — Subprocess management * Using the "subprocess" Module * Frequently Used Arguments * Popen Constructor * Exceptions * Security Considerations * Popen Objects * Windows Popen Helpers * Windows Constants * Older high-level API * Replacing Older Functions with the "subprocess" Module * Replacing **/bin/sh** shell command substitution * Replacing shell pipeline * Replacing "os.system()" * Replacing the "os.spawn" family * Replacing "os.popen()", "os.popen2()", "os.popen3()" * Replacing functions from the "popen2" module * Legacy Shell Invocation Functions * Notes * Converting an argument sequence to a string on Windows * "sched" — Event scheduler * Scheduler Objects * "queue" — A synchronized queue class * Queue Objects * SimpleQueue Objects * "contextvars" — Context Variables * Context Variables * Manual Context Management * asyncio support The following are support modules for some of the above services: * "_thread" — Low-level threading API "concurrent.futures" — Launching parallel tasks *********************************************** New in version 3.2. **Source code:** Lib/concurrent/futures/thread.py and Lib/concurrent/futures/process.py ====================================================================== The "concurrent.futures" module provides a high-level interface for asynchronously executing callables. The asynchronous execution can be performed with threads, using "ThreadPoolExecutor", or separate processes, using "ProcessPoolExecutor". Both implement the same interface, which is defined by the abstract "Executor" class. Executor Objects ================ class concurrent.futures.Executor An abstract class that provides methods to execute calls asynchronously. It should not be used directly, but through its concrete subclasses. submit(fn, /, *args, **kwargs) Schedules the callable, *fn*, to be executed as "fn(*args, **kwargs)" and returns a "Future" object representing the execution of the callable. with ThreadPoolExecutor(max_workers=1) as executor: future = executor.submit(pow, 323, 1235) print(future.result()) map(func, *iterables, timeout=None, chunksize=1) Similar to "map(func, *iterables)" except: * the *iterables* are collected immediately rather than lazily; * *func* is executed asynchronously and several calls to *func* may be made concurrently. The returned iterator raises a "concurrent.futures.TimeoutError" if "__next__()" is called and the result isn’t available after *timeout* seconds from the original call to "Executor.map()". *timeout* can be an int or a float. If *timeout* is not specified or "None", there is no limit to the wait time. If a *func* call raises an exception, then that exception will be raised when its value is retrieved from the iterator. When using "ProcessPoolExecutor", this method chops *iterables* into a number of chunks which it submits to the pool as separate tasks. The (approximate) size of these chunks can be specified by setting *chunksize* to a positive integer. For very long iterables, using a large value for *chunksize* can significantly improve performance compared to the default size of 1. With "ThreadPoolExecutor", *chunksize* has no effect. Changed in version 3.5: Added the *chunksize* argument. shutdown(wait=True, *, cancel_futures=False) Signal the executor that it should free any resources that it is using when the currently pending futures are done executing. Calls to "Executor.submit()" and "Executor.map()" made after shutdown will raise "RuntimeError". If *wait* is "True" then this method will not return until all the pending futures are done executing and the resources associated with the executor have been freed. If *wait* is "False" then this method will return immediately and the resources associated with the executor will be freed when all pending futures are done executing. Regardless of the value of *wait*, the entire Python program will not exit until all pending futures are done executing. If *cancel_futures* is "True", this method will cancel all pending futures that the executor has not started running. Any futures that are completed or running won’t be cancelled, regardless of the value of *cancel_futures*. If both *cancel_futures* and *wait* are "True", all futures that the executor has started running will be completed prior to this method returning. The remaining futures are cancelled. You can avoid having to call this method explicitly if you use the "with" statement, which will shutdown the "Executor" (waiting as if "Executor.shutdown()" were called with *wait* set to "True"): import shutil with ThreadPoolExecutor(max_workers=4) as e: e.submit(shutil.copy, 'src1.txt', 'dest1.txt') e.submit(shutil.copy, 'src2.txt', 'dest2.txt') e.submit(shutil.copy, 'src3.txt', 'dest3.txt') e.submit(shutil.copy, 'src4.txt', 'dest4.txt') Changed in version 3.9: Added *cancel_futures*. ThreadPoolExecutor ================== "ThreadPoolExecutor" is an "Executor" subclass that uses a pool of threads to execute calls asynchronously. Deadlocks can occur when the callable associated with a "Future" waits on the results of another "Future". For example: import time def wait_on_b(): time.sleep(5) print(b.result()) # b will never complete because it is waiting on a. return 5 def wait_on_a(): time.sleep(5) print(a.result()) # a will never complete because it is waiting on b. return 6 executor = ThreadPoolExecutor(max_workers=2) a = executor.submit(wait_on_b) b = executor.submit(wait_on_a) And: def wait_on_future(): f = executor.submit(pow, 5, 2) # This will never complete because there is only one worker thread and # it is executing this function. print(f.result()) executor = ThreadPoolExecutor(max_workers=1) executor.submit(wait_on_future) class concurrent.futures.ThreadPoolExecutor(max_workers=None, thread_name_prefix='', initializer=None, initargs=()) An "Executor" subclass that uses a pool of at most *max_workers* threads to execute calls asynchronously. *initializer* is an optional callable that is called at the start of each worker thread; *initargs* is a tuple of arguments passed to the initializer. Should *initializer* raise an exception, all currently pending jobs will raise a "BrokenThreadPool", as well as any attempt to submit more jobs to the pool. Changed in version 3.5: If *max_workers* is "None" or not given, it will default to the number of processors on the machine, multiplied by "5", assuming that "ThreadPoolExecutor" is often used to overlap I/O instead of CPU work and the number of workers should be higher than the number of workers for "ProcessPoolExecutor". New in version 3.6: The *thread_name_prefix* argument was added to allow users to control the "threading.Thread" names for worker threads created by the pool for easier debugging. Changed in version 3.7: Added the *initializer* and *initargs* arguments. Changed in version 3.8: Default value of *max_workers* is changed to "min(32, os.cpu_count() + 4)". This default value preserves at least 5 workers for I/O bound tasks. It utilizes at most 32 CPU cores for CPU bound tasks which release the GIL. And it avoids using very large resources implicitly on many-core machines.ThreadPoolExecutor now reuses idle worker threads before starting *max_workers* worker threads too. ThreadPoolExecutor Example -------------------------- import concurrent.futures import urllib.request URLS = ['http://www.foxnews.com/', 'http://www.cnn.com/', 'http://europe.wsj.com/', 'http://www.bbc.co.uk/', 'http://nonexistant-subdomain.python.org/'] # Retrieve a single page and report the URL and contents def load_url(url, timeout): with urllib.request.urlopen(url, timeout=timeout) as conn: return conn.read() # We can use a with statement to ensure threads are cleaned up promptly with concurrent.futures.ThreadPoolExecutor(max_workers=5) as executor: # Start the load operations and mark each future with its URL future_to_url = {executor.submit(load_url, url, 60): url for url in URLS} for future in concurrent.futures.as_completed(future_to_url): url = future_to_url[future] try: data = future.result() except Exception as exc: print('%r generated an exception: %s' % (url, exc)) else: print('%r page is %d bytes' % (url, len(data))) ProcessPoolExecutor =================== The "ProcessPoolExecutor" class is an "Executor" subclass that uses a pool of processes to execute calls asynchronously. "ProcessPoolExecutor" uses the "multiprocessing" module, which allows it to side-step the *Global Interpreter Lock* but also means that only picklable objects can be executed and returned. The "__main__" module must be importable by worker subprocesses. This means that "ProcessPoolExecutor" will not work in the interactive interpreter. Calling "Executor" or "Future" methods from a callable submitted to a "ProcessPoolExecutor" will result in deadlock. class concurrent.futures.ProcessPoolExecutor(max_workers=None, mp_context=None, initializer=None, initargs=()) An "Executor" subclass that executes calls asynchronously using a pool of at most *max_workers* processes. If *max_workers* is "None" or not given, it will default to the number of processors on the machine. If *max_workers* is less than or equal to "0", then a "ValueError" will be raised. On Windows, *max_workers* must be less than or equal to "61". If it is not then "ValueError" will be raised. If *max_workers* is "None", then the default chosen will be at most "61", even if more processors are available. *mp_context* can be a multiprocessing context or None. It will be used to launch the workers. If *mp_context* is "None" or not given, the default multiprocessing context is used. *initializer* is an optional callable that is called at the start of each worker process; *initargs* is a tuple of arguments passed to the initializer. Should *initializer* raise an exception, all currently pending jobs will raise a "BrokenProcessPool", as well as any attempt to submit more jobs to the pool. Changed in version 3.3: When one of the worker processes terminates abruptly, a "BrokenProcessPool" error is now raised. Previously, behaviour was undefined but operations on the executor or its futures would often freeze or deadlock. Changed in version 3.7: The *mp_context* argument was added to allow users to control the start_method for worker processes created by the pool.Added the *initializer* and *initargs* arguments. ProcessPoolExecutor Example --------------------------- import concurrent.futures import math PRIMES = [ 112272535095293, 112582705942171, 112272535095293, 115280095190773, 115797848077099, 1099726899285419] def is_prime(n): if n < 2: return False if n == 2: return True if n % 2 == 0: return False sqrt_n = int(math.floor(math.sqrt(n))) for i in range(3, sqrt_n + 1, 2): if n % i == 0: return False return True def main(): with concurrent.futures.ProcessPoolExecutor() as executor: for number, prime in zip(PRIMES, executor.map(is_prime, PRIMES)): print('%d is prime: %s' % (number, prime)) if __name__ == '__main__': main() Future Objects ============== The "Future" class encapsulates the asynchronous execution of a callable. "Future" instances are created by "Executor.submit()". class concurrent.futures.Future Encapsulates the asynchronous execution of a callable. "Future" instances are created by "Executor.submit()" and should not be created directly except for testing. cancel() Attempt to cancel the call. If the call is currently being executed or finished running and cannot be cancelled then the method will return "False", otherwise the call will be cancelled and the method will return "True". cancelled() Return "True" if the call was successfully cancelled. running() Return "True" if the call is currently being executed and cannot be cancelled. done() Return "True" if the call was successfully cancelled or finished running. result(timeout=None) Return the value returned by the call. If the call hasn’t yet completed then this method will wait up to *timeout* seconds. If the call hasn’t completed in *timeout* seconds, then a "concurrent.futures.TimeoutError" will be raised. *timeout* can be an int or float. If *timeout* is not specified or "None", there is no limit to the wait time. If the future is cancelled before completing then "CancelledError" will be raised. If the call raised an exception, this method will raise the same exception. exception(timeout=None) Return the exception raised by the call. If the call hasn’t yet completed then this method will wait up to *timeout* seconds. If the call hasn’t completed in *timeout* seconds, then a "concurrent.futures.TimeoutError" will be raised. *timeout* can be an int or float. If *timeout* is not specified or "None", there is no limit to the wait time. If the future is cancelled before completing then "CancelledError" will be raised. If the call completed without raising, "None" is returned. add_done_callback(fn) Attaches the callable *fn* to the future. *fn* will be called, with the future as its only argument, when the future is cancelled or finishes running. Added callables are called in the order that they were added and are always called in a thread belonging to the process that added them. If the callable raises an "Exception" subclass, it will be logged and ignored. If the callable raises a "BaseException" subclass, the behavior is undefined. If the future has already completed or been cancelled, *fn* will be called immediately. The following "Future" methods are meant for use in unit tests and "Executor" implementations. set_running_or_notify_cancel() This method should only be called by "Executor" implementations before executing the work associated with the "Future" and by unit tests. If the method returns "False" then the "Future" was cancelled, i.e. "Future.cancel()" was called and returned *True*. Any threads waiting on the "Future" completing (i.e. through "as_completed()" or "wait()") will be woken up. If the method returns "True" then the "Future" was not cancelled and has been put in the running state, i.e. calls to "Future.running()" will return *True*. This method can only be called once and cannot be called after "Future.set_result()" or "Future.set_exception()" have been called. set_result(result) Sets the result of the work associated with the "Future" to *result*. This method should only be used by "Executor" implementations and unit tests. Changed in version 3.8: This method raises "concurrent.futures.InvalidStateError" if the "Future" is already done. set_exception(exception) Sets the result of the work associated with the "Future" to the "Exception" *exception*. This method should only be used by "Executor" implementations and unit tests. Changed in version 3.8: This method raises "concurrent.futures.InvalidStateError" if the "Future" is already done. Module Functions ================ concurrent.futures.wait(fs, timeout=None, return_when=ALL_COMPLETED) Wait for the "Future" instances (possibly created by different "Executor" instances) given by *fs* to complete. Duplicate futures given to *fs* are removed and will be returned only once. Returns a named 2-tuple of sets. The first set, named "done", contains the futures that completed (finished or cancelled futures) before the wait completed. The second set, named "not_done", contains the futures that did not complete (pending or running futures). *timeout* can be used to control the maximum number of seconds to wait before returning. *timeout* can be an int or float. If *timeout* is not specified or "None", there is no limit to the wait time. *return_when* indicates when this function should return. It must be one of the following constants: +-------------------------------+------------------------------------------+ | Constant | Description | |===============================|==========================================| | "FIRST_COMPLETED" | The function will return when any future | | | finishes or is cancelled. | +-------------------------------+------------------------------------------+ | "FIRST_EXCEPTION" | The function will return when any future | | | finishes by raising an exception. If no | | | future raises an exception then it is | | | equivalent to "ALL_COMPLETED". | +-------------------------------+------------------------------------------+ | "ALL_COMPLETED" | The function will return when all | | | futures finish or are cancelled. | +-------------------------------+------------------------------------------+ concurrent.futures.as_completed(fs, timeout=None) Returns an iterator over the "Future" instances (possibly created by different "Executor" instances) given by *fs* that yields futures as they complete (finished or cancelled futures). Any futures given by *fs* that are duplicated will be returned once. Any futures that completed before "as_completed()" is called will be yielded first. The returned iterator raises a "concurrent.futures.TimeoutError" if "__next__()" is called and the result isn’t available after *timeout* seconds from the original call to "as_completed()". *timeout* can be an int or float. If *timeout* is not specified or "None", there is no limit to the wait time. See also: **PEP 3148** – futures - execute computations asynchronously The proposal which described this feature for inclusion in the Python standard library. Exception classes ================= exception concurrent.futures.CancelledError Raised when a future is cancelled. exception concurrent.futures.TimeoutError Raised when a future operation exceeds the given timeout. exception concurrent.futures.BrokenExecutor Derived from "RuntimeError", this exception class is raised when an executor is broken for some reason, and cannot be used to submit or execute new tasks. New in version 3.7. exception concurrent.futures.InvalidStateError Raised when an operation is performed on a future that is not allowed in the current state. New in version 3.8. exception concurrent.futures.thread.BrokenThreadPool Derived from "BrokenExecutor", this exception class is raised when one of the workers of a "ThreadPoolExecutor" has failed initializing. New in version 3.7. exception concurrent.futures.process.BrokenProcessPool Derived from "BrokenExecutor" (formerly "RuntimeError"), this exception class is raised when one of the workers of a "ProcessPoolExecutor" has terminated in a non-clean fashion (for example, if it was killed from the outside). New in version 3.3. The "concurrent" package ************************ Currently, there is only one module in this package: * "concurrent.futures" – Launching parallel tasks "configparser" — Configuration file parser ****************************************** **Source code:** Lib/configparser.py ====================================================================== This module provides the "ConfigParser" class which implements a basic configuration language which provides a structure similar to what’s found in Microsoft Windows INI files. You can use this to write Python programs which can be customized by end users easily. Note: This library does *not* interpret or write the value-type prefixes used in the Windows Registry extended version of INI syntax. See also: Module "shlex" Support for creating Unix shell-like mini-languages which can be used as an alternate format for application configuration files. Module "json" The json module implements a subset of JavaScript syntax which can also be used for this purpose. Quick Start =========== Let’s take a very basic configuration file that looks like this: [DEFAULT] ServerAliveInterval = 45 Compression = yes CompressionLevel = 9 ForwardX11 = yes [bitbucket.org] User = hg [topsecret.server.com] Port = 50022 ForwardX11 = no The structure of INI files is described in the following section. Essentially, the file consists of sections, each of which contains keys with values. "configparser" classes can read and write such files. Let’s start by creating the above configuration file programmatically. >>> import configparser >>> config = configparser.ConfigParser() >>> config['DEFAULT'] = {'ServerAliveInterval': '45', ... 'Compression': 'yes', ... 'CompressionLevel': '9'} >>> config['bitbucket.org'] = {} >>> config['bitbucket.org']['User'] = 'hg' >>> config['topsecret.server.com'] = {} >>> topsecret = config['topsecret.server.com'] >>> topsecret['Port'] = '50022' # mutates the parser >>> topsecret['ForwardX11'] = 'no' # same here >>> config['DEFAULT']['ForwardX11'] = 'yes' >>> with open('example.ini', 'w') as configfile: ... config.write(configfile) ... As you can see, we can treat a config parser much like a dictionary. There are differences, outlined later, but the behavior is very close to what you would expect from a dictionary. Now that we have created and saved a configuration file, let’s read it back and explore the data it holds. >>> config = configparser.ConfigParser() >>> config.sections() [] >>> config.read('example.ini') ['example.ini'] >>> config.sections() ['bitbucket.org', 'topsecret.server.com'] >>> 'bitbucket.org' in config True >>> 'bytebong.com' in config False >>> config['bitbucket.org']['User'] 'hg' >>> config['DEFAULT']['Compression'] 'yes' >>> topsecret = config['topsecret.server.com'] >>> topsecret['ForwardX11'] 'no' >>> topsecret['Port'] '50022' >>> for key in config['bitbucket.org']: ... print(key) user compressionlevel serveraliveinterval compression forwardx11 >>> config['bitbucket.org']['ForwardX11'] 'yes' As we can see above, the API is pretty straightforward. The only bit of magic involves the "DEFAULT" section which provides default values for all other sections [1]. Note also that keys in sections are case- insensitive and stored in lowercase [1]. Supported Datatypes =================== Config parsers do not guess datatypes of values in configuration files, always storing them internally as strings. This means that if you need other datatypes, you should convert on your own: >>> int(topsecret['Port']) 50022 >>> float(topsecret['CompressionLevel']) 9.0 Since this task is so common, config parsers provide a range of handy getter methods to handle integers, floats and booleans. The last one is the most interesting because simply passing the value to "bool()" would do no good since "bool('False')" is still "True". This is why config parsers also provide "getboolean()". This method is case- insensitive and recognizes Boolean values from "'yes'"/"'no'", "'on'"/"'off'", "'true'"/"'false'" and "'1'"/"'0'" [1]. For example: >>> topsecret.getboolean('ForwardX11') False >>> config['bitbucket.org'].getboolean('ForwardX11') True >>> config.getboolean('bitbucket.org', 'Compression') True Apart from "getboolean()", config parsers also provide equivalent "getint()" and "getfloat()" methods. You can register your own converters and customize the provided ones. [1] Fallback Values =============== As with a dictionary, you can use a section’s "get()" method to provide fallback values: >>> topsecret.get('Port') '50022' >>> topsecret.get('CompressionLevel') '9' >>> topsecret.get('Cipher') >>> topsecret.get('Cipher', '3des-cbc') '3des-cbc' Please note that default values have precedence over fallback values. For instance, in our example the "'CompressionLevel'" key was specified only in the "'DEFAULT'" section. If we try to get it from the section "'topsecret.server.com'", we will always get the default, even if we specify a fallback: >>> topsecret.get('CompressionLevel', '3') '9' One more thing to be aware of is that the parser-level "get()" method provides a custom, more complex interface, maintained for backwards compatibility. When using this method, a fallback value can be provided via the "fallback" keyword-only argument: >>> config.get('bitbucket.org', 'monster', ... fallback='No such things as monsters') 'No such things as monsters' The same "fallback" argument can be used with the "getint()", "getfloat()" and "getboolean()" methods, for example: >>> 'BatchMode' in topsecret False >>> topsecret.getboolean('BatchMode', fallback=True) True >>> config['DEFAULT']['BatchMode'] = 'no' >>> topsecret.getboolean('BatchMode', fallback=True) False Supported INI File Structure ============================ A configuration file consists of sections, each led by a "[section]" header, followed by key/value entries separated by a specific string ("=" or ":" by default [1]). By default, section names are case sensitive but keys are not [1]. Leading and trailing whitespace is removed from keys and values. Values can be omitted if the parser is configured to allow it [1], in which case the key/value delimiter may also be left out. Values can also span multiple lines, as long as they are indented deeper than the first line of the value. Depending on the parser’s mode, blank lines may be treated as parts of multiline values or ignored. By default, a valid section name can be any string that does not contain ‘\n’ or ‘]’. To change this, see "ConfigParser.SECTCRE". Configuration files may include comments, prefixed by specific characters ("#" and ";" by default [1]). Comments may appear on their own on an otherwise empty line, possibly indented. [1] For example: [Simple Values] key=value spaces in keys=allowed spaces in values=allowed as well spaces around the delimiter = obviously you can also use : to delimit keys from values [All Values Are Strings] values like this: 1000000 or this: 3.14159265359 are they treated as numbers? : no integers, floats and booleans are held as: strings can use the API to get converted values directly: true [Multiline Values] chorus: I'm a lumberjack, and I'm okay I sleep all night and I work all day [No Values] key_without_value empty string value here = [You can use comments] # like this ; or this # By default only in an empty line. # Inline comments can be harmful because they prevent users # from using the delimiting characters as parts of values. # That being said, this can be customized. [Sections Can Be Indented] can_values_be_as_well = True does_that_mean_anything_special = False purpose = formatting for readability multiline_values = are handled just fine as long as they are indented deeper than the first line of a value # Did I mention we can indent comments, too? Interpolation of values ======================= On top of the core functionality, "ConfigParser" supports interpolation. This means values can be preprocessed before returning them from "get()" calls. class configparser.BasicInterpolation The default implementation used by "ConfigParser". It enables values to contain format strings which refer to other values in the same section, or values in the special default section [1]. Additional default values can be provided on initialization. For example: [Paths] home_dir: /Users my_dir: %(home_dir)s/lumberjack my_pictures: %(my_dir)s/Pictures [Escape] # use a %% to escape the % sign (% is the only character that needs to be escaped): gain: 80%% In the example above, "ConfigParser" with *interpolation* set to "BasicInterpolation()" would resolve "%(home_dir)s" to the value of "home_dir" ("/Users" in this case). "%(my_dir)s" in effect would resolve to "/Users/lumberjack". All interpolations are done on demand so keys used in the chain of references do not have to be specified in any specific order in the configuration file. With "interpolation" set to "None", the parser would simply return "%(my_dir)s/Pictures" as the value of "my_pictures" and "%(home_dir)s/lumberjack" as the value of "my_dir". class configparser.ExtendedInterpolation An alternative handler for interpolation which implements a more advanced syntax, used for instance in "zc.buildout". Extended interpolation is using "${section:option}" to denote a value from a foreign section. Interpolation can span multiple levels. For convenience, if the "section:" part is omitted, interpolation defaults to the current section (and possibly the default values from the special section). For example, the configuration specified above with basic interpolation, would look like this with extended interpolation: [Paths] home_dir: /Users my_dir: ${home_dir}/lumberjack my_pictures: ${my_dir}/Pictures [Escape] # use a $$ to escape the $ sign ($ is the only character that needs to be escaped): cost: $$80 Values from other sections can be fetched as well: [Common] home_dir: /Users library_dir: /Library system_dir: /System macports_dir: /opt/local [Frameworks] Python: 3.2 path: ${Common:system_dir}/Library/Frameworks/ [Arthur] nickname: Two Sheds last_name: Jackson my_dir: ${Common:home_dir}/twosheds my_pictures: ${my_dir}/Pictures python_dir: ${Frameworks:path}/Python/Versions/${Frameworks:Python} Mapping Protocol Access ======================= New in version 3.2. Mapping protocol access is a generic name for functionality that enables using custom objects as if they were dictionaries. In case of "configparser", the mapping interface implementation is using the "parser['section']['option']" notation. "parser['section']" in particular returns a proxy for the section’s data in the parser. This means that the values are not copied but they are taken from the original parser on demand. What’s even more important is that when values are changed on a section proxy, they are actually mutated in the original parser. "configparser" objects behave as close to actual dictionaries as possible. The mapping interface is complete and adheres to the "MutableMapping" ABC. However, there are a few differences that should be taken into account: * By default, all keys in sections are accessible in a case- insensitive manner [1]. E.g. "for option in parser["section"]" yields only "optionxform"’ed option key names. This means lowercased keys by default. At the same time, for a section that holds the key "'a'", both expressions return "True": "a" in parser["section"] "A" in parser["section"] * All sections include "DEFAULTSECT" values as well which means that ".clear()" on a section may not leave the section visibly empty. This is because default values cannot be deleted from the section (because technically they are not there). If they are overridden in the section, deleting causes the default value to be visible again. Trying to delete a default value causes a "KeyError". * "DEFAULTSECT" cannot be removed from the parser: * trying to delete it raises "ValueError", * "parser.clear()" leaves it intact, * "parser.popitem()" never returns it. * "parser.get(section, option, **kwargs)" - the second argument is **not** a fallback value. Note however that the section-level "get()" methods are compatible both with the mapping protocol and the classic configparser API. * "parser.items()" is compatible with the mapping protocol (returns a list of *section_name*, *section_proxy* pairs including the DEFAULTSECT). However, this method can also be invoked with arguments: "parser.items(section, raw, vars)". The latter call returns a list of *option*, *value* pairs for a specified "section", with all interpolations expanded (unless "raw=True" is provided). The mapping protocol is implemented on top of the existing legacy API so that subclasses overriding the original interface still should have mappings working as expected. Customizing Parser Behaviour ============================ There are nearly as many INI format variants as there are applications using it. "configparser" goes a long way to provide support for the largest sensible set of INI styles available. The default functionality is mainly dictated by historical background and it’s very likely that you will want to customize some of the features. The most common way to change the way a specific config parser works is to use the "__init__()" options: * *defaults*, default value: "None" This option accepts a dictionary of key-value pairs which will be initially put in the "DEFAULT" section. This makes for an elegant way to support concise configuration files that don’t specify values which are the same as the documented default. Hint: if you want to specify default values for a specific section, use "read_dict()" before you read the actual file. * *dict_type*, default value: "dict" This option has a major impact on how the mapping protocol will behave and how the written configuration files look. With the standard dictionary, every section is stored in the order they were added to the parser. Same goes for options within sections. An alternative dictionary type can be used for example to sort sections and options on write-back. Please note: there are ways to add a set of key-value pairs in a single operation. When you use a regular dictionary in those operations, the order of the keys will be ordered. For example: >>> parser = configparser.ConfigParser() >>> parser.read_dict({'section1': {'key1': 'value1', ... 'key2': 'value2', ... 'key3': 'value3'}, ... 'section2': {'keyA': 'valueA', ... 'keyB': 'valueB', ... 'keyC': 'valueC'}, ... 'section3': {'foo': 'x', ... 'bar': 'y', ... 'baz': 'z'} ... }) >>> parser.sections() ['section1', 'section2', 'section3'] >>> [option for option in parser['section3']] ['foo', 'bar', 'baz'] * *allow_no_value*, default value: "False" Some configuration files are known to include settings without values, but which otherwise conform to the syntax supported by "configparser". The *allow_no_value* parameter to the constructor can be used to indicate that such values should be accepted: >>> import configparser >>> sample_config = """ ... [mysqld] ... user = mysql ... pid-file = /var/run/mysqld/mysqld.pid ... skip-external-locking ... old_passwords = 1 ... skip-bdb ... # we don't need ACID today ... skip-innodb ... """ >>> config = configparser.ConfigParser(allow_no_value=True) >>> config.read_string(sample_config) >>> # Settings with values are treated as before: >>> config["mysqld"]["user"] 'mysql' >>> # Settings without values provide None: >>> config["mysqld"]["skip-bdb"] >>> # Settings which aren't specified still raise an error: >>> config["mysqld"]["does-not-exist"] Traceback (most recent call last): ... KeyError: 'does-not-exist' * *delimiters*, default value: "('=', ':')" Delimiters are substrings that delimit keys from values within a section. The first occurrence of a delimiting substring on a line is considered a delimiter. This means values (but not keys) can contain the delimiters. See also the *space_around_delimiters* argument to "ConfigParser.write()". * *comment_prefixes*, default value: "('#', ';')" * *inline_comment_prefixes*, default value: "None" Comment prefixes are strings that indicate the start of a valid comment within a config file. *comment_prefixes* are used only on otherwise empty lines (optionally indented) whereas *inline_comment_prefixes* can be used after every valid value (e.g. section names, options and empty lines as well). By default inline comments are disabled and "'#'" and "';'" are used as prefixes for whole line comments. Changed in version 3.2: In previous versions of "configparser" behaviour matched "comment_prefixes=('#',';')" and "inline_comment_prefixes=(';',)". Please note that config parsers don’t support escaping of comment prefixes so using *inline_comment_prefixes* may prevent users from specifying option values with characters used as comment prefixes. When in doubt, avoid setting *inline_comment_prefixes*. In any circumstances, the only way of storing comment prefix characters at the beginning of a line in multiline values is to interpolate the prefix, for example: >>> from configparser import ConfigParser, ExtendedInterpolation >>> parser = ConfigParser(interpolation=ExtendedInterpolation()) >>> # the default BasicInterpolation could be used as well >>> parser.read_string(""" ... [DEFAULT] ... hash = # ... ... [hashes] ... shebang = ... ${hash}!/usr/bin/env python ... ${hash} -*- coding: utf-8 -*- ... ... extensions = ... enabled_extension ... another_extension ... #disabled_by_comment ... yet_another_extension ... ... interpolation not necessary = if # is not at line start ... even in multiline values = line #1 ... line #2 ... line #3 ... """) >>> print(parser['hashes']['shebang']) #!/usr/bin/env python # -*- coding: utf-8 -*- >>> print(parser['hashes']['extensions']) enabled_extension another_extension yet_another_extension >>> print(parser['hashes']['interpolation not necessary']) if # is not at line start >>> print(parser['hashes']['even in multiline values']) line #1 line #2 line #3 * *strict*, default value: "True" When set to "True", the parser will not allow for any section or option duplicates while reading from a single source (using "read_file()", "read_string()" or "read_dict()"). It is recommended to use strict parsers in new applications. Changed in version 3.2: In previous versions of "configparser" behaviour matched "strict=False". * *empty_lines_in_values*, default value: "True" In config parsers, values can span multiple lines as long as they are indented more than the key that holds them. By default parsers also let empty lines to be parts of values. At the same time, keys can be arbitrarily indented themselves to improve readability. In consequence, when configuration files get big and complex, it is easy for the user to lose track of the file structure. Take for instance: [Section] key = multiline value with a gotcha this = is still a part of the multiline value of 'key' This can be especially problematic for the user to see if she’s using a proportional font to edit the file. That is why when your application does not need values with empty lines, you should consider disallowing them. This will make empty lines split keys every time. In the example above, it would produce two keys, "key" and "this". * *default_section*, default value: "configparser.DEFAULTSECT" (that is: ""DEFAULT"") The convention of allowing a special section of default values for other sections or interpolation purposes is a powerful concept of this library, letting users create complex declarative configurations. This section is normally called ""DEFAULT"" but this can be customized to point to any other valid section name. Some typical values include: ""general"" or ""common"". The name provided is used for recognizing default sections when reading from any source and is used when writing configuration back to a file. Its current value can be retrieved using the "parser_instance.default_section" attribute and may be modified at runtime (i.e. to convert files from one format to another). * *interpolation*, default value: "configparser.BasicInterpolation" Interpolation behaviour may be customized by providing a custom handler through the *interpolation* argument. "None" can be used to turn off interpolation completely, "ExtendedInterpolation()" provides a more advanced variant inspired by "zc.buildout". More on the subject in the dedicated documentation section. "RawConfigParser" has a default value of "None". * *converters*, default value: not set Config parsers provide option value getters that perform type conversion. By default "getint()", "getfloat()", and "getboolean()" are implemented. Should other getters be desirable, users may define them in a subclass or pass a dictionary where each key is a name of the converter and each value is a callable implementing said conversion. For instance, passing "{'decimal': decimal.Decimal}" would add "getdecimal()" on both the parser object and all section proxies. In other words, it will be possible to write both "parser_instance.getdecimal('section', 'key', fallback=0)" and "parser_instance['section'].getdecimal('key', 0)". If the converter needs to access the state of the parser, it can be implemented as a method on a config parser subclass. If the name of this method starts with "get", it will be available on all section proxies, in the dict-compatible form (see the "getdecimal()" example above). More advanced customization may be achieved by overriding default values of these parser attributes. The defaults are defined on the classes, so they may be overridden by subclasses or by attribute assignment. ConfigParser.BOOLEAN_STATES By default when using "getboolean()", config parsers consider the following values "True": "'1'", "'yes'", "'true'", "'on'" and the following values "False": "'0'", "'no'", "'false'", "'off'". You can override this by specifying a custom dictionary of strings and their Boolean outcomes. For example: >>> custom = configparser.ConfigParser() >>> custom['section1'] = {'funky': 'nope'} >>> custom['section1'].getboolean('funky') Traceback (most recent call last): ... ValueError: Not a boolean: nope >>> custom.BOOLEAN_STATES = {'sure': True, 'nope': False} >>> custom['section1'].getboolean('funky') False Other typical Boolean pairs include "accept"/"reject" or "enabled"/"disabled". ConfigParser.optionxform(option) This method transforms option names on every read, get, or set operation. The default converts the name to lowercase. This also means that when a configuration file gets written, all keys will be lowercase. Override this method if that’s unsuitable. For example: >>> config = """ ... [Section1] ... Key = Value ... ... [Section2] ... AnotherKey = Value ... """ >>> typical = configparser.ConfigParser() >>> typical.read_string(config) >>> list(typical['Section1'].keys()) ['key'] >>> list(typical['Section2'].keys()) ['anotherkey'] >>> custom = configparser.RawConfigParser() >>> custom.optionxform = lambda option: option >>> custom.read_string(config) >>> list(custom['Section1'].keys()) ['Key'] >>> list(custom['Section2'].keys()) ['AnotherKey'] Note: The optionxform function transforms option names to a canonical form. This should be an idempotent function: if the name is already in canonical form, it should be returned unchanged. ConfigParser.SECTCRE A compiled regular expression used to parse section headers. The default matches "[section]" to the name ""section"". Whitespace is considered part of the section name, thus "[ larch ]" will be read as a section of name "" larch "". Override this attribute if that’s unsuitable. For example: >>> import re >>> config = """ ... [Section 1] ... option = value ... ... [ Section 2 ] ... another = val ... """ >>> typical = configparser.ConfigParser() >>> typical.read_string(config) >>> typical.sections() ['Section 1', ' Section 2 '] >>> custom = configparser.ConfigParser() >>> custom.SECTCRE = re.compile(r"\[ *(?P

[^]]+?) *\]") >>> custom.read_string(config) >>> custom.sections() ['Section 1', 'Section 2'] Note: While ConfigParser objects also use an "OPTCRE" attribute for recognizing option lines, it’s not recommended to override it because that would interfere with constructor options *allow_no_value* and *delimiters*. Legacy API Examples =================== Mainly because of backwards compatibility concerns, "configparser" provides also a legacy API with explicit "get"/"set" methods. While there are valid use cases for the methods outlined below, mapping protocol access is preferred for new projects. The legacy API is at times more advanced, low-level and downright counterintuitive. An example of writing to a configuration file: import configparser config = configparser.RawConfigParser() # Please note that using RawConfigParser's set functions, you can assign # non-string values to keys internally, but will receive an error when # attempting to write to a file or when you get it in non-raw mode. Setting # values using the mapping protocol or ConfigParser's set() does not allow # such assignments to take place. config.add_section('Section1') config.set('Section1', 'an_int', '15') config.set('Section1', 'a_bool', 'true') config.set('Section1', 'a_float', '3.1415') config.set('Section1', 'baz', 'fun') config.set('Section1', 'bar', 'Python') config.set('Section1', 'foo', '%(bar)s is %(baz)s!') # Writing our configuration file to 'example.cfg' with open('example.cfg', 'w') as configfile: config.write(configfile) An example of reading the configuration file again: import configparser config = configparser.RawConfigParser() config.read('example.cfg') # getfloat() raises an exception if the value is not a float # getint() and getboolean() also do this for their respective types a_float = config.getfloat('Section1', 'a_float') an_int = config.getint('Section1', 'an_int') print(a_float + an_int) # Notice that the next output does not interpolate '%(bar)s' or '%(baz)s'. # This is because we are using a RawConfigParser(). if config.getboolean('Section1', 'a_bool'): print(config.get('Section1', 'foo')) To get interpolation, use "ConfigParser": import configparser cfg = configparser.ConfigParser() cfg.read('example.cfg') # Set the optional *raw* argument of get() to True if you wish to disable # interpolation in a single get operation. print(cfg.get('Section1', 'foo', raw=False)) # -> "Python is fun!" print(cfg.get('Section1', 'foo', raw=True)) # -> "%(bar)s is %(baz)s!" # The optional *vars* argument is a dict with members that will take # precedence in interpolation. print(cfg.get('Section1', 'foo', vars={'bar': 'Documentation', 'baz': 'evil'})) # The optional *fallback* argument can be used to provide a fallback value print(cfg.get('Section1', 'foo')) # -> "Python is fun!" print(cfg.get('Section1', 'foo', fallback='Monty is not.')) # -> "Python is fun!" print(cfg.get('Section1', 'monster', fallback='No such things as monsters.')) # -> "No such things as monsters." # A bare print(cfg.get('Section1', 'monster')) would raise NoOptionError # but we can also use: print(cfg.get('Section1', 'monster', fallback=None)) # -> None Default values are available in both types of ConfigParsers. They are used in interpolation if an option used is not defined elsewhere. import configparser # New instance with 'bar' and 'baz' defaulting to 'Life' and 'hard' each config = configparser.ConfigParser({'bar': 'Life', 'baz': 'hard'}) config.read('example.cfg') print(config.get('Section1', 'foo')) # -> "Python is fun!" config.remove_option('Section1', 'bar') config.remove_option('Section1', 'baz') print(config.get('Section1', 'foo')) # -> "Life is hard!" ConfigParser Objects ==================== class configparser.ConfigParser(defaults=None, dict_type=dict, allow_no_value=False, delimiters=('=', ':'), comment_prefixes=('#', ';'), inline_comment_prefixes=None, strict=True, empty_lines_in_values=True, default_section=configparser.DEFAULTSECT, interpolation=BasicInterpolation(), converters={}) The main configuration parser. When *defaults* is given, it is initialized into the dictionary of intrinsic defaults. When *dict_type* is given, it will be used to create the dictionary objects for the list of sections, for the options within a section, and for the default values. When *delimiters* is given, it is used as the set of substrings that divide keys from values. When *comment_prefixes* is given, it will be used as the set of substrings that prefix comments in otherwise empty lines. Comments can be indented. When *inline_comment_prefixes* is given, it will be used as the set of substrings that prefix comments in non-empty lines. When *strict* is "True" (the default), the parser won’t allow for any section or option duplicates while reading from a single source (file, string or dictionary), raising "DuplicateSectionError" or "DuplicateOptionError". When *empty_lines_in_values* is "False" (default: "True"), each empty line marks the end of an option. Otherwise, internal empty lines of a multiline option are kept as part of the value. When *allow_no_value* is "True" (default: "False"), options without values are accepted; the value held for these is "None" and they are serialized without the trailing delimiter. When *default_section* is given, it specifies the name for the special section holding default values for other sections and interpolation purposes (normally named ""DEFAULT""). This value can be retrieved and changed on runtime using the "default_section" instance attribute. Interpolation behaviour may be customized by providing a custom handler through the *interpolation* argument. "None" can be used to turn off interpolation completely, "ExtendedInterpolation()" provides a more advanced variant inspired by "zc.buildout". More on the subject in the dedicated documentation section. All option names used in interpolation will be passed through the "optionxform()" method just like any other option name reference. For example, using the default implementation of "optionxform()" (which converts option names to lower case), the values "foo %(bar)s" and "foo %(BAR)s" are equivalent. When *converters* is given, it should be a dictionary where each key represents the name of a type converter and each value is a callable implementing the conversion from string to the desired datatype. Every converter gets its own corresponding "get*()" method on the parser object and section proxies. Changed in version 3.1: The default *dict_type* is "collections.OrderedDict". Changed in version 3.2: *allow_no_value*, *delimiters*, *comment_prefixes*, *strict*, *empty_lines_in_values*, *default_section* and *interpolation* were added. Changed in version 3.5: The *converters* argument was added. Changed in version 3.7: The *defaults* argument is read with "read_dict()", providing consistent behavior across the parser: non-string keys and values are implicitly converted to strings. Changed in version 3.8: The default *dict_type* is "dict", since it now preserves insertion order. defaults() Return a dictionary containing the instance-wide defaults. sections() Return a list of the sections available; the *default section* is not included in the list. add_section(section) Add a section named *section* to the instance. If a section by the given name already exists, "DuplicateSectionError" is raised. If the *default section* name is passed, "ValueError" is raised. The name of the section must be a string; if not, "TypeError" is raised. Changed in version 3.2: Non-string section names raise "TypeError". has_section(section) Indicates whether the named *section* is present in the configuration. The *default section* is not acknowledged. options(section) Return a list of options available in the specified *section*. has_option(section, option) If the given *section* exists, and contains the given *option*, return "True"; otherwise return "False". If the specified *section* is "None" or an empty string, DEFAULT is assumed. read(filenames, encoding=None) Attempt to read and parse an iterable of filenames, returning a list of filenames which were successfully parsed. If *filenames* is a string, a "bytes" object or a *path-like object*, it is treated as a single filename. If a file named in *filenames* cannot be opened, that file will be ignored. This is designed so that you can specify an iterable of potential configuration file locations (for example, the current directory, the user’s home directory, and some system-wide directory), and all existing configuration files in the iterable will be read. If none of the named files exist, the "ConfigParser" instance will contain an empty dataset. An application which requires initial values to be loaded from a file should load the required file or files using "read_file()" before calling "read()" for any optional files: import configparser, os config = configparser.ConfigParser() config.read_file(open('defaults.cfg')) config.read(['site.cfg', os.path.expanduser('~/.myapp.cfg')], encoding='cp1250') New in version 3.2: The *encoding* parameter. Previously, all files were read using the default encoding for "open()". New in version 3.6.1: The *filenames* parameter accepts a *path- like object*. New in version 3.7: The *filenames* parameter accepts a "bytes" object. read_file(f, source=None) Read and parse configuration data from *f* which must be an iterable yielding Unicode strings (for example files opened in text mode). Optional argument *source* specifies the name of the file being read. If not given and *f* has a "name" attribute, that is used for *source*; the default is "''". New in version 3.2: Replaces "readfp()". read_string(string, source='') Parse configuration data from a string. Optional argument *source* specifies a context-specific name of the string passed. If not given, "''" is used. This should commonly be a filesystem path or a URL. New in version 3.2. read_dict(dictionary, source='') Load configuration from any object that provides a dict-like "items()" method. Keys are section names, values are dictionaries with keys and values that should be present in the section. If the used dictionary type preserves order, sections and their keys will be added in order. Values are automatically converted to strings. Optional argument *source* specifies a context-specific name of the dictionary passed. If not given, "" is used. This method can be used to copy state between parsers. New in version 3.2. get(section, option, *, raw=False, vars=None[, fallback]) Get an *option* value for the named *section*. If *vars* is provided, it must be a dictionary. The *option* is looked up in *vars* (if provided), *section*, and in *DEFAULTSECT* in that order. If the key is not found and *fallback* is provided, it is used as a fallback value. "None" can be provided as a *fallback* value. All the "'%'" interpolations are expanded in the return values, unless the *raw* argument is true. Values for interpolation keys are looked up in the same manner as the option. Changed in version 3.2: Arguments *raw*, *vars* and *fallback* are keyword only to protect users from trying to use the third argument as the *fallback* fallback (especially when using the mapping protocol). getint(section, option, *, raw=False, vars=None[, fallback]) A convenience method which coerces the *option* in the specified *section* to an integer. See "get()" for explanation of *raw*, *vars* and *fallback*. getfloat(section, option, *, raw=False, vars=None[, fallback]) A convenience method which coerces the *option* in the specified *section* to a floating point number. See "get()" for explanation of *raw*, *vars* and *fallback*. getboolean(section, option, *, raw=False, vars=None[, fallback]) A convenience method which coerces the *option* in the specified *section* to a Boolean value. Note that the accepted values for the option are "'1'", "'yes'", "'true'", and "'on'", which cause this method to return "True", and "'0'", "'no'", "'false'", and "'off'", which cause it to return "False". These string values are checked in a case-insensitive manner. Any other value will cause it to raise "ValueError". See "get()" for explanation of *raw*, *vars* and *fallback*. items(raw=False, vars=None) items(section, raw=False, vars=None) When *section* is not given, return a list of *section_name*, *section_proxy* pairs, including DEFAULTSECT. Otherwise, return a list of *name*, *value* pairs for the options in the given *section*. Optional arguments have the same meaning as for the "get()" method. Changed in version 3.8: Items present in *vars* no longer appear in the result. The previous behaviour mixed actual parser options with variables provided for interpolation. set(section, option, value) If the given section exists, set the given option to the specified value; otherwise raise "NoSectionError". *option* and *value* must be strings; if not, "TypeError" is raised. write(fileobject, space_around_delimiters=True) Write a representation of the configuration to the specified *file object*, which must be opened in text mode (accepting strings). This representation can be parsed by a future "read()" call. If *space_around_delimiters* is true, delimiters between keys and values are surrounded by spaces. Note: Comments in the original configuration file are not preserved when writing the configuration back. What is considered a comment, depends on the given values for *comment_prefix* and *inline_comment_prefix*. remove_option(section, option) Remove the specified *option* from the specified *section*. If the section does not exist, raise "NoSectionError". If the option existed to be removed, return "True"; otherwise return "False". remove_section(section) Remove the specified *section* from the configuration. If the section in fact existed, return "True". Otherwise return "False". optionxform(option) Transforms the option name *option* as found in an input file or as passed in by client code to the form that should be used in the internal structures. The default implementation returns a lower-case version of *option*; subclasses may override this or client code can set an attribute of this name on instances to affect this behavior. You don’t need to subclass the parser to use this method, you can also set it on an instance, to a function that takes a string argument and returns a string. Setting it to "str", for example, would make option names case sensitive: cfgparser = ConfigParser() cfgparser.optionxform = str Note that when reading configuration files, whitespace around the option names is stripped before "optionxform()" is called. readfp(fp, filename=None) Deprecated since version 3.2: Use "read_file()" instead. Changed in version 3.2: "readfp()" now iterates on *fp* instead of calling "fp.readline()". For existing code calling "readfp()" with arguments which don’t support iteration, the following generator may be used as a wrapper around the file-like object: def readline_generator(fp): line = fp.readline() while line: yield line line = fp.readline() Instead of "parser.readfp(fp)" use "parser.read_file(readline_generator(fp))". configparser.MAX_INTERPOLATION_DEPTH The maximum depth for recursive interpolation for "get()" when the *raw* parameter is false. This is relevant only when the default *interpolation* is used. RawConfigParser Objects ======================= class configparser.RawConfigParser(defaults=None, dict_type=dict, allow_no_value=False, *, delimiters=('=', ':'), comment_prefixes=('#', ';'), inline_comment_prefixes=None, strict=True, empty_lines_in_values=True, default_section=configparser.DEFAULTSECT[, interpolation]) Legacy variant of the "ConfigParser". It has interpolation disabled by default and allows for non-string section names, option names, and values via its unsafe "add_section" and "set" methods, as well as the legacy "defaults=" keyword argument handling. Changed in version 3.8: The default *dict_type* is "dict", since it now preserves insertion order. Note: Consider using "ConfigParser" instead which checks types of the values to be stored internally. If you don’t want interpolation, you can use "ConfigParser(interpolation=None)". add_section(section) Add a section named *section* to the instance. If a section by the given name already exists, "DuplicateSectionError" is raised. If the *default section* name is passed, "ValueError" is raised. Type of *section* is not checked which lets users create non- string named sections. This behaviour is unsupported and may cause internal errors. set(section, option, value) If the given section exists, set the given option to the specified value; otherwise raise "NoSectionError". While it is possible to use "RawConfigParser" (or "ConfigParser" with *raw* parameters set to true) for *internal* storage of non-string values, full functionality (including interpolation and output to files) can only be achieved using string values. This method lets users assign non-string values to keys internally. This behaviour is unsupported and will cause errors when attempting to write to a file or get it in non-raw mode. **Use the mapping protocol API** which does not allow such assignments to take place. Exceptions ========== exception configparser.Error Base class for all other "configparser" exceptions. exception configparser.NoSectionError Exception raised when a specified section is not found. exception configparser.DuplicateSectionError Exception raised if "add_section()" is called with the name of a section that is already present or in strict parsers when a section if found more than once in a single input file, string or dictionary. New in version 3.2: Optional "source" and "lineno" attributes and arguments to "__init__()" were added. exception configparser.DuplicateOptionError Exception raised by strict parsers if a single option appears twice during reading from a single file, string or dictionary. This catches misspellings and case sensitivity-related errors, e.g. a dictionary may have two keys representing the same case-insensitive configuration key. exception configparser.NoOptionError Exception raised when a specified option is not found in the specified section. exception configparser.InterpolationError Base class for exceptions raised when problems occur performing string interpolation. exception configparser.InterpolationDepthError Exception raised when string interpolation cannot be completed because the number of iterations exceeds "MAX_INTERPOLATION_DEPTH". Subclass of "InterpolationError". exception configparser.InterpolationMissingOptionError Exception raised when an option referenced from a value does not exist. Subclass of "InterpolationError". exception configparser.InterpolationSyntaxError Exception raised when the source text into which substitutions are made does not conform to the required syntax. Subclass of "InterpolationError". exception configparser.MissingSectionHeaderError Exception raised when attempting to parse a file which has no section headers. exception configparser.ParsingError Exception raised when errors occur attempting to parse a file. Changed in version 3.2: The "filename" attribute and "__init__()" argument were renamed to "source" for consistency. -[ Footnotes ]- [1] Config parsers allow for heavy customization. If you are interested in changing the behaviour outlined by the footnote reference, consult the Customizing Parser Behaviour section. Built-in Constants ****************** A small number of constants live in the built-in namespace. They are: False The false value of the "bool" type. Assignments to "False" are illegal and raise a "SyntaxError". True The true value of the "bool" type. Assignments to "True" are illegal and raise a "SyntaxError". None The sole value of the type "NoneType". "None" is frequently used to represent the absence of a value, as when default arguments are not passed to a function. Assignments to "None" are illegal and raise a "SyntaxError". NotImplemented Special value which should be returned by the binary special methods (e.g. "__eq__()", "__lt__()", "__add__()", "__rsub__()", etc.) to indicate that the operation is not implemented with respect to the other type; may be returned by the in-place binary special methods (e.g. "__imul__()", "__iand__()", etc.) for the same purpose. It should not be evaluated in a boolean context. Note: When a binary (or in-place) method returns "NotImplemented" the interpreter will try the reflected operation on the other type (or some other fallback, depending on the operator). If all attempts return "NotImplemented", the interpreter will raise an appropriate exception. Incorrectly returning "NotImplemented" will result in a misleading error message or the "NotImplemented" value being returned to Python code.See Implementing the arithmetic operations for examples. Note: "NotImplementedError" and "NotImplemented" are not interchangeable, even though they have similar names and purposes. See "NotImplementedError" for details on when to use it. Changed in version 3.9: Evaluating "NotImplemented" in a boolean context is deprecated. While it currently evaluates as true, it will emit a "DeprecationWarning". It will raise a "TypeError" in a future version of Python. Ellipsis The same as the ellipsis literal “"..."”. Special value used mostly in conjunction with extended slicing syntax for user-defined container data types. __debug__ This constant is true if Python was not started with an "-O" option. See also the "assert" statement. Note: The names "None", "False", "True" and "__debug__" cannot be reassigned (assignments to them, even as an attribute name, raise "SyntaxError"), so they can be considered “true” constants. Constants added by the "site" module ==================================== The "site" module (which is imported automatically during startup, except if the "-S" command-line option is given) adds several constants to the built-in namespace. They are useful for the interactive interpreter shell and should not be used in programs. quit(code=None) exit(code=None) Objects that when printed, print a message like “Use quit() or Ctrl-D (i.e. EOF) to exit”, and when called, raise "SystemExit" with the specified exit code. copyright credits Objects that when printed or called, print the text of copyright or credits, respectively. license Object that when printed, prints the message “Type license() to see the full license text”, and when called, displays the full license text in a pager-like fashion (one screen at a time). "contextlib" — Utilities for "with"-statement contexts ****************************************************** **Source code:** Lib/contextlib.py ====================================================================== This module provides utilities for common tasks involving the "with" statement. For more information see also Context Manager Types and With Statement Context Managers. Utilities ========= Functions and classes provided: class contextlib.AbstractContextManager An *abstract base class* for classes that implement "object.__enter__()" and "object.__exit__()". A default implementation for "object.__enter__()" is provided which returns "self" while "object.__exit__()" is an abstract method which by default returns "None". See also the definition of Context Manager Types. New in version 3.6. class contextlib.AbstractAsyncContextManager An *abstract base class* for classes that implement "object.__aenter__()" and "object.__aexit__()". A default implementation for "object.__aenter__()" is provided which returns "self" while "object.__aexit__()" is an abstract method which by default returns "None". See also the definition of Asynchronous Context Managers. New in version 3.7. @contextlib.contextmanager This function is a *decorator* that can be used to define a factory function for "with" statement context managers, without needing to create a class or separate "__enter__()" and "__exit__()" methods. While many objects natively support use in with statements, sometimes a resource needs to be managed that isn’t a context manager in its own right, and doesn’t implement a "close()" method for use with "contextlib.closing" An abstract example would be the following to ensure correct resource management: from contextlib import contextmanager @contextmanager def managed_resource(*args, **kwds): # Code to acquire resource, e.g.: resource = acquire_resource(*args, **kwds) try: yield resource finally: # Code to release resource, e.g.: release_resource(resource) >>> with managed_resource(timeout=3600) as resource: ... # Resource is released at the end of this block, ... # even if code in the block raises an exception The function being decorated must return a *generator*-iterator when called. This iterator must yield exactly one value, which will be bound to the targets in the "with" statement’s "as" clause, if any. At the point where the generator yields, the block nested in the "with" statement is executed. The generator is then resumed after the block is exited. If an unhandled exception occurs in the block, it is reraised inside the generator at the point where the yield occurred. Thus, you can use a "try"…"except"…"finally" statement to trap the error (if any), or ensure that some cleanup takes place. If an exception is trapped merely in order to log it or to perform some action (rather than to suppress it entirely), the generator must reraise that exception. Otherwise the generator context manager will indicate to the "with" statement that the exception has been handled, and execution will resume with the statement immediately following the "with" statement. "contextmanager()" uses "ContextDecorator" so the context managers it creates can be used as decorators as well as in "with" statements. When used as a decorator, a new generator instance is implicitly created on each function call (this allows the otherwise “one-shot” context managers created by "contextmanager()" to meet the requirement that context managers support multiple invocations in order to be used as decorators). Changed in version 3.2: Use of "ContextDecorator". @contextlib.asynccontextmanager Similar to "contextmanager()", but creates an asynchronous context manager. This function is a *decorator* that can be used to define a factory function for "async with" statement asynchronous context managers, without needing to create a class or separate "__aenter__()" and "__aexit__()" methods. It must be applied to an *asynchronous generator* function. A simple example: from contextlib import asynccontextmanager @asynccontextmanager async def get_connection(): conn = await acquire_db_connection() try: yield conn finally: await release_db_connection(conn) async def get_all_users(): async with get_connection() as conn: return conn.query('SELECT ...') New in version 3.7. contextlib.closing(thing) Return a context manager that closes *thing* upon completion of the block. This is basically equivalent to: from contextlib import contextmanager @contextmanager def closing(thing): try: yield thing finally: thing.close() And lets you write code like this: from contextlib import closing from urllib.request import urlopen with closing(urlopen('https://www.python.org')) as page: for line in page: print(line) without needing to explicitly close "page". Even if an error occurs, "page.close()" will be called when the "with" block is exited. contextlib.nullcontext(enter_result=None) Return a context manager that returns *enter_result* from "__enter__", but otherwise does nothing. It is intended to be used as a stand-in for an optional context manager, for example: def myfunction(arg, ignore_exceptions=False): if ignore_exceptions: # Use suppress to ignore all exceptions. cm = contextlib.suppress(Exception) else: # Do not ignore any exceptions, cm has no effect. cm = contextlib.nullcontext() with cm: # Do something An example using *enter_result*: def process_file(file_or_path): if isinstance(file_or_path, str): # If string, open file cm = open(file_or_path) else: # Caller is responsible for closing file cm = nullcontext(file_or_path) with cm as file: # Perform processing on the file New in version 3.7. contextlib.suppress(*exceptions) Return a context manager that suppresses any of the specified exceptions if they occur in the body of a "with" statement and then resumes execution with the first statement following the end of the "with" statement. As with any other mechanism that completely suppresses exceptions, this context manager should be used only to cover very specific errors where silently continuing with program execution is known to be the right thing to do. For example: from contextlib import suppress with suppress(FileNotFoundError): os.remove('somefile.tmp') with suppress(FileNotFoundError): os.remove('someotherfile.tmp') This code is equivalent to: try: os.remove('somefile.tmp') except FileNotFoundError: pass try: os.remove('someotherfile.tmp') except FileNotFoundError: pass This context manager is reentrant. New in version 3.4. contextlib.redirect_stdout(new_target) Context manager for temporarily redirecting "sys.stdout" to another file or file-like object. This tool adds flexibility to existing functions or classes whose output is hardwired to stdout. For example, the output of "help()" normally is sent to *sys.stdout*. You can capture that output in a string by redirecting the output to an "io.StringIO" object. The replacement stream is returned from the "__enter__" method and so is available as the target of the "with" statement: with redirect_stdout(io.StringIO()) as f: help(pow) s = f.getvalue() To send the output of "help()" to a file on disk, redirect the output to a regular file: with open('help.txt', 'w') as f: with redirect_stdout(f): help(pow) To send the output of "help()" to *sys.stderr*: with redirect_stdout(sys.stderr): help(pow) Note that the global side effect on "sys.stdout" means that this context manager is not suitable for use in library code and most threaded applications. It also has no effect on the output of subprocesses. However, it is still a useful approach for many utility scripts. This context manager is reentrant. New in version 3.4. contextlib.redirect_stderr(new_target) Similar to "redirect_stdout()" but redirecting "sys.stderr" to another file or file-like object. This context manager is reentrant. New in version 3.5. class contextlib.ContextDecorator A base class that enables a context manager to also be used as a decorator. Context managers inheriting from "ContextDecorator" have to implement "__enter__" and "__exit__" as normal. "__exit__" retains its optional exception handling even when used as a decorator. "ContextDecorator" is used by "contextmanager()", so you get this functionality automatically. Example of "ContextDecorator": from contextlib import ContextDecorator class mycontext(ContextDecorator): def __enter__(self): print('Starting') return self def __exit__(self, *exc): print('Finishing') return False >>> @mycontext() ... def function(): ... print('The bit in the middle') ... >>> function() Starting The bit in the middle Finishing >>> with mycontext(): ... print('The bit in the middle') ... Starting The bit in the middle Finishing This change is just syntactic sugar for any construct of the following form: def f(): with cm(): # Do stuff "ContextDecorator" lets you instead write: @cm() def f(): # Do stuff It makes it clear that the "cm" applies to the whole function, rather than just a piece of it (and saving an indentation level is nice, too). Existing context managers that already have a base class can be extended by using "ContextDecorator" as a mixin class: from contextlib import ContextDecorator class mycontext(ContextBaseClass, ContextDecorator): def __enter__(self): return self def __exit__(self, *exc): return False Note: As the decorated function must be able to be called multiple times, the underlying context manager must support use in multiple "with" statements. If this is not the case, then the original construct with the explicit "with" statement inside the function should be used. New in version 3.2. class contextlib.ExitStack A context manager that is designed to make it easy to programmatically combine other context managers and cleanup functions, especially those that are optional or otherwise driven by input data. For example, a set of files may easily be handled in a single with statement as follows: with ExitStack() as stack: files = [stack.enter_context(open(fname)) for fname in filenames] # All opened files will automatically be closed at the end of # the with statement, even if attempts to open files later # in the list raise an exception The "__enter__()" method returns the "ExitStack" instance, and performs no additional operations. Each instance maintains a stack of registered callbacks that are called in reverse order when the instance is closed (either explicitly or implicitly at the end of a "with" statement). Note that callbacks are *not* invoked implicitly when the context stack instance is garbage collected. This stack model is used so that context managers that acquire their resources in their "__init__" method (such as file objects) can be handled correctly. Since registered callbacks are invoked in the reverse order of registration, this ends up behaving as if multiple nested "with" statements had been used with the registered set of callbacks. This even extends to exception handling - if an inner callback suppresses or replaces an exception, then outer callbacks will be passed arguments based on that updated state. This is a relatively low level API that takes care of the details of correctly unwinding the stack of exit callbacks. It provides a suitable foundation for higher level context managers that manipulate the exit stack in application specific ways. New in version 3.3. enter_context(cm) Enters a new context manager and adds its "__exit__()" method to the callback stack. The return value is the result of the context manager’s own "__enter__()" method. These context managers may suppress exceptions just as they normally would if used directly as part of a "with" statement. push(exit) Adds a context manager’s "__exit__()" method to the callback stack. As "__enter__" is *not* invoked, this method can be used to cover part of an "__enter__()" implementation with a context manager’s own "__exit__()" method. If passed an object that is not a context manager, this method assumes it is a callback with the same signature as a context manager’s "__exit__()" method and adds it directly to the callback stack. By returning true values, these callbacks can suppress exceptions the same way context manager "__exit__()" methods can. The passed in object is returned from the function, allowing this method to be used as a function decorator. callback(callback, /, *args, **kwds) Accepts an arbitrary callback function and arguments and adds it to the callback stack. Unlike the other methods, callbacks added this way cannot suppress exceptions (as they are never passed the exception details). The passed in callback is returned from the function, allowing this method to be used as a function decorator. pop_all() Transfers the callback stack to a fresh "ExitStack" instance and returns it. No callbacks are invoked by this operation - instead, they will now be invoked when the new stack is closed (either explicitly or implicitly at the end of a "with" statement). For example, a group of files can be opened as an “all or nothing” operation as follows: with ExitStack() as stack: files = [stack.enter_context(open(fname)) for fname in filenames] # Hold onto the close method, but don't call it yet. close_files = stack.pop_all().close # If opening any file fails, all previously opened files will be # closed automatically. If all files are opened successfully, # they will remain open even after the with statement ends. # close_files() can then be invoked explicitly to close them all. close() Immediately unwinds the callback stack, invoking callbacks in the reverse order of registration. For any context managers and exit callbacks registered, the arguments passed in will indicate that no exception occurred. class contextlib.AsyncExitStack An asynchronous context manager, similar to "ExitStack", that supports combining both synchronous and asynchronous context managers, as well as having coroutines for cleanup logic. The "close()" method is not implemented, "aclose()" must be used instead. coroutine enter_async_context(cm) Similar to "enter_context()" but expects an asynchronous context manager. push_async_exit(exit) Similar to "push()" but expects either an asynchronous context manager or a coroutine function. push_async_callback(callback, /, *args, **kwds) Similar to "callback()" but expects a coroutine function. coroutine aclose() Similar to "close()" but properly handles awaitables. Continuing the example for "asynccontextmanager()": async with AsyncExitStack() as stack: connections = [await stack.enter_async_context(get_connection()) for i in range(5)] # All opened connections will automatically be released at the end of # the async with statement, even if attempts to open a connection # later in the list raise an exception. New in version 3.7. Examples and Recipes ==================== This section describes some examples and recipes for making effective use of the tools provided by "contextlib". Supporting a variable number of context managers ------------------------------------------------ The primary use case for "ExitStack" is the one given in the class documentation: supporting a variable number of context managers and other cleanup operations in a single "with" statement. The variability may come from the number of context managers needed being driven by user input (such as opening a user specified collection of files), or from some of the context managers being optional: with ExitStack() as stack: for resource in resources: stack.enter_context(resource) if need_special_resource(): special = acquire_special_resource() stack.callback(release_special_resource, special) # Perform operations that use the acquired resources As shown, "ExitStack" also makes it quite easy to use "with" statements to manage arbitrary resources that don’t natively support the context management protocol. Catching exceptions from "__enter__" methods -------------------------------------------- It is occasionally desirable to catch exceptions from an "__enter__" method implementation, *without* inadvertently catching exceptions from the "with" statement body or the context manager’s "__exit__" method. By using "ExitStack" the steps in the context management protocol can be separated slightly in order to allow this: stack = ExitStack() try: x = stack.enter_context(cm) except Exception: # handle __enter__ exception else: with stack: # Handle normal case Actually needing to do this is likely to indicate that the underlying API should be providing a direct resource management interface for use with "try"/"except"/"finally" statements, but not all APIs are well designed in that regard. When a context manager is the only resource management API provided, then "ExitStack" can make it easier to handle various situations that can’t be handled directly in a "with" statement. Cleaning up in an "__enter__" implementation -------------------------------------------- As noted in the documentation of "ExitStack.push()", this method can be useful in cleaning up an already allocated resource if later steps in the "__enter__()" implementation fail. Here’s an example of doing this for a context manager that accepts resource acquisition and release functions, along with an optional validation function, and maps them to the context management protocol: from contextlib import contextmanager, AbstractContextManager, ExitStack class ResourceManager(AbstractContextManager): def __init__(self, acquire_resource, release_resource, check_resource_ok=None): self.acquire_resource = acquire_resource self.release_resource = release_resource if check_resource_ok is None: def check_resource_ok(resource): return True self.check_resource_ok = check_resource_ok @contextmanager def _cleanup_on_error(self): with ExitStack() as stack: stack.push(self) yield # The validation check passed and didn't raise an exception # Accordingly, we want to keep the resource, and pass it # back to our caller stack.pop_all() def __enter__(self): resource = self.acquire_resource() with self._cleanup_on_error(): if not self.check_resource_ok(resource): msg = "Failed validation for {!r}" raise RuntimeError(msg.format(resource)) return resource def __exit__(self, *exc_details): # We don't need to duplicate any of our resource release logic self.release_resource() Replacing any use of "try-finally" and flag variables ----------------------------------------------------- A pattern you will sometimes see is a "try-finally" statement with a flag variable to indicate whether or not the body of the "finally" clause should be executed. In its simplest form (that can’t already be handled just by using an "except" clause instead), it looks something like this: cleanup_needed = True try: result = perform_operation() if result: cleanup_needed = False finally: if cleanup_needed: cleanup_resources() As with any "try" statement based code, this can cause problems for development and review, because the setup code and the cleanup code can end up being separated by arbitrarily long sections of code. "ExitStack" makes it possible to instead register a callback for execution at the end of a "with" statement, and then later decide to skip executing that callback: from contextlib import ExitStack with ExitStack() as stack: stack.callback(cleanup_resources) result = perform_operation() if result: stack.pop_all() This allows the intended cleanup up behaviour to be made explicit up front, rather than requiring a separate flag variable. If a particular application uses this pattern a lot, it can be simplified even further by means of a small helper class: from contextlib import ExitStack class Callback(ExitStack): def __init__(self, callback, /, *args, **kwds): super().__init__() self.callback(callback, *args, **kwds) def cancel(self): self.pop_all() with Callback(cleanup_resources) as cb: result = perform_operation() if result: cb.cancel() If the resource cleanup isn’t already neatly bundled into a standalone function, then it is still possible to use the decorator form of "ExitStack.callback()" to declare the resource cleanup in advance: from contextlib import ExitStack with ExitStack() as stack: @stack.callback def cleanup_resources(): ... result = perform_operation() if result: stack.pop_all() Due to the way the decorator protocol works, a callback function declared this way cannot take any parameters. Instead, any resources to be released must be accessed as closure variables. Using a context manager as a function decorator ----------------------------------------------- "ContextDecorator" makes it possible to use a context manager in both an ordinary "with" statement and also as a function decorator. For example, it is sometimes useful to wrap functions or groups of statements with a logger that can track the time of entry and time of exit. Rather than writing both a function decorator and a context manager for the task, inheriting from "ContextDecorator" provides both capabilities in a single definition: from contextlib import ContextDecorator import logging logging.basicConfig(level=logging.INFO) class track_entry_and_exit(ContextDecorator): def __init__(self, name): self.name = name def __enter__(self): logging.info('Entering: %s', self.name) def __exit__(self, exc_type, exc, exc_tb): logging.info('Exiting: %s', self.name) Instances of this class can be used as both a context manager: with track_entry_and_exit('widget loader'): print('Some time consuming activity goes here') load_widget() And also as a function decorator: @track_entry_and_exit('widget loader') def activity(): print('Some time consuming activity goes here') load_widget() Note that there is one additional limitation when using context managers as function decorators: there’s no way to access the return value of "__enter__()". If that value is needed, then it is still necessary to use an explicit "with" statement. See also: **PEP 343** - The “with” statement The specification, background, and examples for the Python "with" statement. Single use, reusable and reentrant context managers =================================================== Most context managers are written in a way that means they can only be used effectively in a "with" statement once. These single use context managers must be created afresh each time they’re used - attempting to use them a second time will trigger an exception or otherwise not work correctly. This common limitation means that it is generally advisable to create context managers directly in the header of the "with" statement where they are used (as shown in all of the usage examples above). Files are an example of effectively single use context managers, since the first "with" statement will close the file, preventing any further IO operations using that file object. Context managers created using "contextmanager()" are also single use context managers, and will complain about the underlying generator failing to yield if an attempt is made to use them a second time: >>> from contextlib import contextmanager >>> @contextmanager ... def singleuse(): ... print("Before") ... yield ... print("After") ... >>> cm = singleuse() >>> with cm: ... pass ... Before After >>> with cm: ... pass ... Traceback (most recent call last): ... RuntimeError: generator didn't yield Reentrant context managers -------------------------- More sophisticated context managers may be “reentrant”. These context managers can not only be used in multiple "with" statements, but may also be used *inside* a "with" statement that is already using the same context manager. "threading.RLock" is an example of a reentrant context manager, as are "suppress()" and "redirect_stdout()". Here’s a very simple example of reentrant use: >>> from contextlib import redirect_stdout >>> from io import StringIO >>> stream = StringIO() >>> write_to_stream = redirect_stdout(stream) >>> with write_to_stream: ... print("This is written to the stream rather than stdout") ... with write_to_stream: ... print("This is also written to the stream") ... >>> print("This is written directly to stdout") This is written directly to stdout >>> print(stream.getvalue()) This is written to the stream rather than stdout This is also written to the stream Real world examples of reentrancy are more likely to involve multiple functions calling each other and hence be far more complicated than this example. Note also that being reentrant is *not* the same thing as being thread safe. "redirect_stdout()", for example, is definitely not thread safe, as it makes a global modification to the system state by binding "sys.stdout" to a different stream. Reusable context managers ------------------------- Distinct from both single use and reentrant context managers are “reusable” context managers (or, to be completely explicit, “reusable, but not reentrant” context managers, since reentrant context managers are also reusable). These context managers support being used multiple times, but will fail (or otherwise not work correctly) if the specific context manager instance has already been used in a containing with statement. "threading.Lock" is an example of a reusable, but not reentrant, context manager (for a reentrant lock, it is necessary to use "threading.RLock" instead). Another example of a reusable, but not reentrant, context manager is "ExitStack", as it invokes *all* currently registered callbacks when leaving any with statement, regardless of where those callbacks were added: >>> from contextlib import ExitStack >>> stack = ExitStack() >>> with stack: ... stack.callback(print, "Callback: from first context") ... print("Leaving first context") ... Leaving first context Callback: from first context >>> with stack: ... stack.callback(print, "Callback: from second context") ... print("Leaving second context") ... Leaving second context Callback: from second context >>> with stack: ... stack.callback(print, "Callback: from outer context") ... with stack: ... stack.callback(print, "Callback: from inner context") ... print("Leaving inner context") ... print("Leaving outer context") ... Leaving inner context Callback: from inner context Callback: from outer context Leaving outer context As the output from the example shows, reusing a single stack object across multiple with statements works correctly, but attempting to nest them will cause the stack to be cleared at the end of the innermost with statement, which is unlikely to be desirable behaviour. Using separate "ExitStack" instances instead of reusing a single instance avoids that problem: >>> from contextlib import ExitStack >>> with ExitStack() as outer_stack: ... outer_stack.callback(print, "Callback: from outer context") ... with ExitStack() as inner_stack: ... inner_stack.callback(print, "Callback: from inner context") ... print("Leaving inner context") ... print("Leaving outer context") ... Leaving inner context Callback: from inner context Leaving outer context Callback: from outer context "contextvars" — Context Variables ********************************* ====================================================================== This module provides APIs to manage, store, and access context-local state. The "ContextVar" class is used to declare and work with *Context Variables*. The "copy_context()" function and the "Context" class should be used to manage the current context in asynchronous frameworks. Context managers that have state should use Context Variables instead of "threading.local()" to prevent their state from bleeding to other code unexpectedly, when used in concurrent code. See also **PEP 567** for additional details. New in version 3.7. Context Variables ================= class contextvars.ContextVar(name[, *, default]) This class is used to declare a new Context Variable, e.g.: var: ContextVar[int] = ContextVar('var', default=42) The required *name* parameter is used for introspection and debug purposes. The optional keyword-only *default* parameter is returned by "ContextVar.get()" when no value for the variable is found in the current context. **Important:** Context Variables should be created at the top module level and never in closures. "Context" objects hold strong references to context variables which prevents context variables from being properly garbage collected. name The name of the variable. This is a read-only property. New in version 3.7.1. get([default]) Return a value for the context variable for the current context. If there is no value for the variable in the current context, the method will: * return the value of the *default* argument of the method, if provided; or * return the default value for the context variable, if it was created with one; or * raise a "LookupError". set(value) Call to set a new value for the context variable in the current context. The required *value* argument is the new value for the context variable. Returns a "Token" object that can be used to restore the variable to its previous value via the "ContextVar.reset()" method. reset(token) Reset the context variable to the value it had before the "ContextVar.set()" that created the *token* was used. For example: var = ContextVar('var') token = var.set('new value') # code that uses 'var'; var.get() returns 'new value'. var.reset(token) # After the reset call the var has no value again, so # var.get() would raise a LookupError. class contextvars.Token *Token* objects are returned by the "ContextVar.set()" method. They can be passed to the "ContextVar.reset()" method to revert the value of the variable to what it was before the corresponding *set*. var A read-only property. Points to the "ContextVar" object that created the token. old_value A read-only property. Set to the value the variable had before the "ContextVar.set()" method call that created the token. It points to "Token.MISSING" is the variable was not set before the call. MISSING A marker object used by "Token.old_value". Manual Context Management ========================= contextvars.copy_context() Returns a copy of the current "Context" object. The following snippet gets a copy of the current context and prints all variables and their values that are set in it: ctx: Context = copy_context() print(list(ctx.items())) The function has an O(1) complexity, i.e. works equally fast for contexts with a few context variables and for contexts that have a lot of them. class contextvars.Context A mapping of "ContextVars" to their values. "Context()" creates an empty context with no values in it. To get a copy of the current context use the "copy_context()" function. Every thread will have a different top-level "Context" object. This means that a "ContextVar" object behaves in a similar fashion to "threading.local()" when values are assigned in different threads. Context implements the "collections.abc.Mapping" interface. run(callable, *args, **kwargs) Execute "callable(*args, **kwargs)" code in the context object the *run* method is called on. Return the result of the execution or propagate an exception if one occurred. Any changes to any context variables that *callable* makes will be contained in the context object: var = ContextVar('var') var.set('spam') def main(): # 'var' was set to 'spam' before # calling 'copy_context()' and 'ctx.run(main)', so: # var.get() == ctx[var] == 'spam' var.set('ham') # Now, after setting 'var' to 'ham': # var.get() == ctx[var] == 'ham' ctx = copy_context() # Any changes that the 'main' function makes to 'var' # will be contained in 'ctx'. ctx.run(main) # The 'main()' function was run in the 'ctx' context, # so changes to 'var' are contained in it: # ctx[var] == 'ham' # However, outside of 'ctx', 'var' is still set to 'spam': # var.get() == 'spam' The method raises a "RuntimeError" when called on the same context object from more than one OS thread, or when called recursively. copy() Return a shallow copy of the context object. var in context Return "True" if the *context* has a value for *var* set; return "False" otherwise. context[var] Return the value of the *var* "ContextVar" variable. If the variable is not set in the context object, a "KeyError" is raised. get(var[, default]) Return the value for *var* if *var* has the value in the context object. Return *default* otherwise. If *default* is not given, return "None". iter(context) Return an iterator over the variables stored in the context object. len(proxy) Return the number of variables set in the context object. keys() Return a list of all variables in the context object. values() Return a list of all variables’ values in the context object. items() Return a list of 2-tuples containing all variables and their values in the context object. asyncio support =============== Context variables are natively supported in "asyncio" and are ready to be used without any extra configuration. For example, here is a simple echo server, that uses a context variable to make the address of a remote client available in the Task that handles that client: import asyncio import contextvars client_addr_var = contextvars.ContextVar('client_addr') def render_goodbye(): # The address of the currently handled client can be accessed # without passing it explicitly to this function. client_addr = client_addr_var.get() return f'Good bye, client @ {client_addr}\n'.encode() async def handle_request(reader, writer): addr = writer.transport.get_extra_info('socket').getpeername() client_addr_var.set(addr) # In any code that we call is now possible to get # client's address by calling 'client_addr_var.get()'. while True: line = await reader.readline() print(line) if not line.strip(): break writer.write(line) writer.write(render_goodbye()) writer.close() async def main(): srv = await asyncio.start_server( handle_request, '127.0.0.1', 8081) async with srv: await srv.serve_forever() asyncio.run(main()) # To test it you can use telnet: # telnet 127.0.0.1 8081 "copy" — Shallow and deep copy operations ***************************************** **Source code:** Lib/copy.py ====================================================================== Assignment statements in Python do not copy objects, they create bindings between a target and an object. For collections that are mutable or contain mutable items, a copy is sometimes needed so one can change one copy without changing the other. This module provides generic shallow and deep copy operations (explained below). Interface summary: copy.copy(x) Return a shallow copy of *x*. copy.deepcopy(x[, memo]) Return a deep copy of *x*. exception copy.Error Raised for module specific errors. The difference between shallow and deep copying is only relevant for compound objects (objects that contain other objects, like lists or class instances): * A *shallow copy* constructs a new compound object and then (to the extent possible) inserts *references* into it to the objects found in the original. * A *deep copy* constructs a new compound object and then, recursively, inserts *copies* into it of the objects found in the original. Two problems often exist with deep copy operations that don’t exist with shallow copy operations: * Recursive objects (compound objects that, directly or indirectly, contain a reference to themselves) may cause a recursive loop. * Because deep copy copies everything it may copy too much, such as data which is intended to be shared between copies. The "deepcopy()" function avoids these problems by: * keeping a "memo" dictionary of objects already copied during the current copying pass; and * letting user-defined classes override the copying operation or the set of components copied. This module does not copy types like module, method, stack trace, stack frame, file, socket, window, or any similar types. It does “copy” functions and classes (shallow and deeply), by returning the original object unchanged; this is compatible with the way these are treated by the "pickle" module. Shallow copies of dictionaries can be made using "dict.copy()", and of lists by assigning a slice of the entire list, for example, "copied_list = original_list[:]". Classes can use the same interfaces to control copying that they use to control pickling. See the description of module "pickle" for information on these methods. In fact, the "copy" module uses the registered pickle functions from the "copyreg" module. In order for a class to define its own copy implementation, it can define special methods "__copy__()" and "__deepcopy__()". The former is called to implement the shallow copy operation; no additional arguments are passed. The latter is called to implement the deep copy operation; it is passed one argument, the "memo" dictionary. If the "__deepcopy__()" implementation needs to make a deep copy of a component, it should call the "deepcopy()" function with the component as first argument and the memo dictionary as second argument. See also: Module "pickle" Discussion of the special methods used to support object state retrieval and restoration. "copyreg" — Register "pickle" support functions *********************************************** **Source code:** Lib/copyreg.py ====================================================================== The "copyreg" module offers a way to define functions used while pickling specific objects. The "pickle" and "copy" modules use those functions when pickling/copying those objects. The module provides configuration information about object constructors which are not classes. Such constructors may be factory functions or class instances. copyreg.constructor(object) Declares *object* to be a valid constructor. If *object* is not callable (and hence not valid as a constructor), raises "TypeError". copyreg.pickle(type, function, constructor=None) Declares that *function* should be used as a “reduction” function for objects of type *type*. *function* should return either a string or a tuple containing two or three elements. The optional *constructor* parameter, if provided, is a callable object which can be used to reconstruct the object when called with the tuple of arguments returned by *function* at pickling time. A "TypeError" is raised if the *constructor* is not callable. See the "pickle" module for more details on the interface expected of *function* and *constructor*. Note that the "dispatch_table" attribute of a pickler object or subclass of "pickle.Pickler" can also be used for declaring reduction functions. Example ======= The example below would like to show how to register a pickle function and how it will be used: >>> import copyreg, copy, pickle >>> class C: ... def __init__(self, a): ... self.a = a ... >>> def pickle_c(c): ... print("pickling a C instance...") ... return C, (c.a,) ... >>> copyreg.pickle(C, pickle_c) >>> c = C(1) >>> d = copy.copy(c) pickling a C instance... >>> p = pickle.dumps(c) pickling a C instance... "crypt" — Function to check Unix passwords ****************************************** **Source code:** Lib/crypt.py Deprecated since version 3.11: The "crypt" module is deprecated (see **PEP 594** for details and alternatives). The "hashlib" module is a potential replacement for certain use cases. ====================================================================== This module implements an interface to the *crypt(3)* routine, which is a one-way hash function based upon a modified DES algorithm; see the Unix man page for further details. Possible uses include storing hashed passwords so you can check passwords without storing the actual password, or attempting to crack Unix passwords with a dictionary. Notice that the behavior of this module depends on the actual implementation of the *crypt(3)* routine in the running system. Therefore, any extensions available on the current implementation will also be available on this module. Availability: Unix. Not available on VxWorks. Hashing Methods =============== New in version 3.3. The "crypt" module defines the list of hashing methods (not all methods are available on all platforms): crypt.METHOD_SHA512 A Modular Crypt Format method with 16 character salt and 86 character hash based on the SHA-512 hash function. This is the strongest method. crypt.METHOD_SHA256 Another Modular Crypt Format method with 16 character salt and 43 character hash based on the SHA-256 hash function. crypt.METHOD_BLOWFISH Another Modular Crypt Format method with 22 character salt and 31 character hash based on the Blowfish cipher. New in version 3.7. crypt.METHOD_MD5 Another Modular Crypt Format method with 8 character salt and 22 character hash based on the MD5 hash function. crypt.METHOD_CRYPT The traditional method with a 2 character salt and 13 characters of hash. This is the weakest method. Module Attributes ================= New in version 3.3. crypt.methods A list of available password hashing algorithms, as "crypt.METHOD_*" objects. This list is sorted from strongest to weakest. Module Functions ================ The "crypt" module defines the following functions: crypt.crypt(word, salt=None) *word* will usually be a user’s password as typed at a prompt or in a graphical interface. The optional *salt* is either a string as returned from "mksalt()", one of the "crypt.METHOD_*" values (though not all may be available on all platforms), or a full encrypted password including salt, as returned by this function. If *salt* is not provided, the strongest method available in "methods" will be used. Checking a password is usually done by passing the plain-text password as *word* and the full results of a previous "crypt()" call, which should be the same as the results of this call. *salt* (either a random 2 or 16 character string, possibly prefixed with "$digit$" to indicate the method) which will be used to perturb the encryption algorithm. The characters in *salt* must be in the set "[./a-zA-Z0-9]", with the exception of Modular Crypt Format which prefixes a "$digit$". Returns the hashed password as a string, which will be composed of characters from the same alphabet as the salt. Since a few *crypt(3)* extensions allow different values, with different sizes in the *salt*, it is recommended to use the full crypted password as salt when checking for a password. Changed in version 3.3: Accept "crypt.METHOD_*" values in addition to strings for *salt*. crypt.mksalt(method=None, *, rounds=None) Return a randomly generated salt of the specified method. If no *method* is given, the strongest method available in "methods" is used. The return value is a string suitable for passing as the *salt* argument to "crypt()". *rounds* specifies the number of rounds for "METHOD_SHA256", "METHOD_SHA512" and "METHOD_BLOWFISH". For "METHOD_SHA256" and "METHOD_SHA512" it must be an integer between "1000" and "999_999_999", the default is "5000". For "METHOD_BLOWFISH" it must be a power of two between "16" (2^4) and "2_147_483_648" (2^31), the default is "4096" (2^12). New in version 3.3. Changed in version 3.7: Added the *rounds* parameter. Examples ======== A simple example illustrating typical use (a constant-time comparison operation is needed to limit exposure to timing attacks. "hmac.compare_digest()" is suitable for this purpose): import pwd import crypt import getpass from hmac import compare_digest as compare_hash def login(): username = input('Python login: ') cryptedpasswd = pwd.getpwnam(username)[1] if cryptedpasswd: if cryptedpasswd == 'x' or cryptedpasswd == '*': raise ValueError('no support for shadow passwords') cleartext = getpass.getpass() return compare_hash(crypt.crypt(cleartext, cryptedpasswd), cryptedpasswd) else: return True To generate a hash of a password using the strongest available method and check it against the original: import crypt from hmac import compare_digest as compare_hash hashed = crypt.crypt(plaintext) if not compare_hash(hashed, crypt.crypt(plaintext, hashed)): raise ValueError("hashed version doesn't validate against original") Cryptographic Services ********************** The modules described in this chapter implement various algorithms of a cryptographic nature. They are available at the discretion of the installation. On Unix systems, the "crypt" module may also be available. Here’s an overview: * "hashlib" — Secure hashes and message digests * Hash algorithms * SHAKE variable length digests * Key derivation * BLAKE2 * Creating hash objects * Constants * Examples * Simple hashing * Using different digest sizes * Keyed hashing * Randomized hashing * Personalization * Tree mode * Credits * "hmac" — Keyed-Hashing for Message Authentication * "secrets" — Generate secure random numbers for managing secrets * Random numbers * Generating tokens * How many bytes should tokens use? * Other functions * Recipes and best practices "csv" — CSV File Reading and Writing ************************************ **Source code:** Lib/csv.py ====================================================================== The so-called CSV (Comma Separated Values) format is the most common import and export format for spreadsheets and databases. CSV format was used for many years prior to attempts to describe the format in a standardized way in **RFC 4180**. The lack of a well-defined standard means that subtle differences often exist in the data produced and consumed by different applications. These differences can make it annoying to process CSV files from multiple sources. Still, while the delimiters and quoting characters vary, the overall format is similar enough that it is possible to write a single module which can efficiently manipulate such data, hiding the details of reading and writing the data from the programmer. The "csv" module implements classes to read and write tabular data in CSV format. It allows programmers to say, “write this data in the format preferred by Excel,” or “read data from this file which was generated by Excel,” without knowing the precise details of the CSV format used by Excel. Programmers can also describe the CSV formats understood by other applications or define their own special-purpose CSV formats. The "csv" module’s "reader" and "writer" objects read and write sequences. Programmers can also read and write data in dictionary form using the "DictReader" and "DictWriter" classes. See also: **PEP 305** - CSV File API The Python Enhancement Proposal which proposed this addition to Python. Module Contents =============== The "csv" module defines the following functions: csv.reader(csvfile, dialect='excel', **fmtparams) Return a reader object which will iterate over lines in the given *csvfile*. *csvfile* can be any object which supports the *iterator* protocol and returns a string each time its "__next__()" method is called — *file objects* and list objects are both suitable. If *csvfile* is a file object, it should be opened with "newline=''". [1] An optional *dialect* parameter can be given which is used to define a set of parameters specific to a particular CSV dialect. It may be an instance of a subclass of the "Dialect" class or one of the strings returned by the "list_dialects()" function. The other optional *fmtparams* keyword arguments can be given to override individual formatting parameters in the current dialect. For full details about the dialect and formatting parameters, see section Dialects and Formatting Parameters. Each row read from the csv file is returned as a list of strings. No automatic data type conversion is performed unless the "QUOTE_NONNUMERIC" format option is specified (in which case unquoted fields are transformed into floats). A short usage example: >>> import csv >>> with open('eggs.csv', newline='') as csvfile: ... spamreader = csv.reader(csvfile, delimiter=' ', quotechar='|') ... for row in spamreader: ... print(', '.join(row)) Spam, Spam, Spam, Spam, Spam, Baked Beans Spam, Lovely Spam, Wonderful Spam csv.writer(csvfile, dialect='excel', **fmtparams) Return a writer object responsible for converting the user’s data into delimited strings on the given file-like object. *csvfile* can be any object with a "write()" method. If *csvfile* is a file object, it should be opened with "newline=''" [1]. An optional *dialect* parameter can be given which is used to define a set of parameters specific to a particular CSV dialect. It may be an instance of a subclass of the "Dialect" class or one of the strings returned by the "list_dialects()" function. The other optional *fmtparams* keyword arguments can be given to override individual formatting parameters in the current dialect. For full details about dialects and formatting parameters, see the Dialects and Formatting Parameters section. To make it as easy as possible to interface with modules which implement the DB API, the value "None" is written as the empty string. While this isn’t a reversible transformation, it makes it easier to dump SQL NULL data values to CSV files without preprocessing the data returned from a "cursor.fetch*" call. All other non-string data are stringified with "str()" before being written. A short usage example: import csv with open('eggs.csv', 'w', newline='') as csvfile: spamwriter = csv.writer(csvfile, delimiter=' ', quotechar='|', quoting=csv.QUOTE_MINIMAL) spamwriter.writerow(['Spam'] * 5 + ['Baked Beans']) spamwriter.writerow(['Spam', 'Lovely Spam', 'Wonderful Spam']) csv.register_dialect(name[, dialect[, **fmtparams]]) Associate *dialect* with *name*. *name* must be a string. The dialect can be specified either by passing a sub-class of "Dialect", or by *fmtparams* keyword arguments, or both, with keyword arguments overriding parameters of the dialect. For full details about dialects and formatting parameters, see section Dialects and Formatting Parameters. csv.unregister_dialect(name) Delete the dialect associated with *name* from the dialect registry. An "Error" is raised if *name* is not a registered dialect name. csv.get_dialect(name) Return the dialect associated with *name*. An "Error" is raised if *name* is not a registered dialect name. This function returns an immutable "Dialect". csv.list_dialects() Return the names of all registered dialects. csv.field_size_limit([new_limit]) Returns the current maximum field size allowed by the parser. If *new_limit* is given, this becomes the new limit. The "csv" module defines the following classes: class csv.DictReader(f, fieldnames=None, restkey=None, restval=None, dialect='excel', *args, **kwds) Create an object that operates like a regular reader but maps the information in each row to a "dict" whose keys are given by the optional *fieldnames* parameter. The *fieldnames* parameter is a *sequence*. If *fieldnames* is omitted, the values in the first row of file *f* will be used as the fieldnames. Regardless of how the fieldnames are determined, the dictionary preserves their original ordering. If a row has more fields than fieldnames, the remaining data is put in a list and stored with the fieldname specified by *restkey* (which defaults to "None"). If a non-blank row has fewer fields than fieldnames, the missing values are filled-in with the value of *restval* (which defaults to "None"). All other optional or keyword arguments are passed to the underlying "reader" instance. Changed in version 3.6: Returned rows are now of type "OrderedDict". Changed in version 3.8: Returned rows are now of type "dict". A short usage example: >>> import csv >>> with open('names.csv', newline='') as csvfile: ... reader = csv.DictReader(csvfile) ... for row in reader: ... print(row['first_name'], row['last_name']) ... Eric Idle John Cleese >>> print(row) {'first_name': 'John', 'last_name': 'Cleese'} class csv.DictWriter(f, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds) Create an object which operates like a regular writer but maps dictionaries onto output rows. The *fieldnames* parameter is a "sequence" of keys that identify the order in which values in the dictionary passed to the "writerow()" method are written to file *f*. The optional *restval* parameter specifies the value to be written if the dictionary is missing a key in *fieldnames*. If the dictionary passed to the "writerow()" method contains a key not found in *fieldnames*, the optional *extrasaction* parameter indicates what action to take. If it is set to "'raise'", the default value, a "ValueError" is raised. If it is set to "'ignore'", extra values in the dictionary are ignored. Any other optional or keyword arguments are passed to the underlying "writer" instance. Note that unlike the "DictReader" class, the *fieldnames* parameter of the "DictWriter" class is not optional. A short usage example: import csv with open('names.csv', 'w', newline='') as csvfile: fieldnames = ['first_name', 'last_name'] writer = csv.DictWriter(csvfile, fieldnames=fieldnames) writer.writeheader() writer.writerow({'first_name': 'Baked', 'last_name': 'Beans'}) writer.writerow({'first_name': 'Lovely', 'last_name': 'Spam'}) writer.writerow({'first_name': 'Wonderful', 'last_name': 'Spam'}) class csv.Dialect The "Dialect" class is a container class whose attributes contain information for how to handle doublequotes, whitespace, delimiters, etc. Due to the lack of a strict CSV specification, different applications produce subtly different CSV data. "Dialect" instances define how "reader" and "writer" instances behave. All available "Dialect" names are returned by "list_dialects()", and they can be registered with specific "reader" and "writer" classes through their initializer ("__init__") functions like this: import csv with open('students.csv', 'w', newline='') as csvfile: writer = csv.writer(csvfile, dialect='unix') ^^^^^^^^^^^^^^ class csv.excel The "excel" class defines the usual properties of an Excel- generated CSV file. It is registered with the dialect name "'excel'". class csv.excel_tab The "excel_tab" class defines the usual properties of an Excel- generated TAB-delimited file. It is registered with the dialect name "'excel-tab'". class csv.unix_dialect The "unix_dialect" class defines the usual properties of a CSV file generated on UNIX systems, i.e. using "'\n'" as line terminator and quoting all fields. It is registered with the dialect name "'unix'". New in version 3.2. class csv.Sniffer The "Sniffer" class is used to deduce the format of a CSV file. The "Sniffer" class provides two methods: sniff(sample, delimiters=None) Analyze the given *sample* and return a "Dialect" subclass reflecting the parameters found. If the optional *delimiters* parameter is given, it is interpreted as a string containing possible valid delimiter characters. has_header(sample) Analyze the sample text (presumed to be in CSV format) and return "True" if the first row appears to be a series of column headers. An example for "Sniffer" use: with open('example.csv', newline='') as csvfile: dialect = csv.Sniffer().sniff(csvfile.read(1024)) csvfile.seek(0) reader = csv.reader(csvfile, dialect) # ... process CSV file contents here ... The "csv" module defines the following constants: csv.QUOTE_ALL Instructs "writer" objects to quote all fields. csv.QUOTE_MINIMAL Instructs "writer" objects to only quote those fields which contain special characters such as *delimiter*, *quotechar* or any of the characters in *lineterminator*. csv.QUOTE_NONNUMERIC Instructs "writer" objects to quote all non-numeric fields. Instructs the reader to convert all non-quoted fields to type *float*. csv.QUOTE_NONE Instructs "writer" objects to never quote fields. When the current *delimiter* occurs in output data it is preceded by the current *escapechar* character. If *escapechar* is not set, the writer will raise "Error" if any characters that require escaping are encountered. Instructs "reader" to perform no special processing of quote characters. The "csv" module defines the following exception: exception csv.Error Raised by any of the functions when an error is detected. Dialects and Formatting Parameters ================================== To make it easier to specify the format of input and output records, specific formatting parameters are grouped together into dialects. A dialect is a subclass of the "Dialect" class having a set of specific methods and a single "validate()" method. When creating "reader" or "writer" objects, the programmer can specify a string or a subclass of the "Dialect" class as the dialect parameter. In addition to, or instead of, the *dialect* parameter, the programmer can also specify individual formatting parameters, which have the same names as the attributes defined below for the "Dialect" class. Dialects support the following attributes: Dialect.delimiter A one-character string used to separate fields. It defaults to "','". Dialect.doublequote Controls how instances of *quotechar* appearing inside a field should themselves be quoted. When "True", the character is doubled. When "False", the *escapechar* is used as a prefix to the *quotechar*. It defaults to "True". On output, if *doublequote* is "False" and no *escapechar* is set, "Error" is raised if a *quotechar* is found in a field. Dialect.escapechar A one-character string used by the writer to escape the *delimiter* if *quoting* is set to "QUOTE_NONE" and the *quotechar* if *doublequote* is "False". On reading, the *escapechar* removes any special meaning from the following character. It defaults to "None", which disables escaping. Dialect.lineterminator The string used to terminate lines produced by the "writer". It defaults to "'\r\n'". Note: The "reader" is hard-coded to recognise either "'\r'" or "'\n'" as end-of-line, and ignores *lineterminator*. This behavior may change in the future. Dialect.quotechar A one-character string used to quote fields containing special characters, such as the *delimiter* or *quotechar*, or which contain new-line characters. It defaults to "'"'". Dialect.quoting Controls when quotes should be generated by the writer and recognised by the reader. It can take on any of the "QUOTE_*" constants (see section Module Contents) and defaults to "QUOTE_MINIMAL". Dialect.skipinitialspace When "True", whitespace immediately following the *delimiter* is ignored. The default is "False". Dialect.strict When "True", raise exception "Error" on bad CSV input. The default is "False". Reader Objects ============== Reader objects ("DictReader" instances and objects returned by the "reader()" function) have the following public methods: csvreader.__next__() Return the next row of the reader’s iterable object as a list (if the object was returned from "reader()") or a dict (if it is a "DictReader" instance), parsed according to the current "Dialect". Usually you should call this as "next(reader)". Reader objects have the following public attributes: csvreader.dialect A read-only description of the dialect in use by the parser. csvreader.line_num The number of lines read from the source iterator. This is not the same as the number of records returned, as records can span multiple lines. DictReader objects have the following public attribute: csvreader.fieldnames If not passed as a parameter when creating the object, this attribute is initialized upon first access or when the first record is read from the file. Writer Objects ============== "Writer" objects ("DictWriter" instances and objects returned by the "writer()" function) have the following public methods. A *row* must be an iterable of strings or numbers for "Writer" objects and a dictionary mapping fieldnames to strings or numbers (by passing them through "str()" first) for "DictWriter" objects. Note that complex numbers are written out surrounded by parens. This may cause some problems for other programs which read CSV files (assuming they support complex numbers at all). csvwriter.writerow(row) Write the *row* parameter to the writer’s file object, formatted according to the current "Dialect". Return the return value of the call to the *write* method of the underlying file object. Changed in version 3.5: Added support of arbitrary iterables. csvwriter.writerows(rows) Write all elements in *rows* (an iterable of *row* objects as described above) to the writer’s file object, formatted according to the current dialect. Writer objects have the following public attribute: csvwriter.dialect A read-only description of the dialect in use by the writer. DictWriter objects have the following public method: DictWriter.writeheader() Write a row with the field names (as specified in the constructor) to the writer’s file object, formatted according to the current dialect. Return the return value of the "csvwriter.writerow()" call used internally. New in version 3.2. Changed in version 3.8: "writeheader()" now also returns the value returned by the "csvwriter.writerow()" method it uses internally. Examples ======== The simplest example of reading a CSV file: import csv with open('some.csv', newline='') as f: reader = csv.reader(f) for row in reader: print(row) Reading a file with an alternate format: import csv with open('passwd', newline='') as f: reader = csv.reader(f, delimiter=':', quoting=csv.QUOTE_NONE) for row in reader: print(row) The corresponding simplest possible writing example is: import csv with open('some.csv', 'w', newline='') as f: writer = csv.writer(f) writer.writerows(someiterable) Since "open()" is used to open a CSV file for reading, the file will by default be decoded into unicode using the system default encoding (see "locale.getpreferredencoding()"). To decode a file using a different encoding, use the "encoding" argument of open: import csv with open('some.csv', newline='', encoding='utf-8') as f: reader = csv.reader(f) for row in reader: print(row) The same applies to writing in something other than the system default encoding: specify the encoding argument when opening the output file. Registering a new dialect: import csv csv.register_dialect('unixpwd', delimiter=':', quoting=csv.QUOTE_NONE) with open('passwd', newline='') as f: reader = csv.reader(f, 'unixpwd') A slightly more advanced use of the reader — catching and reporting errors: import csv, sys filename = 'some.csv' with open(filename, newline='') as f: reader = csv.reader(f) try: for row in reader: print(row) except csv.Error as e: sys.exit('file {}, line {}: {}'.format(filename, reader.line_num, e)) And while the module doesn’t directly support parsing strings, it can easily be done: import csv for row in csv.reader(['one,two,three']): print(row) -[ Footnotes ]- [1] If "newline=''" is not specified, newlines embedded inside quoted fields will not be interpreted correctly, and on platforms that use "\r\n" linendings on write an extra "\r" will be added. It should always be safe to specify "newline=''", since the csv module does its own (*universal*) newline handling. "ctypes" — A foreign function library for Python ************************************************ ====================================================================== "ctypes" is a foreign function library for Python. It provides C compatible data types, and allows calling functions in DLLs or shared libraries. It can be used to wrap these libraries in pure Python. ctypes tutorial =============== Note: The code samples in this tutorial use "doctest" to make sure that they actually work. Since some code samples behave differently under Linux, Windows, or macOS, they contain doctest directives in comments. Note: Some code samples reference the ctypes "c_int" type. On platforms where "sizeof(long) == sizeof(int)" it is an alias to "c_long". So, you should not be confused if "c_long" is printed if you would expect "c_int" — they are actually the same type. Loading dynamic link libraries ------------------------------ "ctypes" exports the *cdll*, and on Windows *windll* and *oledll* objects, for loading dynamic link libraries. You load libraries by accessing them as attributes of these objects. *cdll* loads libraries which export functions using the standard "cdecl" calling convention, while *windll* libraries call functions using the "stdcall" calling convention. *oledll* also uses the "stdcall" calling convention, and assumes the functions return a Windows "HRESULT" error code. The error code is used to automatically raise an "OSError" exception when the function call fails. Changed in version 3.3: Windows errors used to raise "WindowsError", which is now an alias of "OSError". Here are some examples for Windows. Note that "msvcrt" is the MS standard C library containing most standard C functions, and uses the cdecl calling convention: >>> from ctypes import * >>> print(windll.kernel32) >>> print(cdll.msvcrt) >>> libc = cdll.msvcrt >>> Windows appends the usual ".dll" file suffix automatically. Note: Accessing the standard C library through "cdll.msvcrt" will use an outdated version of the library that may be incompatible with the one being used by Python. Where possible, use native Python functionality, or else import and use the "msvcrt" module. On Linux, it is required to specify the filename *including* the extension to load a library, so attribute access can not be used to load libraries. Either the "LoadLibrary()" method of the dll loaders should be used, or you should load the library by creating an instance of CDLL by calling the constructor: >>> cdll.LoadLibrary("libc.so.6") >>> libc = CDLL("libc.so.6") >>> libc >>> Accessing functions from loaded dlls ------------------------------------ Functions are accessed as attributes of dll objects: >>> from ctypes import * >>> libc.printf <_FuncPtr object at 0x...> >>> print(windll.kernel32.GetModuleHandleA) <_FuncPtr object at 0x...> >>> print(windll.kernel32.MyOwnFunction) Traceback (most recent call last): File "", line 1, in File "ctypes.py", line 239, in __getattr__ func = _StdcallFuncPtr(name, self) AttributeError: function 'MyOwnFunction' not found >>> Note that win32 system dlls like "kernel32" and "user32" often export ANSI as well as UNICODE versions of a function. The UNICODE version is exported with an "W" appended to the name, while the ANSI version is exported with an "A" appended to the name. The win32 "GetModuleHandle" function, which returns a *module handle* for a given module name, has the following C prototype, and a macro is used to expose one of them as "GetModuleHandle" depending on whether UNICODE is defined or not: /* ANSI version */ HMODULE GetModuleHandleA(LPCSTR lpModuleName); /* UNICODE version */ HMODULE GetModuleHandleW(LPCWSTR lpModuleName); *windll* does not try to select one of them by magic, you must access the version you need by specifying "GetModuleHandleA" or "GetModuleHandleW" explicitly, and then call it with bytes or string objects respectively. Sometimes, dlls export functions with names which aren’t valid Python identifiers, like ""??2@YAPAXI@Z"". In this case you have to use "getattr()" to retrieve the function: >>> getattr(cdll.msvcrt, "??2@YAPAXI@Z") <_FuncPtr object at 0x...> >>> On Windows, some dlls export functions not by name but by ordinal. These functions can be accessed by indexing the dll object with the ordinal number: >>> cdll.kernel32[1] <_FuncPtr object at 0x...> >>> cdll.kernel32[0] Traceback (most recent call last): File "", line 1, in File "ctypes.py", line 310, in __getitem__ func = _StdcallFuncPtr(name, self) AttributeError: function ordinal 0 not found >>> Calling functions ----------------- You can call these functions like any other Python callable. This example uses the "time()" function, which returns system time in seconds since the Unix epoch, and the "GetModuleHandleA()" function, which returns a win32 module handle. This example calls both functions with a "NULL" pointer ("None" should be used as the "NULL" pointer): >>> print(libc.time(None)) 1150640792 >>> print(hex(windll.kernel32.GetModuleHandleA(None))) 0x1d000000 >>> "ValueError" is raised when you call an "stdcall" function with the "cdecl" calling convention, or vice versa: >>> cdll.kernel32.GetModuleHandleA(None) Traceback (most recent call last): File "", line 1, in ValueError: Procedure probably called with not enough arguments (4 bytes missing) >>> >>> windll.msvcrt.printf(b"spam") Traceback (most recent call last): File "", line 1, in ValueError: Procedure probably called with too many arguments (4 bytes in excess) >>> To find out the correct calling convention you have to look into the C header file or the documentation for the function you want to call. On Windows, "ctypes" uses win32 structured exception handling to prevent crashes from general protection faults when functions are called with invalid argument values: >>> windll.kernel32.GetModuleHandleA(32) Traceback (most recent call last): File "", line 1, in OSError: exception: access violation reading 0x00000020 >>> There are, however, enough ways to crash Python with "ctypes", so you should be careful anyway. The "faulthandler" module can be helpful in debugging crashes (e.g. from segmentation faults produced by erroneous C library calls). "None", integers, bytes objects and (unicode) strings are the only native Python objects that can directly be used as parameters in these function calls. "None" is passed as a C "NULL" pointer, bytes objects and strings are passed as pointer to the memory block that contains their data ("char *" or "wchar_t *"). Python integers are passed as the platforms default C "int" type, their value is masked to fit into the C type. Before we move on calling functions with other parameter types, we have to learn more about "ctypes" data types. Fundamental data types ---------------------- "ctypes" defines a number of primitive C compatible data types: +------------------------+--------------------------------------------+------------------------------+ | ctypes type | C type | Python type | |========================|============================================|==============================| | "c_bool" | "_Bool" | bool (1) | +------------------------+--------------------------------------------+------------------------------+ | "c_char" | "char" | 1-character bytes object | +------------------------+--------------------------------------------+------------------------------+ | "c_wchar" | "wchar_t" | 1-character string | +------------------------+--------------------------------------------+------------------------------+ | "c_byte" | "char" | int | +------------------------+--------------------------------------------+------------------------------+ | "c_ubyte" | "unsigned char" | int | +------------------------+--------------------------------------------+------------------------------+ | "c_short" | "short" | int | +------------------------+--------------------------------------------+------------------------------+ | "c_ushort" | "unsigned short" | int | +------------------------+--------------------------------------------+------------------------------+ | "c_int" | "int" | int | +------------------------+--------------------------------------------+------------------------------+ | "c_uint" | "unsigned int" | int | +------------------------+--------------------------------------------+------------------------------+ | "c_long" | "long" | int | +------------------------+--------------------------------------------+------------------------------+ | "c_ulong" | "unsigned long" | int | +------------------------+--------------------------------------------+------------------------------+ | "c_longlong" | "__int64" or "long long" | int | +------------------------+--------------------------------------------+------------------------------+ | "c_ulonglong" | "unsigned __int64" or "unsigned long long" | int | +------------------------+--------------------------------------------+------------------------------+ | "c_size_t" | "size_t" | int | +------------------------+--------------------------------------------+------------------------------+ | "c_ssize_t" | "ssize_t" or "Py_ssize_t" | int | +------------------------+--------------------------------------------+------------------------------+ | "c_float" | "float" | float | +------------------------+--------------------------------------------+------------------------------+ | "c_double" | "double" | float | +------------------------+--------------------------------------------+------------------------------+ | "c_longdouble" | "long double" | float | +------------------------+--------------------------------------------+------------------------------+ | "c_char_p" | "char *" (NUL terminated) | bytes object or "None" | +------------------------+--------------------------------------------+------------------------------+ | "c_wchar_p" | "wchar_t *" (NUL terminated) | string or "None" | +------------------------+--------------------------------------------+------------------------------+ | "c_void_p" | "void *" | int or "None" | +------------------------+--------------------------------------------+------------------------------+ 1. The constructor accepts any object with a truth value. All these types can be created by calling them with an optional initializer of the correct type and value: >>> c_int() c_long(0) >>> c_wchar_p("Hello, World") c_wchar_p(140018365411392) >>> c_ushort(-3) c_ushort(65533) >>> Since these types are mutable, their value can also be changed afterwards: >>> i = c_int(42) >>> print(i) c_long(42) >>> print(i.value) 42 >>> i.value = -99 >>> print(i.value) -99 >>> Assigning a new value to instances of the pointer types "c_char_p", "c_wchar_p", and "c_void_p" changes the *memory location* they point to, *not the contents* of the memory block (of course not, because Python bytes objects are immutable): >>> s = "Hello, World" >>> c_s = c_wchar_p(s) >>> print(c_s) c_wchar_p(139966785747344) >>> print(c_s.value) Hello World >>> c_s.value = "Hi, there" >>> print(c_s) # the memory location has changed c_wchar_p(139966783348904) >>> print(c_s.value) Hi, there >>> print(s) # first object is unchanged Hello, World >>> You should be careful, however, not to pass them to functions expecting pointers to mutable memory. If you need mutable memory blocks, ctypes has a "create_string_buffer()" function which creates these in various ways. The current memory block contents can be accessed (or changed) with the "raw" property; if you want to access it as NUL terminated string, use the "value" property: >>> from ctypes import * >>> p = create_string_buffer(3) # create a 3 byte buffer, initialized to NUL bytes >>> print(sizeof(p), repr(p.raw)) 3 b'\x00\x00\x00' >>> p = create_string_buffer(b"Hello") # create a buffer containing a NUL terminated string >>> print(sizeof(p), repr(p.raw)) 6 b'Hello\x00' >>> print(repr(p.value)) b'Hello' >>> p = create_string_buffer(b"Hello", 10) # create a 10 byte buffer >>> print(sizeof(p), repr(p.raw)) 10 b'Hello\x00\x00\x00\x00\x00' >>> p.value = b"Hi" >>> print(sizeof(p), repr(p.raw)) 10 b'Hi\x00lo\x00\x00\x00\x00\x00' >>> The "create_string_buffer()" function replaces the "c_buffer()" function (which is still available as an alias), as well as the "c_string()" function from earlier ctypes releases. To create a mutable memory block containing unicode characters of the C type "wchar_t" use the "create_unicode_buffer()" function. Calling functions, continued ---------------------------- Note that printf prints to the real standard output channel, *not* to "sys.stdout", so these examples will only work at the console prompt, not from within *IDLE* or *PythonWin*: >>> printf = libc.printf >>> printf(b"Hello, %s\n", b"World!") Hello, World! 14 >>> printf(b"Hello, %S\n", "World!") Hello, World! 14 >>> printf(b"%d bottles of beer\n", 42) 42 bottles of beer 19 >>> printf(b"%f bottles of beer\n", 42.5) Traceback (most recent call last): File "", line 1, in ArgumentError: argument 2: exceptions.TypeError: Don't know how to convert parameter 2 >>> As has been mentioned before, all Python types except integers, strings, and bytes objects have to be wrapped in their corresponding "ctypes" type, so that they can be converted to the required C data type: >>> printf(b"An int %d, a double %f\n", 1234, c_double(3.14)) An int 1234, a double 3.140000 31 >>> Calling functions with your own custom data types ------------------------------------------------- You can also customize "ctypes" argument conversion to allow instances of your own classes be used as function arguments. "ctypes" looks for an "_as_parameter_" attribute and uses this as the function argument. Of course, it must be one of integer, string, or bytes: >>> class Bottles: ... def __init__(self, number): ... self._as_parameter_ = number ... >>> bottles = Bottles(42) >>> printf(b"%d bottles of beer\n", bottles) 42 bottles of beer 19 >>> If you don’t want to store the instance’s data in the "_as_parameter_" instance variable, you could define a "property" which makes the attribute available on request. Specifying the required argument types (function prototypes) ------------------------------------------------------------ It is possible to specify the required argument types of functions exported from DLLs by setting the "argtypes" attribute. "argtypes" must be a sequence of C data types (the "printf" function is probably not a good example here, because it takes a variable number and different types of parameters depending on the format string, on the other hand this is quite handy to experiment with this feature): >>> printf.argtypes = [c_char_p, c_char_p, c_int, c_double] >>> printf(b"String '%s', Int %d, Double %f\n", b"Hi", 10, 2.2) String 'Hi', Int 10, Double 2.200000 37 >>> Specifying a format protects against incompatible argument types (just as a prototype for a C function), and tries to convert the arguments to valid types: >>> printf(b"%d %d %d", 1, 2, 3) Traceback (most recent call last): File "", line 1, in ArgumentError: argument 2: exceptions.TypeError: wrong type >>> printf(b"%s %d %f\n", b"X", 2, 3) X 2 3.000000 13 >>> If you have defined your own classes which you pass to function calls, you have to implement a "from_param()" class method for them to be able to use them in the "argtypes" sequence. The "from_param()" class method receives the Python object passed to the function call, it should do a typecheck or whatever is needed to make sure this object is acceptable, and then return the object itself, its "_as_parameter_" attribute, or whatever you want to pass as the C function argument in this case. Again, the result should be an integer, string, bytes, a "ctypes" instance, or an object with an "_as_parameter_" attribute. Return types ------------ By default functions are assumed to return the C "int" type. Other return types can be specified by setting the "restype" attribute of the function object. Here is a more advanced example, it uses the "strchr" function, which expects a string pointer and a char, and returns a pointer to a string: >>> strchr = libc.strchr >>> strchr(b"abcdef", ord("d")) 8059983 >>> strchr.restype = c_char_p # c_char_p is a pointer to a string >>> strchr(b"abcdef", ord("d")) b'def' >>> print(strchr(b"abcdef", ord("x"))) None >>> If you want to avoid the "ord("x")" calls above, you can set the "argtypes" attribute, and the second argument will be converted from a single character Python bytes object into a C char: >>> strchr.restype = c_char_p >>> strchr.argtypes = [c_char_p, c_char] >>> strchr(b"abcdef", b"d") 'def' >>> strchr(b"abcdef", b"def") Traceback (most recent call last): File "", line 1, in ArgumentError: argument 2: exceptions.TypeError: one character string expected >>> print(strchr(b"abcdef", b"x")) None >>> strchr(b"abcdef", b"d") 'def' >>> You can also use a callable Python object (a function or a class for example) as the "restype" attribute, if the foreign function returns an integer. The callable will be called with the *integer* the C function returns, and the result of this call will be used as the result of your function call. This is useful to check for error return values and automatically raise an exception: >>> GetModuleHandle = windll.kernel32.GetModuleHandleA >>> def ValidHandle(value): ... if value == 0: ... raise WinError() ... return value ... >>> >>> GetModuleHandle.restype = ValidHandle >>> GetModuleHandle(None) 486539264 >>> GetModuleHandle("something silly") Traceback (most recent call last): File "", line 1, in File "", line 3, in ValidHandle OSError: [Errno 126] The specified module could not be found. >>> "WinError" is a function which will call Windows "FormatMessage()" api to get the string representation of an error code, and *returns* an exception. "WinError" takes an optional error code parameter, if no one is used, it calls "GetLastError()" to retrieve it. Please note that a much more powerful error checking mechanism is available through the "errcheck" attribute; see the reference manual for details. Passing pointers (or: passing parameters by reference) ------------------------------------------------------ Sometimes a C api function expects a *pointer* to a data type as parameter, probably to write into the corresponding location, or if the data is too large to be passed by value. This is also known as *passing parameters by reference*. "ctypes" exports the "byref()" function which is used to pass parameters by reference. The same effect can be achieved with the "pointer()" function, although "pointer()" does a lot more work since it constructs a real pointer object, so it is faster to use "byref()" if you don’t need the pointer object in Python itself: >>> i = c_int() >>> f = c_float() >>> s = create_string_buffer(b'\000' * 32) >>> print(i.value, f.value, repr(s.value)) 0 0.0 b'' >>> libc.sscanf(b"1 3.14 Hello", b"%d %f %s", ... byref(i), byref(f), s) 3 >>> print(i.value, f.value, repr(s.value)) 1 3.1400001049 b'Hello' >>> Structures and unions --------------------- Structures and unions must derive from the "Structure" and "Union" base classes which are defined in the "ctypes" module. Each subclass must define a "_fields_" attribute. "_fields_" must be a list of *2-tuples*, containing a *field name* and a *field type*. The field type must be a "ctypes" type like "c_int", or any other derived "ctypes" type: structure, union, array, pointer. Here is a simple example of a POINT structure, which contains two integers named *x* and *y*, and also shows how to initialize a structure in the constructor: >>> from ctypes import * >>> class POINT(Structure): ... _fields_ = [("x", c_int), ... ("y", c_int)] ... >>> point = POINT(10, 20) >>> print(point.x, point.y) 10 20 >>> point = POINT(y=5) >>> print(point.x, point.y) 0 5 >>> POINT(1, 2, 3) Traceback (most recent call last): File "", line 1, in TypeError: too many initializers >>> You can, however, build much more complicated structures. A structure can itself contain other structures by using a structure as a field type. Here is a RECT structure which contains two POINTs named *upperleft* and *lowerright*: >>> class RECT(Structure): ... _fields_ = [("upperleft", POINT), ... ("lowerright", POINT)] ... >>> rc = RECT(point) >>> print(rc.upperleft.x, rc.upperleft.y) 0 5 >>> print(rc.lowerright.x, rc.lowerright.y) 0 0 >>> Nested structures can also be initialized in the constructor in several ways: >>> r = RECT(POINT(1, 2), POINT(3, 4)) >>> r = RECT((1, 2), (3, 4)) Field *descriptor*s can be retrieved from the *class*, they are useful for debugging because they can provide useful information: >>> print(POINT.x) >>> print(POINT.y) >>> Warning: "ctypes" does not support passing unions or structures with bit- fields to functions by value. While this may work on 32-bit x86, it’s not guaranteed by the library to work in the general case. Unions and structures with bit-fields should always be passed to functions by pointer. Structure/union alignment and byte order ---------------------------------------- By default, Structure and Union fields are aligned in the same way the C compiler does it. It is possible to override this behavior by specifying a "_pack_" class attribute in the subclass definition. This must be set to a positive integer and specifies the maximum alignment for the fields. This is what "#pragma pack(n)" also does in MSVC. "ctypes" uses the native byte order for Structures and Unions. To build structures with non-native byte order, you can use one of the "BigEndianStructure", "LittleEndianStructure", "BigEndianUnion", and "LittleEndianUnion" base classes. These classes cannot contain pointer fields. Bit fields in structures and unions ----------------------------------- It is possible to create structures and unions containing bit fields. Bit fields are only possible for integer fields, the bit width is specified as the third item in the "_fields_" tuples: >>> class Int(Structure): ... _fields_ = [("first_16", c_int, 16), ... ("second_16", c_int, 16)] ... >>> print(Int.first_16) >>> print(Int.second_16) >>> Arrays ------ Arrays are sequences, containing a fixed number of instances of the same type. The recommended way to create array types is by multiplying a data type with a positive integer: TenPointsArrayType = POINT * 10 Here is an example of a somewhat artificial data type, a structure containing 4 POINTs among other stuff: >>> from ctypes import * >>> class POINT(Structure): ... _fields_ = ("x", c_int), ("y", c_int) ... >>> class MyStruct(Structure): ... _fields_ = [("a", c_int), ... ("b", c_float), ... ("point_array", POINT * 4)] >>> >>> print(len(MyStruct().point_array)) 4 >>> Instances are created in the usual way, by calling the class: arr = TenPointsArrayType() for pt in arr: print(pt.x, pt.y) The above code print a series of "0 0" lines, because the array contents is initialized to zeros. Initializers of the correct type can also be specified: >>> from ctypes import * >>> TenIntegers = c_int * 10 >>> ii = TenIntegers(1, 2, 3, 4, 5, 6, 7, 8, 9, 10) >>> print(ii) >>> for i in ii: print(i, end=" ") ... 1 2 3 4 5 6 7 8 9 10 >>> Pointers -------- Pointer instances are created by calling the "pointer()" function on a "ctypes" type: >>> from ctypes import * >>> i = c_int(42) >>> pi = pointer(i) >>> Pointer instances have a "contents" attribute which returns the object to which the pointer points, the "i" object above: >>> pi.contents c_long(42) >>> Note that "ctypes" does not have OOR (original object return), it constructs a new, equivalent object each time you retrieve an attribute: >>> pi.contents is i False >>> pi.contents is pi.contents False >>> Assigning another "c_int" instance to the pointer’s contents attribute would cause the pointer to point to the memory location where this is stored: >>> i = c_int(99) >>> pi.contents = i >>> pi.contents c_long(99) >>> Pointer instances can also be indexed with integers: >>> pi[0] 99 >>> Assigning to an integer index changes the pointed to value: >>> print(i) c_long(99) >>> pi[0] = 22 >>> print(i) c_long(22) >>> It is also possible to use indexes different from 0, but you must know what you’re doing, just as in C: You can access or change arbitrary memory locations. Generally you only use this feature if you receive a pointer from a C function, and you *know* that the pointer actually points to an array instead of a single item. Behind the scenes, the "pointer()" function does more than simply create pointer instances, it has to create pointer *types* first. This is done with the "POINTER()" function, which accepts any "ctypes" type, and returns a new type: >>> PI = POINTER(c_int) >>> PI >>> PI(42) Traceback (most recent call last): File "", line 1, in TypeError: expected c_long instead of int >>> PI(c_int(42)) >>> Calling the pointer type without an argument creates a "NULL" pointer. "NULL" pointers have a "False" boolean value: >>> null_ptr = POINTER(c_int)() >>> print(bool(null_ptr)) False >>> "ctypes" checks for "NULL" when dereferencing pointers (but dereferencing invalid non-"NULL" pointers would crash Python): >>> null_ptr[0] Traceback (most recent call last): .... ValueError: NULL pointer access >>> >>> null_ptr[0] = 1234 Traceback (most recent call last): .... ValueError: NULL pointer access >>> Type conversions ---------------- Usually, ctypes does strict type checking. This means, if you have "POINTER(c_int)" in the "argtypes" list of a function or as the type of a member field in a structure definition, only instances of exactly the same type are accepted. There are some exceptions to this rule, where ctypes accepts other objects. For example, you can pass compatible array instances instead of pointer types. So, for "POINTER(c_int)", ctypes accepts an array of c_int: >>> class Bar(Structure): ... _fields_ = [("count", c_int), ("values", POINTER(c_int))] ... >>> bar = Bar() >>> bar.values = (c_int * 3)(1, 2, 3) >>> bar.count = 3 >>> for i in range(bar.count): ... print(bar.values[i]) ... 1 2 3 >>> In addition, if a function argument is explicitly declared to be a pointer type (such as "POINTER(c_int)") in "argtypes", an object of the pointed type ("c_int" in this case) can be passed to the function. ctypes will apply the required "byref()" conversion in this case automatically. To set a POINTER type field to "NULL", you can assign "None": >>> bar.values = None >>> Sometimes you have instances of incompatible types. In C, you can cast one type into another type. "ctypes" provides a "cast()" function which can be used in the same way. The "Bar" structure defined above accepts "POINTER(c_int)" pointers or "c_int" arrays for its "values" field, but not instances of other types: >>> bar.values = (c_byte * 4)() Traceback (most recent call last): File "", line 1, in TypeError: incompatible types, c_byte_Array_4 instance instead of LP_c_long instance >>> For these cases, the "cast()" function is handy. The "cast()" function can be used to cast a ctypes instance into a pointer to a different ctypes data type. "cast()" takes two parameters, a ctypes object that is or can be converted to a pointer of some kind, and a ctypes pointer type. It returns an instance of the second argument, which references the same memory block as the first argument: >>> a = (c_byte * 4)() >>> cast(a, POINTER(c_int)) >>> So, "cast()" can be used to assign to the "values" field of "Bar" the structure: >>> bar = Bar() >>> bar.values = cast((c_byte * 4)(), POINTER(c_int)) >>> print(bar.values[0]) 0 >>> Incomplete Types ---------------- *Incomplete Types* are structures, unions or arrays whose members are not yet specified. In C, they are specified by forward declarations, which are defined later: struct cell; /* forward declaration */ struct cell { char *name; struct cell *next; }; The straightforward translation into ctypes code would be this, but it does not work: >>> class cell(Structure): ... _fields_ = [("name", c_char_p), ... ("next", POINTER(cell))] ... Traceback (most recent call last): File "", line 1, in File "", line 2, in cell NameError: name 'cell' is not defined >>> because the new "class cell" is not available in the class statement itself. In "ctypes", we can define the "cell" class and set the "_fields_" attribute later, after the class statement: >>> from ctypes import * >>> class cell(Structure): ... pass ... >>> cell._fields_ = [("name", c_char_p), ... ("next", POINTER(cell))] >>> Let’s try it. We create two instances of "cell", and let them point to each other, and finally follow the pointer chain a few times: >>> c1 = cell() >>> c1.name = b"foo" >>> c2 = cell() >>> c2.name = b"bar" >>> c1.next = pointer(c2) >>> c2.next = pointer(c1) >>> p = c1 >>> for i in range(8): ... print(p.name, end=" ") ... p = p.next[0] ... foo bar foo bar foo bar foo bar >>> Callback functions ------------------ "ctypes" allows creating C callable function pointers from Python callables. These are sometimes called *callback functions*. First, you must create a class for the callback function. The class knows the calling convention, the return type, and the number and types of arguments this function will receive. The "CFUNCTYPE()" factory function creates types for callback functions using the "cdecl" calling convention. On Windows, the "WINFUNCTYPE()" factory function creates types for callback functions using the "stdcall" calling convention. Both of these factory functions are called with the result type as first argument, and the callback functions expected argument types as the remaining arguments. I will present an example here which uses the standard C library’s "qsort()" function, that is used to sort items with the help of a callback function. "qsort()" will be used to sort an array of integers: >>> IntArray5 = c_int * 5 >>> ia = IntArray5(5, 1, 7, 33, 99) >>> qsort = libc.qsort >>> qsort.restype = None >>> "qsort()" must be called with a pointer to the data to sort, the number of items in the data array, the size of one item, and a pointer to the comparison function, the callback. The callback will then be called with two pointers to items, and it must return a negative integer if the first item is smaller than the second, a zero if they are equal, and a positive integer otherwise. So our callback function receives pointers to integers, and must return an integer. First we create the "type" for the callback function: >>> CMPFUNC = CFUNCTYPE(c_int, POINTER(c_int), POINTER(c_int)) >>> To get started, here is a simple callback that shows the values it gets passed: >>> def py_cmp_func(a, b): ... print("py_cmp_func", a[0], b[0]) ... return 0 ... >>> cmp_func = CMPFUNC(py_cmp_func) >>> The result: >>> qsort(ia, len(ia), sizeof(c_int), cmp_func) py_cmp_func 5 1 py_cmp_func 33 99 py_cmp_func 7 33 py_cmp_func 5 7 py_cmp_func 1 7 >>> Now we can actually compare the two items and return a useful result: >>> def py_cmp_func(a, b): ... print("py_cmp_func", a[0], b[0]) ... return a[0] - b[0] ... >>> >>> qsort(ia, len(ia), sizeof(c_int), CMPFUNC(py_cmp_func)) py_cmp_func 5 1 py_cmp_func 33 99 py_cmp_func 7 33 py_cmp_func 1 7 py_cmp_func 5 7 >>> As we can easily check, our array is sorted now: >>> for i in ia: print(i, end=" ") ... 1 5 7 33 99 >>> The function factories can be used as decorator factories, so we may as well write: >>> @CFUNCTYPE(c_int, POINTER(c_int), POINTER(c_int)) ... def py_cmp_func(a, b): ... print("py_cmp_func", a[0], b[0]) ... return a[0] - b[0] ... >>> qsort(ia, len(ia), sizeof(c_int), py_cmp_func) py_cmp_func 5 1 py_cmp_func 33 99 py_cmp_func 7 33 py_cmp_func 1 7 py_cmp_func 5 7 >>> Note: Make sure you keep references to "CFUNCTYPE()" objects as long as they are used from C code. "ctypes" doesn’t, and if you don’t, they may be garbage collected, crashing your program when a callback is made.Also, note that if the callback function is called in a thread created outside of Python’s control (e.g. by the foreign code that calls the callback), ctypes creates a new dummy Python thread on every invocation. This behavior is correct for most purposes, but it means that values stored with "threading.local" will *not* survive across different callbacks, even when those calls are made from the same C thread. Accessing values exported from dlls ----------------------------------- Some shared libraries not only export functions, they also export variables. An example in the Python library itself is the "Py_OptimizeFlag", an integer set to 0, 1, or 2, depending on the "-O" or "-OO" flag given on startup. "ctypes" can access values like this with the "in_dll()" class methods of the type. *pythonapi* is a predefined symbol giving access to the Python C api: >>> opt_flag = c_int.in_dll(pythonapi, "Py_OptimizeFlag") >>> print(opt_flag) c_long(0) >>> If the interpreter would have been started with "-O", the sample would have printed "c_long(1)", or "c_long(2)" if "-OO" would have been specified. An extended example which also demonstrates the use of pointers accesses the "PyImport_FrozenModules" pointer exported by Python. Quoting the docs for that value: This pointer is initialized to point to an array of "struct _frozen" records, terminated by one whose members are all "NULL" or zero. When a frozen module is imported, it is searched in this table. Third-party code could play tricks with this to provide a dynamically created collection of frozen modules. So manipulating this pointer could even prove useful. To restrict the example size, we show only how this table can be read with "ctypes": >>> from ctypes import * >>> >>> class struct_frozen(Structure): ... _fields_ = [("name", c_char_p), ... ("code", POINTER(c_ubyte)), ... ("size", c_int)] ... >>> We have defined the "struct _frozen" data type, so we can get the pointer to the table: >>> FrozenTable = POINTER(struct_frozen) >>> table = FrozenTable.in_dll(pythonapi, "PyImport_FrozenModules") >>> Since "table" is a "pointer" to the array of "struct_frozen" records, we can iterate over it, but we just have to make sure that our loop terminates, because pointers have no size. Sooner or later it would probably crash with an access violation or whatever, so it’s better to break out of the loop when we hit the "NULL" entry: >>> for item in table: ... if item.name is None: ... break ... print(item.name.decode("ascii"), item.size) ... _frozen_importlib 31764 _frozen_importlib_external 41499 __hello__ 161 __phello__ -161 __phello__.spam 161 >>> The fact that standard Python has a frozen module and a frozen package (indicated by the negative "size" member) is not well known, it is only used for testing. Try it out with "import __hello__" for example. Surprises --------- There are some edges in "ctypes" where you might expect something other than what actually happens. Consider the following example: >>> from ctypes import * >>> class POINT(Structure): ... _fields_ = ("x", c_int), ("y", c_int) ... >>> class RECT(Structure): ... _fields_ = ("a", POINT), ("b", POINT) ... >>> p1 = POINT(1, 2) >>> p2 = POINT(3, 4) >>> rc = RECT(p1, p2) >>> print(rc.a.x, rc.a.y, rc.b.x, rc.b.y) 1 2 3 4 >>> # now swap the two points >>> rc.a, rc.b = rc.b, rc.a >>> print(rc.a.x, rc.a.y, rc.b.x, rc.b.y) 3 4 3 4 >>> Hm. We certainly expected the last statement to print "3 4 1 2". What happened? Here are the steps of the "rc.a, rc.b = rc.b, rc.a" line above: >>> temp0, temp1 = rc.b, rc.a >>> rc.a = temp0 >>> rc.b = temp1 >>> Note that "temp0" and "temp1" are objects still using the internal buffer of the "rc" object above. So executing "rc.a = temp0" copies the buffer contents of "temp0" into "rc" ‘s buffer. This, in turn, changes the contents of "temp1". So, the last assignment "rc.b = temp1", doesn’t have the expected effect. Keep in mind that retrieving sub-objects from Structure, Unions, and Arrays doesn’t *copy* the sub-object, instead it retrieves a wrapper object accessing the root-object’s underlying buffer. Another example that may behave differently from what one would expect is this: >>> s = c_char_p() >>> s.value = b"abc def ghi" >>> s.value b'abc def ghi' >>> s.value is s.value False >>> Note: Objects instantiated from "c_char_p" can only have their value set to bytes or integers. Why is it printing "False"? ctypes instances are objects containing a memory block plus some *descriptor*s accessing the contents of the memory. Storing a Python object in the memory block does not store the object itself, instead the "contents" of the object is stored. Accessing the contents again constructs a new Python object each time! Variable-sized data types ------------------------- "ctypes" provides some support for variable-sized arrays and structures. The "resize()" function can be used to resize the memory buffer of an existing ctypes object. The function takes the object as first argument, and the requested size in bytes as the second argument. The memory block cannot be made smaller than the natural memory block specified by the objects type, a "ValueError" is raised if this is tried: >>> short_array = (c_short * 4)() >>> print(sizeof(short_array)) 8 >>> resize(short_array, 4) Traceback (most recent call last): ... ValueError: minimum size is 8 >>> resize(short_array, 32) >>> sizeof(short_array) 32 >>> sizeof(type(short_array)) 8 >>> This is nice and fine, but how would one access the additional elements contained in this array? Since the type still only knows about 4 elements, we get errors accessing other elements: >>> short_array[:] [0, 0, 0, 0] >>> short_array[7] Traceback (most recent call last): ... IndexError: invalid index >>> Another way to use variable-sized data types with "ctypes" is to use the dynamic nature of Python, and (re-)define the data type after the required size is already known, on a case by case basis. ctypes reference ================ Finding shared libraries ------------------------ When programming in a compiled language, shared libraries are accessed when compiling/linking a program, and when the program is run. The purpose of the "find_library()" function is to locate a library in a way similar to what the compiler or runtime loader does (on platforms with several versions of a shared library the most recent should be loaded), while the ctypes library loaders act like when a program is run, and call the runtime loader directly. The "ctypes.util" module provides a function which can help to determine the library to load. ctypes.util.find_library(name) Try to find a library and return a pathname. *name* is the library name without any prefix like *lib*, suffix like ".so", ".dylib" or version number (this is the form used for the posix linker option "-l"). If no library can be found, returns "None". The exact functionality is system dependent. On Linux, "find_library()" tries to run external programs ("/sbin/ldconfig", "gcc", "objdump" and "ld") to find the library file. It returns the filename of the library file. Changed in version 3.6: On Linux, the value of the environment variable "LD_LIBRARY_PATH" is used when searching for libraries, if a library cannot be found by any other means. Here are some examples: >>> from ctypes.util import find_library >>> find_library("m") 'libm.so.6' >>> find_library("c") 'libc.so.6' >>> find_library("bz2") 'libbz2.so.1.0' >>> On macOS, "find_library()" tries several predefined naming schemes and paths to locate the library, and returns a full pathname if successful: >>> from ctypes.util import find_library >>> find_library("c") '/usr/lib/libc.dylib' >>> find_library("m") '/usr/lib/libm.dylib' >>> find_library("bz2") '/usr/lib/libbz2.dylib' >>> find_library("AGL") '/System/Library/Frameworks/AGL.framework/AGL' >>> On Windows, "find_library()" searches along the system search path, and returns the full pathname, but since there is no predefined naming scheme a call like "find_library("c")" will fail and return "None". If wrapping a shared library with "ctypes", it *may* be better to determine the shared library name at development time, and hardcode that into the wrapper module instead of using "find_library()" to locate the library at runtime. Loading shared libraries ------------------------ There are several ways to load shared libraries into the Python process. One way is to instantiate one of the following classes: class ctypes.CDLL(name, mode=DEFAULT_MODE, handle=None, use_errno=False, use_last_error=False, winmode=None) Instances of this class represent loaded shared libraries. Functions in these libraries use the standard C calling convention, and are assumed to return "int". On Windows creating a "CDLL" instance may fail even if the DLL name exists. When a dependent DLL of the loaded DLL is not found, a "OSError" error is raised with the message *“[WinError 126] The specified module could not be found”.* This error message does not contain the name of the missing DLL because the Windows API does not return this information making this error hard to diagnose. To resolve this error and determine which DLL is not found, you need to find the list of dependent DLLs and determine which one is not found using Windows debugging and tracing tools. See also: Microsoft DUMPBIN tool – A tool to find DLL dependents. class ctypes.OleDLL(name, mode=DEFAULT_MODE, handle=None, use_errno=False, use_last_error=False, winmode=None) Windows only: Instances of this class represent loaded shared libraries, functions in these libraries use the "stdcall" calling convention, and are assumed to return the windows specific "HRESULT" code. "HRESULT" values contain information specifying whether the function call failed or succeeded, together with additional error code. If the return value signals a failure, an "OSError" is automatically raised. Changed in version 3.3: "WindowsError" used to be raised. class ctypes.WinDLL(name, mode=DEFAULT_MODE, handle=None, use_errno=False, use_last_error=False, winmode=None) Windows only: Instances of this class represent loaded shared libraries, functions in these libraries use the "stdcall" calling convention, and are assumed to return "int" by default. The Python *global interpreter lock* is released before calling any function exported by these libraries, and reacquired afterwards. class ctypes.PyDLL(name, mode=DEFAULT_MODE, handle=None) Instances of this class behave like "CDLL" instances, except that the Python GIL is *not* released during the function call, and after the function execution the Python error flag is checked. If the error flag is set, a Python exception is raised. Thus, this is only useful to call Python C api functions directly. All these classes can be instantiated by calling them with at least one argument, the pathname of the shared library. If you have an existing handle to an already loaded shared library, it can be passed as the "handle" named parameter, otherwise the underlying platforms "dlopen" or "LoadLibrary" function is used to load the library into the process, and to get a handle to it. The *mode* parameter can be used to specify how the library is loaded. For details, consult the *dlopen(3)* manpage. On Windows, *mode* is ignored. On posix systems, RTLD_NOW is always added, and is not configurable. The *use_errno* parameter, when set to true, enables a ctypes mechanism that allows accessing the system "errno" error number in a safe way. "ctypes" maintains a thread-local copy of the systems "errno" variable; if you call foreign functions created with "use_errno=True" then the "errno" value before the function call is swapped with the ctypes private copy, the same happens immediately after the function call. The function "ctypes.get_errno()" returns the value of the ctypes private copy, and the function "ctypes.set_errno()" changes the ctypes private copy to a new value and returns the former value. The *use_last_error* parameter, when set to true, enables the same mechanism for the Windows error code which is managed by the "GetLastError()" and "SetLastError()" Windows API functions; "ctypes.get_last_error()" and "ctypes.set_last_error()" are used to request and change the ctypes private copy of the windows error code. The *winmode* parameter is used on Windows to specify how the library is loaded (since *mode* is ignored). It takes any value that is valid for the Win32 API "LoadLibraryEx" flags parameter. When omitted, the default is to use the flags that result in the most secure DLL load to avoiding issues such as DLL hijacking. Passing the full path to the DLL is the safest way to ensure the correct library and dependencies are loaded. Changed in version 3.8: Added *winmode* parameter. ctypes.RTLD_GLOBAL Flag to use as *mode* parameter. On platforms where this flag is not available, it is defined as the integer zero. ctypes.RTLD_LOCAL Flag to use as *mode* parameter. On platforms where this is not available, it is the same as *RTLD_GLOBAL*. ctypes.DEFAULT_MODE The default mode which is used to load shared libraries. On OSX 10.3, this is *RTLD_GLOBAL*, otherwise it is the same as *RTLD_LOCAL*. Instances of these classes have no public methods. Functions exported by the shared library can be accessed as attributes or by index. Please note that accessing the function through an attribute caches the result and therefore accessing it repeatedly returns the same object each time. On the other hand, accessing it through an index returns a new object each time: >>> from ctypes import CDLL >>> libc = CDLL("libc.so.6") # On Linux >>> libc.time == libc.time True >>> libc['time'] == libc['time'] False The following public attributes are available, their name starts with an underscore to not clash with exported function names: PyDLL._handle The system handle used to access the library. PyDLL._name The name of the library passed in the constructor. Shared libraries can also be loaded by using one of the prefabricated objects, which are instances of the "LibraryLoader" class, either by calling the "LoadLibrary()" method, or by retrieving the library as attribute of the loader instance. class ctypes.LibraryLoader(dlltype) Class which loads shared libraries. *dlltype* should be one of the "CDLL", "PyDLL", "WinDLL", or "OleDLL" types. "__getattr__()" has special behavior: It allows loading a shared library by accessing it as attribute of a library loader instance. The result is cached, so repeated attribute accesses return the same library each time. LoadLibrary(name) Load a shared library into the process and return it. This method always returns a new instance of the library. These prefabricated library loaders are available: ctypes.cdll Creates "CDLL" instances. ctypes.windll Windows only: Creates "WinDLL" instances. ctypes.oledll Windows only: Creates "OleDLL" instances. ctypes.pydll Creates "PyDLL" instances. For accessing the C Python api directly, a ready-to-use Python shared library object is available: ctypes.pythonapi An instance of "PyDLL" that exposes Python C API functions as attributes. Note that all these functions are assumed to return C "int", which is of course not always the truth, so you have to assign the correct "restype" attribute to use these functions. Loading a library through any of these objects raises an auditing event "ctypes.dlopen" with string argument "name", the name used to load the library. Accessing a function on a loaded library raises an auditing event "ctypes.dlsym" with arguments "library" (the library object) and "name" (the symbol’s name as a string or integer). In cases when only the library handle is available rather than the object, accessing a function raises an auditing event "ctypes.dlsym/handle" with arguments "handle" (the raw library handle) and "name". Foreign functions ----------------- As explained in the previous section, foreign functions can be accessed as attributes of loaded shared libraries. The function objects created in this way by default accept any number of arguments, accept any ctypes data instances as arguments, and return the default result type specified by the library loader. They are instances of a private class: class ctypes._FuncPtr Base class for C callable foreign functions. Instances of foreign functions are also C compatible data types; they represent C function pointers. This behavior can be customized by assigning to special attributes of the foreign function object. restype Assign a ctypes type to specify the result type of the foreign function. Use "None" for "void", a function not returning anything. It is possible to assign a callable Python object that is not a ctypes type, in this case the function is assumed to return a C "int", and the callable will be called with this integer, allowing further processing or error checking. Using this is deprecated, for more flexible post processing or error checking use a ctypes data type as "restype" and assign a callable to the "errcheck" attribute. argtypes Assign a tuple of ctypes types to specify the argument types that the function accepts. Functions using the "stdcall" calling convention can only be called with the same number of arguments as the length of this tuple; functions using the C calling convention accept additional, unspecified arguments as well. When a foreign function is called, each actual argument is passed to the "from_param()" class method of the items in the "argtypes" tuple, this method allows adapting the actual argument to an object that the foreign function accepts. For example, a "c_char_p" item in the "argtypes" tuple will convert a string passed as argument into a bytes object using ctypes conversion rules. New: It is now possible to put items in argtypes which are not ctypes types, but each item must have a "from_param()" method which returns a value usable as argument (integer, string, ctypes instance). This allows defining adapters that can adapt custom objects as function parameters. errcheck Assign a Python function or another callable to this attribute. The callable will be called with three or more arguments: callable(result, func, arguments) *result* is what the foreign function returns, as specified by the "restype" attribute. *func* is the foreign function object itself, this allows reusing the same callable object to check or post process the results of several functions. *arguments* is a tuple containing the parameters originally passed to the function call, this allows specializing the behavior on the arguments used. The object that this function returns will be returned from the foreign function call, but it can also check the result value and raise an exception if the foreign function call failed. exception ctypes.ArgumentError This exception is raised when a foreign function call cannot convert one of the passed arguments. On Windows, when a foreign function call raises a system exception (for example, due to an access violation), it will be captured and replaced with a suitable Python exception. Further, an auditing event "ctypes.seh_exception" with argument "code" will be raised, allowing an audit hook to replace the exception with its own. Some ways to invoke foreign function calls may raise an auditing event "ctypes.call_function" with arguments "function pointer" and "arguments". Function prototypes ------------------- Foreign functions can also be created by instantiating function prototypes. Function prototypes are similar to function prototypes in C; they describe a function (return type, argument types, calling convention) without defining an implementation. The factory functions must be called with the desired result type and the argument types of the function, and can be used as decorator factories, and as such, be applied to functions through the "@wrapper" syntax. See Callback functions for examples. ctypes.CFUNCTYPE(restype, *argtypes, use_errno=False, use_last_error=False) The returned function prototype creates functions that use the standard C calling convention. The function will release the GIL during the call. If *use_errno* is set to true, the ctypes private copy of the system "errno" variable is exchanged with the real "errno" value before and after the call; *use_last_error* does the same for the Windows error code. ctypes.WINFUNCTYPE(restype, *argtypes, use_errno=False, use_last_error=False) Windows only: The returned function prototype creates functions that use the "stdcall" calling convention. The function will release the GIL during the call. *use_errno* and *use_last_error* have the same meaning as above. ctypes.PYFUNCTYPE(restype, *argtypes) The returned function prototype creates functions that use the Python calling convention. The function will *not* release the GIL during the call. Function prototypes created by these factory functions can be instantiated in different ways, depending on the type and number of the parameters in the call: prototype(address) Returns a foreign function at the specified address which must be an integer. prototype(callable) Create a C callable function (a callback function) from a Python *callable*. prototype(func_spec[, paramflags]) Returns a foreign function exported by a shared library. *func_spec* must be a 2-tuple "(name_or_ordinal, library)". The first item is the name of the exported function as string, or the ordinal of the exported function as small integer. The second item is the shared library instance. prototype(vtbl_index, name[, paramflags[, iid]]) Returns a foreign function that will call a COM method. *vtbl_index* is the index into the virtual function table, a small non-negative integer. *name* is name of the COM method. *iid* is an optional pointer to the interface identifier which is used in extended error reporting. COM methods use a special calling convention: They require a pointer to the COM interface as first argument, in addition to those parameters that are specified in the "argtypes" tuple. The optional *paramflags* parameter creates foreign function wrappers with much more functionality than the features described above. *paramflags* must be a tuple of the same length as "argtypes". Each item in this tuple contains further information about a parameter, it must be a tuple containing one, two, or three items. The first item is an integer containing a combination of direction flags for the parameter: 1 Specifies an input parameter to the function. 2 Output parameter. The foreign function fills in a value. 4 Input parameter which defaults to the integer zero. The optional second item is the parameter name as string. If this is specified, the foreign function can be called with named parameters. The optional third item is the default value for this parameter. This example demonstrates how to wrap the Windows "MessageBoxW" function so that it supports default parameters and named arguments. The C declaration from the windows header file is this: WINUSERAPI int WINAPI MessageBoxW( HWND hWnd, LPCWSTR lpText, LPCWSTR lpCaption, UINT uType); Here is the wrapping with "ctypes": >>> from ctypes import c_int, WINFUNCTYPE, windll >>> from ctypes.wintypes import HWND, LPCWSTR, UINT >>> prototype = WINFUNCTYPE(c_int, HWND, LPCWSTR, LPCWSTR, UINT) >>> paramflags = (1, "hwnd", 0), (1, "text", "Hi"), (1, "caption", "Hello from ctypes"), (1, "flags", 0) >>> MessageBox = prototype(("MessageBoxW", windll.user32), paramflags) The "MessageBox" foreign function can now be called in these ways: >>> MessageBox() >>> MessageBox(text="Spam, spam, spam") >>> MessageBox(flags=2, text="foo bar") A second example demonstrates output parameters. The win32 "GetWindowRect" function retrieves the dimensions of a specified window by copying them into "RECT" structure that the caller has to supply. Here is the C declaration: WINUSERAPI BOOL WINAPI GetWindowRect( HWND hWnd, LPRECT lpRect); Here is the wrapping with "ctypes": >>> from ctypes import POINTER, WINFUNCTYPE, windll, WinError >>> from ctypes.wintypes import BOOL, HWND, RECT >>> prototype = WINFUNCTYPE(BOOL, HWND, POINTER(RECT)) >>> paramflags = (1, "hwnd"), (2, "lprect") >>> GetWindowRect = prototype(("GetWindowRect", windll.user32), paramflags) >>> Functions with output parameters will automatically return the output parameter value if there is a single one, or a tuple containing the output parameter values when there are more than one, so the GetWindowRect function now returns a RECT instance, when called. Output parameters can be combined with the "errcheck" protocol to do further output processing and error checking. The win32 "GetWindowRect" api function returns a "BOOL" to signal success or failure, so this function could do the error checking, and raises an exception when the api call failed: >>> def errcheck(result, func, args): ... if not result: ... raise WinError() ... return args ... >>> GetWindowRect.errcheck = errcheck >>> If the "errcheck" function returns the argument tuple it receives unchanged, "ctypes" continues the normal processing it does on the output parameters. If you want to return a tuple of window coordinates instead of a "RECT" instance, you can retrieve the fields in the function and return them instead, the normal processing will no longer take place: >>> def errcheck(result, func, args): ... if not result: ... raise WinError() ... rc = args[1] ... return rc.left, rc.top, rc.bottom, rc.right ... >>> GetWindowRect.errcheck = errcheck >>> Utility functions ----------------- ctypes.addressof(obj) Returns the address of the memory buffer as integer. *obj* must be an instance of a ctypes type. Raises an auditing event "ctypes.addressof" with argument "obj". ctypes.alignment(obj_or_type) Returns the alignment requirements of a ctypes type. *obj_or_type* must be a ctypes type or instance. ctypes.byref(obj[, offset]) Returns a light-weight pointer to *obj*, which must be an instance of a ctypes type. *offset* defaults to zero, and must be an integer that will be added to the internal pointer value. "byref(obj, offset)" corresponds to this C code: (((char *)&obj) + offset) The returned object can only be used as a foreign function call parameter. It behaves similar to "pointer(obj)", but the construction is a lot faster. ctypes.cast(obj, type) This function is similar to the cast operator in C. It returns a new instance of *type* which points to the same memory block as *obj*. *type* must be a pointer type, and *obj* must be an object that can be interpreted as a pointer. ctypes.create_string_buffer(init_or_size, size=None) This function creates a mutable character buffer. The returned object is a ctypes array of "c_char". *init_or_size* must be an integer which specifies the size of the array, or a bytes object which will be used to initialize the array items. If a bytes object is specified as first argument, the buffer is made one item larger than its length so that the last element in the array is a NUL termination character. An integer can be passed as second argument which allows specifying the size of the array if the length of the bytes should not be used. Raises an auditing event "ctypes.create_string_buffer" with arguments "init", "size". ctypes.create_unicode_buffer(init_or_size, size=None) This function creates a mutable unicode character buffer. The returned object is a ctypes array of "c_wchar". *init_or_size* must be an integer which specifies the size of the array, or a string which will be used to initialize the array items. If a string is specified as first argument, the buffer is made one item larger than the length of the string so that the last element in the array is a NUL termination character. An integer can be passed as second argument which allows specifying the size of the array if the length of the string should not be used. Raises an auditing event "ctypes.create_unicode_buffer" with arguments "init", "size". ctypes.DllCanUnloadNow() Windows only: This function is a hook which allows implementing in- process COM servers with ctypes. It is called from the DllCanUnloadNow function that the _ctypes extension dll exports. ctypes.DllGetClassObject() Windows only: This function is a hook which allows implementing in- process COM servers with ctypes. It is called from the DllGetClassObject function that the "_ctypes" extension dll exports. ctypes.util.find_library(name) Try to find a library and return a pathname. *name* is the library name without any prefix like "lib", suffix like ".so", ".dylib" or version number (this is the form used for the posix linker option "-l"). If no library can be found, returns "None". The exact functionality is system dependent. ctypes.util.find_msvcrt() Windows only: return the filename of the VC runtime library used by Python, and by the extension modules. If the name of the library cannot be determined, "None" is returned. If you need to free memory, for example, allocated by an extension module with a call to the "free(void *)", it is important that you use the function in the same library that allocated the memory. ctypes.FormatError([code]) Windows only: Returns a textual description of the error code *code*. If no error code is specified, the last error code is used by calling the Windows api function GetLastError. ctypes.GetLastError() Windows only: Returns the last error code set by Windows in the calling thread. This function calls the Windows *GetLastError()* function directly, it does not return the ctypes-private copy of the error code. ctypes.get_errno() Returns the current value of the ctypes-private copy of the system "errno" variable in the calling thread. Raises an auditing event "ctypes.get_errno" with no arguments. ctypes.get_last_error() Windows only: returns the current value of the ctypes-private copy of the system "LastError" variable in the calling thread. Raises an auditing event "ctypes.get_last_error" with no arguments. ctypes.memmove(dst, src, count) Same as the standard C memmove library function: copies *count* bytes from *src* to *dst*. *dst* and *src* must be integers or ctypes instances that can be converted to pointers. ctypes.memset(dst, c, count) Same as the standard C memset library function: fills the memory block at address *dst* with *count* bytes of value *c*. *dst* must be an integer specifying an address, or a ctypes instance. ctypes.POINTER(type) This factory function creates and returns a new ctypes pointer type. Pointer types are cached and reused internally, so calling this function repeatedly is cheap. *type* must be a ctypes type. ctypes.pointer(obj) This function creates a new pointer instance, pointing to *obj*. The returned object is of the type "POINTER(type(obj))". Note: If you just want to pass a pointer to an object to a foreign function call, you should use "byref(obj)" which is much faster. ctypes.resize(obj, size) This function resizes the internal memory buffer of *obj*, which must be an instance of a ctypes type. It is not possible to make the buffer smaller than the native size of the objects type, as given by "sizeof(type(obj))", but it is possible to enlarge the buffer. ctypes.set_errno(value) Set the current value of the ctypes-private copy of the system "errno" variable in the calling thread to *value* and return the previous value. Raises an auditing event "ctypes.set_errno" with argument "errno". ctypes.set_last_error(value) Windows only: set the current value of the ctypes-private copy of the system "LastError" variable in the calling thread to *value* and return the previous value. Raises an auditing event "ctypes.set_last_error" with argument "error". ctypes.sizeof(obj_or_type) Returns the size in bytes of a ctypes type or instance memory buffer. Does the same as the C "sizeof" operator. ctypes.string_at(address, size=-1) This function returns the C string starting at memory address *address* as a bytes object. If size is specified, it is used as size, otherwise the string is assumed to be zero-terminated. Raises an auditing event "ctypes.string_at" with arguments "address", "size". ctypes.WinError(code=None, descr=None) Windows only: this function is probably the worst-named thing in ctypes. It creates an instance of OSError. If *code* is not specified, "GetLastError" is called to determine the error code. If *descr* is not specified, "FormatError()" is called to get a textual description of the error. Changed in version 3.3: An instance of "WindowsError" used to be created. ctypes.wstring_at(address, size=-1) This function returns the wide character string starting at memory address *address* as a string. If *size* is specified, it is used as the number of characters of the string, otherwise the string is assumed to be zero-terminated. Raises an auditing event "ctypes.wstring_at" with arguments "address", "size". Data types ---------- class ctypes._CData This non-public class is the common base class of all ctypes data types. Among other things, all ctypes type instances contain a memory block that hold C compatible data; the address of the memory block is returned by the "addressof()" helper function. Another instance variable is exposed as "_objects"; this contains other Python objects that need to be kept alive in case the memory block contains pointers. Common methods of ctypes data types, these are all class methods (to be exact, they are methods of the *metaclass*): from_buffer(source[, offset]) This method returns a ctypes instance that shares the buffer of the *source* object. The *source* object must support the writeable buffer interface. The optional *offset* parameter specifies an offset into the source buffer in bytes; the default is zero. If the source buffer is not large enough a "ValueError" is raised. Raises an auditing event "ctypes.cdata/buffer" with arguments "pointer", "size", "offset". from_buffer_copy(source[, offset]) This method creates a ctypes instance, copying the buffer from the *source* object buffer which must be readable. The optional *offset* parameter specifies an offset into the source buffer in bytes; the default is zero. If the source buffer is not large enough a "ValueError" is raised. Raises an auditing event "ctypes.cdata/buffer" with arguments "pointer", "size", "offset". from_address(address) This method returns a ctypes type instance using the memory specified by *address* which must be an integer. This method, and others that indirectly call this method, raises an auditing event "ctypes.cdata" with argument "address". from_param(obj) This method adapts *obj* to a ctypes type. It is called with the actual object used in a foreign function call when the type is present in the foreign function’s "argtypes" tuple; it must return an object that can be used as a function call parameter. All ctypes data types have a default implementation of this classmethod that normally returns *obj* if that is an instance of the type. Some types accept other objects as well. in_dll(library, name) This method returns a ctypes type instance exported by a shared library. *name* is the name of the symbol that exports the data, *library* is the loaded shared library. Common instance variables of ctypes data types: _b_base_ Sometimes ctypes data instances do not own the memory block they contain, instead they share part of the memory block of a base object. The "_b_base_" read-only member is the root ctypes object that owns the memory block. _b_needsfree_ This read-only variable is true when the ctypes data instance has allocated the memory block itself, false otherwise. _objects This member is either "None" or a dictionary containing Python objects that need to be kept alive so that the memory block contents is kept valid. This object is only exposed for debugging; never modify the contents of this dictionary. Fundamental data types ---------------------- class ctypes._SimpleCData This non-public class is the base class of all fundamental ctypes data types. It is mentioned here because it contains the common attributes of the fundamental ctypes data types. "_SimpleCData" is a subclass of "_CData", so it inherits their methods and attributes. ctypes data types that are not and do not contain pointers can now be pickled. Instances have a single attribute: value This attribute contains the actual value of the instance. For integer and pointer types, it is an integer, for character types, it is a single character bytes object or string, for character pointer types it is a Python bytes object or string. When the "value" attribute is retrieved from a ctypes instance, usually a new object is returned each time. "ctypes" does *not* implement original object return, always a new object is constructed. The same is true for all other ctypes object instances. Fundamental data types, when returned as foreign function call results, or, for example, by retrieving structure field members or array items, are transparently converted to native Python types. In other words, if a foreign function has a "restype" of "c_char_p", you will always receive a Python bytes object, *not* a "c_char_p" instance. Subclasses of fundamental data types do *not* inherit this behavior. So, if a foreign functions "restype" is a subclass of "c_void_p", you will receive an instance of this subclass from the function call. Of course, you can get the value of the pointer by accessing the "value" attribute. These are the fundamental ctypes data types: class ctypes.c_byte Represents the C "signed char" datatype, and interprets the value as small integer. The constructor accepts an optional integer initializer; no overflow checking is done. class ctypes.c_char Represents the C "char" datatype, and interprets the value as a single character. The constructor accepts an optional string initializer, the length of the string must be exactly one character. class ctypes.c_char_p Represents the C "char *" datatype when it points to a zero- terminated string. For a general character pointer that may also point to binary data, "POINTER(c_char)" must be used. The constructor accepts an integer address, or a bytes object. class ctypes.c_double Represents the C "double" datatype. The constructor accepts an optional float initializer. class ctypes.c_longdouble Represents the C "long double" datatype. The constructor accepts an optional float initializer. On platforms where "sizeof(long double) == sizeof(double)" it is an alias to "c_double". class ctypes.c_float Represents the C "float" datatype. The constructor accepts an optional float initializer. class ctypes.c_int Represents the C "signed int" datatype. The constructor accepts an optional integer initializer; no overflow checking is done. On platforms where "sizeof(int) == sizeof(long)" it is an alias to "c_long". class ctypes.c_int8 Represents the C 8-bit "signed int" datatype. Usually an alias for "c_byte". class ctypes.c_int16 Represents the C 16-bit "signed int" datatype. Usually an alias for "c_short". class ctypes.c_int32 Represents the C 32-bit "signed int" datatype. Usually an alias for "c_int". class ctypes.c_int64 Represents the C 64-bit "signed int" datatype. Usually an alias for "c_longlong". class ctypes.c_long Represents the C "signed long" datatype. The constructor accepts an optional integer initializer; no overflow checking is done. class ctypes.c_longlong Represents the C "signed long long" datatype. The constructor accepts an optional integer initializer; no overflow checking is done. class ctypes.c_short Represents the C "signed short" datatype. The constructor accepts an optional integer initializer; no overflow checking is done. class ctypes.c_size_t Represents the C "size_t" datatype. class ctypes.c_ssize_t Represents the C "ssize_t" datatype. New in version 3.2. class ctypes.c_ubyte Represents the C "unsigned char" datatype, it interprets the value as small integer. The constructor accepts an optional integer initializer; no overflow checking is done. class ctypes.c_uint Represents the C "unsigned int" datatype. The constructor accepts an optional integer initializer; no overflow checking is done. On platforms where "sizeof(int) == sizeof(long)" it is an alias for "c_ulong". class ctypes.c_uint8 Represents the C 8-bit "unsigned int" datatype. Usually an alias for "c_ubyte". class ctypes.c_uint16 Represents the C 16-bit "unsigned int" datatype. Usually an alias for "c_ushort". class ctypes.c_uint32 Represents the C 32-bit "unsigned int" datatype. Usually an alias for "c_uint". class ctypes.c_uint64 Represents the C 64-bit "unsigned int" datatype. Usually an alias for "c_ulonglong". class ctypes.c_ulong Represents the C "unsigned long" datatype. The constructor accepts an optional integer initializer; no overflow checking is done. class ctypes.c_ulonglong Represents the C "unsigned long long" datatype. The constructor accepts an optional integer initializer; no overflow checking is done. class ctypes.c_ushort Represents the C "unsigned short" datatype. The constructor accepts an optional integer initializer; no overflow checking is done. class ctypes.c_void_p Represents the C "void *" type. The value is represented as integer. The constructor accepts an optional integer initializer. class ctypes.c_wchar Represents the C "wchar_t" datatype, and interprets the value as a single character unicode string. The constructor accepts an optional string initializer, the length of the string must be exactly one character. class ctypes.c_wchar_p Represents the C "wchar_t *" datatype, which must be a pointer to a zero-terminated wide character string. The constructor accepts an integer address, or a string. class ctypes.c_bool Represent the C "bool" datatype (more accurately, "_Bool" from C99). Its value can be "True" or "False", and the constructor accepts any object that has a truth value. class ctypes.HRESULT Windows only: Represents a "HRESULT" value, which contains success or error information for a function or method call. class ctypes.py_object Represents the C "PyObject *" datatype. Calling this without an argument creates a "NULL" "PyObject *" pointer. The "ctypes.wintypes" module provides quite some other Windows specific data types, for example "HWND", "WPARAM", or "DWORD". Some useful structures like "MSG" or "RECT" are also defined. Structured data types --------------------- class ctypes.Union(*args, **kw) Abstract base class for unions in native byte order. class ctypes.BigEndianStructure(*args, **kw) Abstract base class for structures in *big endian* byte order. class ctypes.LittleEndianStructure(*args, **kw) Abstract base class for structures in *little endian* byte order. Structures with non-native byte order cannot contain pointer type fields, or any other data types containing pointer type fields. class ctypes.Structure(*args, **kw) Abstract base class for structures in *native* byte order. Concrete structure and union types must be created by subclassing one of these types, and at least define a "_fields_" class variable. "ctypes" will create *descriptor*s which allow reading and writing the fields by direct attribute accesses. These are the _fields_ A sequence defining the structure fields. The items must be 2-tuples or 3-tuples. The first item is the name of the field, the second item specifies the type of the field; it can be any ctypes data type. For integer type fields like "c_int", a third optional item can be given. It must be a small positive integer defining the bit width of the field. Field names must be unique within one structure or union. This is not checked, only one field can be accessed when names are repeated. It is possible to define the "_fields_" class variable *after* the class statement that defines the Structure subclass, this allows creating data types that directly or indirectly reference themselves: class List(Structure): pass List._fields_ = [("pnext", POINTER(List)), ... ] The "_fields_" class variable must, however, be defined before the type is first used (an instance is created, "sizeof()" is called on it, and so on). Later assignments to the "_fields_" class variable will raise an AttributeError. It is possible to define sub-subclasses of structure types, they inherit the fields of the base class plus the "_fields_" defined in the sub-subclass, if any. _pack_ An optional small integer that allows overriding the alignment of structure fields in the instance. "_pack_" must already be defined when "_fields_" is assigned, otherwise it will have no effect. _anonymous_ An optional sequence that lists the names of unnamed (anonymous) fields. "_anonymous_" must be already defined when "_fields_" is assigned, otherwise it will have no effect. The fields listed in this variable must be structure or union type fields. "ctypes" will create descriptors in the structure type that allows accessing the nested fields directly, without the need to create the structure or union field. Here is an example type (Windows): class _U(Union): _fields_ = [("lptdesc", POINTER(TYPEDESC)), ("lpadesc", POINTER(ARRAYDESC)), ("hreftype", HREFTYPE)] class TYPEDESC(Structure): _anonymous_ = ("u",) _fields_ = [("u", _U), ("vt", VARTYPE)] The "TYPEDESC" structure describes a COM data type, the "vt" field specifies which one of the union fields is valid. Since the "u" field is defined as anonymous field, it is now possible to access the members directly off the TYPEDESC instance. "td.lptdesc" and "td.u.lptdesc" are equivalent, but the former is faster since it does not need to create a temporary union instance: td = TYPEDESC() td.vt = VT_PTR td.lptdesc = POINTER(some_type) td.u.lptdesc = POINTER(some_type) It is possible to define sub-subclasses of structures, they inherit the fields of the base class. If the subclass definition has a separate "_fields_" variable, the fields specified in this are appended to the fields of the base class. Structure and union constructors accept both positional and keyword arguments. Positional arguments are used to initialize member fields in the same order as they are appear in "_fields_". Keyword arguments in the constructor are interpreted as attribute assignments, so they will initialize "_fields_" with the same name, or create new attributes for names not present in "_fields_". Arrays and pointers ------------------- class ctypes.Array(*args) Abstract base class for arrays. The recommended way to create concrete array types is by multiplying any "ctypes" data type with a non-negative integer. Alternatively, you can subclass this type and define "_length_" and "_type_" class variables. Array elements can be read and written using standard subscript and slice accesses; for slice reads, the resulting object is *not* itself an "Array". _length_ A positive integer specifying the number of elements in the array. Out-of-range subscripts result in an "IndexError". Will be returned by "len()". _type_ Specifies the type of each element in the array. Array subclass constructors accept positional arguments, used to initialize the elements in order. class ctypes._Pointer Private, abstract base class for pointers. Concrete pointer types are created by calling "POINTER()" with the type that will be pointed to; this is done automatically by "pointer()". If a pointer points to an array, its elements can be read and written using standard subscript and slice accesses. Pointer objects have no size, so "len()" will raise "TypeError". Negative subscripts will read from the memory *before* the pointer (as in C), and out-of-range subscripts will probably crash with an access violation (if you’re lucky). _type_ Specifies the type pointed to. contents Returns the object to which to pointer points. Assigning to this attribute changes the pointer to point to the assigned object. "curses.ascii" — Utilities for ASCII characters *********************************************** ====================================================================== The "curses.ascii" module supplies name constants for ASCII characters and functions to test membership in various ASCII character classes. The constants supplied are names for control characters as follows: +----------------+------------------------------------------------+ | Name | Meaning | |================|================================================| | "NUL" | | +----------------+------------------------------------------------+ | "SOH" | Start of heading, console interrupt | +----------------+------------------------------------------------+ | "STX" | Start of text | +----------------+------------------------------------------------+ | "ETX" | End of text | +----------------+------------------------------------------------+ | "EOT" | End of transmission | +----------------+------------------------------------------------+ | "ENQ" | Enquiry, goes with "ACK" flow control | +----------------+------------------------------------------------+ | "ACK" | Acknowledgement | +----------------+------------------------------------------------+ | "BEL" | Bell | +----------------+------------------------------------------------+ | "BS" | Backspace | +----------------+------------------------------------------------+ | "TAB" | Tab | +----------------+------------------------------------------------+ | "HT" | Alias for "TAB": “Horizontal tab” | +----------------+------------------------------------------------+ | "LF" | Line feed | +----------------+------------------------------------------------+ | "NL" | Alias for "LF": “New line” | +----------------+------------------------------------------------+ | "VT" | Vertical tab | +----------------+------------------------------------------------+ | "FF" | Form feed | +----------------+------------------------------------------------+ | "CR" | Carriage return | +----------------+------------------------------------------------+ | "SO" | Shift-out, begin alternate character set | +----------------+------------------------------------------------+ | "SI" | Shift-in, resume default character set | +----------------+------------------------------------------------+ | "DLE" | Data-link escape | +----------------+------------------------------------------------+ | "DC1" | XON, for flow control | +----------------+------------------------------------------------+ | "DC2" | Device control 2, block-mode flow control | +----------------+------------------------------------------------+ | "DC3" | XOFF, for flow control | +----------------+------------------------------------------------+ | "DC4" | Device control 4 | +----------------+------------------------------------------------+ | "NAK" | Negative acknowledgement | +----------------+------------------------------------------------+ | "SYN" | Synchronous idle | +----------------+------------------------------------------------+ | "ETB" | End transmission block | +----------------+------------------------------------------------+ | "CAN" | Cancel | +----------------+------------------------------------------------+ | "EM" | End of medium | +----------------+------------------------------------------------+ | "SUB" | Substitute | +----------------+------------------------------------------------+ | "ESC" | Escape | +----------------+------------------------------------------------+ | "FS" | File separator | +----------------+------------------------------------------------+ | "GS" | Group separator | +----------------+------------------------------------------------+ | "RS" | Record separator, block-mode terminator | +----------------+------------------------------------------------+ | "US" | Unit separator | +----------------+------------------------------------------------+ | "SP" | Space | +----------------+------------------------------------------------+ | "DEL" | Delete | +----------------+------------------------------------------------+ Note that many of these have little practical significance in modern usage. The mnemonics derive from teleprinter conventions that predate digital computers. The module supplies the following functions, patterned on those in the standard C library: curses.ascii.isalnum(c) Checks for an ASCII alphanumeric character; it is equivalent to "isalpha(c) or isdigit(c)". curses.ascii.isalpha(c) Checks for an ASCII alphabetic character; it is equivalent to "isupper(c) or islower(c)". curses.ascii.isascii(c) Checks for a character value that fits in the 7-bit ASCII set. curses.ascii.isblank(c) Checks for an ASCII whitespace character; space or horizontal tab. curses.ascii.iscntrl(c) Checks for an ASCII control character (in the range 0x00 to 0x1f or 0x7f). curses.ascii.isdigit(c) Checks for an ASCII decimal digit, "'0'" through "'9'". This is equivalent to "c in string.digits". curses.ascii.isgraph(c) Checks for ASCII any printable character except space. curses.ascii.islower(c) Checks for an ASCII lower-case character. curses.ascii.isprint(c) Checks for any ASCII printable character including space. curses.ascii.ispunct(c) Checks for any printable ASCII character which is not a space or an alphanumeric character. curses.ascii.isspace(c) Checks for ASCII white-space characters; space, line feed, carriage return, form feed, horizontal tab, vertical tab. curses.ascii.isupper(c) Checks for an ASCII uppercase letter. curses.ascii.isxdigit(c) Checks for an ASCII hexadecimal digit. This is equivalent to "c in string.hexdigits". curses.ascii.isctrl(c) Checks for an ASCII control character (ordinal values 0 to 31). curses.ascii.ismeta(c) Checks for a non-ASCII character (ordinal values 0x80 and above). These functions accept either integers or single-character strings; when the argument is a string, it is first converted using the built- in function "ord()". Note that all these functions check ordinal bit values derived from the character of the string you pass in; they do not actually know anything about the host machine’s character encoding. The following two functions take either a single-character string or integer byte value; they return a value of the same type. curses.ascii.ascii(c) Return the ASCII value corresponding to the low 7 bits of *c*. curses.ascii.ctrl(c) Return the control character corresponding to the given character (the character bit value is bitwise-anded with 0x1f). curses.ascii.alt(c) Return the 8-bit character corresponding to the given ASCII character (the character bit value is bitwise-ored with 0x80). The following function takes either a single-character string or integer value; it returns a string. curses.ascii.unctrl(c) Return a string representation of the ASCII character *c*. If *c* is printable, this string is the character itself. If the character is a control character (0x00–0x1f) the string consists of a caret ("'^'") followed by the corresponding uppercase letter. If the character is an ASCII delete (0x7f) the string is "'^?'". If the character has its meta bit (0x80) set, the meta bit is stripped, the preceding rules applied, and "'!'" prepended to the result. curses.ascii.controlnames A 33-element string array that contains the ASCII mnemonics for the thirty-two ASCII control characters from 0 (NUL) to 0x1f (US), in order, plus the mnemonic "SP" for the space character. "curses.panel" — A panel stack extension for curses *************************************************** ====================================================================== Panels are windows with the added feature of depth, so they can be stacked on top of each other, and only the visible portions of each window will be displayed. Panels can be added, moved up or down in the stack, and removed. Functions ========= The module "curses.panel" defines the following functions: curses.panel.bottom_panel() Returns the bottom panel in the panel stack. curses.panel.new_panel(win) Returns a panel object, associating it with the given window *win*. Be aware that you need to keep the returned panel object referenced explicitly. If you don’t, the panel object is garbage collected and removed from the panel stack. curses.panel.top_panel() Returns the top panel in the panel stack. curses.panel.update_panels() Updates the virtual screen after changes in the panel stack. This does not call "curses.doupdate()", so you’ll have to do this yourself. Panel Objects ============= Panel objects, as returned by "new_panel()" above, are windows with a stacking order. There’s always a window associated with a panel which determines the content, while the panel methods are responsible for the window’s depth in the panel stack. Panel objects have the following methods: Panel.above() Returns the panel above the current panel. Panel.below() Returns the panel below the current panel. Panel.bottom() Push the panel to the bottom of the stack. Panel.hidden() Returns "True" if the panel is hidden (not visible), "False" otherwise. Panel.hide() Hide the panel. This does not delete the object, it just makes the window on screen invisible. Panel.move(y, x) Move the panel to the screen coordinates "(y, x)". Panel.replace(win) Change the window associated with the panel to the window *win*. Panel.set_userptr(obj) Set the panel’s user pointer to *obj*. This is used to associate an arbitrary piece of data with the panel, and can be any Python object. Panel.show() Display the panel (which might have been hidden). Panel.top() Push panel to the top of the stack. Panel.userptr() Returns the user pointer for the panel. This might be any Python object. Panel.window() Returns the window object associated with the panel. "curses" — Terminal handling for character-cell displays ******************************************************** ====================================================================== The "curses" module provides an interface to the curses library, the de-facto standard for portable advanced terminal handling. While curses is most widely used in the Unix environment, versions are available for Windows, DOS, and possibly other systems as well. This extension module is designed to match the API of ncurses, an open- source curses library hosted on Linux and the BSD variants of Unix. Note: Whenever the documentation mentions a *character* it can be specified as an integer, a one-character Unicode string or a one- byte byte string.Whenever the documentation mentions a *character string* it can be specified as a Unicode string or a byte string. Note: Since version 5.4, the ncurses library decides how to interpret non- ASCII data using the "nl_langinfo" function. That means that you have to call "locale.setlocale()" in the application and encode Unicode strings using one of the system’s available encodings. This example uses the system’s default encoding: import locale locale.setlocale(locale.LC_ALL, '') code = locale.getpreferredencoding() Then use *code* as the encoding for "str.encode()" calls. See also: Module "curses.ascii" Utilities for working with ASCII characters, regardless of your locale settings. Module "curses.panel" A panel stack extension that adds depth to curses windows. Module "curses.textpad" Editable text widget for curses supporting **Emacs**-like bindings. Curses Programming with Python Tutorial material on using curses with Python, by Andrew Kuchling and Eric Raymond. The Tools/demo/ directory in the Python source distribution contains some example programs using the curses bindings provided by this module. Functions ========= The module "curses" defines the following exception: exception curses.error Exception raised when a curses library function returns an error. Note: Whenever *x* or *y* arguments to a function or a method are optional, they default to the current cursor location. Whenever *attr* is optional, it defaults to "A_NORMAL". The module "curses" defines the following functions: curses.baudrate() Return the output speed of the terminal in bits per second. On software terminal emulators it will have a fixed high value. Included for historical reasons; in former times, it was used to write output loops for time delays and occasionally to change interfaces depending on the line speed. curses.beep() Emit a short attention sound. curses.can_change_color() Return "True" or "False", depending on whether the programmer can change the colors displayed by the terminal. curses.cbreak() Enter cbreak mode. In cbreak mode (sometimes called “rare” mode) normal tty line buffering is turned off and characters are available to be read one by one. However, unlike raw mode, special characters (interrupt, quit, suspend, and flow control) retain their effects on the tty driver and calling program. Calling first "raw()" then "cbreak()" leaves the terminal in cbreak mode. curses.color_content(color_number) Return the intensity of the red, green, and blue (RGB) components in the color *color_number*, which must be between "0" and "COLORS - 1". Return a 3-tuple, containing the R,G,B values for the given color, which will be between "0" (no component) and "1000" (maximum amount of component). curses.color_pair(pair_number) Return the attribute value for displaying text in the specified color pair. Only the first 256 color pairs are supported. This attribute value can be combined with "A_STANDOUT", "A_REVERSE", and the other "A_*" attributes. "pair_number()" is the counterpart to this function. curses.curs_set(visibility) Set the cursor state. *visibility* can be set to "0", "1", or "2", for invisible, normal, or very visible. If the terminal supports the visibility requested, return the previous cursor state; otherwise raise an exception. On many terminals, the “visible” mode is an underline cursor and the “very visible” mode is a block cursor. curses.def_prog_mode() Save the current terminal mode as the “program” mode, the mode when the running program is using curses. (Its counterpart is the “shell” mode, for when the program is not in curses.) Subsequent calls to "reset_prog_mode()" will restore this mode. curses.def_shell_mode() Save the current terminal mode as the “shell” mode, the mode when the running program is not using curses. (Its counterpart is the “program” mode, when the program is using curses capabilities.) Subsequent calls to "reset_shell_mode()" will restore this mode. curses.delay_output(ms) Insert an *ms* millisecond pause in output. curses.doupdate() Update the physical screen. The curses library keeps two data structures, one representing the current physical screen contents and a virtual screen representing the desired next state. The "doupdate()" ground updates the physical screen to match the virtual screen. The virtual screen may be updated by a "noutrefresh()" call after write operations such as "addstr()" have been performed on a window. The normal "refresh()" call is simply "noutrefresh()" followed by "doupdate()"; if you have to update multiple windows, you can speed performance and perhaps reduce screen flicker by issuing "noutrefresh()" calls on all windows, followed by a single "doupdate()". curses.echo() Enter echo mode. In echo mode, each character input is echoed to the screen as it is entered. curses.endwin() De-initialize the library, and return terminal to normal status. curses.erasechar() Return the user’s current erase character as a one-byte bytes object. Under Unix operating systems this is a property of the controlling tty of the curses program, and is not set by the curses library itself. curses.filter() The "filter()" routine, if used, must be called before "initscr()" is called. The effect is that, during those calls, "LINES" is set to "1"; the capabilities "clear", "cup", "cud", "cud1", "cuu1", "cuu", "vpa" are disabled; and the "home" string is set to the value of "cr". The effect is that the cursor is confined to the current line, and so are screen updates. This may be used for enabling character-at-a-time line editing without touching the rest of the screen. curses.flash() Flash the screen. That is, change it to reverse-video and then change it back in a short interval. Some people prefer such as ‘visible bell’ to the audible attention signal produced by "beep()". curses.flushinp() Flush all input buffers. This throws away any typeahead that has been typed by the user and has not yet been processed by the program. curses.getmouse() After "getch()" returns "KEY_MOUSE" to signal a mouse event, this method should be called to retrieve the queued mouse event, represented as a 5-tuple "(id, x, y, z, bstate)". *id* is an ID value used to distinguish multiple devices, and *x*, *y*, *z* are the event’s coordinates. (*z* is currently unused.) *bstate* is an integer value whose bits will be set to indicate the type of event, and will be the bitwise OR of one or more of the following constants, where *n* is the button number from 1 to 4: "BUTTONn_PRESSED", "BUTTONn_RELEASED", "BUTTONn_CLICKED", "BUTTONn_DOUBLE_CLICKED", "BUTTONn_TRIPLE_CLICKED", "BUTTON_SHIFT", "BUTTON_CTRL", "BUTTON_ALT". curses.getsyx() Return the current coordinates of the virtual screen cursor as a tuple "(y, x)". If "leaveok" is currently "True", then return "(-1, -1)". curses.getwin(file) Read window related data stored in the file by an earlier "putwin()" call. The routine then creates and initializes a new window using that data, returning the new window object. curses.has_colors() Return "True" if the terminal can display colors; otherwise, return "False". curses.has_ic() Return "True" if the terminal has insert- and delete-character capabilities. This function is included for historical reasons only, as all modern software terminal emulators have such capabilities. curses.has_il() Return "True" if the terminal has insert- and delete-line capabilities, or can simulate them using scrolling regions. This function is included for historical reasons only, as all modern software terminal emulators have such capabilities. curses.has_key(ch) Take a key value *ch*, and return "True" if the current terminal type recognizes a key with that value. curses.halfdelay(tenths) Used for half-delay mode, which is similar to cbreak mode in that characters typed by the user are immediately available to the program. However, after blocking for *tenths* tenths of seconds, raise an exception if nothing has been typed. The value of *tenths* must be a number between "1" and "255". Use "nocbreak()" to leave half-delay mode. curses.init_color(color_number, r, g, b) Change the definition of a color, taking the number of the color to be changed followed by three RGB values (for the amounts of red, green, and blue components). The value of *color_number* must be between "0" and *COLORS - 1*. Each of *r*, *g*, *b*, must be a value between "0" and "1000". When "init_color()" is used, all occurrences of that color on the screen immediately change to the new definition. This function is a no-op on most terminals; it is active only if "can_change_color()" returns "True". curses.init_pair(pair_number, fg, bg) Change the definition of a color-pair. It takes three arguments: the number of the color-pair to be changed, the foreground color number, and the background color number. The value of *pair_number* must be between "1" and "COLOR_PAIRS - 1" (the "0" color pair is wired to white on black and cannot be changed). The value of *fg* and *bg* arguments must be between "0" and "COLORS - 1", or, after calling "use_default_colors()", "-1". If the color- pair was previously initialized, the screen is refreshed and all occurrences of that color-pair are changed to the new definition. curses.initscr() Initialize the library. Return a window object which represents the whole screen. Note: If there is an error opening the terminal, the underlying curses library may cause the interpreter to exit. curses.is_term_resized(nlines, ncols) Return "True" if "resize_term()" would modify the window structure, "False" otherwise. curses.isendwin() Return "True" if "endwin()" has been called (that is, the curses library has been deinitialized). curses.keyname(k) Return the name of the key numbered *k* as a bytes object. The name of a key generating printable ASCII character is the key’s character. The name of a control-key combination is a two-byte bytes object consisting of a caret ("b'^'") followed by the corresponding printable ASCII character. The name of an alt-key combination (128–255) is a bytes object consisting of the prefix "b'M-'" followed by the name of the corresponding ASCII character. curses.killchar() Return the user’s current line kill character as a one-byte bytes object. Under Unix operating systems this is a property of the controlling tty of the curses program, and is not set by the curses library itself. curses.longname() Return a bytes object containing the terminfo long name field describing the current terminal. The maximum length of a verbose description is 128 characters. It is defined only after the call to "initscr()". curses.meta(flag) If *flag* is "True", allow 8-bit characters to be input. If *flag* is "False", allow only 7-bit chars. curses.mouseinterval(interval) Set the maximum time in milliseconds that can elapse between press and release events in order for them to be recognized as a click, and return the previous interval value. The default value is 200 msec, or one fifth of a second. curses.mousemask(mousemask) Set the mouse events to be reported, and return a tuple "(availmask, oldmask)". *availmask* indicates which of the specified mouse events can be reported; on complete failure it returns "0". *oldmask* is the previous value of the given window’s mouse event mask. If this function is never called, no mouse events are ever reported. curses.napms(ms) Sleep for *ms* milliseconds. curses.newpad(nlines, ncols) Create and return a pointer to a new pad data structure with the given number of lines and columns. Return a pad as a window object. A pad is like a window, except that it is not restricted by the screen size, and is not necessarily associated with a particular part of the screen. Pads can be used when a large window is needed, and only a part of the window will be on the screen at one time. Automatic refreshes of pads (such as from scrolling or echoing of input) do not occur. The "refresh()" and "noutrefresh()" methods of a pad require 6 arguments to specify the part of the pad to be displayed and the location on the screen to be used for the display. The arguments are *pminrow*, *pmincol*, *sminrow*, *smincol*, *smaxrow*, *smaxcol*; the *p* arguments refer to the upper left corner of the pad region to be displayed and the *s* arguments define a clipping box on the screen within which the pad region is to be displayed. curses.newwin(nlines, ncols) curses.newwin(nlines, ncols, begin_y, begin_x) Return a new window, whose left-upper corner is at "(begin_y, begin_x)", and whose height/width is *nlines*/*ncols*. By default, the window will extend from the specified position to the lower right corner of the screen. curses.nl() Enter newline mode. This mode translates the return key into newline on input, and translates newline into return and line-feed on output. Newline mode is initially on. curses.nocbreak() Leave cbreak mode. Return to normal “cooked” mode with line buffering. curses.noecho() Leave echo mode. Echoing of input characters is turned off. curses.nonl() Leave newline mode. Disable translation of return into newline on input, and disable low-level translation of newline into newline/return on output (but this does not change the behavior of "addch('\n')", which always does the equivalent of return and line feed on the virtual screen). With translation off, curses can sometimes speed up vertical motion a little; also, it will be able to detect the return key on input. curses.noqiflush() When the "noqiflush()" routine is used, normal flush of input and output queues associated with the "INTR", "QUIT" and "SUSP" characters will not be done. You may want to call "noqiflush()" in a signal handler if you want output to continue as though the interrupt had not occurred, after the handler exits. curses.noraw() Leave raw mode. Return to normal “cooked” mode with line buffering. curses.pair_content(pair_number) Return a tuple "(fg, bg)" containing the colors for the requested color pair. The value of *pair_number* must be between "0" and "COLOR_PAIRS - 1". curses.pair_number(attr) Return the number of the color-pair set by the attribute value *attr*. "color_pair()" is the counterpart to this function. curses.putp(str) Equivalent to "tputs(str, 1, putchar)"; emit the value of a specified terminfo capability for the current terminal. Note that the output of "putp()" always goes to standard output. curses.qiflush([flag]) If *flag* is "False", the effect is the same as calling "noqiflush()". If *flag* is "True", or no argument is provided, the queues will be flushed when these control characters are read. curses.raw() Enter raw mode. In raw mode, normal line buffering and processing of interrupt, quit, suspend, and flow control keys are turned off; characters are presented to curses input functions one by one. curses.reset_prog_mode() Restore the terminal to “program” mode, as previously saved by "def_prog_mode()". curses.reset_shell_mode() Restore the terminal to “shell” mode, as previously saved by "def_shell_mode()". curses.resetty() Restore the state of the terminal modes to what it was at the last call to "savetty()". curses.resize_term(nlines, ncols) Backend function used by "resizeterm()", performing most of the work; when resizing the windows, "resize_term()" blank-fills the areas that are extended. The calling application should fill in these areas with appropriate data. The "resize_term()" function attempts to resize all windows. However, due to the calling convention of pads, it is not possible to resize these without additional interaction with the application. curses.resizeterm(nlines, ncols) Resize the standard and current windows to the specified dimensions, and adjusts other bookkeeping data used by the curses library that record the window dimensions (in particular the SIGWINCH handler). curses.savetty() Save the current state of the terminal modes in a buffer, usable by "resetty()". curses.get_escdelay() Retrieves the value set by "set_escdelay()". New in version 3.9. curses.set_escdelay(ms) Sets the number of milliseconds to wait after reading an escape character, to distinguish between an individual escape character entered on the keyboard from escape sequences sent by cursor and function keys. New in version 3.9. curses.get_tabsize() Retrieves the value set by "set_tabsize()". New in version 3.9. curses.set_tabsize(size) Sets the number of columns used by the curses library when converting a tab character to spaces as it adds the tab to a window. New in version 3.9. curses.setsyx(y, x) Set the virtual screen cursor to *y*, *x*. If *y* and *x* are both "-1", then "leaveok" is set "True". curses.setupterm(term=None, fd=-1) Initialize the terminal. *term* is a string giving the terminal name, or "None"; if omitted or "None", the value of the "TERM" environment variable will be used. *fd* is the file descriptor to which any initialization sequences will be sent; if not supplied or "-1", the file descriptor for "sys.stdout" will be used. curses.start_color() Must be called if the programmer wants to use colors, and before any other color manipulation routine is called. It is good practice to call this routine right after "initscr()". "start_color()" initializes eight basic colors (black, red, green, yellow, blue, magenta, cyan, and white), and two global variables in the "curses" module, "COLORS" and "COLOR_PAIRS", containing the maximum number of colors and color-pairs the terminal can support. It also restores the colors on the terminal to the values they had when the terminal was just turned on. curses.termattrs() Return a logical OR of all video attributes supported by the terminal. This information is useful when a curses program needs complete control over the appearance of the screen. curses.termname() Return the value of the environment variable "TERM", as a bytes object, truncated to 14 characters. curses.tigetflag(capname) Return the value of the Boolean capability corresponding to the terminfo capability name *capname* as an integer. Return the value "-1" if *capname* is not a Boolean capability, or "0" if it is canceled or absent from the terminal description. curses.tigetnum(capname) Return the value of the numeric capability corresponding to the terminfo capability name *capname* as an integer. Return the value "-2" if *capname* is not a numeric capability, or "-1" if it is canceled or absent from the terminal description. curses.tigetstr(capname) Return the value of the string capability corresponding to the terminfo capability name *capname* as a bytes object. Return "None" if *capname* is not a terminfo “string capability”, or is canceled or absent from the terminal description. curses.tparm(str[, ...]) Instantiate the bytes object *str* with the supplied parameters, where *str* should be a parameterized string obtained from the terminfo database. E.g. "tparm(tigetstr("cup"), 5, 3)" could result in "b'\033[6;4H'", the exact result depending on terminal type. curses.typeahead(fd) Specify that the file descriptor *fd* be used for typeahead checking. If *fd* is "-1", then no typeahead checking is done. The curses library does “line-breakout optimization” by looking for typeahead periodically while updating the screen. If input is found, and it is coming from a tty, the current update is postponed until refresh or doupdate is called again, allowing faster response to commands typed in advance. This function allows specifying a different file descriptor for typeahead checking. curses.unctrl(ch) Return a bytes object which is a printable representation of the character *ch*. Control characters are represented as a caret followed by the character, for example as "b'^C'". Printing characters are left as they are. curses.ungetch(ch) Push *ch* so the next "getch()" will return it. Note: Only one *ch* can be pushed before "getch()" is called. curses.update_lines_cols() Update "LINES" and "COLS". Useful for detecting manual screen resize. New in version 3.5. curses.unget_wch(ch) Push *ch* so the next "get_wch()" will return it. Note: Only one *ch* can be pushed before "get_wch()" is called. New in version 3.3. curses.ungetmouse(id, x, y, z, bstate) Push a "KEY_MOUSE" event onto the input queue, associating the given state data with it. curses.use_env(flag) If used, this function should be called before "initscr()" or newterm are called. When *flag* is "False", the values of lines and columns specified in the terminfo database will be used, even if environment variables "LINES" and "COLUMNS" (used by default) are set, or if curses is running in a window (in which case default behavior would be to use the window size if "LINES" and "COLUMNS" are not set). curses.use_default_colors() Allow use of default values for colors on terminals supporting this feature. Use this to support transparency in your application. The default color is assigned to the color number "-1". After calling this function, "init_pair(x, curses.COLOR_RED, -1)" initializes, for instance, color pair *x* to a red foreground color on the default background. curses.wrapper(func, /, *args, **kwargs) Initialize curses and call another callable object, *func*, which should be the rest of your curses-using application. If the application raises an exception, this function will restore the terminal to a sane state before re-raising the exception and generating a traceback. The callable object *func* is then passed the main window ‘stdscr’ as its first argument, followed by any other arguments passed to "wrapper()". Before calling *func*, "wrapper()" turns on cbreak mode, turns off echo, enables the terminal keypad, and initializes colors if the terminal has color support. On exit (whether normally or by exception) it restores cooked mode, turns on echo, and disables the terminal keypad. Window Objects ============== Window objects, as returned by "initscr()" and "newwin()" above, have the following methods and attributes: window.addch(ch[, attr]) window.addch(y, x, ch[, attr]) Paint character *ch* at "(y, x)" with attributes *attr*, overwriting any character previously painted at that location. By default, the character position and attributes are the current settings for the window object. Note: Writing outside the window, subwindow, or pad raises a "curses.error". Attempting to write to the lower right corner of a window, subwindow, or pad will cause an exception to be raised after the character is printed. window.addnstr(str, n[, attr]) window.addnstr(y, x, str, n[, attr]) Paint at most *n* characters of the character string *str* at "(y, x)" with attributes *attr*, overwriting anything previously on the display. window.addstr(str[, attr]) window.addstr(y, x, str[, attr]) Paint the character string *str* at "(y, x)" with attributes *attr*, overwriting anything previously on the display. Note: * Writing outside the window, subwindow, or pad raises "curses.error". Attempting to write to the lower right corner of a window, subwindow, or pad will cause an exception to be raised after the string is printed. * A bug in ncurses, the backend for this Python module, can cause SegFaults when resizing windows. This is fixed in ncurses-6.1-20190511. If you are stuck with an earlier ncurses, you can avoid triggering this if you do not call "addstr()" with a *str* that has embedded newlines. Instead, call "addstr()" separately for each line. window.attroff(attr) Remove attribute *attr* from the “background” set applied to all writes to the current window. window.attron(attr) Add attribute *attr* from the “background” set applied to all writes to the current window. window.attrset(attr) Set the “background” set of attributes to *attr*. This set is initially "0" (no attributes). window.bkgd(ch[, attr]) Set the background property of the window to the character *ch*, with attributes *attr*. The change is then applied to every character position in that window: * The attribute of every character in the window is changed to the new background attribute. * Wherever the former background character appears, it is changed to the new background character. window.bkgdset(ch[, attr]) Set the window’s background. A window’s background consists of a character and any combination of attributes. The attribute part of the background is combined (OR’ed) with all non-blank characters that are written into the window. Both the character and attribute parts of the background are combined with the blank characters. The background becomes a property of the character and moves with the character through any scrolling and insert/delete line/character operations. window.border([ls[, rs[, ts[, bs[, tl[, tr[, bl[, br]]]]]]]]) Draw a border around the edges of the window. Each parameter specifies the character to use for a specific part of the border; see the table below for more details. Note: A "0" value for any parameter will cause the default character to be used for that parameter. Keyword parameters can *not* be used. The defaults are listed in this table: +-------------+-----------------------+-------------------------+ | Parameter | Description | Default value | |=============|=======================|=========================| | *ls* | Left side | "ACS_VLINE" | +-------------+-----------------------+-------------------------+ | *rs* | Right side | "ACS_VLINE" | +-------------+-----------------------+-------------------------+ | *ts* | Top | "ACS_HLINE" | +-------------+-----------------------+-------------------------+ | *bs* | Bottom | "ACS_HLINE" | +-------------+-----------------------+-------------------------+ | *tl* | Upper-left corner | "ACS_ULCORNER" | +-------------+-----------------------+-------------------------+ | *tr* | Upper-right corner | "ACS_URCORNER" | +-------------+-----------------------+-------------------------+ | *bl* | Bottom-left corner | "ACS_LLCORNER" | +-------------+-----------------------+-------------------------+ | *br* | Bottom-right corner | "ACS_LRCORNER" | +-------------+-----------------------+-------------------------+ window.box([vertch, horch]) Similar to "border()", but both *ls* and *rs* are *vertch* and both *ts* and *bs* are *horch*. The default corner characters are always used by this function. window.chgat(attr) window.chgat(num, attr) window.chgat(y, x, attr) window.chgat(y, x, num, attr) Set the attributes of *num* characters at the current cursor position, or at position "(y, x)" if supplied. If *num* is not given or is "-1", the attribute will be set on all the characters to the end of the line. This function moves cursor to position "(y, x)" if supplied. The changed line will be touched using the "touchline()" method so that the contents will be redisplayed by the next window refresh. window.clear() Like "erase()", but also cause the whole window to be repainted upon next call to "refresh()". window.clearok(flag) If *flag* is "True", the next call to "refresh()" will clear the window completely. window.clrtobot() Erase from cursor to the end of the window: all lines below the cursor are deleted, and then the equivalent of "clrtoeol()" is performed. window.clrtoeol() Erase from cursor to the end of the line. window.cursyncup() Update the current cursor position of all the ancestors of the window to reflect the current cursor position of the window. window.delch([y, x]) Delete any character at "(y, x)". window.deleteln() Delete the line under the cursor. All following lines are moved up by one line. window.derwin(begin_y, begin_x) window.derwin(nlines, ncols, begin_y, begin_x) An abbreviation for “derive window”, "derwin()" is the same as calling "subwin()", except that *begin_y* and *begin_x* are relative to the origin of the window, rather than relative to the entire screen. Return a window object for the derived window. window.echochar(ch[, attr]) Add character *ch* with attribute *attr*, and immediately call "refresh()" on the window. window.enclose(y, x) Test whether the given pair of screen-relative character-cell coordinates are enclosed by the given window, returning "True" or "False". It is useful for determining what subset of the screen windows enclose the location of a mouse event. window.encoding Encoding used to encode method arguments (Unicode strings and characters). The encoding attribute is inherited from the parent window when a subwindow is created, for example with "window.subwin()". By default, the locale encoding is used (see "locale.getpreferredencoding()"). New in version 3.3. window.erase() Clear the window. window.getbegyx() Return a tuple "(y, x)" of co-ordinates of upper-left corner. window.getbkgd() Return the given window’s current background character/attribute pair. window.getch([y, x]) Get a character. Note that the integer returned does *not* have to be in ASCII range: function keys, keypad keys and so on are represented by numbers higher than 255. In no-delay mode, return "-1" if there is no input, otherwise wait until a key is pressed. window.get_wch([y, x]) Get a wide character. Return a character for most keys, or an integer for function keys, keypad keys, and other special keys. In no-delay mode, raise an exception if there is no input. New in version 3.3. window.getkey([y, x]) Get a character, returning a string instead of an integer, as "getch()" does. Function keys, keypad keys and other special keys return a multibyte string containing the key name. In no-delay mode, raise an exception if there is no input. window.getmaxyx() Return a tuple "(y, x)" of the height and width of the window. window.getparyx() Return the beginning coordinates of this window relative to its parent window as a tuple "(y, x)". Return "(-1, -1)" if this window has no parent. window.getstr() window.getstr(n) window.getstr(y, x) window.getstr(y, x, n) Read a bytes object from the user, with primitive line editing capacity. window.getyx() Return a tuple "(y, x)" of current cursor position relative to the window’s upper-left corner. window.hline(ch, n) window.hline(y, x, ch, n) Display a horizontal line starting at "(y, x)" with length *n* consisting of the character *ch*. window.idcok(flag) If *flag* is "False", curses no longer considers using the hardware insert/delete character feature of the terminal; if *flag* is "True", use of character insertion and deletion is enabled. When curses is first initialized, use of character insert/delete is enabled by default. window.idlok(flag) If *flag* is "True", "curses" will try and use hardware line editing facilities. Otherwise, line insertion/deletion are disabled. window.immedok(flag) If *flag* is "True", any change in the window image automatically causes the window to be refreshed; you no longer have to call "refresh()" yourself. However, it may degrade performance considerably, due to repeated calls to wrefresh. This option is disabled by default. window.inch([y, x]) Return the character at the given position in the window. The bottom 8 bits are the character proper, and upper bits are the attributes. window.insch(ch[, attr]) window.insch(y, x, ch[, attr]) Paint character *ch* at "(y, x)" with attributes *attr*, moving the line from position *x* right by one character. window.insdelln(nlines) Insert *nlines* lines into the specified window above the current line. The *nlines* bottom lines are lost. For negative *nlines*, delete *nlines* lines starting with the one under the cursor, and move the remaining lines up. The bottom *nlines* lines are cleared. The current cursor position remains the same. window.insertln() Insert a blank line under the cursor. All following lines are moved down by one line. window.insnstr(str, n[, attr]) window.insnstr(y, x, str, n[, attr]) Insert a character string (as many characters as will fit on the line) before the character under the cursor, up to *n* characters. If *n* is zero or negative, the entire string is inserted. All characters to the right of the cursor are shifted right, with the rightmost characters on the line being lost. The cursor position does not change (after moving to *y*, *x*, if specified). window.insstr(str[, attr]) window.insstr(y, x, str[, attr]) Insert a character string (as many characters as will fit on the line) before the character under the cursor. All characters to the right of the cursor are shifted right, with the rightmost characters on the line being lost. The cursor position does not change (after moving to *y*, *x*, if specified). window.instr([n]) window.instr(y, x[, n]) Return a bytes object of characters, extracted from the window starting at the current cursor position, or at *y*, *x* if specified. Attributes are stripped from the characters. If *n* is specified, "instr()" returns a string at most *n* characters long (exclusive of the trailing NUL). window.is_linetouched(line) Return "True" if the specified line was modified since the last call to "refresh()"; otherwise return "False". Raise a "curses.error" exception if *line* is not valid for the given window. window.is_wintouched() Return "True" if the specified window was modified since the last call to "refresh()"; otherwise return "False". window.keypad(flag) If *flag* is "True", escape sequences generated by some keys (keypad, function keys) will be interpreted by "curses". If *flag* is "False", escape sequences will be left as is in the input stream. window.leaveok(flag) If *flag* is "True", cursor is left where it is on update, instead of being at “cursor position.” This reduces cursor movement where possible. If possible the cursor will be made invisible. If *flag* is "False", cursor will always be at “cursor position” after an update. window.move(new_y, new_x) Move cursor to "(new_y, new_x)". window.mvderwin(y, x) Move the window inside its parent window. The screen-relative parameters of the window are not changed. This routine is used to display different parts of the parent window at the same physical position on the screen. window.mvwin(new_y, new_x) Move the window so its upper-left corner is at "(new_y, new_x)". window.nodelay(flag) If *flag* is "True", "getch()" will be non-blocking. window.notimeout(flag) If *flag* is "True", escape sequences will not be timed out. If *flag* is "False", after a few milliseconds, an escape sequence will not be interpreted, and will be left in the input stream as is. window.noutrefresh() Mark for refresh but wait. This function updates the data structure representing the desired state of the window, but does not force an update of the physical screen. To accomplish that, call "doupdate()". window.overlay(destwin[, sminrow, smincol, dminrow, dmincol, dmaxrow, dmaxcol]) Overlay the window on top of *destwin*. The windows need not be the same size, only the overlapping region is copied. This copy is non- destructive, which means that the current background character does not overwrite the old contents of *destwin*. To get fine-grained control over the copied region, the second form of "overlay()" can be used. *sminrow* and *smincol* are the upper- left coordinates of the source window, and the other variables mark a rectangle in the destination window. window.overwrite(destwin[, sminrow, smincol, dminrow, dmincol, dmaxrow, dmaxcol]) Overwrite the window on top of *destwin*. The windows need not be the same size, in which case only the overlapping region is copied. This copy is destructive, which means that the current background character overwrites the old contents of *destwin*. To get fine-grained control over the copied region, the second form of "overwrite()" can be used. *sminrow* and *smincol* are the upper-left coordinates of the source window, the other variables mark a rectangle in the destination window. window.putwin(file) Write all data associated with the window into the provided file object. This information can be later retrieved using the "getwin()" function. window.redrawln(beg, num) Indicate that the *num* screen lines, starting at line *beg*, are corrupted and should be completely redrawn on the next "refresh()" call. window.redrawwin() Touch the entire window, causing it to be completely redrawn on the next "refresh()" call. window.refresh([pminrow, pmincol, sminrow, smincol, smaxrow, smaxcol]) Update the display immediately (sync actual screen with previous drawing/deleting methods). The 6 optional arguments can only be specified when the window is a pad created with "newpad()". The additional parameters are needed to indicate what part of the pad and screen are involved. *pminrow* and *pmincol* specify the upper left-hand corner of the rectangle to be displayed in the pad. *sminrow*, *smincol*, *smaxrow*, and *smaxcol* specify the edges of the rectangle to be displayed on the screen. The lower right-hand corner of the rectangle to be displayed in the pad is calculated from the screen coordinates, since the rectangles must be the same size. Both rectangles must be entirely contained within their respective structures. Negative values of *pminrow*, *pmincol*, *sminrow*, or *smincol* are treated as if they were zero. window.resize(nlines, ncols) Reallocate storage for a curses window to adjust its dimensions to the specified values. If either dimension is larger than the current values, the window’s data is filled with blanks that have the current background rendition (as set by "bkgdset()") merged into them. window.scroll([lines=1]) Scroll the screen or scrolling region upward by *lines* lines. window.scrollok(flag) Control what happens when the cursor of a window is moved off the edge of the window or scrolling region, either as a result of a newline action on the bottom line, or typing the last character of the last line. If *flag* is "False", the cursor is left on the bottom line. If *flag* is "True", the window is scrolled up one line. Note that in order to get the physical scrolling effect on the terminal, it is also necessary to call "idlok()". window.setscrreg(top, bottom) Set the scrolling region from line *top* to line *bottom*. All scrolling actions will take place in this region. window.standend() Turn off the standout attribute. On some terminals this has the side effect of turning off all attributes. window.standout() Turn on attribute *A_STANDOUT*. window.subpad(begin_y, begin_x) window.subpad(nlines, ncols, begin_y, begin_x) Return a sub-window, whose upper-left corner is at "(begin_y, begin_x)", and whose width/height is *ncols*/*nlines*. window.subwin(begin_y, begin_x) window.subwin(nlines, ncols, begin_y, begin_x) Return a sub-window, whose upper-left corner is at "(begin_y, begin_x)", and whose width/height is *ncols*/*nlines*. By default, the sub-window will extend from the specified position to the lower right corner of the window. window.syncdown() Touch each location in the window that has been touched in any of its ancestor windows. This routine is called by "refresh()", so it should almost never be necessary to call it manually. window.syncok(flag) If *flag* is "True", then "syncup()" is called automatically whenever there is a change in the window. window.syncup() Touch all locations in ancestors of the window that have been changed in the window. window.timeout(delay) Set blocking or non-blocking read behavior for the window. If *delay* is negative, blocking read is used (which will wait indefinitely for input). If *delay* is zero, then non-blocking read is used, and "getch()" will return "-1" if no input is waiting. If *delay* is positive, then "getch()" will block for *delay* milliseconds, and return "-1" if there is still no input at the end of that time. window.touchline(start, count[, changed]) Pretend *count* lines have been changed, starting with line *start*. If *changed* is supplied, it specifies whether the affected lines are marked as having been changed (*changed*"=True") or unchanged (*changed*"=False"). window.touchwin() Pretend the whole window has been changed, for purposes of drawing optimizations. window.untouchwin() Mark all lines in the window as unchanged since the last call to "refresh()". window.vline(ch, n) window.vline(y, x, ch, n) Display a vertical line starting at "(y, x)" with length *n* consisting of the character *ch*. Constants ========= The "curses" module defines the following data members: curses.ERR Some curses routines that return an integer, such as "getch()", return "ERR" upon failure. curses.OK Some curses routines that return an integer, such as "napms()", return "OK" upon success. curses.version A bytes object representing the current version of the module. Also available as "__version__". curses.ncurses_version A named tuple containing the three components of the ncurses library version: *major*, *minor*, and *patch*. All values are integers. The components can also be accessed by name, so "curses.ncurses_version[0]" is equivalent to "curses.ncurses_version.major" and so on. Availability: if the ncurses library is used. New in version 3.8. Some constants are available to specify character cell attributes. The exact constants available are system dependent. +--------------------+---------------------------------+ | Attribute | Meaning | |====================|=================================| | "A_ALTCHARSET" | Alternate character set mode | +--------------------+---------------------------------+ | "A_BLINK" | Blink mode | +--------------------+---------------------------------+ | "A_BOLD" | Bold mode | +--------------------+---------------------------------+ | "A_DIM" | Dim mode | +--------------------+---------------------------------+ | "A_INVIS" | Invisible or blank mode | +--------------------+---------------------------------+ | "A_ITALIC" | Italic mode | +--------------------+---------------------------------+ | "A_NORMAL" | Normal attribute | +--------------------+---------------------------------+ | "A_PROTECT" | Protected mode | +--------------------+---------------------------------+ | "A_REVERSE" | Reverse background and | | | foreground colors | +--------------------+---------------------------------+ | "A_STANDOUT" | Standout mode | +--------------------+---------------------------------+ | "A_UNDERLINE" | Underline mode | +--------------------+---------------------------------+ | "A_HORIZONTAL" | Horizontal highlight | +--------------------+---------------------------------+ | "A_LEFT" | Left highlight | +--------------------+---------------------------------+ | "A_LOW" | Low highlight | +--------------------+---------------------------------+ | "A_RIGHT" | Right highlight | +--------------------+---------------------------------+ | "A_TOP" | Top highlight | +--------------------+---------------------------------+ | "A_VERTICAL" | Vertical highlight | +--------------------+---------------------------------+ | "A_CHARTEXT" | Bit-mask to extract a character | +--------------------+---------------------------------+ New in version 3.7: "A_ITALIC" was added. Several constants are available to extract corresponding attributes returned by some methods. +--------------------+---------------------------------+ | Bit-mask | Meaning | |====================|=================================| | "A_ATTRIBUTES" | Bit-mask to extract attributes | +--------------------+---------------------------------+ | "A_CHARTEXT" | Bit-mask to extract a character | +--------------------+---------------------------------+ | "A_COLOR" | Bit-mask to extract color-pair | | | field information | +--------------------+---------------------------------+ Keys are referred to by integer constants with names starting with "KEY_". The exact keycaps available are system dependent. +---------------------+----------------------------------------------+ | Key constant | Key | |=====================|==============================================| | "KEY_MIN" | Minimum key value | +---------------------+----------------------------------------------+ | "KEY_BREAK" | Break key (unreliable) | +---------------------+----------------------------------------------+ | "KEY_DOWN" | Down-arrow | +---------------------+----------------------------------------------+ | "KEY_UP" | Up-arrow | +---------------------+----------------------------------------------+ | "KEY_LEFT" | Left-arrow | +---------------------+----------------------------------------------+ | "KEY_RIGHT" | Right-arrow | +---------------------+----------------------------------------------+ | "KEY_HOME" | Home key (upward+left arrow) | +---------------------+----------------------------------------------+ | "KEY_BACKSPACE" | Backspace (unreliable) | +---------------------+----------------------------------------------+ | "KEY_F0" | Function keys. Up to 64 function keys are | | | supported. | +---------------------+----------------------------------------------+ | "KEY_Fn" | Value of function key *n* | +---------------------+----------------------------------------------+ | "KEY_DL" | Delete line | +---------------------+----------------------------------------------+ | "KEY_IL" | Insert line | +---------------------+----------------------------------------------+ | "KEY_DC" | Delete character | +---------------------+----------------------------------------------+ | "KEY_IC" | Insert char or enter insert mode | +---------------------+----------------------------------------------+ | "KEY_EIC" | Exit insert char mode | +---------------------+----------------------------------------------+ | "KEY_CLEAR" | Clear screen | +---------------------+----------------------------------------------+ | "KEY_EOS" | Clear to end of screen | +---------------------+----------------------------------------------+ | "KEY_EOL" | Clear to end of line | +---------------------+----------------------------------------------+ | "KEY_SF" | Scroll 1 line forward | +---------------------+----------------------------------------------+ | "KEY_SR" | Scroll 1 line backward (reverse) | +---------------------+----------------------------------------------+ | "KEY_NPAGE" | Next page | +---------------------+----------------------------------------------+ | "KEY_PPAGE" | Previous page | +---------------------+----------------------------------------------+ | "KEY_STAB" | Set tab | +---------------------+----------------------------------------------+ | "KEY_CTAB" | Clear tab | +---------------------+----------------------------------------------+ | "KEY_CATAB" | Clear all tabs | +---------------------+----------------------------------------------+ | "KEY_ENTER" | Enter or send (unreliable) | +---------------------+----------------------------------------------+ | "KEY_SRESET" | Soft (partial) reset (unreliable) | +---------------------+----------------------------------------------+ | "KEY_RESET" | Reset or hard reset (unreliable) | +---------------------+----------------------------------------------+ | "KEY_PRINT" | Print | +---------------------+----------------------------------------------+ | "KEY_LL" | Home down or bottom (lower left) | +---------------------+----------------------------------------------+ | "KEY_A1" | Upper left of keypad | +---------------------+----------------------------------------------+ | "KEY_A3" | Upper right of keypad | +---------------------+----------------------------------------------+ | "KEY_B2" | Center of keypad | +---------------------+----------------------------------------------+ | "KEY_C1" | Lower left of keypad | +---------------------+----------------------------------------------+ | "KEY_C3" | Lower right of keypad | +---------------------+----------------------------------------------+ | "KEY_BTAB" | Back tab | +---------------------+----------------------------------------------+ | "KEY_BEG" | Beg (beginning) | +---------------------+----------------------------------------------+ | "KEY_CANCEL" | Cancel | +---------------------+----------------------------------------------+ | "KEY_CLOSE" | Close | +---------------------+----------------------------------------------+ | "KEY_COMMAND" | Cmd (command) | +---------------------+----------------------------------------------+ | "KEY_COPY" | Copy | +---------------------+----------------------------------------------+ | "KEY_CREATE" | Create | +---------------------+----------------------------------------------+ | "KEY_END" | End | +---------------------+----------------------------------------------+ | "KEY_EXIT" | Exit | +---------------------+----------------------------------------------+ | "KEY_FIND" | Find | +---------------------+----------------------------------------------+ | "KEY_HELP" | Help | +---------------------+----------------------------------------------+ | "KEY_MARK" | Mark | +---------------------+----------------------------------------------+ | "KEY_MESSAGE" | Message | +---------------------+----------------------------------------------+ | "KEY_MOVE" | Move | +---------------------+----------------------------------------------+ | "KEY_NEXT" | Next | +---------------------+----------------------------------------------+ | "KEY_OPEN" | Open | +---------------------+----------------------------------------------+ | "KEY_OPTIONS" | Options | +---------------------+----------------------------------------------+ | "KEY_PREVIOUS" | Prev (previous) | +---------------------+----------------------------------------------+ | "KEY_REDO" | Redo | +---------------------+----------------------------------------------+ | "KEY_REFERENCE" | Ref (reference) | +---------------------+----------------------------------------------+ | "KEY_REFRESH" | Refresh | +---------------------+----------------------------------------------+ | "KEY_REPLACE" | Replace | +---------------------+----------------------------------------------+ | "KEY_RESTART" | Restart | +---------------------+----------------------------------------------+ | "KEY_RESUME" | Resume | +---------------------+----------------------------------------------+ | "KEY_SAVE" | Save | +---------------------+----------------------------------------------+ | "KEY_SBEG" | Shifted Beg (beginning) | +---------------------+----------------------------------------------+ | "KEY_SCANCEL" | Shifted Cancel | +---------------------+----------------------------------------------+ | "KEY_SCOMMAND" | Shifted Command | +---------------------+----------------------------------------------+ | "KEY_SCOPY" | Shifted Copy | +---------------------+----------------------------------------------+ | "KEY_SCREATE" | Shifted Create | +---------------------+----------------------------------------------+ | "KEY_SDC" | Shifted Delete char | +---------------------+----------------------------------------------+ | "KEY_SDL" | Shifted Delete line | +---------------------+----------------------------------------------+ | "KEY_SELECT" | Select | +---------------------+----------------------------------------------+ | "KEY_SEND" | Shifted End | +---------------------+----------------------------------------------+ | "KEY_SEOL" | Shifted Clear line | +---------------------+----------------------------------------------+ | "KEY_SEXIT" | Shifted Exit | +---------------------+----------------------------------------------+ | "KEY_SFIND" | Shifted Find | +---------------------+----------------------------------------------+ | "KEY_SHELP" | Shifted Help | +---------------------+----------------------------------------------+ | "KEY_SHOME" | Shifted Home | +---------------------+----------------------------------------------+ | "KEY_SIC" | Shifted Input | +---------------------+----------------------------------------------+ | "KEY_SLEFT" | Shifted Left arrow | +---------------------+----------------------------------------------+ | "KEY_SMESSAGE" | Shifted Message | +---------------------+----------------------------------------------+ | "KEY_SMOVE" | Shifted Move | +---------------------+----------------------------------------------+ | "KEY_SNEXT" | Shifted Next | +---------------------+----------------------------------------------+ | "KEY_SOPTIONS" | Shifted Options | +---------------------+----------------------------------------------+ | "KEY_SPREVIOUS" | Shifted Prev | +---------------------+----------------------------------------------+ | "KEY_SPRINT" | Shifted Print | +---------------------+----------------------------------------------+ | "KEY_SREDO" | Shifted Redo | +---------------------+----------------------------------------------+ | "KEY_SREPLACE" | Shifted Replace | +---------------------+----------------------------------------------+ | "KEY_SRIGHT" | Shifted Right arrow | +---------------------+----------------------------------------------+ | "KEY_SRSUME" | Shifted Resume | +---------------------+----------------------------------------------+ | "KEY_SSAVE" | Shifted Save | +---------------------+----------------------------------------------+ | "KEY_SSUSPEND" | Shifted Suspend | +---------------------+----------------------------------------------+ | "KEY_SUNDO" | Shifted Undo | +---------------------+----------------------------------------------+ | "KEY_SUSPEND" | Suspend | +---------------------+----------------------------------------------+ | "KEY_UNDO" | Undo | +---------------------+----------------------------------------------+ | "KEY_MOUSE" | Mouse event has occurred | +---------------------+----------------------------------------------+ | "KEY_RESIZE" | Terminal resize event | +---------------------+----------------------------------------------+ | "KEY_MAX" | Maximum key value | +---------------------+----------------------------------------------+ On VT100s and their software emulations, such as X terminal emulators, there are normally at least four function keys ("KEY_F1", "KEY_F2", "KEY_F3", "KEY_F4") available, and the arrow keys mapped to "KEY_UP", "KEY_DOWN", "KEY_LEFT" and "KEY_RIGHT" in the obvious way. If your machine has a PC keyboard, it is safe to expect arrow keys and twelve function keys (older PC keyboards may have only ten function keys); also, the following keypad mappings are standard: +--------------------+-------------+ | Keycap | Constant | |====================|=============| | "Insert" | KEY_IC | +--------------------+-------------+ | "Delete" | KEY_DC | +--------------------+-------------+ | "Home" | KEY_HOME | +--------------------+-------------+ | "End" | KEY_END | +--------------------+-------------+ | "Page Up" | KEY_PPAGE | +--------------------+-------------+ | "Page Down" | KEY_NPAGE | +--------------------+-------------+ The following table lists characters from the alternate character set. These are inherited from the VT100 terminal, and will generally be available on software emulations such as X terminals. When there is no graphic available, curses falls back on a crude printable ASCII approximation. Note: These are available only after "initscr()" has been called. +--------------------+--------------------------------------------+ | ACS code | Meaning | |====================|============================================| | "ACS_BBSS" | alternate name for upper right corner | +--------------------+--------------------------------------------+ | "ACS_BLOCK" | solid square block | +--------------------+--------------------------------------------+ | "ACS_BOARD" | board of squares | +--------------------+--------------------------------------------+ | "ACS_BSBS" | alternate name for horizontal line | +--------------------+--------------------------------------------+ | "ACS_BSSB" | alternate name for upper left corner | +--------------------+--------------------------------------------+ | "ACS_BSSS" | alternate name for top tee | +--------------------+--------------------------------------------+ | "ACS_BTEE" | bottom tee | +--------------------+--------------------------------------------+ | "ACS_BULLET" | bullet | +--------------------+--------------------------------------------+ | "ACS_CKBOARD" | checker board (stipple) | +--------------------+--------------------------------------------+ | "ACS_DARROW" | arrow pointing down | +--------------------+--------------------------------------------+ | "ACS_DEGREE" | degree symbol | +--------------------+--------------------------------------------+ | "ACS_DIAMOND" | diamond | +--------------------+--------------------------------------------+ | "ACS_GEQUAL" | greater-than-or-equal-to | +--------------------+--------------------------------------------+ | "ACS_HLINE" | horizontal line | +--------------------+--------------------------------------------+ | "ACS_LANTERN" | lantern symbol | +--------------------+--------------------------------------------+ | "ACS_LARROW" | left arrow | +--------------------+--------------------------------------------+ | "ACS_LEQUAL" | less-than-or-equal-to | +--------------------+--------------------------------------------+ | "ACS_LLCORNER" | lower left-hand corner | +--------------------+--------------------------------------------+ | "ACS_LRCORNER" | lower right-hand corner | +--------------------+--------------------------------------------+ | "ACS_LTEE" | left tee | +--------------------+--------------------------------------------+ | "ACS_NEQUAL" | not-equal sign | +--------------------+--------------------------------------------+ | "ACS_PI" | letter pi | +--------------------+--------------------------------------------+ | "ACS_PLMINUS" | plus-or-minus sign | +--------------------+--------------------------------------------+ | "ACS_PLUS" | big plus sign | +--------------------+--------------------------------------------+ | "ACS_RARROW" | right arrow | +--------------------+--------------------------------------------+ | "ACS_RTEE" | right tee | +--------------------+--------------------------------------------+ | "ACS_S1" | scan line 1 | +--------------------+--------------------------------------------+ | "ACS_S3" | scan line 3 | +--------------------+--------------------------------------------+ | "ACS_S7" | scan line 7 | +--------------------+--------------------------------------------+ | "ACS_S9" | scan line 9 | +--------------------+--------------------------------------------+ | "ACS_SBBS" | alternate name for lower right corner | +--------------------+--------------------------------------------+ | "ACS_SBSB" | alternate name for vertical line | +--------------------+--------------------------------------------+ | "ACS_SBSS" | alternate name for right tee | +--------------------+--------------------------------------------+ | "ACS_SSBB" | alternate name for lower left corner | +--------------------+--------------------------------------------+ | "ACS_SSBS" | alternate name for bottom tee | +--------------------+--------------------------------------------+ | "ACS_SSSB" | alternate name for left tee | +--------------------+--------------------------------------------+ | "ACS_SSSS" | alternate name for crossover or big plus | +--------------------+--------------------------------------------+ | "ACS_STERLING" | pound sterling | +--------------------+--------------------------------------------+ | "ACS_TTEE" | top tee | +--------------------+--------------------------------------------+ | "ACS_UARROW" | up arrow | +--------------------+--------------------------------------------+ | "ACS_ULCORNER" | upper left corner | +--------------------+--------------------------------------------+ | "ACS_URCORNER" | upper right corner | +--------------------+--------------------------------------------+ | "ACS_VLINE" | vertical line | +--------------------+--------------------------------------------+ The following table lists the predefined colors: +---------------------+------------------------------+ | Constant | Color | |=====================|==============================| | "COLOR_BLACK" | Black | +---------------------+------------------------------+ | "COLOR_BLUE" | Blue | +---------------------+------------------------------+ | "COLOR_CYAN" | Cyan (light greenish blue) | +---------------------+------------------------------+ | "COLOR_GREEN" | Green | +---------------------+------------------------------+ | "COLOR_MAGENTA" | Magenta (purplish red) | +---------------------+------------------------------+ | "COLOR_RED" | Red | +---------------------+------------------------------+ | "COLOR_WHITE" | White | +---------------------+------------------------------+ | "COLOR_YELLOW" | Yellow | +---------------------+------------------------------+ "curses.textpad" — Text input widget for curses programs ******************************************************** The "curses.textpad" module provides a "Textbox" class that handles elementary text editing in a curses window, supporting a set of keybindings resembling those of Emacs (thus, also of Netscape Navigator, BBedit 6.x, FrameMaker, and many other programs). The module also provides a rectangle-drawing function useful for framing text boxes or for other purposes. The module "curses.textpad" defines the following function: curses.textpad.rectangle(win, uly, ulx, lry, lrx) Draw a rectangle. The first argument must be a window object; the remaining arguments are coordinates relative to that window. The second and third arguments are the y and x coordinates of the upper left hand corner of the rectangle to be drawn; the fourth and fifth arguments are the y and x coordinates of the lower right hand corner. The rectangle will be drawn using VT100/IBM PC forms characters on terminals that make this possible (including xterm and most other software terminal emulators). Otherwise it will be drawn with ASCII dashes, vertical bars, and plus signs. Textbox objects =============== You can instantiate a "Textbox" object as follows: class curses.textpad.Textbox(win) Return a textbox widget object. The *win* argument should be a curses window object in which the textbox is to be contained. The edit cursor of the textbox is initially located at the upper left hand corner of the containing window, with coordinates "(0, 0)". The instance’s "stripspaces" flag is initially on. "Textbox" objects have the following methods: edit([validator]) This is the entry point you will normally use. It accepts editing keystrokes until one of the termination keystrokes is entered. If *validator* is supplied, it must be a function. It will be called for each keystroke entered with the keystroke as a parameter; command dispatch is done on the result. This method returns the window contents as a string; whether blanks in the window are included is affected by the "stripspaces" attribute. do_command(ch) Process a single command keystroke. Here are the supported special keystrokes: +--------------------+---------------------------------------------+ | Keystroke | Action | |====================|=============================================| | "Control-A" | Go to left edge of window. | +--------------------+---------------------------------------------+ | "Control-B" | Cursor left, wrapping to previous line if | | | appropriate. | +--------------------+---------------------------------------------+ | "Control-D" | Delete character under cursor. | +--------------------+---------------------------------------------+ | "Control-E" | Go to right edge (stripspaces off) or end | | | of line (stripspaces on). | +--------------------+---------------------------------------------+ | "Control-F" | Cursor right, wrapping to next line when | | | appropriate. | +--------------------+---------------------------------------------+ | "Control-G" | Terminate, returning the window contents. | +--------------------+---------------------------------------------+ | "Control-H" | Delete character backward. | +--------------------+---------------------------------------------+ | "Control-J" | Terminate if the window is 1 line, | | | otherwise insert newline. | +--------------------+---------------------------------------------+ | "Control-K" | If line is blank, delete it, otherwise | | | clear to end of line. | +--------------------+---------------------------------------------+ | "Control-L" | Refresh screen. | +--------------------+---------------------------------------------+ | "Control-N" | Cursor down; move down one line. | +--------------------+---------------------------------------------+ | "Control-O" | Insert a blank line at cursor location. | +--------------------+---------------------------------------------+ | "Control-P" | Cursor up; move up one line. | +--------------------+---------------------------------------------+ Move operations do nothing if the cursor is at an edge where the movement is not possible. The following synonyms are supported where possible: +--------------------------+--------------------+ | Constant | Keystroke | |==========================|====================| | "KEY_LEFT" | "Control-B" | +--------------------------+--------------------+ | "KEY_RIGHT" | "Control-F" | +--------------------------+--------------------+ | "KEY_UP" | "Control-P" | +--------------------------+--------------------+ | "KEY_DOWN" | "Control-N" | +--------------------------+--------------------+ | "KEY_BACKSPACE" | "Control-h" | +--------------------------+--------------------+ All other keystrokes are treated as a command to insert the given character and move right (with line wrapping). gather() Return the window contents as a string; whether blanks in the window are included is affected by the "stripspaces" member. stripspaces This attribute is a flag which controls the interpretation of blanks in the window. When it is on, trailing blanks on each line are ignored; any cursor motion that would land the cursor on a trailing blank goes to the end of that line instead, and trailing blanks are stripped when the window contents are gathered. Custom Python Interpreters ************************** The modules described in this chapter allow writing interfaces similar to Python’s interactive interpreter. If you want a Python interpreter that supports some special feature in addition to the Python language, you should look at the "code" module. (The "codeop" module is lower- level, used to support compiling a possibly-incomplete chunk of Python code.) The full list of modules described in this chapter is: * "code" — Interpreter base classes * Interactive Interpreter Objects * Interactive Console Objects * "codeop" — Compile Python code "dataclasses" — Data Classes **************************** **Source code:** Lib/dataclasses.py ====================================================================== This module provides a decorator and functions for automatically adding generated *special method*s such as "__init__()" and "__repr__()" to user-defined classes. It was originally described in **PEP 557**. The member variables to use in these generated methods are defined using **PEP 526** type annotations. For example, this code: from dataclasses import dataclass @dataclass class InventoryItem: """Class for keeping track of an item in inventory.""" name: str unit_price: float quantity_on_hand: int = 0 def total_cost(self) -> float: return self.unit_price * self.quantity_on_hand will add, among other things, a "__init__()" that looks like: def __init__(self, name: str, unit_price: float, quantity_on_hand: int = 0): self.name = name self.unit_price = unit_price self.quantity_on_hand = quantity_on_hand Note that this method is automatically added to the class: it is not directly specified in the "InventoryItem" definition shown above. New in version 3.7. Module-level decorators, classes, and functions =============================================== @dataclasses.dataclass(*, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False) This function is a *decorator* that is used to add generated *special method*s to classes, as described below. The "dataclass()" decorator examines the class to find "field"s. A "field" is defined as a class variable that has a *type annotation*. With two exceptions described below, nothing in "dataclass()" examines the type specified in the variable annotation. The order of the fields in all of the generated methods is the order in which they appear in the class definition. The "dataclass()" decorator will add various “dunder” methods to the class, described below. If any of the added methods already exist in the class, the behavior depends on the parameter, as documented below. The decorator returns the same class that it is called on; no new class is created. If "dataclass()" is used just as a simple decorator with no parameters, it acts as if it has the default values documented in this signature. That is, these three uses of "dataclass()" are equivalent: @dataclass class C: ... @dataclass() class C: ... @dataclass(init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False) class C: ... The parameters to "dataclass()" are: * "init": If true (the default), a "__init__()" method will be generated. If the class already defines "__init__()", this parameter is ignored. * "repr": If true (the default), a "__repr__()" method will be generated. The generated repr string will have the class name and the name and repr of each field, in the order they are defined in the class. Fields that are marked as being excluded from the repr are not included. For example: "InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=10)". If the class already defines "__repr__()", this parameter is ignored. * "eq": If true (the default), an "__eq__()" method will be generated. This method compares the class as if it were a tuple of its fields, in order. Both instances in the comparison must be of the identical type. If the class already defines "__eq__()", this parameter is ignored. * "order": If true (the default is "False"), "__lt__()", "__le__()", "__gt__()", and "__ge__()" methods will be generated. These compare the class as if it were a tuple of its fields, in order. Both instances in the comparison must be of the identical type. If "order" is true and "eq" is false, a "ValueError" is raised. If the class already defines any of "__lt__()", "__le__()", "__gt__()", or "__ge__()", then "TypeError" is raised. * "unsafe_hash": If "False" (the default), a "__hash__()" method is generated according to how "eq" and "frozen" are set. "__hash__()" is used by built-in "hash()", and when objects are added to hashed collections such as dictionaries and sets. Having a "__hash__()" implies that instances of the class are immutable. Mutability is a complicated property that depends on the programmer’s intent, the existence and behavior of "__eq__()", and the values of the "eq" and "frozen" flags in the "dataclass()" decorator. By default, "dataclass()" will not implicitly add a "__hash__()" method unless it is safe to do so. Neither will it add or change an existing explicitly defined "__hash__()" method. Setting the class attribute "__hash__ = None" has a specific meaning to Python, as described in the "__hash__()" documentation. If "__hash__()" is not explicitly defined, or if it is set to "None", then "dataclass()" *may* add an implicit "__hash__()" method. Although not recommended, you can force "dataclass()" to create a "__hash__()" method with "unsafe_hash=True". This might be the case if your class is logically immutable but can nonetheless be mutated. This is a specialized use case and should be considered carefully. Here are the rules governing implicit creation of a "__hash__()" method. Note that you cannot both have an explicit "__hash__()" method in your dataclass and set "unsafe_hash=True"; this will result in a "TypeError". If "eq" and "frozen" are both true, by default "dataclass()" will generate a "__hash__()" method for you. If "eq" is true and "frozen" is false, "__hash__()" will be set to "None", marking it unhashable (which it is, since it is mutable). If "eq" is false, "__hash__()" will be left untouched meaning the "__hash__()" method of the superclass will be used (if the superclass is "object", this means it will fall back to id-based hashing). * "frozen": If true (the default is "False"), assigning to fields will generate an exception. This emulates read-only frozen instances. If "__setattr__()" or "__delattr__()" is defined in the class, then "TypeError" is raised. See the discussion below. "field"s may optionally specify a default value, using normal Python syntax: @dataclass class C: a: int # 'a' has no default value b: int = 0 # assign a default value for 'b' In this example, both "a" and "b" will be included in the added "__init__()" method, which will be defined as: def __init__(self, a: int, b: int = 0): "TypeError" will be raised if a field without a default value follows a field with a default value. This is true whether this occurs in a single class, or as a result of class inheritance. dataclasses.field(*, default=MISSING, default_factory=MISSING, repr=True, hash=None, init=True, compare=True, metadata=None) For common and simple use cases, no other functionality is required. There are, however, some dataclass features that require additional per-field information. To satisfy this need for additional information, you can replace the default field value with a call to the provided "field()" function. For example: @dataclass class C: mylist: list[int] = field(default_factory=list) c = C() c.mylist += [1, 2, 3] As shown above, the "MISSING" value is a sentinel object used to detect if the "default" and "default_factory" parameters are provided. This sentinel is used because "None" is a valid value for "default". No code should directly use the "MISSING" value. The parameters to "field()" are: * "default": If provided, this will be the default value for this field. This is needed because the "field()" call itself replaces the normal position of the default value. * "default_factory": If provided, it must be a zero-argument callable that will be called when a default value is needed for this field. Among other purposes, this can be used to specify fields with mutable default values, as discussed below. It is an error to specify both "default" and "default_factory". * "init": If true (the default), this field is included as a parameter to the generated "__init__()" method. * "repr": If true (the default), this field is included in the string returned by the generated "__repr__()" method. * "compare": If true (the default), this field is included in the generated equality and comparison methods ("__eq__()", "__gt__()", et al.). * "hash": This can be a bool or "None". If true, this field is included in the generated "__hash__()" method. If "None" (the default), use the value of "compare": this would normally be the expected behavior. A field should be considered in the hash if it’s used for comparisons. Setting this value to anything other than "None" is discouraged. One possible reason to set "hash=False" but "compare=True" would be if a field is expensive to compute a hash value for, that field is needed for equality testing, and there are other fields that contribute to the type’s hash value. Even if a field is excluded from the hash, it will still be used for comparisons. * "metadata": This can be a mapping or None. None is treated as an empty dict. This value is wrapped in "MappingProxyType()" to make it read-only, and exposed on the "Field" object. It is not used at all by Data Classes, and is provided as a third-party extension mechanism. Multiple third-parties can each have their own key, to use as a namespace in the metadata. If the default value of a field is specified by a call to "field()", then the class attribute for this field will be replaced by the specified "default" value. If no "default" is provided, then the class attribute will be deleted. The intent is that after the "dataclass()" decorator runs, the class attributes will all contain the default values for the fields, just as if the default value itself were specified. For example, after: @dataclass class C: x: int y: int = field(repr=False) z: int = field(repr=False, default=10) t: int = 20 The class attribute "C.z" will be "10", the class attribute "C.t" will be "20", and the class attributes "C.x" and "C.y" will not be set. class dataclasses.Field "Field" objects describe each defined field. These objects are created internally, and are returned by the "fields()" module-level method (see below). Users should never instantiate a "Field" object directly. Its documented attributes are: * "name": The name of the field. * "type": The type of the field. * "default", "default_factory", "init", "repr", "hash", "compare", and "metadata" have the identical meaning and values as they do in the "field()" declaration. Other attributes may exist, but they are private and must not be inspected or relied on. dataclasses.fields(class_or_instance) Returns a tuple of "Field" objects that define the fields for this dataclass. Accepts either a dataclass, or an instance of a dataclass. Raises "TypeError" if not passed a dataclass or instance of one. Does not return pseudo-fields which are "ClassVar" or "InitVar". dataclasses.asdict(obj, *, dict_factory=dict) Converts the dataclass "obj" to a dict (by using the factory function "dict_factory"). Each dataclass is converted to a dict of its fields, as "name: value" pairs. dataclasses, dicts, lists, and tuples are recursed into. Other objects are copied with "copy.deepcopy()". Example of using "asdict()" on nested dataclasses: @dataclass class Point: x: int y: int @dataclass class C: mylist: list[Point] p = Point(10, 20) assert asdict(p) == {'x': 10, 'y': 20} c = C([Point(0, 0), Point(10, 4)]) assert asdict(c) == {'mylist': [{'x': 0, 'y': 0}, {'x': 10, 'y': 4}]} To create a shallow copy, the following workaround may be used: dict((field.name, getattr(obj, field.name)) for field in fields(obj)) "asdict()" raises "TypeError" if "obj" is not a dataclass instance. dataclasses.astuple(obj, *, tuple_factory=tuple) Converts the dataclass "obj" to a tuple (by using the factory function "tuple_factory"). Each dataclass is converted to a tuple of its field values. dataclasses, dicts, lists, and tuples are recursed into. Other objects are copied with "copy.deepcopy()". Continuing from the previous example: assert astuple(p) == (10, 20) assert astuple(c) == ([(0, 0), (10, 4)],) To create a shallow copy, the following workaround may be used: tuple(getattr(obj, field.name) for field in dataclasses.fields(obj)) "astuple()" raises "TypeError" if "obj" is not a dataclass instance. dataclasses.make_dataclass(cls_name, fields, *, bases=(), namespace=None, init=True, repr=True, eq=True, order=False, unsafe_hash=False, frozen=False) Creates a new dataclass with name "cls_name", fields as defined in "fields", base classes as given in "bases", and initialized with a namespace as given in "namespace". "fields" is an iterable whose elements are each either "name", "(name, type)", or "(name, type, Field)". If just "name" is supplied, "typing.Any" is used for "type". The values of "init", "repr", "eq", "order", "unsafe_hash", and "frozen" have the same meaning as they do in "dataclass()". This function is not strictly required, because any Python mechanism for creating a new class with "__annotations__" can then apply the "dataclass()" function to convert that class to a dataclass. This function is provided as a convenience. For example: C = make_dataclass('C', [('x', int), 'y', ('z', int, field(default=5))], namespace={'add_one': lambda self: self.x + 1}) Is equivalent to: @dataclass class C: x: int y: 'typing.Any' z: int = 5 def add_one(self): return self.x + 1 dataclasses.replace(obj, /, **changes) Creates a new object of the same type as "obj", replacing fields with values from "changes". If "obj" is not a Data Class, raises "TypeError". If values in "changes" do not specify fields, raises "TypeError". The newly returned object is created by calling the "__init__()" method of the dataclass. This ensures that "__post_init__()", if present, is also called. Init-only variables without default values, if any exist, must be specified on the call to "replace()" so that they can be passed to "__init__()" and "__post_init__()". It is an error for "changes" to contain any fields that are defined as having "init=False". A "ValueError" will be raised in this case. Be forewarned about how "init=False" fields work during a call to "replace()". They are not copied from the source object, but rather are initialized in "__post_init__()", if they’re initialized at all. It is expected that "init=False" fields will be rarely and judiciously used. If they are used, it might be wise to have alternate class constructors, or perhaps a custom "replace()" (or similarly named) method which handles instance copying. dataclasses.is_dataclass(obj) Return "True" if its parameter is a dataclass or an instance of one, otherwise return "False". If you need to know if a class is an instance of a dataclass (and not a dataclass itself), then add a further check for "not isinstance(obj, type)": def is_dataclass_instance(obj): return is_dataclass(obj) and not isinstance(obj, type) Post-init processing ==================== The generated "__init__()" code will call a method named "__post_init__()", if "__post_init__()" is defined on the class. It will normally be called as "self.__post_init__()". However, if any "InitVar" fields are defined, they will also be passed to "__post_init__()" in the order they were defined in the class. If no "__init__()" method is generated, then "__post_init__()" will not automatically be called. Among other uses, this allows for initializing field values that depend on one or more other fields. For example: @dataclass class C: a: float b: float c: float = field(init=False) def __post_init__(self): self.c = self.a + self.b The "__init__()" method generated by "dataclass()" does not call base class "__init__()" methods. If the base class has an "__init__()" method that has to be called, it is common to call this method in a "__post_init__()" method: @dataclass class Rectangle: height: float width: float @dataclass class Square(Rectangle): side: float def __post_init__(self): super().__init__(self.side, self.side) Note, however, that in general the dataclass-generated "__init__()" methods don’t need to be called, since the derived dataclass will take care of initializing all fields of any base class that is a dataclass itself. See the section below on init-only variables for ways to pass parameters to "__post_init__()". Also see the warning about how "replace()" handles "init=False" fields. Class variables =============== One of two places where "dataclass()" actually inspects the type of a field is to determine if a field is a class variable as defined in **PEP 526**. It does this by checking if the type of the field is "typing.ClassVar". If a field is a "ClassVar", it is excluded from consideration as a field and is ignored by the dataclass mechanisms. Such "ClassVar" pseudo-fields are not returned by the module-level "fields()" function. Init-only variables =================== The other place where "dataclass()" inspects a type annotation is to determine if a field is an init-only variable. It does this by seeing if the type of a field is of type "dataclasses.InitVar". If a field is an "InitVar", it is considered a pseudo-field called an init-only field. As it is not a true field, it is not returned by the module- level "fields()" function. Init-only fields are added as parameters to the generated "__init__()" method, and are passed to the optional "__post_init__()" method. They are not otherwise used by dataclasses. For example, suppose a field will be initialized from a database, if a value is not provided when creating the class: @dataclass class C: i: int j: int = None database: InitVar[DatabaseType] = None def __post_init__(self, database): if self.j is None and database is not None: self.j = database.lookup('j') c = C(10, database=my_database) In this case, "fields()" will return "Field" objects for "i" and "j", but not for "database". Frozen instances ================ It is not possible to create truly immutable Python objects. However, by passing "frozen=True" to the "dataclass()" decorator you can emulate immutability. In that case, dataclasses will add "__setattr__()" and "__delattr__()" methods to the class. These methods will raise a "FrozenInstanceError" when invoked. There is a tiny performance penalty when using "frozen=True": "__init__()" cannot use simple assignment to initialize fields, and must use "object.__setattr__()". Inheritance =========== When the dataclass is being created by the "dataclass()" decorator, it looks through all of the class’s base classes in reverse MRO (that is, starting at "object") and, for each dataclass that it finds, adds the fields from that base class to an ordered mapping of fields. After all of the base class fields are added, it adds its own fields to the ordered mapping. All of the generated methods will use this combined, calculated ordered mapping of fields. Because the fields are in insertion order, derived classes override base classes. An example: @dataclass class Base: x: Any = 15.0 y: int = 0 @dataclass class C(Base): z: int = 10 x: int = 15 The final list of fields is, in order, "x", "y", "z". The final type of "x" is "int", as specified in class "C". The generated "__init__()" method for "C" will look like: def __init__(self, x: int = 15, y: int = 0, z: int = 10): Default factory functions ========================= If a "field()" specifies a "default_factory", it is called with zero arguments when a default value for the field is needed. For example, to create a new instance of a list, use: mylist: list = field(default_factory=list) If a field is excluded from "__init__()" (using "init=False") and the field also specifies "default_factory", then the default factory function will always be called from the generated "__init__()" function. This happens because there is no other way to give the field an initial value. Mutable default values ====================== Python stores default member variable values in class attributes. Consider this example, not using dataclasses: class C: x = [] def add(self, element): self.x.append(element) o1 = C() o2 = C() o1.add(1) o2.add(2) assert o1.x == [1, 2] assert o1.x is o2.x Note that the two instances of class "C" share the same class variable "x", as expected. Using dataclasses, *if* this code was valid: @dataclass class D: x: List = [] def add(self, element): self.x += element it would generate code similar to: class D: x = [] def __init__(self, x=x): self.x = x def add(self, element): self.x += element assert D().x is D().x This has the same issue as the original example using class "C". That is, two instances of class "D" that do not specify a value for "x" when creating a class instance will share the same copy of "x". Because dataclasses just use normal Python class creation they also share this behavior. There is no general way for Data Classes to detect this condition. Instead, dataclasses will raise a "TypeError" if it detects a default parameter of type "list", "dict", or "set". This is a partial solution, but it does protect against many common errors. Using default factory functions is a way to create new instances of mutable types as default values for fields: @dataclass class D: x: list = field(default_factory=list) assert D().x is not D().x Exceptions ========== exception dataclasses.FrozenInstanceError Raised when an implicitly defined "__setattr__()" or "__delattr__()" is called on a dataclass which was defined with "frozen=True". It is a subclass of "AttributeError". Data Types ********** The modules described in this chapter provide a variety of specialized data types such as dates and times, fixed-type arrays, heap queues, double-ended queues, and enumerations. Python also provides some built-in data types, in particular, "dict", "list", "set" and "frozenset", and "tuple". The "str" class is used to hold Unicode strings, and the "bytes" and "bytearray" classes are used to hold binary data. The following modules are documented in this chapter: * "datetime" — Basic date and time types * Aware and Naive Objects * Constants * Available Types * Common Properties * Determining if an Object is Aware or Naive * "timedelta" Objects * Examples of usage: "timedelta" * "date" Objects * Examples of Usage: "date" * "datetime" Objects * Examples of Usage: "datetime" * "time" Objects * Examples of Usage: "time" * "tzinfo" Objects * "timezone" Objects * "strftime()" and "strptime()" Behavior * "strftime()" and "strptime()" Format Codes * Technical Detail * "zoneinfo" — IANA time zone support * Using "ZoneInfo" * Data sources * Configuring the data sources * Compile-time configuration * Environment configuration * Runtime configuration * The "ZoneInfo" class * String representations * Pickle serialization * Functions * Globals * Exceptions and warnings * "calendar" — General calendar-related functions * "collections" — Container datatypes * "ChainMap" objects * "ChainMap" Examples and Recipes * "Counter" objects * "deque" objects * "deque" Recipes * "defaultdict" objects * "defaultdict" Examples * "namedtuple()" Factory Function for Tuples with Named Fields * "OrderedDict" objects * "OrderedDict" Examples and Recipes * "UserDict" objects * "UserList" objects * "UserString" objects * "collections.abc" — Abstract Base Classes for Containers * Collections Abstract Base Classes * "heapq" — Heap queue algorithm * Basic Examples * Priority Queue Implementation Notes * Theory * "bisect" — Array bisection algorithm * Searching Sorted Lists * Other Examples * "array" — Efficient arrays of numeric values * "weakref" — Weak references * Weak Reference Objects * Example * Finalizer Objects * Comparing finalizers with "__del__()" methods * "types" — Dynamic type creation and names for built-in types * Dynamic Type Creation * Standard Interpreter Types * Additional Utility Classes and Functions * Coroutine Utility Functions * "copy" — Shallow and deep copy operations * "pprint" — Data pretty printer * PrettyPrinter Objects * Example * "reprlib" — Alternate "repr()" implementation * Repr Objects * Subclassing Repr Objects * "enum" — Support for enumerations * Module Contents * Creating an Enum * Programmatic access to enumeration members and their attributes * Duplicating enum members and values * Ensuring unique enumeration values * Using automatic values * Iteration * Comparisons * Allowed members and attributes of enumerations * Restricted Enum subclassing * Pickling * Functional API * Derived Enumerations * IntEnum * IntFlag * Flag * Others * When to use "__new__()" vs. "__init__()" * Interesting examples * Omitting values * Using "auto" * Using "object" * Using a descriptive string * Using a custom "__new__()" * OrderedEnum * DuplicateFreeEnum * Planet * TimePeriod * How are Enums different? * Enum Classes * Enum Members (aka instances) * Finer Points * Supported "__dunder__" names * Supported "_sunder_" names * _Private__names * "Enum" member type * Boolean value of "Enum" classes and members * "Enum" classes with methods * Combining members of "Flag" * "graphlib" — Functionality to operate with graph-like structures * Exceptions "datetime" — Basic date and time types ************************************** **Source code:** Lib/datetime.py ====================================================================== The "datetime" module supplies classes for manipulating dates and times. While date and time arithmetic is supported, the focus of the implementation is on efficient attribute extraction for output formatting and manipulation. See also: Module "calendar" General calendar related functions. Module "time" Time access and conversions. Module "zoneinfo" Concrete time zones representing the IANA time zone database. Package dateutil Third-party library with expanded time zone and parsing support. Aware and Naive Objects ======================= Date and time objects may be categorized as “aware” or “naive” depending on whether or not they include timezone information. With sufficient knowledge of applicable algorithmic and political time adjustments, such as time zone and daylight saving time information, an **aware** object can locate itself relative to other aware objects. An aware object represents a specific moment in time that is not open to interpretation. [1] A **naive** object does not contain enough information to unambiguously locate itself relative to other date/time objects. Whether a naive object represents Coordinated Universal Time (UTC), local time, or time in some other timezone is purely up to the program, just like it is up to the program whether a particular number represents metres, miles, or mass. Naive objects are easy to understand and to work with, at the cost of ignoring some aspects of reality. For applications requiring aware objects, "datetime" and "time" objects have an optional time zone information attribute, "tzinfo", that can be set to an instance of a subclass of the abstract "tzinfo" class. These "tzinfo" objects capture information about the offset from UTC time, the time zone name, and whether daylight saving time is in effect. Only one concrete "tzinfo" class, the "timezone" class, is supplied by the "datetime" module. The "timezone" class can represent simple timezones with fixed offsets from UTC, such as UTC itself or North American EST and EDT timezones. Supporting timezones at deeper levels of detail is up to the application. The rules for time adjustment across the world are more political than rational, change frequently, and there is no standard suitable for every application aside from UTC. Constants ========= The "datetime" module exports the following constants: datetime.MINYEAR The smallest year number allowed in a "date" or "datetime" object. "MINYEAR" is "1". datetime.MAXYEAR The largest year number allowed in a "date" or "datetime" object. "MAXYEAR" is "9999". Available Types =============== class datetime.date An idealized naive date, assuming the current Gregorian calendar always was, and always will be, in effect. Attributes: "year", "month", and "day". class datetime.time An idealized time, independent of any particular day, assuming that every day has exactly 24*60*60 seconds. (There is no notion of “leap seconds” here.) Attributes: "hour", "minute", "second", "microsecond", and "tzinfo". class datetime.datetime A combination of a date and a time. Attributes: "year", "month", "day", "hour", "minute", "second", "microsecond", and "tzinfo". class datetime.timedelta A duration expressing the difference between two "date", "time", or "datetime" instances to microsecond resolution. class datetime.tzinfo An abstract base class for time zone information objects. These are used by the "datetime" and "time" classes to provide a customizable notion of time adjustment (for example, to account for time zone and/or daylight saving time). class datetime.timezone A class that implements the "tzinfo" abstract base class as a fixed offset from the UTC. New in version 3.2. Objects of these types are immutable. Subclass relationships: object timedelta tzinfo timezone time date datetime Common Properties ----------------- The "date", "datetime", "time", and "timezone" types share these common features: * Objects of these types are immutable. * Objects of these types are hashable, meaning that they can be used as dictionary keys. * Objects of these types support efficient pickling via the "pickle" module. Determining if an Object is Aware or Naive ------------------------------------------ Objects of the "date" type are always naive. An object of type "time" or "datetime" may be aware or naive. A "datetime" object *d* is aware if both of the following hold: 1. "d.tzinfo" is not "None" 2. "d.tzinfo.utcoffset(d)" does not return "None" Otherwise, *d* is naive. A "time" object *t* is aware if both of the following hold: 1. "t.tzinfo" is not "None" 2. "t.tzinfo.utcoffset(None)" does not return "None". Otherwise, *t* is naive. The distinction between aware and naive doesn’t apply to "timedelta" objects. "timedelta" Objects =================== A "timedelta" object represents a duration, the difference between two dates or times. class datetime.timedelta(days=0, seconds=0, microseconds=0, milliseconds=0, minutes=0, hours=0, weeks=0) All arguments are optional and default to "0". Arguments may be integers or floats, and may be positive or negative. Only *days*, *seconds* and *microseconds* are stored internally. Arguments are converted to those units: * A millisecond is converted to 1000 microseconds. * A minute is converted to 60 seconds. * An hour is converted to 3600 seconds. * A week is converted to 7 days. and days, seconds and microseconds are then normalized so that the representation is unique, with * "0 <= microseconds < 1000000" * "0 <= seconds < 3600*24" (the number of seconds in one day) * "-999999999 <= days <= 999999999" The following example illustrates how any arguments besides *days*, *seconds* and *microseconds* are “merged” and normalized into those three resulting attributes: >>> from datetime import timedelta >>> delta = timedelta( ... days=50, ... seconds=27, ... microseconds=10, ... milliseconds=29000, ... minutes=5, ... hours=8, ... weeks=2 ... ) >>> # Only days, seconds, and microseconds remain >>> delta datetime.timedelta(days=64, seconds=29156, microseconds=10) If any argument is a float and there are fractional microseconds, the fractional microseconds left over from all arguments are combined and their sum is rounded to the nearest microsecond using round-half-to-even tiebreaker. If no argument is a float, the conversion and normalization processes are exact (no information is lost). If the normalized value of days lies outside the indicated range, "OverflowError" is raised. Note that normalization of negative values may be surprising at first. For example: >>> from datetime import timedelta >>> d = timedelta(microseconds=-1) >>> (d.days, d.seconds, d.microseconds) (-1, 86399, 999999) Class attributes: timedelta.min The most negative "timedelta" object, "timedelta(-999999999)". timedelta.max The most positive "timedelta" object, "timedelta(days=999999999, hours=23, minutes=59, seconds=59, microseconds=999999)". timedelta.resolution The smallest possible difference between non-equal "timedelta" objects, "timedelta(microseconds=1)". Note that, because of normalization, "timedelta.max" > "-timedelta.min". "-timedelta.max" is not representable as a "timedelta" object. Instance attributes (read-only): +--------------------+----------------------------------------------+ | Attribute | Value | |====================|==============================================| | "days" | Between -999999999 and 999999999 inclusive | +--------------------+----------------------------------------------+ | "seconds" | Between 0 and 86399 inclusive | +--------------------+----------------------------------------------+ | "microseconds" | Between 0 and 999999 inclusive | +--------------------+----------------------------------------------+ Supported operations: +----------------------------------+-------------------------------------------------+ | Operation | Result | |==================================|=================================================| | "t1 = t2 + t3" | Sum of *t2* and *t3*. Afterwards *t1*-*t2* == | | | *t3* and *t1*-*t3* == *t2* are true. (1) | +----------------------------------+-------------------------------------------------+ | "t1 = t2 - t3" | Difference of *t2* and *t3*. Afterwards *t1* == | | | *t2* - *t3* and *t2* == *t1* + *t3* are true. | | | (1)(6) | +----------------------------------+-------------------------------------------------+ | "t1 = t2 * i or t1 = i * t2" | Delta multiplied by an integer. Afterwards *t1* | | | // i == *t2* is true, provided "i != 0". | +----------------------------------+-------------------------------------------------+ | | In general, *t1* * i == *t1* * (i-1) + *t1* is | | | true. (1) | +----------------------------------+-------------------------------------------------+ | "t1 = t2 * f or t1 = f * t2" | Delta multiplied by a float. The result is | | | rounded to the nearest multiple of | | | timedelta.resolution using round-half-to-even. | +----------------------------------+-------------------------------------------------+ | "f = t2 / t3" | Division (3) of overall duration *t2* by | | | interval unit *t3*. Returns a "float" object. | +----------------------------------+-------------------------------------------------+ | "t1 = t2 / f or t1 = t2 / i" | Delta divided by a float or an int. The result | | | is rounded to the nearest multiple of | | | timedelta.resolution using round-half-to-even. | +----------------------------------+-------------------------------------------------+ | "t1 = t2 // i" or "t1 = t2 // | The floor is computed and the remainder (if | | t3" | any) is thrown away. In the second case, an | | | integer is returned. (3) | +----------------------------------+-------------------------------------------------+ | "t1 = t2 % t3" | The remainder is computed as a "timedelta" | | | object. (3) | +----------------------------------+-------------------------------------------------+ | "q, r = divmod(t1, t2)" | Computes the quotient and the remainder: "q = | | | t1 // t2" (3) and "r = t1 % t2". q is an | | | integer and r is a "timedelta" object. | +----------------------------------+-------------------------------------------------+ | "+t1" | Returns a "timedelta" object with the same | | | value. (2) | +----------------------------------+-------------------------------------------------+ | "-t1" | equivalent to "timedelta"(-*t1.days*, | | | -*t1.seconds*, -*t1.microseconds*), and to | | | *t1** -1. (1)(4) | +----------------------------------+-------------------------------------------------+ | "abs(t)" | equivalent to +*t* when "t.days >= 0", and to | | | -*t* when "t.days < 0". (2) | +----------------------------------+-------------------------------------------------+ | "str(t)" | Returns a string in the form "[D day[s], | | | ][H]H:MM:SS[.UUUUUU]", where D is negative for | | | negative "t". (5) | +----------------------------------+-------------------------------------------------+ | "repr(t)" | Returns a string representation of the | | | "timedelta" object as a constructor call with | | | canonical attribute values. | +----------------------------------+-------------------------------------------------+ Notes: 1. This is exact but may overflow. 2. This is exact and cannot overflow. 3. Division by 0 raises "ZeroDivisionError". 4. -*timedelta.max* is not representable as a "timedelta" object. 5. String representations of "timedelta" objects are normalized similarly to their internal representation. This leads to somewhat unusual results for negative timedeltas. For example: >>> timedelta(hours=-5) datetime.timedelta(days=-1, seconds=68400) >>> print(_) -1 day, 19:00:00 6. The expression "t2 - t3" will always be equal to the expression "t2 + (-t3)" except when t3 is equal to "timedelta.max"; in that case the former will produce a result while the latter will overflow. In addition to the operations listed above, "timedelta" objects support certain additions and subtractions with "date" and "datetime" objects (see below). Changed in version 3.2: Floor division and true division of a "timedelta" object by another "timedelta" object are now supported, as are remainder operations and the "divmod()" function. True division and multiplication of a "timedelta" object by a "float" object are now supported. Comparisons of "timedelta" objects are supported, with some caveats. The comparisons "==" or "!=" *always* return a "bool", no matter the type of the compared object: >>> from datetime import timedelta >>> delta1 = timedelta(seconds=57) >>> delta2 = timedelta(hours=25, seconds=2) >>> delta2 != delta1 True >>> delta2 == 5 False For all other comparisons (such as "<" and ">"), when a "timedelta" object is compared to an object of a different type, "TypeError" is raised: >>> delta2 > delta1 True >>> delta2 > 5 Traceback (most recent call last): File "", line 1, in TypeError: '>' not supported between instances of 'datetime.timedelta' and 'int' In Boolean contexts, a "timedelta" object is considered to be true if and only if it isn’t equal to "timedelta(0)". Instance methods: timedelta.total_seconds() Return the total number of seconds contained in the duration. Equivalent to "td / timedelta(seconds=1)". For interval units other than seconds, use the division form directly (e.g. "td / timedelta(microseconds=1)"). Note that for very large time intervals (greater than 270 years on most platforms) this method will lose microsecond accuracy. New in version 3.2. Examples of usage: "timedelta" ------------------------------ An additional example of normalization: >>> # Components of another_year add up to exactly 365 days >>> from datetime import timedelta >>> year = timedelta(days=365) >>> another_year = timedelta(weeks=40, days=84, hours=23, ... minutes=50, seconds=600) >>> year == another_year True >>> year.total_seconds() 31536000.0 Examples of "timedelta" arithmetic: >>> from datetime import timedelta >>> year = timedelta(days=365) >>> ten_years = 10 * year >>> ten_years datetime.timedelta(days=3650) >>> ten_years.days // 365 10 >>> nine_years = ten_years - year >>> nine_years datetime.timedelta(days=3285) >>> three_years = nine_years // 3 >>> three_years, three_years.days // 365 (datetime.timedelta(days=1095), 3) "date" Objects ============== A "date" object represents a date (year, month and day) in an idealized calendar, the current Gregorian calendar indefinitely extended in both directions. January 1 of year 1 is called day number 1, January 2 of year 1 is called day number 2, and so on. [2] class datetime.date(year, month, day) All arguments are required. Arguments must be integers, in the following ranges: * "MINYEAR <= year <= MAXYEAR" * "1 <= month <= 12" * "1 <= day <= number of days in the given month and year" If an argument outside those ranges is given, "ValueError" is raised. Other constructors, all class methods: classmethod date.today() Return the current local date. This is equivalent to "date.fromtimestamp(time.time())". classmethod date.fromtimestamp(timestamp) Return the local date corresponding to the POSIX timestamp, such as is returned by "time.time()". This may raise "OverflowError", if the timestamp is out of the range of values supported by the platform C "localtime()" function, and "OSError" on "localtime()" failure. It’s common for this to be restricted to years from 1970 through 2038. Note that on non-POSIX systems that include leap seconds in their notion of a timestamp, leap seconds are ignored by "fromtimestamp()". Changed in version 3.3: Raise "OverflowError" instead of "ValueError" if the timestamp is out of the range of values supported by the platform C "localtime()" function. Raise "OSError" instead of "ValueError" on "localtime()" failure. classmethod date.fromordinal(ordinal) Return the date corresponding to the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1. "ValueError" is raised unless "1 <= ordinal <= date.max.toordinal()". For any date *d*, "date.fromordinal(d.toordinal()) == d". classmethod date.fromisoformat(date_string) Return a "date" corresponding to a *date_string* given in the format "YYYY-MM-DD": >>> from datetime import date >>> date.fromisoformat('2019-12-04') datetime.date(2019, 12, 4) This is the inverse of "date.isoformat()". It only supports the format "YYYY-MM-DD". New in version 3.7. classmethod date.fromisocalendar(year, week, day) Return a "date" corresponding to the ISO calendar date specified by year, week and day. This is the inverse of the function "date.isocalendar()". New in version 3.8. Class attributes: date.min The earliest representable date, "date(MINYEAR, 1, 1)". date.max The latest representable date, "date(MAXYEAR, 12, 31)". date.resolution The smallest possible difference between non-equal date objects, "timedelta(days=1)". Instance attributes (read-only): date.year Between "MINYEAR" and "MAXYEAR" inclusive. date.month Between 1 and 12 inclusive. date.day Between 1 and the number of days in the given month of the given year. Supported operations: +---------------------------------+------------------------------------------------+ | Operation | Result | |=================================|================================================| | "date2 = date1 + timedelta" | *date2* is "timedelta.days" days removed from | | | *date1*. (1) | +---------------------------------+------------------------------------------------+ | "date2 = date1 - timedelta" | Computes *date2* such that "date2 + timedelta | | | == date1". (2) | +---------------------------------+------------------------------------------------+ | "timedelta = date1 - date2" | (3) | +---------------------------------+------------------------------------------------+ | "date1 < date2" | *date1* is considered less than *date2* when | | | *date1* precedes *date2* in time. (4) | +---------------------------------+------------------------------------------------+ Notes: 1. *date2* is moved forward in time if "timedelta.days > 0", or backward if "timedelta.days < 0". Afterward "date2 - date1 == timedelta.days". "timedelta.seconds" and "timedelta.microseconds" are ignored. "OverflowError" is raised if "date2.year" would be smaller than "MINYEAR" or larger than "MAXYEAR". 2. "timedelta.seconds" and "timedelta.microseconds" are ignored. 3. This is exact, and cannot overflow. timedelta.seconds and timedelta.microseconds are 0, and date2 + timedelta == date1 after. 4. In other words, "date1 < date2" if and only if "date1.toordinal() < date2.toordinal()". Date comparison raises "TypeError" if the other comparand isn’t also a "date" object. However, "NotImplemented" is returned instead if the other comparand has a "timetuple()" attribute. This hook gives other kinds of date objects a chance at implementing mixed-type comparison. If not, when a "date" object is compared to an object of a different type, "TypeError" is raised unless the comparison is "==" or "!=". The latter cases return "False" or "True", respectively. In Boolean contexts, all "date" objects are considered to be true. Instance methods: date.replace(year=self.year, month=self.month, day=self.day) Return a date with the same value, except for those parameters given new values by whichever keyword arguments are specified. Example: >>> from datetime import date >>> d = date(2002, 12, 31) >>> d.replace(day=26) datetime.date(2002, 12, 26) date.timetuple() Return a "time.struct_time" such as returned by "time.localtime()". The hours, minutes and seconds are 0, and the DST flag is -1. "d.timetuple()" is equivalent to: time.struct_time((d.year, d.month, d.day, 0, 0, 0, d.weekday(), yday, -1)) where "yday = d.toordinal() - date(d.year, 1, 1).toordinal() + 1" is the day number within the current year starting with "1" for January 1st. date.toordinal() Return the proleptic Gregorian ordinal of the date, where January 1 of year 1 has ordinal 1. For any "date" object *d*, "date.fromordinal(d.toordinal()) == d". date.weekday() Return the day of the week as an integer, where Monday is 0 and Sunday is 6. For example, "date(2002, 12, 4).weekday() == 2", a Wednesday. See also "isoweekday()". date.isoweekday() Return the day of the week as an integer, where Monday is 1 and Sunday is 7. For example, "date(2002, 12, 4).isoweekday() == 3", a Wednesday. See also "weekday()", "isocalendar()". date.isocalendar() Return a *named tuple* object with three components: "year", "week" and "weekday". The ISO calendar is a widely used variant of the Gregorian calendar. [3] The ISO year consists of 52 or 53 full weeks, and where a week starts on a Monday and ends on a Sunday. The first week of an ISO year is the first (Gregorian) calendar week of a year containing a Thursday. This is called week number 1, and the ISO year of that Thursday is the same as its Gregorian year. For example, 2004 begins on a Thursday, so the first week of ISO year 2004 begins on Monday, 29 Dec 2003 and ends on Sunday, 4 Jan 2004: >>> from datetime import date >>> date(2003, 12, 29).isocalendar() datetime.IsoCalendarDate(year=2004, week=1, weekday=1) >>> date(2004, 1, 4).isocalendar() datetime.IsoCalendarDate(year=2004, week=1, weekday=7) Changed in version 3.9: Result changed from a tuple to a *named tuple*. date.isoformat() Return a string representing the date in ISO 8601 format, "YYYY-MM- DD": >>> from datetime import date >>> date(2002, 12, 4).isoformat() '2002-12-04' This is the inverse of "date.fromisoformat()". date.__str__() For a date *d*, "str(d)" is equivalent to "d.isoformat()". date.ctime() Return a string representing the date: >>> from datetime import date >>> date(2002, 12, 4).ctime() 'Wed Dec 4 00:00:00 2002' "d.ctime()" is equivalent to: time.ctime(time.mktime(d.timetuple())) on platforms where the native C "ctime()" function (which "time.ctime()" invokes, but which "date.ctime()" does not invoke) conforms to the C standard. date.strftime(format) Return a string representing the date, controlled by an explicit format string. Format codes referring to hours, minutes or seconds will see 0 values. For a complete list of formatting directives, see strftime() and strptime() Behavior. date.__format__(format) Same as "date.strftime()". This makes it possible to specify a format string for a "date" object in formatted string literals and when using "str.format()". For a complete list of formatting directives, see strftime() and strptime() Behavior. Examples of Usage: "date" ------------------------- Example of counting days to an event: >>> import time >>> from datetime import date >>> today = date.today() >>> today datetime.date(2007, 12, 5) >>> today == date.fromtimestamp(time.time()) True >>> my_birthday = date(today.year, 6, 24) >>> if my_birthday < today: ... my_birthday = my_birthday.replace(year=today.year + 1) >>> my_birthday datetime.date(2008, 6, 24) >>> time_to_birthday = abs(my_birthday - today) >>> time_to_birthday.days 202 More examples of working with "date": >>> from datetime import date >>> d = date.fromordinal(730920) # 730920th day after 1. 1. 0001 >>> d datetime.date(2002, 3, 11) >>> # Methods related to formatting string output >>> d.isoformat() '2002-03-11' >>> d.strftime("%d/%m/%y") '11/03/02' >>> d.strftime("%A %d. %B %Y") 'Monday 11. March 2002' >>> d.ctime() 'Mon Mar 11 00:00:00 2002' >>> 'The {1} is {0:%d}, the {2} is {0:%B}.'.format(d, "day", "month") 'The day is 11, the month is March.' >>> # Methods for to extracting 'components' under different calendars >>> t = d.timetuple() >>> for i in t: ... print(i) 2002 # year 3 # month 11 # day 0 0 0 0 # weekday (0 = Monday) 70 # 70th day in the year -1 >>> ic = d.isocalendar() >>> for i in ic: ... print(i) 2002 # ISO year 11 # ISO week number 1 # ISO day number ( 1 = Monday ) >>> # A date object is immutable; all operations produce a new object >>> d.replace(year=2005) datetime.date(2005, 3, 11) "datetime" Objects ================== A "datetime" object is a single object containing all the information from a "date" object and a "time" object. Like a "date" object, "datetime" assumes the current Gregorian calendar extended in both directions; like a "time" object, "datetime" assumes there are exactly 3600*24 seconds in every day. Constructor: class datetime.datetime(year, month, day, hour=0, minute=0, second=0, microsecond=0, tzinfo=None, *, fold=0) The *year*, *month* and *day* arguments are required. *tzinfo* may be "None", or an instance of a "tzinfo" subclass. The remaining arguments must be integers in the following ranges: * "MINYEAR <= year <= MAXYEAR", * "1 <= month <= 12", * "1 <= day <= number of days in the given month and year", * "0 <= hour < 24", * "0 <= minute < 60", * "0 <= second < 60", * "0 <= microsecond < 1000000", * "fold in [0, 1]". If an argument outside those ranges is given, "ValueError" is raised. New in version 3.6: Added the "fold" argument. Other constructors, all class methods: classmethod datetime.today() Return the current local datetime, with "tzinfo" "None". Equivalent to: datetime.fromtimestamp(time.time()) See also "now()", "fromtimestamp()". This method is functionally equivalent to "now()", but without a "tz" parameter. classmethod datetime.now(tz=None) Return the current local date and time. If optional argument *tz* is "None" or not specified, this is like "today()", but, if possible, supplies more precision than can be gotten from going through a "time.time()" timestamp (for example, this may be possible on platforms supplying the C "gettimeofday()" function). If *tz* is not "None", it must be an instance of a "tzinfo" subclass, and the current date and time are converted to *tz*’s time zone. This function is preferred over "today()" and "utcnow()". classmethod datetime.utcnow() Return the current UTC date and time, with "tzinfo" "None". This is like "now()", but returns the current UTC date and time, as a naive "datetime" object. An aware current UTC datetime can be obtained by calling "datetime.now(timezone.utc)". See also "now()". Warning: Because naive "datetime" objects are treated by many "datetime" methods as local times, it is preferred to use aware datetimes to represent times in UTC. As such, the recommended way to create an object representing the current time in UTC is by calling "datetime.now(timezone.utc)". classmethod datetime.fromtimestamp(timestamp, tz=None) Return the local date and time corresponding to the POSIX timestamp, such as is returned by "time.time()". If optional argument *tz* is "None" or not specified, the timestamp is converted to the platform’s local date and time, and the returned "datetime" object is naive. If *tz* is not "None", it must be an instance of a "tzinfo" subclass, and the timestamp is converted to *tz*’s time zone. "fromtimestamp()" may raise "OverflowError", if the timestamp is out of the range of values supported by the platform C "localtime()" or "gmtime()" functions, and "OSError" on "localtime()" or "gmtime()" failure. It’s common for this to be restricted to years in 1970 through 2038. Note that on non-POSIX systems that include leap seconds in their notion of a timestamp, leap seconds are ignored by "fromtimestamp()", and then it’s possible to have two timestamps differing by a second that yield identical "datetime" objects. This method is preferred over "utcfromtimestamp()". Changed in version 3.3: Raise "OverflowError" instead of "ValueError" if the timestamp is out of the range of values supported by the platform C "localtime()" or "gmtime()" functions. Raise "OSError" instead of "ValueError" on "localtime()" or "gmtime()" failure. Changed in version 3.6: "fromtimestamp()" may return instances with "fold" set to 1. classmethod datetime.utcfromtimestamp(timestamp) Return the UTC "datetime" corresponding to the POSIX timestamp, with "tzinfo" "None". (The resulting object is naive.) This may raise "OverflowError", if the timestamp is out of the range of values supported by the platform C "gmtime()" function, and "OSError" on "gmtime()" failure. It’s common for this to be restricted to years in 1970 through 2038. To get an aware "datetime" object, call "fromtimestamp()": datetime.fromtimestamp(timestamp, timezone.utc) On the POSIX compliant platforms, it is equivalent to the following expression: datetime(1970, 1, 1, tzinfo=timezone.utc) + timedelta(seconds=timestamp) except the latter formula always supports the full years range: between "MINYEAR" and "MAXYEAR" inclusive. Warning: Because naive "datetime" objects are treated by many "datetime" methods as local times, it is preferred to use aware datetimes to represent times in UTC. As such, the recommended way to create an object representing a specific timestamp in UTC is by calling "datetime.fromtimestamp(timestamp, tz=timezone.utc)". Changed in version 3.3: Raise "OverflowError" instead of "ValueError" if the timestamp is out of the range of values supported by the platform C "gmtime()" function. Raise "OSError" instead of "ValueError" on "gmtime()" failure. classmethod datetime.fromordinal(ordinal) Return the "datetime" corresponding to the proleptic Gregorian ordinal, where January 1 of year 1 has ordinal 1. "ValueError" is raised unless "1 <= ordinal <= datetime.max.toordinal()". The hour, minute, second and microsecond of the result are all 0, and "tzinfo" is "None". classmethod datetime.combine(date, time, tzinfo=self.tzinfo) Return a new "datetime" object whose date components are equal to the given "date" object’s, and whose time components are equal to the given "time" object’s. If the *tzinfo* argument is provided, its value is used to set the "tzinfo" attribute of the result, otherwise the "tzinfo" attribute of the *time* argument is used. For any "datetime" object *d*, "d == datetime.combine(d.date(), d.time(), d.tzinfo)". If date is a "datetime" object, its time components and "tzinfo" attributes are ignored. Changed in version 3.6: Added the *tzinfo* argument. classmethod datetime.fromisoformat(date_string) Return a "datetime" corresponding to a *date_string* in one of the formats emitted by "date.isoformat()" and "datetime.isoformat()". Specifically, this function supports strings in the format: YYYY-MM-DD[*HH[:MM[:SS[.fff[fff]]]][+HH:MM[:SS[.ffffff]]]] where "*" can match any single character. Caution: This does *not* support parsing arbitrary ISO 8601 strings - it is only intended as the inverse operation of "datetime.isoformat()". A more full-featured ISO 8601 parser, "dateutil.parser.isoparse" is available in the third-party package dateutil. Examples: >>> from datetime import datetime >>> datetime.fromisoformat('2011-11-04') datetime.datetime(2011, 11, 4, 0, 0) >>> datetime.fromisoformat('2011-11-04T00:05:23') datetime.datetime(2011, 11, 4, 0, 5, 23) >>> datetime.fromisoformat('2011-11-04 00:05:23.283') datetime.datetime(2011, 11, 4, 0, 5, 23, 283000) >>> datetime.fromisoformat('2011-11-04 00:05:23.283+00:00') datetime.datetime(2011, 11, 4, 0, 5, 23, 283000, tzinfo=datetime.timezone.utc) >>> datetime.fromisoformat('2011-11-04T00:05:23+04:00') datetime.datetime(2011, 11, 4, 0, 5, 23, tzinfo=datetime.timezone(datetime.timedelta(seconds=14400))) New in version 3.7. classmethod datetime.fromisocalendar(year, week, day) Return a "datetime" corresponding to the ISO calendar date specified by year, week and day. The non-date components of the datetime are populated with their normal default values. This is the inverse of the function "datetime.isocalendar()". New in version 3.8. classmethod datetime.strptime(date_string, format) Return a "datetime" corresponding to *date_string*, parsed according to *format*. This is equivalent to: datetime(*(time.strptime(date_string, format)[0:6])) "ValueError" is raised if the date_string and format can’t be parsed by "time.strptime()" or if it returns a value which isn’t a time tuple. For a complete list of formatting directives, see strftime() and strptime() Behavior. Class attributes: datetime.min The earliest representable "datetime", "datetime(MINYEAR, 1, 1, tzinfo=None)". datetime.max The latest representable "datetime", "datetime(MAXYEAR, 12, 31, 23, 59, 59, 999999, tzinfo=None)". datetime.resolution The smallest possible difference between non-equal "datetime" objects, "timedelta(microseconds=1)". Instance attributes (read-only): datetime.year Between "MINYEAR" and "MAXYEAR" inclusive. datetime.month Between 1 and 12 inclusive. datetime.day Between 1 and the number of days in the given month of the given year. datetime.hour In "range(24)". datetime.minute In "range(60)". datetime.second In "range(60)". datetime.microsecond In "range(1000000)". datetime.tzinfo The object passed as the *tzinfo* argument to the "datetime" constructor, or "None" if none was passed. datetime.fold In "[0, 1]". Used to disambiguate wall times during a repeated interval. (A repeated interval occurs when clocks are rolled back at the end of daylight saving time or when the UTC offset for the current zone is decreased for political reasons.) The value 0 (1) represents the earlier (later) of the two moments with the same wall time representation. New in version 3.6. Supported operations: +-----------------------------------------+----------------------------------+ | Operation | Result | |=========================================|==================================| | "datetime2 = datetime1 + timedelta" | (1) | +-----------------------------------------+----------------------------------+ | "datetime2 = datetime1 - timedelta" | (2) | +-----------------------------------------+----------------------------------+ | "timedelta = datetime1 - datetime2" | (3) | +-----------------------------------------+----------------------------------+ | "datetime1 < datetime2" | Compares "datetime" to | | | "datetime". (4) | +-----------------------------------------+----------------------------------+ 1. datetime2 is a duration of timedelta removed from datetime1, moving forward in time if "timedelta.days" > 0, or backward if "timedelta.days" < 0. The result has the same "tzinfo" attribute as the input datetime, and datetime2 - datetime1 == timedelta after. "OverflowError" is raised if datetime2.year would be smaller than "MINYEAR" or larger than "MAXYEAR". Note that no time zone adjustments are done even if the input is an aware object. 2. Computes the datetime2 such that datetime2 + timedelta == datetime1. As for addition, the result has the same "tzinfo" attribute as the input datetime, and no time zone adjustments are done even if the input is aware. 3. Subtraction of a "datetime" from a "datetime" is defined only if both operands are naive, or if both are aware. If one is aware and the other is naive, "TypeError" is raised. If both are naive, or both are aware and have the same "tzinfo" attribute, the "tzinfo" attributes are ignored, and the result is a "timedelta" object *t* such that "datetime2 + t == datetime1". No time zone adjustments are done in this case. If both are aware and have different "tzinfo" attributes, "a-b" acts as if *a* and *b* were first converted to naive UTC datetimes first. The result is "(a.replace(tzinfo=None) - a.utcoffset()) - (b.replace(tzinfo=None) - b.utcoffset())" except that the implementation never overflows. 4. *datetime1* is considered less than *datetime2* when *datetime1* precedes *datetime2* in time. If one comparand is naive and the other is aware, "TypeError" is raised if an order comparison is attempted. For equality comparisons, naive instances are never equal to aware instances. If both comparands are aware, and have the same "tzinfo" attribute, the common "tzinfo" attribute is ignored and the base datetimes are compared. If both comparands are aware and have different "tzinfo" attributes, the comparands are first adjusted by subtracting their UTC offsets (obtained from "self.utcoffset()"). Changed in version 3.3: Equality comparisons between aware and naive "datetime" instances don’t raise "TypeError". Note: In order to stop comparison from falling back to the default scheme of comparing object addresses, datetime comparison normally raises "TypeError" if the other comparand isn’t also a "datetime" object. However, "NotImplemented" is returned instead if the other comparand has a "timetuple()" attribute. This hook gives other kinds of date objects a chance at implementing mixed- type comparison. If not, when a "datetime" object is compared to an object of a different type, "TypeError" is raised unless the comparison is "==" or "!=". The latter cases return "False" or "True", respectively. Instance methods: datetime.date() Return "date" object with same year, month and day. datetime.time() Return "time" object with same hour, minute, second, microsecond and fold. "tzinfo" is "None". See also method "timetz()". Changed in version 3.6: The fold value is copied to the returned "time" object. datetime.timetz() Return "time" object with same hour, minute, second, microsecond, fold, and tzinfo attributes. See also method "time()". Changed in version 3.6: The fold value is copied to the returned "time" object. datetime.replace(year=self.year, month=self.month, day=self.day, hour=self.hour, minute=self.minute, second=self.second, microsecond=self.microsecond, tzinfo=self.tzinfo, *, fold=0) Return a datetime with the same attributes, except for those attributes given new values by whichever keyword arguments are specified. Note that "tzinfo=None" can be specified to create a naive datetime from an aware datetime with no conversion of date and time data. New in version 3.6: Added the "fold" argument. datetime.astimezone(tz=None) Return a "datetime" object with new "tzinfo" attribute *tz*, adjusting the date and time data so the result is the same UTC time as *self*, but in *tz*’s local time. If provided, *tz* must be an instance of a "tzinfo" subclass, and its "utcoffset()" and "dst()" methods must not return "None". If *self* is naive, it is presumed to represent time in the system timezone. If called without arguments (or with "tz=None") the system local timezone is assumed for the target timezone. The ".tzinfo" attribute of the converted datetime instance will be set to an instance of "timezone" with the zone name and offset obtained from the OS. If "self.tzinfo" is *tz*, "self.astimezone(tz)" is equal to *self*: no adjustment of date or time data is performed. Else the result is local time in the timezone *tz*, representing the same UTC time as *self*: after "astz = dt.astimezone(tz)", "astz - astz.utcoffset()" will have the same date and time data as "dt - dt.utcoffset()". If you merely want to attach a time zone object *tz* to a datetime *dt* without adjustment of date and time data, use "dt.replace(tzinfo=tz)". If you merely want to remove the time zone object from an aware datetime *dt* without conversion of date and time data, use "dt.replace(tzinfo=None)". Note that the default "tzinfo.fromutc()" method can be overridden in a "tzinfo" subclass to affect the result returned by "astimezone()". Ignoring error cases, "astimezone()" acts like: def astimezone(self, tz): if self.tzinfo is tz: return self # Convert self to UTC, and attach the new time zone object. utc = (self - self.utcoffset()).replace(tzinfo=tz) # Convert from UTC to tz's local time. return tz.fromutc(utc) Changed in version 3.3: *tz* now can be omitted. Changed in version 3.6: The "astimezone()" method can now be called on naive instances that are presumed to represent system local time. datetime.utcoffset() If "tzinfo" is "None", returns "None", else returns "self.tzinfo.utcoffset(self)", and raises an exception if the latter doesn’t return "None" or a "timedelta" object with magnitude less than one day. Changed in version 3.7: The UTC offset is not restricted to a whole number of minutes. datetime.dst() If "tzinfo" is "None", returns "None", else returns "self.tzinfo.dst(self)", and raises an exception if the latter doesn’t return "None" or a "timedelta" object with magnitude less than one day. Changed in version 3.7: The DST offset is not restricted to a whole number of minutes. datetime.tzname() If "tzinfo" is "None", returns "None", else returns "self.tzinfo.tzname(self)", raises an exception if the latter doesn’t return "None" or a string object, datetime.timetuple() Return a "time.struct_time" such as returned by "time.localtime()". "d.timetuple()" is equivalent to: time.struct_time((d.year, d.month, d.day, d.hour, d.minute, d.second, d.weekday(), yday, dst)) where "yday = d.toordinal() - date(d.year, 1, 1).toordinal() + 1" is the day number within the current year starting with "1" for January 1st. The "tm_isdst" flag of the result is set according to the "dst()" method: "tzinfo" is "None" or "dst()" returns "None", "tm_isdst" is set to "-1"; else if "dst()" returns a non-zero value, "tm_isdst" is set to "1"; else "tm_isdst" is set to "0". datetime.utctimetuple() If "datetime" instance *d* is naive, this is the same as "d.timetuple()" except that "tm_isdst" is forced to 0 regardless of what "d.dst()" returns. DST is never in effect for a UTC time. If *d* is aware, *d* is normalized to UTC time, by subtracting "d.utcoffset()", and a "time.struct_time" for the normalized time is returned. "tm_isdst" is forced to 0. Note that an "OverflowError" may be raised if *d*.year was "MINYEAR" or "MAXYEAR" and UTC adjustment spills over a year boundary. Warning: Because naive "datetime" objects are treated by many "datetime" methods as local times, it is preferred to use aware datetimes to represent times in UTC; as a result, using "utcfromtimetuple" may give misleading results. If you have a naive "datetime" representing UTC, use "datetime.replace(tzinfo=timezone.utc)" to make it aware, at which point you can use "datetime.timetuple()". datetime.toordinal() Return the proleptic Gregorian ordinal of the date. The same as "self.date().toordinal()". datetime.timestamp() Return POSIX timestamp corresponding to the "datetime" instance. The return value is a "float" similar to that returned by "time.time()". Naive "datetime" instances are assumed to represent local time and this method relies on the platform C "mktime()" function to perform the conversion. Since "datetime" supports wider range of values than "mktime()" on many platforms, this method may raise "OverflowError" for times far in the past or far in the future. For aware "datetime" instances, the return value is computed as: (dt - datetime(1970, 1, 1, tzinfo=timezone.utc)).total_seconds() New in version 3.3. Changed in version 3.6: The "timestamp()" method uses the "fold" attribute to disambiguate the times during a repeated interval. Note: There is no method to obtain the POSIX timestamp directly from a naive "datetime" instance representing UTC time. If your application uses this convention and your system timezone is not set to UTC, you can obtain the POSIX timestamp by supplying "tzinfo=timezone.utc": timestamp = dt.replace(tzinfo=timezone.utc).timestamp() or by calculating the timestamp directly: timestamp = (dt - datetime(1970, 1, 1)) / timedelta(seconds=1) datetime.weekday() Return the day of the week as an integer, where Monday is 0 and Sunday is 6. The same as "self.date().weekday()". See also "isoweekday()". datetime.isoweekday() Return the day of the week as an integer, where Monday is 1 and Sunday is 7. The same as "self.date().isoweekday()". See also "weekday()", "isocalendar()". datetime.isocalendar() Return a *named tuple* with three components: "year", "week" and "weekday". The same as "self.date().isocalendar()". datetime.isoformat(sep='T', timespec='auto') Return a string representing the date and time in ISO 8601 format: * "YYYY-MM-DDTHH:MM:SS.ffffff", if "microsecond" is not 0 * "YYYY-MM-DDTHH:MM:SS", if "microsecond" is 0 If "utcoffset()" does not return "None", a string is appended, giving the UTC offset: * "YYYY-MM-DDTHH:MM:SS.ffffff+HH:MM[:SS[.ffffff]]", if "microsecond" is not 0 * "YYYY-MM-DDTHH:MM:SS+HH:MM[:SS[.ffffff]]", if "microsecond" is 0 Examples: >>> from datetime import datetime, timezone >>> datetime(2019, 5, 18, 15, 17, 8, 132263).isoformat() '2019-05-18T15:17:08.132263' >>> datetime(2019, 5, 18, 15, 17, tzinfo=timezone.utc).isoformat() '2019-05-18T15:17:00+00:00' The optional argument *sep* (default "'T'") is a one-character separator, placed between the date and time portions of the result. For example: >>> from datetime import tzinfo, timedelta, datetime >>> class TZ(tzinfo): ... """A time zone with an arbitrary, constant -06:39 offset.""" ... def utcoffset(self, dt): ... return timedelta(hours=-6, minutes=-39) ... >>> datetime(2002, 12, 25, tzinfo=TZ()).isoformat(' ') '2002-12-25 00:00:00-06:39' >>> datetime(2009, 11, 27, microsecond=100, tzinfo=TZ()).isoformat() '2009-11-27T00:00:00.000100-06:39' The optional argument *timespec* specifies the number of additional components of the time to include (the default is "'auto'"). It can be one of the following: * "'auto'": Same as "'seconds'" if "microsecond" is 0, same as "'microseconds'" otherwise. * "'hours'": Include the "hour" in the two-digit "HH" format. * "'minutes'": Include "hour" and "minute" in "HH:MM" format. * "'seconds'": Include "hour", "minute", and "second" in "HH:MM:SS" format. * "'milliseconds'": Include full time, but truncate fractional second part to milliseconds. "HH:MM:SS.sss" format. * "'microseconds'": Include full time in "HH:MM:SS.ffffff" format. Note: Excluded time components are truncated, not rounded. "ValueError" will be raised on an invalid *timespec* argument: >>> from datetime import datetime >>> datetime.now().isoformat(timespec='minutes') '2002-12-25T00:00' >>> dt = datetime(2015, 1, 1, 12, 30, 59, 0) >>> dt.isoformat(timespec='microseconds') '2015-01-01T12:30:59.000000' New in version 3.6: Added the *timespec* argument. datetime.__str__() For a "datetime" instance *d*, "str(d)" is equivalent to "d.isoformat(' ')". datetime.ctime() Return a string representing the date and time: >>> from datetime import datetime >>> datetime(2002, 12, 4, 20, 30, 40).ctime() 'Wed Dec 4 20:30:40 2002' The output string will *not* include time zone information, regardless of whether the input is aware or naive. "d.ctime()" is equivalent to: time.ctime(time.mktime(d.timetuple())) on platforms where the native C "ctime()" function (which "time.ctime()" invokes, but which "datetime.ctime()" does not invoke) conforms to the C standard. datetime.strftime(format) Return a string representing the date and time, controlled by an explicit format string. For a complete list of formatting directives, see strftime() and strptime() Behavior. datetime.__format__(format) Same as "datetime.strftime()". This makes it possible to specify a format string for a "datetime" object in formatted string literals and when using "str.format()". For a complete list of formatting directives, see strftime() and strptime() Behavior. Examples of Usage: "datetime" ----------------------------- Examples of working with "datetime" objects: >>> from datetime import datetime, date, time, timezone >>> # Using datetime.combine() >>> d = date(2005, 7, 14) >>> t = time(12, 30) >>> datetime.combine(d, t) datetime.datetime(2005, 7, 14, 12, 30) >>> # Using datetime.now() >>> datetime.now() datetime.datetime(2007, 12, 6, 16, 29, 43, 79043) # GMT +1 >>> datetime.now(timezone.utc) datetime.datetime(2007, 12, 6, 15, 29, 43, 79060, tzinfo=datetime.timezone.utc) >>> # Using datetime.strptime() >>> dt = datetime.strptime("21/11/06 16:30", "%d/%m/%y %H:%M") >>> dt datetime.datetime(2006, 11, 21, 16, 30) >>> # Using datetime.timetuple() to get tuple of all attributes >>> tt = dt.timetuple() >>> for it in tt: ... print(it) ... 2006 # year 11 # month 21 # day 16 # hour 30 # minute 0 # second 1 # weekday (0 = Monday) 325 # number of days since 1st January -1 # dst - method tzinfo.dst() returned None >>> # Date in ISO format >>> ic = dt.isocalendar() >>> for it in ic: ... print(it) ... 2006 # ISO year 47 # ISO week 2 # ISO weekday >>> # Formatting a datetime >>> dt.strftime("%A, %d. %B %Y %I:%M%p") 'Tuesday, 21. November 2006 04:30PM' >>> 'The {1} is {0:%d}, the {2} is {0:%B}, the {3} is {0:%I:%M%p}.'.format(dt, "day", "month", "time") 'The day is 21, the month is November, the time is 04:30PM.' The example below defines a "tzinfo" subclass capturing time zone information for Kabul, Afghanistan, which used +4 UTC until 1945 and then +4:30 UTC thereafter: from datetime import timedelta, datetime, tzinfo, timezone class KabulTz(tzinfo): # Kabul used +4 until 1945, when they moved to +4:30 UTC_MOVE_DATE = datetime(1944, 12, 31, 20, tzinfo=timezone.utc) def utcoffset(self, dt): if dt.year < 1945: return timedelta(hours=4) elif (1945, 1, 1, 0, 0) <= dt.timetuple()[:5] < (1945, 1, 1, 0, 30): # An ambiguous ("imaginary") half-hour range representing # a 'fold' in time due to the shift from +4 to +4:30. # If dt falls in the imaginary range, use fold to decide how # to resolve. See PEP495. return timedelta(hours=4, minutes=(30 if dt.fold else 0)) else: return timedelta(hours=4, minutes=30) def fromutc(self, dt): # Follow same validations as in datetime.tzinfo if not isinstance(dt, datetime): raise TypeError("fromutc() requires a datetime argument") if dt.tzinfo is not self: raise ValueError("dt.tzinfo is not self") # A custom implementation is required for fromutc as # the input to this function is a datetime with utc values # but with a tzinfo set to self. # See datetime.astimezone or fromtimestamp. if dt.replace(tzinfo=timezone.utc) >= self.UTC_MOVE_DATE: return dt + timedelta(hours=4, minutes=30) else: return dt + timedelta(hours=4) def dst(self, dt): # Kabul does not observe daylight saving time. return timedelta(0) def tzname(self, dt): if dt >= self.UTC_MOVE_DATE: return "+04:30" return "+04" Usage of "KabulTz" from above: >>> tz1 = KabulTz() >>> # Datetime before the change >>> dt1 = datetime(1900, 11, 21, 16, 30, tzinfo=tz1) >>> print(dt1.utcoffset()) 4:00:00 >>> # Datetime after the change >>> dt2 = datetime(2006, 6, 14, 13, 0, tzinfo=tz1) >>> print(dt2.utcoffset()) 4:30:00 >>> # Convert datetime to another time zone >>> dt3 = dt2.astimezone(timezone.utc) >>> dt3 datetime.datetime(2006, 6, 14, 8, 30, tzinfo=datetime.timezone.utc) >>> dt2 datetime.datetime(2006, 6, 14, 13, 0, tzinfo=KabulTz()) >>> dt2 == dt3 True "time" Objects ============== A "time" object represents a (local) time of day, independent of any particular day, and subject to adjustment via a "tzinfo" object. class datetime.time(hour=0, minute=0, second=0, microsecond=0, tzinfo=None, *, fold=0) All arguments are optional. *tzinfo* may be "None", or an instance of a "tzinfo" subclass. The remaining arguments must be integers in the following ranges: * "0 <= hour < 24", * "0 <= minute < 60", * "0 <= second < 60", * "0 <= microsecond < 1000000", * "fold in [0, 1]". If an argument outside those ranges is given, "ValueError" is raised. All default to "0" except *tzinfo*, which defaults to "None". Class attributes: time.min The earliest representable "time", "time(0, 0, 0, 0)". time.max The latest representable "time", "time(23, 59, 59, 999999)". time.resolution The smallest possible difference between non-equal "time" objects, "timedelta(microseconds=1)", although note that arithmetic on "time" objects is not supported. Instance attributes (read-only): time.hour In "range(24)". time.minute In "range(60)". time.second In "range(60)". time.microsecond In "range(1000000)". time.tzinfo The object passed as the tzinfo argument to the "time" constructor, or "None" if none was passed. time.fold In "[0, 1]". Used to disambiguate wall times during a repeated interval. (A repeated interval occurs when clocks are rolled back at the end of daylight saving time or when the UTC offset for the current zone is decreased for political reasons.) The value 0 (1) represents the earlier (later) of the two moments with the same wall time representation. New in version 3.6. "time" objects support comparison of "time" to "time", where *a* is considered less than *b* when *a* precedes *b* in time. If one comparand is naive and the other is aware, "TypeError" is raised if an order comparison is attempted. For equality comparisons, naive instances are never equal to aware instances. If both comparands are aware, and have the same "tzinfo" attribute, the common "tzinfo" attribute is ignored and the base times are compared. If both comparands are aware and have different "tzinfo" attributes, the comparands are first adjusted by subtracting their UTC offsets (obtained from "self.utcoffset()"). In order to stop mixed- type comparisons from falling back to the default comparison by object address, when a "time" object is compared to an object of a different type, "TypeError" is raised unless the comparison is "==" or "!=". The latter cases return "False" or "True", respectively. Changed in version 3.3: Equality comparisons between aware and naive "time" instances don’t raise "TypeError". In Boolean contexts, a "time" object is always considered to be true. Changed in version 3.5: Before Python 3.5, a "time" object was considered to be false if it represented midnight in UTC. This behavior was considered obscure and error-prone and has been removed in Python 3.5. See bpo-13936 for full details. Other constructor: classmethod time.fromisoformat(time_string) Return a "time" corresponding to a *time_string* in one of the formats emitted by "time.isoformat()". Specifically, this function supports strings in the format: HH[:MM[:SS[.fff[fff]]]][+HH:MM[:SS[.ffffff]]] Caution: This does *not* support parsing arbitrary ISO 8601 strings. It is only intended as the inverse operation of "time.isoformat()". Examples: >>> from datetime import time >>> time.fromisoformat('04:23:01') datetime.time(4, 23, 1) >>> time.fromisoformat('04:23:01.000384') datetime.time(4, 23, 1, 384) >>> time.fromisoformat('04:23:01+04:00') datetime.time(4, 23, 1, tzinfo=datetime.timezone(datetime.timedelta(seconds=14400))) New in version 3.7. Instance methods: time.replace(hour=self.hour, minute=self.minute, second=self.second, microsecond=self.microsecond, tzinfo=self.tzinfo, *, fold=0) Return a "time" with the same value, except for those attributes given new values by whichever keyword arguments are specified. Note that "tzinfo=None" can be specified to create a naive "time" from an aware "time", without conversion of the time data. New in version 3.6: Added the "fold" argument. time.isoformat(timespec='auto') Return a string representing the time in ISO 8601 format, one of: * "HH:MM:SS.ffffff", if "microsecond" is not 0 * "HH:MM:SS", if "microsecond" is 0 * "HH:MM:SS.ffffff+HH:MM[:SS[.ffffff]]", if "utcoffset()" does not return "None" * "HH:MM:SS+HH:MM[:SS[.ffffff]]", if "microsecond" is 0 and "utcoffset()" does not return "None" The optional argument *timespec* specifies the number of additional components of the time to include (the default is "'auto'"). It can be one of the following: * "'auto'": Same as "'seconds'" if "microsecond" is 0, same as "'microseconds'" otherwise. * "'hours'": Include the "hour" in the two-digit "HH" format. * "'minutes'": Include "hour" and "minute" in "HH:MM" format. * "'seconds'": Include "hour", "minute", and "second" in "HH:MM:SS" format. * "'milliseconds'": Include full time, but truncate fractional second part to milliseconds. "HH:MM:SS.sss" format. * "'microseconds'": Include full time in "HH:MM:SS.ffffff" format. Note: Excluded time components are truncated, not rounded. "ValueError" will be raised on an invalid *timespec* argument. Example: >>> from datetime import time >>> time(hour=12, minute=34, second=56, microsecond=123456).isoformat(timespec='minutes') '12:34' >>> dt = time(hour=12, minute=34, second=56, microsecond=0) >>> dt.isoformat(timespec='microseconds') '12:34:56.000000' >>> dt.isoformat(timespec='auto') '12:34:56' New in version 3.6: Added the *timespec* argument. time.__str__() For a time *t*, "str(t)" is equivalent to "t.isoformat()". time.strftime(format) Return a string representing the time, controlled by an explicit format string. For a complete list of formatting directives, see strftime() and strptime() Behavior. time.__format__(format) Same as "time.strftime()". This makes it possible to specify a format string for a "time" object in formatted string literals and when using "str.format()". For a complete list of formatting directives, see strftime() and strptime() Behavior. time.utcoffset() If "tzinfo" is "None", returns "None", else returns "self.tzinfo.utcoffset(None)", and raises an exception if the latter doesn’t return "None" or a "timedelta" object with magnitude less than one day. Changed in version 3.7: The UTC offset is not restricted to a whole number of minutes. time.dst() If "tzinfo" is "None", returns "None", else returns "self.tzinfo.dst(None)", and raises an exception if the latter doesn’t return "None", or a "timedelta" object with magnitude less than one day. Changed in version 3.7: The DST offset is not restricted to a whole number of minutes. time.tzname() If "tzinfo" is "None", returns "None", else returns "self.tzinfo.tzname(None)", or raises an exception if the latter doesn’t return "None" or a string object. Examples of Usage: "time" ------------------------- Examples of working with a "time" object: >>> from datetime import time, tzinfo, timedelta >>> class TZ1(tzinfo): ... def utcoffset(self, dt): ... return timedelta(hours=1) ... def dst(self, dt): ... return timedelta(0) ... def tzname(self,dt): ... return "+01:00" ... def __repr__(self): ... return f"{self.__class__.__name__}()" ... >>> t = time(12, 10, 30, tzinfo=TZ1()) >>> t datetime.time(12, 10, 30, tzinfo=TZ1()) >>> t.isoformat() '12:10:30+01:00' >>> t.dst() datetime.timedelta(0) >>> t.tzname() '+01:00' >>> t.strftime("%H:%M:%S %Z") '12:10:30 +01:00' >>> 'The {} is {:%H:%M}.'.format("time", t) 'The time is 12:10.' "tzinfo" Objects ================ class datetime.tzinfo This is an abstract base class, meaning that this class should not be instantiated directly. Define a subclass of "tzinfo" to capture information about a particular time zone. An instance of (a concrete subclass of) "tzinfo" can be passed to the constructors for "datetime" and "time" objects. The latter objects view their attributes as being in local time, and the "tzinfo" object supports methods revealing offset of local time from UTC, the name of the time zone, and DST offset, all relative to a date or time object passed to them. You need to derive a concrete subclass, and (at least) supply implementations of the standard "tzinfo" methods needed by the "datetime" methods you use. The "datetime" module provides "timezone", a simple concrete subclass of "tzinfo" which can represent timezones with fixed offset from UTC such as UTC itself or North American EST and EDT. Special requirement for pickling: A "tzinfo" subclass must have an "__init__()" method that can be called with no arguments, otherwise it can be pickled but possibly not unpickled again. This is a technical requirement that may be relaxed in the future. A concrete subclass of "tzinfo" may need to implement the following methods. Exactly which methods are needed depends on the uses made of aware "datetime" objects. If in doubt, simply implement all of them. tzinfo.utcoffset(dt) Return offset of local time from UTC, as a "timedelta" object that is positive east of UTC. If local time is west of UTC, this should be negative. This represents the *total* offset from UTC; for example, if a "tzinfo" object represents both time zone and DST adjustments, "utcoffset()" should return their sum. If the UTC offset isn’t known, return "None". Else the value returned must be a "timedelta" object strictly between "-timedelta(hours=24)" and "timedelta(hours=24)" (the magnitude of the offset must be less than one day). Most implementations of "utcoffset()" will probably look like one of these two: return CONSTANT # fixed-offset class return CONSTANT + self.dst(dt) # daylight-aware class If "utcoffset()" does not return "None", "dst()" should not return "None" either. The default implementation of "utcoffset()" raises "NotImplementedError". Changed in version 3.7: The UTC offset is not restricted to a whole number of minutes. tzinfo.dst(dt) Return the daylight saving time (DST) adjustment, as a "timedelta" object or "None" if DST information isn’t known. Return "timedelta(0)" if DST is not in effect. If DST is in effect, return the offset as a "timedelta" object (see "utcoffset()" for details). Note that DST offset, if applicable, has already been added to the UTC offset returned by "utcoffset()", so there’s no need to consult "dst()" unless you’re interested in obtaining DST info separately. For example, "datetime.timetuple()" calls its "tzinfo" attribute’s "dst()" method to determine how the "tm_isdst" flag should be set, and "tzinfo.fromutc()" calls "dst()" to account for DST changes when crossing time zones. An instance *tz* of a "tzinfo" subclass that models both standard and daylight times must be consistent in this sense: "tz.utcoffset(dt) - tz.dst(dt)" must return the same result for every "datetime" *dt* with "dt.tzinfo == tz" For sane "tzinfo" subclasses, this expression yields the time zone’s “standard offset”, which should not depend on the date or the time, but only on geographic location. The implementation of "datetime.astimezone()" relies on this, but cannot detect violations; it’s the programmer’s responsibility to ensure it. If a "tzinfo" subclass cannot guarantee this, it may be able to override the default implementation of "tzinfo.fromutc()" to work correctly with "astimezone()" regardless. Most implementations of "dst()" will probably look like one of these two: def dst(self, dt): # a fixed-offset class: doesn't account for DST return timedelta(0) or: def dst(self, dt): # Code to set dston and dstoff to the time zone's DST # transition times based on the input dt.year, and expressed # in standard local time. if dston <= dt.replace(tzinfo=None) < dstoff: return timedelta(hours=1) else: return timedelta(0) The default implementation of "dst()" raises "NotImplementedError". Changed in version 3.7: The DST offset is not restricted to a whole number of minutes. tzinfo.tzname(dt) Return the time zone name corresponding to the "datetime" object *dt*, as a string. Nothing about string names is defined by the "datetime" module, and there’s no requirement that it mean anything in particular. For example, “GMT”, “UTC”, “-500”, “-5:00”, “EDT”, “US/Eastern”, “America/New York” are all valid replies. Return "None" if a string name isn’t known. Note that this is a method rather than a fixed string primarily because some "tzinfo" subclasses will wish to return different names depending on the specific value of *dt* passed, especially if the "tzinfo" class is accounting for daylight time. The default implementation of "tzname()" raises "NotImplementedError". These methods are called by a "datetime" or "time" object, in response to their methods of the same names. A "datetime" object passes itself as the argument, and a "time" object passes "None" as the argument. A "tzinfo" subclass’s methods should therefore be prepared to accept a *dt* argument of "None", or of class "datetime". When "None" is passed, it’s up to the class designer to decide the best response. For example, returning "None" is appropriate if the class wishes to say that time objects don’t participate in the "tzinfo" protocols. It may be more useful for "utcoffset(None)" to return the standard UTC offset, as there is no other convention for discovering the standard offset. When a "datetime" object is passed in response to a "datetime" method, "dt.tzinfo" is the same object as *self*. "tzinfo" methods can rely on this, unless user code calls "tzinfo" methods directly. The intent is that the "tzinfo" methods interpret *dt* as being in local time, and not need worry about objects in other timezones. There is one more "tzinfo" method that a subclass may wish to override: tzinfo.fromutc(dt) This is called from the default "datetime.astimezone()" implementation. When called from that, "dt.tzinfo" is *self*, and *dt*’s date and time data are to be viewed as expressing a UTC time. The purpose of "fromutc()" is to adjust the date and time data, returning an equivalent datetime in *self*’s local time. Most "tzinfo" subclasses should be able to inherit the default "fromutc()" implementation without problems. It’s strong enough to handle fixed-offset time zones, and time zones accounting for both standard and daylight time, and the latter even if the DST transition times differ in different years. An example of a time zone the default "fromutc()" implementation may not handle correctly in all cases is one where the standard offset (from UTC) depends on the specific date and time passed, which can happen for political reasons. The default implementations of "astimezone()" and "fromutc()" may not produce the result you want if the result is one of the hours straddling the moment the standard offset changes. Skipping code for error cases, the default "fromutc()" implementation acts like: def fromutc(self, dt): # raise ValueError error if dt.tzinfo is not self dtoff = dt.utcoffset() dtdst = dt.dst() # raise ValueError if dtoff is None or dtdst is None delta = dtoff - dtdst # this is self's standard offset if delta: dt += delta # convert to standard local time dtdst = dt.dst() # raise ValueError if dtdst is None if dtdst: return dt + dtdst else: return dt In the following "tzinfo_examples.py" file there are some examples of "tzinfo" classes: from datetime import tzinfo, timedelta, datetime ZERO = timedelta(0) HOUR = timedelta(hours=1) SECOND = timedelta(seconds=1) # A class capturing the platform's idea of local time. # (May result in wrong values on historical times in # timezones where UTC offset and/or the DST rules had # changed in the past.) import time as _time STDOFFSET = timedelta(seconds = -_time.timezone) if _time.daylight: DSTOFFSET = timedelta(seconds = -_time.altzone) else: DSTOFFSET = STDOFFSET DSTDIFF = DSTOFFSET - STDOFFSET class LocalTimezone(tzinfo): def fromutc(self, dt): assert dt.tzinfo is self stamp = (dt - datetime(1970, 1, 1, tzinfo=self)) // SECOND args = _time.localtime(stamp)[:6] dst_diff = DSTDIFF // SECOND # Detect fold fold = (args == _time.localtime(stamp - dst_diff)) return datetime(*args, microsecond=dt.microsecond, tzinfo=self, fold=fold) def utcoffset(self, dt): if self._isdst(dt): return DSTOFFSET else: return STDOFFSET def dst(self, dt): if self._isdst(dt): return DSTDIFF else: return ZERO def tzname(self, dt): return _time.tzname[self._isdst(dt)] def _isdst(self, dt): tt = (dt.year, dt.month, dt.day, dt.hour, dt.minute, dt.second, dt.weekday(), 0, 0) stamp = _time.mktime(tt) tt = _time.localtime(stamp) return tt.tm_isdst > 0 Local = LocalTimezone() # A complete implementation of current DST rules for major US time zones. def first_sunday_on_or_after(dt): days_to_go = 6 - dt.weekday() if days_to_go: dt += timedelta(days_to_go) return dt # US DST Rules # # This is a simplified (i.e., wrong for a few cases) set of rules for US # DST start and end times. For a complete and up-to-date set of DST rules # and timezone definitions, visit the Olson Database (or try pytz): # http://www.twinsun.com/tz/tz-link.htm # http://sourceforge.net/projects/pytz/ (might not be up-to-date) # # In the US, since 2007, DST starts at 2am (standard time) on the second # Sunday in March, which is the first Sunday on or after Mar 8. DSTSTART_2007 = datetime(1, 3, 8, 2) # and ends at 2am (DST time) on the first Sunday of Nov. DSTEND_2007 = datetime(1, 11, 1, 2) # From 1987 to 2006, DST used to start at 2am (standard time) on the first # Sunday in April and to end at 2am (DST time) on the last # Sunday of October, which is the first Sunday on or after Oct 25. DSTSTART_1987_2006 = datetime(1, 4, 1, 2) DSTEND_1987_2006 = datetime(1, 10, 25, 2) # From 1967 to 1986, DST used to start at 2am (standard time) on the last # Sunday in April (the one on or after April 24) and to end at 2am (DST time) # on the last Sunday of October, which is the first Sunday # on or after Oct 25. DSTSTART_1967_1986 = datetime(1, 4, 24, 2) DSTEND_1967_1986 = DSTEND_1987_2006 def us_dst_range(year): # Find start and end times for US DST. For years before 1967, return # start = end for no DST. if 2006 < year: dststart, dstend = DSTSTART_2007, DSTEND_2007 elif 1986 < year < 2007: dststart, dstend = DSTSTART_1987_2006, DSTEND_1987_2006 elif 1966 < year < 1987: dststart, dstend = DSTSTART_1967_1986, DSTEND_1967_1986 else: return (datetime(year, 1, 1), ) * 2 start = first_sunday_on_or_after(dststart.replace(year=year)) end = first_sunday_on_or_after(dstend.replace(year=year)) return start, end class USTimeZone(tzinfo): def __init__(self, hours, reprname, stdname, dstname): self.stdoffset = timedelta(hours=hours) self.reprname = reprname self.stdname = stdname self.dstname = dstname def __repr__(self): return self.reprname def tzname(self, dt): if self.dst(dt): return self.dstname else: return self.stdname def utcoffset(self, dt): return self.stdoffset + self.dst(dt) def dst(self, dt): if dt is None or dt.tzinfo is None: # An exception may be sensible here, in one or both cases. # It depends on how you want to treat them. The default # fromutc() implementation (called by the default astimezone() # implementation) passes a datetime with dt.tzinfo is self. return ZERO assert dt.tzinfo is self start, end = us_dst_range(dt.year) # Can't compare naive to aware objects, so strip the timezone from # dt first. dt = dt.replace(tzinfo=None) if start + HOUR <= dt < end - HOUR: # DST is in effect. return HOUR if end - HOUR <= dt < end: # Fold (an ambiguous hour): use dt.fold to disambiguate. return ZERO if dt.fold else HOUR if start <= dt < start + HOUR: # Gap (a non-existent hour): reverse the fold rule. return HOUR if dt.fold else ZERO # DST is off. return ZERO def fromutc(self, dt): assert dt.tzinfo is self start, end = us_dst_range(dt.year) start = start.replace(tzinfo=self) end = end.replace(tzinfo=self) std_time = dt + self.stdoffset dst_time = std_time + HOUR if end <= dst_time < end + HOUR: # Repeated hour return std_time.replace(fold=1) if std_time < start or dst_time >= end: # Standard time return std_time if start <= std_time < end - HOUR: # Daylight saving time return dst_time Eastern = USTimeZone(-5, "Eastern", "EST", "EDT") Central = USTimeZone(-6, "Central", "CST", "CDT") Mountain = USTimeZone(-7, "Mountain", "MST", "MDT") Pacific = USTimeZone(-8, "Pacific", "PST", "PDT") Note that there are unavoidable subtleties twice per year in a "tzinfo" subclass accounting for both standard and daylight time, at the DST transition points. For concreteness, consider US Eastern (UTC -0500), where EDT begins the minute after 1:59 (EST) on the second Sunday in March, and ends the minute after 1:59 (EDT) on the first Sunday in November: UTC 3:MM 4:MM 5:MM 6:MM 7:MM 8:MM EST 22:MM 23:MM 0:MM 1:MM 2:MM 3:MM EDT 23:MM 0:MM 1:MM 2:MM 3:MM 4:MM start 22:MM 23:MM 0:MM 1:MM 3:MM 4:MM end 23:MM 0:MM 1:MM 1:MM 2:MM 3:MM When DST starts (the “start” line), the local wall clock leaps from 1:59 to 3:00. A wall time of the form 2:MM doesn’t really make sense on that day, so "astimezone(Eastern)" won’t deliver a result with "hour == 2" on the day DST begins. For example, at the Spring forward transition of 2016, we get: >>> from datetime import datetime, timezone >>> from tzinfo_examples import HOUR, Eastern >>> u0 = datetime(2016, 3, 13, 5, tzinfo=timezone.utc) >>> for i in range(4): ... u = u0 + i*HOUR ... t = u.astimezone(Eastern) ... print(u.time(), 'UTC =', t.time(), t.tzname()) ... 05:00:00 UTC = 00:00:00 EST 06:00:00 UTC = 01:00:00 EST 07:00:00 UTC = 03:00:00 EDT 08:00:00 UTC = 04:00:00 EDT When DST ends (the “end” line), there’s a potentially worse problem: there’s an hour that can’t be spelled unambiguously in local wall time: the last hour of daylight time. In Eastern, that’s times of the form 5:MM UTC on the day daylight time ends. The local wall clock leaps from 1:59 (daylight time) back to 1:00 (standard time) again. Local times of the form 1:MM are ambiguous. "astimezone()" mimics the local clock’s behavior by mapping two adjacent UTC hours into the same local hour then. In the Eastern example, UTC times of the form 5:MM and 6:MM both map to 1:MM when converted to Eastern, but earlier times have the "fold" attribute set to 0 and the later times have it set to 1. For example, at the Fall back transition of 2016, we get: >>> u0 = datetime(2016, 11, 6, 4, tzinfo=timezone.utc) >>> for i in range(4): ... u = u0 + i*HOUR ... t = u.astimezone(Eastern) ... print(u.time(), 'UTC =', t.time(), t.tzname(), t.fold) ... 04:00:00 UTC = 00:00:00 EDT 0 05:00:00 UTC = 01:00:00 EDT 0 06:00:00 UTC = 01:00:00 EST 1 07:00:00 UTC = 02:00:00 EST 0 Note that the "datetime" instances that differ only by the value of the "fold" attribute are considered equal in comparisons. Applications that can’t bear wall-time ambiguities should explicitly check the value of the "fold" attribute or avoid using hybrid "tzinfo" subclasses; there are no ambiguities when using "timezone", or any other fixed-offset "tzinfo" subclass (such as a class representing only EST (fixed offset -5 hours), or only EDT (fixed offset -4 hours)). See also: "zoneinfo" The "datetime" module has a basic "timezone" class (for handling arbitrary fixed offsets from UTC) and its "timezone.utc" attribute (a UTC timezone instance). "zoneinfo" brings the *IANA timezone database* (also known as the Olson database) to Python, and its usage is recommended. IANA timezone database The Time Zone Database (often called tz, tzdata or zoneinfo) contains code and data that represent the history of local time for many representative locations around the globe. It is updated periodically to reflect changes made by political bodies to time zone boundaries, UTC offsets, and daylight-saving rules. "timezone" Objects ================== The "timezone" class is a subclass of "tzinfo", each instance of which represents a timezone defined by a fixed offset from UTC. Objects of this class cannot be used to represent timezone information in the locations where different offsets are used in different days of the year or where historical changes have been made to civil time. class datetime.timezone(offset, name=None) The *offset* argument must be specified as a "timedelta" object representing the difference between the local time and UTC. It must be strictly between "-timedelta(hours=24)" and "timedelta(hours=24)", otherwise "ValueError" is raised. The *name* argument is optional. If specified it must be a string that will be used as the value returned by the "datetime.tzname()" method. New in version 3.2. Changed in version 3.7: The UTC offset is not restricted to a whole number of minutes. timezone.utcoffset(dt) Return the fixed value specified when the "timezone" instance is constructed. The *dt* argument is ignored. The return value is a "timedelta" instance equal to the difference between the local time and UTC. Changed in version 3.7: The UTC offset is not restricted to a whole number of minutes. timezone.tzname(dt) Return the fixed value specified when the "timezone" instance is constructed. If *name* is not provided in the constructor, the name returned by "tzname(dt)" is generated from the value of the "offset" as follows. If *offset* is "timedelta(0)", the name is “UTC”, otherwise it is a string in the format "UTC±HH:MM", where ± is the sign of "offset", HH and MM are two digits of "offset.hours" and "offset.minutes" respectively. Changed in version 3.6: Name generated from "offset=timedelta(0)" is now plain *‘UTC’*, not "'UTC+00:00'". timezone.dst(dt) Always returns "None". timezone.fromutc(dt) Return "dt + offset". The *dt* argument must be an aware "datetime" instance, with "tzinfo" set to "self". Class attributes: timezone.utc The UTC timezone, "timezone(timedelta(0))". "strftime()" and "strptime()" Behavior ====================================== "date", "datetime", and "time" objects all support a "strftime(format)" method, to create a string representing the time under the control of an explicit format string. Conversely, the "datetime.strptime()" class method creates a "datetime" object from a string representing a date and time and a corresponding format string. The table below provides a high-level comparison of "strftime()" versus "strptime()": +------------------+----------------------------------------------------------+--------------------------------------------------------------------------------+ | | "strftime" | "strptime" | |==================|==========================================================|================================================================================| | Usage | Convert object to a string according to a given format | Parse a string into a "datetime" object given a corresponding format | +------------------+----------------------------------------------------------+--------------------------------------------------------------------------------+ | Type of method | Instance method | Class method | +------------------+----------------------------------------------------------+--------------------------------------------------------------------------------+ | Method of | "date"; "datetime"; "time" | "datetime" | +------------------+----------------------------------------------------------+--------------------------------------------------------------------------------+ | Signature | "strftime(format)" | "strptime(date_string, format)" | +------------------+----------------------------------------------------------+--------------------------------------------------------------------------------+ "strftime()" and "strptime()" Format Codes ------------------------------------------ The following is a list of all the format codes that the 1989 C standard requires, and these work on all platforms with a standard C implementation. +-------------+----------------------------------+--------------------------+---------+ | Directive | Meaning | Example | Notes | |=============|==================================|==========================|=========| | "%a" | Weekday as locale’s abbreviated | Sun, Mon, …, Sat | (1) | | | name. | (en_US); So, Mo, …, Sa | | | | | (de_DE) | | +-------------+----------------------------------+--------------------------+---------+ | "%A" | Weekday as locale’s full name. | Sunday, Monday, …, | (1) | | | | Saturday (en_US); | | | | | Sonntag, Montag, …, | | | | | Samstag (de_DE) | | +-------------+----------------------------------+--------------------------+---------+ | "%w" | Weekday as a decimal number, | 0, 1, …, 6 | | | | where 0 is Sunday and 6 is | | | | | Saturday. | | | +-------------+----------------------------------+--------------------------+---------+ | "%d" | Day of the month as a zero- | 01, 02, …, 31 | (9) | | | padded decimal number. | | | +-------------+----------------------------------+--------------------------+---------+ | "%b" | Month as locale’s abbreviated | Jan, Feb, …, Dec | (1) | | | name. | (en_US); Jan, Feb, …, | | | | | Dez (de_DE) | | +-------------+----------------------------------+--------------------------+---------+ | "%B" | Month as locale’s full name. | January, February, …, | (1) | | | | December (en_US); | | | | | Januar, Februar, …, | | | | | Dezember (de_DE) | | +-------------+----------------------------------+--------------------------+---------+ | "%m" | Month as a zero-padded decimal | 01, 02, …, 12 | (9) | | | number. | | | +-------------+----------------------------------+--------------------------+---------+ | "%y" | Year without century as a zero- | 00, 01, …, 99 | (9) | | | padded decimal number. | | | +-------------+----------------------------------+--------------------------+---------+ | "%Y" | Year with century as a decimal | 0001, 0002, …, 2013, | (2) | | | number. | 2014, …, 9998, 9999 | | +-------------+----------------------------------+--------------------------+---------+ | "%H" | Hour (24-hour clock) as a zero- | 00, 01, …, 23 | (9) | | | padded decimal number. | | | +-------------+----------------------------------+--------------------------+---------+ | "%I" | Hour (12-hour clock) as a zero- | 01, 02, …, 12 | (9) | | | padded decimal number. | | | +-------------+----------------------------------+--------------------------+---------+ | "%p" | Locale’s equivalent of either AM | AM, PM (en_US); am, pm | (1), | | | or PM. | (de_DE) | (3) | +-------------+----------------------------------+--------------------------+---------+ | "%M" | Minute as a zero-padded decimal | 00, 01, …, 59 | (9) | | | number. | | | +-------------+----------------------------------+--------------------------+---------+ | "%S" | Second as a zero-padded decimal | 00, 01, …, 59 | (4), | | | number. | | (9) | +-------------+----------------------------------+--------------------------+---------+ | "%f" | Microsecond as a decimal number, | 000000, 000001, …, | (5) | | | zero-padded to 6 digits. | 999999 | | +-------------+----------------------------------+--------------------------+---------+ | "%z" | UTC offset in the form | (empty), +0000, -0400, | (6) | | | "±HHMM[SS[.ffffff]]" (empty | +1030, +063415, | | | | string if the object is naive). | -030712.345216 | | +-------------+----------------------------------+--------------------------+---------+ | "%Z" | Time zone name (empty string if | (empty), UTC, GMT | (6) | | | the object is naive). | | | +-------------+----------------------------------+--------------------------+---------+ | "%j" | Day of the year as a zero-padded | 001, 002, …, 366 | (9) | | | decimal number. | | | +-------------+----------------------------------+--------------------------+---------+ | "%U" | Week number of the year (Sunday | 00, 01, …, 53 | (7), | | | as the first day of the week) as | | (9) | | | a zero-padded decimal number. | | | | | All days in a new year preceding | | | | | the first Sunday are considered | | | | | to be in week 0. | | | +-------------+----------------------------------+--------------------------+---------+ | "%W" | Week number of the year (Monday | 00, 01, …, 53 | (7), | | | as the first day of the week) as | | (9) | | | a zero-padded decimal number. | | | | | All days in a new year preceding | | | | | the first Monday are considered | | | | | to be in week 0. | | | +-------------+----------------------------------+--------------------------+---------+ | "%c" | Locale’s appropriate date and | Tue Aug 16 21:30:00 1988 | (1) | | | time representation. | (en_US); Di 16 Aug | | | | | 21:30:00 1988 (de_DE) | | +-------------+----------------------------------+--------------------------+---------+ | "%x" | Locale’s appropriate date | 08/16/88 (None); | (1) | | | representation. | 08/16/1988 (en_US); | | | | | 16.08.1988 (de_DE) | | +-------------+----------------------------------+--------------------------+---------+ | "%X" | Locale’s appropriate time | 21:30:00 (en_US); | (1) | | | representation. | 21:30:00 (de_DE) | | +-------------+----------------------------------+--------------------------+---------+ | "%%" | A literal "'%'" character. | % | | +-------------+----------------------------------+--------------------------+---------+ Several additional directives not required by the C89 standard are included for convenience. These parameters all correspond to ISO 8601 date values. +-------------+----------------------------------+--------------------------+---------+ | Directive | Meaning | Example | Notes | |=============|==================================|==========================|=========| | "%G" | ISO 8601 year with century | 0001, 0002, …, 2013, | (8) | | | representing the year that | 2014, …, 9998, 9999 | | | | contains the greater part of the | | | | | ISO week ("%V"). | | | +-------------+----------------------------------+--------------------------+---------+ | "%u" | ISO 8601 weekday as a decimal | 1, 2, …, 7 | | | | number where 1 is Monday. | | | +-------------+----------------------------------+--------------------------+---------+ | "%V" | ISO 8601 week as a decimal | 01, 02, …, 53 | (8), | | | number with Monday as the first | | (9) | | | day of the week. Week 01 is the | | | | | week containing Jan 4. | | | +-------------+----------------------------------+--------------------------+---------+ These may not be available on all platforms when used with the "strftime()" method. The ISO 8601 year and ISO 8601 week directives are not interchangeable with the year and week number directives above. Calling "strptime()" with incomplete or ambiguous ISO 8601 directives will raise a "ValueError". The full set of format codes supported varies across platforms, because Python calls the platform C library’s "strftime()" function, and platform variations are common. To see the full set of format codes supported on your platform, consult the *strftime(3)* documentation. There are also differences between platforms in handling of unsupported format specifiers. New in version 3.6: "%G", "%u" and "%V" were added. Technical Detail ---------------- Broadly speaking, "d.strftime(fmt)" acts like the "time" module’s "time.strftime(fmt, d.timetuple())" although not all objects support a "timetuple()" method. For the "datetime.strptime()" class method, the default value is "1900-01-01T00:00:00.000": any components not specified in the format string will be pulled from the default value. [4] Using "datetime.strptime(date_string, format)" is equivalent to: datetime(*(time.strptime(date_string, format)[0:6])) except when the format includes sub-second components or timezone offset information, which are supported in "datetime.strptime" but are discarded by "time.strptime". For "time" objects, the format codes for year, month, and day should not be used, as "time" objects have no such values. If they’re used anyway, "1900" is substituted for the year, and "1" for the month and day. For "date" objects, the format codes for hours, minutes, seconds, and microseconds should not be used, as "date" objects have no such values. If they’re used anyway, "0" is substituted for them. For the same reason, handling of format strings containing Unicode code points that can’t be represented in the charset of the current locale is also platform-dependent. On some platforms such code points are preserved intact in the output, while on others "strftime" may raise "UnicodeError" or return an empty string instead. Notes: 1. Because the format depends on the current locale, care should be taken when making assumptions about the output value. Field orderings will vary (for example, “month/day/year” versus “day/month/year”), and the output may contain Unicode characters encoded using the locale’s default encoding (for example, if the current locale is "ja_JP", the default encoding could be any one of "eucJP", "SJIS", or "utf-8"; use "locale.getlocale()" to determine the current locale’s encoding). 2. The "strptime()" method can parse years in the full [1, 9999] range, but years < 1000 must be zero-filled to 4-digit width. Changed in version 3.2: In previous versions, "strftime()" method was restricted to years >= 1900. Changed in version 3.3: In version 3.2, "strftime()" method was restricted to years >= 1000. 3. When used with the "strptime()" method, the "%p" directive only affects the output hour field if the "%I" directive is used to parse the hour. 4. Unlike the "time" module, the "datetime" module does not support leap seconds. 5. When used with the "strptime()" method, the "%f" directive accepts from one to six digits and zero pads on the right. "%f" is an extension to the set of format characters in the C standard (but implemented separately in datetime objects, and therefore always available). 6. For a naive object, the "%z" and "%Z" format codes are replaced by empty strings. For an aware object: "%z" "utcoffset()" is transformed into a string of the form "±HHMM[SS[.ffffff]]", where "HH" is a 2-digit string giving the number of UTC offset hours, "MM" is a 2-digit string giving the number of UTC offset minutes, "SS" is a 2-digit string giving the number of UTC offset seconds and "ffffff" is a 6-digit string giving the number of UTC offset microseconds. The "ffffff" part is omitted when the offset is a whole number of seconds and both the "ffffff" and the "SS" part is omitted when the offset is a whole number of minutes. For example, if "utcoffset()" returns "timedelta(hours=-3, minutes=-30)", "%z" is replaced with the string "'-0330'". Changed in version 3.7: The UTC offset is not restricted to a whole number of minutes. Changed in version 3.7: When the "%z" directive is provided to the "strptime()" method, the UTC offsets can have a colon as a separator between hours, minutes and seconds. For example, "'+01:00:00'" will be parsed as an offset of one hour. In addition, providing "'Z'" is identical to "'+00:00'". "%Z" In "strftime()", "%Z" is replaced by an empty string if "tzname()" returns "None"; otherwise "%Z" is replaced by the returned value, which must be a string. "strptime()" only accepts certain values for "%Z": 1. any value in "time.tzname" for your machine’s locale 2. the hard-coded values "UTC" and "GMT" So someone living in Japan may have "JST", "UTC", and "GMT" as valid values, but probably not "EST". It will raise "ValueError" for invalid values. Changed in version 3.2: When the "%z" directive is provided to the "strptime()" method, an aware "datetime" object will be produced. The "tzinfo" of the result will be set to a "timezone" instance. 7. When used with the "strptime()" method, "%U" and "%W" are only used in calculations when the day of the week and the calendar year ("%Y") are specified. 8. Similar to "%U" and "%W", "%V" is only used in calculations when the day of the week and the ISO year ("%G") are specified in a "strptime()" format string. Also note that "%G" and "%Y" are not interchangeable. 9. When used with the "strptime()" method, the leading zero is optional for formats "%d", "%m", "%H", "%I", "%M", "%S", "%J", "%U", "%W", and "%V". Format "%y" does require a leading zero. -[ Footnotes ]- [1] If, that is, we ignore the effects of Relativity [2] This matches the definition of the “proleptic Gregorian” calendar in Dershowitz and Reingold’s book *Calendrical Calculations*, where it’s the base calendar for all computations. See the book for algorithms for converting between proleptic Gregorian ordinals and many other calendar systems. [3] See R. H. van Gent’s guide to the mathematics of the ISO 8601 calendar for a good explanation. [4] Passing "datetime.strptime('Feb 29', '%b %d')" will fail since "1900" is not a leap year. "dbm" — Interfaces to Unix “databases” ************************************** **Source code:** Lib/dbm/__init__.py ====================================================================== "dbm" is a generic interface to variants of the DBM database — "dbm.gnu" or "dbm.ndbm". If none of these modules is installed, the slow-but-simple implementation in module "dbm.dumb" will be used. There is a third party interface to the Oracle Berkeley DB. exception dbm.error A tuple containing the exceptions that can be raised by each of the supported modules, with a unique exception also named "dbm.error" as the first item — the latter is used when "dbm.error" is raised. dbm.whichdb(filename) This function attempts to guess which of the several simple database modules available — "dbm.gnu", "dbm.ndbm" or "dbm.dumb" — should be used to open a given file. Returns one of the following values: "None" if the file can’t be opened because it’s unreadable or doesn’t exist; the empty string ("''") if the file’s format can’t be guessed; or a string containing the required module name, such as "'dbm.ndbm'" or "'dbm.gnu'". dbm.open(file, flag='r', mode=0o666) Open the database file *file* and return a corresponding object. If the database file already exists, the "whichdb()" function is used to determine its type and the appropriate module is used; if it does not exist, the first module listed above that can be imported is used. The optional *flag* argument can be: +-----------+---------------------------------------------+ | Value | Meaning | |===========|=============================================| | "'r'" | Open existing database for reading only | | | (default) | +-----------+---------------------------------------------+ | "'w'" | Open existing database for reading and | | | writing | +-----------+---------------------------------------------+ | "'c'" | Open database for reading and writing, | | | creating it if it doesn’t exist | +-----------+---------------------------------------------+ | "'n'" | Always create a new, empty database, open | | | for reading and writing | +-----------+---------------------------------------------+ The optional *mode* argument is the Unix mode of the file, used only when the database has to be created. It defaults to octal "0o666" (and will be modified by the prevailing umask). The object returned by "open()" supports the same basic functionality as dictionaries; keys and their corresponding values can be stored, retrieved, and deleted, and the "in" operator and the "keys()" method are available, as well as "get()" and "setdefault()". Changed in version 3.2: "get()" and "setdefault()" are now available in all database modules. Changed in version 3.8: Deleting a key from a read-only database raises database module specific error instead of "KeyError". Key and values are always stored as bytes. This means that when strings are used they are implicitly converted to the default encoding before being stored. These objects also support being used in a "with" statement, which will automatically close them when done. Changed in version 3.4: Added native support for the context management protocol to the objects returned by "open()". The following example records some hostnames and a corresponding title, and then prints out the contents of the database: import dbm # Open database, creating it if necessary. with dbm.open('cache', 'c') as db: # Record some values db[b'hello'] = b'there' db['www.python.org'] = 'Python Website' db['www.cnn.com'] = 'Cable News Network' # Note that the keys are considered bytes now. assert db[b'www.python.org'] == b'Python Website' # Notice how the value is now in bytes. assert db['www.cnn.com'] == b'Cable News Network' # Often-used methods of the dict interface work too. print(db.get('python.org', b'not present')) # Storing a non-string key or value will raise an exception (most # likely a TypeError). db['www.yahoo.com'] = 4 # db is automatically closed when leaving the with statement. See also: Module "shelve" Persistence module which stores non-string data. The individual submodules are described in the following sections. "dbm.gnu" — GNU’s reinterpretation of dbm ========================================= **Source code:** Lib/dbm/gnu.py ====================================================================== This module is quite similar to the "dbm" module, but uses the GNU library "gdbm" instead to provide some additional functionality. Please note that the file formats created by "dbm.gnu" and "dbm.ndbm" are incompatible. The "dbm.gnu" module provides an interface to the GNU DBM library. "dbm.gnu.gdbm" objects behave like mappings (dictionaries), except that keys and values are always converted to bytes before storing. Printing a "gdbm" object doesn’t print the keys and values, and the "items()" and "values()" methods are not supported. exception dbm.gnu.error Raised on "dbm.gnu"-specific errors, such as I/O errors. "KeyError" is raised for general mapping errors like specifying an incorrect key. dbm.gnu.open(filename[, flag[, mode]]) Open a "gdbm" database and return a "gdbm" object. The *filename* argument is the name of the database file. The optional *flag* argument can be: +-----------+---------------------------------------------+ | Value | Meaning | |===========|=============================================| | "'r'" | Open existing database for reading only | | | (default) | +-----------+---------------------------------------------+ | "'w'" | Open existing database for reading and | | | writing | +-----------+---------------------------------------------+ | "'c'" | Open database for reading and writing, | | | creating it if it doesn’t exist | +-----------+---------------------------------------------+ | "'n'" | Always create a new, empty database, open | | | for reading and writing | +-----------+---------------------------------------------+ The following additional characters may be appended to the flag to control how the database is opened: +-----------+----------------------------------------------+ | Value | Meaning | |===========|==============================================| | "'f'" | Open the database in fast mode. Writes to | | | the database will not be synchronized. | +-----------+----------------------------------------------+ | "'s'" | Synchronized mode. This will cause changes | | | to the database to be immediately written to | | | the file. | +-----------+----------------------------------------------+ | "'u'" | Do not lock database. | +-----------+----------------------------------------------+ Not all flags are valid for all versions of "gdbm". The module constant "open_flags" is a string of supported flag characters. The exception "error" is raised if an invalid flag is specified. The optional *mode* argument is the Unix mode of the file, used only when the database has to be created. It defaults to octal "0o666". In addition to the dictionary-like methods, "gdbm" objects have the following methods: gdbm.firstkey() It’s possible to loop over every key in the database using this method and the "nextkey()" method. The traversal is ordered by "gdbm"’s internal hash values, and won’t be sorted by the key values. This method returns the starting key. gdbm.nextkey(key) Returns the key that follows *key* in the traversal. The following code prints every key in the database "db", without having to create a list in memory that contains them all: k = db.firstkey() while k is not None: print(k) k = db.nextkey(k) gdbm.reorganize() If you have carried out a lot of deletions and would like to shrink the space used by the "gdbm" file, this routine will reorganize the database. "gdbm" objects will not shorten the length of a database file except by using this reorganization; otherwise, deleted file space will be kept and reused as new (key, value) pairs are added. gdbm.sync() When the database has been opened in fast mode, this method forces any unwritten data to be written to the disk. gdbm.close() Close the "gdbm" database. "dbm.ndbm" — Interface based on ndbm ==================================== **Source code:** Lib/dbm/ndbm.py ====================================================================== The "dbm.ndbm" module provides an interface to the Unix “(n)dbm” library. Dbm objects behave like mappings (dictionaries), except that keys and values are always stored as bytes. Printing a "dbm" object doesn’t print the keys and values, and the "items()" and "values()" methods are not supported. This module can be used with the “classic” ndbm interface or the GNU GDBM compatibility interface. On Unix, the **configure** script will attempt to locate the appropriate header file to simplify building this module. exception dbm.ndbm.error Raised on "dbm.ndbm"-specific errors, such as I/O errors. "KeyError" is raised for general mapping errors like specifying an incorrect key. dbm.ndbm.library Name of the "ndbm" implementation library used. dbm.ndbm.open(filename[, flag[, mode]]) Open a dbm database and return a "ndbm" object. The *filename* argument is the name of the database file (without the ".dir" or ".pag" extensions). The optional *flag* argument must be one of these values: +-----------+---------------------------------------------+ | Value | Meaning | |===========|=============================================| | "'r'" | Open existing database for reading only | | | (default) | +-----------+---------------------------------------------+ | "'w'" | Open existing database for reading and | | | writing | +-----------+---------------------------------------------+ | "'c'" | Open database for reading and writing, | | | creating it if it doesn’t exist | +-----------+---------------------------------------------+ | "'n'" | Always create a new, empty database, open | | | for reading and writing | +-----------+---------------------------------------------+ The optional *mode* argument is the Unix mode of the file, used only when the database has to be created. It defaults to octal "0o666" (and will be modified by the prevailing umask). In addition to the dictionary-like methods, "ndbm" objects provide the following method: ndbm.close() Close the "ndbm" database. "dbm.dumb" — Portable DBM implementation ======================================== **Source code:** Lib/dbm/dumb.py Note: The "dbm.dumb" module is intended as a last resort fallback for the "dbm" module when a more robust module is not available. The "dbm.dumb" module is not written for speed and is not nearly as heavily used as the other database modules. ====================================================================== The "dbm.dumb" module provides a persistent dictionary-like interface which is written entirely in Python. Unlike other modules such as "dbm.gnu" no external library is required. As with other persistent mappings, the keys and values are always stored as bytes. The module defines the following: exception dbm.dumb.error Raised on "dbm.dumb"-specific errors, such as I/O errors. "KeyError" is raised for general mapping errors like specifying an incorrect key. dbm.dumb.open(filename[, flag[, mode]]) Open a "dumbdbm" database and return a dumbdbm object. The *filename* argument is the basename of the database file (without any specific extensions). When a dumbdbm database is created, files with ".dat" and ".dir" extensions are created. The optional *flag* argument can be: +-----------+---------------------------------------------+ | Value | Meaning | |===========|=============================================| | "'r'" | Open existing database for reading only | | | (default) | +-----------+---------------------------------------------+ | "'w'" | Open existing database for reading and | | | writing | +-----------+---------------------------------------------+ | "'c'" | Open database for reading and writing, | | | creating it if it doesn’t exist | +-----------+---------------------------------------------+ | "'n'" | Always create a new, empty database, open | | | for reading and writing | +-----------+---------------------------------------------+ The optional *mode* argument is the Unix mode of the file, used only when the database has to be created. It defaults to octal "0o666" (and will be modified by the prevailing umask). Warning: It is possible to crash the Python interpreter when loading a database with a sufficiently large/complex entry due to stack depth limitations in Python’s AST compiler. Changed in version 3.5: "open()" always creates a new database when the flag has the value "'n'". Changed in version 3.8: A database opened with flags "'r'" is now read-only. Opening with flags "'r'" and "'w'" no longer creates a database if it does not exist. In addition to the methods provided by the "collections.abc.MutableMapping" class, "dumbdbm" objects provide the following methods: dumbdbm.sync() Synchronize the on-disk directory and data files. This method is called by the "Shelve.sync()" method. dumbdbm.close() Close the "dumbdbm" database. Debugging and Profiling *********************** These libraries help you with Python development: the debugger enables you to step through code, analyze stack frames and set breakpoints etc., and the profilers run code and give you a detailed breakdown of execution times, allowing you to identify bottlenecks in your programs. Auditing events provide visibility into runtime behaviors that would otherwise require intrusive debugging or patching. * Audit events table * "bdb" — Debugger framework * "faulthandler" — Dump the Python traceback * Dumping the traceback * Fault handler state * Dumping the tracebacks after a timeout * Dumping the traceback on a user signal * Issue with file descriptors * Example * "pdb" — The Python Debugger * Debugger Commands * The Python Profilers * Introduction to the profilers * Instant User’s Manual * "profile" and "cProfile" Module Reference * The "Stats" Class * What Is Deterministic Profiling? * Limitations * Calibration * Using a custom timer * "timeit" — Measure execution time of small code snippets * Basic Examples * Python Interface * Command-Line Interface * Examples * "trace" — Trace or track Python statement execution * Command-Line Usage * Main options * Modifiers * Filters * Programmatic Interface * "tracemalloc" — Trace memory allocations * Examples * Display the top 10 * Compute differences * Get the traceback of a memory block * Pretty top * Record the current and peak size of all traced memory blocks * API * Functions * DomainFilter * Filter * Frame * Snapshot * Statistic * StatisticDiff * Trace * Traceback "decimal" — Decimal fixed point and floating point arithmetic ************************************************************* **Source code:** Lib/decimal.py ====================================================================== The "decimal" module provides support for fast correctly-rounded decimal floating point arithmetic. It offers several advantages over the "float" datatype: * Decimal “is based on a floating-point model which was designed with people in mind, and necessarily has a paramount guiding principle – computers must provide an arithmetic that works in the same way as the arithmetic that people learn at school.” – excerpt from the decimal arithmetic specification. * Decimal numbers can be represented exactly. In contrast, numbers like "1.1" and "2.2" do not have exact representations in binary floating point. End users typically would not expect "1.1 + 2.2" to display as "3.3000000000000003" as it does with binary floating point. * The exactness carries over into arithmetic. In decimal floating point, "0.1 + 0.1 + 0.1 - 0.3" is exactly equal to zero. In binary floating point, the result is "5.5511151231257827e-017". While near to zero, the differences prevent reliable equality testing and differences can accumulate. For this reason, decimal is preferred in accounting applications which have strict equality invariants. * The decimal module incorporates a notion of significant places so that "1.30 + 1.20" is "2.50". The trailing zero is kept to indicate significance. This is the customary presentation for monetary applications. For multiplication, the “schoolbook” approach uses all the figures in the multiplicands. For instance, "1.3 * 1.2" gives "1.56" while "1.30 * 1.20" gives "1.5600". * Unlike hardware based binary floating point, the decimal module has a user alterable precision (defaulting to 28 places) which can be as large as needed for a given problem: >>> from decimal import * >>> getcontext().prec = 6 >>> Decimal(1) / Decimal(7) Decimal('0.142857') >>> getcontext().prec = 28 >>> Decimal(1) / Decimal(7) Decimal('0.1428571428571428571428571429') * Both binary and decimal floating point are implemented in terms of published standards. While the built-in float type exposes only a modest portion of its capabilities, the decimal module exposes all required parts of the standard. When needed, the programmer has full control over rounding and signal handling. This includes an option to enforce exact arithmetic by using exceptions to block any inexact operations. * The decimal module was designed to support “without prejudice, both exact unrounded decimal arithmetic (sometimes called fixed-point arithmetic) and rounded floating-point arithmetic.” – excerpt from the decimal arithmetic specification. The module design is centered around three concepts: the decimal number, the context for arithmetic, and signals. A decimal number is immutable. It has a sign, coefficient digits, and an exponent. To preserve significance, the coefficient digits do not truncate trailing zeros. Decimals also include special values such as "Infinity", "-Infinity", and "NaN". The standard also differentiates "-0" from "+0". The context for arithmetic is an environment specifying precision, rounding rules, limits on exponents, flags indicating the results of operations, and trap enablers which determine whether signals are treated as exceptions. Rounding options include "ROUND_CEILING", "ROUND_DOWN", "ROUND_FLOOR", "ROUND_HALF_DOWN", "ROUND_HALF_EVEN", "ROUND_HALF_UP", "ROUND_UP", and "ROUND_05UP". Signals are groups of exceptional conditions arising during the course of computation. Depending on the needs of the application, signals may be ignored, considered as informational, or treated as exceptions. The signals in the decimal module are: "Clamped", "InvalidOperation", "DivisionByZero", "Inexact", "Rounded", "Subnormal", "Overflow", "Underflow" and "FloatOperation". For each signal there is a flag and a trap enabler. When a signal is encountered, its flag is set to one, then, if the trap enabler is set to one, an exception is raised. Flags are sticky, so the user needs to reset them before monitoring a calculation. See also: * IBM’s General Decimal Arithmetic Specification, The General Decimal Arithmetic Specification. Quick-start Tutorial ==================== The usual start to using decimals is importing the module, viewing the current context with "getcontext()" and, if necessary, setting new values for precision, rounding, or enabled traps: >>> from decimal import * >>> getcontext() Context(prec=28, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[], traps=[Overflow, DivisionByZero, InvalidOperation]) >>> getcontext().prec = 7 # Set a new precision Decimal instances can be constructed from integers, strings, floats, or tuples. Construction from an integer or a float performs an exact conversion of the value of that integer or float. Decimal numbers include special values such as "NaN" which stands for “Not a number”, positive and negative "Infinity", and "-0": >>> getcontext().prec = 28 >>> Decimal(10) Decimal('10') >>> Decimal('3.14') Decimal('3.14') >>> Decimal(3.14) Decimal('3.140000000000000124344978758017532527446746826171875') >>> Decimal((0, (3, 1, 4), -2)) Decimal('3.14') >>> Decimal(str(2.0 ** 0.5)) Decimal('1.4142135623730951') >>> Decimal(2) ** Decimal('0.5') Decimal('1.414213562373095048801688724') >>> Decimal('NaN') Decimal('NaN') >>> Decimal('-Infinity') Decimal('-Infinity') If the "FloatOperation" signal is trapped, accidental mixing of decimals and floats in constructors or ordering comparisons raises an exception: >>> c = getcontext() >>> c.traps[FloatOperation] = True >>> Decimal(3.14) Traceback (most recent call last): File "", line 1, in decimal.FloatOperation: [] >>> Decimal('3.5') < 3.7 Traceback (most recent call last): File "", line 1, in decimal.FloatOperation: [] >>> Decimal('3.5') == 3.5 True New in version 3.3. The significance of a new Decimal is determined solely by the number of digits input. Context precision and rounding only come into play during arithmetic operations. >>> getcontext().prec = 6 >>> Decimal('3.0') Decimal('3.0') >>> Decimal('3.1415926535') Decimal('3.1415926535') >>> Decimal('3.1415926535') + Decimal('2.7182818285') Decimal('5.85987') >>> getcontext().rounding = ROUND_UP >>> Decimal('3.1415926535') + Decimal('2.7182818285') Decimal('5.85988') If the internal limits of the C version are exceeded, constructing a decimal raises "InvalidOperation": >>> Decimal("1e9999999999999999999") Traceback (most recent call last): File "", line 1, in decimal.InvalidOperation: [] Changed in version 3.3. Decimals interact well with much of the rest of Python. Here is a small decimal floating point flying circus: >>> data = list(map(Decimal, '1.34 1.87 3.45 2.35 1.00 0.03 9.25'.split())) >>> max(data) Decimal('9.25') >>> min(data) Decimal('0.03') >>> sorted(data) [Decimal('0.03'), Decimal('1.00'), Decimal('1.34'), Decimal('1.87'), Decimal('2.35'), Decimal('3.45'), Decimal('9.25')] >>> sum(data) Decimal('19.29') >>> a,b,c = data[:3] >>> str(a) '1.34' >>> float(a) 1.34 >>> round(a, 1) Decimal('1.3') >>> int(a) 1 >>> a * 5 Decimal('6.70') >>> a * b Decimal('2.5058') >>> c % a Decimal('0.77') And some mathematical functions are also available to Decimal: >>> getcontext().prec = 28 >>> Decimal(2).sqrt() Decimal('1.414213562373095048801688724') >>> Decimal(1).exp() Decimal('2.718281828459045235360287471') >>> Decimal('10').ln() Decimal('2.302585092994045684017991455') >>> Decimal('10').log10() Decimal('1') The "quantize()" method rounds a number to a fixed exponent. This method is useful for monetary applications that often round results to a fixed number of places: >>> Decimal('7.325').quantize(Decimal('.01'), rounding=ROUND_DOWN) Decimal('7.32') >>> Decimal('7.325').quantize(Decimal('1.'), rounding=ROUND_UP) Decimal('8') As shown above, the "getcontext()" function accesses the current context and allows the settings to be changed. This approach meets the needs of most applications. For more advanced work, it may be useful to create alternate contexts using the Context() constructor. To make an alternate active, use the "setcontext()" function. In accordance with the standard, the "decimal" module provides two ready to use standard contexts, "BasicContext" and "ExtendedContext". The former is especially useful for debugging because many of the traps are enabled: >>> myothercontext = Context(prec=60, rounding=ROUND_HALF_DOWN) >>> setcontext(myothercontext) >>> Decimal(1) / Decimal(7) Decimal('0.142857142857142857142857142857142857142857142857142857142857') >>> ExtendedContext Context(prec=9, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[], traps=[]) >>> setcontext(ExtendedContext) >>> Decimal(1) / Decimal(7) Decimal('0.142857143') >>> Decimal(42) / Decimal(0) Decimal('Infinity') >>> setcontext(BasicContext) >>> Decimal(42) / Decimal(0) Traceback (most recent call last): File "", line 1, in -toplevel- Decimal(42) / Decimal(0) DivisionByZero: x / 0 Contexts also have signal flags for monitoring exceptional conditions encountered during computations. The flags remain set until explicitly cleared, so it is best to clear the flags before each set of monitored computations by using the "clear_flags()" method. >>> setcontext(ExtendedContext) >>> getcontext().clear_flags() >>> Decimal(355) / Decimal(113) Decimal('3.14159292') >>> getcontext() Context(prec=9, rounding=ROUND_HALF_EVEN, Emin=-999999, Emax=999999, capitals=1, clamp=0, flags=[Inexact, Rounded], traps=[]) The *flags* entry shows that the rational approximation to "Pi" was rounded (digits beyond the context precision were thrown away) and that the result is inexact (some of the discarded digits were non- zero). Individual traps are set using the dictionary in the "traps" field of a context: >>> setcontext(ExtendedContext) >>> Decimal(1) / Decimal(0) Decimal('Infinity') >>> getcontext().traps[DivisionByZero] = 1 >>> Decimal(1) / Decimal(0) Traceback (most recent call last): File "", line 1, in -toplevel- Decimal(1) / Decimal(0) DivisionByZero: x / 0 Most programs adjust the current context only once, at the beginning of the program. And, in many applications, data is converted to "Decimal" with a single cast inside a loop. With context set and decimals created, the bulk of the program manipulates the data no differently than with other Python numeric types. Decimal objects =============== class decimal.Decimal(value="0", context=None) Construct a new "Decimal" object based from *value*. *value* can be an integer, string, tuple, "float", or another "Decimal" object. If no *value* is given, returns "Decimal('0')". If *value* is a string, it should conform to the decimal numeric string syntax after leading and trailing whitespace characters, as well as underscores throughout, are removed: sign ::= '+' | '-' digit ::= '0' | '1' | '2' | '3' | '4' | '5' | '6' | '7' | '8' | '9' indicator ::= 'e' | 'E' digits ::= digit [digit]... decimal-part ::= digits '.' [digits] | ['.'] digits exponent-part ::= indicator [sign] digits infinity ::= 'Infinity' | 'Inf' nan ::= 'NaN' [digits] | 'sNaN' [digits] numeric-value ::= decimal-part [exponent-part] | infinity numeric-string ::= [sign] numeric-value | [sign] nan Other Unicode decimal digits are also permitted where "digit" appears above. These include decimal digits from various other alphabets (for example, Arabic-Indic and Devanāgarī digits) along with the fullwidth digits "'\uff10'" through "'\uff19'". If *value* is a "tuple", it should have three components, a sign ("0" for positive or "1" for negative), a "tuple" of digits, and an integer exponent. For example, "Decimal((0, (1, 4, 1, 4), -3))" returns "Decimal('1.414')". If *value* is a "float", the binary floating point value is losslessly converted to its exact decimal equivalent. This conversion can often require 53 or more digits of precision. For example, "Decimal(float('1.1'))" converts to "Decimal('1.100000000000000088817841970012523233890533447265625')". The *context* precision does not affect how many digits are stored. That is determined exclusively by the number of digits in *value*. For example, "Decimal('3.00000')" records all five zeros even if the context precision is only three. The purpose of the *context* argument is determining what to do if *value* is a malformed string. If the context traps "InvalidOperation", an exception is raised; otherwise, the constructor returns a new Decimal with the value of "NaN". Once constructed, "Decimal" objects are immutable. Changed in version 3.2: The argument to the constructor is now permitted to be a "float" instance. Changed in version 3.3: "float" arguments raise an exception if the "FloatOperation" trap is set. By default the trap is off. Changed in version 3.6: Underscores are allowed for grouping, as with integral and floating-point literals in code. Decimal floating point objects share many properties with the other built-in numeric types such as "float" and "int". All of the usual math operations and special methods apply. Likewise, decimal objects can be copied, pickled, printed, used as dictionary keys, used as set elements, compared, sorted, and coerced to another type (such as "float" or "int"). There are some small differences between arithmetic on Decimal objects and arithmetic on integers and floats. When the remainder operator "%" is applied to Decimal objects, the sign of the result is the sign of the *dividend* rather than the sign of the divisor: >>> (-7) % 4 1 >>> Decimal(-7) % Decimal(4) Decimal('-3') The integer division operator "//" behaves analogously, returning the integer part of the true quotient (truncating towards zero) rather than its floor, so as to preserve the usual identity "x == (x // y) * y + x % y": >>> -7 // 4 -2 >>> Decimal(-7) // Decimal(4) Decimal('-1') The "%" and "//" operators implement the "remainder" and "divide- integer" operations (respectively) as described in the specification. Decimal objects cannot generally be combined with floats or instances of "fractions.Fraction" in arithmetic operations: an attempt to add a "Decimal" to a "float", for example, will raise a "TypeError". However, it is possible to use Python’s comparison operators to compare a "Decimal" instance "x" with another number "y". This avoids confusing results when doing equality comparisons between numbers of different types. Changed in version 3.2: Mixed-type comparisons between "Decimal" instances and other numeric types are now fully supported. In addition to the standard numeric properties, decimal floating point objects also have a number of specialized methods: adjusted() Return the adjusted exponent after shifting out the coefficient’s rightmost digits until only the lead digit remains: "Decimal('321e+5').adjusted()" returns seven. Used for determining the position of the most significant digit with respect to the decimal point. as_integer_ratio() Return a pair "(n, d)" of integers that represent the given "Decimal" instance as a fraction, in lowest terms and with a positive denominator: >>> Decimal('-3.14').as_integer_ratio() (-157, 50) The conversion is exact. Raise OverflowError on infinities and ValueError on NaNs. New in version 3.6. as_tuple() Return a *named tuple* representation of the number: "DecimalTuple(sign, digits, exponent)". canonical() Return the canonical encoding of the argument. Currently, the encoding of a "Decimal" instance is always canonical, so this operation returns its argument unchanged. compare(other, context=None) Compare the values of two Decimal instances. "compare()" returns a Decimal instance, and if either operand is a NaN then the result is a NaN: a or b is a NaN ==> Decimal('NaN') a < b ==> Decimal('-1') a == b ==> Decimal('0') a > b ==> Decimal('1') compare_signal(other, context=None) This operation is identical to the "compare()" method, except that all NaNs signal. That is, if neither operand is a signaling NaN then any quiet NaN operand is treated as though it were a signaling NaN. compare_total(other, context=None) Compare two operands using their abstract representation rather than their numerical value. Similar to the "compare()" method, but the result gives a total ordering on "Decimal" instances. Two "Decimal" instances with the same numeric value but different representations compare unequal in this ordering: >>> Decimal('12.0').compare_total(Decimal('12')) Decimal('-1') Quiet and signaling NaNs are also included in the total ordering. The result of this function is "Decimal('0')" if both operands have the same representation, "Decimal('-1')" if the first operand is lower in the total order than the second, and "Decimal('1')" if the first operand is higher in the total order than the second operand. See the specification for details of the total order. This operation is unaffected by context and is quiet: no flags are changed and no rounding is performed. As an exception, the C version may raise InvalidOperation if the second operand cannot be converted exactly. compare_total_mag(other, context=None) Compare two operands using their abstract representation rather than their value as in "compare_total()", but ignoring the sign of each operand. "x.compare_total_mag(y)" is equivalent to "x.copy_abs().compare_total(y.copy_abs())". This operation is unaffected by context and is quiet: no flags are changed and no rounding is performed. As an exception, the C version may raise InvalidOperation if the second operand cannot be converted exactly. conjugate() Just returns self, this method is only to comply with the Decimal Specification. copy_abs() Return the absolute value of the argument. This operation is unaffected by the context and is quiet: no flags are changed and no rounding is performed. copy_negate() Return the negation of the argument. This operation is unaffected by the context and is quiet: no flags are changed and no rounding is performed. copy_sign(other, context=None) Return a copy of the first operand with the sign set to be the same as the sign of the second operand. For example: >>> Decimal('2.3').copy_sign(Decimal('-1.5')) Decimal('-2.3') This operation is unaffected by context and is quiet: no flags are changed and no rounding is performed. As an exception, the C version may raise InvalidOperation if the second operand cannot be converted exactly. exp(context=None) Return the value of the (natural) exponential function "e**x" at the given number. The result is correctly rounded using the "ROUND_HALF_EVEN" rounding mode. >>> Decimal(1).exp() Decimal('2.718281828459045235360287471') >>> Decimal(321).exp() Decimal('2.561702493119680037517373933E+139') from_float(f) Classmethod that converts a float to a decimal number, exactly. Note *Decimal.from_float(0.1)* is not the same as *Decimal(‘0.1’)*. Since 0.1 is not exactly representable in binary floating point, the value is stored as the nearest representable value which is *0x1.999999999999ap-4*. That equivalent value in decimal is *0.1000000000000000055511151231257827021181583404541015625*. Note: From Python 3.2 onwards, a "Decimal" instance can also be constructed directly from a "float". >>> Decimal.from_float(0.1) Decimal('0.1000000000000000055511151231257827021181583404541015625') >>> Decimal.from_float(float('nan')) Decimal('NaN') >>> Decimal.from_float(float('inf')) Decimal('Infinity') >>> Decimal.from_float(float('-inf')) Decimal('-Infinity') New in version 3.1. fma(other, third, context=None) Fused multiply-add. Return self*other+third with no rounding of the intermediate product self*other. >>> Decimal(2).fma(3, 5) Decimal('11') is_canonical() Return "True" if the argument is canonical and "False" otherwise. Currently, a "Decimal" instance is always canonical, so this operation always returns "True". is_finite() Return "True" if the argument is a finite number, and "False" if the argument is an infinity or a NaN. is_infinite() Return "True" if the argument is either positive or negative infinity and "False" otherwise. is_nan() Return "True" if the argument is a (quiet or signaling) NaN and "False" otherwise. is_normal(context=None) Return "True" if the argument is a *normal* finite number. Return "False" if the argument is zero, subnormal, infinite or a NaN. is_qnan() Return "True" if the argument is a quiet NaN, and "False" otherwise. is_signed() Return "True" if the argument has a negative sign and "False" otherwise. Note that zeros and NaNs can both carry signs. is_snan() Return "True" if the argument is a signaling NaN and "False" otherwise. is_subnormal(context=None) Return "True" if the argument is subnormal, and "False" otherwise. is_zero() Return "True" if the argument is a (positive or negative) zero and "False" otherwise. ln(context=None) Return the natural (base e) logarithm of the operand. The result is correctly rounded using the "ROUND_HALF_EVEN" rounding mode. log10(context=None) Return the base ten logarithm of the operand. The result is correctly rounded using the "ROUND_HALF_EVEN" rounding mode. logb(context=None) For a nonzero number, return the adjusted exponent of its operand as a "Decimal" instance. If the operand is a zero then "Decimal('-Infinity')" is returned and the "DivisionByZero" flag is raised. If the operand is an infinity then "Decimal('Infinity')" is returned. logical_and(other, context=None) "logical_and()" is a logical operation which takes two *logical operands* (see Logical operands). The result is the digit-wise "and" of the two operands. logical_invert(context=None) "logical_invert()" is a logical operation. The result is the digit-wise inversion of the operand. logical_or(other, context=None) "logical_or()" is a logical operation which takes two *logical operands* (see Logical operands). The result is the digit-wise "or" of the two operands. logical_xor(other, context=None) "logical_xor()" is a logical operation which takes two *logical operands* (see Logical operands). The result is the digit-wise exclusive or of the two operands. max(other, context=None) Like "max(self, other)" except that the context rounding rule is applied before returning and that "NaN" values are either signaled or ignored (depending on the context and whether they are signaling or quiet). max_mag(other, context=None) Similar to the "max()" method, but the comparison is done using the absolute values of the operands. min(other, context=None) Like "min(self, other)" except that the context rounding rule is applied before returning and that "NaN" values are either signaled or ignored (depending on the context and whether they are signaling or quiet). min_mag(other, context=None) Similar to the "min()" method, but the comparison is done using the absolute values of the operands. next_minus(context=None) Return the largest number representable in the given context (or in the current thread’s context if no context is given) that is smaller than the given operand. next_plus(context=None) Return the smallest number representable in the given context (or in the current thread’s context if no context is given) that is larger than the given operand. next_toward(other, context=None) If the two operands are unequal, return the number closest to the first operand in the direction of the second operand. If both operands are numerically equal, return a copy of the first operand with the sign set to be the same as the sign of the second operand. normalize(context=None) Normalize the number by stripping the rightmost trailing zeros and converting any result equal to "Decimal('0')" to "Decimal('0e0')". Used for producing canonical values for attributes of an equivalence class. For example, "Decimal('32.100')" and "Decimal('0.321000e+2')" both normalize to the equivalent value "Decimal('32.1')". number_class(context=None) Return a string describing the *class* of the operand. The returned value is one of the following ten strings. * ""-Infinity"", indicating that the operand is negative infinity. * ""-Normal"", indicating that the operand is a negative normal number. * ""-Subnormal"", indicating that the operand is negative and subnormal. * ""-Zero"", indicating that the operand is a negative zero. * ""+Zero"", indicating that the operand is a positive zero. * ""+Subnormal"", indicating that the operand is positive and subnormal. * ""+Normal"", indicating that the operand is a positive normal number. * ""+Infinity"", indicating that the operand is positive infinity. * ""NaN"", indicating that the operand is a quiet NaN (Not a Number). * ""sNaN"", indicating that the operand is a signaling NaN. quantize(exp, rounding=None, context=None) Return a value equal to the first operand after rounding and having the exponent of the second operand. >>> Decimal('1.41421356').quantize(Decimal('1.000')) Decimal('1.414') Unlike other operations, if the length of the coefficient after the quantize operation would be greater than precision, then an "InvalidOperation" is signaled. This guarantees that, unless there is an error condition, the quantized exponent is always equal to that of the right-hand operand. Also unlike other operations, quantize never signals Underflow, even if the result is subnormal and inexact. If the exponent of the second operand is larger than that of the first then rounding may be necessary. In this case, the rounding mode is determined by the "rounding" argument if given, else by the given "context" argument; if neither argument is given the rounding mode of the current thread’s context is used. An error is returned whenever the resulting exponent is greater than "Emax" or less than "Etiny". radix() Return "Decimal(10)", the radix (base) in which the "Decimal" class does all its arithmetic. Included for compatibility with the specification. remainder_near(other, context=None) Return the remainder from dividing *self* by *other*. This differs from "self % other" in that the sign of the remainder is chosen so as to minimize its absolute value. More precisely, the return value is "self - n * other" where "n" is the integer nearest to the exact value of "self / other", and if two integers are equally near then the even one is chosen. If the result is zero then its sign will be the sign of *self*. >>> Decimal(18).remainder_near(Decimal(10)) Decimal('-2') >>> Decimal(25).remainder_near(Decimal(10)) Decimal('5') >>> Decimal(35).remainder_near(Decimal(10)) Decimal('-5') rotate(other, context=None) Return the result of rotating the digits of the first operand by an amount specified by the second operand. The second operand must be an integer in the range -precision through precision. The absolute value of the second operand gives the number of places to rotate. If the second operand is positive then rotation is to the left; otherwise rotation is to the right. The coefficient of the first operand is padded on the left with zeros to length precision if necessary. The sign and exponent of the first operand are unchanged. same_quantum(other, context=None) Test whether self and other have the same exponent or whether both are "NaN". This operation is unaffected by context and is quiet: no flags are changed and no rounding is performed. As an exception, the C version may raise InvalidOperation if the second operand cannot be converted exactly. scaleb(other, context=None) Return the first operand with exponent adjusted by the second. Equivalently, return the first operand multiplied by "10**other". The second operand must be an integer. shift(other, context=None) Return the result of shifting the digits of the first operand by an amount specified by the second operand. The second operand must be an integer in the range -precision through precision. The absolute value of the second operand gives the number of places to shift. If the second operand is positive then the shift is to the left; otherwise the shift is to the right. Digits shifted into the coefficient are zeros. The sign and exponent of the first operand are unchanged. sqrt(context=None) Return the square root of the argument to full precision. to_eng_string(context=None) Convert to a string, using engineering notation if an exponent is needed. Engineering notation has an exponent which is a multiple of 3. This can leave up to 3 digits to the left of the decimal place and may require the addition of either one or two trailing zeros. For example, this converts "Decimal('123E+1')" to "Decimal('1.23E+3')". to_integral(rounding=None, context=None) Identical to the "to_integral_value()" method. The "to_integral" name has been kept for compatibility with older versions. to_integral_exact(rounding=None, context=None) Round to the nearest integer, signaling "Inexact" or "Rounded" as appropriate if rounding occurs. The rounding mode is determined by the "rounding" parameter if given, else by the given "context". If neither parameter is given then the rounding mode of the current context is used. to_integral_value(rounding=None, context=None) Round to the nearest integer without signaling "Inexact" or "Rounded". If given, applies *rounding*; otherwise, uses the rounding method in either the supplied *context* or the current context. Logical operands ---------------- The "logical_and()", "logical_invert()", "logical_or()", and "logical_xor()" methods expect their arguments to be *logical operands*. A *logical operand* is a "Decimal" instance whose exponent and sign are both zero, and whose digits are all either "0" or "1". Context objects =============== Contexts are environments for arithmetic operations. They govern precision, set rules for rounding, determine which signals are treated as exceptions, and limit the range for exponents. Each thread has its own current context which is accessed or changed using the "getcontext()" and "setcontext()" functions: decimal.getcontext() Return the current context for the active thread. decimal.setcontext(c) Set the current context for the active thread to *c*. You can also use the "with" statement and the "localcontext()" function to temporarily change the active context. decimal.localcontext(ctx=None) Return a context manager that will set the current context for the active thread to a copy of *ctx* on entry to the with-statement and restore the previous context when exiting the with-statement. If no context is specified, a copy of the current context is used. For example, the following code sets the current decimal precision to 42 places, performs a calculation, and then automatically restores the previous context: from decimal import localcontext with localcontext() as ctx: ctx.prec = 42 # Perform a high precision calculation s = calculate_something() s = +s # Round the final result back to the default precision New contexts can also be created using the "Context" constructor described below. In addition, the module provides three pre-made contexts: class decimal.BasicContext This is a standard context defined by the General Decimal Arithmetic Specification. Precision is set to nine. Rounding is set to "ROUND_HALF_UP". All flags are cleared. All traps are enabled (treated as exceptions) except "Inexact", "Rounded", and "Subnormal". Because many of the traps are enabled, this context is useful for debugging. class decimal.ExtendedContext This is a standard context defined by the General Decimal Arithmetic Specification. Precision is set to nine. Rounding is set to "ROUND_HALF_EVEN". All flags are cleared. No traps are enabled (so that exceptions are not raised during computations). Because the traps are disabled, this context is useful for applications that prefer to have result value of "NaN" or "Infinity" instead of raising exceptions. This allows an application to complete a run in the presence of conditions that would otherwise halt the program. class decimal.DefaultContext This context is used by the "Context" constructor as a prototype for new contexts. Changing a field (such a precision) has the effect of changing the default for new contexts created by the "Context" constructor. This context is most useful in multi-threaded environments. Changing one of the fields before threads are started has the effect of setting system-wide defaults. Changing the fields after threads have started is not recommended as it would require thread synchronization to prevent race conditions. In single threaded environments, it is preferable to not use this context at all. Instead, simply create contexts explicitly as described below. The default values are "prec"="28", "rounding"="ROUND_HALF_EVEN", and enabled traps for "Overflow", "InvalidOperation", and "DivisionByZero". In addition to the three supplied contexts, new contexts can be created with the "Context" constructor. class decimal.Context(prec=None, rounding=None, Emin=None, Emax=None, capitals=None, clamp=None, flags=None, traps=None) Creates a new context. If a field is not specified or is "None", the default values are copied from the "DefaultContext". If the *flags* field is not specified or is "None", all flags are cleared. *prec* is an integer in the range ["1", "MAX_PREC"] that sets the precision for arithmetic operations in the context. The *rounding* option is one of the constants listed in the section Rounding Modes. The *traps* and *flags* fields list any signals to be set. Generally, new contexts should only set traps and leave the flags clear. The *Emin* and *Emax* fields are integers specifying the outer limits allowable for exponents. *Emin* must be in the range ["MIN_EMIN", "0"], *Emax* in the range ["0", "MAX_EMAX"]. The *capitals* field is either "0" or "1" (the default). If set to "1", exponents are printed with a capital "E"; otherwise, a lowercase "e" is used: "Decimal('6.02e+23')". The *clamp* field is either "0" (the default) or "1". If set to "1", the exponent "e" of a "Decimal" instance representable in this context is strictly limited to the range "Emin - prec + 1 <= e <= Emax - prec + 1". If *clamp* is "0" then a weaker condition holds: the adjusted exponent of the "Decimal" instance is at most "Emax". When *clamp* is "1", a large normal number will, where possible, have its exponent reduced and a corresponding number of zeros added to its coefficient, in order to fit the exponent constraints; this preserves the value of the number but loses information about significant trailing zeros. For example: >>> Context(prec=6, Emax=999, clamp=1).create_decimal('1.23e999') Decimal('1.23000E+999') A *clamp* value of "1" allows compatibility with the fixed-width decimal interchange formats specified in IEEE 754. The "Context" class defines several general purpose methods as well as a large number of methods for doing arithmetic directly in a given context. In addition, for each of the "Decimal" methods described above (with the exception of the "adjusted()" and "as_tuple()" methods) there is a corresponding "Context" method. For example, for a "Context" instance "C" and "Decimal" instance "x", "C.exp(x)" is equivalent to "x.exp(context=C)". Each "Context" method accepts a Python integer (an instance of "int") anywhere that a Decimal instance is accepted. clear_flags() Resets all of the flags to "0". clear_traps() Resets all of the traps to "0". New in version 3.3. copy() Return a duplicate of the context. copy_decimal(num) Return a copy of the Decimal instance num. create_decimal(num) Creates a new Decimal instance from *num* but using *self* as context. Unlike the "Decimal" constructor, the context precision, rounding method, flags, and traps are applied to the conversion. This is useful because constants are often given to a greater precision than is needed by the application. Another benefit is that rounding immediately eliminates unintended effects from digits beyond the current precision. In the following example, using unrounded inputs means that adding zero to a sum can change the result: >>> getcontext().prec = 3 >>> Decimal('3.4445') + Decimal('1.0023') Decimal('4.45') >>> Decimal('3.4445') + Decimal(0) + Decimal('1.0023') Decimal('4.44') This method implements the to-number operation of the IBM specification. If the argument is a string, no leading or trailing whitespace or underscores are permitted. create_decimal_from_float(f) Creates a new Decimal instance from a float *f* but rounding using *self* as the context. Unlike the "Decimal.from_float()" class method, the context precision, rounding method, flags, and traps are applied to the conversion. >>> context = Context(prec=5, rounding=ROUND_DOWN) >>> context.create_decimal_from_float(math.pi) Decimal('3.1415') >>> context = Context(prec=5, traps=[Inexact]) >>> context.create_decimal_from_float(math.pi) Traceback (most recent call last): ... decimal.Inexact: None New in version 3.1. Etiny() Returns a value equal to "Emin - prec + 1" which is the minimum exponent value for subnormal results. When underflow occurs, the exponent is set to "Etiny". Etop() Returns a value equal to "Emax - prec + 1". The usual approach to working with decimals is to create "Decimal" instances and then apply arithmetic operations which take place within the current context for the active thread. An alternative approach is to use context methods for calculating within a specific context. The methods are similar to those for the "Decimal" class and are only briefly recounted here. abs(x) Returns the absolute value of *x*. add(x, y) Return the sum of *x* and *y*. canonical(x) Returns the same Decimal object *x*. compare(x, y) Compares *x* and *y* numerically. compare_signal(x, y) Compares the values of the two operands numerically. compare_total(x, y) Compares two operands using their abstract representation. compare_total_mag(x, y) Compares two operands using their abstract representation, ignoring sign. copy_abs(x) Returns a copy of *x* with the sign set to 0. copy_negate(x) Returns a copy of *x* with the sign inverted. copy_sign(x, y) Copies the sign from *y* to *x*. divide(x, y) Return *x* divided by *y*. divide_int(x, y) Return *x* divided by *y*, truncated to an integer. divmod(x, y) Divides two numbers and returns the integer part of the result. exp(x) Returns *e ** x*. fma(x, y, z) Returns *x* multiplied by *y*, plus *z*. is_canonical(x) Returns "True" if *x* is canonical; otherwise returns "False". is_finite(x) Returns "True" if *x* is finite; otherwise returns "False". is_infinite(x) Returns "True" if *x* is infinite; otherwise returns "False". is_nan(x) Returns "True" if *x* is a qNaN or sNaN; otherwise returns "False". is_normal(x) Returns "True" if *x* is a normal number; otherwise returns "False". is_qnan(x) Returns "True" if *x* is a quiet NaN; otherwise returns "False". is_signed(x) Returns "True" if *x* is negative; otherwise returns "False". is_snan(x) Returns "True" if *x* is a signaling NaN; otherwise returns "False". is_subnormal(x) Returns "True" if *x* is subnormal; otherwise returns "False". is_zero(x) Returns "True" if *x* is a zero; otherwise returns "False". ln(x) Returns the natural (base e) logarithm of *x*. log10(x) Returns the base 10 logarithm of *x*. logb(x) Returns the exponent of the magnitude of the operand’s MSD. logical_and(x, y) Applies the logical operation *and* between each operand’s digits. logical_invert(x) Invert all the digits in *x*. logical_or(x, y) Applies the logical operation *or* between each operand’s digits. logical_xor(x, y) Applies the logical operation *xor* between each operand’s digits. max(x, y) Compares two values numerically and returns the maximum. max_mag(x, y) Compares the values numerically with their sign ignored. min(x, y) Compares two values numerically and returns the minimum. min_mag(x, y) Compares the values numerically with their sign ignored. minus(x) Minus corresponds to the unary prefix minus operator in Python. multiply(x, y) Return the product of *x* and *y*. next_minus(x) Returns the largest representable number smaller than *x*. next_plus(x) Returns the smallest representable number larger than *x*. next_toward(x, y) Returns the number closest to *x*, in direction towards *y*. normalize(x) Reduces *x* to its simplest form. number_class(x) Returns an indication of the class of *x*. plus(x) Plus corresponds to the unary prefix plus operator in Python. This operation applies the context precision and rounding, so it is *not* an identity operation. power(x, y, modulo=None) Return "x" to the power of "y", reduced modulo "modulo" if given. With two arguments, compute "x**y". If "x" is negative then "y" must be integral. The result will be inexact unless "y" is integral and the result is finite and can be expressed exactly in ‘precision’ digits. The rounding mode of the context is used. Results are always correctly-rounded in the Python version. "Decimal(0) ** Decimal(0)" results in "InvalidOperation", and if "InvalidOperation" is not trapped, then results in "Decimal('NaN')". Changed in version 3.3: The C module computes "power()" in terms of the correctly-rounded "exp()" and "ln()" functions. The result is well-defined but only “almost always correctly- rounded”. With three arguments, compute "(x**y) % modulo". For the three argument form, the following restrictions on the arguments hold: * all three arguments must be integral * "y" must be nonnegative * at least one of "x" or "y" must be nonzero * "modulo" must be nonzero and have at most ‘precision’ digits The value resulting from "Context.power(x, y, modulo)" is equal to the value that would be obtained by computing "(x**y) % modulo" with unbounded precision, but is computed more efficiently. The exponent of the result is zero, regardless of the exponents of "x", "y" and "modulo". The result is always exact. quantize(x, y) Returns a value equal to *x* (rounded), having the exponent of *y*. radix() Just returns 10, as this is Decimal, :) remainder(x, y) Returns the remainder from integer division. The sign of the result, if non-zero, is the same as that of the original dividend. remainder_near(x, y) Returns "x - y * n", where *n* is the integer nearest the exact value of "x / y" (if the result is 0 then its sign will be the sign of *x*). rotate(x, y) Returns a rotated copy of *x*, *y* times. same_quantum(x, y) Returns "True" if the two operands have the same exponent. scaleb(x, y) Returns the first operand after adding the second value its exp. shift(x, y) Returns a shifted copy of *x*, *y* times. sqrt(x) Square root of a non-negative number to context precision. subtract(x, y) Return the difference between *x* and *y*. to_eng_string(x) Convert to a string, using engineering notation if an exponent is needed. Engineering notation has an exponent which is a multiple of 3. This can leave up to 3 digits to the left of the decimal place and may require the addition of either one or two trailing zeros. to_integral_exact(x) Rounds to an integer. to_sci_string(x) Converts a number to a string using scientific notation. Constants ========= The constants in this section are only relevant for the C module. They are also included in the pure Python version for compatibility. +-----------------------+-----------------------+---------------------------------+ | | 32-bit | 64-bit | |=======================|=======================|=================================| | decimal.MAX_PREC | "425000000" | "999999999999999999" | +-----------------------+-----------------------+---------------------------------+ | decimal.MAX_EMAX | "425000000" | "999999999999999999" | +-----------------------+-----------------------+---------------------------------+ | decimal.MIN_EMIN | "-425000000" | "-999999999999999999" | +-----------------------+-----------------------+---------------------------------+ | decimal.MIN_ETINY | "-849999999" | "-1999999999999999997" | +-----------------------+-----------------------+---------------------------------+ decimal.HAVE_THREADS The value is "True". Deprecated, because Python now always has threads. Deprecated since version 3.9. decimal.HAVE_CONTEXTVAR The default value is "True". If Python is compiled "--without- decimal-contextvar", the C version uses a thread-local rather than a coroutine-local context and the value is "False". This is slightly faster in some nested context scenarios. New in version 3.9: backported to 3.7 and 3.8. Rounding modes ============== decimal.ROUND_CEILING Round towards "Infinity". decimal.ROUND_DOWN Round towards zero. decimal.ROUND_FLOOR Round towards "-Infinity". decimal.ROUND_HALF_DOWN Round to nearest with ties going towards zero. decimal.ROUND_HALF_EVEN Round to nearest with ties going to nearest even integer. decimal.ROUND_HALF_UP Round to nearest with ties going away from zero. decimal.ROUND_UP Round away from zero. decimal.ROUND_05UP Round away from zero if last digit after rounding towards zero would have been 0 or 5; otherwise round towards zero. Signals ======= Signals represent conditions that arise during computation. Each corresponds to one context flag and one context trap enabler. The context flag is set whenever the condition is encountered. After the computation, flags may be checked for informational purposes (for instance, to determine whether a computation was exact). After checking the flags, be sure to clear all flags before starting the next computation. If the context’s trap enabler is set for the signal, then the condition causes a Python exception to be raised. For example, if the "DivisionByZero" trap is set, then a "DivisionByZero" exception is raised upon encountering the condition. class decimal.Clamped Altered an exponent to fit representation constraints. Typically, clamping occurs when an exponent falls outside the context’s "Emin" and "Emax" limits. If possible, the exponent is reduced to fit by adding zeros to the coefficient. class decimal.DecimalException Base class for other signals and a subclass of "ArithmeticError". class decimal.DivisionByZero Signals the division of a non-infinite number by zero. Can occur with division, modulo division, or when raising a number to a negative power. If this signal is not trapped, returns "Infinity" or "-Infinity" with the sign determined by the inputs to the calculation. class decimal.Inexact Indicates that rounding occurred and the result is not exact. Signals when non-zero digits were discarded during rounding. The rounded result is returned. The signal flag or trap is used to detect when results are inexact. class decimal.InvalidOperation An invalid operation was performed. Indicates that an operation was requested that does not make sense. If not trapped, returns "NaN". Possible causes include: Infinity - Infinity 0 * Infinity Infinity / Infinity x % 0 Infinity % x sqrt(-x) and x > 0 0 ** 0 x ** (non-integer) x ** Infinity class decimal.Overflow Numerical overflow. Indicates the exponent is larger than "Emax" after rounding has occurred. If not trapped, the result depends on the rounding mode, either pulling inward to the largest representable finite number or rounding outward to "Infinity". In either case, "Inexact" and "Rounded" are also signaled. class decimal.Rounded Rounding occurred though possibly no information was lost. Signaled whenever rounding discards digits; even if those digits are zero (such as rounding "5.00" to "5.0"). If not trapped, returns the result unchanged. This signal is used to detect loss of significant digits. class decimal.Subnormal Exponent was lower than "Emin" prior to rounding. Occurs when an operation result is subnormal (the exponent is too small). If not trapped, returns the result unchanged. class decimal.Underflow Numerical underflow with result rounded to zero. Occurs when a subnormal result is pushed to zero by rounding. "Inexact" and "Subnormal" are also signaled. class decimal.FloatOperation Enable stricter semantics for mixing floats and Decimals. If the signal is not trapped (default), mixing floats and Decimals is permitted in the "Decimal" constructor, "create_decimal()" and all comparison operators. Both conversion and comparisons are exact. Any occurrence of a mixed operation is silently recorded by setting "FloatOperation" in the context flags. Explicit conversions with "from_float()" or "create_decimal_from_float()" do not set the flag. Otherwise (the signal is trapped), only equality comparisons and explicit conversions are silent. All other mixed operations raise "FloatOperation". The following table summarizes the hierarchy of signals: exceptions.ArithmeticError(exceptions.Exception) DecimalException Clamped DivisionByZero(DecimalException, exceptions.ZeroDivisionError) Inexact Overflow(Inexact, Rounded) Underflow(Inexact, Rounded, Subnormal) InvalidOperation Rounded Subnormal FloatOperation(DecimalException, exceptions.TypeError) Floating Point Notes ==================== Mitigating round-off error with increased precision --------------------------------------------------- The use of decimal floating point eliminates decimal representation error (making it possible to represent "0.1" exactly); however, some operations can still incur round-off error when non-zero digits exceed the fixed precision. The effects of round-off error can be amplified by the addition or subtraction of nearly offsetting quantities resulting in loss of significance. Knuth provides two instructive examples where rounded floating point arithmetic with insufficient precision causes the breakdown of the associative and distributive properties of addition: # Examples from Seminumerical Algorithms, Section 4.2.2. >>> from decimal import Decimal, getcontext >>> getcontext().prec = 8 >>> u, v, w = Decimal(11111113), Decimal(-11111111), Decimal('7.51111111') >>> (u + v) + w Decimal('9.5111111') >>> u + (v + w) Decimal('10') >>> u, v, w = Decimal(20000), Decimal(-6), Decimal('6.0000003') >>> (u*v) + (u*w) Decimal('0.01') >>> u * (v+w) Decimal('0.0060000') The "decimal" module makes it possible to restore the identities by expanding the precision sufficiently to avoid loss of significance: >>> getcontext().prec = 20 >>> u, v, w = Decimal(11111113), Decimal(-11111111), Decimal('7.51111111') >>> (u + v) + w Decimal('9.51111111') >>> u + (v + w) Decimal('9.51111111') >>> >>> u, v, w = Decimal(20000), Decimal(-6), Decimal('6.0000003') >>> (u*v) + (u*w) Decimal('0.0060000') >>> u * (v+w) Decimal('0.0060000') Special values -------------- The number system for the "decimal" module provides special values including "NaN", "sNaN", "-Infinity", "Infinity", and two zeros, "+0" and "-0". Infinities can be constructed directly with: "Decimal('Infinity')". Also, they can arise from dividing by zero when the "DivisionByZero" signal is not trapped. Likewise, when the "Overflow" signal is not trapped, infinity can result from rounding beyond the limits of the largest representable number. The infinities are signed (affine) and can be used in arithmetic operations where they get treated as very large, indeterminate numbers. For instance, adding a constant to infinity gives another infinite result. Some operations are indeterminate and return "NaN", or if the "InvalidOperation" signal is trapped, raise an exception. For example, "0/0" returns "NaN" which means “not a number”. This variety of "NaN" is quiet and, once created, will flow through other computations always resulting in another "NaN". This behavior can be useful for a series of computations that occasionally have missing inputs — it allows the calculation to proceed while flagging specific results as invalid. A variant is "sNaN" which signals rather than remaining quiet after every operation. This is a useful return value when an invalid result needs to interrupt a calculation for special handling. The behavior of Python’s comparison operators can be a little surprising where a "NaN" is involved. A test for equality where one of the operands is a quiet or signaling "NaN" always returns "False" (even when doing "Decimal('NaN')==Decimal('NaN')"), while a test for inequality always returns "True". An attempt to compare two Decimals using any of the "<", "<=", ">" or ">=" operators will raise the "InvalidOperation" signal if either operand is a "NaN", and return "False" if this signal is not trapped. Note that the General Decimal Arithmetic specification does not specify the behavior of direct comparisons; these rules for comparisons involving a "NaN" were taken from the IEEE 854 standard (see Table 3 in section 5.7). To ensure strict standards-compliance, use the "compare()" and "compare- signal()" methods instead. The signed zeros can result from calculations that underflow. They keep the sign that would have resulted if the calculation had been carried out to greater precision. Since their magnitude is zero, both positive and negative zeros are treated as equal and their sign is informational. In addition to the two signed zeros which are distinct yet equal, there are various representations of zero with differing precisions yet equivalent in value. This takes a bit of getting used to. For an eye accustomed to normalized floating point representations, it is not immediately obvious that the following calculation returns a value equal to zero: >>> 1 / Decimal('Infinity') Decimal('0E-1000026') Working with threads ==================== The "getcontext()" function accesses a different "Context" object for each thread. Having separate thread contexts means that threads may make changes (such as "getcontext().prec=10") without interfering with other threads. Likewise, the "setcontext()" function automatically assigns its target to the current thread. If "setcontext()" has not been called before "getcontext()", then "getcontext()" will automatically create a new context for use in the current thread. The new context is copied from a prototype context called *DefaultContext*. To control the defaults so that each thread will use the same values throughout the application, directly modify the *DefaultContext* object. This should be done *before* any threads are started so that there won’t be a race condition between threads calling "getcontext()". For example: # Set applicationwide defaults for all threads about to be launched DefaultContext.prec = 12 DefaultContext.rounding = ROUND_DOWN DefaultContext.traps = ExtendedContext.traps.copy() DefaultContext.traps[InvalidOperation] = 1 setcontext(DefaultContext) # Afterwards, the threads can be started t1.start() t2.start() t3.start() . . . Recipes ======= Here are a few recipes that serve as utility functions and that demonstrate ways to work with the "Decimal" class: def moneyfmt(value, places=2, curr='', sep=',', dp='.', pos='', neg='-', trailneg=''): """Convert Decimal to a money formatted string. places: required number of places after the decimal point curr: optional currency symbol before the sign (may be blank) sep: optional grouping separator (comma, period, space, or blank) dp: decimal point indicator (comma or period) only specify as blank when places is zero pos: optional sign for positive numbers: '+', space or blank neg: optional sign for negative numbers: '-', '(', space or blank trailneg:optional trailing minus indicator: '-', ')', space or blank >>> d = Decimal('-1234567.8901') >>> moneyfmt(d, curr='$') '-$1,234,567.89' >>> moneyfmt(d, places=0, sep='.', dp='', neg='', trailneg='-') '1.234.568-' >>> moneyfmt(d, curr='$', neg='(', trailneg=')') '($1,234,567.89)' >>> moneyfmt(Decimal(123456789), sep=' ') '123 456 789.00' >>> moneyfmt(Decimal('-0.02'), neg='<', trailneg='>') '<0.02>' """ q = Decimal(10) ** -places # 2 places --> '0.01' sign, digits, exp = value.quantize(q).as_tuple() result = [] digits = list(map(str, digits)) build, next = result.append, digits.pop if sign: build(trailneg) for i in range(places): build(next() if digits else '0') if places: build(dp) if not digits: build('0') i = 0 while digits: build(next()) i += 1 if i == 3 and digits: i = 0 build(sep) build(curr) build(neg if sign else pos) return ''.join(reversed(result)) def pi(): """Compute Pi to the current precision. >>> print(pi()) 3.141592653589793238462643383 """ getcontext().prec += 2 # extra digits for intermediate steps three = Decimal(3) # substitute "three=3.0" for regular floats lasts, t, s, n, na, d, da = 0, three, 3, 1, 0, 0, 24 while s != lasts: lasts = s n, na = n+na, na+8 d, da = d+da, da+32 t = (t * n) / d s += t getcontext().prec -= 2 return +s # unary plus applies the new precision def exp(x): """Return e raised to the power of x. Result type matches input type. >>> print(exp(Decimal(1))) 2.718281828459045235360287471 >>> print(exp(Decimal(2))) 7.389056098930650227230427461 >>> print(exp(2.0)) 7.38905609893 >>> print(exp(2+0j)) (7.38905609893+0j) """ getcontext().prec += 2 i, lasts, s, fact, num = 0, 0, 1, 1, 1 while s != lasts: lasts = s i += 1 fact *= i num *= x s += num / fact getcontext().prec -= 2 return +s def cos(x): """Return the cosine of x as measured in radians. The Taylor series approximation works best for a small value of x. For larger values, first compute x = x % (2 * pi). >>> print(cos(Decimal('0.5'))) 0.8775825618903727161162815826 >>> print(cos(0.5)) 0.87758256189 >>> print(cos(0.5+0j)) (0.87758256189+0j) """ getcontext().prec += 2 i, lasts, s, fact, num, sign = 0, 0, 1, 1, 1, 1 while s != lasts: lasts = s i += 2 fact *= i * (i-1) num *= x * x sign *= -1 s += num / fact * sign getcontext().prec -= 2 return +s def sin(x): """Return the sine of x as measured in radians. The Taylor series approximation works best for a small value of x. For larger values, first compute x = x % (2 * pi). >>> print(sin(Decimal('0.5'))) 0.4794255386042030002732879352 >>> print(sin(0.5)) 0.479425538604 >>> print(sin(0.5+0j)) (0.479425538604+0j) """ getcontext().prec += 2 i, lasts, s, fact, num, sign = 1, 0, x, 1, x, 1 while s != lasts: lasts = s i += 2 fact *= i * (i-1) num *= x * x sign *= -1 s += num / fact * sign getcontext().prec -= 2 return +s Decimal FAQ =========== Q. It is cumbersome to type "decimal.Decimal('1234.5')". Is there a way to minimize typing when using the interactive interpreter? A. Some users abbreviate the constructor to just a single letter: >>> D = decimal.Decimal >>> D('1.23') + D('3.45') Decimal('4.68') Q. In a fixed-point application with two decimal places, some inputs have many places and need to be rounded. Others are not supposed to have excess digits and need to be validated. What methods should be used? A. The "quantize()" method rounds to a fixed number of decimal places. If the "Inexact" trap is set, it is also useful for validation: >>> TWOPLACES = Decimal(10) ** -2 # same as Decimal('0.01') >>> # Round to two places >>> Decimal('3.214').quantize(TWOPLACES) Decimal('3.21') >>> # Validate that a number does not exceed two places >>> Decimal('3.21').quantize(TWOPLACES, context=Context(traps=[Inexact])) Decimal('3.21') >>> Decimal('3.214').quantize(TWOPLACES, context=Context(traps=[Inexact])) Traceback (most recent call last): ... Inexact: None Q. Once I have valid two place inputs, how do I maintain that invariant throughout an application? A. Some operations like addition, subtraction, and multiplication by an integer will automatically preserve fixed point. Others operations, like division and non-integer multiplication, will change the number of decimal places and need to be followed-up with a "quantize()" step: >>> a = Decimal('102.72') # Initial fixed-point values >>> b = Decimal('3.17') >>> a + b # Addition preserves fixed-point Decimal('105.89') >>> a - b Decimal('99.55') >>> a * 42 # So does integer multiplication Decimal('4314.24') >>> (a * b).quantize(TWOPLACES) # Must quantize non-integer multiplication Decimal('325.62') >>> (b / a).quantize(TWOPLACES) # And quantize division Decimal('0.03') In developing fixed-point applications, it is convenient to define functions to handle the "quantize()" step: >>> def mul(x, y, fp=TWOPLACES): ... return (x * y).quantize(fp) >>> def div(x, y, fp=TWOPLACES): ... return (x / y).quantize(fp) >>> mul(a, b) # Automatically preserve fixed-point Decimal('325.62') >>> div(b, a) Decimal('0.03') Q. There are many ways to express the same value. The numbers "200", "200.000", "2E2", and "02E+4" all have the same value at various precisions. Is there a way to transform them to a single recognizable canonical value? A. The "normalize()" method maps all equivalent values to a single representative: >>> values = map(Decimal, '200 200.000 2E2 .02E+4'.split()) >>> [v.normalize() for v in values] [Decimal('2E+2'), Decimal('2E+2'), Decimal('2E+2'), Decimal('2E+2')] Q. Some decimal values always print with exponential notation. Is there a way to get a non-exponential representation? A. For some values, exponential notation is the only way to express the number of significant places in the coefficient. For example, expressing "5.0E+3" as "5000" keeps the value constant but cannot show the original’s two-place significance. If an application does not care about tracking significance, it is easy to remove the exponent and trailing zeroes, losing significance, but keeping the value unchanged: >>> def remove_exponent(d): ... return d.quantize(Decimal(1)) if d == d.to_integral() else d.normalize() >>> remove_exponent(Decimal('5E+3')) Decimal('5000') Q. Is there a way to convert a regular float to a "Decimal"? A. Yes, any binary floating point number can be exactly expressed as a Decimal though an exact conversion may take more precision than intuition would suggest: >>> Decimal(math.pi) Decimal('3.141592653589793115997963468544185161590576171875') Q. Within a complex calculation, how can I make sure that I haven’t gotten a spurious result because of insufficient precision or rounding anomalies. A. The decimal module makes it easy to test results. A best practice is to re-run calculations using greater precision and with various rounding modes. Widely differing results indicate insufficient precision, rounding mode issues, ill-conditioned inputs, or a numerically unstable algorithm. Q. I noticed that context precision is applied to the results of operations but not to the inputs. Is there anything to watch out for when mixing values of different precisions? A. Yes. The principle is that all values are considered to be exact and so is the arithmetic on those values. Only the results are rounded. The advantage for inputs is that “what you type is what you get”. A disadvantage is that the results can look odd if you forget that the inputs haven’t been rounded: >>> getcontext().prec = 3 >>> Decimal('3.104') + Decimal('2.104') Decimal('5.21') >>> Decimal('3.104') + Decimal('0.000') + Decimal('2.104') Decimal('5.20') The solution is either to increase precision or to force rounding of inputs using the unary plus operation: >>> getcontext().prec = 3 >>> +Decimal('1.23456789') # unary plus triggers rounding Decimal('1.23') Alternatively, inputs can be rounded upon creation using the "Context.create_decimal()" method: >>> Context(prec=5, rounding=ROUND_DOWN).create_decimal('1.2345678') Decimal('1.2345') Q. Is the CPython implementation fast for large numbers? A. Yes. In the CPython and PyPy3 implementations, the C/CFFI versions of the decimal module integrate the high speed libmpdec library for arbitrary precision correctly-rounded decimal floating point arithmetic [1]. "libmpdec" uses Karatsuba multiplication for medium- sized numbers and the Number Theoretic Transform for very large numbers. The context must be adapted for exact arbitrary precision arithmetic. "Emin" and "Emax" should always be set to the maximum values, "clamp" should always be 0 (the default). Setting "prec" requires some care. The easiest approach for trying out bignum arithmetic is to use the maximum value for "prec" as well [2]: >>> setcontext(Context(prec=MAX_PREC, Emax=MAX_EMAX, Emin=MIN_EMIN)) >>> x = Decimal(2) ** 256 >>> x / 128 Decimal('904625697166532776746648320380374280103671755200316906558262375061821325312') For inexact results, "MAX_PREC" is far too large on 64-bit platforms and the available memory will be insufficient: >>> Decimal(1) / 3 Traceback (most recent call last): File "", line 1, in MemoryError On systems with overallocation (e.g. Linux), a more sophisticated approach is to adjust "prec" to the amount of available RAM. Suppose that you have 8GB of RAM and expect 10 simultaneous operands using a maximum of 500MB each: >>> import sys >>> >>> # Maximum number of digits for a single operand using 500MB in 8-byte words >>> # with 19 digits per word (4-byte and 9 digits for the 32-bit build): >>> maxdigits = 19 * ((500 * 1024**2) // 8) >>> >>> # Check that this works: >>> c = Context(prec=maxdigits, Emax=MAX_EMAX, Emin=MIN_EMIN) >>> c.traps[Inexact] = True >>> setcontext(c) >>> >>> # Fill the available precision with nines: >>> x = Decimal(0).logical_invert() * 9 >>> sys.getsizeof(x) 524288112 >>> x + 2 Traceback (most recent call last): File "", line 1, in decimal.Inexact: [] In general (and especially on systems without overallocation), it is recommended to estimate even tighter bounds and set the "Inexact" trap if all calculations are expected to be exact. [1] New in version 3.3. [2] Changed in version 3.9: This approach now works for all exact results except for non-integer powers. Development Tools ***************** The modules described in this chapter help you write software. For example, the "pydoc" module takes a module and generates documentation based on the module’s contents. The "doctest" and "unittest" modules contains frameworks for writing unit tests that automatically exercise code and verify that the expected output is produced. **2to3** can translate Python 2.x source code into valid Python 3.x code. The list of modules described in this chapter is: * "typing" — Support for type hints * Relevant PEPs * Type aliases * NewType * Callable * Generics * User-defined generic types * The "Any" type * Nominal vs structural subtyping * Module contents * Special typing primitives * Special types * Special forms * Building generic types * Other special directives * Generic concrete collections * Corresponding to built-in types * Corresponding to types in "collections" * Other concrete types * Abstract Base Classes * Corresponding to collections in "collections.abc" * Corresponding to other types in "collections.abc" * Asynchronous programming * Context manager types * Protocols * Functions and decorators * Introspection helpers * Constant * "pydoc" — Documentation generator and online help system * Python Development Mode * Effects of the Python Development Mode * ResourceWarning Example * Bad file descriptor error example * "doctest" — Test interactive Python examples * Simple Usage: Checking Examples in Docstrings * Simple Usage: Checking Examples in a Text File * How It Works * Which Docstrings Are Examined? * How are Docstring Examples Recognized? * What’s the Execution Context? * What About Exceptions? * Option Flags * Directives * Warnings * Basic API * Unittest API * Advanced API * DocTest Objects * Example Objects * DocTestFinder objects * DocTestParser objects * DocTestRunner objects * OutputChecker objects * Debugging * Soapbox * "unittest" — Unit testing framework * Basic example * Command-Line Interface * Command-line options * Test Discovery * Organizing test code * Re-using old test code * Skipping tests and expected failures * Distinguishing test iterations using subtests * Classes and functions * Test cases * Deprecated aliases * Grouping tests * Loading and running tests * load_tests Protocol * Class and Module Fixtures * setUpClass and tearDownClass * setUpModule and tearDownModule * Signal Handling * "unittest.mock" — mock object library * Quick Guide * The Mock Class * Calling * Deleting Attributes * Mock names and the name attribute * Attaching Mocks as Attributes * The patchers * patch * patch.object * patch.dict * patch.multiple * patch methods: start and stop * patch builtins * TEST_PREFIX * Nesting Patch Decorators * Where to patch * Patching Descriptors and Proxy Objects * MagicMock and magic method support * Mocking Magic Methods * Magic Mock * Helpers * sentinel * DEFAULT * call * create_autospec * ANY * FILTER_DIR * mock_open * Autospeccing * Sealing mocks * "unittest.mock" — getting started * Using Mock * Mock Patching Methods * Mock for Method Calls on an Object * Mocking Classes * Naming your mocks * Tracking all Calls * Setting Return Values and Attributes * Raising exceptions with mocks * Side effect functions and iterables * Mocking asynchronous iterators * Mocking asynchronous context manager * Creating a Mock from an Existing Object * Patch Decorators * Further Examples * Mocking chained calls * Partial mocking * Mocking a Generator Method * Applying the same patch to every test method * Mocking Unbound Methods * Checking multiple calls with mock * Coping with mutable arguments * Nesting Patches * Mocking a dictionary with MagicMock * Mock subclasses and their attributes * Mocking imports with patch.dict * Tracking order of calls and less verbose call assertions * More complex argument matching * 2to3 - Automated Python 2 to 3 code translation * Using 2to3 * Fixers * "lib2to3" - 2to3’s library * "test" — Regression tests package for Python * Writing Unit Tests for the "test" package * Running tests using the command-line interface * "test.support" — Utilities for the Python test suite * "test.support.socket_helper" — Utilities for socket tests * "test.support.script_helper" — Utilities for the Python execution tests * "test.support.bytecode_helper" — Support tools for testing correct bytecode generation Python Development Mode *********************** New in version 3.7. The Python Development Mode introduces additional runtime checks that are too expensive to be enabled by default. It should not be more verbose than the default if the code is correct; new warnings are only emitted when an issue is detected. It can be enabled using the "-X dev" command line option or by setting the "PYTHONDEVMODE" environment variable to "1". Effects of the Python Development Mode ************************************** Enabling the Python Development Mode is similar to the following command, but with additional effects described below: PYTHONMALLOC=debug PYTHONASYNCIODEBUG=1 python3 -W default -X faulthandler Effects of the Python Development Mode: * Add "default" warning filter. The following warnings are shown: * "DeprecationWarning" * "ImportWarning" * "PendingDeprecationWarning" * "ResourceWarning" Normally, the above warnings are filtered by the default warning filters. It behaves as if the "-W default" command line option is used. Use the "-W error" command line option or set the "PYTHONWARNINGS" environment variable to "error" to treat warnings as errors. * Install debug hooks on memory allocators to check for: * Buffer underflow * Buffer overflow * Memory allocator API violation * Unsafe usage of the GIL See the "PyMem_SetupDebugHooks()" C function. It behaves as if the "PYTHONMALLOC" environment variable is set to "debug". To enable the Python Development Mode without installing debug hooks on memory allocators, set the "PYTHONMALLOC" environment variable to "default". * Call "faulthandler.enable()" at Python startup to install handlers for the "SIGSEGV", "SIGFPE", "SIGABRT", "SIGBUS" and "SIGILL" signals to dump the Python traceback on a crash. It behaves as if the "-X faulthandler" command line option is used or if the "PYTHONFAULTHANDLER" environment variable is set to "1". * Enable asyncio debug mode. For example, "asyncio" checks for coroutines that were not awaited and logs them. It behaves as if the "PYTHONASYNCIODEBUG" environment variable is set to "1". * Check the *encoding* and *errors* arguments for string encoding and decoding operations. Examples: "open()", "str.encode()" and "bytes.decode()". By default, for best performance, the *errors* argument is only checked at the first encoding/decoding error and the *encoding* argument is sometimes ignored for empty strings. * The "io.IOBase" destructor logs "close()" exceptions. * Set the "dev_mode" attribute of "sys.flags" to "True". The Python Development Mode does not enable the "tracemalloc" module by default, because the overhead cost (to performance and memory) would be too large. Enabling the "tracemalloc" module provides additional information on the origin of some errors. For example, "ResourceWarning" logs the traceback where the resource was allocated, and a buffer overflow error logs the traceback where the memory block was allocated. The Python Development Mode does not prevent the "-O" command line option from removing "assert" statements nor from setting "__debug__" to "False". Changed in version 3.8: The "io.IOBase" destructor now logs "close()" exceptions. Changed in version 3.9: The *encoding* and *errors* arguments are now checked for string encoding and decoding operations. ResourceWarning Example *********************** Example of a script counting the number of lines of the text file specified in the command line: import sys def main(): fp = open(sys.argv[1]) nlines = len(fp.readlines()) print(nlines) # The file is closed implicitly if __name__ == "__main__": main() The script does not close the file explicitly. By default, Python does not emit any warning. Example using README.txt, which has 269 lines: $ python3 script.py README.txt 269 Enabling the Python Development Mode displays a "ResourceWarning" warning: $ python3 -X dev script.py README.txt 269 script.py:10: ResourceWarning: unclosed file <_io.TextIOWrapper name='README.rst' mode='r' encoding='UTF-8'> main() ResourceWarning: Enable tracemalloc to get the object allocation traceback In addition, enabling "tracemalloc" shows the line where the file was opened: $ python3 -X dev -X tracemalloc=5 script.py README.rst 269 script.py:10: ResourceWarning: unclosed file <_io.TextIOWrapper name='README.rst' mode='r' encoding='UTF-8'> main() Object allocated at (most recent call last): File "script.py", lineno 10 main() File "script.py", lineno 4 fp = open(sys.argv[1]) The fix is to close explicitly the file. Example using a context manager: def main(): # Close the file explicitly when exiting the with block with open(sys.argv[1]) as fp: nlines = len(fp.readlines()) print(nlines) Not closing a resource explicitly can leave a resource open for way longer than expected; it can cause severe issues upon exiting Python. It is bad in CPython, but it is even worse in PyPy. Closing resources explicitly makes an application more deterministic and more reliable. Bad file descriptor error example ********************************* Script displaying the first line of itself: import os def main(): fp = open(__file__) firstline = fp.readline() print(firstline.rstrip()) os.close(fp.fileno()) # The file is closed implicitly main() By default, Python does not emit any warning: $ python3 script.py import os The Python Development Mode shows a "ResourceWarning" and logs a “Bad file descriptor” error when finalizing the file object: $ python3 script.py import os script.py:10: ResourceWarning: unclosed file <_io.TextIOWrapper name='script.py' mode='r' encoding='UTF-8'> main() ResourceWarning: Enable tracemalloc to get the object allocation traceback Exception ignored in: <_io.TextIOWrapper name='script.py' mode='r' encoding='UTF-8'> Traceback (most recent call last): File "script.py", line 10, in main() OSError: [Errno 9] Bad file descriptor "os.close(fp.fileno())" closes the file descriptor. When the file object finalizer tries to close the file descriptor again, it fails with the "Bad file descriptor" error. A file descriptor must be closed only once. In the worst case scenario, closing it twice can lead to a crash (see bpo-18748 for an example). The fix is to remove the "os.close(fp.fileno())" line, or open the file with "closefd=False". Tkinter Dialogs *************** "tkinter.simpledialog" — Standard Tkinter input dialogs ======================================================= **Source code:** Lib/tkinter/simpledialog.py ====================================================================== The "tkinter.simpledialog" module contains convenience classes and functions for creating simple modal dialogs to get a value from the user. tkinter.simpledialog.askfloat(title, prompt, **kw) tkinter.simpledialog.askinteger(title, prompt, **kw) tkinter.simpledialog.askstring(title, prompt, **kw) The above three functions provide dialogs that prompt the user to enter a value of the desired type. class tkinter.simpledialog.Dialog(parent, title=None) The base class for custom dialogs. body(master) Override to construct the dialog’s interface and return the widget that should have initial focus. buttonbox() Default behaviour adds OK and Cancel buttons. Override for custom button layouts. "tkinter.filedialog" — File selection dialogs ============================================= **Source code:** Lib/tkinter/filedialog.py ====================================================================== The "tkinter.filedialog" module provides classes and factory functions for creating file/directory selection windows. Native Load/Save Dialogs ------------------------ The following classes and functions provide file dialog windows that combine a native look-and-feel with configuration options to customize behaviour. The following keyword arguments are applicable to the classes and functions listed below: *parent* - the window to place the dialog on top of *title* - the title of the window *initialdir* - the directory that the dialog starts in *initialfile* - the file selected upon opening of the dialog *filetypes* - a sequence of (label, pattern) tuples, ‘*’ wildcard is allowed *defaultextension* - default extension to append to file (save dialogs) *multiple* - when true, selection of multiple items is allowed **Static factory functions** The below functions when called create a modal, native look-and-feel dialog, wait for the user’s selection, then return the selected value(s) or "None" to the caller. tkinter.filedialog.askopenfile(mode="r", **options) tkinter.filedialog.askopenfiles(mode="r", **options) The above two functions create an "Open" dialog and return the opened file object(s) in read-only mode. tkinter.filedialog.asksaveasfile(mode="w", **options) Create a "SaveAs" dialog and return a file object opened in write- only mode. tkinter.filedialog.askopenfilename(**options) tkinter.filedialog.askopenfilenames(**options) The above two functions create an "Open" dialog and return the selected filename(s) that correspond to existing file(s). tkinter.filedialog.asksaveasfilename(**options) Create a "SaveAs" dialog and return the selected filename. tkinter.filedialog.askdirectory(**options) Prompt user to select a directory. Additional keyword option: *mustexist* - determines if selection must be an existing directory. class tkinter.filedialog.Open(master=None, **options) class tkinter.filedialog.SaveAs(master=None, **options) The above two classes provide native dialog windows for saving and loading files. **Convenience classes** The below classes are used for creating file/directory windows from scratch. These do not emulate the native look-and-feel of the platform. class tkinter.filedialog.Directory(master=None, **options) Create a dialog prompting the user to select a directory. Note: The *FileDialog* class should be subclassed for custom event handling and behaviour. class tkinter.filedialog.FileDialog(master, title=None) Create a basic file selection dialog. cancel_command(event=None) Trigger the termination of the dialog window. dirs_double_event(event) Event handler for double-click event on directory. dirs_select_event(event) Event handler for click event on directory. files_double_event(event) Event handler for double-click event on file. files_select_event(event) Event handler for single-click event on file. filter_command(event=None) Filter the files by directory. get_filter() Retrieve the file filter currently in use. get_selection() Retrieve the currently selected item. go(dir_or_file=os.curdir, pattern="*", default="", key=None) Render dialog and start event loop. ok_event(event) Exit dialog returning current selection. quit(how=None) Exit dialog returning filename, if any. set_filter(dir, pat) Set the file filter. set_selection(file) Update the current file selection to *file*. class tkinter.filedialog.LoadFileDialog(master, title=None) A subclass of FileDialog that creates a dialog window for selecting an existing file. ok_command() Test that a file is provided and that the selection indicates an already existing file. class tkinter.filedialog.SaveFileDialog(master, title=None) A subclass of FileDialog that creates a dialog window for selecting a destination file. ok_command() Test whether or not the selection points to a valid file that is not a directory. Confirmation is required if an already existing file is selected. "tkinter.commondialog" — Dialog window templates ================================================ **Source code:** Lib/tkinter/commondialog.py ====================================================================== The "tkinter.commondialog" module provides the "Dialog" class that is the base class for dialogs defined in other supporting modules. class tkinter.commondialog.Dialog(master=None, **options) show(color=None, **options) Render the Dialog window. See also: Modules "tkinter.messagebox", Reading and Writing Files "difflib" — Helpers for computing deltas **************************************** **Source code:** Lib/difflib.py ====================================================================== This module provides classes and functions for comparing sequences. It can be used for example, for comparing files, and can produce information about file differences in various formats, including HTML and context and unified diffs. For comparing directories and files, see also, the "filecmp" module. class difflib.SequenceMatcher This is a flexible class for comparing pairs of sequences of any type, so long as the sequence elements are *hashable*. The basic algorithm predates, and is a little fancier than, an algorithm published in the late 1980’s by Ratcliff and Obershelp under the hyperbolic name “gestalt pattern matching.” The idea is to find the longest contiguous matching subsequence that contains no “junk” elements; these “junk” elements are ones that are uninteresting in some sense, such as blank lines or whitespace. (Handling junk is an extension to the Ratcliff and Obershelp algorithm.) The same idea is then applied recursively to the pieces of the sequences to the left and to the right of the matching subsequence. This does not yield minimal edit sequences, but does tend to yield matches that “look right” to people. **Timing:** The basic Ratcliff-Obershelp algorithm is cubic time in the worst case and quadratic time in the expected case. "SequenceMatcher" is quadratic time for the worst case and has expected-case behavior dependent in a complicated way on how many elements the sequences have in common; best case time is linear. **Automatic junk heuristic:** "SequenceMatcher" supports a heuristic that automatically treats certain sequence items as junk. The heuristic counts how many times each individual item appears in the sequence. If an item’s duplicates (after the first one) account for more than 1% of the sequence and the sequence is at least 200 items long, this item is marked as “popular” and is treated as junk for the purpose of sequence matching. This heuristic can be turned off by setting the "autojunk" argument to "False" when creating the "SequenceMatcher". New in version 3.2: The *autojunk* parameter. class difflib.Differ This is a class for comparing sequences of lines of text, and producing human-readable differences or deltas. Differ uses "SequenceMatcher" both to compare sequences of lines, and to compare sequences of characters within similar (near-matching) lines. Each line of a "Differ" delta begins with a two-letter code: +------------+---------------------------------------------+ | Code | Meaning | |============|=============================================| | "'- '" | line unique to sequence 1 | +------------+---------------------------------------------+ | "'+ '" | line unique to sequence 2 | +------------+---------------------------------------------+ | "' '" | line common to both sequences | +------------+---------------------------------------------+ | "'? '" | line not present in either input sequence | +------------+---------------------------------------------+ Lines beginning with ‘"?"’ attempt to guide the eye to intraline differences, and were not present in either input sequence. These lines can be confusing if the sequences contain tab characters. class difflib.HtmlDiff This class can be used to create an HTML table (or a complete HTML file containing the table) showing a side by side, line by line comparison of text with inter-line and intra-line change highlights. The table can be generated in either full or contextual difference mode. The constructor for this class is: __init__(tabsize=8, wrapcolumn=None, linejunk=None, charjunk=IS_CHARACTER_JUNK) Initializes instance of "HtmlDiff". *tabsize* is an optional keyword argument to specify tab stop spacing and defaults to "8". *wrapcolumn* is an optional keyword to specify column number where lines are broken and wrapped, defaults to "None" where lines are not wrapped. *linejunk* and *charjunk* are optional keyword arguments passed into "ndiff()" (used by "HtmlDiff" to generate the side by side HTML differences). See "ndiff()" documentation for argument default values and descriptions. The following methods are public: make_file(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5, *, charset='utf-8') Compares *fromlines* and *tolines* (lists of strings) and returns a string which is a complete HTML file containing a table showing line by line differences with inter-line and intra-line changes highlighted. *fromdesc* and *todesc* are optional keyword arguments to specify from/to file column header strings (both default to an empty string). *context* and *numlines* are both optional keyword arguments. Set *context* to "True" when contextual differences are to be shown, else the default is "False" to show the full files. *numlines* defaults to "5". When *context* is "True" *numlines* controls the number of context lines which surround the difference highlights. When *context* is "False" *numlines* controls the number of lines which are shown before a difference highlight when using the “next” hyperlinks (setting to zero would cause the “next” hyperlinks to place the next difference highlight at the top of the browser without any leading context). Note: *fromdesc* and *todesc* are interpreted as unescaped HTML and should be properly escaped while receiving input from untrusted sources. Changed in version 3.5: *charset* keyword-only argument was added. The default charset of HTML document changed from "'ISO-8859-1'" to "'utf-8'". make_table(fromlines, tolines, fromdesc='', todesc='', context=False, numlines=5) Compares *fromlines* and *tolines* (lists of strings) and returns a string which is a complete HTML table showing line by line differences with inter-line and intra-line changes highlighted. The arguments for this method are the same as those for the "make_file()" method. "Tools/scripts/diff.py" is a command-line front-end to this class and contains a good example of its use. difflib.context_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n') Compare *a* and *b* (lists of strings); return a delta (a *generator* generating the delta lines) in context diff format. Context diffs are a compact way of showing just the lines that have changed plus a few lines of context. The changes are shown in a before/after style. The number of context lines is set by *n* which defaults to three. By default, the diff control lines (those with "***" or "---") are created with a trailing newline. This is helpful so that inputs created from "io.IOBase.readlines()" result in diffs that are suitable for use with "io.IOBase.writelines()" since both the inputs and outputs have trailing newlines. For inputs that do not have trailing newlines, set the *lineterm* argument to """" so that the output will be uniformly newline free. The context diff format normally has a header for filenames and modification times. Any or all of these may be specified using strings for *fromfile*, *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally expressed in the ISO 8601 format. If not specified, the strings default to blanks. >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n'] >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n'] >>> sys.stdout.writelines(context_diff(s1, s2, fromfile='before.py', tofile='after.py')) *** before.py --- after.py *************** *** 1,4 **** ! bacon ! eggs ! ham guido --- 1,4 ---- ! python ! eggy ! hamster guido See A command-line interface to difflib for a more detailed example. difflib.get_close_matches(word, possibilities, n=3, cutoff=0.6) Return a list of the best “good enough” matches. *word* is a sequence for which close matches are desired (typically a string), and *possibilities* is a list of sequences against which to match *word* (typically a list of strings). Optional argument *n* (default "3") is the maximum number of close matches to return; *n* must be greater than "0". Optional argument *cutoff* (default "0.6") is a float in the range [0, 1]. Possibilities that don’t score at least that similar to *word* are ignored. The best (no more than *n*) matches among the possibilities are returned in a list, sorted by similarity score, most similar first. >>> get_close_matches('appel', ['ape', 'apple', 'peach', 'puppy']) ['apple', 'ape'] >>> import keyword >>> get_close_matches('wheel', keyword.kwlist) ['while'] >>> get_close_matches('pineapple', keyword.kwlist) [] >>> get_close_matches('accept', keyword.kwlist) ['except'] difflib.ndiff(a, b, linejunk=None, charjunk=IS_CHARACTER_JUNK) Compare *a* and *b* (lists of strings); return a "Differ"-style delta (a *generator* generating the delta lines). Optional keyword parameters *linejunk* and *charjunk* are filtering functions (or "None"): *linejunk*: A function that accepts a single string argument, and returns true if the string is junk, or false if not. The default is "None". There is also a module-level function "IS_LINE_JUNK()", which filters out lines without visible characters, except for at most one pound character ("'#'") – however the underlying "SequenceMatcher" class does a dynamic analysis of which lines are so frequent as to constitute noise, and this usually works better than using this function. *charjunk*: A function that accepts a character (a string of length 1), and returns if the character is junk, or false if not. The default is module-level function "IS_CHARACTER_JUNK()", which filters out whitespace characters (a blank or tab; it’s a bad idea to include newline in this!). "Tools/scripts/ndiff.py" is a command-line front-end to this function. >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True), ... 'ore\ntree\nemu\n'.splitlines(keepends=True)) >>> print(''.join(diff), end="") - one ? ^ + ore ? ^ - two - three ? - + tree + emu difflib.restore(sequence, which) Return one of the two sequences that generated a delta. Given a *sequence* produced by "Differ.compare()" or "ndiff()", extract lines originating from file 1 or 2 (parameter *which*), stripping off line prefixes. Example: >>> diff = ndiff('one\ntwo\nthree\n'.splitlines(keepends=True), ... 'ore\ntree\nemu\n'.splitlines(keepends=True)) >>> diff = list(diff) # materialize the generated delta into a list >>> print(''.join(restore(diff, 1)), end="") one two three >>> print(''.join(restore(diff, 2)), end="") ore tree emu difflib.unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n') Compare *a* and *b* (lists of strings); return a delta (a *generator* generating the delta lines) in unified diff format. Unified diffs are a compact way of showing just the lines that have changed plus a few lines of context. The changes are shown in an inline style (instead of separate before/after blocks). The number of context lines is set by *n* which defaults to three. By default, the diff control lines (those with "---", "+++", or "@@") are created with a trailing newline. This is helpful so that inputs created from "io.IOBase.readlines()" result in diffs that are suitable for use with "io.IOBase.writelines()" since both the inputs and outputs have trailing newlines. For inputs that do not have trailing newlines, set the *lineterm* argument to """" so that the output will be uniformly newline free. The context diff format normally has a header for filenames and modification times. Any or all of these may be specified using strings for *fromfile*, *tofile*, *fromfiledate*, and *tofiledate*. The modification times are normally expressed in the ISO 8601 format. If not specified, the strings default to blanks. >>> s1 = ['bacon\n', 'eggs\n', 'ham\n', 'guido\n'] >>> s2 = ['python\n', 'eggy\n', 'hamster\n', 'guido\n'] >>> sys.stdout.writelines(unified_diff(s1, s2, fromfile='before.py', tofile='after.py')) --- before.py +++ after.py @@ -1,4 +1,4 @@ -bacon -eggs -ham +python +eggy +hamster guido See A command-line interface to difflib for a more detailed example. difflib.diff_bytes(dfunc, a, b, fromfile=b'', tofile=b'', fromfiledate=b'', tofiledate=b'', n=3, lineterm=b'\n') Compare *a* and *b* (lists of bytes objects) using *dfunc*; yield a sequence of delta lines (also bytes) in the format returned by *dfunc*. *dfunc* must be a callable, typically either "unified_diff()" or "context_diff()". Allows you to compare data with unknown or inconsistent encoding. All inputs except *n* must be bytes objects, not str. Works by losslessly converting all inputs (except *n*) to str, and calling "dfunc(a, b, fromfile, tofile, fromfiledate, tofiledate, n, lineterm)". The output of *dfunc* is then converted back to bytes, so the delta lines that you receive have the same unknown/inconsistent encodings as *a* and *b*. New in version 3.5. difflib.IS_LINE_JUNK(line) Return "True" for ignorable lines. The line *line* is ignorable if *line* is blank or contains a single "'#'", otherwise it is not ignorable. Used as a default for parameter *linejunk* in "ndiff()" in older versions. difflib.IS_CHARACTER_JUNK(ch) Return "True" for ignorable characters. The character *ch* is ignorable if *ch* is a space or tab, otherwise it is not ignorable. Used as a default for parameter *charjunk* in "ndiff()". See also: Pattern Matching: The Gestalt Approach Discussion of a similar algorithm by John W. Ratcliff and D. E. Metzener. This was published in Dr. Dobb’s Journal in July, 1988. SequenceMatcher Objects ======================= The "SequenceMatcher" class has this constructor: class difflib.SequenceMatcher(isjunk=None, a='', b='', autojunk=True) Optional argument *isjunk* must be "None" (the default) or a one- argument function that takes a sequence element and returns true if and only if the element is “junk” and should be ignored. Passing "None" for *isjunk* is equivalent to passing "lambda x: False"; in other words, no elements are ignored. For example, pass: lambda x: x in " \t" if you’re comparing lines as sequences of characters, and don’t want to synch up on blanks or hard tabs. The optional arguments *a* and *b* are sequences to be compared; both default to empty strings. The elements of both sequences must be *hashable*. The optional argument *autojunk* can be used to disable the automatic junk heuristic. New in version 3.2: The *autojunk* parameter. SequenceMatcher objects get three data attributes: *bjunk* is the set of elements of *b* for which *isjunk* is "True"; *bpopular* is the set of non-junk elements considered popular by the heuristic (if it is not disabled); *b2j* is a dict mapping the remaining elements of *b* to a list of positions where they occur. All three are reset whenever *b* is reset with "set_seqs()" or "set_seq2()". New in version 3.2: The *bjunk* and *bpopular* attributes. "SequenceMatcher" objects have the following methods: set_seqs(a, b) Set the two sequences to be compared. "SequenceMatcher" computes and caches detailed information about the second sequence, so if you want to compare one sequence against many sequences, use "set_seq2()" to set the commonly used sequence once and call "set_seq1()" repeatedly, once for each of the other sequences. set_seq1(a) Set the first sequence to be compared. The second sequence to be compared is not changed. set_seq2(b) Set the second sequence to be compared. The first sequence to be compared is not changed. find_longest_match(alo=0, ahi=None, blo=0, bhi=None) Find longest matching block in "a[alo:ahi]" and "b[blo:bhi]". If *isjunk* was omitted or "None", "find_longest_match()" returns "(i, j, k)" such that "a[i:i+k]" is equal to "b[j:j+k]", where "alo <= i <= i+k <= ahi" and "blo <= j <= j+k <= bhi". For all "(i', j', k')" meeting those conditions, the additional conditions "k >= k'", "i <= i'", and if "i == i'", "j <= j'" are also met. In other words, of all maximal matching blocks, return one that starts earliest in *a*, and of all those maximal matching blocks that start earliest in *a*, return the one that starts earliest in *b*. >>> s = SequenceMatcher(None, " abcd", "abcd abcd") >>> s.find_longest_match(0, 5, 0, 9) Match(a=0, b=4, size=5) If *isjunk* was provided, first the longest matching block is determined as above, but with the additional restriction that no junk element appears in the block. Then that block is extended as far as possible by matching (only) junk elements on both sides. So the resulting block never matches on junk except as identical junk happens to be adjacent to an interesting match. Here’s the same example as before, but considering blanks to be junk. That prevents "' abcd'" from matching the "' abcd'" at the tail end of the second sequence directly. Instead only the "'abcd'" can match, and matches the leftmost "'abcd'" in the second sequence: >>> s = SequenceMatcher(lambda x: x==" ", " abcd", "abcd abcd") >>> s.find_longest_match(0, 5, 0, 9) Match(a=1, b=0, size=4) If no blocks match, this returns "(alo, blo, 0)". This method returns a *named tuple* "Match(a, b, size)". Changed in version 3.9: Added default arguments. get_matching_blocks() Return list of triples describing non-overlapping matching subsequences. Each triple is of the form "(i, j, n)", and means that "a[i:i+n] == b[j:j+n]". The triples are monotonically increasing in *i* and *j*. The last triple is a dummy, and has the value "(len(a), len(b), 0)". It is the only triple with "n == 0". If "(i, j, n)" and "(i', j', n')" are adjacent triples in the list, and the second is not the last triple in the list, then "i+n < i'" or "j+n < j'"; in other words, adjacent triples always describe non- adjacent equal blocks. >>> s = SequenceMatcher(None, "abxcd", "abcd") >>> s.get_matching_blocks() [Match(a=0, b=0, size=2), Match(a=3, b=2, size=2), Match(a=5, b=4, size=0)] get_opcodes() Return list of 5-tuples describing how to turn *a* into *b*. Each tuple is of the form "(tag, i1, i2, j1, j2)". The first tuple has "i1 == j1 == 0", and remaining tuples have *i1* equal to the *i2* from the preceding tuple, and, likewise, *j1* equal to the previous *j2*. The *tag* values are strings, with these meanings: +-----------------+-----------------------------------------------+ | Value | Meaning | |=================|===============================================| | "'replace'" | "a[i1:i2]" should be replaced by "b[j1:j2]". | +-----------------+-----------------------------------------------+ | "'delete'" | "a[i1:i2]" should be deleted. Note that "j1 | | | == j2" in this case. | +-----------------+-----------------------------------------------+ | "'insert'" | "b[j1:j2]" should be inserted at "a[i1:i1]". | | | Note that "i1 == i2" in this case. | +-----------------+-----------------------------------------------+ | "'equal'" | "a[i1:i2] == b[j1:j2]" (the sub-sequences are | | | equal). | +-----------------+-----------------------------------------------+ For example: >>> a = "qabxcd" >>> b = "abycdf" >>> s = SequenceMatcher(None, a, b) >>> for tag, i1, i2, j1, j2 in s.get_opcodes(): ... print('{:7} a[{}:{}] --> b[{}:{}] {!r:>8} --> {!r}'.format( ... tag, i1, i2, j1, j2, a[i1:i2], b[j1:j2])) delete a[0:1] --> b[0:0] 'q' --> '' equal a[1:3] --> b[0:2] 'ab' --> 'ab' replace a[3:4] --> b[2:3] 'x' --> 'y' equal a[4:6] --> b[3:5] 'cd' --> 'cd' insert a[6:6] --> b[5:6] '' --> 'f' get_grouped_opcodes(n=3) Return a *generator* of groups with up to *n* lines of context. Starting with the groups returned by "get_opcodes()", this method splits out smaller change clusters and eliminates intervening ranges which have no changes. The groups are returned in the same format as "get_opcodes()". ratio() Return a measure of the sequences’ similarity as a float in the range [0, 1]. Where T is the total number of elements in both sequences, and M is the number of matches, this is 2.0*M / T. Note that this is "1.0" if the sequences are identical, and "0.0" if they have nothing in common. This is expensive to compute if "get_matching_blocks()" or "get_opcodes()" hasn’t already been called, in which case you may want to try "quick_ratio()" or "real_quick_ratio()" first to get an upper bound. Note: Caution: The result of a "ratio()" call may depend on the order of the arguments. For instance: >>> SequenceMatcher(None, 'tide', 'diet').ratio() 0.25 >>> SequenceMatcher(None, 'diet', 'tide').ratio() 0.5 quick_ratio() Return an upper bound on "ratio()" relatively quickly. real_quick_ratio() Return an upper bound on "ratio()" very quickly. The three methods that return the ratio of matching to total characters can give different results due to differing levels of approximation, although "quick_ratio()" and "real_quick_ratio()" are always at least as large as "ratio()": >>> s = SequenceMatcher(None, "abcd", "bcde") >>> s.ratio() 0.75 >>> s.quick_ratio() 0.75 >>> s.real_quick_ratio() 1.0 SequenceMatcher Examples ======================== This example compares two strings, considering blanks to be “junk”: >>> s = SequenceMatcher(lambda x: x == " ", ... "private Thread currentThread;", ... "private volatile Thread currentThread;") "ratio()" returns a float in [0, 1], measuring the similarity of the sequences. As a rule of thumb, a "ratio()" value over 0.6 means the sequences are close matches: >>> print(round(s.ratio(), 3)) 0.866 If you’re only interested in where the sequences match, "get_matching_blocks()" is handy: >>> for block in s.get_matching_blocks(): ... print("a[%d] and b[%d] match for %d elements" % block) a[0] and b[0] match for 8 elements a[8] and b[17] match for 21 elements a[29] and b[38] match for 0 elements Note that the last tuple returned by "get_matching_blocks()" is always a dummy, "(len(a), len(b), 0)", and this is the only case in which the last tuple element (number of elements matched) is "0". If you want to know how to change the first sequence into the second, use "get_opcodes()": >>> for opcode in s.get_opcodes(): ... print("%6s a[%d:%d] b[%d:%d]" % opcode) equal a[0:8] b[0:8] insert a[8:8] b[8:17] equal a[8:29] b[17:38] See also: * The "get_close_matches()" function in this module which shows how simple code building on "SequenceMatcher" can be used to do useful work. * Simple version control recipe for a small application built with "SequenceMatcher". Differ Objects ============== Note that "Differ"-generated deltas make no claim to be **minimal** diffs. To the contrary, minimal diffs are often counter-intuitive, because they synch up anywhere possible, sometimes accidental matches 100 pages apart. Restricting synch points to contiguous matches preserves some notion of locality, at the occasional cost of producing a longer diff. The "Differ" class has this constructor: class difflib.Differ(linejunk=None, charjunk=None) Optional keyword parameters *linejunk* and *charjunk* are for filter functions (or "None"): *linejunk*: A function that accepts a single string argument, and returns true if the string is junk. The default is "None", meaning that no line is considered junk. *charjunk*: A function that accepts a single character argument (a string of length 1), and returns true if the character is junk. The default is "None", meaning that no character is considered junk. These junk-filtering functions speed up matching to find differences and do not cause any differing lines or characters to be ignored. Read the description of the "find_longest_match()" method’s *isjunk* parameter for an explanation. "Differ" objects are used (deltas generated) via a single method: compare(a, b) Compare two sequences of lines, and generate the delta (a sequence of lines). Each sequence must contain individual single-line strings ending with newlines. Such sequences can be obtained from the "readlines()" method of file-like objects. The delta generated also consists of newline-terminated strings, ready to be printed as-is via the "writelines()" method of a file-like object. Differ Example ============== This example compares two texts. First we set up the texts, sequences of individual single-line strings ending with newlines (such sequences can also be obtained from the "readlines()" method of file-like objects): >>> text1 = ''' 1. Beautiful is better than ugly. ... 2. Explicit is better than implicit. ... 3. Simple is better than complex. ... 4. Complex is better than complicated. ... '''.splitlines(keepends=True) >>> len(text1) 4 >>> text1[0][-1] '\n' >>> text2 = ''' 1. Beautiful is better than ugly. ... 3. Simple is better than complex. ... 4. Complicated is better than complex. ... 5. Flat is better than nested. ... '''.splitlines(keepends=True) Next we instantiate a Differ object: >>> d = Differ() Note that when instantiating a "Differ" object we may pass functions to filter out line and character “junk.” See the "Differ()" constructor for details. Finally, we compare the two: >>> result = list(d.compare(text1, text2)) "result" is a list of strings, so let’s pretty-print it: >>> from pprint import pprint >>> pprint(result) [' 1. Beautiful is better than ugly.\n', '- 2. Explicit is better than implicit.\n', '- 3. Simple is better than complex.\n', '+ 3. Simple is better than complex.\n', '? ++\n', '- 4. Complex is better than complicated.\n', '? ^ ---- ^\n', '+ 4. Complicated is better than complex.\n', '? ++++ ^ ^\n', '+ 5. Flat is better than nested.\n'] As a single multi-line string it looks like this: >>> import sys >>> sys.stdout.writelines(result) 1. Beautiful is better than ugly. - 2. Explicit is better than implicit. - 3. Simple is better than complex. + 3. Simple is better than complex. ? ++ - 4. Complex is better than complicated. ? ^ ---- ^ + 4. Complicated is better than complex. ? ++++ ^ ^ + 5. Flat is better than nested. A command-line interface to difflib =================================== This example shows how to use difflib to create a "diff"-like utility. It is also contained in the Python source distribution, as "Tools/scripts/diff.py". #!/usr/bin/env python3 """ Command line interface to difflib.py providing diffs in four formats: * ndiff: lists every line and highlights interline changes. * context: highlights clusters of changes in a before/after format. * unified: highlights clusters of changes in an inline format. * html: generates side by side comparison with change highlights. """ import sys, os, difflib, argparse from datetime import datetime, timezone def file_mtime(path): t = datetime.fromtimestamp(os.stat(path).st_mtime, timezone.utc) return t.astimezone().isoformat() def main(): parser = argparse.ArgumentParser() parser.add_argument('-c', action='store_true', default=False, help='Produce a context format diff (default)') parser.add_argument('-u', action='store_true', default=False, help='Produce a unified format diff') parser.add_argument('-m', action='store_true', default=False, help='Produce HTML side by side diff ' '(can use -c and -l in conjunction)') parser.add_argument('-n', action='store_true', default=False, help='Produce a ndiff format diff') parser.add_argument('-l', '--lines', type=int, default=3, help='Set number of context lines (default 3)') parser.add_argument('fromfile') parser.add_argument('tofile') options = parser.parse_args() n = options.lines fromfile = options.fromfile tofile = options.tofile fromdate = file_mtime(fromfile) todate = file_mtime(tofile) with open(fromfile) as ff: fromlines = ff.readlines() with open(tofile) as tf: tolines = tf.readlines() if options.u: diff = difflib.unified_diff(fromlines, tolines, fromfile, tofile, fromdate, todate, n=n) elif options.n: diff = difflib.ndiff(fromlines, tolines) elif options.m: diff = difflib.HtmlDiff().make_file(fromlines,tolines,fromfile,tofile,context=options.c,numlines=n) else: diff = difflib.context_diff(fromlines, tolines, fromfile, tofile, fromdate, todate, n=n) sys.stdout.writelines(diff) if __name__ == '__main__': main() "dis" — Disassembler for Python bytecode **************************************** **Source code:** Lib/dis.py ====================================================================== The "dis" module supports the analysis of CPython *bytecode* by disassembling it. The CPython bytecode which this module takes as an input is defined in the file "Include/opcode.h" and used by the compiler and the interpreter. **CPython implementation detail:** Bytecode is an implementation detail of the CPython interpreter. No guarantees are made that bytecode will not be added, removed, or changed between versions of Python. Use of this module should not be considered to work across Python VMs or Python releases. Changed in version 3.6: Use 2 bytes for each instruction. Previously the number of bytes varied by instruction. Example: Given the function "myfunc()": def myfunc(alist): return len(alist) the following command can be used to display the disassembly of "myfunc()": >>> dis.dis(myfunc) 2 0 LOAD_GLOBAL 0 (len) 2 LOAD_FAST 0 (alist) 4 CALL_FUNCTION 1 6 RETURN_VALUE (The “2” is a line number). Bytecode analysis ================= New in version 3.4. The bytecode analysis API allows pieces of Python code to be wrapped in a "Bytecode" object that provides easy access to details of the compiled code. class dis.Bytecode(x, *, first_line=None, current_offset=None) Analyse the bytecode corresponding to a function, generator, asynchronous generator, coroutine, method, string of source code, or a code object (as returned by "compile()"). This is a convenience wrapper around many of the functions listed below, most notably "get_instructions()", as iterating over a "Bytecode" instance yields the bytecode operations as "Instruction" instances. If *first_line* is not "None", it indicates the line number that should be reported for the first source line in the disassembled code. Otherwise, the source line information (if any) is taken directly from the disassembled code object. If *current_offset* is not "None", it refers to an instruction offset in the disassembled code. Setting this means "dis()" will display a “current instruction” marker against the specified opcode. classmethod from_traceback(tb) Construct a "Bytecode" instance from the given traceback, setting *current_offset* to the instruction responsible for the exception. codeobj The compiled code object. first_line The first source line of the code object (if available) dis() Return a formatted view of the bytecode operations (the same as printed by "dis.dis()", but returned as a multi-line string). info() Return a formatted multi-line string with detailed information about the code object, like "code_info()". Changed in version 3.7: This can now handle coroutine and asynchronous generator objects. Example: >>> bytecode = dis.Bytecode(myfunc) >>> for instr in bytecode: ... print(instr.opname) ... LOAD_GLOBAL LOAD_FAST CALL_FUNCTION RETURN_VALUE Analysis functions ================== The "dis" module also defines the following analysis functions that convert the input directly to the desired output. They can be useful if only a single operation is being performed, so the intermediate analysis object isn’t useful: dis.code_info(x) Return a formatted multi-line string with detailed code object information for the supplied function, generator, asynchronous generator, coroutine, method, source code string or code object. Note that the exact contents of code info strings are highly implementation dependent and they may change arbitrarily across Python VMs or Python releases. New in version 3.2. Changed in version 3.7: This can now handle coroutine and asynchronous generator objects. dis.show_code(x, *, file=None) Print detailed code object information for the supplied function, method, source code string or code object to *file* (or "sys.stdout" if *file* is not specified). This is a convenient shorthand for "print(code_info(x), file=file)", intended for interactive exploration at the interpreter prompt. New in version 3.2. Changed in version 3.4: Added *file* parameter. dis.dis(x=None, *, file=None, depth=None) Disassemble the *x* object. *x* can denote either a module, a class, a method, a function, a generator, an asynchronous generator, a coroutine, a code object, a string of source code or a byte sequence of raw bytecode. For a module, it disassembles all functions. For a class, it disassembles all methods (including class and static methods). For a code object or sequence of raw bytecode, it prints one line per bytecode instruction. It also recursively disassembles nested code objects (the code of comprehensions, generator expressions and nested functions, and the code used for building nested classes). Strings are first compiled to code objects with the "compile()" built-in function before being disassembled. If no object is provided, this function disassembles the last traceback. The disassembly is written as text to the supplied *file* argument if provided and to "sys.stdout" otherwise. The maximal depth of recursion is limited by *depth* unless it is "None". "depth=0" means no recursion. Changed in version 3.4: Added *file* parameter. Changed in version 3.7: Implemented recursive disassembling and added *depth* parameter. Changed in version 3.7: This can now handle coroutine and asynchronous generator objects. dis.distb(tb=None, *, file=None) Disassemble the top-of-stack function of a traceback, using the last traceback if none was passed. The instruction causing the exception is indicated. The disassembly is written as text to the supplied *file* argument if provided and to "sys.stdout" otherwise. Changed in version 3.4: Added *file* parameter. dis.disassemble(code, lasti=-1, *, file=None) dis.disco(code, lasti=-1, *, file=None) Disassemble a code object, indicating the last instruction if *lasti* was provided. The output is divided in the following columns: 1. the line number, for the first instruction of each line 2. the current instruction, indicated as "-->", 3. a labelled instruction, indicated with ">>", 4. the address of the instruction, 5. the operation code name, 6. operation parameters, and 7. interpretation of the parameters in parentheses. The parameter interpretation recognizes local and global variable names, constant values, branch targets, and compare operators. The disassembly is written as text to the supplied *file* argument if provided and to "sys.stdout" otherwise. Changed in version 3.4: Added *file* parameter. dis.get_instructions(x, *, first_line=None) Return an iterator over the instructions in the supplied function, method, source code string or code object. The iterator generates a series of "Instruction" named tuples giving the details of each operation in the supplied code. If *first_line* is not "None", it indicates the line number that should be reported for the first source line in the disassembled code. Otherwise, the source line information (if any) is taken directly from the disassembled code object. New in version 3.4. dis.findlinestarts(code) This generator function uses the "co_firstlineno" and "co_lnotab" attributes of the code object *code* to find the offsets which are starts of lines in the source code. They are generated as "(offset, lineno)" pairs. See Objects/lnotab_notes.txt for the "co_lnotab" format and how to decode it. Changed in version 3.6: Line numbers can be decreasing. Before, they were always increasing. dis.findlabels(code) Detect all offsets in the raw compiled bytecode string *code* which are jump targets, and return a list of these offsets. dis.stack_effect(opcode, oparg=None, *, jump=None) Compute the stack effect of *opcode* with argument *oparg*. If the code has a jump target and *jump* is "True", "stack_effect()" will return the stack effect of jumping. If *jump* is "False", it will return the stack effect of not jumping. And if *jump* is "None" (default), it will return the maximal stack effect of both cases. New in version 3.4. Changed in version 3.8: Added *jump* parameter. Python Bytecode Instructions ============================ The "get_instructions()" function and "Bytecode" class provide details of bytecode instructions as "Instruction" instances: class dis.Instruction Details for a bytecode operation opcode numeric code for operation, corresponding to the opcode values listed below and the bytecode values in the Opcode collections. opname human readable name for operation arg numeric argument to operation (if any), otherwise "None" argval resolved arg value (if known), otherwise same as arg argrepr human readable description of operation argument offset start index of operation within bytecode sequence starts_line line started by this opcode (if any), otherwise "None" is_jump_target "True" if other code jumps to here, otherwise "False" New in version 3.4. The Python compiler currently generates the following bytecode instructions. **General instructions** NOP Do nothing code. Used as a placeholder by the bytecode optimizer. POP_TOP Removes the top-of-stack (TOS) item. ROT_TWO Swaps the two top-most stack items. ROT_THREE Lifts second and third stack item one position up, moves top down to position three. ROT_FOUR Lifts second, third and fourth stack items one position up, moves top down to position four. New in version 3.8. DUP_TOP Duplicates the reference on top of the stack. New in version 3.2. DUP_TOP_TWO Duplicates the two references on top of the stack, leaving them in the same order. New in version 3.2. **Unary operations** Unary operations take the top of the stack, apply the operation, and push the result back on the stack. UNARY_POSITIVE Implements "TOS = +TOS". UNARY_NEGATIVE Implements "TOS = -TOS". UNARY_NOT Implements "TOS = not TOS". UNARY_INVERT Implements "TOS = ~TOS". GET_ITER Implements "TOS = iter(TOS)". GET_YIELD_FROM_ITER If "TOS" is a *generator iterator* or *coroutine* object it is left as is. Otherwise, implements "TOS = iter(TOS)". New in version 3.5. **Binary operations** Binary operations remove the top of the stack (TOS) and the second top-most stack item (TOS1) from the stack. They perform the operation, and put the result back on the stack. BINARY_POWER Implements "TOS = TOS1 ** TOS". BINARY_MULTIPLY Implements "TOS = TOS1 * TOS". BINARY_MATRIX_MULTIPLY Implements "TOS = TOS1 @ TOS". New in version 3.5. BINARY_FLOOR_DIVIDE Implements "TOS = TOS1 // TOS". BINARY_TRUE_DIVIDE Implements "TOS = TOS1 / TOS". BINARY_MODULO Implements "TOS = TOS1 % TOS". BINARY_ADD Implements "TOS = TOS1 + TOS". BINARY_SUBTRACT Implements "TOS = TOS1 - TOS". BINARY_SUBSCR Implements "TOS = TOS1[TOS]". BINARY_LSHIFT Implements "TOS = TOS1 << TOS". BINARY_RSHIFT Implements "TOS = TOS1 >> TOS". BINARY_AND Implements "TOS = TOS1 & TOS". BINARY_XOR Implements "TOS = TOS1 ^ TOS". BINARY_OR Implements "TOS = TOS1 | TOS". **In-place operations** In-place operations are like binary operations, in that they remove TOS and TOS1, and push the result back on the stack, but the operation is done in-place when TOS1 supports it, and the resulting TOS may be (but does not have to be) the original TOS1. INPLACE_POWER Implements in-place "TOS = TOS1 ** TOS". INPLACE_MULTIPLY Implements in-place "TOS = TOS1 * TOS". INPLACE_MATRIX_MULTIPLY Implements in-place "TOS = TOS1 @ TOS". New in version 3.5. INPLACE_FLOOR_DIVIDE Implements in-place "TOS = TOS1 // TOS". INPLACE_TRUE_DIVIDE Implements in-place "TOS = TOS1 / TOS". INPLACE_MODULO Implements in-place "TOS = TOS1 % TOS". INPLACE_ADD Implements in-place "TOS = TOS1 + TOS". INPLACE_SUBTRACT Implements in-place "TOS = TOS1 - TOS". INPLACE_LSHIFT Implements in-place "TOS = TOS1 << TOS". INPLACE_RSHIFT Implements in-place "TOS = TOS1 >> TOS". INPLACE_AND Implements in-place "TOS = TOS1 & TOS". INPLACE_XOR Implements in-place "TOS = TOS1 ^ TOS". INPLACE_OR Implements in-place "TOS = TOS1 | TOS". STORE_SUBSCR Implements "TOS1[TOS] = TOS2". DELETE_SUBSCR Implements "del TOS1[TOS]". **Coroutine opcodes** GET_AWAITABLE Implements "TOS = get_awaitable(TOS)", where "get_awaitable(o)" returns "o" if "o" is a coroutine object or a generator object with the CO_ITERABLE_COROUTINE flag, or resolves "o.__await__". New in version 3.5. GET_AITER Implements "TOS = TOS.__aiter__()". New in version 3.5. Changed in version 3.7: Returning awaitable objects from "__aiter__" is no longer supported. GET_ANEXT Implements "PUSH(get_awaitable(TOS.__anext__()))". See "GET_AWAITABLE" for details about "get_awaitable" New in version 3.5. END_ASYNC_FOR Terminates an "async for" loop. Handles an exception raised when awaiting a next item. If TOS is "StopAsyncIteration" pop 7 values from the stack and restore the exception state using the second three of them. Otherwise re-raise the exception using the three values from the stack. An exception handler block is removed from the block stack. New in version 3.8. BEFORE_ASYNC_WITH Resolves "__aenter__" and "__aexit__" from the object on top of the stack. Pushes "__aexit__" and result of "__aenter__()" to the stack. New in version 3.5. SETUP_ASYNC_WITH Creates a new frame object. New in version 3.5. **Miscellaneous opcodes** PRINT_EXPR Implements the expression statement for the interactive mode. TOS is removed from the stack and printed. In non-interactive mode, an expression statement is terminated with "POP_TOP". SET_ADD(i) Calls "set.add(TOS1[-i], TOS)". Used to implement set comprehensions. LIST_APPEND(i) Calls "list.append(TOS1[-i], TOS)". Used to implement list comprehensions. MAP_ADD(i) Calls "dict.__setitem__(TOS1[-i], TOS1, TOS)". Used to implement dict comprehensions. New in version 3.1. Changed in version 3.8: Map value is TOS and map key is TOS1. Before, those were reversed. For all of the "SET_ADD", "LIST_APPEND" and "MAP_ADD" instructions, while the added value or key/value pair is popped off, the container object remains on the stack so that it is available for further iterations of the loop. RETURN_VALUE Returns with TOS to the caller of the function. YIELD_VALUE Pops TOS and yields it from a *generator*. YIELD_FROM Pops TOS and delegates to it as a subiterator from a *generator*. New in version 3.3. SETUP_ANNOTATIONS Checks whether "__annotations__" is defined in "locals()", if not it is set up to an empty "dict". This opcode is only emitted if a class or module body contains *variable annotations* statically. New in version 3.6. IMPORT_STAR Loads all symbols not starting with "'_'" directly from the module TOS to the local namespace. The module is popped after loading all names. This opcode implements "from module import *". POP_BLOCK Removes one block from the block stack. Per frame, there is a stack of blocks, denoting "try" statements, and such. POP_EXCEPT Removes one block from the block stack. The popped block must be an exception handler block, as implicitly created when entering an except handler. In addition to popping extraneous values from the frame stack, the last three popped values are used to restore the exception state. RERAISE Re-raises the exception currently on top of the stack. New in version 3.9. WITH_EXCEPT_START Calls the function in position 7 on the stack with the top three items on the stack as arguments. Used to implement the call "context_manager.__exit__(*exc_info())" when an exception has occurred in a "with" statement. New in version 3.9. LOAD_ASSERTION_ERROR Pushes "AssertionError" onto the stack. Used by the "assert" statement. New in version 3.9. LOAD_BUILD_CLASS Pushes "builtins.__build_class__()" onto the stack. It is later called by "CALL_FUNCTION" to construct a class. SETUP_WITH(delta) This opcode performs several operations before a with block starts. First, it loads "__exit__()" from the context manager and pushes it onto the stack for later use by "WITH_EXCEPT_START". Then, "__enter__()" is called, and a finally block pointing to *delta* is pushed. Finally, the result of calling the "__enter__()" method is pushed onto the stack. The next opcode will either ignore it ("POP_TOP"), or store it in (a) variable(s) ("STORE_FAST", "STORE_NAME", or "UNPACK_SEQUENCE"). New in version 3.2. All of the following opcodes use their arguments. STORE_NAME(namei) Implements "name = TOS". *namei* is the index of *name* in the attribute "co_names" of the code object. The compiler tries to use "STORE_FAST" or "STORE_GLOBAL" if possible. DELETE_NAME(namei) Implements "del name", where *namei* is the index into "co_names" attribute of the code object. UNPACK_SEQUENCE(count) Unpacks TOS into *count* individual values, which are put onto the stack right-to-left. UNPACK_EX(counts) Implements assignment with a starred target: Unpacks an iterable in TOS into individual values, where the total number of values can be smaller than the number of items in the iterable: one of the new values will be a list of all leftover items. The low byte of *counts* is the number of values before the list value, the high byte of *counts* the number of values after it. The resulting values are put onto the stack right-to-left. STORE_ATTR(namei) Implements "TOS.name = TOS1", where *namei* is the index of name in "co_names". DELETE_ATTR(namei) Implements "del TOS.name", using *namei* as index into "co_names". STORE_GLOBAL(namei) Works as "STORE_NAME", but stores the name as a global. DELETE_GLOBAL(namei) Works as "DELETE_NAME", but deletes a global name. LOAD_CONST(consti) Pushes "co_consts[consti]" onto the stack. LOAD_NAME(namei) Pushes the value associated with "co_names[namei]" onto the stack. BUILD_TUPLE(count) Creates a tuple consuming *count* items from the stack, and pushes the resulting tuple onto the stack. BUILD_LIST(count) Works as "BUILD_TUPLE", but creates a list. BUILD_SET(count) Works as "BUILD_TUPLE", but creates a set. BUILD_MAP(count) Pushes a new dictionary object onto the stack. Pops "2 * count" items so that the dictionary holds *count* entries: "{..., TOS3: TOS2, TOS1: TOS}". Changed in version 3.5: The dictionary is created from stack items instead of creating an empty dictionary pre-sized to hold *count* items. BUILD_CONST_KEY_MAP(count) The version of "BUILD_MAP" specialized for constant keys. Pops the top element on the stack which contains a tuple of keys, then starting from "TOS1", pops *count* values to form values in the built dictionary. New in version 3.6. BUILD_STRING(count) Concatenates *count* strings from the stack and pushes the resulting string onto the stack. New in version 3.6. LIST_TO_TUPLE Pops a list from the stack and pushes a tuple containing the same values. New in version 3.9. LIST_EXTEND(i) Calls "list.extend(TOS1[-i], TOS)". Used to build lists. New in version 3.9. SET_UPDATE(i) Calls "set.update(TOS1[-i], TOS)". Used to build sets. New in version 3.9. DICT_UPDATE(i) Calls "dict.update(TOS1[-i], TOS)". Used to build dicts. New in version 3.9. DICT_MERGE Like "DICT_UPDATE" but raises an exception for duplicate keys. New in version 3.9. LOAD_ATTR(namei) Replaces TOS with "getattr(TOS, co_names[namei])". COMPARE_OP(opname) Performs a Boolean operation. The operation name can be found in "cmp_op[opname]". IS_OP(invert) Performs "is" comparison, or "is not" if "invert" is 1. New in version 3.9. CONTAINS_OP(invert) Performs "in" comparison, or "not in" if "invert" is 1. New in version 3.9. IMPORT_NAME(namei) Imports the module "co_names[namei]". TOS and TOS1 are popped and provide the *fromlist* and *level* arguments of "__import__()". The module object is pushed onto the stack. The current namespace is not affected: for a proper import statement, a subsequent "STORE_FAST" instruction modifies the namespace. IMPORT_FROM(namei) Loads the attribute "co_names[namei]" from the module found in TOS. The resulting object is pushed onto the stack, to be subsequently stored by a "STORE_FAST" instruction. JUMP_FORWARD(delta) Increments bytecode counter by *delta*. POP_JUMP_IF_TRUE(target) If TOS is true, sets the bytecode counter to *target*. TOS is popped. New in version 3.1. POP_JUMP_IF_FALSE(target) If TOS is false, sets the bytecode counter to *target*. TOS is popped. New in version 3.1. JUMP_IF_NOT_EXC_MATCH(target) Tests whether the second value on the stack is an exception matching TOS, and jumps if it is not. Pops two values from the stack. New in version 3.9. JUMP_IF_TRUE_OR_POP(target) If TOS is true, sets the bytecode counter to *target* and leaves TOS on the stack. Otherwise (TOS is false), TOS is popped. New in version 3.1. JUMP_IF_FALSE_OR_POP(target) If TOS is false, sets the bytecode counter to *target* and leaves TOS on the stack. Otherwise (TOS is true), TOS is popped. New in version 3.1. JUMP_ABSOLUTE(target) Set bytecode counter to *target*. FOR_ITER(delta) TOS is an *iterator*. Call its "__next__()" method. If this yields a new value, push it on the stack (leaving the iterator below it). If the iterator indicates it is exhausted, TOS is popped, and the byte code counter is incremented by *delta*. LOAD_GLOBAL(namei) Loads the global named "co_names[namei]" onto the stack. SETUP_FINALLY(delta) Pushes a try block from a try-finally or try-except clause onto the block stack. *delta* points to the finally block or the first except block. LOAD_FAST(var_num) Pushes a reference to the local "co_varnames[var_num]" onto the stack. STORE_FAST(var_num) Stores TOS into the local "co_varnames[var_num]". DELETE_FAST(var_num) Deletes local "co_varnames[var_num]". LOAD_CLOSURE(i) Pushes a reference to the cell contained in slot *i* of the cell and free variable storage. The name of the variable is "co_cellvars[i]" if *i* is less than the length of *co_cellvars*. Otherwise it is "co_freevars[i - len(co_cellvars)]". LOAD_DEREF(i) Loads the cell contained in slot *i* of the cell and free variable storage. Pushes a reference to the object the cell contains on the stack. LOAD_CLASSDEREF(i) Much like "LOAD_DEREF" but first checks the locals dictionary before consulting the cell. This is used for loading free variables in class bodies. New in version 3.4. STORE_DEREF(i) Stores TOS into the cell contained in slot *i* of the cell and free variable storage. DELETE_DEREF(i) Empties the cell contained in slot *i* of the cell and free variable storage. Used by the "del" statement. New in version 3.2. RAISE_VARARGS(argc) Raises an exception using one of the 3 forms of the "raise" statement, depending on the value of *argc*: * 0: "raise" (re-raise previous exception) * 1: "raise TOS" (raise exception instance or type at "TOS") * 2: "raise TOS1 from TOS" (raise exception instance or type at "TOS1" with "__cause__" set to "TOS") CALL_FUNCTION(argc) Calls a callable object with positional arguments. *argc* indicates the number of positional arguments. The top of the stack contains positional arguments, with the right-most argument on top. Below the arguments is a callable object to call. "CALL_FUNCTION" pops all arguments and the callable object off the stack, calls the callable object with those arguments, and pushes the return value returned by the callable object. Changed in version 3.6: This opcode is used only for calls with positional arguments. CALL_FUNCTION_KW(argc) Calls a callable object with positional (if any) and keyword arguments. *argc* indicates the total number of positional and keyword arguments. The top element on the stack contains a tuple with the names of the keyword arguments, which must be strings. Below that are the values for the keyword arguments, in the order corresponding to the tuple. Below that are positional arguments, with the right-most parameter on top. Below the arguments is a callable object to call. "CALL_FUNCTION_KW" pops all arguments and the callable object off the stack, calls the callable object with those arguments, and pushes the return value returned by the callable object. Changed in version 3.6: Keyword arguments are packed in a tuple instead of a dictionary, *argc* indicates the total number of arguments. CALL_FUNCTION_EX(flags) Calls a callable object with variable set of positional and keyword arguments. If the lowest bit of *flags* is set, the top of the stack contains a mapping object containing additional keyword arguments. Before the callable is called, the mapping object and iterable object are each “unpacked” and their contents passed in as keyword and positional arguments respectively. "CALL_FUNCTION_EX" pops all arguments and the callable object off the stack, calls the callable object with those arguments, and pushes the return value returned by the callable object. New in version 3.6. LOAD_METHOD(namei) Loads a method named "co_names[namei]" from the TOS object. TOS is popped. This bytecode distinguishes two cases: if TOS has a method with the correct name, the bytecode pushes the unbound method and TOS. TOS will be used as the first argument ("self") by "CALL_METHOD" when calling the unbound method. Otherwise, "NULL" and the object return by the attribute lookup are pushed. New in version 3.7. CALL_METHOD(argc) Calls a method. *argc* is the number of positional arguments. Keyword arguments are not supported. This opcode is designed to be used with "LOAD_METHOD". Positional arguments are on top of the stack. Below them, the two items described in "LOAD_METHOD" are on the stack (either "self" and an unbound method object or "NULL" and an arbitrary callable). All of them are popped and the return value is pushed. New in version 3.7. MAKE_FUNCTION(flags) Pushes a new function object on the stack. From bottom to top, the consumed stack must consist of values if the argument carries a specified flag value * "0x01" a tuple of default values for positional-only and positional-or-keyword parameters in positional order * "0x02" a dictionary of keyword-only parameters’ default values * "0x04" an annotation dictionary * "0x08" a tuple containing cells for free variables, making a closure * the code associated with the function (at TOS1) * the *qualified name* of the function (at TOS) BUILD_SLICE(argc) Pushes a slice object on the stack. *argc* must be 2 or 3. If it is 2, "slice(TOS1, TOS)" is pushed; if it is 3, "slice(TOS2, TOS1, TOS)" is pushed. See the "slice()" built-in function for more information. EXTENDED_ARG(ext) Prefixes any opcode which has an argument too big to fit into the default one byte. *ext* holds an additional byte which act as higher bits in the argument. For each opcode, at most three prefixal "EXTENDED_ARG" are allowed, forming an argument from two- byte to four-byte. FORMAT_VALUE(flags) Used for implementing formatted literal strings (f-strings). Pops an optional *fmt_spec* from the stack, then a required *value*. *flags* is interpreted as follows: * "(flags & 0x03) == 0x00": *value* is formatted as-is. * "(flags & 0x03) == 0x01": call "str()" on *value* before formatting it. * "(flags & 0x03) == 0x02": call "repr()" on *value* before formatting it. * "(flags & 0x03) == 0x03": call "ascii()" on *value* before formatting it. * "(flags & 0x04) == 0x04": pop *fmt_spec* from the stack and use it, else use an empty *fmt_spec*. Formatting is performed using "PyObject_Format()". The result is pushed on the stack. New in version 3.6. HAVE_ARGUMENT This is not really an opcode. It identifies the dividing line between opcodes which don’t use their argument and those that do ("< HAVE_ARGUMENT" and ">= HAVE_ARGUMENT", respectively). Changed in version 3.6: Now every instruction has an argument, but opcodes "< HAVE_ARGUMENT" ignore it. Before, only opcodes ">= HAVE_ARGUMENT" had an argument. Opcode collections ================== These collections are provided for automatic introspection of bytecode instructions: dis.opname Sequence of operation names, indexable using the bytecode. dis.opmap Dictionary mapping operation names to bytecodes. dis.cmp_op Sequence of all compare operation names. dis.hasconst Sequence of bytecodes that access a constant. dis.hasfree Sequence of bytecodes that access a free variable (note that ‘free’ in this context refers to names in the current scope that are referenced by inner scopes or names in outer scopes that are referenced from this scope. It does *not* include references to global or builtin scopes). dis.hasname Sequence of bytecodes that access an attribute by name. dis.hasjrel Sequence of bytecodes that have a relative jump target. dis.hasjabs Sequence of bytecodes that have an absolute jump target. dis.haslocal Sequence of bytecodes that access a local variable. dis.hascompare Sequence of bytecodes of Boolean operations. Software Packaging and Distribution *********************************** These libraries help you with publishing and installing Python software. While these modules are designed to work in conjunction with the Python Package Index, they can also be used with a local index server, or without any index server at all. * "distutils" — Building and installing Python modules * "ensurepip" — Bootstrapping the "pip" installer * Command line interface * Module API * "venv" — Creation of virtual environments * Creating virtual environments * API * An example of extending "EnvBuilder" * "zipapp" — Manage executable Python zip archives * Basic Example * Command-Line Interface * Python API * Examples * Specifying the Interpreter * Creating Standalone Applications with zipapp * Making a Windows executable * Caveats * The Python Zip Application Archive Format "distutils" — Building and installing Python modules **************************************************** ====================================================================== The "distutils" package provides support for building and installing additional modules into a Python installation. The new modules may be either 100%-pure Python, or may be extension modules written in C, or may be collections of Python packages which include modules coded in both Python and C. Most Python users will *not* want to use this module directly, but instead use the cross-version tools maintained by the Python Packaging Authority. In particular, setuptools is an enhanced alternative to "distutils" that provides: * support for declaring project dependencies * additional mechanisms for configuring which files to include in source releases (including plugins for integration with version control systems) * the ability to declare project “entry points”, which can be used as the basis for application plugin systems * the ability to automatically generate Windows command line executables at installation time rather than needing to prebuild them * consistent behaviour across all supported Python versions The recommended pip installer runs all "setup.py" scripts with "setuptools", even if the script itself only imports "distutils". Refer to the Python Packaging User Guide for more information. For the benefits of packaging tool authors and users seeking a deeper understanding of the details of the current packaging and distribution system, the legacy "distutils" based user documentation and API reference remain available: * Installing Python Modules (Legacy version) * Distributing Python Modules (Legacy version) "doctest" — Test interactive Python examples ******************************************** **Source code:** Lib/doctest.py ====================================================================== The "doctest" module searches for pieces of text that look like interactive Python sessions, and then executes those sessions to verify that they work exactly as shown. There are several common ways to use doctest: * To check that a module’s docstrings are up-to-date by verifying that all interactive examples still work as documented. * To perform regression testing by verifying that interactive examples from a test file or a test object work as expected. * To write tutorial documentation for a package, liberally illustrated with input-output examples. Depending on whether the examples or the expository text are emphasized, this has the flavor of “literate testing” or “executable documentation”. Here’s a complete but small example module: """ This is the "example" module. The example module supplies one function, factorial(). For example, >>> factorial(5) 120 """ def factorial(n): """Return the factorial of n, an exact integer >= 0. >>> [factorial(n) for n in range(6)] [1, 1, 2, 6, 24, 120] >>> factorial(30) 265252859812191058636308480000000 >>> factorial(-1) Traceback (most recent call last): ... ValueError: n must be >= 0 Factorials of floats are OK, but the float must be an exact integer: >>> factorial(30.1) Traceback (most recent call last): ... ValueError: n must be exact integer >>> factorial(30.0) 265252859812191058636308480000000 It must also not be ridiculously large: >>> factorial(1e100) Traceback (most recent call last): ... OverflowError: n too large """ import math if not n >= 0: raise ValueError("n must be >= 0") if math.floor(n) != n: raise ValueError("n must be exact integer") if n+1 == n: # catch a value like 1e300 raise OverflowError("n too large") result = 1 factor = 2 while factor <= n: result *= factor factor += 1 return result if __name__ == "__main__": import doctest doctest.testmod() If you run "example.py" directly from the command line, "doctest" works its magic: $ python example.py $ There’s no output! That’s normal, and it means all the examples worked. Pass "-v" to the script, and "doctest" prints a detailed log of what it’s trying, and prints a summary at the end: $ python example.py -v Trying: factorial(5) Expecting: 120 ok Trying: [factorial(n) for n in range(6)] Expecting: [1, 1, 2, 6, 24, 120] ok And so on, eventually ending with: Trying: factorial(1e100) Expecting: Traceback (most recent call last): ... OverflowError: n too large ok 2 items passed all tests: 1 tests in __main__ 8 tests in __main__.factorial 9 tests in 2 items. 9 passed and 0 failed. Test passed. $ That’s all you need to know to start making productive use of "doctest"! Jump in. The following sections provide full details. Note that there are many examples of doctests in the standard Python test suite and libraries. Especially useful examples can be found in the standard test file "Lib/test/test_doctest.py". Simple Usage: Checking Examples in Docstrings ============================================= The simplest way to start using doctest (but not necessarily the way you’ll continue to do it) is to end each module "M" with: if __name__ == "__main__": import doctest doctest.testmod() "doctest" then examines docstrings in module "M". Running the module as a script causes the examples in the docstrings to get executed and verified: python M.py This won’t display anything unless an example fails, in which case the failing example(s) and the cause(s) of the failure(s) are printed to stdout, and the final line of output is "***Test Failed*** N failures.", where *N* is the number of examples that failed. Run it with the "-v" switch instead: python M.py -v and a detailed report of all examples tried is printed to standard output, along with assorted summaries at the end. You can force verbose mode by passing "verbose=True" to "testmod()", or prohibit it by passing "verbose=False". In either of those cases, "sys.argv" is not examined by "testmod()" (so passing "-v" or not has no effect). There is also a command line shortcut for running "testmod()". You can instruct the Python interpreter to run the doctest module directly from the standard library and pass the module name(s) on the command line: python -m doctest -v example.py This will import "example.py" as a standalone module and run "testmod()" on it. Note that this may not work correctly if the file is part of a package and imports other submodules from that package. For more information on "testmod()", see section Basic API. Simple Usage: Checking Examples in a Text File ============================================== Another simple application of doctest is testing interactive examples in a text file. This can be done with the "testfile()" function: import doctest doctest.testfile("example.txt") That short script executes and verifies any interactive Python examples contained in the file "example.txt". The file content is treated as if it were a single giant docstring; the file doesn’t need to contain a Python program! For example, perhaps "example.txt" contains this: The ``example`` module ====================== Using ``factorial`` ------------------- This is an example text file in reStructuredText format. First import ``factorial`` from the ``example`` module: >>> from example import factorial Now use it: >>> factorial(6) 120 Running "doctest.testfile("example.txt")" then finds the error in this documentation: File "./example.txt", line 14, in example.txt Failed example: factorial(6) Expected: 120 Got: 720 As with "testmod()", "testfile()" won’t display anything unless an example fails. If an example does fail, then the failing example(s) and the cause(s) of the failure(s) are printed to stdout, using the same format as "testmod()". By default, "testfile()" looks for files in the calling module’s directory. See section Basic API for a description of the optional arguments that can be used to tell it to look for files in other locations. Like "testmod()", "testfile()"’s verbosity can be set with the "-v" command-line switch or with the optional keyword argument *verbose*. There is also a command line shortcut for running "testfile()". You can instruct the Python interpreter to run the doctest module directly from the standard library and pass the file name(s) on the command line: python -m doctest -v example.txt Because the file name does not end with ".py", "doctest" infers that it must be run with "testfile()", not "testmod()". For more information on "testfile()", see section Basic API. How It Works ============ This section examines in detail how doctest works: which docstrings it looks at, how it finds interactive examples, what execution context it uses, how it handles exceptions, and how option flags can be used to control its behavior. This is the information that you need to know to write doctest examples; for information about actually running doctest on these examples, see the following sections. Which Docstrings Are Examined? ------------------------------ The module docstring, and all function, class and method docstrings are searched. Objects imported into the module are not searched. In addition, if "M.__test__" exists and “is true”, it must be a dict, and each entry maps a (string) name to a function object, class object, or string. Function and class object docstrings found from "M.__test__" are searched, and strings are treated as if they were docstrings. In output, a key "K" in "M.__test__" appears with name .__test__.K Any classes found are recursively searched similarly, to test docstrings in their contained methods and nested classes. **CPython implementation detail:** Prior to version 3.4, extension modules written in C were not fully searched by doctest. How are Docstring Examples Recognized? -------------------------------------- In most cases a copy-and-paste of an interactive console session works fine, but doctest isn’t trying to do an exact emulation of any specific Python shell. >>> # comments are ignored >>> x = 12 >>> x 12 >>> if x == 13: ... print("yes") ... else: ... print("no") ... print("NO") ... print("NO!!!") ... no NO NO!!! >>> Any expected output must immediately follow the final "'>>> '" or "'... '" line containing the code, and the expected output (if any) extends to the next "'>>> '" or all-whitespace line. The fine print: * Expected output cannot contain an all-whitespace line, since such a line is taken to signal the end of expected output. If expected output does contain a blank line, put "" in your doctest example each place a blank line is expected. * All hard tab characters are expanded to spaces, using 8-column tab stops. Tabs in output generated by the tested code are not modified. Because any hard tabs in the sample output *are* expanded, this means that if the code output includes hard tabs, the only way the doctest can pass is if the "NORMALIZE_WHITESPACE" option or directive is in effect. Alternatively, the test can be rewritten to capture the output and compare it to an expected value as part of the test. This handling of tabs in the source was arrived at through trial and error, and has proven to be the least error prone way of handling them. It is possible to use a different algorithm for handling tabs by writing a custom "DocTestParser" class. * Output to stdout is captured, but not output to stderr (exception tracebacks are captured via a different means). * If you continue a line via backslashing in an interactive session, or for any other reason use a backslash, you should use a raw docstring, which will preserve your backslashes exactly as you type them: >>> def f(x): ... r'''Backslashes in a raw docstring: m\n''' >>> print(f.__doc__) Backslashes in a raw docstring: m\n Otherwise, the backslash will be interpreted as part of the string. For example, the "\n" above would be interpreted as a newline character. Alternatively, you can double each backslash in the doctest version (and not use a raw string): >>> def f(x): ... '''Backslashes in a raw docstring: m\\n''' >>> print(f.__doc__) Backslashes in a raw docstring: m\n * The starting column doesn’t matter: >>> assert "Easy!" >>> import math >>> math.floor(1.9) 1 and as many leading whitespace characters are stripped from the expected output as appeared in the initial "'>>> '" line that started the example. What’s the Execution Context? ----------------------------- By default, each time "doctest" finds a docstring to test, it uses a *shallow copy* of "M"’s globals, so that running tests doesn’t change the module’s real globals, and so that one test in "M" can’t leave behind crumbs that accidentally allow another test to work. This means examples can freely use any names defined at top-level in "M", and names defined earlier in the docstring being run. Examples cannot see names defined in other docstrings. You can force use of your own dict as the execution context by passing "globs=your_dict" to "testmod()" or "testfile()" instead. What About Exceptions? ---------------------- No problem, provided that the traceback is the only output produced by the example: just paste in the traceback. [1] Since tracebacks contain details that are likely to change rapidly (for example, exact file paths and line numbers), this is one case where doctest works hard to be flexible in what it accepts. Simple example: >>> [1, 2, 3].remove(42) Traceback (most recent call last): File "", line 1, in ValueError: list.remove(x): x not in list That doctest succeeds if "ValueError" is raised, with the "list.remove(x): x not in list" detail as shown. The expected output for an exception must start with a traceback header, which may be either of the following two lines, indented the same as the first line of the example: Traceback (most recent call last): Traceback (innermost last): The traceback header is followed by an optional traceback stack, whose contents are ignored by doctest. The traceback stack is typically omitted, or copied verbatim from an interactive session. The traceback stack is followed by the most interesting part: the line(s) containing the exception type and detail. This is usually the last line of a traceback, but can extend across multiple lines if the exception has a multi-line detail: >>> raise ValueError('multi\n line\ndetail') Traceback (most recent call last): File "", line 1, in ValueError: multi line detail The last three lines (starting with "ValueError") are compared against the exception’s type and detail, and the rest are ignored. Best practice is to omit the traceback stack, unless it adds significant documentation value to the example. So the last example is probably better as: >>> raise ValueError('multi\n line\ndetail') Traceback (most recent call last): ... ValueError: multi line detail Note that tracebacks are treated very specially. In particular, in the rewritten example, the use of "..." is independent of doctest’s "ELLIPSIS" option. The ellipsis in that example could be left out, or could just as well be three (or three hundred) commas or digits, or an indented transcript of a Monty Python skit. Some details you should read once, but won’t need to remember: * Doctest can’t guess whether your expected output came from an exception traceback or from ordinary printing. So, e.g., an example that expects "ValueError: 42 is prime" will pass whether "ValueError" is actually raised or if the example merely prints that traceback text. In practice, ordinary output rarely begins with a traceback header line, so this doesn’t create real problems. * Each line of the traceback stack (if present) must be indented further than the first line of the example, *or* start with a non- alphanumeric character. The first line following the traceback header indented the same and starting with an alphanumeric is taken to be the start of the exception detail. Of course this does the right thing for genuine tracebacks. * When the "IGNORE_EXCEPTION_DETAIL" doctest option is specified, everything following the leftmost colon and any module information in the exception name is ignored. * The interactive shell omits the traceback header line for some "SyntaxError"s. But doctest uses the traceback header line to distinguish exceptions from non-exceptions. So in the rare case where you need to test a "SyntaxError" that omits the traceback header, you will need to manually add the traceback header line to your test example. * For some "SyntaxError"s, Python displays the character position of the syntax error, using a "^" marker: >>> 1 1 File "", line 1 1 1 ^ SyntaxError: invalid syntax Since the lines showing the position of the error come before the exception type and detail, they are not checked by doctest. For example, the following test would pass, even though it puts the "^" marker in the wrong location: >>> 1 1 Traceback (most recent call last): File "", line 1 1 1 ^ SyntaxError: invalid syntax Option Flags ------------ A number of option flags control various aspects of doctest’s behavior. Symbolic names for the flags are supplied as module constants, which can be bitwise ORed together and passed to various functions. The names can also be used in doctest directives, and may be passed to the doctest command line interface via the "-o" option. New in version 3.4: The "-o" command line option. The first group of options define test semantics, controlling aspects of how doctest decides whether actual output matches an example’s expected output: doctest.DONT_ACCEPT_TRUE_FOR_1 By default, if an expected output block contains just "1", an actual output block containing just "1" or just "True" is considered to be a match, and similarly for "0" versus "False". When "DONT_ACCEPT_TRUE_FOR_1" is specified, neither substitution is allowed. The default behavior caters to that Python changed the return type of many functions from integer to boolean; doctests expecting “little integer” output still work in these cases. This option will probably go away, but not for several years. doctest.DONT_ACCEPT_BLANKLINE By default, if an expected output block contains a line containing only the string "", then that line will match a blank line in the actual output. Because a genuinely blank line delimits the expected output, this is the only way to communicate that a blank line is expected. When "DONT_ACCEPT_BLANKLINE" is specified, this substitution is not allowed. doctest.NORMALIZE_WHITESPACE When specified, all sequences of whitespace (blanks and newlines) are treated as equal. Any sequence of whitespace within the expected output will match any sequence of whitespace within the actual output. By default, whitespace must match exactly. "NORMALIZE_WHITESPACE" is especially useful when a line of expected output is very long, and you want to wrap it across multiple lines in your source. doctest.ELLIPSIS When specified, an ellipsis marker ("...") in the expected output can match any substring in the actual output. This includes substrings that span line boundaries, and empty substrings, so it’s best to keep usage of this simple. Complicated uses can lead to the same kinds of “oops, it matched too much!” surprises that ".*" is prone to in regular expressions. doctest.IGNORE_EXCEPTION_DETAIL When specified, doctests expecting exceptions pass so long as an exception of the expected type is raised, even if the details (message and fully-qualified exception name) don’t match. For example, an example expecting "ValueError: 42" will pass if the actual exception raised is "ValueError: 3*14", but will fail if, say, a "TypeError" is raised instead. It will also ignore any fully-qualified name included before the exception class, which can vary between implementations and versions of Python and the code/libraries in use. Hence, all three of these variations will work with the flag specified: >>> raise Exception('message') Traceback (most recent call last): Exception: message >>> raise Exception('message') Traceback (most recent call last): builtins.Exception: message >>> raise Exception('message') Traceback (most recent call last): __main__.Exception: message Note that "ELLIPSIS" can also be used to ignore the details of the exception message, but such a test may still fail based on whether the module name is present or matches exactly. Changed in version 3.2: "IGNORE_EXCEPTION_DETAIL" now also ignores any information relating to the module containing the exception under test. doctest.SKIP When specified, do not run the example at all. This can be useful in contexts where doctest examples serve as both documentation and test cases, and an example should be included for documentation purposes, but should not be checked. E.g., the example’s output might be random; or the example might depend on resources which would be unavailable to the test driver. The SKIP flag can also be used for temporarily “commenting out” examples. doctest.COMPARISON_FLAGS A bitmask or’ing together all the comparison flags above. The second group of options controls how test failures are reported: doctest.REPORT_UDIFF When specified, failures that involve multi-line expected and actual outputs are displayed using a unified diff. doctest.REPORT_CDIFF When specified, failures that involve multi-line expected and actual outputs will be displayed using a context diff. doctest.REPORT_NDIFF When specified, differences are computed by "difflib.Differ", using the same algorithm as the popular "ndiff.py" utility. This is the only method that marks differences within lines as well as across lines. For example, if a line of expected output contains digit "1" where actual output contains letter "l", a line is inserted with a caret marking the mismatching column positions. doctest.REPORT_ONLY_FIRST_FAILURE When specified, display the first failing example in each doctest, but suppress output for all remaining examples. This will prevent doctest from reporting correct examples that break because of earlier failures; but it might also hide incorrect examples that fail independently of the first failure. When "REPORT_ONLY_FIRST_FAILURE" is specified, the remaining examples are still run, and still count towards the total number of failures reported; only the output is suppressed. doctest.FAIL_FAST When specified, exit after the first failing example and don’t attempt to run the remaining examples. Thus, the number of failures reported will be at most 1. This flag may be useful during debugging, since examples after the first failure won’t even produce debugging output. The doctest command line accepts the option "-f" as a shorthand for "-o FAIL_FAST". New in version 3.4. doctest.REPORTING_FLAGS A bitmask or’ing together all the reporting flags above. There is also a way to register new option flag names, though this isn’t useful unless you intend to extend "doctest" internals via subclassing: doctest.register_optionflag(name) Create a new option flag with a given name, and return the new flag’s integer value. "register_optionflag()" can be used when subclassing "OutputChecker" or "DocTestRunner" to create new options that are supported by your subclasses. "register_optionflag()" should always be called using the following idiom: MY_FLAG = register_optionflag('MY_FLAG') Directives ---------- Doctest directives may be used to modify the option flags for an individual example. Doctest directives are special Python comments following an example’s source code: directive ::= "#" "doctest:" directive_options directive_options ::= directive_option ("," directive_option)\* directive_option ::= on_or_off directive_option_name on_or_off ::= "+" \| "-" directive_option_name ::= "DONT_ACCEPT_BLANKLINE" \| "NORMALIZE_WHITESPACE" \| ... Whitespace is not allowed between the "+" or "-" and the directive option name. The directive option name can be any of the option flag names explained above. An example’s doctest directives modify doctest’s behavior for that single example. Use "+" to enable the named behavior, or "-" to disable it. For example, this test passes: >>> print(list(range(20))) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19] Without the directive it would fail, both because the actual output doesn’t have two blanks before the single-digit list elements, and because the actual output is on a single line. This test also passes, and also requires a directive to do so: >>> print(list(range(20))) [0, 1, ..., 18, 19] Multiple directives can be used on a single physical line, separated by commas: >>> print(list(range(20))) [0, 1, ..., 18, 19] If multiple directive comments are used for a single example, then they are combined: >>> print(list(range(20))) ... [0, 1, ..., 18, 19] As the previous example shows, you can add "..." lines to your example containing only directives. This can be useful when an example is too long for a directive to comfortably fit on the same line: >>> print(list(range(5)) + list(range(10, 20)) + list(range(30, 40))) ... [0, ..., 4, 10, ..., 19, 30, ..., 39] Note that since all options are disabled by default, and directives apply only to the example they appear in, enabling options (via "+" in a directive) is usually the only meaningful choice. However, option flags can also be passed to functions that run doctests, establishing different defaults. In such cases, disabling an option via "-" in a directive can be useful. Warnings -------- "doctest" is serious about requiring exact matches in expected output. If even a single character doesn’t match, the test fails. This will probably surprise you a few times, as you learn exactly what Python does and doesn’t guarantee about output. For example, when printing a set, Python doesn’t guarantee that the element is printed in any particular order, so a test like >>> foo() {"Hermione", "Harry"} is vulnerable! One workaround is to do >>> foo() == {"Hermione", "Harry"} True instead. Another is to do >>> d = sorted(foo()) >>> d ['Harry', 'Hermione'] Note: Before Python 3.6, when printing a dict, Python did not guarantee that the key-value pairs was printed in any particular order. There are others, but you get the idea. Another bad idea is to print things that embed an object address, like >>> id(1.0) # certain to fail some of the time 7948648 >>> class C: pass >>> C() # the default repr() for instances embeds an address <__main__.C instance at 0x00AC18F0> The "ELLIPSIS" directive gives a nice approach for the last example: >>> C() <__main__.C instance at 0x...> Floating-point numbers are also subject to small output variations across platforms, because Python defers to the platform C library for float formatting, and C libraries vary widely in quality here. >>> 1./7 # risky 0.14285714285714285 >>> print(1./7) # safer 0.142857142857 >>> print(round(1./7, 6)) # much safer 0.142857 Numbers of the form "I/2.**J" are safe across all platforms, and I often contrive doctest examples to produce numbers of that form: >>> 3./4 # utterly safe 0.75 Simple fractions are also easier for people to understand, and that makes for better documentation. Basic API ========= The functions "testmod()" and "testfile()" provide a simple interface to doctest that should be sufficient for most basic uses. For a less formal introduction to these two functions, see sections Simple Usage: Checking Examples in Docstrings and Simple Usage: Checking Examples in a Text File. doctest.testfile(filename, module_relative=True, name=None, package=None, globs=None, verbose=None, report=True, optionflags=0, extraglobs=None, raise_on_error=False, parser=DocTestParser(), encoding=None) All arguments except *filename* are optional, and should be specified in keyword form. Test examples in the file named *filename*. Return "(failure_count, test_count)". Optional argument *module_relative* specifies how the filename should be interpreted: * If *module_relative* is "True" (the default), then *filename* specifies an OS-independent module-relative path. By default, this path is relative to the calling module’s directory; but if the *package* argument is specified, then it is relative to that package. To ensure OS-independence, *filename* should use "/" characters to separate path segments, and may not be an absolute path (i.e., it may not begin with "/"). * If *module_relative* is "False", then *filename* specifies an OS- specific path. The path may be absolute or relative; relative paths are resolved with respect to the current working directory. Optional argument *name* gives the name of the test; by default, or if "None", "os.path.basename(filename)" is used. Optional argument *package* is a Python package or the name of a Python package whose directory should be used as the base directory for a module-relative filename. If no package is specified, then the calling module’s directory is used as the base directory for module-relative filenames. It is an error to specify *package* if *module_relative* is "False". Optional argument *globs* gives a dict to be used as the globals when executing examples. A new shallow copy of this dict is created for the doctest, so its examples start with a clean slate. By default, or if "None", a new empty dict is used. Optional argument *extraglobs* gives a dict merged into the globals used to execute examples. This works like "dict.update()": if *globs* and *extraglobs* have a common key, the associated value in *extraglobs* appears in the combined dict. By default, or if "None", no extra globals are used. This is an advanced feature that allows parameterization of doctests. For example, a doctest can be written for a base class, using a generic name for the class, then reused to test any number of subclasses by passing an *extraglobs* dict mapping the generic name to the subclass to be tested. Optional argument *verbose* prints lots of stuff if true, and prints only failures if false; by default, or if "None", it’s true if and only if "'-v'" is in "sys.argv". Optional argument *report* prints a summary at the end when true, else prints nothing at the end. In verbose mode, the summary is detailed, else the summary is very brief (in fact, empty if all tests passed). Optional argument *optionflags* (default value 0) takes the bitwise OR of option flags. See section Option Flags. Optional argument *raise_on_error* defaults to false. If true, an exception is raised upon the first failure or unexpected exception in an example. This allows failures to be post-mortem debugged. Default behavior is to continue running examples. Optional argument *parser* specifies a "DocTestParser" (or subclass) that should be used to extract tests from the files. It defaults to a normal parser (i.e., "DocTestParser()"). Optional argument *encoding* specifies an encoding that should be used to convert the file to unicode. doctest.testmod(m=None, name=None, globs=None, verbose=None, report=True, optionflags=0, extraglobs=None, raise_on_error=False, exclude_empty=False) All arguments are optional, and all except for *m* should be specified in keyword form. Test examples in docstrings in functions and classes reachable from module *m* (or module "__main__" if *m* is not supplied or is "None"), starting with "m.__doc__". Also test examples reachable from dict "m.__test__", if it exists and is not "None". "m.__test__" maps names (strings) to functions, classes and strings; function and class docstrings are searched for examples; strings are searched directly, as if they were docstrings. Only docstrings attached to objects belonging to module *m* are searched. Return "(failure_count, test_count)". Optional argument *name* gives the name of the module; by default, or if "None", "m.__name__" is used. Optional argument *exclude_empty* defaults to false. If true, objects for which no doctests are found are excluded from consideration. The default is a backward compatibility hack, so that code still using "doctest.master.summarize()" in conjunction with "testmod()" continues to get output for objects with no tests. The *exclude_empty* argument to the newer "DocTestFinder" constructor defaults to true. Optional arguments *extraglobs*, *verbose*, *report*, *optionflags*, *raise_on_error*, and *globs* are the same as for function "testfile()" above, except that *globs* defaults to "m.__dict__". doctest.run_docstring_examples(f, globs, verbose=False, name="NoName", compileflags=None, optionflags=0) Test examples associated with object *f*; for example, *f* may be a string, a module, a function, or a class object. A shallow copy of dictionary argument *globs* is used for the execution context. Optional argument *name* is used in failure messages, and defaults to ""NoName"". If optional argument *verbose* is true, output is generated even if there are no failures. By default, output is generated only in case of an example failure. Optional argument *compileflags* gives the set of flags that should be used by the Python compiler when running the examples. By default, or if "None", flags are deduced corresponding to the set of future features found in *globs*. Optional argument *optionflags* works as for function "testfile()" above. Unittest API ============ As your collection of doctest’ed modules grows, you’ll want a way to run all their doctests systematically. "doctest" provides two functions that can be used to create "unittest" test suites from modules and text files containing doctests. To integrate with "unittest" test discovery, include a "load_tests()" function in your test module: import unittest import doctest import my_module_with_doctests def load_tests(loader, tests, ignore): tests.addTests(doctest.DocTestSuite(my_module_with_doctests)) return tests There are two main functions for creating "unittest.TestSuite" instances from text files and modules with doctests: doctest.DocFileSuite(*paths, module_relative=True, package=None, setUp=None, tearDown=None, globs=None, optionflags=0, parser=DocTestParser(), encoding=None) Convert doctest tests from one or more text files to a "unittest.TestSuite". The returned "unittest.TestSuite" is to be run by the unittest framework and runs the interactive examples in each file. If an example in any file fails, then the synthesized unit test fails, and a "failureException" exception is raised showing the name of the file containing the test and a (sometimes approximate) line number. Pass one or more paths (as strings) to text files to be examined. Options may be provided as keyword arguments: Optional argument *module_relative* specifies how the filenames in *paths* should be interpreted: * If *module_relative* is "True" (the default), then each filename in *paths* specifies an OS-independent module-relative path. By default, this path is relative to the calling module’s directory; but if the *package* argument is specified, then it is relative to that package. To ensure OS-independence, each filename should use "/" characters to separate path segments, and may not be an absolute path (i.e., it may not begin with "/"). * If *module_relative* is "False", then each filename in *paths* specifies an OS-specific path. The path may be absolute or relative; relative paths are resolved with respect to the current working directory. Optional argument *package* is a Python package or the name of a Python package whose directory should be used as the base directory for module-relative filenames in *paths*. If no package is specified, then the calling module’s directory is used as the base directory for module-relative filenames. It is an error to specify *package* if *module_relative* is "False". Optional argument *setUp* specifies a set-up function for the test suite. This is called before running the tests in each file. The *setUp* function will be passed a "DocTest" object. The setUp function can access the test globals as the *globs* attribute of the test passed. Optional argument *tearDown* specifies a tear-down function for the test suite. This is called after running the tests in each file. The *tearDown* function will be passed a "DocTest" object. The setUp function can access the test globals as the *globs* attribute of the test passed. Optional argument *globs* is a dictionary containing the initial global variables for the tests. A new copy of this dictionary is created for each test. By default, *globs* is a new empty dictionary. Optional argument *optionflags* specifies the default doctest options for the tests, created by or-ing together individual option flags. See section Option Flags. See function "set_unittest_reportflags()" below for a better way to set reporting options. Optional argument *parser* specifies a "DocTestParser" (or subclass) that should be used to extract tests from the files. It defaults to a normal parser (i.e., "DocTestParser()"). Optional argument *encoding* specifies an encoding that should be used to convert the file to unicode. The global "__file__" is added to the globals provided to doctests loaded from a text file using "DocFileSuite()". doctest.DocTestSuite(module=None, globs=None, extraglobs=None, test_finder=None, setUp=None, tearDown=None, checker=None) Convert doctest tests for a module to a "unittest.TestSuite". The returned "unittest.TestSuite" is to be run by the unittest framework and runs each doctest in the module. If any of the doctests fail, then the synthesized unit test fails, and a "failureException" exception is raised showing the name of the file containing the test and a (sometimes approximate) line number. Optional argument *module* provides the module to be tested. It can be a module object or a (possibly dotted) module name. If not specified, the module calling this function is used. Optional argument *globs* is a dictionary containing the initial global variables for the tests. A new copy of this dictionary is created for each test. By default, *globs* is a new empty dictionary. Optional argument *extraglobs* specifies an extra set of global variables, which is merged into *globs*. By default, no extra globals are used. Optional argument *test_finder* is the "DocTestFinder" object (or a drop-in replacement) that is used to extract doctests from the module. Optional arguments *setUp*, *tearDown*, and *optionflags* are the same as for function "DocFileSuite()" above. This function uses the same search technique as "testmod()". Changed in version 3.5: "DocTestSuite()" returns an empty "unittest.TestSuite" if *module* contains no docstrings instead of raising "ValueError". Under the covers, "DocTestSuite()" creates a "unittest.TestSuite" out of "doctest.DocTestCase" instances, and "DocTestCase" is a subclass of "unittest.TestCase". "DocTestCase" isn’t documented here (it’s an internal detail), but studying its code can answer questions about the exact details of "unittest" integration. Similarly, "DocFileSuite()" creates a "unittest.TestSuite" out of "doctest.DocFileCase" instances, and "DocFileCase" is a subclass of "DocTestCase". So both ways of creating a "unittest.TestSuite" run instances of "DocTestCase". This is important for a subtle reason: when you run "doctest" functions yourself, you can control the "doctest" options in use directly, by passing option flags to "doctest" functions. However, if you’re writing a "unittest" framework, "unittest" ultimately controls when and how tests get run. The framework author typically wants to control "doctest" reporting options (perhaps, e.g., specified by command line options), but there’s no way to pass options through "unittest" to "doctest" test runners. For this reason, "doctest" also supports a notion of "doctest" reporting flags specific to "unittest" support, via this function: doctest.set_unittest_reportflags(flags) Set the "doctest" reporting flags to use. Argument *flags* takes the bitwise OR of option flags. See section Option Flags. Only “reporting flags” can be used. This is a module-global setting, and affects all future doctests run by module "unittest": the "runTest()" method of "DocTestCase" looks at the option flags specified for the test case when the "DocTestCase" instance was constructed. If no reporting flags were specified (which is the typical and expected case), "doctest"’s "unittest" reporting flags are bitwise ORed into the option flags, and the option flags so augmented are passed to the "DocTestRunner" instance created to run the doctest. If any reporting flags were specified when the "DocTestCase" instance was constructed, "doctest"’s "unittest" reporting flags are ignored. The value of the "unittest" reporting flags in effect before the function was called is returned by the function. Advanced API ============ The basic API is a simple wrapper that’s intended to make doctest easy to use. It is fairly flexible, and should meet most users’ needs; however, if you require more fine-grained control over testing, or wish to extend doctest’s capabilities, then you should use the advanced API. The advanced API revolves around two container classes, which are used to store the interactive examples extracted from doctest cases: * "Example": A single Python *statement*, paired with its expected output. * "DocTest": A collection of "Example"s, typically extracted from a single docstring or text file. Additional processing classes are defined to find, parse, and run, and check doctest examples: * "DocTestFinder": Finds all docstrings in a given module, and uses a "DocTestParser" to create a "DocTest" from every docstring that contains interactive examples. * "DocTestParser": Creates a "DocTest" object from a string (such as an object’s docstring). * "DocTestRunner": Executes the examples in a "DocTest", and uses an "OutputChecker" to verify their output. * "OutputChecker": Compares the actual output from a doctest example with the expected output, and decides whether they match. The relationships among these processing classes are summarized in the following diagram: list of: +------+ +---------+ |module| --DocTestFinder-> | DocTest | --DocTestRunner-> results +------+ | ^ +---------+ | ^ (printed) | | | Example | | | v | | ... | v | DocTestParser | Example | OutputChecker +---------+ DocTest Objects --------------- class doctest.DocTest(examples, globs, name, filename, lineno, docstring) A collection of doctest examples that should be run in a single namespace. The constructor arguments are used to initialize the attributes of the same names. "DocTest" defines the following attributes. They are initialized by the constructor, and should not be modified directly. examples A list of "Example" objects encoding the individual interactive Python examples that should be run by this test. globs The namespace (aka globals) that the examples should be run in. This is a dictionary mapping names to values. Any changes to the namespace made by the examples (such as binding new variables) will be reflected in "globs" after the test is run. name A string name identifying the "DocTest". Typically, this is the name of the object or file that the test was extracted from. filename The name of the file that this "DocTest" was extracted from; or "None" if the filename is unknown, or if the "DocTest" was not extracted from a file. lineno The line number within "filename" where this "DocTest" begins, or "None" if the line number is unavailable. This line number is zero-based with respect to the beginning of the file. docstring The string that the test was extracted from, or "None" if the string is unavailable, or if the test was not extracted from a string. Example Objects --------------- class doctest.Example(source, want, exc_msg=None, lineno=0, indent=0, options=None) A single interactive example, consisting of a Python statement and its expected output. The constructor arguments are used to initialize the attributes of the same names. "Example" defines the following attributes. They are initialized by the constructor, and should not be modified directly. source A string containing the example’s source code. This source code consists of a single Python statement, and always ends with a newline; the constructor adds a newline when necessary. want The expected output from running the example’s source code (either from stdout, or a traceback in case of exception). "want" ends with a newline unless no output is expected, in which case it’s an empty string. The constructor adds a newline when necessary. exc_msg The exception message generated by the example, if the example is expected to generate an exception; or "None" if it is not expected to generate an exception. This exception message is compared against the return value of "traceback.format_exception_only()". "exc_msg" ends with a newline unless it’s "None". The constructor adds a newline if needed. lineno The line number within the string containing this example where the example begins. This line number is zero-based with respect to the beginning of the containing string. indent The example’s indentation in the containing string, i.e., the number of space characters that precede the example’s first prompt. options A dictionary mapping from option flags to "True" or "False", which is used to override default options for this example. Any option flags not contained in this dictionary are left at their default value (as specified by the "DocTestRunner"’s "optionflags"). By default, no options are set. DocTestFinder objects --------------------- class doctest.DocTestFinder(verbose=False, parser=DocTestParser(), recurse=True, exclude_empty=True) A processing class used to extract the "DocTest"s that are relevant to a given object, from its docstring and the docstrings of its contained objects. "DocTest"s can be extracted from modules, classes, functions, methods, staticmethods, classmethods, and properties. The optional argument *verbose* can be used to display the objects searched by the finder. It defaults to "False" (no output). The optional argument *parser* specifies the "DocTestParser" object (or a drop-in replacement) that is used to extract doctests from docstrings. If the optional argument *recurse* is false, then "DocTestFinder.find()" will only examine the given object, and not any contained objects. If the optional argument *exclude_empty* is false, then "DocTestFinder.find()" will include tests for objects with empty docstrings. "DocTestFinder" defines the following method: find(obj[, name][, module][, globs][, extraglobs]) Return a list of the "DocTest"s that are defined by *obj*’s docstring, or by any of its contained objects’ docstrings. The optional argument *name* specifies the object’s name; this name will be used to construct names for the returned "DocTest"s. If *name* is not specified, then "obj.__name__" is used. The optional parameter *module* is the module that contains the given object. If the module is not specified or is "None", then the test finder will attempt to automatically determine the correct module. The object’s module is used: * As a default namespace, if *globs* is not specified. * To prevent the DocTestFinder from extracting DocTests from objects that are imported from other modules. (Contained objects with modules other than *module* are ignored.) * To find the name of the file containing the object. * To help find the line number of the object within its file. If *module* is "False", no attempt to find the module will be made. This is obscure, of use mostly in testing doctest itself: if *module* is "False", or is "None" but cannot be found automatically, then all objects are considered to belong to the (non-existent) module, so all contained objects will (recursively) be searched for doctests. The globals for each "DocTest" is formed by combining *globs* and *extraglobs* (bindings in *extraglobs* override bindings in *globs*). A new shallow copy of the globals dictionary is created for each "DocTest". If *globs* is not specified, then it defaults to the module’s *__dict__*, if specified, or "{}" otherwise. If *extraglobs* is not specified, then it defaults to "{}". DocTestParser objects --------------------- class doctest.DocTestParser A processing class used to extract interactive examples from a string, and use them to create a "DocTest" object. "DocTestParser" defines the following methods: get_doctest(string, globs, name, filename, lineno) Extract all doctest examples from the given string, and collect them into a "DocTest" object. *globs*, *name*, *filename*, and *lineno* are attributes for the new "DocTest" object. See the documentation for "DocTest" for more information. get_examples(string, name='') Extract all doctest examples from the given string, and return them as a list of "Example" objects. Line numbers are 0-based. The optional argument *name* is a name identifying this string, and is only used for error messages. parse(string, name='') Divide the given string into examples and intervening text, and return them as a list of alternating "Example"s and strings. Line numbers for the "Example"s are 0-based. The optional argument *name* is a name identifying this string, and is only used for error messages. DocTestRunner objects --------------------- class doctest.DocTestRunner(checker=None, verbose=None, optionflags=0) A processing class used to execute and verify the interactive examples in a "DocTest". The comparison between expected outputs and actual outputs is done by an "OutputChecker". This comparison may be customized with a number of option flags; see section Option Flags for more information. If the option flags are insufficient, then the comparison may also be customized by passing a subclass of "OutputChecker" to the constructor. The test runner’s display output can be controlled in two ways. First, an output function can be passed to "TestRunner.run()"; this function will be called with strings that should be displayed. It defaults to "sys.stdout.write". If capturing the output is not sufficient, then the display output can be also customized by subclassing DocTestRunner, and overriding the methods "report_start()", "report_success()", "report_unexpected_exception()", and "report_failure()". The optional keyword argument *checker* specifies the "OutputChecker" object (or drop-in replacement) that should be used to compare the expected outputs to the actual outputs of doctest examples. The optional keyword argument *verbose* controls the "DocTestRunner"’s verbosity. If *verbose* is "True", then information is printed about each example, as it is run. If *verbose* is "False", then only failures are printed. If *verbose* is unspecified, or "None", then verbose output is used iff the command-line switch "-v" is used. The optional keyword argument *optionflags* can be used to control how the test runner compares expected output to actual output, and how it displays failures. For more information, see section Option Flags. "DocTestParser" defines the following methods: report_start(out, test, example) Report that the test runner is about to process the given example. This method is provided to allow subclasses of "DocTestRunner" to customize their output; it should not be called directly. *example* is the example about to be processed. *test* is the test *containing example*. *out* is the output function that was passed to "DocTestRunner.run()". report_success(out, test, example, got) Report that the given example ran successfully. This method is provided to allow subclasses of "DocTestRunner" to customize their output; it should not be called directly. *example* is the example about to be processed. *got* is the actual output from the example. *test* is the test containing *example*. *out* is the output function that was passed to "DocTestRunner.run()". report_failure(out, test, example, got) Report that the given example failed. This method is provided to allow subclasses of "DocTestRunner" to customize their output; it should not be called directly. *example* is the example about to be processed. *got* is the actual output from the example. *test* is the test containing *example*. *out* is the output function that was passed to "DocTestRunner.run()". report_unexpected_exception(out, test, example, exc_info) Report that the given example raised an unexpected exception. This method is provided to allow subclasses of "DocTestRunner" to customize their output; it should not be called directly. *example* is the example about to be processed. *exc_info* is a tuple containing information about the unexpected exception (as returned by "sys.exc_info()"). *test* is the test containing *example*. *out* is the output function that was passed to "DocTestRunner.run()". run(test, compileflags=None, out=None, clear_globs=True) Run the examples in *test* (a "DocTest" object), and display the results using the writer function *out*. The examples are run in the namespace "test.globs". If *clear_globs* is true (the default), then this namespace will be cleared after the test runs, to help with garbage collection. If you would like to examine the namespace after the test completes, then use *clear_globs=False*. *compileflags* gives the set of flags that should be used by the Python compiler when running the examples. If not specified, then it will default to the set of future-import flags that apply to *globs*. The output of each example is checked using the "DocTestRunner"’s output checker, and the results are formatted by the "DocTestRunner.report_*()" methods. summarize(verbose=None) Print a summary of all the test cases that have been run by this DocTestRunner, and return a *named tuple* "TestResults(failed, attempted)". The optional *verbose* argument controls how detailed the summary is. If the verbosity is not specified, then the "DocTestRunner"’s verbosity is used. OutputChecker objects --------------------- class doctest.OutputChecker A class used to check the whether the actual output from a doctest example matches the expected output. "OutputChecker" defines two methods: "check_output()", which compares a given pair of outputs, and returns "True" if they match; and "output_difference()", which returns a string describing the differences between two outputs. "OutputChecker" defines the following methods: check_output(want, got, optionflags) Return "True" iff the actual output from an example (*got*) matches the expected output (*want*). These strings are always considered to match if they are identical; but depending on what option flags the test runner is using, several non-exact match types are also possible. See section Option Flags for more information about option flags. output_difference(example, got, optionflags) Return a string describing the differences between the expected output for a given example (*example*) and the actual output (*got*). *optionflags* is the set of option flags used to compare *want* and *got*. Debugging ========= Doctest provides several mechanisms for debugging doctest examples: * Several functions convert doctests to executable Python programs, which can be run under the Python debugger, "pdb". * The "DebugRunner" class is a subclass of "DocTestRunner" that raises an exception for the first failing example, containing information about that example. This information can be used to perform post- mortem debugging on the example. * The "unittest" cases generated by "DocTestSuite()" support the "debug()" method defined by "unittest.TestCase". * You can add a call to "pdb.set_trace()" in a doctest example, and you’ll drop into the Python debugger when that line is executed. Then you can inspect current values of variables, and so on. For example, suppose "a.py" contains just this module docstring: """ >>> def f(x): ... g(x*2) >>> def g(x): ... print(x+3) ... import pdb; pdb.set_trace() >>> f(3) 9 """ Then an interactive Python session may look like this: >>> import a, doctest >>> doctest.testmod(a) --Return-- > (3)g()->None -> import pdb; pdb.set_trace() (Pdb) list 1 def g(x): 2 print(x+3) 3 -> import pdb; pdb.set_trace() [EOF] (Pdb) p x 6 (Pdb) step --Return-- > (2)f()->None -> g(x*2) (Pdb) list 1 def f(x): 2 -> g(x*2) [EOF] (Pdb) p x 3 (Pdb) step --Return-- > (1)?()->None -> f(3) (Pdb) cont (0, 3) >>> Functions that convert doctests to Python code, and possibly run the synthesized code under the debugger: doctest.script_from_examples(s) Convert text with examples to a script. Argument *s* is a string containing doctest examples. The string is converted to a Python script, where doctest examples in *s* are converted to regular code, and everything else is converted to Python comments. The generated script is returned as a string. For example, import doctest print(doctest.script_from_examples(r""" Set x and y to 1 and 2. >>> x, y = 1, 2 Print their sum: >>> print(x+y) 3 """)) displays: # Set x and y to 1 and 2. x, y = 1, 2 # # Print their sum: print(x+y) # Expected: ## 3 This function is used internally by other functions (see below), but can also be useful when you want to transform an interactive Python session into a Python script. doctest.testsource(module, name) Convert the doctest for an object to a script. Argument *module* is a module object, or dotted name of a module, containing the object whose doctests are of interest. Argument *name* is the name (within the module) of the object with the doctests of interest. The result is a string, containing the object’s docstring converted to a Python script, as described for "script_from_examples()" above. For example, if module "a.py" contains a top-level function "f()", then import a, doctest print(doctest.testsource(a, "a.f")) prints a script version of function "f()"’s docstring, with doctests converted to code, and the rest placed in comments. doctest.debug(module, name, pm=False) Debug the doctests for an object. The *module* and *name* arguments are the same as for function "testsource()" above. The synthesized Python script for the named object’s docstring is written to a temporary file, and then that file is run under the control of the Python debugger, "pdb". A shallow copy of "module.__dict__" is used for both local and global execution context. Optional argument *pm* controls whether post-mortem debugging is used. If *pm* has a true value, the script file is run directly, and the debugger gets involved only if the script terminates via raising an unhandled exception. If it does, then post-mortem debugging is invoked, via "pdb.post_mortem()", passing the traceback object from the unhandled exception. If *pm* is not specified, or is false, the script is run under the debugger from the start, via passing an appropriate "exec()" call to "pdb.run()". doctest.debug_src(src, pm=False, globs=None) Debug the doctests in a string. This is like function "debug()" above, except that a string containing doctest examples is specified directly, via the *src* argument. Optional argument *pm* has the same meaning as in function "debug()" above. Optional argument *globs* gives a dictionary to use as both local and global execution context. If not specified, or "None", an empty dictionary is used. If specified, a shallow copy of the dictionary is used. The "DebugRunner" class, and the special exceptions it may raise, are of most interest to testing framework authors, and will only be sketched here. See the source code, and especially "DebugRunner"’s docstring (which is a doctest!) for more details: class doctest.DebugRunner(checker=None, verbose=None, optionflags=0) A subclass of "DocTestRunner" that raises an exception as soon as a failure is encountered. If an unexpected exception occurs, an "UnexpectedException" exception is raised, containing the test, the example, and the original exception. If the output doesn’t match, then a "DocTestFailure" exception is raised, containing the test, the example, and the actual output. For information about the constructor parameters and methods, see the documentation for "DocTestRunner" in section Advanced API. There are two exceptions that may be raised by "DebugRunner" instances: exception doctest.DocTestFailure(test, example, got) An exception raised by "DocTestRunner" to signal that a doctest example’s actual output did not match its expected output. The constructor arguments are used to initialize the attributes of the same names. "DocTestFailure" defines the following attributes: DocTestFailure.test The "DocTest" object that was being run when the example failed. DocTestFailure.example The "Example" that failed. DocTestFailure.got The example’s actual output. exception doctest.UnexpectedException(test, example, exc_info) An exception raised by "DocTestRunner" to signal that a doctest example raised an unexpected exception. The constructor arguments are used to initialize the attributes of the same names. "UnexpectedException" defines the following attributes: UnexpectedException.test The "DocTest" object that was being run when the example failed. UnexpectedException.example The "Example" that failed. UnexpectedException.exc_info A tuple containing information about the unexpected exception, as returned by "sys.exc_info()". Soapbox ======= As mentioned in the introduction, "doctest" has grown to have three primary uses: 1. Checking examples in docstrings. 2. Regression testing. 3. Executable documentation / literate testing. These uses have different requirements, and it is important to distinguish them. In particular, filling your docstrings with obscure test cases makes for bad documentation. When writing a docstring, choose docstring examples with care. There’s an art to this that needs to be learned—it may not be natural at first. Examples should add genuine value to the documentation. A good example can often be worth many words. If done with care, the examples will be invaluable for your users, and will pay back the time it takes to collect them many times over as the years go by and things change. I’m still amazed at how often one of my "doctest" examples stops working after a “harmless” change. Doctest also makes an excellent tool for regression testing, especially if you don’t skimp on explanatory text. By interleaving prose and examples, it becomes much easier to keep track of what’s actually being tested, and why. When a test fails, good prose can make it much easier to figure out what the problem is, and how it should be fixed. It’s true that you could write extensive comments in code-based testing, but few programmers do. Many have found that using doctest approaches instead leads to much clearer tests. Perhaps this is simply because doctest makes writing prose a little easier than writing code, while writing comments in code is a little harder. I think it goes deeper than just that: the natural attitude when writing a doctest-based test is that you want to explain the fine points of your software, and illustrate them with examples. This in turn naturally leads to test files that start with the simplest features, and logically progress to complications and edge cases. A coherent narrative is the result, instead of a collection of isolated functions that test isolated bits of functionality seemingly at random. It’s a different attitude, and produces different results, blurring the distinction between testing and explaining. Regression testing is best confined to dedicated objects or files. There are several options for organizing tests: * Write text files containing test cases as interactive examples, and test the files using "testfile()" or "DocFileSuite()". This is recommended, although is easiest to do for new projects, designed from the start to use doctest. * Define functions named "_regrtest_topic" that consist of single docstrings, containing test cases for the named topics. These functions can be included in the same file as the module, or separated out into a separate test file. * Define a "__test__" dictionary mapping from regression test topics to docstrings containing test cases. When you have placed your tests in a module, the module can itself be the test runner. When a test fails, you can arrange for your test runner to re-run only the failing doctest while you debug the problem. Here is a minimal example of such a test runner: if __name__ == '__main__': import doctest flags = doctest.REPORT_NDIFF|doctest.FAIL_FAST if len(sys.argv) > 1: name = sys.argv[1] if name in globals(): obj = globals()[name] else: obj = __test__[name] doctest.run_docstring_examples(obj, globals(), name=name, optionflags=flags) else: fail, total = doctest.testmod(optionflags=flags) print("{} failures out of {} tests".format(fail, total)) -[ Footnotes ]- [1] Examples containing both expected output and an exception are not supported. Trying to guess where one ends and the other begins is too error-prone, and that also makes for a confusing test. "email.charset": Representing character sets ******************************************** **Source code:** Lib/email/charset.py ====================================================================== This module is part of the legacy ("Compat32") email API. In the new API only the aliases table is used. The remaining text in this section is the original documentation of the module. This module provides a class "Charset" for representing character sets and character set conversions in email messages, as well as a character set registry and several convenience methods for manipulating this registry. Instances of "Charset" are used in several other modules within the "email" package. Import this class from the "email.charset" module. class email.charset.Charset(input_charset=DEFAULT_CHARSET) Map character sets to their email properties. This class provides information about the requirements imposed on email for a specific character set. It also provides convenience routines for converting between character sets, given the availability of the applicable codecs. Given a character set, it will do its best to provide information on how to use that character set in an email message in an RFC-compliant way. Certain character sets must be encoded with quoted-printable or base64 when used in email headers or bodies. Certain character sets must be converted outright, and are not allowed in email. Optional *input_charset* is as described below; it is always coerced to lower case. After being alias normalized it is also used as a lookup into the registry of character sets to find out the header encoding, body encoding, and output conversion codec to be used for the character set. For example, if *input_charset* is "iso-8859-1", then headers and bodies will be encoded using quoted- printable and no output conversion codec is necessary. If *input_charset* is "euc-jp", then headers will be encoded with base64, bodies will not be encoded, but output text will be converted from the "euc-jp" character set to the "iso-2022-jp" character set. "Charset" instances have the following data attributes: input_charset The initial character set specified. Common aliases are converted to their *official* email names (e.g. "latin_1" is converted to "iso-8859-1"). Defaults to 7-bit "us-ascii". header_encoding If the character set must be encoded before it can be used in an email header, this attribute will be set to "charset.QP" (for quoted-printable), "charset.BASE64" (for base64 encoding), or "charset.SHORTEST" for the shortest of QP or BASE64 encoding. Otherwise, it will be "None". body_encoding Same as *header_encoding*, but describes the encoding for the mail message’s body, which indeed may be different than the header encoding. "charset.SHORTEST" is not allowed for *body_encoding*. output_charset Some character sets must be converted before they can be used in email headers or bodies. If the *input_charset* is one of them, this attribute will contain the name of the character set output will be converted to. Otherwise, it will be "None". input_codec The name of the Python codec used to convert the *input_charset* to Unicode. If no conversion codec is necessary, this attribute will be "None". output_codec The name of the Python codec used to convert Unicode to the *output_charset*. If no conversion codec is necessary, this attribute will have the same value as the *input_codec*. "Charset" instances also have the following methods: get_body_encoding() Return the content transfer encoding used for body encoding. This is either the string "quoted-printable" or "base64" depending on the encoding used, or it is a function, in which case you should call the function with a single argument, the Message object being encoded. The function should then set the *Content-Transfer-Encoding* header itself to whatever is appropriate. Returns the string "quoted-printable" if *body_encoding* is "QP", returns the string "base64" if *body_encoding* is "BASE64", and returns the string "7bit" otherwise. get_output_charset() Return the output character set. This is the *output_charset* attribute if that is not "None", otherwise it is *input_charset*. header_encode(string) Header-encode the string *string*. The type of encoding (base64 or quoted-printable) will be based on the *header_encoding* attribute. header_encode_lines(string, maxlengths) Header-encode a *string* by converting it first to bytes. This is similar to "header_encode()" except that the string is fit into maximum line lengths as given by the argument *maxlengths*, which must be an iterator: each element returned from this iterator will provide the next maximum line length. body_encode(string) Body-encode the string *string*. The type of encoding (base64 or quoted-printable) will be based on the *body_encoding* attribute. The "Charset" class also provides a number of methods to support standard operations and built-in functions. __str__() Returns *input_charset* as a string coerced to lower case. "__repr__()" is an alias for "__str__()". __eq__(other) This method allows you to compare two "Charset" instances for equality. __ne__(other) This method allows you to compare two "Charset" instances for inequality. The "email.charset" module also provides the following functions for adding new entries to the global character set, alias, and codec registries: email.charset.add_charset(charset, header_enc=None, body_enc=None, output_charset=None) Add character properties to the global registry. *charset* is the input character set, and must be the canonical name of a character set. Optional *header_enc* and *body_enc* is either "charset.QP" for quoted-printable, "charset.BASE64" for base64 encoding, "charset.SHORTEST" for the shortest of quoted-printable or base64 encoding, or "None" for no encoding. "SHORTEST" is only valid for *header_enc*. The default is "None" for no encoding. Optional *output_charset* is the character set that the output should be in. Conversions will proceed from input charset, to Unicode, to the output charset when the method "Charset.convert()" is called. The default is to output in the same character set as the input. Both *input_charset* and *output_charset* must have Unicode codec entries in the module’s character set-to-codec mapping; use "add_codec()" to add codecs the module does not know about. See the "codecs" module’s documentation for more information. The global character set registry is kept in the module global dictionary "CHARSETS". email.charset.add_alias(alias, canonical) Add a character set alias. *alias* is the alias name, e.g. "latin-1". *canonical* is the character set’s canonical name, e.g. "iso-8859-1". The global charset alias registry is kept in the module global dictionary "ALIASES". email.charset.add_codec(charset, codecname) Add a codec that map characters in the given character set to and from Unicode. *charset* is the canonical name of a character set. *codecname* is the name of a Python codec, as appropriate for the second argument to the "str"’s "encode()" method. "email.message.Message": Representing an email message using the "compat32" API ******************************************************************************* The "Message" class is very similar to the "EmailMessage" class, without the methods added by that class, and with the default behavior of certain other methods being slightly different. We also document here some methods that, while supported by the "EmailMessage" class, are not recommended unless you are dealing with legacy code. The philosophy and structure of the two classes is otherwise the same. This document describes the behavior under the default (for "Message") policy "Compat32". If you are going to use another policy, you should be using the "EmailMessage" class instead. An email message consists of *headers* and a *payload*. Headers must be **RFC 5322** style names and values, where the field name and value are separated by a colon. The colon is not part of either the field name or the field value. The payload may be a simple text message, or a binary object, or a structured sequence of sub-messages each with their own set of headers and their own payload. The latter type of payload is indicated by the message having a MIME type such as *multipart/** or *message/rfc822*. The conceptual model provided by a "Message" object is that of an ordered dictionary of headers with additional methods for accessing both specialized information from the headers, for accessing the payload, for generating a serialized version of the message, and for recursively walking over the object tree. Note that duplicate headers are supported but special methods must be used to access them. The "Message" pseudo-dictionary is indexed by the header names, which must be ASCII values. The values of the dictionary are strings that are supposed to contain only ASCII characters; there is some special handling for non-ASCII input, but it doesn’t always produce the correct results. Headers are stored and returned in case-preserving form, but field names are matched case-insensitively. There may also be a single envelope header, also known as the *Unix-From* header or the "From_" header. The *payload* is either a string or bytes, in the case of simple message objects, or a list of "Message" objects, for MIME container documents (e.g. *multipart/** and *message/rfc822*). Here are the methods of the "Message" class: class email.message.Message(policy=compat32) If *policy* is specified (it must be an instance of a "policy" class) use the rules it specifies to update and serialize the representation of the message. If *policy* is not set, use the "compat32" policy, which maintains backward compatibility with the Python 3.2 version of the email package. For more information see the "policy" documentation. Changed in version 3.3: The *policy* keyword argument was added. as_string(unixfrom=False, maxheaderlen=0, policy=None) Return the entire message flattened as a string. When optional *unixfrom* is true, the envelope header is included in the returned string. *unixfrom* defaults to "False". For backward compatibility reasons, *maxheaderlen* defaults to "0", so if you want a different value you must override it explicitly (the value specified for *max_line_length* in the policy will be ignored by this method). The *policy* argument may be used to override the default policy obtained from the message instance. This can be used to control some of the formatting produced by the method, since the specified *policy* will be passed to the "Generator". Flattening the message may trigger changes to the "Message" if defaults need to be filled in to complete the transformation to a string (for example, MIME boundaries may be generated or modified). Note that this method is provided as a convenience and may not always format the message the way you want. For example, by default it does not do the mangling of lines that begin with "From" that is required by the unix mbox format. For more flexibility, instantiate a "Generator" instance and use its "flatten()" method directly. For example: from io import StringIO from email.generator import Generator fp = StringIO() g = Generator(fp, mangle_from_=True, maxheaderlen=60) g.flatten(msg) text = fp.getvalue() If the message object contains binary data that is not encoded according to RFC standards, the non-compliant data will be replaced by unicode “unknown character” code points. (See also "as_bytes()" and "BytesGenerator".) Changed in version 3.4: the *policy* keyword argument was added. __str__() Equivalent to "as_string()". Allows "str(msg)" to produce a string containing the formatted message. as_bytes(unixfrom=False, policy=None) Return the entire message flattened as a bytes object. When optional *unixfrom* is true, the envelope header is included in the returned string. *unixfrom* defaults to "False". The *policy* argument may be used to override the default policy obtained from the message instance. This can be used to control some of the formatting produced by the method, since the specified *policy* will be passed to the "BytesGenerator". Flattening the message may trigger changes to the "Message" if defaults need to be filled in to complete the transformation to a string (for example, MIME boundaries may be generated or modified). Note that this method is provided as a convenience and may not always format the message the way you want. For example, by default it does not do the mangling of lines that begin with "From" that is required by the unix mbox format. For more flexibility, instantiate a "BytesGenerator" instance and use its "flatten()" method directly. For example: from io import BytesIO from email.generator import BytesGenerator fp = BytesIO() g = BytesGenerator(fp, mangle_from_=True, maxheaderlen=60) g.flatten(msg) text = fp.getvalue() New in version 3.4. __bytes__() Equivalent to "as_bytes()". Allows "bytes(msg)" to produce a bytes object containing the formatted message. New in version 3.4. is_multipart() Return "True" if the message’s payload is a list of sub-"Message" objects, otherwise return "False". When "is_multipart()" returns "False", the payload should be a string object (which might be a CTE encoded binary payload). (Note that "is_multipart()" returning "True" does not necessarily mean that “msg.get_content_maintype() == ‘multipart’” will return the "True". For example, "is_multipart" will return "True" when the "Message" is of type "message/rfc822".) set_unixfrom(unixfrom) Set the message’s envelope header to *unixfrom*, which should be a string. get_unixfrom() Return the message’s envelope header. Defaults to "None" if the envelope header was never set. attach(payload) Add the given *payload* to the current payload, which must be "None" or a list of "Message" objects before the call. After the call, the payload will always be a list of "Message" objects. If you want to set the payload to a scalar object (e.g. a string), use "set_payload()" instead. This is a legacy method. On the "EmailMessage" class its functionality is replaced by "set_content()" and the related "make" and "add" methods. get_payload(i=None, decode=False) Return the current payload, which will be a list of "Message" objects when "is_multipart()" is "True", or a string when "is_multipart()" is "False". If the payload is a list and you mutate the list object, you modify the message’s payload in place. With optional argument *i*, "get_payload()" will return the *i*-th element of the payload, counting from zero, if "is_multipart()" is "True". An "IndexError" will be raised if *i* is less than 0 or greater than or equal to the number of items in the payload. If the payload is a string (i.e. "is_multipart()" is "False") and *i* is given, a "TypeError" is raised. Optional *decode* is a flag indicating whether the payload should be decoded or not, according to the *Content-Transfer- Encoding* header. When "True" and the message is not a multipart, the payload will be decoded if this header’s value is "quoted-printable" or "base64". If some other encoding is used, or *Content-Transfer-Encoding* header is missing, the payload is returned as-is (undecoded). In all cases the returned value is binary data. If the message is a multipart and the *decode* flag is "True", then "None" is returned. If the payload is base64 and it was not perfectly formed (missing padding, characters outside the base64 alphabet), then an appropriate defect will be added to the message’s defect property ("InvalidBase64PaddingDefect" or "InvalidBase64CharactersDefect", respectively). When *decode* is "False" (the default) the body is returned as a string without decoding the *Content-Transfer-Encoding*. However, for a *Content-Transfer-Encoding* of 8bit, an attempt is made to decode the original bytes using the "charset" specified by the *Content-Type* header, using the "replace" error handler. If no "charset" is specified, or if the "charset" given is not recognized by the email package, the body is decoded using the default ASCII charset. This is a legacy method. On the "EmailMessage" class its functionality is replaced by "get_content()" and "iter_parts()". set_payload(payload, charset=None) Set the entire message object’s payload to *payload*. It is the client’s responsibility to ensure the payload invariants. Optional *charset* sets the message’s default character set; see "set_charset()" for details. This is a legacy method. On the "EmailMessage" class its functionality is replaced by "set_content()". set_charset(charset) Set the character set of the payload to *charset*, which can either be a "Charset" instance (see "email.charset"), a string naming a character set, or "None". If it is a string, it will be converted to a "Charset" instance. If *charset* is "None", the "charset" parameter will be removed from the *Content-Type* header (the message will not be otherwise modified). Anything else will generate a "TypeError". If there is no existing *MIME-Version* header one will be added. If there is no existing *Content-Type* header, one will be added with a value of *text/plain*. Whether the *Content-Type* header already exists or not, its "charset" parameter will be set to *charset.output_charset*. If *charset.input_charset* and *charset.output_charset* differ, the payload will be re-encoded to the *output_charset*. If there is no existing *Content- Transfer-Encoding* header, then the payload will be transfer- encoded, if needed, using the specified "Charset", and a header with the appropriate value will be added. If a *Content- Transfer-Encoding* header already exists, the payload is assumed to already be correctly encoded using that *Content-Transfer- Encoding* and is not modified. This is a legacy method. On the "EmailMessage" class its functionality is replaced by the *charset* parameter of the "email.emailmessage.EmailMessage.set_content()" method. get_charset() Return the "Charset" instance associated with the message’s payload. This is a legacy method. On the "EmailMessage" class it always returns "None". The following methods implement a mapping-like interface for accessing the message’s **RFC 2822** headers. Note that there are some semantic differences between these methods and a normal mapping (i.e. dictionary) interface. For example, in a dictionary there are no duplicate keys, but here there may be duplicate message headers. Also, in dictionaries there is no guaranteed order to the keys returned by "keys()", but in a "Message" object, headers are always returned in the order they appeared in the original message, or were added to the message later. Any header deleted and then re-added are always appended to the end of the header list. These semantic differences are intentional and are biased toward maximal convenience. Note that in all cases, any envelope header present in the message is not included in the mapping interface. In a model generated from bytes, any header values that (in contravention of the RFCs) contain non-ASCII bytes will, when retrieved through this interface, be represented as "Header" objects with a charset of *unknown-8bit*. __len__() Return the total number of headers, including duplicates. __contains__(name) Return "True" if the message object has a field named *name*. Matching is done case-insensitively and *name* should not include the trailing colon. Used for the "in" operator, e.g.: if 'message-id' in myMessage: print('Message-ID:', myMessage['message-id']) __getitem__(name) Return the value of the named header field. *name* should not include the colon field separator. If the header is missing, "None" is returned; a "KeyError" is never raised. Note that if the named field appears more than once in the message’s headers, exactly which of those field values will be returned is undefined. Use the "get_all()" method to get the values of all the extant named headers. __setitem__(name, val) Add a header to the message with field name *name* and value *val*. The field is appended to the end of the message’s existing fields. Note that this does *not* overwrite or delete any existing header with the same name. If you want to ensure that the new header is the only one present in the message with field name *name*, delete the field first, e.g.: del msg['subject'] msg['subject'] = 'Python roolz!' __delitem__(name) Delete all occurrences of the field with name *name* from the message’s headers. No exception is raised if the named field isn’t present in the headers. keys() Return a list of all the message’s header field names. values() Return a list of all the message’s field values. items() Return a list of 2-tuples containing all the message’s field headers and values. get(name, failobj=None) Return the value of the named header field. This is identical to "__getitem__()" except that optional *failobj* is returned if the named header is missing (defaults to "None"). Here are some additional useful methods: get_all(name, failobj=None) Return a list of all the values for the field named *name*. If there are no such named headers in the message, *failobj* is returned (defaults to "None"). add_header(_name, _value, **_params) Extended header setting. This method is similar to "__setitem__()" except that additional header parameters can be provided as keyword arguments. *_name* is the header field to add and *_value* is the *primary* value for the header. For each item in the keyword argument dictionary *_params*, the key is taken as the parameter name, with underscores converted to dashes (since dashes are illegal in Python identifiers). Normally, the parameter will be added as "key="value"" unless the value is "None", in which case only the key will be added. If the value contains non-ASCII characters, it can be specified as a three tuple in the format "(CHARSET, LANGUAGE, VALUE)", where "CHARSET" is a string naming the charset to be used to encode the value, "LANGUAGE" can usually be set to "None" or the empty string (see **RFC 2231** for other possibilities), and "VALUE" is the string value containing non-ASCII code points. If a three tuple is not passed and the value contains non-ASCII characters, it is automatically encoded in **RFC 2231** format using a "CHARSET" of "utf-8" and a "LANGUAGE" of "None". Here’s an example: msg.add_header('Content-Disposition', 'attachment', filename='bud.gif') This will add a header that looks like Content-Disposition: attachment; filename="bud.gif" An example with non-ASCII characters: msg.add_header('Content-Disposition', 'attachment', filename=('iso-8859-1', '', 'Fußballer.ppt')) Which produces Content-Disposition: attachment; filename*="iso-8859-1''Fu%DFballer.ppt" replace_header(_name, _value) Replace a header. Replace the first header found in the message that matches *_name*, retaining header order and field name case. If no matching header was found, a "KeyError" is raised. get_content_type() Return the message’s content type. The returned string is coerced to lower case of the form *maintype/subtype*. If there was no *Content-Type* header in the message the default type as given by "get_default_type()" will be returned. Since according to **RFC 2045**, messages always have a default type, "get_content_type()" will always return a value. **RFC 2045** defines a message’s default type to be *text/plain* unless it appears inside a *multipart/digest* container, in which case it would be *message/rfc822*. If the *Content-Type* header has an invalid type specification, **RFC 2045** mandates that the default type be *text/plain*. get_content_maintype() Return the message’s main content type. This is the *maintype* part of the string returned by "get_content_type()". get_content_subtype() Return the message’s sub-content type. This is the *subtype* part of the string returned by "get_content_type()". get_default_type() Return the default content type. Most messages have a default content type of *text/plain*, except for messages that are subparts of *multipart/digest* containers. Such subparts have a default content type of *message/rfc822*. set_default_type(ctype) Set the default content type. *ctype* should either be *text/plain* or *message/rfc822*, although this is not enforced. The default content type is not stored in the *Content-Type* header. get_params(failobj=None, header='content-type', unquote=True) Return the message’s *Content-Type* parameters, as a list. The elements of the returned list are 2-tuples of key/value pairs, as split on the "'='" sign. The left hand side of the "'='" is the key, while the right hand side is the value. If there is no "'='" sign in the parameter the value is the empty string, otherwise the value is as described in "get_param()" and is unquoted if optional *unquote* is "True" (the default). Optional *failobj* is the object to return if there is no *Content-Type* header. Optional *header* is the header to search instead of *Content-Type*. This is a legacy method. On the "EmailMessage" class its functionality is replaced by the *params* property of the individual header objects returned by the header access methods. get_param(param, failobj=None, header='content-type', unquote=True) Return the value of the *Content-Type* header’s parameter *param* as a string. If the message has no *Content-Type* header or if there is no such parameter, then *failobj* is returned (defaults to "None"). Optional *header* if given, specifies the message header to use instead of *Content-Type*. Parameter keys are always compared case insensitively. The return value can either be a string, or a 3-tuple if the parameter was **RFC 2231** encoded. When it’s a 3-tuple, the elements of the value are of the form "(CHARSET, LANGUAGE, VALUE)". Note that both "CHARSET" and "LANGUAGE" can be "None", in which case you should consider "VALUE" to be encoded in the "us-ascii" charset. You can usually ignore "LANGUAGE". If your application doesn’t care whether the parameter was encoded as in **RFC 2231**, you can collapse the parameter value by calling "email.utils.collapse_rfc2231_value()", passing in the return value from "get_param()". This will return a suitably decoded Unicode string when the value is a tuple, or the original string unquoted if it isn’t. For example: rawparam = msg.get_param('foo') param = email.utils.collapse_rfc2231_value(rawparam) In any case, the parameter value (either the returned string, or the "VALUE" item in the 3-tuple) is always unquoted, unless *unquote* is set to "False". This is a legacy method. On the "EmailMessage" class its functionality is replaced by the *params* property of the individual header objects returned by the header access methods. set_param(param, value, header='Content-Type', requote=True, charset=None, language='', replace=False) Set a parameter in the *Content-Type* header. If the parameter already exists in the header, its value will be replaced with *value*. If the *Content-Type* header as not yet been defined for this message, it will be set to *text/plain* and the new parameter value will be appended as per **RFC 2045**. Optional *header* specifies an alternative header to *Content- Type*, and all parameters will be quoted as necessary unless optional *requote* is "False" (the default is "True"). If optional *charset* is specified, the parameter will be encoded according to **RFC 2231**. Optional *language* specifies the RFC 2231 language, defaulting to the empty string. Both *charset* and *language* should be strings. If *replace* is "False" (the default) the header is moved to the end of the list of headers. If *replace* is "True", the header will be updated in place. Changed in version 3.4: "replace" keyword was added. del_param(param, header='content-type', requote=True) Remove the given parameter completely from the *Content-Type* header. The header will be re-written in place without the parameter or its value. All values will be quoted as necessary unless *requote* is "False" (the default is "True"). Optional *header* specifies an alternative to *Content-Type*. set_type(type, header='Content-Type', requote=True) Set the main type and subtype for the *Content-Type* header. *type* must be a string in the form *maintype/subtype*, otherwise a "ValueError" is raised. This method replaces the *Content-Type* header, keeping all the parameters in place. If *requote* is "False", this leaves the existing header’s quoting as is, otherwise the parameters will be quoted (the default). An alternative header can be specified in the *header* argument. When the *Content-Type* header is set a *MIME-Version* header is also added. This is a legacy method. On the "EmailMessage" class its functionality is replaced by the "make_" and "add_" methods. get_filename(failobj=None) Return the value of the "filename" parameter of the *Content- Disposition* header of the message. If the header does not have a "filename" parameter, this method falls back to looking for the "name" parameter on the *Content-Type* header. If neither is found, or the header is missing, then *failobj* is returned. The returned string will always be unquoted as per "email.utils.unquote()". get_boundary(failobj=None) Return the value of the "boundary" parameter of the *Content- Type* header of the message, or *failobj* if either the header is missing, or has no "boundary" parameter. The returned string will always be unquoted as per "email.utils.unquote()". set_boundary(boundary) Set the "boundary" parameter of the *Content-Type* header to *boundary*. "set_boundary()" will always quote *boundary* if necessary. A "HeaderParseError" is raised if the message object has no *Content-Type* header. Note that using this method is subtly different than deleting the old *Content-Type* header and adding a new one with the new boundary via "add_header()", because "set_boundary()" preserves the order of the *Content-Type* header in the list of headers. However, it does *not* preserve any continuation lines which may have been present in the original *Content-Type* header. get_content_charset(failobj=None) Return the "charset" parameter of the *Content-Type* header, coerced to lower case. If there is no *Content-Type* header, or if that header has no "charset" parameter, *failobj* is returned. Note that this method differs from "get_charset()" which returns the "Charset" instance for the default encoding of the message body. get_charsets(failobj=None) Return a list containing the character set names in the message. If the message is a *multipart*, then the list will contain one element for each subpart in the payload, otherwise, it will be a list of length 1. Each item in the list will be a string which is the value of the "charset" parameter in the *Content-Type* header for the represented subpart. However, if the subpart has no *Content- Type* header, no "charset" parameter, or is not of the *text* main MIME type, then that item in the returned list will be *failobj*. get_content_disposition() Return the lowercased value (without parameters) of the message’s *Content-Disposition* header if it has one, or "None". The possible values for this method are *inline*, *attachment* or "None" if the message follows **RFC 2183**. New in version 3.5. walk() The "walk()" method is an all-purpose generator which can be used to iterate over all the parts and subparts of a message object tree, in depth-first traversal order. You will typically use "walk()" as the iterator in a "for" loop; each iteration returns the next subpart. Here’s an example that prints the MIME type of every part of a multipart message structure: >>> for part in msg.walk(): ... print(part.get_content_type()) multipart/report text/plain message/delivery-status text/plain text/plain message/rfc822 text/plain "walk" iterates over the subparts of any part where "is_multipart()" returns "True", even though "msg.get_content_maintype() == 'multipart'" may return "False". We can see this in our example by making use of the "_structure" debug helper function: >>> for part in msg.walk(): ... print(part.get_content_maintype() == 'multipart', ... part.is_multipart()) True True False False False True False False False False False True False False >>> _structure(msg) multipart/report text/plain message/delivery-status text/plain text/plain message/rfc822 text/plain Here the "message" parts are not "multiparts", but they do contain subparts. "is_multipart()" returns "True" and "walk" descends into the subparts. "Message" objects can also optionally contain two instance attributes, which can be used when generating the plain text of a MIME message. preamble The format of a MIME document allows for some text between the blank line following the headers, and the first multipart boundary string. Normally, this text is never visible in a MIME- aware mail reader because it falls outside the standard MIME armor. However, when viewing the raw text of the message, or when viewing the message in a non-MIME aware reader, this text can become visible. The *preamble* attribute contains this leading extra-armor text for MIME documents. When the "Parser" discovers some text after the headers but before the first boundary string, it assigns this text to the message’s *preamble* attribute. When the "Generator" is writing out the plain text representation of a MIME message, and it finds the message has a *preamble* attribute, it will write this text in the area between the headers and the first boundary. See "email.parser" and "email.generator" for details. Note that if the message object has no preamble, the *preamble* attribute will be "None". epilogue The *epilogue* attribute acts the same way as the *preamble* attribute, except that it contains text that appears between the last boundary and the end of the message. You do not need to set the epilogue to the empty string in order for the "Generator" to print a newline at the end of the file. defects The *defects* attribute contains a list of all the problems found when parsing this message. See "email.errors" for a detailed description of the possible parsing defects. "email.contentmanager": Managing MIME Content ********************************************* **Source code:** Lib/email/contentmanager.py ====================================================================== New in version 3.6: [1] class email.contentmanager.ContentManager Base class for content managers. Provides the standard registry mechanisms to register converters between MIME content and other representations, as well as the "get_content" and "set_content" dispatch methods. get_content(msg, *args, **kw) Look up a handler function based on the "mimetype" of *msg* (see next paragraph), call it, passing through all arguments, and return the result of the call. The expectation is that the handler will extract the payload from *msg* and return an object that encodes information about the extracted data. To find the handler, look for the following keys in the registry, stopping with the first one found: * the string representing the full MIME type ("maintype/subtype") * the string representing the "maintype" * the empty string If none of these keys produce a handler, raise a "KeyError" for the full MIME type. set_content(msg, obj, *args, **kw) If the "maintype" is "multipart", raise a "TypeError"; otherwise look up a handler function based on the type of *obj* (see next paragraph), call "clear_content()" on the *msg*, and call the handler function, passing through all arguments. The expectation is that the handler will transform and store *obj* into *msg*, possibly making other changes to *msg* as well, such as adding various MIME headers to encode information needed to interpret the stored data. To find the handler, obtain the type of *obj* ("typ = type(obj)"), and look for the following keys in the registry, stopping with the first one found: * the type itself ("typ") * the type’s fully qualified name ("typ.__module__ + '.' + typ.__qualname__"). * the type’s qualname ("typ.__qualname__") * the type’s name ("typ.__name__"). If none of the above match, repeat all of the checks above for each of the types in the *MRO* ("typ.__mro__"). Finally, if no other key yields a handler, check for a handler for the key "None". If there is no handler for "None", raise a "KeyError" for the fully qualified name of the type. Also add a *MIME-Version* header if one is not present (see also "MIMEPart"). add_get_handler(key, handler) Record the function *handler* as the handler for *key*. For the possible values of *key*, see "get_content()". add_set_handler(typekey, handler) Record *handler* as the function to call when an object of a type matching *typekey* is passed to "set_content()". For the possible values of *typekey*, see "set_content()". Content Manager Instances ========================= Currently the email package provides only one concrete content manager, "raw_data_manager", although more may be added in the future. "raw_data_manager" is the "content_manager" provided by "EmailPolicy" and its derivatives. email.contentmanager.raw_data_manager This content manager provides only a minimum interface beyond that provided by "Message" itself: it deals only with text, raw byte strings, and "Message" objects. Nevertheless, it provides significant advantages compared to the base API: "get_content" on a text part will return a unicode string without the application needing to manually decode it, "set_content" provides a rich set of options for controlling the headers added to a part and controlling the content transfer encoding, and it enables the use of the various "add_" methods, thereby simplifying the creation of multipart messages. email.contentmanager.get_content(msg, errors='replace') Return the payload of the part as either a string (for "text" parts), an "EmailMessage" object (for "message/rfc822" parts), or a "bytes" object (for all other non-multipart types). Raise a "KeyError" if called on a "multipart". If the part is a "text" part and *errors* is specified, use it as the error handler when decoding the payload to unicode. The default error handler is "replace". email.contentmanager.set_content(msg, <'str'>, subtype="plain", charset='utf-8', cte=None, disposition=None, filename=None, cid=None, params=None, headers=None) email.contentmanager.set_content(msg, <'bytes'>, maintype, subtype, cte="base64", disposition=None, filename=None, cid=None, params=None, headers=None) email.contentmanager.set_content(msg, <'EmailMessage'>, cte=None, disposition=None, filename=None, cid=None, params=None, headers=None) Add headers and payload to *msg*: Add a *Content-Type* header with a "maintype/subtype" value. * For "str", set the MIME "maintype" to "text", and set the subtype to *subtype* if it is specified, or "plain" if it is not. * For "bytes", use the specified *maintype* and *subtype*, or raise a "TypeError" if they are not specified. * For "EmailMessage" objects, set the maintype to "message", and set the subtype to *subtype* if it is specified or "rfc822" if it is not. If *subtype* is "partial", raise an error ("bytes" objects must be used to construct "message/partial" parts). If *charset* is provided (which is valid only for "str"), encode the string to bytes using the specified character set. The default is "utf-8". If the specified *charset* is a known alias for a standard MIME charset name, use the standard charset instead. If *cte* is set, encode the payload using the specified content transfer encoding, and set the *Content-Transfer-Encoding* header to that value. Possible values for *cte* are "quoted- printable", "base64", "7bit", "8bit", and "binary". If the input cannot be encoded in the specified encoding (for example, specifying a *cte* of "7bit" for an input that contains non- ASCII values), raise a "ValueError". * For "str" objects, if *cte* is not set use heuristics to determine the most compact encoding. * For "EmailMessage", per **RFC 2046**, raise an error if a *cte* of "quoted-printable" or "base64" is requested for *subtype* "rfc822", and for any *cte* other than "7bit" for *subtype* "external-body". For "message/rfc822", use "8bit" if *cte* is not specified. For all other values of *subtype*, use "7bit". Note: A *cte* of "binary" does not actually work correctly yet. The "EmailMessage" object as modified by "set_content" is correct, but "BytesGenerator" does not serialize it correctly. If *disposition* is set, use it as the value of the *Content- Disposition* header. If not specified, and *filename* is specified, add the header with the value "attachment". If *disposition* is not specified and *filename* is also not specified, do not add the header. The only valid values for *disposition* are "attachment" and "inline". If *filename* is specified, use it as the value of the "filename" parameter of the *Content-Disposition* header. If *cid* is specified, add a *Content-ID* header with *cid* as its value. If *params* is specified, iterate its "items" method and use the resulting "(key, value)" pairs to set additional parameters on the *Content-Type* header. If *headers* is specified and is a list of strings of the form "headername: headervalue" or a list of "header" objects (distinguished from strings by having a "name" attribute), add the headers to *msg*. -[ Footnotes ]- [1] Originally added in 3.4 as a *provisional module* "email.encoders": Encoders ************************** **Source code:** Lib/email/encoders.py ====================================================================== This module is part of the legacy ("Compat32") email API. In the new API the functionality is provided by the *cte* parameter of the "set_content()" method. This module is deprecated in Python 3. The functions provided here should not be called explicitly since the "MIMEText" class sets the content type and CTE header using the *_subtype* and *_charset* values passed during the instantiation of that class. The remaining text in this section is the original documentation of the module. When creating "Message" objects from scratch, you often need to encode the payloads for transport through compliant mail servers. This is especially true for *image/** and *text/** type messages containing binary data. The "email" package provides some convenient encoders in its "encoders" module. These encoders are actually used by the "MIMEAudio" and "MIMEImage" class constructors to provide default encodings. All encoder functions take exactly one argument, the message object to encode. They usually extract the payload, encode it, and reset the payload to this newly encoded value. They should also set the *Content-Transfer-Encoding* header as appropriate. Note that these functions are not meaningful for a multipart message. They must be applied to individual subparts instead, and will raise a "TypeError" if passed a message whose type is multipart. Here are the encoding functions provided: email.encoders.encode_quopri(msg) Encodes the payload into quoted-printable form and sets the *Content-Transfer-Encoding* header to "quoted-printable" [1]. This is a good encoding to use when most of your payload is normal printable data, but contains a few unprintable characters. email.encoders.encode_base64(msg) Encodes the payload into base64 form and sets the *Content- Transfer-Encoding* header to "base64". This is a good encoding to use when most of your payload is unprintable data since it is a more compact form than quoted-printable. The drawback of base64 encoding is that it renders the text non-human readable. email.encoders.encode_7or8bit(msg) This doesn’t actually modify the message’s payload, but it does set the *Content-Transfer-Encoding* header to either "7bit" or "8bit" as appropriate, based on the payload data. email.encoders.encode_noop(msg) This does nothing; it doesn’t even set the *Content-Transfer- Encoding* header. -[ Footnotes ]- [1] Note that encoding with "encode_quopri()" also encodes all tabs and space characters in the data. "email.errors": Exception and Defect classes ******************************************** **Source code:** Lib/email/errors.py ====================================================================== The following exception classes are defined in the "email.errors" module: exception email.errors.MessageError This is the base class for all exceptions that the "email" package can raise. It is derived from the standard "Exception" class and defines no additional methods. exception email.errors.MessageParseError This is the base class for exceptions raised by the "Parser" class. It is derived from "MessageError". This class is also used internally by the parser used by "headerregistry". exception email.errors.HeaderParseError Raised under some error conditions when parsing the **RFC 5322** headers of a message, this class is derived from "MessageParseError". The "set_boundary()" method will raise this error if the content type is unknown when the method is called. "Header" may raise this error for certain base64 decoding errors, and when an attempt is made to create a header that appears to contain an embedded header (that is, there is what is supposed to be a continuation line that has no leading whitespace and looks like a header). exception email.errors.BoundaryError Deprecated and no longer used. exception email.errors.MultipartConversionError Raised when a payload is added to a "Message" object using "add_payload()", but the payload is already a scalar and the message’s *Content-Type* main type is not either *multipart* or missing. "MultipartConversionError" multiply inherits from "MessageError" and the built-in "TypeError". Since "Message.add_payload()" is deprecated, this exception is rarely raised in practice. However the exception may also be raised if the "attach()" method is called on an instance of a class derived from "MIMENonMultipart" (e.g. "MIMEImage"). exception email.errors.HeaderWriteError Raised when an error occurs when the "generator" outputs headers. Here is the list of the defects that the "FeedParser" can find while parsing messages. Note that the defects are added to the message where the problem was found, so for example, if a message nested inside a *multipart/alternative* had a malformed header, that nested message object would have a defect, but the containing messages would not. All defect classes are subclassed from "email.errors.MessageDefect". * "NoBoundaryInMultipartDefect" – A message claimed to be a multipart, but had no *boundary* parameter. * "StartBoundaryNotFoundDefect" – The start boundary claimed in the *Content-Type* header was never found. * "CloseBoundaryNotFoundDefect" – A start boundary was found, but no corresponding close boundary was ever found. New in version 3.3. * "FirstHeaderLineIsContinuationDefect" – The message had a continuation line as its first header line. * "MisplacedEnvelopeHeaderDefect" - A “Unix From” header was found in the middle of a header block. * "MissingHeaderBodySeparatorDefect" - A line was found while parsing headers that had no leading white space but contained no ‘:’. Parsing continues assuming that the line represents the first line of the body. New in version 3.3. * "MalformedHeaderDefect" – A header was found that was missing a colon, or was otherwise malformed. Deprecated since version 3.3: This defect has not been used for several Python versions. * "MultipartInvariantViolationDefect" – A message claimed to be a *multipart*, but no subparts were found. Note that when a message has this defect, its "is_multipart()" method may return "False" even though its content type claims to be *multipart*. * "InvalidBase64PaddingDefect" – When decoding a block of base64 encoded bytes, the padding was not correct. Enough padding is added to perform the decode, but the resulting decoded bytes may be invalid. * "InvalidBase64CharactersDefect" – When decoding a block of base64 encoded bytes, characters outside the base64 alphabet were encountered. The characters are ignored, but the resulting decoded bytes may be invalid. * "InvalidBase64LengthDefect" – When decoding a block of base64 encoded bytes, the number of non-padding base64 characters was invalid (1 more than a multiple of 4). The encoded block was kept as-is. "email": Examples ***************** Here are a few examples of how to use the "email" package to read, write, and send simple email messages, as well as more complex MIME messages. First, let’s see how to create and send a simple text message (both the text content and the addresses may contain unicode characters): # Import smtplib for the actual sending function import smtplib # Import the email modules we'll need from email.message import EmailMessage # Open the plain text file whose name is in textfile for reading. with open(textfile) as fp: # Create a text/plain message msg = EmailMessage() msg.set_content(fp.read()) # me == the sender's email address # you == the recipient's email address msg['Subject'] = f'The contents of {textfile}' msg['From'] = me msg['To'] = you # Send the message via our own SMTP server. s = smtplib.SMTP('localhost') s.send_message(msg) s.quit() Parsing **RFC 822** headers can easily be done by the using the classes from the "parser" module: # Import the email modules we'll need from email.parser import BytesParser, Parser from email.policy import default # If the e-mail headers are in a file, uncomment these two lines: # with open(messagefile, 'rb') as fp: # headers = BytesParser(policy=default).parse(fp) # Or for parsing headers in a string (this is an uncommon operation), use: headers = Parser(policy=default).parsestr( 'From: Foo Bar \n' 'To: \n' 'Subject: Test message\n' '\n' 'Body would go here\n') # Now the header items can be accessed as a dictionary: print('To: {}'.format(headers['to'])) print('From: {}'.format(headers['from'])) print('Subject: {}'.format(headers['subject'])) # You can also access the parts of the addresses: print('Recipient username: {}'.format(headers['to'].addresses[0].username)) print('Sender name: {}'.format(headers['from'].addresses[0].display_name)) Here’s an example of how to send a MIME message containing a bunch of family pictures that may be residing in a directory: # Import smtplib for the actual sending function import smtplib # And imghdr to find the types of our images import imghdr # Here are the email package modules we'll need from email.message import EmailMessage # Create the container email message. msg = EmailMessage() msg['Subject'] = 'Our family reunion' # me == the sender's email address # family = the list of all recipients' email addresses msg['From'] = me msg['To'] = ', '.join(family) msg.preamble = 'You will not see this in a MIME-aware mail reader.\n' # Open the files in binary mode. Use imghdr to figure out the # MIME subtype for each specific image. for file in pngfiles: with open(file, 'rb') as fp: img_data = fp.read() msg.add_attachment(img_data, maintype='image', subtype=imghdr.what(None, img_data)) # Send the email via our own SMTP server. with smtplib.SMTP('localhost') as s: s.send_message(msg) Here’s an example of how to send the entire contents of a directory as an email message: [1] #!/usr/bin/env python3 """Send the contents of a directory as a MIME message.""" import os import smtplib # For guessing MIME type based on file name extension import mimetypes from argparse import ArgumentParser from email.message import EmailMessage from email.policy import SMTP def main(): parser = ArgumentParser(description="""\ Send the contents of a directory as a MIME message. Unless the -o option is given, the email is sent by forwarding to your local SMTP server, which then does the normal delivery process. Your local machine must be running an SMTP server. """) parser.add_argument('-d', '--directory', help="""Mail the contents of the specified directory, otherwise use the current directory. Only the regular files in the directory are sent, and we don't recurse to subdirectories.""") parser.add_argument('-o', '--output', metavar='FILE', help="""Print the composed message to FILE instead of sending the message to the SMTP server.""") parser.add_argument('-s', '--sender', required=True, help='The value of the From: header (required)') parser.add_argument('-r', '--recipient', required=True, action='append', metavar='RECIPIENT', default=[], dest='recipients', help='A To: header value (at least one required)') args = parser.parse_args() directory = args.directory if not directory: directory = '.' # Create the message msg = EmailMessage() msg['Subject'] = f'Contents of directory {os.path.abspath(directory)}' msg['To'] = ', '.join(args.recipients) msg['From'] = args.sender msg.preamble = 'You will not see this in a MIME-aware mail reader.\n' for filename in os.listdir(directory): path = os.path.join(directory, filename) if not os.path.isfile(path): continue # Guess the content type based on the file's extension. Encoding # will be ignored, although we should check for simple things like # gzip'd or compressed files. ctype, encoding = mimetypes.guess_type(path) if ctype is None or encoding is not None: # No guess could be made, or the file is encoded (compressed), so # use a generic bag-of-bits type. ctype = 'application/octet-stream' maintype, subtype = ctype.split('/', 1) with open(path, 'rb') as fp: msg.add_attachment(fp.read(), maintype=maintype, subtype=subtype, filename=filename) # Now send or store the message if args.output: with open(args.output, 'wb') as fp: fp.write(msg.as_bytes(policy=SMTP)) else: with smtplib.SMTP('localhost') as s: s.send_message(msg) if __name__ == '__main__': main() Here’s an example of how to unpack a MIME message like the one above, into a directory of files: #!/usr/bin/env python3 """Unpack a MIME message into a directory of files.""" import os import email import mimetypes from email.policy import default from argparse import ArgumentParser def main(): parser = ArgumentParser(description="""\ Unpack a MIME message into a directory of files. """) parser.add_argument('-d', '--directory', required=True, help="""Unpack the MIME message into the named directory, which will be created if it doesn't already exist.""") parser.add_argument('msgfile') args = parser.parse_args() with open(args.msgfile, 'rb') as fp: msg = email.message_from_binary_file(fp, policy=default) try: os.mkdir(args.directory) except FileExistsError: pass counter = 1 for part in msg.walk(): # multipart/* are just containers if part.get_content_maintype() == 'multipart': continue # Applications should really sanitize the given filename so that an # email message can't be used to overwrite important files filename = part.get_filename() if not filename: ext = mimetypes.guess_extension(part.get_content_type()) if not ext: # Use a generic bag-of-bits extension ext = '.bin' filename = f'part-{counter:03d}{ext}' counter += 1 with open(os.path.join(args.directory, filename), 'wb') as fp: fp.write(part.get_payload(decode=True)) if __name__ == '__main__': main() Here’s an example of how to create an HTML message with an alternative plain text version. To make things a bit more interesting, we include a related image in the html part, and we save a copy of what we are going to send to disk, as well as sending it. #!/usr/bin/env python3 import smtplib from email.message import EmailMessage from email.headerregistry import Address from email.utils import make_msgid # Create the base text message. msg = EmailMessage() msg['Subject'] = "Ayons asperges pour le déjeuner" msg['From'] = Address("Pepé Le Pew", "pepe", "example.com") msg['To'] = (Address("Penelope Pussycat", "penelope", "example.com"), Address("Fabrette Pussycat", "fabrette", "example.com")) msg.set_content("""\ Salut! Cela ressemble à un excellent recipie[1] déjeuner. [1] http://www.yummly.com/recipe/Roasted-Asparagus-Epicurious-203718 --Pepé """) # Add the html version. This converts the message into a multipart/alternative # container, with the original text message as the first part and the new html # message as the second part. asparagus_cid = make_msgid() msg.add_alternative("""\

Salut!

Cela ressemble à un excellent recipie déjeuner.

""".format(asparagus_cid=asparagus_cid[1:-1]), subtype='html') # note that we needed to peel the <> off the msgid for use in the html. # Now add the related image to the html part. with open("roasted-asparagus.jpg", 'rb') as img: msg.get_payload()[1].add_related(img.read(), 'image', 'jpeg', cid=asparagus_cid) # Make a local copy of what we are going to send. with open('outgoing.msg', 'wb') as f: f.write(bytes(msg)) # Send the message via local SMTP server. with smtplib.SMTP('localhost') as s: s.send_message(msg) If we were sent the message from the last example, here is one way we could process it: import os import sys import tempfile import mimetypes import webbrowser # Import the email modules we'll need from email import policy from email.parser import BytesParser # An imaginary module that would make this work and be safe. from imaginary import magic_html_parser # In a real program you'd get the filename from the arguments. with open('outgoing.msg', 'rb') as fp: msg = BytesParser(policy=policy.default).parse(fp) # Now the header items can be accessed as a dictionary, and any non-ASCII will # be converted to unicode: print('To:', msg['to']) print('From:', msg['from']) print('Subject:', msg['subject']) # If we want to print a preview of the message content, we can extract whatever # the least formatted payload is and print the first three lines. Of course, # if the message has no plain text part printing the first three lines of html # is probably useless, but this is just a conceptual example. simplest = msg.get_body(preferencelist=('plain', 'html')) print() print(''.join(simplest.get_content().splitlines(keepends=True)[:3])) ans = input("View full message?") if ans.lower()[0] == 'n': sys.exit() # We can extract the richest alternative in order to display it: richest = msg.get_body() partfiles = {} if richest['content-type'].maintype == 'text': if richest['content-type'].subtype == 'plain': for line in richest.get_content().splitlines(): print(line) sys.exit() elif richest['content-type'].subtype == 'html': body = richest else: print("Don't know how to display {}".format(richest.get_content_type())) sys.exit() elif richest['content-type'].content_type == 'multipart/related': body = richest.get_body(preferencelist=('html')) for part in richest.iter_attachments(): fn = part.get_filename() if fn: extension = os.path.splitext(part.get_filename())[1] else: extension = mimetypes.guess_extension(part.get_content_type()) with tempfile.NamedTemporaryFile(suffix=extension, delete=False) as f: f.write(part.get_content()) # again strip the <> to go from email form of cid to html form. partfiles[part['content-id'][1:-1]] = f.name else: print("Don't know how to display {}".format(richest.get_content_type())) sys.exit() with tempfile.NamedTemporaryFile(mode='w', delete=False) as f: # The magic_html_parser has to rewrite the href="cid:...." attributes to # point to the filenames in partfiles. It also has to do a safety-sanitize # of the html. It could be written using html.parser. f.write(magic_html_parser(body.get_content(), partfiles)) webbrowser.open(f.name) os.remove(f.name) for fn in partfiles.values(): os.remove(fn) # Of course, there are lots of email messages that could break this simple # minded program, but it will handle the most common ones. Up to the prompt, the output from the above is: To: Penelope Pussycat , Fabrette Pussycat From: Pepé Le Pew Subject: Ayons asperges pour le déjeuner Salut! Cela ressemble à un excellent recipie[1] déjeuner. -[ Footnotes ]- [1] Thanks to Matthew Dixon Cowles for the original inspiration and examples. "email.generator": Generating MIME documents ******************************************** **Source code:** Lib/email/generator.py ====================================================================== One of the most common tasks is to generate the flat (serialized) version of the email message represented by a message object structure. You will need to do this if you want to send your message via "smtplib.SMTP.sendmail()" or the "nntplib" module, or print the message on the console. Taking a message object structure and producing a serialized representation is the job of the generator classes. As with the "email.parser" module, you aren’t limited to the functionality of the bundled generator; you could write one from scratch yourself. However the bundled generator knows how to generate most email in a standards-compliant way, should handle MIME and non- MIME email messages just fine, and is designed so that the bytes- oriented parsing and generation operations are inverses, assuming the same non-transforming "policy" is used for both. That is, parsing the serialized byte stream via the "BytesParser" class and then regenerating the serialized byte stream using "BytesGenerator" should produce output identical to the input [1]. (On the other hand, using the generator on an "EmailMessage" constructed by program may result in changes to the "EmailMessage" object as defaults are filled in.) The "Generator" class can be used to flatten a message into a text (as opposed to binary) serialized representation, but since Unicode cannot represent binary data directly, the message is of necessity transformed into something that contains only ASCII characters, using the standard email RFC Content Transfer Encoding techniques for encoding email messages for transport over channels that are not “8 bit clean”. To accommodate reproducible processing of SMIME-signed messages "Generator" disables header folding for message parts of type "multipart/signed" and all subparts. class email.generator.BytesGenerator(outfp, mangle_from_=None, maxheaderlen=None, *, policy=None) Return a "BytesGenerator" object that will write any message provided to the "flatten()" method, or any surrogateescape encoded text provided to the "write()" method, to the *file-like object* *outfp*. *outfp* must support a "write" method that accepts binary data. If optional *mangle_from_* is "True", put a ">" character in front of any line in the body that starts with the exact string ""From "", that is "From" followed by a space at the beginning of a line. *mangle_from_* defaults to the value of the "mangle_from_" setting of the *policy* (which is "True" for the "compat32" policy and "False" for all others). *mangle_from_* is intended for use when messages are stored in unix mbox format (see "mailbox" and WHY THE CONTENT-LENGTH FORMAT IS BAD). If *maxheaderlen* is not "None", refold any header lines that are longer than *maxheaderlen*, or if "0", do not rewrap any headers. If *manheaderlen* is "None" (the default), wrap headers and other message lines according to the *policy* settings. If *policy* is specified, use that policy to control message generation. If *policy* is "None" (the default), use the policy associated with the "Message" or "EmailMessage" object passed to "flatten" to control the message generation. See "email.policy" for details on what *policy* controls. New in version 3.2. Changed in version 3.3: Added the *policy* keyword. Changed in version 3.6: The default behavior of the *mangle_from_* and *maxheaderlen* parameters is to follow the policy. flatten(msg, unixfrom=False, linesep=None) Print the textual representation of the message object structure rooted at *msg* to the output file specified when the "BytesGenerator" instance was created. If the "policy" option "cte_type" is "8bit" (the default), copy any headers in the original parsed message that have not been modified to the output with any bytes with the high bit set reproduced as in the original, and preserve the non-ASCII *Content-Transfer-Encoding* of any body parts that have them. If "cte_type" is "7bit", convert the bytes with the high bit set as needed using an ASCII-compatible *Content-Transfer-Encoding*. That is, transform parts with non-ASCII *Content-Transfer- Encoding* (*Content-Transfer-Encoding: 8bit*) to an ASCII compatible *Content-Transfer-Encoding*, and encode RFC-invalid non-ASCII bytes in headers using the MIME "unknown-8bit" character set, thus rendering them RFC-compliant. If *unixfrom* is "True", print the envelope header delimiter used by the Unix mailbox format (see "mailbox") before the first of the **RFC 5322** headers of the root message object. If the root object has no envelope header, craft a standard one. The default is "False". Note that for subparts, no envelope header is ever printed. If *linesep* is not "None", use it as the separator character between all the lines of the flattened message. If *linesep* is "None" (the default), use the value specified in the *policy*. clone(fp) Return an independent clone of this "BytesGenerator" instance with the exact same option settings, and *fp* as the new *outfp*. write(s) Encode *s* using the "ASCII" codec and the "surrogateescape" error handler, and pass it to the *write* method of the *outfp* passed to the "BytesGenerator"’s constructor. As a convenience, "EmailMessage" provides the methods "as_bytes()" and "bytes(aMessage)" (a.k.a. "__bytes__()"), which simplify the generation of a serialized binary representation of a message object. For more detail, see "email.message". Because strings cannot represent binary data, the "Generator" class must convert any binary data in any message it flattens to an ASCII compatible format, by converting them to an ASCII compatible *Content- Transfer_Encoding*. Using the terminology of the email RFCs, you can think of this as "Generator" serializing to an I/O stream that is not “8 bit clean”. In other words, most applications will want to be using "BytesGenerator", and not "Generator". class email.generator.Generator(outfp, mangle_from_=None, maxheaderlen=None, *, policy=None) Return a "Generator" object that will write any message provided to the "flatten()" method, or any text provided to the "write()" method, to the *file-like object* *outfp*. *outfp* must support a "write" method that accepts string data. If optional *mangle_from_* is "True", put a ">" character in front of any line in the body that starts with the exact string ""From "", that is "From" followed by a space at the beginning of a line. *mangle_from_* defaults to the value of the "mangle_from_" setting of the *policy* (which is "True" for the "compat32" policy and "False" for all others). *mangle_from_* is intended for use when messages are stored in unix mbox format (see "mailbox" and WHY THE CONTENT-LENGTH FORMAT IS BAD). If *maxheaderlen* is not "None", refold any header lines that are longer than *maxheaderlen*, or if "0", do not rewrap any headers. If *manheaderlen* is "None" (the default), wrap headers and other message lines according to the *policy* settings. If *policy* is specified, use that policy to control message generation. If *policy* is "None" (the default), use the policy associated with the "Message" or "EmailMessage" object passed to "flatten" to control the message generation. See "email.policy" for details on what *policy* controls. Changed in version 3.3: Added the *policy* keyword. Changed in version 3.6: The default behavior of the *mangle_from_* and *maxheaderlen* parameters is to follow the policy. flatten(msg, unixfrom=False, linesep=None) Print the textual representation of the message object structure rooted at *msg* to the output file specified when the "Generator" instance was created. If the "policy" option "cte_type" is "8bit", generate the message as if the option were set to "7bit". (This is required because strings cannot represent non-ASCII bytes.) Convert any bytes with the high bit set as needed using an ASCII-compatible *Content-Transfer-Encoding*. That is, transform parts with non- ASCII *Content-Transfer-Encoding* (*Content-Transfer-Encoding: 8bit*) to an ASCII compatible *Content-Transfer-Encoding*, and encode RFC-invalid non-ASCII bytes in headers using the MIME "unknown-8bit" character set, thus rendering them RFC-compliant. If *unixfrom* is "True", print the envelope header delimiter used by the Unix mailbox format (see "mailbox") before the first of the **RFC 5322** headers of the root message object. If the root object has no envelope header, craft a standard one. The default is "False". Note that for subparts, no envelope header is ever printed. If *linesep* is not "None", use it as the separator character between all the lines of the flattened message. If *linesep* is "None" (the default), use the value specified in the *policy*. Changed in version 3.2: Added support for re-encoding "8bit" message bodies, and the *linesep* argument. clone(fp) Return an independent clone of this "Generator" instance with the exact same options, and *fp* as the new *outfp*. write(s) Write *s* to the *write* method of the *outfp* passed to the "Generator"’s constructor. This provides just enough file-like API for "Generator" instances to be used in the "print()" function. As a convenience, "EmailMessage" provides the methods "as_string()" and "str(aMessage)" (a.k.a. "__str__()"), which simplify the generation of a formatted string representation of a message object. For more detail, see "email.message". The "email.generator" module also provides a derived class, "DecodedGenerator", which is like the "Generator" base class, except that non-*text* parts are not serialized, but are instead represented in the output stream by a string derived from a template filled in with information about the part. class email.generator.DecodedGenerator(outfp, mangle_from_=None, maxheaderlen=None, fmt=None, *, policy=None) Act like "Generator", except that for any subpart of the message passed to "Generator.flatten()", if the subpart is of main type *text*, print the decoded payload of the subpart, and if the main type is not *text*, instead of printing it fill in the string *fmt* using information from the part and print the resulting filled-in string. To fill in *fmt*, execute "fmt % part_info", where "part_info" is a dictionary composed of the following keys and values: * "type" – Full MIME type of the non-*text* part * "maintype" – Main MIME type of the non-*text* part * "subtype" – Sub-MIME type of the non-*text* part * "filename" – Filename of the non-*text* part * "description" – Description associated with the non-*text* part * "encoding" – Content transfer encoding of the non-*text* part If *fmt* is "None", use the following default *fmt*: “[Non-text (%(type)s) part of message omitted, filename %(filename)s]” Optional *_mangle_from_* and *maxheaderlen* are as with the "Generator" base class. -[ Footnotes ]- [1] This statement assumes that you use the appropriate setting for "unixfrom", and that there are no "policy" settings calling for automatic adjustments (for example, "refold_source" must be "none", which is *not* the default). It is also not 100% true, since if the message does not conform to the RFC standards occasionally information about the exact original text is lost during parsing error recovery. It is a goal to fix these latter edge cases when possible. "email.header": Internationalized headers ***************************************** **Source code:** Lib/email/header.py ====================================================================== This module is part of the legacy ("Compat32") email API. In the current API encoding and decoding of headers is handled transparently by the dictionary-like API of the "EmailMessage" class. In addition to uses in legacy code, this module can be useful in applications that need to completely control the character sets used when encoding headers. The remaining text in this section is the original documentation of the module. **RFC 2822** is the base standard that describes the format of email messages. It derives from the older **RFC 822** standard which came into widespread use at a time when most email was composed of ASCII characters only. **RFC 2822** is a specification written assuming email contains only 7-bit ASCII characters. Of course, as email has been deployed worldwide, it has become internationalized, such that language specific character sets can now be used in email messages. The base standard still requires email messages to be transferred using only 7-bit ASCII characters, so a slew of RFCs have been written describing how to encode email containing non-ASCII characters into **RFC 2822**-compliant format. These RFCs include **RFC 2045**, **RFC 2046**, **RFC 2047**, and **RFC 2231**. The "email" package supports these standards in its "email.header" and "email.charset" modules. If you want to include non-ASCII characters in your email headers, say in the *Subject* or *To* fields, you should use the "Header" class and assign the field in the "Message" object to an instance of "Header" instead of using a string for the header value. Import the "Header" class from the "email.header" module. For example: >>> from email.message import Message >>> from email.header import Header >>> msg = Message() >>> h = Header('p\xf6stal', 'iso-8859-1') >>> msg['Subject'] = h >>> msg.as_string() 'Subject: =?iso-8859-1?q?p=F6stal?=\n\n' Notice here how we wanted the *Subject* field to contain a non-ASCII character? We did this by creating a "Header" instance and passing in the character set that the byte string was encoded in. When the subsequent "Message" instance was flattened, the *Subject* field was properly **RFC 2047** encoded. MIME-aware mail readers would show this header using the embedded ISO-8859-1 character. Here is the "Header" class description: class email.header.Header(s=None, charset=None, maxlinelen=None, header_name=None, continuation_ws=' ', errors='strict') Create a MIME-compliant header that can contain strings in different character sets. Optional *s* is the initial header value. If "None" (the default), the initial header value is not set. You can later append to the header with "append()" method calls. *s* may be an instance of "bytes" or "str", but see the "append()" documentation for semantics. Optional *charset* serves two purposes: it has the same meaning as the *charset* argument to the "append()" method. It also sets the default character set for all subsequent "append()" calls that omit the *charset* argument. If *charset* is not provided in the constructor (the default), the "us-ascii" character set is used both as *s*’s initial charset and as the default for subsequent "append()" calls. The maximum line length can be specified explicitly via *maxlinelen*. For splitting the first line to a shorter value (to account for the field header which isn’t included in *s*, e.g. *Subject*) pass in the name of the field in *header_name*. The default *maxlinelen* is 76, and the default value for *header_name* is "None", meaning it is not taken into account for the first line of a long, split header. Optional *continuation_ws* must be **RFC 2822**-compliant folding whitespace, and is usually either a space or a hard tab character. This character will be prepended to continuation lines. *continuation_ws* defaults to a single space character. Optional *errors* is passed straight through to the "append()" method. append(s, charset=None, errors='strict') Append the string *s* to the MIME header. Optional *charset*, if given, should be a "Charset" instance (see "email.charset") or the name of a character set, which will be converted to a "Charset" instance. A value of "None" (the default) means that the *charset* given in the constructor is used. *s* may be an instance of "bytes" or "str". If it is an instance of "bytes", then *charset* is the encoding of that byte string, and a "UnicodeError" will be raised if the string cannot be decoded with that character set. If *s* is an instance of "str", then *charset* is a hint specifying the character set of the characters in the string. In either case, when producing an **RFC 2822**-compliant header using **RFC 2047** rules, the string will be encoded using the output codec of the charset. If the string cannot be encoded using the output codec, a UnicodeError will be raised. Optional *errors* is passed as the errors argument to the decode call if *s* is a byte string. encode(splitchars=';, \t', maxlinelen=None, linesep='\n') Encode a message header into an RFC-compliant format, possibly wrapping long lines and encapsulating non-ASCII parts in base64 or quoted-printable encodings. Optional *splitchars* is a string containing characters which should be given extra weight by the splitting algorithm during normal header wrapping. This is in very rough support of **RFC 2822**’s ‘higher level syntactic breaks’: split points preceded by a splitchar are preferred during line splitting, with the characters preferred in the order in which they appear in the string. Space and tab may be included in the string to indicate whether preference should be given to one over the other as a split point when other split chars do not appear in the line being split. Splitchars does not affect **RFC 2047** encoded lines. *maxlinelen*, if given, overrides the instance’s value for the maximum line length. *linesep* specifies the characters used to separate the lines of the folded header. It defaults to the most useful value for Python application code ("\n"), but "\r\n" can be specified in order to produce headers with RFC-compliant line separators. Changed in version 3.2: Added the *linesep* argument. The "Header" class also provides a number of methods to support standard operators and built-in functions. __str__() Returns an approximation of the "Header" as a string, using an unlimited line length. All pieces are converted to unicode using the specified encoding and joined together appropriately. Any pieces with a charset of "'unknown-8bit'" are decoded as ASCII using the "'replace'" error handler. Changed in version 3.2: Added handling for the "'unknown-8bit'" charset. __eq__(other) This method allows you to compare two "Header" instances for equality. __ne__(other) This method allows you to compare two "Header" instances for inequality. The "email.header" module also provides the following convenient functions. email.header.decode_header(header) Decode a message header value without converting the character set. The header value is in *header*. This function returns a list of "(decoded_string, charset)" pairs containing each of the decoded parts of the header. *charset* is "None" for non-encoded parts of the header, otherwise a lower case string containing the name of the character set specified in the encoded string. Here’s an example: >>> from email.header import decode_header >>> decode_header('=?iso-8859-1?q?p=F6stal?=') [(b'p\xf6stal', 'iso-8859-1')] email.header.make_header(decoded_seq, maxlinelen=None, header_name=None, continuation_ws=' ') Create a "Header" instance from a sequence of pairs as returned by "decode_header()". "decode_header()" takes a header value string and returns a sequence of pairs of the format "(decoded_string, charset)" where *charset* is the name of the character set. This function takes one of those sequence of pairs and returns a "Header" instance. Optional *maxlinelen*, *header_name*, and *continuation_ws* are as in the "Header" constructor. "email.headerregistry": Custom Header Objects ********************************************* **Source code:** Lib/email/headerregistry.py ====================================================================== New in version 3.6: [1] Headers are represented by customized subclasses of "str". The particular class used to represent a given header is determined by the "header_factory" of the "policy" in effect when the headers are created. This section documents the particular "header_factory" implemented by the email package for handling **RFC 5322** compliant email messages, which not only provides customized header objects for various header types, but also provides an extension mechanism for applications to add their own custom header types. When using any of the policy objects derived from "EmailPolicy", all headers are produced by "HeaderRegistry" and have "BaseHeader" as their last base class. Each header class has an additional base class that is determined by the type of the header. For example, many headers have the class "UnstructuredHeader" as their other base class. The specialized second class for a header is determined by the name of the header, using a lookup table stored in the "HeaderRegistry". All of this is managed transparently for the typical application program, but interfaces are provided for modifying the default behavior for use by more complex applications. The sections below first document the header base classes and their attributes, followed by the API for modifying the behavior of "HeaderRegistry", and finally the support classes used to represent the data parsed from structured headers. class email.headerregistry.BaseHeader(name, value) *name* and *value* are passed to "BaseHeader" from the "header_factory" call. The string value of any header object is the *value* fully decoded to unicode. This base class defines the following read-only properties: name The name of the header (the portion of the field before the ‘:’). This is exactly the value passed in the "header_factory" call for *name*; that is, case is preserved. defects A tuple of "HeaderDefect" instances reporting any RFC compliance problems found during parsing. The email package tries to be complete about detecting compliance issues. See the "errors" module for a discussion of the types of defects that may be reported. max_count The maximum number of headers of this type that can have the same "name". A value of "None" means unlimited. The "BaseHeader" value for this attribute is "None"; it is expected that specialized header classes will override this value as needed. "BaseHeader" also provides the following method, which is called by the email library code and should not in general be called by application programs: fold(*, policy) Return a string containing "linesep" characters as required to correctly fold the header according to *policy*. A "cte_type" of "8bit" will be treated as if it were "7bit", since headers may not contain arbitrary binary data. If "utf8" is "False", non-ASCII data will be **RFC 2047** encoded. "BaseHeader" by itself cannot be used to create a header object. It defines a protocol that each specialized header cooperates with in order to produce the header object. Specifically, "BaseHeader" requires that the specialized class provide a "classmethod()" named "parse". This method is called as follows: parse(string, kwds) "kwds" is a dictionary containing one pre-initialized key, "defects". "defects" is an empty list. The parse method should append any detected defects to this list. On return, the "kwds" dictionary *must* contain values for at least the keys "decoded" and "defects". "decoded" should be the string value for the header (that is, the header value fully decoded to unicode). The parse method should assume that *string* may contain content-transfer- encoded parts, but should correctly handle all valid unicode characters as well so that it can parse un-encoded header values. "BaseHeader"’s "__new__" then creates the header instance, and calls its "init" method. The specialized class only needs to provide an "init" method if it wishes to set additional attributes beyond those provided by "BaseHeader" itself. Such an "init" method should look like this: def init(self, /, *args, **kw): self._myattr = kw.pop('myattr') super().init(*args, **kw) That is, anything extra that the specialized class puts in to the "kwds" dictionary should be removed and handled, and the remaining contents of "kw" (and "args") passed to the "BaseHeader" "init" method. class email.headerregistry.UnstructuredHeader An “unstructured” header is the default type of header in **RFC 5322**. Any header that does not have a specified syntax is treated as unstructured. The classic example of an unstructured header is the *Subject* header. In **RFC 5322**, an unstructured header is a run of arbitrary text in the ASCII character set. **RFC 2047**, however, has an **RFC 5322** compatible mechanism for encoding non-ASCII text as ASCII characters within a header value. When a *value* containing encoded words is passed to the constructor, the "UnstructuredHeader" parser converts such encoded words into unicode, following the **RFC 2047** rules for unstructured text. The parser uses heuristics to attempt to decode certain non- compliant encoded words. Defects are registered in such cases, as well as defects for issues such as invalid characters within the encoded words or the non-encoded text. This header type provides no additional attributes. class email.headerregistry.DateHeader **RFC 5322** specifies a very specific format for dates within email headers. The "DateHeader" parser recognizes that date format, as well as recognizing a number of variant forms that are sometimes found “in the wild”. This header type provides the following additional attributes: datetime If the header value can be recognized as a valid date of one form or another, this attribute will contain a "datetime" instance representing that date. If the timezone of the input date is specified as "-0000" (indicating it is in UTC but contains no information about the source timezone), then "datetime" will be a naive "datetime". If a specific timezone offset is found (including *+0000*), then "datetime" will contain an aware "datetime" that uses "datetime.timezone" to record the timezone offset. The "decoded" value of the header is determined by formatting the "datetime" according to the **RFC 5322** rules; that is, it is set to: email.utils.format_datetime(self.datetime) When creating a "DateHeader", *value* may be "datetime" instance. This means, for example, that the following code is valid and does what one would expect: msg['Date'] = datetime(2011, 7, 15, 21) Because this is a naive "datetime" it will be interpreted as a UTC timestamp, and the resulting value will have a timezone of "-0000". Much more useful is to use the "localtime()" function from the "utils" module: msg['Date'] = utils.localtime() This example sets the date header to the current time and date using the current timezone offset. class email.headerregistry.AddressHeader Address headers are one of the most complex structured header types. The "AddressHeader" class provides a generic interface to any address header. This header type provides the following additional attributes: groups A tuple of "Group" objects encoding the addresses and groups found in the header value. Addresses that are not part of a group are represented in this list as single-address "Groups" whose "display_name" is "None". addresses A tuple of "Address" objects encoding all of the individual addresses from the header value. If the header value contains any groups, the individual addresses from the group are included in the list at the point where the group occurs in the value (that is, the list of addresses is “flattened” into a one dimensional list). The "decoded" value of the header will have all encoded words decoded to unicode. "idna" encoded domain names are also decoded to unicode. The "decoded" value is set by "join"ing the "str" value of the elements of the "groups" attribute with "', '". A list of "Address" and "Group" objects in any combination may be used to set the value of an address header. "Group" objects whose "display_name" is "None" will be interpreted as single addresses, which allows an address list to be copied with groups intact by using the list obtained from the "groups" attribute of the source header. class email.headerregistry.SingleAddressHeader A subclass of "AddressHeader" that adds one additional attribute: address The single address encoded by the header value. If the header value actually contains more than one address (which would be a violation of the RFC under the default "policy"), accessing this attribute will result in a "ValueError". Many of the above classes also have a "Unique" variant (for example, "UniqueUnstructuredHeader"). The only difference is that in the "Unique" variant, "max_count" is set to 1. class email.headerregistry.MIMEVersionHeader There is really only one valid value for the *MIME-Version* header, and that is "1.0". For future proofing, this header class supports other valid version numbers. If a version number has a valid value per **RFC 2045**, then the header object will have non-"None" values for the following attributes: version The version number as a string, with any whitespace and/or comments removed. major The major version number as an integer minor The minor version number as an integer class email.headerregistry.ParameterizedMIMEHeader MIME headers all start with the prefix ‘Content-’. Each specific header has a certain value, described under the class for that header. Some can also take a list of supplemental parameters, which have a common format. This class serves as a base for all the MIME headers that take parameters. params A dictionary mapping parameter names to parameter values. class email.headerregistry.ContentTypeHeader A "ParameterizedMIMEHeader" class that handles the *Content-Type* header. content_type The content type string, in the form "maintype/subtype". maintype subtype class email.headerregistry.ContentDispositionHeader A "ParameterizedMIMEHeader" class that handles the *Content- Disposition* header. content_disposition "inline" and "attachment" are the only valid values in common use. class email.headerregistry.ContentTransferEncoding Handles the *Content-Transfer-Encoding* header. cte Valid values are "7bit", "8bit", "base64", and "quoted- printable". See **RFC 2045** for more information. class email.headerregistry.HeaderRegistry(base_class=BaseHeader, default_class=UnstructuredHeader, use_default_map=True) This is the factory used by "EmailPolicy" by default. "HeaderRegistry" builds the class used to create a header instance dynamically, using *base_class* and a specialized class retrieved from a registry that it holds. When a given header name does not appear in the registry, the class specified by *default_class* is used as the specialized class. When *use_default_map* is "True" (the default), the standard mapping of header names to classes is copied in to the registry during initialization. *base_class* is always the last class in the generated class’s "__bases__" list. The default mappings are: subject: UniqueUnstructuredHeader date: UniqueDateHeader resent-date: DateHeader orig-date: UniqueDateHeader sender: UniqueSingleAddressHeader resent-sender: SingleAddressHeader to: UniqueAddressHeader resent-to: AddressHeader cc: UniqueAddressHeader resent-cc: AddressHeader bcc: UniqueAddressHeader resent-bcc: AddressHeader from: UniqueAddressHeader resent-from: AddressHeader reply-to: UniqueAddressHeader mime-version: MIMEVersionHeader content-type: ContentTypeHeader content-disposition: ContentDispositionHeader content-transfer-encoding: ContentTransferEncodingHeader message-id: MessageIDHeader "HeaderRegistry" has the following methods: map_to_type(self, name, cls) *name* is the name of the header to be mapped. It will be converted to lower case in the registry. *cls* is the specialized class to be used, along with *base_class*, to create the class used to instantiate headers that match *name*. __getitem__(name) Construct and return a class to handle creating a *name* header. __call__(name, value) Retrieves the specialized header associated with *name* from the registry (using *default_class* if *name* does not appear in the registry) and composes it with *base_class* to produce a class, calls the constructed class’s constructor, passing it the same argument list, and finally returns the class instance created thereby. The following classes are the classes used to represent data parsed from structured headers and can, in general, be used by an application program to construct structured values to assign to specific headers. class email.headerregistry.Address(display_name='', username='', domain='', addr_spec=None) The class used to represent an email address. The general form of an address is: [display_name] or: username@domain where each part must conform to specific syntax rules spelled out in **RFC 5322**. As a convenience *addr_spec* can be specified instead of *username* and *domain*, in which case *username* and *domain* will be parsed from the *addr_spec*. An *addr_spec* must be a properly RFC quoted string; if it is not "Address" will raise an error. Unicode characters are allowed and will be property encoded when serialized. However, per the RFCs, unicode is *not* allowed in the username portion of the address. display_name The display name portion of the address, if any, with all quoting removed. If the address does not have a display name, this attribute will be an empty string. username The "username" portion of the address, with all quoting removed. domain The "domain" portion of the address. addr_spec The "username@domain" portion of the address, correctly quoted for use as a bare address (the second form shown above). This attribute is not mutable. __str__() The "str" value of the object is the address quoted according to **RFC 5322** rules, but with no Content Transfer Encoding of any non-ASCII characters. To support SMTP (**RFC 5321**), "Address" handles one special case: if "username" and "domain" are both the empty string (or "None"), then the string value of the "Address" is "<>". class email.headerregistry.Group(display_name=None, addresses=None) The class used to represent an address group. The general form of an address group is: display_name: [address-list]; As a convenience for processing lists of addresses that consist of a mixture of groups and single addresses, a "Group" may also be used to represent single addresses that are not part of a group by setting *display_name* to "None" and providing a list of the single address as *addresses*. display_name The "display_name" of the group. If it is "None" and there is exactly one "Address" in "addresses", then the "Group" represents a single address that is not in a group. addresses A possibly empty tuple of "Address" objects representing the addresses in the group. __str__() The "str" value of a "Group" is formatted according to **RFC 5322**, but with no Content Transfer Encoding of any non-ASCII characters. If "display_name" is none and there is a single "Address" in the "addresses" list, the "str" value will be the same as the "str" of that single "Address". -[ Footnotes ]- [1] Originally added in 3.3 as a *provisional module* "email.iterators": Iterators **************************** **Source code:** Lib/email/iterators.py ====================================================================== Iterating over a message object tree is fairly easy with the "Message.walk" method. The "email.iterators" module provides some useful higher level iterations over message object trees. email.iterators.body_line_iterator(msg, decode=False) This iterates over all the payloads in all the subparts of *msg*, returning the string payloads line-by-line. It skips over all the subpart headers, and it skips over any subpart with a payload that isn’t a Python string. This is somewhat equivalent to reading the flat text representation of the message from a file using "readline()", skipping over all the intervening headers. Optional *decode* is passed through to "Message.get_payload". email.iterators.typed_subpart_iterator(msg, maintype='text', subtype=None) This iterates over all the subparts of *msg*, returning only those subparts that match the MIME type specified by *maintype* and *subtype*. Note that *subtype* is optional; if omitted, then subpart MIME type matching is done only with the main type. *maintype* is optional too; it defaults to *text*. Thus, by default "typed_subpart_iterator()" returns each subpart that has a MIME type of *text/**. The following function has been added as a useful debugging tool. It should *not* be considered part of the supported public interface for the package. email.iterators._structure(msg, fp=None, level=0, include_default=False) Prints an indented representation of the content types of the message object structure. For example: >>> msg = email.message_from_file(somefile) >>> _structure(msg) multipart/mixed text/plain text/plain multipart/digest message/rfc822 text/plain message/rfc822 text/plain message/rfc822 text/plain message/rfc822 text/plain message/rfc822 text/plain text/plain Optional *fp* is a file-like object to print the output to. It must be suitable for Python’s "print()" function. *level* is used internally. *include_default*, if true, prints the default type as well. "email.message": Representing an email message ********************************************** **Source code:** Lib/email/message.py ====================================================================== New in version 3.6: [1] The central class in the "email" package is the "EmailMessage" class, imported from the "email.message" module. It is the base class for the "email" object model. "EmailMessage" provides the core functionality for setting and querying header fields, for accessing message bodies, and for creating or modifying structured messages. An email message consists of *headers* and a *payload* (which is also referred to as the *content*). Headers are **RFC 5322** or **RFC 6532** style field names and values, where the field name and value are separated by a colon. The colon is not part of either the field name or the field value. The payload may be a simple text message, or a binary object, or a structured sequence of sub-messages each with their own set of headers and their own payload. The latter type of payload is indicated by the message having a MIME type such as *multipart/** or *message/rfc822*. The conceptual model provided by an "EmailMessage" object is that of an ordered dictionary of headers coupled with a *payload* that represents the **RFC 5322** body of the message, which might be a list of sub-"EmailMessage" objects. In addition to the normal dictionary methods for accessing the header names and values, there are methods for accessing specialized information from the headers (for example the MIME content type), for operating on the payload, for generating a serialized version of the message, and for recursively walking over the object tree. The "EmailMessage" dictionary-like interface is indexed by the header names, which must be ASCII values. The values of the dictionary are strings with some extra methods. Headers are stored and returned in case-preserving form, but field names are matched case-insensitively. Unlike a real dict, there is an ordering to the keys, and there can be duplicate keys. Additional methods are provided for working with headers that have duplicate keys. The *payload* is either a string or bytes object, in the case of simple message objects, or a list of "EmailMessage" objects, for MIME container documents such as *multipart/** and *message/rfc822* message objects. class email.message.EmailMessage(policy=default) If *policy* is specified use the rules it specifies to update and serialize the representation of the message. If *policy* is not set, use the "default" policy, which follows the rules of the email RFCs except for line endings (instead of the RFC mandated "\r\n", it uses the Python standard "\n" line endings). For more information see the "policy" documentation. as_string(unixfrom=False, maxheaderlen=None, policy=None) Return the entire message flattened as a string. When optional *unixfrom* is true, the envelope header is included in the returned string. *unixfrom* defaults to "False". For backward compatibility with the base "Message" class *maxheaderlen* is accepted, but defaults to "None", which means that by default the line length is controlled by the "max_line_length" of the policy. The *policy* argument may be used to override the default policy obtained from the message instance. This can be used to control some of the formatting produced by the method, since the specified *policy* will be passed to the "Generator". Flattening the message may trigger changes to the "EmailMessage" if defaults need to be filled in to complete the transformation to a string (for example, MIME boundaries may be generated or modified). Note that this method is provided as a convenience and may not be the most useful way to serialize messages in your application, especially if you are dealing with multiple messages. See "email.generator.Generator" for a more flexible API for serializing messages. Note also that this method is restricted to producing messages serialized as “7 bit clean” when "utf8" is "False", which is the default. Changed in version 3.6: the default behavior when *maxheaderlen* is not specified was changed from defaulting to 0 to defaulting to the value of *max_line_length* from the policy. __str__() Equivalent to "as_string(policy=self.policy.clone(utf8=True))". Allows "str(msg)" to produce a string containing the serialized message in a readable format. Changed in version 3.4: the method was changed to use "utf8=True", thus producing an **RFC 6531**-like message representation, instead of being a direct alias for "as_string()". as_bytes(unixfrom=False, policy=None) Return the entire message flattened as a bytes object. When optional *unixfrom* is true, the envelope header is included in the returned string. *unixfrom* defaults to "False". The *policy* argument may be used to override the default policy obtained from the message instance. This can be used to control some of the formatting produced by the method, since the specified *policy* will be passed to the "BytesGenerator". Flattening the message may trigger changes to the "EmailMessage" if defaults need to be filled in to complete the transformation to a string (for example, MIME boundaries may be generated or modified). Note that this method is provided as a convenience and may not be the most useful way to serialize messages in your application, especially if you are dealing with multiple messages. See "email.generator.BytesGenerator" for a more flexible API for serializing messages. __bytes__() Equivalent to "as_bytes()". Allows "bytes(msg)" to produce a bytes object containing the serialized message. is_multipart() Return "True" if the message’s payload is a list of sub-"EmailMessage" objects, otherwise return "False". When "is_multipart()" returns "False", the payload should be a string object (which might be a CTE encoded binary payload). Note that "is_multipart()" returning "True" does not necessarily mean that “msg.get_content_maintype() == ‘multipart’” will return the "True". For example, "is_multipart" will return "True" when the "EmailMessage" is of type "message/rfc822". set_unixfrom(unixfrom) Set the message’s envelope header to *unixfrom*, which should be a string. (See "mboxMessage" for a brief description of this header.) get_unixfrom() Return the message’s envelope header. Defaults to "None" if the envelope header was never set. The following methods implement the mapping-like interface for accessing the message’s headers. Note that there are some semantic differences between these methods and a normal mapping (i.e. dictionary) interface. For example, in a dictionary there are no duplicate keys, but here there may be duplicate message headers. Also, in dictionaries there is no guaranteed order to the keys returned by "keys()", but in an "EmailMessage" object, headers are always returned in the order they appeared in the original message, or in which they were added to the message later. Any header deleted and then re-added is always appended to the end of the header list. These semantic differences are intentional and are biased toward convenience in the most common use cases. Note that in all cases, any envelope header present in the message is not included in the mapping interface. __len__() Return the total number of headers, including duplicates. __contains__(name) Return "True" if the message object has a field named *name*. Matching is done without regard to case and *name* does not include the trailing colon. Used for the "in" operator. For example: if 'message-id' in myMessage: print('Message-ID:', myMessage['message-id']) __getitem__(name) Return the value of the named header field. *name* does not include the colon field separator. If the header is missing, "None" is returned; a "KeyError" is never raised. Note that if the named field appears more than once in the message’s headers, exactly which of those field values will be returned is undefined. Use the "get_all()" method to get the values of all the extant headers named *name*. Using the standard (non-"compat32") policies, the returned value is an instance of a subclass of "email.headerregistry.BaseHeader". __setitem__(name, val) Add a header to the message with field name *name* and value *val*. The field is appended to the end of the message’s existing headers. Note that this does *not* overwrite or delete any existing header with the same name. If you want to ensure that the new header is the only one present in the message with field name *name*, delete the field first, e.g.: del msg['subject'] msg['subject'] = 'Python roolz!' If the "policy" defines certain headers to be unique (as the standard policies do), this method may raise a "ValueError" when an attempt is made to assign a value to such a header when one already exists. This behavior is intentional for consistency’s sake, but do not depend on it as we may choose to make such assignments do an automatic deletion of the existing header in the future. __delitem__(name) Delete all occurrences of the field with name *name* from the message’s headers. No exception is raised if the named field isn’t present in the headers. keys() Return a list of all the message’s header field names. values() Return a list of all the message’s field values. items() Return a list of 2-tuples containing all the message’s field headers and values. get(name, failobj=None) Return the value of the named header field. This is identical to "__getitem__()" except that optional *failobj* is returned if the named header is missing (*failobj* defaults to "None"). Here are some additional useful header related methods: get_all(name, failobj=None) Return a list of all the values for the field named *name*. If there are no such named headers in the message, *failobj* is returned (defaults to "None"). add_header(_name, _value, **_params) Extended header setting. This method is similar to "__setitem__()" except that additional header parameters can be provided as keyword arguments. *_name* is the header field to add and *_value* is the *primary* value for the header. For each item in the keyword argument dictionary *_params*, the key is taken as the parameter name, with underscores converted to dashes (since dashes are illegal in Python identifiers). Normally, the parameter will be added as "key="value"" unless the value is "None", in which case only the key will be added. If the value contains non-ASCII characters, the charset and language may be explicitly controlled by specifying the value as a three tuple in the format "(CHARSET, LANGUAGE, VALUE)", where "CHARSET" is a string naming the charset to be used to encode the value, "LANGUAGE" can usually be set to "None" or the empty string (see **RFC 2231** for other possibilities), and "VALUE" is the string value containing non-ASCII code points. If a three tuple is not passed and the value contains non-ASCII characters, it is automatically encoded in **RFC 2231** format using a "CHARSET" of "utf-8" and a "LANGUAGE" of "None". Here is an example: msg.add_header('Content-Disposition', 'attachment', filename='bud.gif') This will add a header that looks like Content-Disposition: attachment; filename="bud.gif" An example of the extended interface with non-ASCII characters: msg.add_header('Content-Disposition', 'attachment', filename=('iso-8859-1', '', 'Fußballer.ppt')) replace_header(_name, _value) Replace a header. Replace the first header found in the message that matches *_name*, retaining header order and field name case of the original header. If no matching header is found, raise a "KeyError". get_content_type() Return the message’s content type, coerced to lower case of the form *maintype/subtype*. If there is no *Content-Type* header in the message return the value returned by "get_default_type()". If the *Content-Type* header is invalid, return "text/plain". (According to **RFC 2045**, messages always have a default type, "get_content_type()" will always return a value. **RFC 2045** defines a message’s default type to be *text/plain* unless it appears inside a *multipart/digest* container, in which case it would be *message/rfc822*. If the *Content-Type* header has an invalid type specification, **RFC 2045** mandates that the default type be *text/plain*.) get_content_maintype() Return the message’s main content type. This is the *maintype* part of the string returned by "get_content_type()". get_content_subtype() Return the message’s sub-content type. This is the *subtype* part of the string returned by "get_content_type()". get_default_type() Return the default content type. Most messages have a default content type of *text/plain*, except for messages that are subparts of *multipart/digest* containers. Such subparts have a default content type of *message/rfc822*. set_default_type(ctype) Set the default content type. *ctype* should either be *text/plain* or *message/rfc822*, although this is not enforced. The default content type is not stored in the *Content-Type* header, so it only affects the return value of the "get_content_type" methods when no *Content-Type* header is present in the message. set_param(param, value, header='Content-Type', requote=True, charset=None, language='', replace=False) Set a parameter in the *Content-Type* header. If the parameter already exists in the header, replace its value with *value*. When *header* is "Content-Type" (the default) and the header does not yet exist in the message, add it, set its value to *text/plain*, and append the new parameter value. Optional *header* specifies an alternative header to *Content-Type*. If the value contains non-ASCII characters, the charset and language may be explicitly specified using the optional *charset* and *language* parameters. Optional *language* specifies the **RFC 2231** language, defaulting to the empty string. Both *charset* and *language* should be strings. The default is to use the "utf8" *charset* and "None" for the *language*. If *replace* is "False" (the default) the header is moved to the end of the list of headers. If *replace* is "True", the header will be updated in place. Use of the *requote* parameter with "EmailMessage" objects is deprecated. Note that existing parameter values of headers may be accessed through the "params" attribute of the header value (for example, "msg['Content-Type'].params['charset']"). Changed in version 3.4: "replace" keyword was added. del_param(param, header='content-type', requote=True) Remove the given parameter completely from the *Content-Type* header. The header will be re-written in place without the parameter or its value. Optional *header* specifies an alternative to *Content-Type*. Use of the *requote* parameter with "EmailMessage" objects is deprecated. get_filename(failobj=None) Return the value of the "filename" parameter of the *Content- Disposition* header of the message. If the header does not have a "filename" parameter, this method falls back to looking for the "name" parameter on the *Content-Type* header. If neither is found, or the header is missing, then *failobj* is returned. The returned string will always be unquoted as per "email.utils.unquote()". get_boundary(failobj=None) Return the value of the "boundary" parameter of the *Content- Type* header of the message, or *failobj* if either the header is missing, or has no "boundary" parameter. The returned string will always be unquoted as per "email.utils.unquote()". set_boundary(boundary) Set the "boundary" parameter of the *Content-Type* header to *boundary*. "set_boundary()" will always quote *boundary* if necessary. A "HeaderParseError" is raised if the message object has no *Content-Type* header. Note that using this method is subtly different from deleting the old *Content-Type* header and adding a new one with the new boundary via "add_header()", because "set_boundary()" preserves the order of the *Content-Type* header in the list of headers. get_content_charset(failobj=None) Return the "charset" parameter of the *Content-Type* header, coerced to lower case. If there is no *Content-Type* header, or if that header has no "charset" parameter, *failobj* is returned. get_charsets(failobj=None) Return a list containing the character set names in the message. If the message is a *multipart*, then the list will contain one element for each subpart in the payload, otherwise, it will be a list of length 1. Each item in the list will be a string which is the value of the "charset" parameter in the *Content-Type* header for the represented subpart. If the subpart has no *Content-Type* header, no "charset" parameter, or is not of the *text* main MIME type, then that item in the returned list will be *failobj*. is_attachment() Return "True" if there is a *Content-Disposition* header and its (case insensitive) value is "attachment", "False" otherwise. Changed in version 3.4.2: is_attachment is now a method instead of a property, for consistency with "is_multipart()". get_content_disposition() Return the lowercased value (without parameters) of the message’s *Content-Disposition* header if it has one, or "None". The possible values for this method are *inline*, *attachment* or "None" if the message follows **RFC 2183**. New in version 3.5. The following methods relate to interrogating and manipulating the content (payload) of the message. walk() The "walk()" method is an all-purpose generator which can be used to iterate over all the parts and subparts of a message object tree, in depth-first traversal order. You will typically use "walk()" as the iterator in a "for" loop; each iteration returns the next subpart. Here’s an example that prints the MIME type of every part of a multipart message structure: >>> for part in msg.walk(): ... print(part.get_content_type()) multipart/report text/plain message/delivery-status text/plain text/plain message/rfc822 text/plain "walk" iterates over the subparts of any part where "is_multipart()" returns "True", even though "msg.get_content_maintype() == 'multipart'" may return "False". We can see this in our example by making use of the "_structure" debug helper function: >>> from email.iterators import _structure >>> for part in msg.walk(): ... print(part.get_content_maintype() == 'multipart', ... part.is_multipart()) True True False False False True False False False False False True False False >>> _structure(msg) multipart/report text/plain message/delivery-status text/plain text/plain message/rfc822 text/plain Here the "message" parts are not "multiparts", but they do contain subparts. "is_multipart()" returns "True" and "walk" descends into the subparts. get_body(preferencelist=('related', 'html', 'plain')) Return the MIME part that is the best candidate to be the “body” of the message. *preferencelist* must be a sequence of strings from the set "related", "html", and "plain", and indicates the order of preference for the content type of the part returned. Start looking for candidate matches with the object on which the "get_body" method is called. If "related" is not included in *preferencelist*, consider the root part (or subpart of the root part) of any related encountered as a candidate if the (sub-)part matches a preference. When encountering a "multipart/related", check the "start" parameter and if a part with a matching *Content-ID* is found, consider only it when looking for candidate matches. Otherwise consider only the first (default root) part of the "multipart/related". If a part has a *Content-Disposition* header, only consider the part a candidate match if the value of the header is "inline". If none of the candidates matches any of the preferences in *preferencelist*, return "None". Notes: (1) For most applications the only *preferencelist* combinations that really make sense are "('plain',)", "('html', 'plain')", and the default "('related', 'html', 'plain')". (2) Because matching starts with the object on which "get_body" is called, calling "get_body" on a "multipart/related" will return the object itself unless *preferencelist* has a non-default value. (3) Messages (or message parts) that do not specify a *Content-Type* or whose *Content-Type* header is invalid will be treated as if they are of type "text/plain", which may occasionally cause "get_body" to return unexpected results. iter_attachments() Return an iterator over all of the immediate sub-parts of the message that are not candidate “body” parts. That is, skip the first occurrence of each of "text/plain", "text/html", "multipart/related", or "multipart/alternative" (unless they are explicitly marked as attachments via *Content-Disposition: attachment*), and return all remaining parts. When applied directly to a "multipart/related", return an iterator over the all the related parts except the root part (ie: the part pointed to by the "start" parameter, or the first part if there is no "start" parameter or the "start" parameter doesn’t match the *Content-ID* of any of the parts). When applied directly to a "multipart/alternative" or a non-"multipart", return an empty iterator. iter_parts() Return an iterator over all of the immediate sub-parts of the message, which will be empty for a non-"multipart". (See also "walk()".) get_content(*args, content_manager=None, **kw) Call the "get_content()" method of the *content_manager*, passing self as the message object, and passing along any other arguments or keywords as additional arguments. If *content_manager* is not specified, use the "content_manager" specified by the current "policy". set_content(*args, content_manager=None, **kw) Call the "set_content()" method of the *content_manager*, passing self as the message object, and passing along any other arguments or keywords as additional arguments. If *content_manager* is not specified, use the "content_manager" specified by the current "policy". make_related(boundary=None) Convert a non-"multipart" message into a "multipart/related" message, moving any existing *Content-* headers and payload into a (new) first part of the "multipart". If *boundary* is specified, use it as the boundary string in the multipart, otherwise leave the boundary to be automatically created when it is needed (for example, when the message is serialized). make_alternative(boundary=None) Convert a non-"multipart" or a "multipart/related" into a "multipart/alternative", moving any existing *Content-* headers and payload into a (new) first part of the "multipart". If *boundary* is specified, use it as the boundary string in the multipart, otherwise leave the boundary to be automatically created when it is needed (for example, when the message is serialized). make_mixed(boundary=None) Convert a non-"multipart", a "multipart/related", or a "multipart-alternative" into a "multipart/mixed", moving any existing *Content-* headers and payload into a (new) first part of the "multipart". If *boundary* is specified, use it as the boundary string in the multipart, otherwise leave the boundary to be automatically created when it is needed (for example, when the message is serialized). add_related(*args, content_manager=None, **kw) If the message is a "multipart/related", create a new message object, pass all of the arguments to its "set_content()" method, and "attach()" it to the "multipart". If the message is a non-"multipart", call "make_related()" and then proceed as above. If the message is any other type of "multipart", raise a "TypeError". If *content_manager* is not specified, use the "content_manager" specified by the current "policy". If the added part has no *Content-Disposition* header, add one with the value "inline". add_alternative(*args, content_manager=None, **kw) If the message is a "multipart/alternative", create a new message object, pass all of the arguments to its "set_content()" method, and "attach()" it to the "multipart". If the message is a non-"multipart" or "multipart/related", call "make_alternative()" and then proceed as above. If the message is any other type of "multipart", raise a "TypeError". If *content_manager* is not specified, use the "content_manager" specified by the current "policy". add_attachment(*args, content_manager=None, **kw) If the message is a "multipart/mixed", create a new message object, pass all of the arguments to its "set_content()" method, and "attach()" it to the "multipart". If the message is a non-"multipart", "multipart/related", or "multipart/alternative", call "make_mixed()" and then proceed as above. If *content_manager* is not specified, use the "content_manager" specified by the current "policy". If the added part has no *Content-Disposition* header, add one with the value "attachment". This method can be used both for explicit attachments (*Content-Disposition: attachment*) and "inline" attachments (*Content-Disposition: inline*), by passing appropriate options to the "content_manager". clear() Remove the payload and all of the headers. clear_content() Remove the payload and all of the "Content-" headers, leaving all other headers intact and in their original order. "EmailMessage" objects have the following instance attributes: preamble The format of a MIME document allows for some text between the blank line following the headers, and the first multipart boundary string. Normally, this text is never visible in a MIME- aware mail reader because it falls outside the standard MIME armor. However, when viewing the raw text of the message, or when viewing the message in a non-MIME aware reader, this text can become visible. The *preamble* attribute contains this leading extra-armor text for MIME documents. When the "Parser" discovers some text after the headers but before the first boundary string, it assigns this text to the message’s *preamble* attribute. When the "Generator" is writing out the plain text representation of a MIME message, and it finds the message has a *preamble* attribute, it will write this text in the area between the headers and the first boundary. See "email.parser" and "email.generator" for details. Note that if the message object has no preamble, the *preamble* attribute will be "None". epilogue The *epilogue* attribute acts the same way as the *preamble* attribute, except that it contains text that appears between the last boundary and the end of the message. As with the "preamble", if there is no epilog text this attribute will be "None". defects The *defects* attribute contains a list of all the problems found when parsing this message. See "email.errors" for a detailed description of the possible parsing defects. class email.message.MIMEPart(policy=default) This class represents a subpart of a MIME message. It is identical to "EmailMessage", except that no *MIME-Version* headers are added when "set_content()" is called, since sub-parts do not need their own *MIME-Version* headers. -[ Footnotes ]- [1] Originally added in 3.4 as a *provisional module*. Docs for legacy message class moved to email.message.Message: Representing an email message using the compat32 API. "email.mime": Creating email and MIME objects from scratch ********************************************************** **Source code:** Lib/email/mime/ ====================================================================== This module is part of the legacy ("Compat32") email API. Its functionality is partially replaced by the "contentmanager" in the new API, but in certain applications these classes may still be useful, even in non-legacy code. Ordinarily, you get a message object structure by passing a file or some text to a parser, which parses the text and returns the root message object. However you can also build a complete message structure from scratch, or even individual "Message" objects by hand. In fact, you can also take an existing structure and add new "Message" objects, move them around, etc. This makes a very convenient interface for slicing-and-dicing MIME messages. You can create a new object structure by creating "Message" instances, adding attachments and all the appropriate headers manually. For MIME messages though, the "email" package provides some convenient subclasses to make things easier. Here are the classes: class email.mime.base.MIMEBase(_maintype, _subtype, *, policy=compat32, **_params) Module: "email.mime.base" This is the base class for all the MIME-specific subclasses of "Message". Ordinarily you won’t create instances specifically of "MIMEBase", although you could. "MIMEBase" is provided primarily as a convenient base class for more specific MIME-aware subclasses. *_maintype* is the *Content-Type* major type (e.g. *text* or *image*), and *_subtype* is the *Content-Type* minor type (e.g. *plain* or *gif*). *_params* is a parameter key/value dictionary and is passed directly to "Message.add_header". If *policy* is specified, (defaults to the "compat32" policy) it will be passed to "Message". The "MIMEBase" class always adds a *Content-Type* header (based on *_maintype*, *_subtype*, and *_params*), and a *MIME-Version* header (always set to "1.0"). Changed in version 3.6: Added *policy* keyword-only parameter. class email.mime.nonmultipart.MIMENonMultipart Module: "email.mime.nonmultipart" A subclass of "MIMEBase", this is an intermediate base class for MIME messages that are not *multipart*. The primary purpose of this class is to prevent the use of the "attach()" method, which only makes sense for *multipart* messages. If "attach()" is called, a "MultipartConversionError" exception is raised. class email.mime.multipart.MIMEMultipart(_subtype='mixed', boundary=None, _subparts=None, *, policy=compat32, **_params) Module: "email.mime.multipart" A subclass of "MIMEBase", this is an intermediate base class for MIME messages that are *multipart*. Optional *_subtype* defaults to *mixed*, but can be used to specify the subtype of the message. A *Content-Type* header of *multipart/_subtype* will be added to the message object. A *MIME-Version* header will also be added. Optional *boundary* is the multipart boundary string. When "None" (the default), the boundary is calculated when needed (for example, when the message is serialized). *_subparts* is a sequence of initial subparts for the payload. It must be possible to convert this sequence to a list. You can always attach new subparts to the message by using the "Message.attach" method. Optional *policy* argument defaults to "compat32". Additional parameters for the *Content-Type* header are taken from the keyword arguments, or passed into the *_params* argument, which is a keyword dictionary. Changed in version 3.6: Added *policy* keyword-only parameter. class email.mime.application.MIMEApplication(_data, _subtype='octet-stream', _encoder=email.encoders.encode_base64, *, policy=compat32, **_params) Module: "email.mime.application" A subclass of "MIMENonMultipart", the "MIMEApplication" class is used to represent MIME message objects of major type *application*. *_data* is a string containing the raw byte data. Optional *_subtype* specifies the MIME subtype and defaults to *octet- stream*. Optional *_encoder* is a callable (i.e. function) which will perform the actual encoding of the data for transport. This callable takes one argument, which is the "MIMEApplication" instance. It should use "get_payload()" and "set_payload()" to change the payload to encoded form. It should also add any *Content-Transfer-Encoding* or other headers to the message object as necessary. The default encoding is base64. See the "email.encoders" module for a list of the built-in encoders. Optional *policy* argument defaults to "compat32". *_params* are passed straight through to the base class constructor. Changed in version 3.6: Added *policy* keyword-only parameter. class email.mime.audio.MIMEAudio(_audiodata, _subtype=None, _encoder=email.encoders.encode_base64, *, policy=compat32, **_params) Module: "email.mime.audio" A subclass of "MIMENonMultipart", the "MIMEAudio" class is used to create MIME message objects of major type *audio*. *_audiodata* is a string containing the raw audio data. If this data can be decoded by the standard Python module "sndhdr", then the subtype will be automatically included in the *Content-Type* header. Otherwise you can explicitly specify the audio subtype via the *_subtype* argument. If the minor type could not be guessed and *_subtype* was not given, then "TypeError" is raised. Optional *_encoder* is a callable (i.e. function) which will perform the actual encoding of the audio data for transport. This callable takes one argument, which is the "MIMEAudio" instance. It should use "get_payload()" and "set_payload()" to change the payload to encoded form. It should also add any *Content-Transfer- Encoding* or other headers to the message object as necessary. The default encoding is base64. See the "email.encoders" module for a list of the built-in encoders. Optional *policy* argument defaults to "compat32". *_params* are passed straight through to the base class constructor. Changed in version 3.6: Added *policy* keyword-only parameter. class email.mime.image.MIMEImage(_imagedata, _subtype=None, _encoder=email.encoders.encode_base64, *, policy=compat32, **_params) Module: "email.mime.image" A subclass of "MIMENonMultipart", the "MIMEImage" class is used to create MIME message objects of major type *image*. *_imagedata* is a string containing the raw image data. If this data can be decoded by the standard Python module "imghdr", then the subtype will be automatically included in the *Content-Type* header. Otherwise you can explicitly specify the image subtype via the *_subtype* argument. If the minor type could not be guessed and *_subtype* was not given, then "TypeError" is raised. Optional *_encoder* is a callable (i.e. function) which will perform the actual encoding of the image data for transport. This callable takes one argument, which is the "MIMEImage" instance. It should use "get_payload()" and "set_payload()" to change the payload to encoded form. It should also add any *Content-Transfer- Encoding* or other headers to the message object as necessary. The default encoding is base64. See the "email.encoders" module for a list of the built-in encoders. Optional *policy* argument defaults to "compat32". *_params* are passed straight through to the "MIMEBase" constructor. Changed in version 3.6: Added *policy* keyword-only parameter. class email.mime.message.MIMEMessage(_msg, _subtype='rfc822', *, policy=compat32) Module: "email.mime.message" A subclass of "MIMENonMultipart", the "MIMEMessage" class is used to create MIME objects of main type *message*. *_msg* is used as the payload, and must be an instance of class "Message" (or a subclass thereof), otherwise a "TypeError" is raised. Optional *_subtype* sets the subtype of the message; it defaults to *rfc822*. Optional *policy* argument defaults to "compat32". Changed in version 3.6: Added *policy* keyword-only parameter. class email.mime.text.MIMEText(_text, _subtype='plain', _charset=None, *, policy=compat32) Module: "email.mime.text" A subclass of "MIMENonMultipart", the "MIMEText" class is used to create MIME objects of major type *text*. *_text* is the string for the payload. *_subtype* is the minor type and defaults to *plain*. *_charset* is the character set of the text and is passed as an argument to the "MIMENonMultipart" constructor; it defaults to "us- ascii" if the string contains only "ascii" code points, and "utf-8" otherwise. The *_charset* parameter accepts either a string or a "Charset" instance. Unless the *_charset* argument is explicitly set to "None", the MIMEText object created will have both a *Content-Type* header with a "charset" parameter, and a *Content-Transfer-Encoding* header. This means that a subsequent "set_payload" call will not result in an encoded payload, even if a charset is passed in the "set_payload" command. You can “reset” this behavior by deleting the "Content-Transfer-Encoding" header, after which a "set_payload" call will automatically encode the new payload (and add a new *Content-Transfer-Encoding* header). Optional *policy* argument defaults to "compat32". Changed in version 3.5: *_charset* also accepts "Charset" instances. Changed in version 3.6: Added *policy* keyword-only parameter. "email.parser": Parsing email messages ************************************** **Source code:** Lib/email/parser.py ====================================================================== Message object structures can be created in one of two ways: they can be created from whole cloth by creating an "EmailMessage" object, adding headers using the dictionary interface, and adding payload(s) using "set_content()" and related methods, or they can be created by parsing a serialized representation of the email message. The "email" package provides a standard parser that understands most email document structures, including MIME documents. You can pass the parser a bytes, string or file object, and the parser will return to you the root "EmailMessage" instance of the object structure. For simple, non-MIME messages the payload of this root object will likely be a string containing the text of the message. For MIME messages, the root object will return "True" from its "is_multipart()" method, and the subparts can be accessed via the payload manipulation methods, such as "get_body()", "iter_parts()", and "walk()". There are actually two parser interfaces available for use, the "Parser" API and the incremental "FeedParser" API. The "Parser" API is most useful if you have the entire text of the message in memory, or if the entire message lives in a file on the file system. "FeedParser" is more appropriate when you are reading the message from a stream which might block waiting for more input (such as reading an email message from a socket). The "FeedParser" can consume and parse the message incrementally, and only returns the root object when you close the parser. Note that the parser can be extended in limited ways, and of course you can implement your own parser completely from scratch. All of the logic that connects the "email" package’s bundled parser and the "EmailMessage" class is embodied in the "policy" class, so a custom parser can create message object trees any way it finds necessary by implementing custom versions of the appropriate "policy" methods. FeedParser API ============== The "BytesFeedParser", imported from the "email.feedparser" module, provides an API that is conducive to incremental parsing of email messages, such as would be necessary when reading the text of an email message from a source that can block (such as a socket). The "BytesFeedParser" can of course be used to parse an email message fully contained in a *bytes-like object*, string, or file, but the "BytesParser" API may be more convenient for such use cases. The semantics and results of the two parser APIs are identical. The "BytesFeedParser"’s API is simple; you create an instance, feed it a bunch of bytes until there’s no more to feed it, then close the parser to retrieve the root message object. The "BytesFeedParser" is extremely accurate when parsing standards-compliant messages, and it does a very good job of parsing non-compliant messages, providing information about how a message was deemed broken. It will populate a message object’s "defects" attribute with a list of any problems it found in a message. See the "email.errors" module for the list of defects that it can find. Here is the API for the "BytesFeedParser": class email.parser.BytesFeedParser(_factory=None, *, policy=policy.compat32) Create a "BytesFeedParser" instance. Optional *_factory* is a no- argument callable; if not specified use the "message_factory" from the *policy*. Call *_factory* whenever a new message object is needed. If *policy* is specified use the rules it specifies to update the representation of the message. If *policy* is not set, use the "compat32" policy, which maintains backward compatibility with the Python 3.2 version of the email package and provides "Message" as the default factory. All other policies provide "EmailMessage" as the default *_factory*. For more information on what else *policy* controls, see the "policy" documentation. Note: **The policy keyword should always be specified**; The default will change to "email.policy.default" in a future version of Python. New in version 3.2. Changed in version 3.3: Added the *policy* keyword. Changed in version 3.6: *_factory* defaults to the policy "message_factory". feed(data) Feed the parser some more data. *data* should be a *bytes-like object* containing one or more lines. The lines can be partial and the parser will stitch such partial lines together properly. The lines can have any of the three common line endings: carriage return, newline, or carriage return and newline (they can even be mixed). close() Complete the parsing of all previously fed data and return the root message object. It is undefined what happens if "feed()" is called after this method has been called. class email.parser.FeedParser(_factory=None, *, policy=policy.compat32) Works like "BytesFeedParser" except that the input to the "feed()" method must be a string. This is of limited utility, since the only way for such a message to be valid is for it to contain only ASCII text or, if "utf8" is "True", no binary attachments. Changed in version 3.3: Added the *policy* keyword. Parser API ========== The "BytesParser" class, imported from the "email.parser" module, provides an API that can be used to parse a message when the complete contents of the message are available in a *bytes-like object* or file. The "email.parser" module also provides "Parser" for parsing strings, and header-only parsers, "BytesHeaderParser" and "HeaderParser", which can be used if you’re only interested in the headers of the message. "BytesHeaderParser" and "HeaderParser" can be much faster in these situations, since they do not attempt to parse the message body, instead setting the payload to the raw body. class email.parser.BytesParser(_class=None, *, policy=policy.compat32) Create a "BytesParser" instance. The *_class* and *policy* arguments have the same meaning and semantics as the *_factory* and *policy* arguments of "BytesFeedParser". Note: **The policy keyword should always be specified**; The default will change to "email.policy.default" in a future version of Python. Changed in version 3.3: Removed the *strict* argument that was deprecated in 2.4. Added the *policy* keyword. Changed in version 3.6: *_class* defaults to the policy "message_factory". parse(fp, headersonly=False) Read all the data from the binary file-like object *fp*, parse the resulting bytes, and return the message object. *fp* must support both the "readline()" and the "read()" methods. The bytes contained in *fp* must be formatted as a block of **RFC 5322** (or, if "utf8" is "True", **RFC 6532**) style headers and header continuation lines, optionally preceded by an envelope header. The header block is terminated either by the end of the data or by a blank line. Following the header block is the body of the message (which may contain MIME-encoded subparts, including subparts with a *Content-Transfer-Encoding* of "8bit"). Optional *headersonly* is a flag specifying whether to stop parsing after reading the headers or not. The default is "False", meaning it parses the entire contents of the file. parsebytes(bytes, headersonly=False) Similar to the "parse()" method, except it takes a *bytes-like object* instead of a file-like object. Calling this method on a *bytes-like object* is equivalent to wrapping *bytes* in a "BytesIO" instance first and calling "parse()". Optional *headersonly* is as with the "parse()" method. New in version 3.2. class email.parser.BytesHeaderParser(_class=None, *, policy=policy.compat32) Exactly like "BytesParser", except that *headersonly* defaults to "True". New in version 3.3. class email.parser.Parser(_class=None, *, policy=policy.compat32) This class is parallel to "BytesParser", but handles string input. Changed in version 3.3: Removed the *strict* argument. Added the *policy* keyword. Changed in version 3.6: *_class* defaults to the policy "message_factory". parse(fp, headersonly=False) Read all the data from the text-mode file-like object *fp*, parse the resulting text, and return the root message object. *fp* must support both the "readline()" and the "read()" methods on file-like objects. Other than the text mode requirement, this method operates like "BytesParser.parse()". parsestr(text, headersonly=False) Similar to the "parse()" method, except it takes a string object instead of a file-like object. Calling this method on a string is equivalent to wrapping *text* in a "StringIO" instance first and calling "parse()". Optional *headersonly* is as with the "parse()" method. class email.parser.HeaderParser(_class=None, *, policy=policy.compat32) Exactly like "Parser", except that *headersonly* defaults to "True". Since creating a message object structure from a string or a file object is such a common task, four functions are provided as a convenience. They are available in the top-level "email" package namespace. email.message_from_bytes(s, _class=None, *, policy=policy.compat32) Return a message object structure from a *bytes-like object*. This is equivalent to "BytesParser().parsebytes(s)". Optional *_class* and *policy* are interpreted as with the "BytesParser" class constructor. New in version 3.2. Changed in version 3.3: Removed the *strict* argument. Added the *policy* keyword. email.message_from_binary_file(fp, _class=None, *, policy=policy.compat32) Return a message object structure tree from an open binary *file object*. This is equivalent to "BytesParser().parse(fp)". *_class* and *policy* are interpreted as with the "BytesParser" class constructor. New in version 3.2. Changed in version 3.3: Removed the *strict* argument. Added the *policy* keyword. email.message_from_string(s, _class=None, *, policy=policy.compat32) Return a message object structure from a string. This is equivalent to "Parser().parsestr(s)". *_class* and *policy* are interpreted as with the "Parser" class constructor. Changed in version 3.3: Removed the *strict* argument. Added the *policy* keyword. email.message_from_file(fp, _class=None, *, policy=policy.compat32) Return a message object structure tree from an open *file object*. This is equivalent to "Parser().parse(fp)". *_class* and *policy* are interpreted as with the "Parser" class constructor. Changed in version 3.3: Removed the *strict* argument. Added the *policy* keyword. Changed in version 3.6: *_class* defaults to the policy "message_factory". Here’s an example of how you might use "message_from_bytes()" at an interactive Python prompt: >>> import email >>> msg = email.message_from_bytes(myBytes) Additional notes ================ Here are some notes on the parsing semantics: * Most non-*multipart* type messages are parsed as a single message object with a string payload. These objects will return "False" for "is_multipart()", and "iter_parts()" will yield an empty list. * All *multipart* type messages will be parsed as a container message object with a list of sub-message objects for their payload. The outer container message will return "True" for "is_multipart()", and "iter_parts()" will yield a list of subparts. * Most messages with a content type of *message/** (such as *message /delivery-status* and *message/rfc822*) will also be parsed as container object containing a list payload of length 1. Their "is_multipart()" method will return "True". The single element yielded by "iter_parts()" will be a sub-message object. * Some non-standards-compliant messages may not be internally consistent about their *multipart*-edness. Such messages may have a *Content-Type* header of type *multipart*, but their "is_multipart()" method may return "False". If such messages were parsed with the "FeedParser", they will have an instance of the "MultipartInvariantViolationDefect" class in their *defects* attribute list. See "email.errors" for details. "email.policy": Policy Objects ****************************** New in version 3.3. **Source code:** Lib/email/policy.py ====================================================================== The "email" package’s prime focus is the handling of email messages as described by the various email and MIME RFCs. However, the general format of email messages (a block of header fields each consisting of a name followed by a colon followed by a value, the whole block followed by a blank line and an arbitrary ‘body’), is a format that has found utility outside of the realm of email. Some of these uses conform fairly closely to the main email RFCs, some do not. Even when working with email, there are times when it is desirable to break strict compliance with the RFCs, such as generating emails that interoperate with email servers that do not themselves follow the standards, or that implement extensions you want to use in ways that violate the standards. Policy objects give the email package the flexibility to handle all these disparate use cases. A "Policy" object encapsulates a set of attributes and methods that control the behavior of various components of the email package during use. "Policy" instances can be passed to various classes and methods in the email package to alter the default behavior. The settable values and their defaults are described below. There is a default policy used by all classes in the email package. For all of the "parser" classes and the related convenience functions, and for the "Message" class, this is the "Compat32" policy, via its corresponding pre-defined instance "compat32". This policy provides for complete backward compatibility (in some cases, including bug compatibility) with the pre-Python3.3 version of the email package. This default value for the *policy* keyword to "EmailMessage" is the "EmailPolicy" policy, via its pre-defined instance "default". When a "Message" or "EmailMessage" object is created, it acquires a policy. If the message is created by a "parser", a policy passed to the parser will be the policy used by the message it creates. If the message is created by the program, then the policy can be specified when it is created. When a message is passed to a "generator", the generator uses the policy from the message by default, but you can also pass a specific policy to the generator that will override the one stored on the message object. The default value for the *policy* keyword for the "email.parser" classes and the parser convenience functions **will be changing** in a future version of Python. Therefore you should **always specify explicitly which policy you want to use** when calling any of the classes and functions described in the "parser" module. The first part of this documentation covers the features of "Policy", an *abstract base class* that defines the features that are common to all policy objects, including "compat32". This includes certain hook methods that are called internally by the email package, which a custom policy could override to obtain different behavior. The second part describes the concrete classes "EmailPolicy" and "Compat32", which implement the hooks that provide the standard behavior and the backward compatible behavior and features, respectively. "Policy" instances are immutable, but they can be cloned, accepting the same keyword arguments as the class constructor and returning a new "Policy" instance that is a copy of the original but with the specified attributes values changed. As an example, the following code could be used to read an email message from a file on disk and pass it to the system "sendmail" program on a Unix system: >>> from email import message_from_binary_file >>> from email.generator import BytesGenerator >>> from email import policy >>> from subprocess import Popen, PIPE >>> with open('mymsg.txt', 'rb') as f: ... msg = message_from_binary_file(f, policy=policy.default) >>> p = Popen(['sendmail', msg['To'].addresses[0]], stdin=PIPE) >>> g = BytesGenerator(p.stdin, policy=msg.policy.clone(linesep='\r\n')) >>> g.flatten(msg) >>> p.stdin.close() >>> rc = p.wait() Here we are telling "BytesGenerator" to use the RFC correct line separator characters when creating the binary string to feed into "sendmail's" "stdin", where the default policy would use "\n" line separators. Some email package methods accept a *policy* keyword argument, allowing the policy to be overridden for that method. For example, the following code uses the "as_bytes()" method of the *msg* object from the previous example and writes the message to a file using the native line separators for the platform on which it is running: >>> import os >>> with open('converted.txt', 'wb') as f: ... f.write(msg.as_bytes(policy=msg.policy.clone(linesep=os.linesep))) 17 Policy objects can also be combined using the addition operator, producing a policy object whose settings are a combination of the non- default values of the summed objects: >>> compat_SMTP = policy.compat32.clone(linesep='\r\n') >>> compat_strict = policy.compat32.clone(raise_on_defect=True) >>> compat_strict_SMTP = compat_SMTP + compat_strict This operation is not commutative; that is, the order in which the objects are added matters. To illustrate: >>> policy100 = policy.compat32.clone(max_line_length=100) >>> policy80 = policy.compat32.clone(max_line_length=80) >>> apolicy = policy100 + policy80 >>> apolicy.max_line_length 80 >>> apolicy = policy80 + policy100 >>> apolicy.max_line_length 100 class email.policy.Policy(**kw) This is the *abstract base class* for all policy classes. It provides default implementations for a couple of trivial methods, as well as the implementation of the immutability property, the "clone()" method, and the constructor semantics. The constructor of a policy class can be passed various keyword arguments. The arguments that may be specified are any non-method properties on this class, plus any additional non-method properties on the concrete class. A value specified in the constructor will override the default value for the corresponding attribute. This class defines the following properties, and thus values for the following may be passed in the constructor of any policy class: max_line_length The maximum length of any line in the serialized output, not counting the end of line character(s). Default is 78, per **RFC 5322**. A value of "0" or "None" indicates that no line wrapping should be done at all. linesep The string to be used to terminate lines in serialized output. The default is "\n" because that’s the internal end-of-line discipline used by Python, though "\r\n" is required by the RFCs. cte_type Controls the type of Content Transfer Encodings that may be or are required to be used. The possible values are: +----------+-----------------------------------------------------------------+ | "7bit" | all data must be “7 bit clean” (ASCII-only). This means that | | | where necessary data will be encoded using either quoted- | | | printable or base64 encoding. | +----------+-----------------------------------------------------------------+ | "8bit" | data is not constrained to be 7 bit clean. Data in headers is | | | still required to be ASCII-only and so will be encoded (see | | | "fold_binary()" and "utf8" below for exceptions), but body | | | parts may use the "8bit" CTE. | +----------+-----------------------------------------------------------------+ A "cte_type" value of "8bit" only works with "BytesGenerator", not "Generator", because strings cannot contain binary data. If a "Generator" is operating under a policy that specifies "cte_type=8bit", it will act as if "cte_type" is "7bit". raise_on_defect If "True", any defects encountered will be raised as errors. If "False" (the default), defects will be passed to the "register_defect()" method. mangle_from_ If "True", lines starting with *“From “* in the body are escaped by putting a ">" in front of them. This parameter is used when the message is being serialized by a generator. Default: "False". New in version 3.5: The *mangle_from_* parameter. message_factory A factory function for constructing a new empty message object. Used by the parser when building messages. Defaults to "None", in which case "Message" is used. New in version 3.6. verify_generated_headers If "True" (the default), the generator will raise "HeaderWriteError" instead of writing a header that is improperly folded or delimited, such that it would be parsed as multiple headers or joined with adjacent data. Such headers can be generated by custom header classes or bugs in the "email" module. As it’s a security feature, this defaults to "True" even in the "Compat32" policy. For backwards compatible, but unsafe, behavior, it must be set to "False" explicitly. New in version 3.9.20. The following "Policy" method is intended to be called by code using the email library to create policy instances with custom settings: clone(**kw) Return a new "Policy" instance whose attributes have the same values as the current instance, except where those attributes are given new values by the keyword arguments. The remaining "Policy" methods are called by the email package code, and are not intended to be called by an application using the email package. A custom policy must implement all of these methods. handle_defect(obj, defect) Handle a *defect* found on *obj*. When the email package calls this method, *defect* will always be a subclass of "Defect". The default implementation checks the "raise_on_defect" flag. If it is "True", *defect* is raised as an exception. If it is "False" (the default), *obj* and *defect* are passed to "register_defect()". register_defect(obj, defect) Register a *defect* on *obj*. In the email package, *defect* will always be a subclass of "Defect". The default implementation calls the "append" method of the "defects" attribute of *obj*. When the email package calls "handle_defect", *obj* will normally have a "defects" attribute that has an "append" method. Custom object types used with the email package (for example, custom "Message" objects) should also provide such an attribute, otherwise defects in parsed messages will raise unexpected errors. header_max_count(name) Return the maximum allowed number of headers named *name*. Called when a header is added to an "EmailMessage" or "Message" object. If the returned value is not "0" or "None", and there are already a number of headers with the name *name* greater than or equal to the value returned, a "ValueError" is raised. Because the default behavior of "Message.__setitem__" is to append the value to the list of headers, it is easy to create duplicate headers without realizing it. This method allows certain headers to be limited in the number of instances of that header that may be added to a "Message" programmatically. (The limit is not observed by the parser, which will faithfully produce as many headers as exist in the message being parsed.) The default implementation returns "None" for all header names. header_source_parse(sourcelines) The email package calls this method with a list of strings, each string ending with the line separation characters found in the source being parsed. The first line includes the field header name and separator. All whitespace in the source is preserved. The method should return the "(name, value)" tuple that is to be stored in the "Message" to represent the parsed header. If an implementation wishes to retain compatibility with the existing email package policies, *name* should be the case preserved name (all characters up to the ‘":"’ separator), while *value* should be the unfolded value (all line separator characters removed, but whitespace kept intact), stripped of leading whitespace. *sourcelines* may contain surrogateescaped binary data. There is no default implementation header_store_parse(name, value) The email package calls this method with the name and value provided by the application program when the application program is modifying a "Message" programmatically (as opposed to a "Message" created by a parser). The method should return the "(name, value)" tuple that is to be stored in the "Message" to represent the header. If an implementation wishes to retain compatibility with the existing email package policies, the *name* and *value* should be strings or string subclasses that do not change the content of the passed in arguments. There is no default implementation header_fetch_parse(name, value) The email package calls this method with the *name* and *value* currently stored in the "Message" when that header is requested by the application program, and whatever the method returns is what is passed back to the application as the value of the header being retrieved. Note that there may be more than one header with the same name stored in the "Message"; the method is passed the specific name and value of the header destined to be returned to the application. *value* may contain surrogateescaped binary data. There should be no surrogateescaped binary data in the value returned by the method. There is no default implementation fold(name, value) The email package calls this method with the *name* and *value* currently stored in the "Message" for a given header. The method should return a string that represents that header “folded” correctly (according to the policy settings) by composing the *name* with the *value* and inserting "linesep" characters at the appropriate places. See **RFC 5322** for a discussion of the rules for folding email headers. *value* may contain surrogateescaped binary data. There should be no surrogateescaped binary data in the string returned by the method. fold_binary(name, value) The same as "fold()", except that the returned value should be a bytes object rather than a string. *value* may contain surrogateescaped binary data. These could be converted back into binary data in the returned bytes object. class email.policy.EmailPolicy(**kw) This concrete "Policy" provides behavior that is intended to be fully compliant with the current email RFCs. These include (but are not limited to) **RFC 5322**, **RFC 2047**, and the current MIME RFCs. This policy adds new header parsing and folding algorithms. Instead of simple strings, headers are "str" subclasses with attributes that depend on the type of the field. The parsing and folding algorithm fully implement **RFC 2047** and **RFC 5322**. The default value for the "message_factory" attribute is "EmailMessage". In addition to the settable attributes listed above that apply to all policies, this policy adds the following additional attributes: New in version 3.6: [1] utf8 If "False", follow **RFC 5322**, supporting non-ASCII characters in headers by encoding them as “encoded words”. If "True", follow **RFC 6532** and use "utf-8" encoding for headers. Messages formatted in this way may be passed to SMTP servers that support the "SMTPUTF8" extension (**RFC 6531**). refold_source If the value for a header in the "Message" object originated from a "parser" (as opposed to being set by a program), this attribute indicates whether or not a generator should refold that value when transforming the message back into serialized form. The possible values are: +----------+-----------------------------------------------------------------+ | "none" | all source values use original folding | +----------+-----------------------------------------------------------------+ | "long" | source values that have any line that is longer than | | | "max_line_length" will be refolded | +----------+-----------------------------------------------------------------+ | "all" | all values are refolded. | +----------+-----------------------------------------------------------------+ The default is "long". header_factory A callable that takes two arguments, "name" and "value", where "name" is a header field name and "value" is an unfolded header field value, and returns a string subclass that represents that header. A default "header_factory" (see "headerregistry") is provided that supports custom parsing for the various address and date **RFC 5322** header field types, and the major MIME header field stypes. Support for additional custom parsing will be added in the future. content_manager An object with at least two methods: get_content and set_content. When the "get_content()" or "set_content()" method of an "EmailMessage" object is called, it calls the corresponding method of this object, passing it the message object as its first argument, and any arguments or keywords that were passed to it as additional arguments. By default "content_manager" is set to "raw_data_manager". New in version 3.4. The class provides the following concrete implementations of the abstract methods of "Policy": header_max_count(name) Returns the value of the "max_count" attribute of the specialized class used to represent the header with the given name. header_source_parse(sourcelines) The name is parsed as everything up to the ‘":"’ and returned unmodified. The value is determined by stripping leading whitespace off the remainder of the first line, joining all subsequent lines together, and stripping any trailing carriage return or linefeed characters. header_store_parse(name, value) The name is returned unchanged. If the input value has a "name" attribute and it matches *name* ignoring case, the value is returned unchanged. Otherwise the *name* and *value* are passed to "header_factory", and the resulting header object is returned as the value. In this case a "ValueError" is raised if the input value contains CR or LF characters. header_fetch_parse(name, value) If the value has a "name" attribute, it is returned to unmodified. Otherwise the *name*, and the *value* with any CR or LF characters removed, are passed to the "header_factory", and the resulting header object is returned. Any surrogateescaped bytes get turned into the unicode unknown-character glyph. fold(name, value) Header folding is controlled by the "refold_source" policy setting. A value is considered to be a ‘source value’ if and only if it does not have a "name" attribute (having a "name" attribute means it is a header object of some sort). If a source value needs to be refolded according to the policy, it is converted into a header object by passing the *name* and the *value* with any CR and LF characters removed to the "header_factory". Folding of a header object is done by calling its "fold" method with the current policy. Source values are split into lines using "splitlines()". If the value is not to be refolded, the lines are rejoined using the "linesep" from the policy and returned. The exception is lines containing non-ascii binary data. In that case the value is refolded regardless of the "refold_source" setting, which causes the binary data to be CTE encoded using the "unknown-8bit" charset. fold_binary(name, value) The same as "fold()" if "cte_type" is "7bit", except that the returned value is bytes. If "cte_type" is "8bit", non-ASCII binary data is converted back into bytes. Headers with binary data are not refolded, regardless of the "refold_header" setting, since there is no way to know whether the binary data consists of single byte characters or multibyte characters. The following instances of "EmailPolicy" provide defaults suitable for specific application domains. Note that in the future the behavior of these instances (in particular the "HTTP" instance) may be adjusted to conform even more closely to the RFCs relevant to their domains. email.policy.default An instance of "EmailPolicy" with all defaults unchanged. This policy uses the standard Python "\n" line endings rather than the RFC-correct "\r\n". email.policy.SMTP Suitable for serializing messages in conformance with the email RFCs. Like "default", but with "linesep" set to "\r\n", which is RFC compliant. email.policy.SMTPUTF8 The same as "SMTP" except that "utf8" is "True". Useful for serializing messages to a message store without using encoded words in the headers. Should only be used for SMTP transmission if the sender or recipient addresses have non-ASCII characters (the "smtplib.SMTP.send_message()" method handles this automatically). email.policy.HTTP Suitable for serializing headers with for use in HTTP traffic. Like "SMTP" except that "max_line_length" is set to "None" (unlimited). email.policy.strict Convenience instance. The same as "default" except that "raise_on_defect" is set to "True". This allows any policy to be made strict by writing: somepolicy + policy.strict With all of these "EmailPolicies", the effective API of the email package is changed from the Python 3.2 API in the following ways: * Setting a header on a "Message" results in that header being parsed and a header object created. * Fetching a header value from a "Message" results in that header being parsed and a header object created and returned. * Any header object, or any header that is refolded due to the policy settings, is folded using an algorithm that fully implements the RFC folding algorithms, including knowing where encoded words are required and allowed. From the application view, this means that any header obtained through the "EmailMessage" is a header object with extra attributes, whose string value is the fully decoded unicode value of the header. Likewise, a header may be assigned a new value, or a new header created, using a unicode string, and the policy will take care of converting the unicode string into the correct RFC encoded form. The header objects and their attributes are described in "headerregistry". class email.policy.Compat32(**kw) This concrete "Policy" is the backward compatibility policy. It replicates the behavior of the email package in Python 3.2. The "policy" module also defines an instance of this class, "compat32", that is used as the default policy. Thus the default behavior of the email package is to maintain compatibility with Python 3.2. The following attributes have values that are different from the "Policy" default: mangle_from_ The default is "True". The class provides the following concrete implementations of the abstract methods of "Policy": header_source_parse(sourcelines) The name is parsed as everything up to the ‘":"’ and returned unmodified. The value is determined by stripping leading whitespace off the remainder of the first line, joining all subsequent lines together, and stripping any trailing carriage return or linefeed characters. header_store_parse(name, value) The name and value are returned unmodified. header_fetch_parse(name, value) If the value contains binary data, it is converted into a "Header" object using the "unknown-8bit" charset. Otherwise it is returned unmodified. fold(name, value) Headers are folded using the "Header" folding algorithm, which preserves existing line breaks in the value, and wraps each resulting line to the "max_line_length". Non-ASCII binary data are CTE encoded using the "unknown-8bit" charset. fold_binary(name, value) Headers are folded using the "Header" folding algorithm, which preserves existing line breaks in the value, and wraps each resulting line to the "max_line_length". If "cte_type" is "7bit", non-ascii binary data is CTE encoded using the "unknown- 8bit" charset. Otherwise the original source header is used, with its existing line breaks and any (RFC invalid) binary data it may contain. email.policy.compat32 An instance of "Compat32", providing backward compatibility with the behavior of the email package in Python 3.2. -[ Footnotes ]- [1] Originally added in 3.3 as a *provisional feature*. "email" — An email and MIME handling package ******************************************** **Source code:** Lib/email/__init__.py ====================================================================== The "email" package is a library for managing email messages. It is specifically *not* designed to do any sending of email messages to SMTP (**RFC 2821**), NNTP, or other servers; those are functions of modules such as "smtplib" and "nntplib". The "email" package attempts to be as RFC-compliant as possible, supporting **RFC 5322** and **RFC 6532**, as well as such MIME-related RFCs as **RFC 2045**, **RFC 2046**, **RFC 2047**, **RFC 2183**, and **RFC 2231**. The overall structure of the email package can be divided into three major components, plus a fourth component that controls the behavior of the other components. The central component of the package is an “object model” that represents email messages. An application interacts with the package primarily through the object model interface defined in the "message" sub-module. The application can use this API to ask questions about an existing email, to construct a new email, or to add or remove email subcomponents that themselves use the same object model interface. That is, following the nature of email messages and their MIME subcomponents, the email object model is a tree structure of objects that all provide the "EmailMessage" API. The other two major components of the package are the "parser" and the "generator". The parser takes the serialized version of an email message (a stream of bytes) and converts it into a tree of "EmailMessage" objects. The generator takes an "EmailMessage" and turns it back into a serialized byte stream. (The parser and generator also handle streams of text characters, but this usage is discouraged as it is too easy to end up with messages that are not valid in one way or another.) The control component is the "policy" module. Every "EmailMessage", every "generator", and every "parser" has an associated "policy" object that controls its behavior. Usually an application only needs to specify the policy when an "EmailMessage" is created, either by directly instantiating an "EmailMessage" to create a new email, or by parsing an input stream using a "parser". But the policy can be changed when the message is serialized using a "generator". This allows, for example, a generic email message to be parsed from disk, but to serialize it using standard SMTP settings when sending it to an email server. The email package does its best to hide the details of the various governing RFCs from the application. Conceptually the application should be able to treat the email message as a structured tree of unicode text and binary attachments, without having to worry about how these are represented when serialized. In practice, however, it is often necessary to be aware of at least some of the rules governing MIME messages and their structure, specifically the names and nature of the MIME “content types” and how they identify multipart documents. For the most part this knowledge should only be required for more complex applications, and even then it should only be the high level structure in question, and not the details of how those structures are represented. Since MIME content types are used widely in modern internet software (not just email), this will be a familiar concept to many programmers. The following sections describe the functionality of the "email" package. We start with the "message" object model, which is the primary interface an application will use, and follow that with the "parser" and "generator" components. Then we cover the "policy" controls, which completes the treatment of the main components of the library. The next three sections cover the exceptions the package may raise and the defects (non-compliance with the RFCs) that the "parser" may detect. Then we cover the "headerregistry" and the "contentmanager" sub-components, which provide tools for doing more detailed manipulation of headers and payloads, respectively. Both of these components contain features relevant to consuming and producing non- trivial messages, but also document their extensibility APIs, which will be of interest to advanced applications. Following those is a set of examples of using the fundamental parts of the APIs covered in the preceding sections. The foregoing represent the modern (unicode friendly) API of the email package. The remaining sections, starting with the "Message" class, cover the legacy "compat32" API that deals much more directly with the details of how email messages are represented. The "compat32" API does *not* hide the details of the RFCs from the application, but for applications that need to operate at that level, they can be useful tools. This documentation is also relevant for applications that are still using the "compat32" API for backward compatibility reasons. Changed in version 3.6: Docs reorganized and rewritten to promote the new "EmailMessage"/"EmailPolicy" API. Contents of the "email" package documentation: * "email.message": Representing an email message * "email.parser": Parsing email messages * FeedParser API * Parser API * Additional notes * "email.generator": Generating MIME documents * "email.policy": Policy Objects * "email.errors": Exception and Defect classes * "email.headerregistry": Custom Header Objects * "email.contentmanager": Managing MIME Content * Content Manager Instances * "email": Examples Legacy API: * "email.message.Message": Representing an email message using the "compat32" API * "email.mime": Creating email and MIME objects from scratch * "email.header": Internationalized headers * "email.charset": Representing character sets * "email.encoders": Encoders * "email.utils": Miscellaneous utilities * "email.iterators": Iterators See also: Module "smtplib" SMTP (Simple Mail Transport Protocol) client Module "poplib" POP (Post Office Protocol) client Module "imaplib" IMAP (Internet Message Access Protocol) client Module "nntplib" NNTP (Net News Transport Protocol) client Module "mailbox" Tools for creating, reading, and managing collections of messages on disk using a variety standard formats. Module "smtpd" SMTP server framework (primarily useful for testing) "email.utils": Miscellaneous utilities ************************************** **Source code:** Lib/email/utils.py ====================================================================== There are a couple of useful utilities provided in the "email.utils" module: email.utils.localtime(dt=None) Return local time as an aware datetime object. If called without arguments, return current time. Otherwise *dt* argument should be a "datetime" instance, and it is converted to the local time zone according to the system time zone database. If *dt* is naive (that is, "dt.tzinfo" is "None"), it is assumed to be in local time. In this case, a positive or zero value for *isdst* causes "localtime" to presume initially that summer time (for example, Daylight Saving Time) is or is not (respectively) in effect for the specified time. A negative value for *isdst* causes the "localtime" to attempt to divine whether summer time is in effect for the specified time. New in version 3.3. email.utils.make_msgid(idstring=None, domain=None) Returns a string suitable for an **RFC 2822**-compliant *Message- ID* header. Optional *idstring* if given, is a string used to strengthen the uniqueness of the message id. Optional *domain* if given provides the portion of the msgid after the ‘@’. The default is the local hostname. It is not normally necessary to override this default, but may be useful certain cases, such as a constructing distributed system that uses a consistent domain name across multiple hosts. Changed in version 3.2: Added the *domain* keyword. The remaining functions are part of the legacy ("Compat32") email API. There is no need to directly use these with the new API, since the parsing and formatting they provide is done automatically by the header parsing machinery of the new API. email.utils.quote(str) Return a new string with backslashes in *str* replaced by two backslashes, and double quotes replaced by backslash-double quote. email.utils.unquote(str) Return a new string which is an *unquoted* version of *str*. If *str* ends and begins with double quotes, they are stripped off. Likewise if *str* ends and begins with angle brackets, they are stripped off. email.utils.parseaddr(address, *, strict=True) Parse address – which should be the value of some address- containing field such as *To* or *Cc* – into its constituent *realname* and *email address* parts. Returns a tuple of that information, unless the parse fails, in which case a 2-tuple of "('', '')" is returned. If *strict* is true, use a strict parser which rejects malformed inputs. Changed in version 3.9.20: Add *strict* optional parameter and reject malformed inputs by default. email.utils.formataddr(pair, charset='utf-8') The inverse of "parseaddr()", this takes a 2-tuple of the form "(realname, email_address)" and returns the string value suitable for a *To* or *Cc* header. If the first element of *pair* is false, then the second element is returned unmodified. Optional *charset* is the character set that will be used in the **RFC 2047** encoding of the "realname" if the "realname" contains non-ASCII characters. Can be an instance of "str" or a "Charset". Defaults to "utf-8". Changed in version 3.3: Added the *charset* option. email.utils.getaddresses(fieldvalues, *, strict=True) This method returns a list of 2-tuples of the form returned by "parseaddr()". *fieldvalues* is a sequence of header field values as might be returned by "Message.get_all". If *strict* is true, use a strict parser which rejects malformed inputs. Here’s a simple example that gets all the recipients of a message: from email.utils import getaddresses tos = msg.get_all('to', []) ccs = msg.get_all('cc', []) resent_tos = msg.get_all('resent-to', []) resent_ccs = msg.get_all('resent-cc', []) all_recipients = getaddresses(tos + ccs + resent_tos + resent_ccs) Changed in version 3.9.20: Add *strict* optional parameter and reject malformed inputs by default. email.utils.parsedate(date) Attempts to parse a date according to the rules in **RFC 2822**. however, some mailers don’t follow that format as specified, so "parsedate()" tries to guess correctly in such cases. *date* is a string containing an **RFC 2822** date, such as ""Mon, 20 Nov 1995 19:12:08 -0500"". If it succeeds in parsing the date, "parsedate()" returns a 9-tuple that can be passed directly to "time.mktime()"; otherwise "None" will be returned. Note that indexes 6, 7, and 8 of the result tuple are not usable. email.utils.parsedate_tz(date) Performs the same function as "parsedate()", but returns either "None" or a 10-tuple; the first 9 elements make up a tuple that can be passed directly to "time.mktime()", and the tenth is the offset of the date’s timezone from UTC (which is the official term for Greenwich Mean Time) [1]. If the input string has no timezone, the last element of the tuple returned is "0", which represents UTC. Note that indexes 6, 7, and 8 of the result tuple are not usable. email.utils.parsedate_to_datetime(date) The inverse of "format_datetime()". Performs the same function as "parsedate()", but on success returns a "datetime". If the input date has a timezone of "-0000", the "datetime" will be a naive "datetime", and if the date is conforming to the RFCs it will represent a time in UTC but with no indication of the actual source timezone of the message the date comes from. If the input date has any other valid timezone offset, the "datetime" will be an aware "datetime" with the corresponding a "timezone" "tzinfo". New in version 3.3. email.utils.mktime_tz(tuple) Turn a 10-tuple as returned by "parsedate_tz()" into a UTC timestamp (seconds since the Epoch). If the timezone item in the tuple is "None", assume local time. email.utils.formatdate(timeval=None, localtime=False, usegmt=False) Returns a date string as per **RFC 2822**, e.g.: Fri, 09 Nov 2001 01:08:47 -0000 Optional *timeval* if given is a floating point time value as accepted by "time.gmtime()" and "time.localtime()", otherwise the current time is used. Optional *localtime* is a flag that when "True", interprets *timeval*, and returns a date relative to the local timezone instead of UTC, properly taking daylight savings time into account. The default is "False" meaning UTC is used. Optional *usegmt* is a flag that when "True", outputs a date string with the timezone as an ascii string "GMT", rather than a numeric "-0000". This is needed for some protocols (such as HTTP). This only applies when *localtime* is "False". The default is "False". email.utils.format_datetime(dt, usegmt=False) Like "formatdate", but the input is a "datetime" instance. If it is a naive datetime, it is assumed to be “UTC with no information about the source timezone”, and the conventional "-0000" is used for the timezone. If it is an aware "datetime", then the numeric timezone offset is used. If it is an aware timezone with offset zero, then *usegmt* may be set to "True", in which case the string "GMT" is used instead of the numeric timezone offset. This provides a way to generate standards conformant HTTP date headers. New in version 3.3. email.utils.decode_rfc2231(s) Decode the string *s* according to **RFC 2231**. email.utils.encode_rfc2231(s, charset=None, language=None) Encode the string *s* according to **RFC 2231**. Optional *charset* and *language*, if given is the character set name and language name to use. If neither is given, *s* is returned as-is. If *charset* is given but *language* is not, the string is encoded using the empty string for *language*. email.utils.collapse_rfc2231_value(value, errors='replace', fallback_charset='us-ascii') When a header parameter is encoded in **RFC 2231** format, "Message.get_param" may return a 3-tuple containing the character set, language, and value. "collapse_rfc2231_value()" turns this into a unicode string. Optional *errors* is passed to the *errors* argument of "str"’s "encode()" method; it defaults to "'replace'". Optional *fallback_charset* specifies the character set to use if the one in the **RFC 2231** header is not known by Python; it defaults to "'us-ascii'". For convenience, if the *value* passed to "collapse_rfc2231_value()" is not a tuple, it should be a string and it is returned unquoted. email.utils.decode_params(params) Decode parameters list according to **RFC 2231**. *params* is a sequence of 2-tuples containing elements of the form "(content- type, string-value)". -[ Footnotes ]- [1] Note that the sign of the timezone offset is the opposite of the sign of the "time.timezone" variable for the same timezone; the latter variable follows the POSIX standard while this module follows **RFC 2822**. "ensurepip" — Bootstrapping the "pip" installer *********************************************** New in version 3.4. ====================================================================== The "ensurepip" package provides support for bootstrapping the "pip" installer into an existing Python installation or virtual environment. This bootstrapping approach reflects the fact that "pip" is an independent project with its own release cycle, and the latest available stable version is bundled with maintenance and feature releases of the CPython reference interpreter. In most cases, end users of Python shouldn’t need to invoke this module directly (as "pip" should be bootstrapped by default), but it may be needed if installing "pip" was skipped when installing Python (or when creating a virtual environment) or after explicitly uninstalling "pip". Note: This module *does not* access the internet. All of the components needed to bootstrap "pip" are included as internal parts of the package. See also: Installing Python Modules The end user guide for installing Python packages **PEP 453**: Explicit bootstrapping of pip in Python installations The original rationale and specification for this module. Command line interface ====================== The command line interface is invoked using the interpreter’s "-m" switch. The simplest possible invocation is: python -m ensurepip This invocation will install "pip" if it is not already installed, but otherwise does nothing. To ensure the installed version of "pip" is at least as recent as the one bundled with "ensurepip", pass the "-- upgrade" option: python -m ensurepip --upgrade By default, "pip" is installed into the current virtual environment (if one is active) or into the system site packages (if there is no active virtual environment). The installation location can be controlled through two additional command line options: * "--root ": Installs "pip" relative to the given root directory rather than the root of the currently active virtual environment (if any) or the default root for the current Python installation. * "--user": Installs "pip" into the user site packages directory rather than globally for the current Python installation (this option is not permitted inside an active virtual environment). By default, the scripts "pipX" and "pipX.Y" will be installed (where X.Y stands for the version of Python used to invoke "ensurepip"). The scripts installed can be controlled through two additional command line options: * "--altinstall": if an alternate installation is requested, the "pipX" script will *not* be installed. * "--default-pip": if a “default pip” installation is requested, the "pip" script will be installed in addition to the two regular scripts. Providing both of the script selection options will trigger an exception. Module API ========== "ensurepip" exposes two functions for programmatic use: ensurepip.version() Returns a string specifying the bundled version of pip that will be installed when bootstrapping an environment. ensurepip.bootstrap(root=None, upgrade=False, user=False, altinstall=False, default_pip=False, verbosity=0) Bootstraps "pip" into the current or designated environment. *root* specifies an alternative root directory to install relative to. If *root* is "None", then installation uses the default install location for the current environment. *upgrade* indicates whether or not to upgrade an existing installation of an earlier version of "pip" to the bundled version. *user* indicates whether to use the user scheme rather than installing globally. By default, the scripts "pipX" and "pipX.Y" will be installed (where X.Y stands for the current version of Python). If *altinstall* is set, then "pipX" will *not* be installed. If *default_pip* is set, then "pip" will be installed in addition to the two regular scripts. Setting both *altinstall* and *default_pip* will trigger "ValueError". *verbosity* controls the level of output to "sys.stdout" from the bootstrapping operation. Raises an auditing event "ensurepip.bootstrap" with argument "root". Note: The bootstrapping process has side effects on both "sys.path" and "os.environ". Invoking the command line interface in a subprocess instead allows these side effects to be avoided. Note: The bootstrapping process may install additional modules required by "pip", but other software should not assume those dependencies will always be present by default (as the dependencies may be removed in a future version of "pip"). "enum" — Support for enumerations ********************************* New in version 3.4. **Source code:** Lib/enum.py ====================================================================== An enumeration is a set of symbolic names (members) bound to unique, constant values. Within an enumeration, the members can be compared by identity, and the enumeration itself can be iterated over. Note: Case of Enum MembersBecause Enums are used to represent constants we recommend using UPPER_CASE names for enum members, and will be using that style in our examples. Module Contents =============== This module defines four enumeration classes that can be used to define unique sets of names and values: "Enum", "IntEnum", "Flag", and "IntFlag". It also defines one decorator, "unique()", and one helper, "auto". class enum.Enum Base class for creating enumerated constants. See section Functional API for an alternate construction syntax. class enum.IntEnum Base class for creating enumerated constants that are also subclasses of "int". class enum.IntFlag Base class for creating enumerated constants that can be combined using the bitwise operators without losing their "IntFlag" membership. "IntFlag" members are also subclasses of "int". class enum.Flag Base class for creating enumerated constants that can be combined using the bitwise operations without losing their "Flag" membership. enum.unique() Enum class decorator that ensures only one name is bound to any one value. class enum.auto Instances are replaced with an appropriate value for Enum members. By default, the initial value starts at 1. New in version 3.6: "Flag", "IntFlag", "auto" Creating an Enum ================ Enumerations are created using the "class" syntax, which makes them easy to read and write. An alternative creation method is described in Functional API. To define an enumeration, subclass "Enum" as follows: >>> from enum import Enum >>> class Color(Enum): ... RED = 1 ... GREEN = 2 ... BLUE = 3 ... Note: Enum member valuesMember values can be anything: "int", "str", etc.. If the exact value is unimportant you may use "auto" instances and an appropriate value will be chosen for you. Care must be taken if you mix "auto" with other values. Note: Nomenclature * The class "Color" is an *enumeration* (or *enum*) * The attributes "Color.RED", "Color.GREEN", etc., are *enumeration members* (or *enum members*) and are functionally constants. * The enum members have *names* and *values* (the name of "Color.RED" is "RED", the value of "Color.BLUE" is "3", etc.) Note: Even though we use the "class" syntax to create Enums, Enums are not normal Python classes. See How are Enums different? for more details. Enumeration members have human readable string representations: >>> print(Color.RED) Color.RED …while their "repr" has more information: >>> print(repr(Color.RED)) The *type* of an enumeration member is the enumeration it belongs to: >>> type(Color.RED) >>> isinstance(Color.GREEN, Color) True >>> Enum members also have a property that contains just their item name: >>> print(Color.RED.name) RED Enumerations support iteration, in definition order: >>> class Shake(Enum): ... VANILLA = 7 ... CHOCOLATE = 4 ... COOKIES = 9 ... MINT = 3 ... >>> for shake in Shake: ... print(shake) ... Shake.VANILLA Shake.CHOCOLATE Shake.COOKIES Shake.MINT Enumeration members are hashable, so they can be used in dictionaries and sets: >>> apples = {} >>> apples[Color.RED] = 'red delicious' >>> apples[Color.GREEN] = 'granny smith' >>> apples == {Color.RED: 'red delicious', Color.GREEN: 'granny smith'} True Programmatic access to enumeration members and their attributes =============================================================== Sometimes it’s useful to access members in enumerations programmatically (i.e. situations where "Color.RED" won’t do because the exact color is not known at program-writing time). "Enum" allows such access: >>> Color(1) >>> Color(3) If you want to access enum members by *name*, use item access: >>> Color['RED'] >>> Color['GREEN'] If you have an enum member and need its "name" or "value": >>> member = Color.RED >>> member.name 'RED' >>> member.value 1 Duplicating enum members and values =================================== Having two enum members with the same name is invalid: >>> class Shape(Enum): ... SQUARE = 2 ... SQUARE = 3 ... Traceback (most recent call last): ... TypeError: Attempted to reuse key: 'SQUARE' However, two enum members are allowed to have the same value. Given two members A and B with the same value (and A defined first), B is an alias to A. By-value lookup of the value of A and B will return A. By-name lookup of B will also return A: >>> class Shape(Enum): ... SQUARE = 2 ... DIAMOND = 1 ... CIRCLE = 3 ... ALIAS_FOR_SQUARE = 2 ... >>> Shape.SQUARE >>> Shape.ALIAS_FOR_SQUARE >>> Shape(2) Note: Attempting to create a member with the same name as an already defined attribute (another member, a method, etc.) or attempting to create an attribute with the same name as a member is not allowed. Ensuring unique enumeration values ================================== By default, enumerations allow multiple names as aliases for the same value. When this behavior isn’t desired, the following decorator can be used to ensure each value is used only once in the enumeration: @enum.unique A "class" decorator specifically for enumerations. It searches an enumeration’s "__members__" gathering any aliases it finds; if any are found "ValueError" is raised with the details: >>> from enum import Enum, unique >>> @unique ... class Mistake(Enum): ... ONE = 1 ... TWO = 2 ... THREE = 3 ... FOUR = 3 ... Traceback (most recent call last): ... ValueError: duplicate values found in : FOUR -> THREE Using automatic values ====================== If the exact value is unimportant you can use "auto": >>> from enum import Enum, auto >>> class Color(Enum): ... RED = auto() ... BLUE = auto() ... GREEN = auto() ... >>> list(Color) [, , ] The values are chosen by "_generate_next_value_()", which can be overridden: >>> class AutoName(Enum): ... def _generate_next_value_(name, start, count, last_values): ... return name ... >>> class Ordinal(AutoName): ... NORTH = auto() ... SOUTH = auto() ... EAST = auto() ... WEST = auto() ... >>> list(Ordinal) [, , , ] Note: The goal of the default "_generate_next_value_()" method is to provide the next "int" in sequence with the last "int" provided, but the way it does this is an implementation detail and may change. Note: The "_generate_next_value_()" method must be defined before any members. Iteration ========= Iterating over the members of an enum does not provide the aliases: >>> list(Shape) [, , ] The special attribute "__members__" is a read-only ordered mapping of names to members. It includes all names defined in the enumeration, including the aliases: >>> for name, member in Shape.__members__.items(): ... name, member ... ('SQUARE', ) ('DIAMOND', ) ('CIRCLE', ) ('ALIAS_FOR_SQUARE', ) The "__members__" attribute can be used for detailed programmatic access to the enumeration members. For example, finding all the aliases: >>> [name for name, member in Shape.__members__.items() if member.name != name] ['ALIAS_FOR_SQUARE'] Comparisons =========== Enumeration members are compared by identity: >>> Color.RED is Color.RED True >>> Color.RED is Color.BLUE False >>> Color.RED is not Color.BLUE True Ordered comparisons between enumeration values are *not* supported. Enum members are not integers (but see IntEnum below): >>> Color.RED < Color.BLUE Traceback (most recent call last): File "", line 1, in TypeError: '<' not supported between instances of 'Color' and 'Color' Equality comparisons are defined though: >>> Color.BLUE == Color.RED False >>> Color.BLUE != Color.RED True >>> Color.BLUE == Color.BLUE True Comparisons against non-enumeration values will always compare not equal (again, "IntEnum" was explicitly designed to behave differently, see below): >>> Color.BLUE == 2 False Allowed members and attributes of enumerations ============================================== The examples above use integers for enumeration values. Using integers is short and handy (and provided by default by the Functional API), but not strictly enforced. In the vast majority of use-cases, one doesn’t care what the actual value of an enumeration is. But if the value *is* important, enumerations can have arbitrary values. Enumerations are Python classes, and can have methods and special methods as usual. If we have this enumeration: >>> class Mood(Enum): ... FUNKY = 1 ... HAPPY = 3 ... ... def describe(self): ... # self is the member here ... return self.name, self.value ... ... def __str__(self): ... return 'my custom str! {0}'.format(self.value) ... ... @classmethod ... def favorite_mood(cls): ... # cls here is the enumeration ... return cls.HAPPY ... Then: >>> Mood.favorite_mood() >>> Mood.HAPPY.describe() ('HAPPY', 3) >>> str(Mood.FUNKY) 'my custom str! 1' The rules for what is allowed are as follows: names that start and end with a single underscore are reserved by enum and cannot be used; all other attributes defined within an enumeration will become members of this enumeration, with the exception of special methods ("__str__()", "__add__()", etc.), descriptors (methods are also descriptors), and variable names listed in "_ignore_". Note: if your enumeration defines "__new__()" and/or "__init__()" then any value(s) given to the enum member will be passed into those methods. See Planet for an example. Restricted Enum subclassing =========================== A new "Enum" class must have one base Enum class, up to one concrete data type, and as many "object"-based mixin classes as needed. The order of these base classes is: class EnumName([mix-in, ...,] [data-type,] base-enum): pass Also, subclassing an enumeration is allowed only if the enumeration does not define any members. So this is forbidden: >>> class MoreColor(Color): ... PINK = 17 ... Traceback (most recent call last): ... TypeError: Cannot extend enumerations But this is allowed: >>> class Foo(Enum): ... def some_behavior(self): ... pass ... >>> class Bar(Foo): ... HAPPY = 1 ... SAD = 2 ... Allowing subclassing of enums that define members would lead to a violation of some important invariants of types and instances. On the other hand, it makes sense to allow sharing some common behavior between a group of enumerations. (See OrderedEnum for an example.) Pickling ======== Enumerations can be pickled and unpickled: >>> from test.test_enum import Fruit >>> from pickle import dumps, loads >>> Fruit.TOMATO is loads(dumps(Fruit.TOMATO)) True The usual restrictions for pickling apply: picklable enums must be defined in the top level of a module, since unpickling requires them to be importable from that module. Note: With pickle protocol version 4 it is possible to easily pickle enums nested in other classes. It is possible to modify how Enum members are pickled/unpickled by defining "__reduce_ex__()" in the enumeration class. Functional API ============== The "Enum" class is callable, providing the following functional API: >>> Animal = Enum('Animal', 'ANT BEE CAT DOG') >>> Animal >>> Animal.ANT >>> Animal.ANT.value 1 >>> list(Animal) [, , , ] The semantics of this API resemble "namedtuple". The first argument of the call to "Enum" is the name of the enumeration. The second argument is the *source* of enumeration member names. It can be a whitespace-separated string of names, a sequence of names, a sequence of 2-tuples with key/value pairs, or a mapping (e.g. dictionary) of names to values. The last two options enable assigning arbitrary values to enumerations; the others auto-assign increasing integers starting with 1 (use the "start" parameter to specify a different starting value). A new class derived from "Enum" is returned. In other words, the above assignment to "Animal" is equivalent to: >>> class Animal(Enum): ... ANT = 1 ... BEE = 2 ... CAT = 3 ... DOG = 4 ... The reason for defaulting to "1" as the starting number and not "0" is that "0" is "False" in a boolean sense, but enum members all evaluate to "True". Pickling enums created with the functional API can be tricky as frame stack implementation details are used to try and figure out which module the enumeration is being created in (e.g. it will fail if you use a utility function in separate module, and also may not work on IronPython or Jython). The solution is to specify the module name explicitly as follows: >>> Animal = Enum('Animal', 'ANT BEE CAT DOG', module=__name__) Warning: If "module" is not supplied, and Enum cannot determine what it is, the new Enum members will not be unpicklable; to keep errors closer to the source, pickling will be disabled. The new pickle protocol 4 also, in some circumstances, relies on "__qualname__" being set to the location where pickle will be able to find the class. For example, if the class was made available in class SomeData in the global scope: >>> Animal = Enum('Animal', 'ANT BEE CAT DOG', qualname='SomeData.Animal') The complete signature is: Enum(value='NewEnumName', names=<...>, *, module='...', qualname='...', type=, start=1) value: What the new Enum class will record as its name. names: The Enum members. This can be a whitespace or comma separated string (values will start at 1 unless otherwise specified): 'RED GREEN BLUE' | 'RED,GREEN,BLUE' | 'RED, GREEN, BLUE' or an iterator of names: ['RED', 'GREEN', 'BLUE'] or an iterator of (name, value) pairs: [('CYAN', 4), ('MAGENTA', 5), ('YELLOW', 6)] or a mapping: {'CHARTREUSE': 7, 'SEA_GREEN': 11, 'ROSEMARY': 42} module: name of module where new Enum class can be found. qualname: where in module new Enum class can be found. type: type to mix in to new Enum class. start: number to start counting at if only names are passed in. Changed in version 3.5: The *start* parameter was added. Derived Enumerations ==================== IntEnum ------- The first variation of "Enum" that is provided is also a subclass of "int". Members of an "IntEnum" can be compared to integers; by extension, integer enumerations of different types can also be compared to each other: >>> from enum import IntEnum >>> class Shape(IntEnum): ... CIRCLE = 1 ... SQUARE = 2 ... >>> class Request(IntEnum): ... POST = 1 ... GET = 2 ... >>> Shape == 1 False >>> Shape.CIRCLE == 1 True >>> Shape.CIRCLE == Request.POST True However, they still can’t be compared to standard "Enum" enumerations: >>> class Shape(IntEnum): ... CIRCLE = 1 ... SQUARE = 2 ... >>> class Color(Enum): ... RED = 1 ... GREEN = 2 ... >>> Shape.CIRCLE == Color.RED False "IntEnum" values behave like integers in other ways you’d expect: >>> int(Shape.CIRCLE) 1 >>> ['a', 'b', 'c'][Shape.CIRCLE] 'b' >>> [i for i in range(Shape.SQUARE)] [0, 1] IntFlag ------- The next variation of "Enum" provided, "IntFlag", is also based on "int". The difference being "IntFlag" members can be combined using the bitwise operators (&, |, ^, ~) and the result is still an "IntFlag" member. However, as the name implies, "IntFlag" members also subclass "int" and can be used wherever an "int" is used. Any operation on an "IntFlag" member besides the bit-wise operations will lose the "IntFlag" membership. New in version 3.6. Sample "IntFlag" class: >>> from enum import IntFlag >>> class Perm(IntFlag): ... R = 4 ... W = 2 ... X = 1 ... >>> Perm.R | Perm.W >>> Perm.R + Perm.W 6 >>> RW = Perm.R | Perm.W >>> Perm.R in RW True It is also possible to name the combinations: >>> class Perm(IntFlag): ... R = 4 ... W = 2 ... X = 1 ... RWX = 7 >>> Perm.RWX >>> ~Perm.RWX Another important difference between "IntFlag" and "Enum" is that if no flags are set (the value is 0), its boolean evaluation is "False": >>> Perm.R & Perm.X >>> bool(Perm.R & Perm.X) False Because "IntFlag" members are also subclasses of "int" they can be combined with them: >>> Perm.X | 8 Flag ---- The last variation is "Flag". Like "IntFlag", "Flag" members can be combined using the bitwise operators (&, |, ^, ~). Unlike "IntFlag", they cannot be combined with, nor compared against, any other "Flag" enumeration, nor "int". While it is possible to specify the values directly it is recommended to use "auto" as the value and let "Flag" select an appropriate value. New in version 3.6. Like "IntFlag", if a combination of "Flag" members results in no flags being set, the boolean evaluation is "False": >>> from enum import Flag, auto >>> class Color(Flag): ... RED = auto() ... BLUE = auto() ... GREEN = auto() ... >>> Color.RED & Color.GREEN >>> bool(Color.RED & Color.GREEN) False Individual flags should have values that are powers of two (1, 2, 4, 8, …), while combinations of flags won’t: >>> class Color(Flag): ... RED = auto() ... BLUE = auto() ... GREEN = auto() ... WHITE = RED | BLUE | GREEN ... >>> Color.WHITE Giving a name to the “no flags set” condition does not change its boolean value: >>> class Color(Flag): ... BLACK = 0 ... RED = auto() ... BLUE = auto() ... GREEN = auto() ... >>> Color.BLACK >>> bool(Color.BLACK) False Note: For the majority of new code, "Enum" and "Flag" are strongly recommended, since "IntEnum" and "IntFlag" break some semantic promises of an enumeration (by being comparable to integers, and thus by transitivity to other unrelated enumerations). "IntEnum" and "IntFlag" should be used only in cases where "Enum" and "Flag" will not do; for example, when integer constants are replaced with enumerations, or for interoperability with other systems. Others ------ While "IntEnum" is part of the "enum" module, it would be very simple to implement independently: class IntEnum(int, Enum): pass This demonstrates how similar derived enumerations can be defined; for example a "StrEnum" that mixes in "str" instead of "int". Some rules: 1. When subclassing "Enum", mix-in types must appear before "Enum" itself in the sequence of bases, as in the "IntEnum" example above. 2. While "Enum" can have members of any type, once you mix in an additional type, all the members must have values of that type, e.g. "int" above. This restriction does not apply to mix-ins which only add methods and don’t specify another type. 3. When another data type is mixed in, the "value" attribute is *not the same* as the enum member itself, although it is equivalent and will compare equal. 4. %-style formatting: *%s* and *%r* call the "Enum" class’s "__str__()" and "__repr__()" respectively; other codes (such as *%i* or *%h* for IntEnum) treat the enum member as its mixed-in type. 5. Formatted string literals, "str.format()", and "format()" will use the mixed-in type’s "__format__()" unless "__str__()" or "__format__()" is overridden in the subclass, in which case the overridden methods or "Enum" methods will be used. Use the !s and !r format codes to force usage of the "Enum" class’s "__str__()" and "__repr__()" methods. When to use "__new__()" vs. "__init__()" ======================================== "__new__()" must be used whenever you want to customize the actual value of the "Enum" member. Any other modifications may go in either "__new__()" or "__init__()", with "__init__()" being preferred. For example, if you want to pass several items to the constructor, but only want one of them to be the value: >>> class Coordinate(bytes, Enum): ... """ ... Coordinate with binary codes that can be indexed by the int code. ... """ ... def __new__(cls, value, label, unit): ... obj = bytes.__new__(cls, [value]) ... obj._value_ = value ... obj.label = label ... obj.unit = unit ... return obj ... PX = (0, 'P.X', 'km') ... PY = (1, 'P.Y', 'km') ... VX = (2, 'V.X', 'km/s') ... VY = (3, 'V.Y', 'km/s') ... >>> print(Coordinate['PY']) Coordinate.PY >>> print(Coordinate(3)) Coordinate.VY Interesting examples ==================== While "Enum", "IntEnum", "IntFlag", and "Flag" are expected to cover the majority of use-cases, they cannot cover them all. Here are recipes for some different types of enumerations that can be used directly, or as examples for creating one’s own. Omitting values --------------- In many use-cases one doesn’t care what the actual value of an enumeration is. There are several ways to define this type of simple enumeration: * use instances of "auto" for the value * use instances of "object" as the value * use a descriptive string as the value * use a tuple as the value and a custom "__new__()" to replace the tuple with an "int" value Using any of these methods signifies to the user that these values are not important, and also enables one to add, remove, or reorder members without having to renumber the remaining members. Whichever method you choose, you should provide a "repr()" that also hides the (unimportant) value: >>> class NoValue(Enum): ... def __repr__(self): ... return '<%s.%s>' % (self.__class__.__name__, self.name) ... Using "auto" ~~~~~~~~~~~~ Using "auto" would look like: >>> class Color(NoValue): ... RED = auto() ... BLUE = auto() ... GREEN = auto() ... >>> Color.GREEN Using "object" ~~~~~~~~~~~~~~ Using "object" would look like: >>> class Color(NoValue): ... RED = object() ... GREEN = object() ... BLUE = object() ... >>> Color.GREEN Using a descriptive string ~~~~~~~~~~~~~~~~~~~~~~~~~~ Using a string as the value would look like: >>> class Color(NoValue): ... RED = 'stop' ... GREEN = 'go' ... BLUE = 'too fast!' ... >>> Color.GREEN >>> Color.GREEN.value 'go' Using a custom "__new__()" ~~~~~~~~~~~~~~~~~~~~~~~~~~ Using an auto-numbering "__new__()" would look like: >>> class AutoNumber(NoValue): ... def __new__(cls): ... value = len(cls.__members__) + 1 ... obj = object.__new__(cls) ... obj._value_ = value ... return obj ... >>> class Color(AutoNumber): ... RED = () ... GREEN = () ... BLUE = () ... >>> Color.GREEN >>> Color.GREEN.value 2 To make a more general purpose "AutoNumber", add "*args" to the signature: >>> class AutoNumber(NoValue): ... def __new__(cls, *args): # this is the only change from above ... value = len(cls.__members__) + 1 ... obj = object.__new__(cls) ... obj._value_ = value ... return obj ... Then when you inherit from "AutoNumber" you can write your own "__init__" to handle any extra arguments: >>> class Swatch(AutoNumber): ... def __init__(self, pantone='unknown'): ... self.pantone = pantone ... AUBURN = '3497' ... SEA_GREEN = '1246' ... BLEACHED_CORAL = () # New color, no Pantone code yet! ... >>> Swatch.SEA_GREEN >>> Swatch.SEA_GREEN.pantone '1246' >>> Swatch.BLEACHED_CORAL.pantone 'unknown' Note: The "__new__()" method, if defined, is used during creation of the Enum members; it is then replaced by Enum’s "__new__()" which is used after class creation for lookup of existing members. OrderedEnum ----------- An ordered enumeration that is not based on "IntEnum" and so maintains the normal "Enum" invariants (such as not being comparable to other enumerations): >>> class OrderedEnum(Enum): ... def __ge__(self, other): ... if self.__class__ is other.__class__: ... return self.value >= other.value ... return NotImplemented ... def __gt__(self, other): ... if self.__class__ is other.__class__: ... return self.value > other.value ... return NotImplemented ... def __le__(self, other): ... if self.__class__ is other.__class__: ... return self.value <= other.value ... return NotImplemented ... def __lt__(self, other): ... if self.__class__ is other.__class__: ... return self.value < other.value ... return NotImplemented ... >>> class Grade(OrderedEnum): ... A = 5 ... B = 4 ... C = 3 ... D = 2 ... F = 1 ... >>> Grade.C < Grade.A True DuplicateFreeEnum ----------------- Raises an error if a duplicate member name is found instead of creating an alias: >>> class DuplicateFreeEnum(Enum): ... def __init__(self, *args): ... cls = self.__class__ ... if any(self.value == e.value for e in cls): ... a = self.name ... e = cls(self.value).name ... raise ValueError( ... "aliases not allowed in DuplicateFreeEnum: %r --> %r" ... % (a, e)) ... >>> class Color(DuplicateFreeEnum): ... RED = 1 ... GREEN = 2 ... BLUE = 3 ... GRENE = 2 ... Traceback (most recent call last): ... ValueError: aliases not allowed in DuplicateFreeEnum: 'GRENE' --> 'GREEN' Note: This is a useful example for subclassing Enum to add or change other behaviors as well as disallowing aliases. If the only desired change is disallowing aliases, the "unique()" decorator can be used instead. Planet ------ If "__new__()" or "__init__()" is defined the value of the enum member will be passed to those methods: >>> class Planet(Enum): ... MERCURY = (3.303e+23, 2.4397e6) ... VENUS = (4.869e+24, 6.0518e6) ... EARTH = (5.976e+24, 6.37814e6) ... MARS = (6.421e+23, 3.3972e6) ... JUPITER = (1.9e+27, 7.1492e7) ... SATURN = (5.688e+26, 6.0268e7) ... URANUS = (8.686e+25, 2.5559e7) ... NEPTUNE = (1.024e+26, 2.4746e7) ... def __init__(self, mass, radius): ... self.mass = mass # in kilograms ... self.radius = radius # in meters ... @property ... def surface_gravity(self): ... # universal gravitational constant (m3 kg-1 s-2) ... G = 6.67300E-11 ... return G * self.mass / (self.radius * self.radius) ... >>> Planet.EARTH.value (5.976e+24, 6378140.0) >>> Planet.EARTH.surface_gravity 9.802652743337129 TimePeriod ---------- An example to show the "_ignore_" attribute in use: >>> from datetime import timedelta >>> class Period(timedelta, Enum): ... "different lengths of time" ... _ignore_ = 'Period i' ... Period = vars() ... for i in range(367): ... Period['day_%d' % i] = i ... >>> list(Period)[:2] [, ] >>> list(Period)[-2:] [, ] How are Enums different? ======================== Enums have a custom metaclass that affects many aspects of both derived Enum classes and their instances (members). Enum Classes ------------ The "EnumMeta" metaclass is responsible for providing the "__contains__()", "__dir__()", "__iter__()" and other methods that allow one to do things with an "Enum" class that fail on a typical class, such as *list(Color)* or *some_enum_var in Color*. "EnumMeta" is responsible for ensuring that various other methods on the final "Enum" class are correct (such as "__new__()", "__getnewargs__()", "__str__()" and "__repr__()"). Enum Members (aka instances) ---------------------------- The most interesting thing about Enum members is that they are singletons. "EnumMeta" creates them all while it is creating the "Enum" class itself, and then puts a custom "__new__()" in place to ensure that no new ones are ever instantiated by returning only the existing member instances. Finer Points ------------ Supported "__dunder__" names ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ "__members__" is a read-only ordered mapping of "member_name":"member" items. It is only available on the class. "__new__()", if specified, must create and return the enum members; it is also a very good idea to set the member’s "_value_" appropriately. Once all the members are created it is no longer used. Supported "_sunder_" names ~~~~~~~~~~~~~~~~~~~~~~~~~~ * "_name_" – name of the member * "_value_" – value of the member; can be set / modified in "__new__" * "_missing_" – a lookup function used when a value is not found; may be overridden * "_ignore_" – a list of names, either as a "list" or a "str", that will not be transformed into members, and will be removed from the final class * "_order_" – used in Python 2/3 code to ensure member order is consistent (class attribute, removed during class creation) * "_generate_next_value_" – used by the Functional API and by "auto" to get an appropriate value for an enum member; may be overridden New in version 3.6: "_missing_", "_order_", "_generate_next_value_" New in version 3.7: "_ignore_" To help keep Python 2 / Python 3 code in sync an "_order_" attribute can be provided. It will be checked against the actual order of the enumeration and raise an error if the two do not match: >>> class Color(Enum): ... _order_ = 'RED GREEN BLUE' ... RED = 1 ... BLUE = 3 ... GREEN = 2 ... Traceback (most recent call last): ... TypeError: member order does not match _order_ Note: In Python 2 code the "_order_" attribute is necessary as definition order is lost before it can be recorded. _Private__names ~~~~~~~~~~~~~~~ Private names will be normal attributes in Python 3.11 instead of either an error or a member (depending on if the name ends with an underscore). Using these names in 3.9 and 3.10 will issue a "DeprecationWarning". "Enum" member type ~~~~~~~~~~~~~~~~~~ "Enum" members are instances of their "Enum" class, and are normally accessed as "EnumClass.member". Under certain circumstances they can also be accessed as "EnumClass.member.member", but you should never do this as that lookup may fail or, worse, return something besides the "Enum" member you are looking for (this is another good reason to use all-uppercase names for members): >>> class FieldTypes(Enum): ... name = 0 ... value = 1 ... size = 2 ... >>> FieldTypes.value.size >>> FieldTypes.size.value 2 Note: This behavior is deprecated and will be removed in 3.11. Changed in version 3.5. Boolean value of "Enum" classes and members ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ "Enum" members that are mixed with non-"Enum" types (such as "int", "str", etc.) are evaluated according to the mixed-in type’s rules; otherwise, all members evaluate as "True". To make your own Enum’s boolean evaluation depend on the member’s value add the following to your class: def __bool__(self): return bool(self.value) "Enum" classes always evaluate as "True". "Enum" classes with methods ~~~~~~~~~~~~~~~~~~~~~~~~~~~ If you give your "Enum" subclass extra methods, like the Planet class above, those methods will show up in a "dir()" of the member, but not of the class: >>> dir(Planet) ['EARTH', 'JUPITER', 'MARS', 'MERCURY', 'NEPTUNE', 'SATURN', 'URANUS', 'VENUS', '__class__', '__doc__', '__members__', '__module__'] >>> dir(Planet.EARTH) ['__class__', '__doc__', '__module__', 'name', 'surface_gravity', 'value'] Combining members of "Flag" ~~~~~~~~~~~~~~~~~~~~~~~~~~~ If a combination of Flag members is not named, the "repr()" will include all named flags and all named combinations of flags that are in the value: >>> class Color(Flag): ... RED = auto() ... GREEN = auto() ... BLUE = auto() ... MAGENTA = RED | BLUE ... YELLOW = RED | GREEN ... CYAN = GREEN | BLUE ... >>> Color(3) # named combination >>> Color(7) # not named combination Note: In 3.11 unnamed combinations of flags will only produce the canonical flag members (aka single-value flags). So "Color(7)" would produce something like "". "errno" — Standard errno system symbols *************************************** ====================================================================== This module makes available standard "errno" system symbols. The value of each symbol is the corresponding integer value. The names and descriptions are borrowed from "linux/include/errno.h", which should be all-inclusive. errno.errorcode Dictionary providing a mapping from the errno value to the string name in the underlying system. For instance, "errno.errorcode[errno.EPERM]" maps to "'EPERM'". To translate a numeric error code to an error message, use "os.strerror()". Of the following list, symbols that are not used on the current platform are not defined by the module. The specific list of defined symbols is available as "errno.errorcode.keys()". Symbols available can include: errno.EPERM Operation not permitted. This error is mapped to the exception "PermissionError". errno.ENOENT No such file or directory. This error is mapped to the exception "FileNotFoundError". errno.ESRCH No such process. This error is mapped to the exception "ProcessLookupError". errno.EINTR Interrupted system call. This error is mapped to the exception "InterruptedError". errno.EIO I/O error errno.ENXIO No such device or address errno.E2BIG Arg list too long errno.ENOEXEC Exec format error errno.EBADF Bad file number errno.ECHILD No child processes. This error is mapped to the exception "ChildProcessError". errno.EAGAIN Try again. This error is mapped to the exception "BlockingIOError". errno.ENOMEM Out of memory errno.EACCES Permission denied. This error is mapped to the exception "PermissionError". errno.EFAULT Bad address errno.ENOTBLK Block device required errno.EBUSY Device or resource busy errno.EEXIST File exists. This error is mapped to the exception "FileExistsError". errno.EXDEV Cross-device link errno.ENODEV No such device errno.ENOTDIR Not a directory. This error is mapped to the exception "NotADirectoryError". errno.EISDIR Is a directory. This error is mapped to the exception "IsADirectoryError". errno.EINVAL Invalid argument errno.ENFILE File table overflow errno.EMFILE Too many open files errno.ENOTTY Not a typewriter errno.ETXTBSY Text file busy errno.EFBIG File too large errno.ENOSPC No space left on device errno.ESPIPE Illegal seek errno.EROFS Read-only file system errno.EMLINK Too many links errno.EPIPE Broken pipe. This error is mapped to the exception "BrokenPipeError". errno.EDOM Math argument out of domain of func errno.ERANGE Math result not representable errno.EDEADLK Resource deadlock would occur errno.ENAMETOOLONG File name too long errno.ENOLCK No record locks available errno.ENOSYS Function not implemented errno.ENOTEMPTY Directory not empty errno.ELOOP Too many symbolic links encountered errno.EWOULDBLOCK Operation would block. This error is mapped to the exception "BlockingIOError". errno.ENOMSG No message of desired type errno.EIDRM Identifier removed errno.ECHRNG Channel number out of range errno.EL2NSYNC Level 2 not synchronized errno.EL3HLT Level 3 halted errno.EL3RST Level 3 reset errno.ELNRNG Link number out of range errno.EUNATCH Protocol driver not attached errno.ENOCSI No CSI structure available errno.EL2HLT Level 2 halted errno.EBADE Invalid exchange errno.EBADR Invalid request descriptor errno.EXFULL Exchange full errno.ENOANO No anode errno.EBADRQC Invalid request code errno.EBADSLT Invalid slot errno.EDEADLOCK File locking deadlock error errno.EBFONT Bad font file format errno.ENOSTR Device not a stream errno.ENODATA No data available errno.ETIME Timer expired errno.ENOSR Out of streams resources errno.ENONET Machine is not on the network errno.ENOPKG Package not installed errno.EREMOTE Object is remote errno.ENOLINK Link has been severed errno.EADV Advertise error errno.ESRMNT Srmount error errno.ECOMM Communication error on send errno.EPROTO Protocol error errno.EMULTIHOP Multihop attempted errno.EDOTDOT RFS specific error errno.EBADMSG Not a data message errno.EOVERFLOW Value too large for defined data type errno.ENOTUNIQ Name not unique on network errno.EBADFD File descriptor in bad state errno.EREMCHG Remote address changed errno.ELIBACC Can not access a needed shared library errno.ELIBBAD Accessing a corrupted shared library errno.ELIBSCN .lib section in a.out corrupted errno.ELIBMAX Attempting to link in too many shared libraries errno.ELIBEXEC Cannot exec a shared library directly errno.EILSEQ Illegal byte sequence errno.ERESTART Interrupted system call should be restarted errno.ESTRPIPE Streams pipe error errno.EUSERS Too many users errno.ENOTSOCK Socket operation on non-socket errno.EDESTADDRREQ Destination address required errno.EMSGSIZE Message too long errno.EPROTOTYPE Protocol wrong type for socket errno.ENOPROTOOPT Protocol not available errno.EPROTONOSUPPORT Protocol not supported errno.ESOCKTNOSUPPORT Socket type not supported errno.EOPNOTSUPP Operation not supported on transport endpoint errno.EPFNOSUPPORT Protocol family not supported errno.EAFNOSUPPORT Address family not supported by protocol errno.EADDRINUSE Address already in use errno.EADDRNOTAVAIL Cannot assign requested address errno.ENETDOWN Network is down errno.ENETUNREACH Network is unreachable errno.ENETRESET Network dropped connection because of reset errno.ECONNABORTED Software caused connection abort. This error is mapped to the exception "ConnectionAbortedError". errno.ECONNRESET Connection reset by peer. This error is mapped to the exception "ConnectionResetError". errno.ENOBUFS No buffer space available errno.EISCONN Transport endpoint is already connected errno.ENOTCONN Transport endpoint is not connected errno.ESHUTDOWN Cannot send after transport endpoint shutdown. This error is mapped to the exception "BrokenPipeError". errno.ETOOMANYREFS Too many references: cannot splice errno.ETIMEDOUT Connection timed out. This error is mapped to the exception "TimeoutError". errno.ECONNREFUSED Connection refused. This error is mapped to the exception "ConnectionRefusedError". errno.EHOSTDOWN Host is down errno.EHOSTUNREACH No route to host errno.EALREADY Operation already in progress. This error is mapped to the exception "BlockingIOError". errno.EINPROGRESS Operation now in progress. This error is mapped to the exception "BlockingIOError". errno.ESTALE Stale NFS file handle errno.EUCLEAN Structure needs cleaning errno.ENOTNAM Not a XENIX named type file errno.ENAVAIL No XENIX semaphores available errno.EISNAM Is a named type file errno.EREMOTEIO Remote I/O error errno.EDQUOT Quota exceeded Built-in Exceptions ******************* In Python, all exceptions must be instances of a class that derives from "BaseException". In a "try" statement with an "except" clause that mentions a particular class, that clause also handles any exception classes derived from that class (but not exception classes from which *it* is derived). Two exception classes that are not related via subclassing are never equivalent, even if they have the same name. The built-in exceptions listed below can be generated by the interpreter or built-in functions. Except where mentioned, they have an “associated value” indicating the detailed cause of the error. This may be a string or a tuple of several items of information (e.g., an error code and a string explaining the code). The associated value is usually passed as arguments to the exception class’s constructor. User code can raise built-in exceptions. This can be used to test an exception handler or to report an error condition “just like” the situation in which the interpreter raises the same exception; but beware that there is nothing to prevent user code from raising an inappropriate error. The built-in exception classes can be subclassed to define new exceptions; programmers are encouraged to derive new exceptions from the "Exception" class or one of its subclasses, and not from "BaseException". More information on defining exceptions is available in the Python Tutorial under User-defined Exceptions. Exception context ================= When raising a new exception while another exception is already being handled, the new exception’s "__context__" attribute is automatically set to the handled exception. An exception may be handled when an "except" or "finally" clause, or a "with" statement, is used. This implicit exception context can be supplemented with an explicit cause by using "from" with "raise": raise new_exc from original_exc The expression following "from" must be an exception or "None". It will be set as "__cause__" on the raised exception. Setting "__cause__" also implicitly sets the "__suppress_context__" attribute to "True", so that using "raise new_exc from None" effectively replaces the old exception with the new one for display purposes (e.g. converting "KeyError" to "AttributeError"), while leaving the old exception available in "__context__" for introspection when debugging. The default traceback display code shows these chained exceptions in addition to the traceback for the exception itself. An explicitly chained exception in "__cause__" is always shown when present. An implicitly chained exception in "__context__" is shown only if "__cause__" is "None" and "__suppress_context__" is false. In either case, the exception itself is always shown after any chained exceptions so that the final line of the traceback always shows the last exception that was raised. Inheriting from built-in exceptions =================================== User code can create subclasses that inherit from an exception type. It’s recommended to only subclass one exception type at a time to avoid any possible conflicts between how the bases handle the "args" attribute, as well as due to possible memory layout incompatibilities. **CPython implementation detail:** Most built-in exceptions are implemented in C for efficiency, see: Objects/exceptions.c. Some have custom memory layouts which makes it impossible to create a subclass that inherits from multiple exception types. The memory layout of a type is an implementation detail and might change between Python versions, leading to new conflicts in the future. Therefore, it’s recommended to avoid subclassing multiple exception types altogether. Base classes ============ The following exceptions are used mostly as base classes for other exceptions. exception BaseException The base class for all built-in exceptions. It is not meant to be directly inherited by user-defined classes (for that, use "Exception"). If "str()" is called on an instance of this class, the representation of the argument(s) to the instance are returned, or the empty string when there were no arguments. args The tuple of arguments given to the exception constructor. Some built-in exceptions (like "OSError") expect a certain number of arguments and assign a special meaning to the elements of this tuple, while others are usually called only with a single string giving an error message. with_traceback(tb) This method sets *tb* as the new traceback for the exception and returns the exception object. It is usually used in exception handling code like this: try: ... except SomeException: tb = sys.exc_info()[2] raise OtherException(...).with_traceback(tb) exception Exception All built-in, non-system-exiting exceptions are derived from this class. All user-defined exceptions should also be derived from this class. exception ArithmeticError The base class for those built-in exceptions that are raised for various arithmetic errors: "OverflowError", "ZeroDivisionError", "FloatingPointError". exception BufferError Raised when a buffer related operation cannot be performed. exception LookupError The base class for the exceptions that are raised when a key or index used on a mapping or sequence is invalid: "IndexError", "KeyError". This can be raised directly by "codecs.lookup()". Concrete exceptions =================== The following exceptions are the exceptions that are usually raised. exception AssertionError Raised when an "assert" statement fails. exception AttributeError Raised when an attribute reference (see Attribute references) or assignment fails. (When an object does not support attribute references or attribute assignments at all, "TypeError" is raised.) exception EOFError Raised when the "input()" function hits an end-of-file condition (EOF) without reading any data. (N.B.: the "io.IOBase.read()" and "io.IOBase.readline()" methods return an empty string when they hit EOF.) exception FloatingPointError Not currently used. exception GeneratorExit Raised when a *generator* or *coroutine* is closed; see "generator.close()" and "coroutine.close()". It directly inherits from "BaseException" instead of "Exception" since it is technically not an error. exception ImportError Raised when the "import" statement has troubles trying to load a module. Also raised when the “from list” in "from ... import" has a name that cannot be found. The "name" and "path" attributes can be set using keyword-only arguments to the constructor. When set they represent the name of the module that was attempted to be imported and the path to any file which triggered the exception, respectively. Changed in version 3.3: Added the "name" and "path" attributes. exception ModuleNotFoundError A subclass of "ImportError" which is raised by "import" when a module could not be located. It is also raised when "None" is found in "sys.modules". New in version 3.6. exception IndexError Raised when a sequence subscript is out of range. (Slice indices are silently truncated to fall in the allowed range; if an index is not an integer, "TypeError" is raised.) exception KeyError Raised when a mapping (dictionary) key is not found in the set of existing keys. exception KeyboardInterrupt Raised when the user hits the interrupt key (normally "Control-C" or "Delete"). During execution, a check for interrupts is made regularly. The exception inherits from "BaseException" so as to not be accidentally caught by code that catches "Exception" and thus prevent the interpreter from exiting. Note: Catching a "KeyboardInterrupt" requires special consideration. Because it can be raised at unpredictable points, it may, in some circumstances, leave the running program in an inconsistent state. It is generally best to allow "KeyboardInterrupt" to end the program as quickly as possible or avoid raising it entirely. (See Note on Signal Handlers and Exceptions.) exception MemoryError Raised when an operation runs out of memory but the situation may still be rescued (by deleting some objects). The associated value is a string indicating what kind of (internal) operation ran out of memory. Note that because of the underlying memory management architecture (C’s "malloc()" function), the interpreter may not always be able to completely recover from this situation; it nevertheless raises an exception so that a stack traceback can be printed, in case a run-away program was the cause. exception NameError Raised when a local or global name is not found. This applies only to unqualified names. The associated value is an error message that includes the name that could not be found. exception NotImplementedError This exception is derived from "RuntimeError". In user defined base classes, abstract methods should raise this exception when they require derived classes to override the method, or while the class is being developed to indicate that the real implementation still needs to be added. Note: It should not be used to indicate that an operator or method is not meant to be supported at all – in that case either leave the operator / method undefined or, if a subclass, set it to "None". Note: "NotImplementedError" and "NotImplemented" are not interchangeable, even though they have similar names and purposes. See "NotImplemented" for details on when to use it. exception OSError([arg]) exception OSError(errno, strerror[, filename[, winerror[, filename2]]]) This exception is raised when a system function returns a system- related error, including I/O failures such as “file not found” or “disk full” (not for illegal argument types or other incidental errors). The second form of the constructor sets the corresponding attributes, described below. The attributes default to "None" if not specified. For backwards compatibility, if three arguments are passed, the "args" attribute contains only a 2-tuple of the first two constructor arguments. The constructor often actually returns a subclass of "OSError", as described in OS exceptions below. The particular subclass depends on the final "errno" value. This behaviour only occurs when constructing "OSError" directly or via an alias, and is not inherited when subclassing. errno A numeric error code from the C variable "errno". winerror Under Windows, this gives you the native Windows error code. The "errno" attribute is then an approximate translation, in POSIX terms, of that native error code. Under Windows, if the *winerror* constructor argument is an integer, the "errno" attribute is determined from the Windows error code, and the *errno* argument is ignored. On other platforms, the *winerror* argument is ignored, and the "winerror" attribute does not exist. strerror The corresponding error message, as provided by the operating system. It is formatted by the C functions "perror()" under POSIX, and "FormatMessage()" under Windows. filename filename2 For exceptions that involve a file system path (such as "open()" or "os.unlink()"), "filename" is the file name passed to the function. For functions that involve two file system paths (such as "os.rename()"), "filename2" corresponds to the second file name passed to the function. Changed in version 3.3: "EnvironmentError", "IOError", "WindowsError", "socket.error", "select.error" and "mmap.error" have been merged into "OSError", and the constructor may return a subclass. Changed in version 3.4: The "filename" attribute is now the original file name passed to the function, instead of the name encoded to or decoded from the filesystem encoding. Also, the *filename2* constructor argument and attribute was added. exception OverflowError Raised when the result of an arithmetic operation is too large to be represented. This cannot occur for integers (which would rather raise "MemoryError" than give up). However, for historical reasons, OverflowError is sometimes raised for integers that are outside a required range. Because of the lack of standardization of floating point exception handling in C, most floating point operations are not checked. exception RecursionError This exception is derived from "RuntimeError". It is raised when the interpreter detects that the maximum recursion depth (see "sys.getrecursionlimit()") is exceeded. New in version 3.5: Previously, a plain "RuntimeError" was raised. exception ReferenceError This exception is raised when a weak reference proxy, created by the "weakref.proxy()" function, is used to access an attribute of the referent after it has been garbage collected. For more information on weak references, see the "weakref" module. exception RuntimeError Raised when an error is detected that doesn’t fall in any of the other categories. The associated value is a string indicating what precisely went wrong. exception StopIteration Raised by built-in function "next()" and an *iterator*’s "__next__()" method to signal that there are no further items produced by the iterator. The exception object has a single attribute "value", which is given as an argument when constructing the exception, and defaults to "None". When a *generator* or *coroutine* function returns, a new "StopIteration" instance is raised, and the value returned by the function is used as the "value" parameter to the constructor of the exception. If a generator code directly or indirectly raises "StopIteration", it is converted into a "RuntimeError" (retaining the "StopIteration" as the new exception’s cause). Changed in version 3.3: Added "value" attribute and the ability for generator functions to use it to return a value. Changed in version 3.5: Introduced the RuntimeError transformation via "from __future__ import generator_stop", see **PEP 479**. Changed in version 3.7: Enable **PEP 479** for all code by default: a "StopIteration" error raised in a generator is transformed into a "RuntimeError". exception StopAsyncIteration Must be raised by "__anext__()" method of an *asynchronous iterator* object to stop the iteration. New in version 3.5. exception SyntaxError(message, details) Raised when the parser encounters a syntax error. This may occur in an "import" statement, in a call to the built-in functions "compile()", "exec()", or "eval()", or when reading the initial script or standard input (also interactively). The "str()" of the exception instance returns only the error message. Details is a tuple whose members are also available as separate attributes. filename The name of the file the syntax error occurred in. lineno Which line number in the file the error occurred in. This is 1-indexed: the first line in the file has a "lineno" of 1. offset The column in the line where the error occurred. This is 1-indexed: the first character in the line has an "offset" of 1. text The source code text involved in the error. For errors in f-string fields, the message is prefixed by “f-string: ” and the offsets are offsets in a text constructed from the replacement expression. For example, compiling f’Bad {a b} field’ results in this args attribute: (‘f-string: …’, (‘’, 1, 4, ‘(a b)n’)). exception IndentationError Base class for syntax errors related to incorrect indentation. This is a subclass of "SyntaxError". exception TabError Raised when indentation contains an inconsistent use of tabs and spaces. This is a subclass of "IndentationError". exception SystemError Raised when the interpreter finds an internal error, but the situation does not look so serious to cause it to abandon all hope. The associated value is a string indicating what went wrong (in low-level terms). You should report this to the author or maintainer of your Python interpreter. Be sure to report the version of the Python interpreter ("sys.version"; it is also printed at the start of an interactive Python session), the exact error message (the exception’s associated value) and if possible the source of the program that triggered the error. exception SystemExit This exception is raised by the "sys.exit()" function. It inherits from "BaseException" instead of "Exception" so that it is not accidentally caught by code that catches "Exception". This allows the exception to properly propagate up and cause the interpreter to exit. When it is not handled, the Python interpreter exits; no stack traceback is printed. The constructor accepts the same optional argument passed to "sys.exit()". If the value is an integer, it specifies the system exit status (passed to C’s "exit()" function); if it is "None", the exit status is zero; if it has another type (such as a string), the object’s value is printed and the exit status is one. A call to "sys.exit()" is translated into an exception so that clean-up handlers ("finally" clauses of "try" statements) can be executed, and so that a debugger can execute a script without running the risk of losing control. The "os._exit()" function can be used if it is absolutely positively necessary to exit immediately (for example, in the child process after a call to "os.fork()"). code The exit status or error message that is passed to the constructor. (Defaults to "None".) exception TypeError Raised when an operation or function is applied to an object of inappropriate type. The associated value is a string giving details about the type mismatch. This exception may be raised by user code to indicate that an attempted operation on an object is not supported, and is not meant to be. If an object is meant to support a given operation but has not yet provided an implementation, "NotImplementedError" is the proper exception to raise. Passing arguments of the wrong type (e.g. passing a "list" when an "int" is expected) should result in a "TypeError", but passing arguments with the wrong value (e.g. a number outside expected boundaries) should result in a "ValueError". exception UnboundLocalError Raised when a reference is made to a local variable in a function or method, but no value has been bound to that variable. This is a subclass of "NameError". exception UnicodeError Raised when a Unicode-related encoding or decoding error occurs. It is a subclass of "ValueError". "UnicodeError" has attributes that describe the encoding or decoding error. For example, "err.object[err.start:err.end]" gives the particular invalid input that the codec failed on. encoding The name of the encoding that raised the error. reason A string describing the specific codec error. object The object the codec was attempting to encode or decode. start The first index of invalid data in "object". end The index after the last invalid data in "object". exception UnicodeEncodeError Raised when a Unicode-related error occurs during encoding. It is a subclass of "UnicodeError". exception UnicodeDecodeError Raised when a Unicode-related error occurs during decoding. It is a subclass of "UnicodeError". exception UnicodeTranslateError Raised when a Unicode-related error occurs during translating. It is a subclass of "UnicodeError". exception ValueError Raised when an operation or function receives an argument that has the right type but an inappropriate value, and the situation is not described by a more precise exception such as "IndexError". exception ZeroDivisionError Raised when the second argument of a division or modulo operation is zero. The associated value is a string indicating the type of the operands and the operation. The following exceptions are kept for compatibility with previous versions; starting from Python 3.3, they are aliases of "OSError". exception EnvironmentError exception IOError exception WindowsError Only available on Windows. OS exceptions ------------- The following exceptions are subclasses of "OSError", they get raised depending on the system error code. exception BlockingIOError Raised when an operation would block on an object (e.g. socket) set for non-blocking operation. Corresponds to "errno" "EAGAIN", "EALREADY", "EWOULDBLOCK" and "EINPROGRESS". In addition to those of "OSError", "BlockingIOError" can have one more attribute: characters_written An integer containing the number of characters written to the stream before it blocked. This attribute is available when using the buffered I/O classes from the "io" module. exception ChildProcessError Raised when an operation on a child process failed. Corresponds to "errno" "ECHILD". exception ConnectionError A base class for connection-related issues. Subclasses are "BrokenPipeError", "ConnectionAbortedError", "ConnectionRefusedError" and "ConnectionResetError". exception BrokenPipeError A subclass of "ConnectionError", raised when trying to write on a pipe while the other end has been closed, or trying to write on a socket which has been shutdown for writing. Corresponds to "errno" "EPIPE" and "ESHUTDOWN". exception ConnectionAbortedError A subclass of "ConnectionError", raised when a connection attempt is aborted by the peer. Corresponds to "errno" "ECONNABORTED". exception ConnectionRefusedError A subclass of "ConnectionError", raised when a connection attempt is refused by the peer. Corresponds to "errno" "ECONNREFUSED". exception ConnectionResetError A subclass of "ConnectionError", raised when a connection is reset by the peer. Corresponds to "errno" "ECONNRESET". exception FileExistsError Raised when trying to create a file or directory which already exists. Corresponds to "errno" "EEXIST". exception FileNotFoundError Raised when a file or directory is requested but doesn’t exist. Corresponds to "errno" "ENOENT". exception InterruptedError Raised when a system call is interrupted by an incoming signal. Corresponds to "errno" "EINTR". Changed in version 3.5: Python now retries system calls when a syscall is interrupted by a signal, except if the signal handler raises an exception (see **PEP 475** for the rationale), instead of raising "InterruptedError". exception IsADirectoryError Raised when a file operation (such as "os.remove()") is requested on a directory. Corresponds to "errno" "EISDIR". exception NotADirectoryError Raised when a directory operation (such as "os.listdir()") is requested on something which is not a directory. On most POSIX platforms, it may also be raised if an operation attempts to open or traverse a non-directory file as if it were a directory. Corresponds to "errno" "ENOTDIR". exception PermissionError Raised when trying to run an operation without the adequate access rights - for example filesystem permissions. Corresponds to "errno" "EACCES" and "EPERM". exception ProcessLookupError Raised when a given process doesn’t exist. Corresponds to "errno" "ESRCH". exception TimeoutError Raised when a system function timed out at the system level. Corresponds to "errno" "ETIMEDOUT". New in version 3.3: All the above "OSError" subclasses were added. See also: **PEP 3151** - Reworking the OS and IO exception hierarchy Warnings ======== The following exceptions are used as warning categories; see the Warning Categories documentation for more details. exception Warning Base class for warning categories. exception UserWarning Base class for warnings generated by user code. exception DeprecationWarning Base class for warnings about deprecated features when those warnings are intended for other Python developers. Ignored by the default warning filters, except in the "__main__" module (**PEP 565**). Enabling the Python Development Mode shows this warning. The deprecation policy is described in **PEP 387**. exception PendingDeprecationWarning Base class for warnings about features which are obsolete and expected to be deprecated in the future, but are not deprecated at the moment. This class is rarely used as emitting a warning about a possible upcoming deprecation is unusual, and "DeprecationWarning" is preferred for already active deprecations. Ignored by the default warning filters. Enabling the Python Development Mode shows this warning. The deprecation policy is described in **PEP 387**. exception SyntaxWarning Base class for warnings about dubious syntax. exception RuntimeWarning Base class for warnings about dubious runtime behavior. exception FutureWarning Base class for warnings about deprecated features when those warnings are intended for end users of applications that are written in Python. exception ImportWarning Base class for warnings about probable mistakes in module imports. Ignored by the default warning filters. Enabling the Python Development Mode shows this warning. exception UnicodeWarning Base class for warnings related to Unicode. exception BytesWarning Base class for warnings related to "bytes" and "bytearray". exception ResourceWarning Base class for warnings related to resource usage. Ignored by the default warning filters. Enabling the Python Development Mode shows this warning. New in version 3.2. Exception hierarchy =================== The class hierarchy for built-in exceptions is: BaseException +-- SystemExit +-- KeyboardInterrupt +-- GeneratorExit +-- Exception +-- StopIteration +-- StopAsyncIteration +-- ArithmeticError | +-- FloatingPointError | +-- OverflowError | +-- ZeroDivisionError +-- AssertionError +-- AttributeError +-- BufferError +-- EOFError +-- ImportError | +-- ModuleNotFoundError +-- LookupError | +-- IndexError | +-- KeyError +-- MemoryError +-- NameError | +-- UnboundLocalError +-- OSError | +-- BlockingIOError | +-- ChildProcessError | +-- ConnectionError | | +-- BrokenPipeError | | +-- ConnectionAbortedError | | +-- ConnectionRefusedError | | +-- ConnectionResetError | +-- FileExistsError | +-- FileNotFoundError | +-- InterruptedError | +-- IsADirectoryError | +-- NotADirectoryError | +-- PermissionError | +-- ProcessLookupError | +-- TimeoutError +-- ReferenceError +-- RuntimeError | +-- NotImplementedError | +-- RecursionError +-- SyntaxError | +-- IndentationError | +-- TabError +-- SystemError +-- TypeError +-- ValueError | +-- UnicodeError | +-- UnicodeDecodeError | +-- UnicodeEncodeError | +-- UnicodeTranslateError +-- Warning +-- DeprecationWarning +-- PendingDeprecationWarning +-- RuntimeWarning +-- SyntaxWarning +-- UserWarning +-- FutureWarning +-- ImportWarning +-- UnicodeWarning +-- BytesWarning +-- ResourceWarning "faulthandler" — Dump the Python traceback ****************************************** New in version 3.3. ====================================================================== This module contains functions to dump Python tracebacks explicitly, on a fault, after a timeout, or on a user signal. Call "faulthandler.enable()" to install fault handlers for the "SIGSEGV", "SIGFPE", "SIGABRT", "SIGBUS", and "SIGILL" signals. You can also enable them at startup by setting the "PYTHONFAULTHANDLER" environment variable or by using the "-X" "faulthandler" command line option. The fault handler is compatible with system fault handlers like Apport or the Windows fault handler. The module uses an alternative stack for signal handlers if the "sigaltstack()" function is available. This allows it to dump the traceback even on a stack overflow. The fault handler is called on catastrophic cases and therefore can only use signal-safe functions (e.g. it cannot allocate memory on the heap). Because of this limitation traceback dumping is minimal compared to normal Python tracebacks: * Only ASCII is supported. The "backslashreplace" error handler is used on encoding. * Each string is limited to 500 characters. * Only the filename, the function name and the line number are displayed. (no source code) * It is limited to 100 frames and 100 threads. * The order is reversed: the most recent call is shown first. By default, the Python traceback is written to "sys.stderr". To see tracebacks, applications must be run in the terminal. A log file can alternatively be passed to "faulthandler.enable()". The module is implemented in C, so tracebacks can be dumped on a crash or when Python is deadlocked. The Python Development Mode calls "faulthandler.enable()" at Python startup. Dumping the traceback ===================== faulthandler.dump_traceback(file=sys.stderr, all_threads=True) Dump the tracebacks of all threads into *file*. If *all_threads* is "False", dump only the current thread. Changed in version 3.5: Added support for passing file descriptor to this function. Fault handler state =================== faulthandler.enable(file=sys.stderr, all_threads=True) Enable the fault handler: install handlers for the "SIGSEGV", "SIGFPE", "SIGABRT", "SIGBUS" and "SIGILL" signals to dump the Python traceback. If *all_threads* is "True", produce tracebacks for every running thread. Otherwise, dump only the current thread. The *file* must be kept open until the fault handler is disabled: see issue with file descriptors. Changed in version 3.5: Added support for passing file descriptor to this function. Changed in version 3.6: On Windows, a handler for Windows exception is also installed. faulthandler.disable() Disable the fault handler: uninstall the signal handlers installed by "enable()". faulthandler.is_enabled() Check if the fault handler is enabled. Dumping the tracebacks after a timeout ====================================== faulthandler.dump_traceback_later(timeout, repeat=False, file=sys.stderr, exit=False) Dump the tracebacks of all threads, after a timeout of *timeout* seconds, or every *timeout* seconds if *repeat* is "True". If *exit* is "True", call "_exit()" with status=1 after dumping the tracebacks. (Note "_exit()" exits the process immediately, which means it doesn’t do any cleanup like flushing file buffers.) If the function is called twice, the new call replaces previous parameters and resets the timeout. The timer has a sub-second resolution. The *file* must be kept open until the traceback is dumped or "cancel_dump_traceback_later()" is called: see issue with file descriptors. This function is implemented using a watchdog thread. Changed in version 3.7: This function is now always available. Changed in version 3.5: Added support for passing file descriptor to this function. faulthandler.cancel_dump_traceback_later() Cancel the last call to "dump_traceback_later()". Dumping the traceback on a user signal ====================================== faulthandler.register(signum, file=sys.stderr, all_threads=True, chain=False) Register a user signal: install a handler for the *signum* signal to dump the traceback of all threads, or of the current thread if *all_threads* is "False", into *file*. Call the previous handler if chain is "True". The *file* must be kept open until the signal is unregistered by "unregister()": see issue with file descriptors. Not available on Windows. Changed in version 3.5: Added support for passing file descriptor to this function. faulthandler.unregister(signum) Unregister a user signal: uninstall the handler of the *signum* signal installed by "register()". Return "True" if the signal was registered, "False" otherwise. Not available on Windows. Issue with file descriptors =========================== "enable()", "dump_traceback_later()" and "register()" keep the file descriptor of their *file* argument. If the file is closed and its file descriptor is reused by a new file, or if "os.dup2()" is used to replace the file descriptor, the traceback will be written into a different file. Call these functions again each time that the file is replaced. Example ======= Example of a segmentation fault on Linux with and without enabling the fault handler: $ python3 -c "import ctypes; ctypes.string_at(0)" Segmentation fault $ python3 -q -X faulthandler >>> import ctypes >>> ctypes.string_at(0) Fatal Python error: Segmentation fault Current thread 0x00007fb899f39700 (most recent call first): File "/home/python/cpython/Lib/ctypes/__init__.py", line 486 in string_at File "", line 1 in Segmentation fault "fcntl" — The "fcntl" and "ioctl" system calls ********************************************** ====================================================================== This module performs file control and I/O control on file descriptors. It is an interface to the "fcntl()" and "ioctl()" Unix routines. For a complete description of these calls, see *fcntl(2)* and *ioctl(2)* Unix manual pages. All functions in this module take a file descriptor *fd* as their first argument. This can be an integer file descriptor, such as returned by "sys.stdin.fileno()", or an "io.IOBase" object, such as "sys.stdin" itself, which provides a "fileno()" that returns a genuine file descriptor. Changed in version 3.3: Operations in this module used to raise an "IOError" where they now raise an "OSError". Changed in version 3.8: The fcntl module now contains "F_ADD_SEALS", "F_GET_SEALS", and "F_SEAL_*" constants for sealing of "os.memfd_create()" file descriptors. Changed in version 3.9: On macOS, the fcntl module exposes the "F_GETPATH" constant, which obtains the path of a file from a file descriptor. On Linux(>=3.15), the fcntl module exposes the "F_OFD_GETLK", "F_OFD_SETLK" and "F_OFD_SETLKW" constants, which are used when working with open file description locks. The module defines the following functions: fcntl.fcntl(fd, cmd, arg=0) Perform the operation *cmd* on file descriptor *fd* (file objects providing a "fileno()" method are accepted as well). The values used for *cmd* are operating system dependent, and are available as constants in the "fcntl" module, using the same names as used in the relevant C header files. The argument *arg* can either be an integer value, or a "bytes" object. With an integer value, the return value of this function is the integer return value of the C "fcntl()" call. When the argument is bytes it represents a binary structure, e.g. created by "struct.pack()". The binary data is copied to a buffer whose address is passed to the C "fcntl()" call. The return value after a successful call is the contents of the buffer, converted to a "bytes" object. The length of the returned object will be the same as the length of the *arg* argument. This is limited to 1024 bytes. If the information returned in the buffer by the operating system is larger than 1024 bytes, this is most likely to result in a segmentation violation or a more subtle data corruption. If the "fcntl()" fails, an "OSError" is raised. Raises an auditing event "fcntl.fcntl" with arguments "fd", "cmd", "arg". fcntl.ioctl(fd, request, arg=0, mutate_flag=True) This function is identical to the "fcntl()" function, except that the argument handling is even more complicated. The *request* parameter is limited to values that can fit in 32-bits. Additional constants of interest for use as the *request* argument can be found in the "termios" module, under the same names as used in the relevant C header files. The parameter *arg* can be one of an integer, an object supporting the read-only buffer interface (like "bytes") or an object supporting the read-write buffer interface (like "bytearray"). In all but the last case, behaviour is as for the "fcntl()" function. If a mutable buffer is passed, then the behaviour is determined by the value of the *mutate_flag* parameter. If it is false, the buffer’s mutability is ignored and behaviour is as for a read-only buffer, except that the 1024 byte limit mentioned above is avoided – so long as the buffer you pass is at least as long as what the operating system wants to put there, things should work. If *mutate_flag* is true (the default), then the buffer is (in effect) passed to the underlying "ioctl()" system call, the latter’s return code is passed back to the calling Python, and the buffer’s new contents reflect the action of the "ioctl()". This is a slight simplification, because if the supplied buffer is less than 1024 bytes long it is first copied into a static buffer 1024 bytes long which is then passed to "ioctl()" and copied back into the supplied buffer. If the "ioctl()" fails, an "OSError" exception is raised. An example: >>> import array, fcntl, struct, termios, os >>> os.getpgrp() 13341 >>> struct.unpack('h', fcntl.ioctl(0, termios.TIOCGPGRP, " "))[0] 13341 >>> buf = array.array('h', [0]) >>> fcntl.ioctl(0, termios.TIOCGPGRP, buf, 1) 0 >>> buf array('h', [13341]) Raises an auditing event "fcntl.ioctl" with arguments "fd", "request", "arg". fcntl.flock(fd, operation) Perform the lock operation *operation* on file descriptor *fd* (file objects providing a "fileno()" method are accepted as well). See the Unix manual *flock(2)* for details. (On some systems, this function is emulated using "fcntl()".) If the "flock()" fails, an "OSError" exception is raised. Raises an auditing event "fcntl.flock" with arguments "fd", "operation". fcntl.lockf(fd, cmd, len=0, start=0, whence=0) This is essentially a wrapper around the "fcntl()" locking calls. *fd* is the file descriptor (file objects providing a "fileno()" method are accepted as well) of the file to lock or unlock, and *cmd* is one of the following values: * "LOCK_UN" – unlock * "LOCK_SH" – acquire a shared lock * "LOCK_EX" – acquire an exclusive lock When *cmd* is "LOCK_SH" or "LOCK_EX", it can also be bitwise ORed with "LOCK_NB" to avoid blocking on lock acquisition. If "LOCK_NB" is used and the lock cannot be acquired, an "OSError" will be raised and the exception will have an *errno* attribute set to "EACCES" or "EAGAIN" (depending on the operating system; for portability, check for both values). On at least some systems, "LOCK_EX" can only be used if the file descriptor refers to a file opened for writing. *len* is the number of bytes to lock, *start* is the byte offset at which the lock starts, relative to *whence*, and *whence* is as with "io.IOBase.seek()", specifically: * "0" – relative to the start of the file ("os.SEEK_SET") * "1" – relative to the current buffer position ("os.SEEK_CUR") * "2" – relative to the end of the file ("os.SEEK_END") The default for *start* is 0, which means to start at the beginning of the file. The default for *len* is 0 which means to lock to the end of the file. The default for *whence* is also 0. Raises an auditing event "fcntl.lockf" with arguments "fd", "cmd", "len", "start", "whence". Examples (all on a SVR4 compliant system): import struct, fcntl, os f = open(...) rv = fcntl.fcntl(f, fcntl.F_SETFL, os.O_NDELAY) lockdata = struct.pack('hhllhh', fcntl.F_WRLCK, 0, 0, 0, 0, 0) rv = fcntl.fcntl(f, fcntl.F_SETLKW, lockdata) Note that in the first example the return value variable *rv* will hold an integer value; in the second example it will hold a "bytes" object. The structure lay-out for the *lockdata* variable is system dependent — therefore using the "flock()" call may be better. See also: Module "os" If the locking flags "O_SHLOCK" and "O_EXLOCK" are present in the "os" module (on BSD only), the "os.open()" function provides an alternative to the "lockf()" and "flock()" functions. "filecmp" — File and Directory Comparisons ****************************************** **Source code:** Lib/filecmp.py ====================================================================== The "filecmp" module defines functions to compare files and directories, with various optional time/correctness trade-offs. For comparing files, see also the "difflib" module. The "filecmp" module defines the following functions: filecmp.cmp(f1, f2, shallow=True) Compare the files named *f1* and *f2*, returning "True" if they seem equal, "False" otherwise. If *shallow* is true and the "os.stat()" signatures (file type, size, and modification time) of both files are identical, the files are taken to be equal. Otherwise, the files are treated as different if their sizes or contents differ. Note that no external programs are called from this function, giving it portability and efficiency. This function uses a cache for past comparisons and the results, with cache entries invalidated if the "os.stat()" information for the file changes. The entire cache may be cleared using "clear_cache()". filecmp.cmpfiles(dir1, dir2, common, shallow=True) Compare the files in the two directories *dir1* and *dir2* whose names are given by *common*. Returns three lists of file names: *match*, *mismatch*, *errors*. *match* contains the list of files that match, *mismatch* contains the names of those that don’t, and *errors* lists the names of files which could not be compared. Files are listed in *errors* if they don’t exist in one of the directories, the user lacks permission to read them or if the comparison could not be done for some other reason. The *shallow* parameter has the same meaning and default value as for "filecmp.cmp()". For example, "cmpfiles('a', 'b', ['c', 'd/e'])" will compare "a/c" with "b/c" and "a/d/e" with "b/d/e". "'c'" and "'d/e'" will each be in one of the three returned lists. filecmp.clear_cache() Clear the filecmp cache. This may be useful if a file is compared so quickly after it is modified that it is within the mtime resolution of the underlying filesystem. New in version 3.4. The "dircmp" class ================== class filecmp.dircmp(a, b, ignore=None, hide=None) Construct a new directory comparison object, to compare the directories *a* and *b*. *ignore* is a list of names to ignore, and defaults to "filecmp.DEFAULT_IGNORES". *hide* is a list of names to hide, and defaults to "[os.curdir, os.pardir]". The "dircmp" class compares files by doing *shallow* comparisons as described for "filecmp.cmp()". The "dircmp" class provides the following methods: report() Print (to "sys.stdout") a comparison between *a* and *b*. report_partial_closure() Print a comparison between *a* and *b* and common immediate subdirectories. report_full_closure() Print a comparison between *a* and *b* and common subdirectories (recursively). The "dircmp" class offers a number of interesting attributes that may be used to get various bits of information about the directory trees being compared. Note that via "__getattr__()" hooks, all attributes are computed lazily, so there is no speed penalty if only those attributes which are lightweight to compute are used. left The directory *a*. right The directory *b*. left_list Files and subdirectories in *a*, filtered by *hide* and *ignore*. right_list Files and subdirectories in *b*, filtered by *hide* and *ignore*. common Files and subdirectories in both *a* and *b*. left_only Files and subdirectories only in *a*. right_only Files and subdirectories only in *b*. common_dirs Subdirectories in both *a* and *b*. common_files Files in both *a* and *b*. common_funny Names in both *a* and *b*, such that the type differs between the directories, or names for which "os.stat()" reports an error. same_files Files which are identical in both *a* and *b*, using the class’s file comparison operator. diff_files Files which are in both *a* and *b*, whose contents differ according to the class’s file comparison operator. funny_files Files which are in both *a* and *b*, but could not be compared. subdirs A dictionary mapping names in "common_dirs" to "dircmp" objects. filecmp.DEFAULT_IGNORES New in version 3.4. List of directories ignored by "dircmp" by default. Here is a simplified example of using the "subdirs" attribute to search recursively through two directories to show common different files: >>> from filecmp import dircmp >>> def print_diff_files(dcmp): ... for name in dcmp.diff_files: ... print("diff_file %s found in %s and %s" % (name, dcmp.left, ... dcmp.right)) ... for sub_dcmp in dcmp.subdirs.values(): ... print_diff_files(sub_dcmp) ... >>> dcmp = dircmp('dir1', 'dir2') >>> print_diff_files(dcmp) File Formats ************ The modules described in this chapter parse various miscellaneous file formats that aren’t markup languages and are not related to e-mail. * "csv" — CSV File Reading and Writing * Module Contents * Dialects and Formatting Parameters * Reader Objects * Writer Objects * Examples * "configparser" — Configuration file parser * Quick Start * Supported Datatypes * Fallback Values * Supported INI File Structure * Interpolation of values * Mapping Protocol Access * Customizing Parser Behaviour * Legacy API Examples * ConfigParser Objects * RawConfigParser Objects * Exceptions * "netrc" — netrc file processing * netrc Objects * "plistlib" — Generate and parse Apple ".plist" files * Examples "fileinput" — Iterate over lines from multiple input streams ************************************************************ **Source code:** Lib/fileinput.py ====================================================================== This module implements a helper class and functions to quickly write a loop over standard input or a list of files. If you just want to read or write one file see "open()". The typical use is: import fileinput for line in fileinput.input(): process(line) This iterates over the lines of all files listed in "sys.argv[1:]", defaulting to "sys.stdin" if the list is empty. If a filename is "'-'", it is also replaced by "sys.stdin" and the optional arguments *mode* and *openhook* are ignored. To specify an alternative list of filenames, pass it as the first argument to "input()". A single file name is also allowed. All files are opened in text mode by default, but you can override this by specifying the *mode* parameter in the call to "input()" or "FileInput". If an I/O error occurs during opening or reading a file, "OSError" is raised. Changed in version 3.3: "IOError" used to be raised; it is now an alias of "OSError". If "sys.stdin" is used more than once, the second and further use will return no lines, except perhaps for interactive use, or if it has been explicitly reset (e.g. using "sys.stdin.seek(0)"). Empty files are opened and immediately closed; the only time their presence in the list of filenames is noticeable at all is when the last file opened is empty. Lines are returned with any newlines intact, which means that the last line in a file may not have one. You can control how files are opened by providing an opening hook via the *openhook* parameter to "fileinput.input()" or "FileInput()". The hook must be a function that takes two arguments, *filename* and *mode*, and returns an accordingly opened file-like object. Two useful hooks are already provided by this module. The following function is the primary interface of this module: fileinput.input(files=None, inplace=False, backup='', *, mode='r', openhook=None) Create an instance of the "FileInput" class. The instance will be used as global state for the functions of this module, and is also returned to use during iteration. The parameters to this function will be passed along to the constructor of the "FileInput" class. The "FileInput" instance can be used as a context manager in the "with" statement. In this example, *input* is closed after the "with" statement is exited, even if an exception occurs: with fileinput.input(files=('spam.txt', 'eggs.txt')) as f: for line in f: process(line) Changed in version 3.2: Can be used as a context manager. Changed in version 3.8: The keyword parameters *mode* and *openhook* are now keyword-only. The following functions use the global state created by "fileinput.input()"; if there is no active state, "RuntimeError" is raised. fileinput.filename() Return the name of the file currently being read. Before the first line has been read, returns "None". fileinput.fileno() Return the integer “file descriptor” for the current file. When no file is opened (before the first line and between files), returns "-1". fileinput.lineno() Return the cumulative line number of the line that has just been read. Before the first line has been read, returns "0". After the last line of the last file has been read, returns the line number of that line. fileinput.filelineno() Return the line number in the current file. Before the first line has been read, returns "0". After the last line of the last file has been read, returns the line number of that line within the file. fileinput.isfirstline() Return "True" if the line just read is the first line of its file, otherwise return "False". fileinput.isstdin() Return "True" if the last line was read from "sys.stdin", otherwise return "False". fileinput.nextfile() Close the current file so that the next iteration will read the first line from the next file (if any); lines not read from the file will not count towards the cumulative line count. The filename is not changed until after the first line of the next file has been read. Before the first line has been read, this function has no effect; it cannot be used to skip the first file. After the last line of the last file has been read, this function has no effect. fileinput.close() Close the sequence. The class which implements the sequence behavior provided by the module is available for subclassing as well: class fileinput.FileInput(files=None, inplace=False, backup='', *, mode='r', openhook=None) Class "FileInput" is the implementation; its methods "filename()", "fileno()", "lineno()", "filelineno()", "isfirstline()", "isstdin()", "nextfile()" and "close()" correspond to the functions of the same name in the module. In addition it has a "readline()" method which returns the next input line, and a "__getitem__()" method which implements the sequence behavior. The sequence must be accessed in strictly sequential order; random access and "readline()" cannot be mixed. With *mode* you can specify which file mode will be passed to "open()". It must be one of "'r'", "'rU'", "'U'" and "'rb'". The *openhook*, when given, must be a function that takes two arguments, *filename* and *mode*, and returns an accordingly opened file-like object. You cannot use *inplace* and *openhook* together. A "FileInput" instance can be used as a context manager in the "with" statement. In this example, *input* is closed after the "with" statement is exited, even if an exception occurs: with FileInput(files=('spam.txt', 'eggs.txt')) as input: process(input) Changed in version 3.2: Can be used as a context manager. Deprecated since version 3.4: The "'rU'" and "'U'" modes. Deprecated since version 3.8: Support for "__getitem__()" method is deprecated. Changed in version 3.8: The keyword parameter *mode* and *openhook* are now keyword-only. **Optional in-place filtering:** if the keyword argument "inplace=True" is passed to "fileinput.input()" or to the "FileInput" constructor, the file is moved to a backup file and standard output is directed to the input file (if a file of the same name as the backup file already exists, it will be replaced silently). This makes it possible to write a filter that rewrites its input file in place. If the *backup* parameter is given (typically as "backup='.'"), it specifies the extension for the backup file, and the backup file remains around; by default, the extension is "'.bak'" and it is deleted when the output file is closed. In-place filtering is disabled when standard input is read. The two following opening hooks are provided by this module: fileinput.hook_compressed(filename, mode) Transparently opens files compressed with gzip and bzip2 (recognized by the extensions "'.gz'" and "'.bz2'") using the "gzip" and "bz2" modules. If the filename extension is not "'.gz'" or "'.bz2'", the file is opened normally (ie, using "open()" without any decompression). Usage example: "fi = fileinput.FileInput(openhook=fileinput.hook_compressed)" fileinput.hook_encoded(encoding, errors=None) Returns a hook which opens each file with "open()", using the given *encoding* and *errors* to read the file. Usage example: "fi = fileinput.FileInput(openhook=fileinput.hook_encoded("utf-8", "surrogateescape"))" Changed in version 3.6: Added the optional *errors* parameter. File and Directory Access ************************* The modules described in this chapter deal with disk files and directories. For example, there are modules for reading the properties of files, manipulating paths in a portable way, and creating temporary files. The full list of modules in this chapter is: * "pathlib" — Object-oriented filesystem paths * Basic use * Pure paths * General properties * Operators * Accessing individual parts * Methods and properties * Concrete paths * Methods * Correspondence to tools in the "os" module * "os.path" — Common pathname manipulations * "fileinput" — Iterate over lines from multiple input streams * "stat" — Interpreting "stat()" results * "filecmp" — File and Directory Comparisons * The "dircmp" class * "tempfile" — Generate temporary files and directories * Examples * Deprecated functions and variables * "glob" — Unix style pathname pattern expansion * "fnmatch" — Unix filename pattern matching * "linecache" — Random access to text lines * "shutil" — High-level file operations * Directory and files operations * Platform-dependent efficient copy operations * copytree example * rmtree example * Archiving operations * Archiving example * Archiving example with *base_dir* * Querying the size of the output terminal See also: Module "os" Operating system interfaces, including functions to work with files at a lower level than Python *file objects*. Module "io" Python’s built-in I/O library, including both abstract classes and some concrete classes such as file I/O. Built-in function "open()" The standard way to open files for reading and writing with Python. "fnmatch" — Unix filename pattern matching ****************************************** **Source code:** Lib/fnmatch.py ====================================================================== This module provides support for Unix shell-style wildcards, which are *not* the same as regular expressions (which are documented in the "re" module). The special characters used in shell-style wildcards are: +--------------+--------------------------------------+ | Pattern | Meaning | |==============|======================================| | "*" | matches everything | +--------------+--------------------------------------+ | "?" | matches any single character | +--------------+--------------------------------------+ | "[seq]" | matches any character in *seq* | +--------------+--------------------------------------+ | "[!seq]" | matches any character not in *seq* | +--------------+--------------------------------------+ For a literal match, wrap the meta-characters in brackets. For example, "'[?]'" matches the character "'?'". Note that the filename separator ("'/'" on Unix) is *not* special to this module. See module "glob" for pathname expansion ("glob" uses "filter()" to match pathname segments). Similarly, filenames starting with a period are not special for this module, and are matched by the "*" and "?" patterns. fnmatch.fnmatch(filename, pattern) Test whether the *filename* string matches the *pattern* string, returning "True" or "False". Both parameters are case-normalized using "os.path.normcase()". "fnmatchcase()" can be used to perform a case-sensitive comparison, regardless of whether that’s standard for the operating system. This example will print all file names in the current directory with the extension ".txt": import fnmatch import os for file in os.listdir('.'): if fnmatch.fnmatch(file, '*.txt'): print(file) fnmatch.fnmatchcase(filename, pattern) Test whether *filename* matches *pattern*, returning "True" or "False"; the comparison is case-sensitive and does not apply "os.path.normcase()". fnmatch.filter(names, pattern) Construct a list from those elements of the iterable *names* that match *pattern*. It is the same as "[n for n in names if fnmatch(n, pattern)]", but implemented more efficiently. fnmatch.translate(pattern) Return the shell-style *pattern* converted to a regular expression for using with "re.match()". Example: >>> import fnmatch, re >>> >>> regex = fnmatch.translate('*.txt') >>> regex '(?s:.*\\.txt)\\Z' >>> reobj = re.compile(regex) >>> reobj.match('foobar.txt') See also: Module "glob" Unix shell-style path expansion. "formatter" — Generic output formatting *************************************** Deprecated since version 3.4: Due to lack of usage, the formatter module has been deprecated. ====================================================================== This module supports two interface definitions, each with multiple implementations: The *formatter* interface, and the *writer* interface which is required by the formatter interface. Formatter objects transform an abstract flow of formatting events into specific output events on writer objects. Formatters manage several stack structures to allow various properties of a writer object to be changed and restored; writers need not be able to handle relative changes nor any sort of “change back” operation. Specific writer properties which may be controlled via formatter objects are horizontal alignment, font, and left margin indentations. A mechanism is provided which supports providing arbitrary, non-exclusive style settings to a writer as well. Additional interfaces facilitate formatting events which are not reversible, such as paragraph separation. Writer objects encapsulate device interfaces. Abstract devices, such as file formats, are supported as well as physical devices. The provided implementations all work with abstract devices. The interface makes available mechanisms for setting the properties which formatter objects manage and inserting data into the output. The Formatter Interface ======================= Interfaces to create formatters are dependent on the specific formatter class being instantiated. The interfaces described below are the required interfaces which all formatters must support once initialized. One data element is defined at the module level: formatter.AS_IS Value which can be used in the font specification passed to the "push_font()" method described below, or as the new value to any other "push_property()" method. Pushing the "AS_IS" value allows the corresponding "pop_property()" method to be called without having to track whether the property was changed. The following attributes are defined for formatter instance objects: formatter.writer The writer instance with which the formatter interacts. formatter.end_paragraph(blanklines) Close any open paragraphs and insert at least *blanklines* before the next paragraph. formatter.add_line_break() Add a hard line break if one does not already exist. This does not break the logical paragraph. formatter.add_hor_rule(*args, **kw) Insert a horizontal rule in the output. A hard break is inserted if there is data in the current paragraph, but the logical paragraph is not broken. The arguments and keywords are passed on to the writer’s "send_line_break()" method. formatter.add_flowing_data(data) Provide data which should be formatted with collapsed whitespace. Whitespace from preceding and successive calls to "add_flowing_data()" is considered as well when the whitespace collapse is performed. The data which is passed to this method is expected to be word-wrapped by the output device. Note that any word-wrapping still must be performed by the writer object due to the need to rely on device and font information. formatter.add_literal_data(data) Provide data which should be passed to the writer unchanged. Whitespace, including newline and tab characters, are considered legal in the value of *data*. formatter.add_label_data(format, counter) Insert a label which should be placed to the left of the current left margin. This should be used for constructing bulleted or numbered lists. If the *format* value is a string, it is interpreted as a format specification for *counter*, which should be an integer. The result of this formatting becomes the value of the label; if *format* is not a string it is used as the label value directly. The label value is passed as the only argument to the writer’s "send_label_data()" method. Interpretation of non- string label values is dependent on the associated writer. Format specifications are strings which, in combination with a counter value, are used to compute label values. Each character in the format string is copied to the label value, with some characters recognized to indicate a transform on the counter value. Specifically, the character "'1'" represents the counter value formatter as an Arabic number, the characters "'A'" and "'a'" represent alphabetic representations of the counter value in upper and lower case, respectively, and "'I'" and "'i'" represent the counter value in Roman numerals, in upper and lower case. Note that the alphabetic and roman transforms require that the counter value be greater than zero. formatter.flush_softspace() Send any pending whitespace buffered from a previous call to "add_flowing_data()" to the associated writer object. This should be called before any direct manipulation of the writer object. formatter.push_alignment(align) Push a new alignment setting onto the alignment stack. This may be "AS_IS" if no change is desired. If the alignment value is changed from the previous setting, the writer’s "new_alignment()" method is called with the *align* value. formatter.pop_alignment() Restore the previous alignment. formatter.push_font((size, italic, bold, teletype)) Change some or all font properties of the writer object. Properties which are not set to "AS_IS" are set to the values passed in while others are maintained at their current settings. The writer’s "new_font()" method is called with the fully resolved font specification. formatter.pop_font() Restore the previous font. formatter.push_margin(margin) Increase the number of left margin indentations by one, associating the logical tag *margin* with the new indentation. The initial margin level is "0". Changed values of the logical tag must be true values; false values other than "AS_IS" are not sufficient to change the margin. formatter.pop_margin() Restore the previous margin. formatter.push_style(*styles) Push any number of arbitrary style specifications. All styles are pushed onto the styles stack in order. A tuple representing the entire stack, including "AS_IS" values, is passed to the writer’s "new_styles()" method. formatter.pop_style(n=1) Pop the last *n* style specifications passed to "push_style()". A tuple representing the revised stack, including "AS_IS" values, is passed to the writer’s "new_styles()" method. formatter.set_spacing(spacing) Set the spacing style for the writer. formatter.assert_line_data(flag=1) Inform the formatter that data has been added to the current paragraph out-of-band. This should be used when the writer has been manipulated directly. The optional *flag* argument can be set to false if the writer manipulations produced a hard line break at the end of the output. Formatter Implementations ========================= Two implementations of formatter objects are provided by this module. Most applications may use one of these classes without modification or subclassing. class formatter.NullFormatter(writer=None) A formatter which does nothing. If *writer* is omitted, a "NullWriter" instance is created. No methods of the writer are called by "NullFormatter" instances. Implementations should inherit from this class if implementing a writer interface but don’t need to inherit any implementation. class formatter.AbstractFormatter(writer) The standard formatter. This implementation has demonstrated wide applicability to many writers, and may be used directly in most circumstances. It has been used to implement a full-featured World Wide Web browser. The Writer Interface ==================== Interfaces to create writers are dependent on the specific writer class being instantiated. The interfaces described below are the required interfaces which all writers must support once initialized. Note that while most applications can use the "AbstractFormatter" class as a formatter, the writer must typically be provided by the application. writer.flush() Flush any buffered output or device control events. writer.new_alignment(align) Set the alignment style. The *align* value can be any object, but by convention is a string or "None", where "None" indicates that the writer’s “preferred” alignment should be used. Conventional *align* values are "'left'", "'center'", "'right'", and "'justify'". writer.new_font(font) Set the font style. The value of *font* will be "None", indicating that the device’s default font should be used, or a tuple of the form "(size, italic, bold, teletype)". Size will be a string indicating the size of font that should be used; specific strings and their interpretation must be defined by the application. The *italic*, *bold*, and *teletype* values are Boolean values specifying which of those font attributes should be used. writer.new_margin(margin, level) Set the margin level to the integer *level* and the logical tag to *margin*. Interpretation of the logical tag is at the writer’s discretion; the only restriction on the value of the logical tag is that it not be a false value for non-zero values of *level*. writer.new_spacing(spacing) Set the spacing style to *spacing*. writer.new_styles(styles) Set additional styles. The *styles* value is a tuple of arbitrary values; the value "AS_IS" should be ignored. The *styles* tuple may be interpreted either as a set or as a stack depending on the requirements of the application and writer implementation. writer.send_line_break() Break the current line. writer.send_paragraph(blankline) Produce a paragraph separation of at least *blankline* blank lines, or the equivalent. The *blankline* value will be an integer. Note that the implementation will receive a call to "send_line_break()" before this call if a line break is needed; this method should not include ending the last line of the paragraph. It is only responsible for vertical spacing between paragraphs. writer.send_hor_rule(*args, **kw) Display a horizontal rule on the output device. The arguments to this method are entirely application- and writer-specific, and should be interpreted with care. The method implementation may assume that a line break has already been issued via "send_line_break()". writer.send_flowing_data(data) Output character data which may be word-wrapped and re-flowed as needed. Within any sequence of calls to this method, the writer may assume that spans of multiple whitespace characters have been collapsed to single space characters. writer.send_literal_data(data) Output character data which has already been formatted for display. Generally, this should be interpreted to mean that line breaks indicated by newline characters should be preserved and no new line breaks should be introduced. The data may contain embedded newline and tab characters, unlike data provided to the "send_formatted_data()" interface. writer.send_label_data(data) Set *data* to the left of the current left margin, if possible. The value of *data* is not restricted; treatment of non-string values is entirely application- and writer-dependent. This method will only be called at the beginning of a line. Writer Implementations ====================== Three implementations of the writer object interface are provided as examples by this module. Most applications will need to derive new writer classes from the "NullWriter" class. class formatter.NullWriter A writer which only provides the interface definition; no actions are taken on any methods. This should be the base class for all writers which do not need to inherit any implementation methods. class formatter.AbstractWriter A writer which can be used in debugging formatters, but not much else. Each method simply announces itself by printing its name and arguments on standard output. class formatter.DumbWriter(file=None, maxcol=72) Simple writer class which writes output on the *file object* passed in as *file* or, if *file* is omitted, on standard output. The output is simply word-wrapped to the number of columns specified by *maxcol*. This class is suitable for reflowing a sequence of paragraphs. "fractions" — Rational numbers ****************************** **Source code:** Lib/fractions.py ====================================================================== The "fractions" module provides support for rational number arithmetic. A Fraction instance can be constructed from a pair of integers, from another rational number, or from a string. class fractions.Fraction(numerator=0, denominator=1) class fractions.Fraction(other_fraction) class fractions.Fraction(float) class fractions.Fraction(decimal) class fractions.Fraction(string) The first version requires that *numerator* and *denominator* are instances of "numbers.Rational" and returns a new "Fraction" instance with value "numerator/denominator". If *denominator* is "0", it raises a "ZeroDivisionError". The second version requires that *other_fraction* is an instance of "numbers.Rational" and returns a "Fraction" instance with the same value. The next two versions accept either a "float" or a "decimal.Decimal" instance, and return a "Fraction" instance with exactly the same value. Note that due to the usual issues with binary floating-point (see Floating Point Arithmetic: Issues and Limitations), the argument to "Fraction(1.1)" is not exactly equal to 11/10, and so "Fraction(1.1)" does *not* return "Fraction(11, 10)" as one might expect. (But see the documentation for the "limit_denominator()" method below.) The last version of the constructor expects a string or unicode instance. The usual form for this instance is: [sign] numerator ['/' denominator] where the optional "sign" may be either ‘+’ or ‘-’ and "numerator" and "denominator" (if present) are strings of decimal digits. In addition, any string that represents a finite value and is accepted by the "float" constructor is also accepted by the "Fraction" constructor. In either form the input string may also have leading and/or trailing whitespace. Here are some examples: >>> from fractions import Fraction >>> Fraction(16, -10) Fraction(-8, 5) >>> Fraction(123) Fraction(123, 1) >>> Fraction() Fraction(0, 1) >>> Fraction('3/7') Fraction(3, 7) >>> Fraction(' -3/7 ') Fraction(-3, 7) >>> Fraction('1.414213 \t\n') Fraction(1414213, 1000000) >>> Fraction('-.125') Fraction(-1, 8) >>> Fraction('7e-6') Fraction(7, 1000000) >>> Fraction(2.25) Fraction(9, 4) >>> Fraction(1.1) Fraction(2476979795053773, 2251799813685248) >>> from decimal import Decimal >>> Fraction(Decimal('1.1')) Fraction(11, 10) The "Fraction" class inherits from the abstract base class "numbers.Rational", and implements all of the methods and operations from that class. "Fraction" instances are hashable, and should be treated as immutable. In addition, "Fraction" has the following properties and methods: Changed in version 3.2: The "Fraction" constructor now accepts "float" and "decimal.Decimal" instances. Changed in version 3.9: The "math.gcd()" function is now used to normalize the *numerator* and *denominator*. "math.gcd()" always return a "int" type. Previously, the GCD type depended on *numerator* and *denominator*. numerator Numerator of the Fraction in lowest term. denominator Denominator of the Fraction in lowest term. as_integer_ratio() Return a tuple of two integers, whose ratio is equal to the Fraction and with a positive denominator. New in version 3.8. from_float(flt) This class method constructs a "Fraction" representing the exact value of *flt*, which must be a "float". Beware that "Fraction.from_float(0.3)" is not the same value as "Fraction(3, 10)". Note: From Python 3.2 onwards, you can also construct a "Fraction" instance directly from a "float". from_decimal(dec) This class method constructs a "Fraction" representing the exact value of *dec*, which must be a "decimal.Decimal" instance. Note: From Python 3.2 onwards, you can also construct a "Fraction" instance directly from a "decimal.Decimal" instance. limit_denominator(max_denominator=1000000) Finds and returns the closest "Fraction" to "self" that has denominator at most max_denominator. This method is useful for finding rational approximations to a given floating-point number: >>> from fractions import Fraction >>> Fraction('3.1415926535897932').limit_denominator(1000) Fraction(355, 113) or for recovering a rational number that’s represented as a float: >>> from math import pi, cos >>> Fraction(cos(pi/3)) Fraction(4503599627370497, 9007199254740992) >>> Fraction(cos(pi/3)).limit_denominator() Fraction(1, 2) >>> Fraction(1.1).limit_denominator() Fraction(11, 10) __floor__() Returns the greatest "int" "<= self". This method can also be accessed through the "math.floor()" function: >>> from math import floor >>> floor(Fraction(355, 113)) 3 __ceil__() Returns the least "int" ">= self". This method can also be accessed through the "math.ceil()" function. __round__() __round__(ndigits) The first version returns the nearest "int" to "self", rounding half to even. The second version rounds "self" to the nearest multiple of "Fraction(1, 10**ndigits)" (logically, if "ndigits" is negative), again rounding half toward even. This method can also be accessed through the "round()" function. See also: Module "numbers" The abstract base classes making up the numeric tower. Program Frameworks ****************** The modules described in this chapter are frameworks that will largely dictate the structure of your program. Currently the modules described here are all oriented toward writing command-line interfaces. The full list of modules described in this chapter is: * "turtle" — Turtle graphics * Introduction * Overview of available Turtle and Screen methods * Turtle methods * Methods of TurtleScreen/Screen * Methods of RawTurtle/Turtle and corresponding functions * Turtle motion * Tell Turtle’s state * Settings for measurement * Pen control * Drawing state * Color control * Filling * More drawing control * Turtle state * Visibility * Appearance * Using events * Special Turtle methods * Compound shapes * Methods of TurtleScreen/Screen and corresponding functions * Window control * Animation control * Using screen events * Input methods * Settings and special methods * Methods specific to Screen, not inherited from TurtleScreen * Public classes * Help and configuration * How to use help * Translation of docstrings into different languages * How to configure Screen and Turtles * "turtledemo" — Demo scripts * Changes since Python 2.6 * Changes since Python 3.0 * "cmd" — Support for line-oriented command interpreters * Cmd Objects * Cmd Example * "shlex" — Simple lexical analysis * shlex Objects * Parsing Rules * Improved Compatibility with Shells "ftplib" — FTP protocol client ****************************** **Source code:** Lib/ftplib.py ====================================================================== This module defines the class "FTP" and a few related items. The "FTP" class implements the client side of the FTP protocol. You can use this to write Python programs that perform a variety of automated FTP jobs, such as mirroring other FTP servers. It is also used by the module "urllib.request" to handle URLs that use FTP. For more information on FTP (File Transfer Protocol), see Internet **RFC 959**. The default encoding is UTF-8, following **RFC 2640**. Here’s a sample session using the "ftplib" module: >>> from ftplib import FTP >>> ftp = FTP('ftp.us.debian.org') # connect to host, default port >>> ftp.login() # user anonymous, passwd anonymous@ '230 Login successful.' >>> ftp.cwd('debian') # change into "debian" directory '250 Directory successfully changed.' >>> ftp.retrlines('LIST') # list directory contents -rw-rw-r-- 1 1176 1176 1063 Jun 15 10:18 README ... drwxr-sr-x 5 1176 1176 4096 Dec 19 2000 pool drwxr-sr-x 4 1176 1176 4096 Nov 17 2008 project drwxr-xr-x 3 1176 1176 4096 Oct 10 2012 tools '226 Directory send OK.' >>> with open('README', 'wb') as fp: >>> ftp.retrbinary('RETR README', fp.write) '226 Transfer complete.' >>> ftp.quit() '221 Goodbye.' The module defines the following items: class ftplib.FTP(host='', user='', passwd='', acct='', timeout=None, source_address=None, *, encoding='utf-8') Return a new instance of the "FTP" class. When *host* is given, the method call "connect(host)" is made. When *user* is given, additionally the method call "login(user, passwd, acct)" is made (where *passwd* and *acct* default to the empty string when not given). The optional *timeout* parameter specifies a timeout in seconds for blocking operations like the connection attempt (if is not specified, the global default timeout setting will be used). *source_address* is a 2-tuple "(host, port)" for the socket to bind to as its source address before connecting. The *encoding* parameter specifies the encoding for directories and filenames. The "FTP" class supports the "with" statement, e.g.: >>> from ftplib import FTP >>> with FTP("ftp1.at.proftpd.org") as ftp: ... ftp.login() ... ftp.dir() ... '230 Anonymous login ok, restrictions apply.' dr-xr-xr-x 9 ftp ftp 154 May 6 10:43 . dr-xr-xr-x 9 ftp ftp 154 May 6 10:43 .. dr-xr-xr-x 5 ftp ftp 4096 May 6 10:43 CentOS dr-xr-xr-x 3 ftp ftp 18 Jul 10 2008 Fedora >>> Changed in version 3.2: Support for the "with" statement was added. Changed in version 3.3: *source_address* parameter was added. Changed in version 3.9: If the *timeout* parameter is set to be zero, it will raise a "ValueError" to prevent the creation of a non-blocking socket. The *encoding* parameter was added, and the default was changed from Latin-1 to UTF-8 to follow **RFC 2640**. class ftplib.FTP_TLS(host='', user='', passwd='', acct='', keyfile=None, certfile=None, context=None, timeout=None, source_address=None, *, encoding='utf-8') A "FTP" subclass which adds TLS support to FTP as described in **RFC 4217**. Connect as usual to port 21 implicitly securing the FTP control connection before authenticating. Securing the data connection requires the user to explicitly ask for it by calling the "prot_p()" method. *context* is a "ssl.SSLContext" object which allows bundling SSL configuration options, certificates and private keys into a single (potentially long-lived) structure. Please read Security considerations for best practices. *keyfile* and *certfile* are a legacy alternative to *context* – they can point to PEM-formatted private key and certificate chain files (respectively) for the SSL connection. New in version 3.2. Changed in version 3.3: *source_address* parameter was added. Changed in version 3.4: The class now supports hostname check with "ssl.SSLContext.check_hostname" and *Server Name Indication* (see "ssl.HAS_SNI"). Deprecated since version 3.6: *keyfile* and *certfile* are deprecated in favor of *context*. Please use "ssl.SSLContext.load_cert_chain()" instead, or let "ssl.create_default_context()" select the system’s trusted CA certificates for you. Changed in version 3.9: If the *timeout* parameter is set to be zero, it will raise a "ValueError" to prevent the creation of a non-blocking socket. The *encoding* parameter was added, and the default was changed from Latin-1 to UTF-8 to follow **RFC 2640**. Here’s a sample session using the "FTP_TLS" class: >>> ftps = FTP_TLS('ftp.pureftpd.org') >>> ftps.login() '230 Anonymous user logged in' >>> ftps.prot_p() '200 Data protection level set to "private"' >>> ftps.nlst() ['6jack', 'OpenBSD', 'antilink', 'blogbench', 'bsdcam', 'clockspeed', 'djbdns-jedi', 'docs', 'eaccelerator-jedi', 'favicon.ico', 'francotone', 'fugu', 'ignore', 'libpuzzle', 'metalog', 'minidentd', 'misc', 'mysql-udf-global-user-variables', 'php-jenkins-hash', 'php-skein-hash', 'php-webdav', 'phpaudit', 'phpbench', 'pincaster', 'ping', 'posto', 'pub', 'public', 'public_keys', 'pure-ftpd', 'qscan', 'qtc', 'sharedance', 'skycache', 'sound', 'tmp', 'ucarp'] exception ftplib.error_reply Exception raised when an unexpected reply is received from the server. exception ftplib.error_temp Exception raised when an error code signifying a temporary error (response codes in the range 400–499) is received. exception ftplib.error_perm Exception raised when an error code signifying a permanent error (response codes in the range 500–599) is received. exception ftplib.error_proto Exception raised when a reply is received from the server that does not fit the response specifications of the File Transfer Protocol, i.e. begin with a digit in the range 1–5. ftplib.all_errors The set of all exceptions (as a tuple) that methods of "FTP" instances may raise as a result of problems with the FTP connection (as opposed to programming errors made by the caller). This set includes the four exceptions listed above as well as "OSError" and "EOFError". See also: Module "netrc" Parser for the ".netrc" file format. The file ".netrc" is typically used by FTP clients to load user authentication information before prompting the user. FTP Objects =========== Several methods are available in two flavors: one for handling text files and another for binary files. These are named for the command which is used followed by "lines" for the text version or "binary" for the binary version. "FTP" instances have the following methods: FTP.set_debuglevel(level) Set the instance’s debugging level. This controls the amount of debugging output printed. The default, "0", produces no debugging output. A value of "1" produces a moderate amount of debugging output, generally a single line per request. A value of "2" or higher produces the maximum amount of debugging output, logging each line sent and received on the control connection. FTP.connect(host='', port=0, timeout=None, source_address=None) Connect to the given host and port. The default port number is "21", as specified by the FTP protocol specification. It is rarely needed to specify a different port number. This function should be called only once for each instance; it should not be called at all if a host was given when the instance was created. All other methods can only be used after a connection has been made. The optional *timeout* parameter specifies a timeout in seconds for the connection attempt. If no *timeout* is passed, the global default timeout setting will be used. *source_address* is a 2-tuple "(host, port)" for the socket to bind to as its source address before connecting. Raises an auditing event "ftplib.connect" with arguments "self", "host", "port". Changed in version 3.3: *source_address* parameter was added. FTP.getwelcome() Return the welcome message sent by the server in reply to the initial connection. (This message sometimes contains disclaimers or help information that may be relevant to the user.) FTP.login(user='anonymous', passwd='', acct='') Log in as the given *user*. The *passwd* and *acct* parameters are optional and default to the empty string. If no *user* is specified, it defaults to "'anonymous'". If *user* is "'anonymous'", the default *passwd* is "'anonymous@'". This function should be called only once for each instance, after a connection has been established; it should not be called at all if a host and user were given when the instance was created. Most FTP commands are only allowed after the client has logged in. The *acct* parameter supplies “accounting information”; few systems implement this. FTP.abort() Abort a file transfer that is in progress. Using this does not always work, but it’s worth a try. FTP.sendcmd(cmd) Send a simple command string to the server and return the response string. Raises an auditing event "ftplib.sendcmd" with arguments "self", "cmd". FTP.voidcmd(cmd) Send a simple command string to the server and handle the response. Return nothing if a response code corresponding to success (codes in the range 200–299) is received. Raise "error_reply" otherwise. Raises an auditing event "ftplib.sendcmd" with arguments "self", "cmd". FTP.retrbinary(cmd, callback, blocksize=8192, rest=None) Retrieve a file in binary transfer mode. *cmd* should be an appropriate "RETR" command: "'RETR filename'". The *callback* function is called for each block of data received, with a single bytes argument giving the data block. The optional *blocksize* argument specifies the maximum chunk size to read on the low-level socket object created to do the actual transfer (which will also be the largest size of the data blocks passed to *callback*). A reasonable default is chosen. *rest* means the same thing as in the "transfercmd()" method. FTP.retrlines(cmd, callback=None) Retrieve a file or directory listing in the encoding specified by the *encoding* parameter at initialization. *cmd* should be an appropriate "RETR" command (see "retrbinary()") or a command such as "LIST" or "NLST" (usually just the string "'LIST'"). "LIST" retrieves a list of files and information about those files. "NLST" retrieves a list of file names. The *callback* function is called for each line with a string argument containing the line with the trailing CRLF stripped. The default *callback* prints the line to "sys.stdout". FTP.set_pasv(val) Enable “passive” mode if *val* is true, otherwise disable passive mode. Passive mode is on by default. FTP.storbinary(cmd, fp, blocksize=8192, callback=None, rest=None) Store a file in binary transfer mode. *cmd* should be an appropriate "STOR" command: ""STOR filename"". *fp* is a *file object* (opened in binary mode) which is read until EOF using its "read()" method in blocks of size *blocksize* to provide the data to be stored. The *blocksize* argument defaults to 8192. *callback* is an optional single parameter callable that is called on each block of data after it is sent. *rest* means the same thing as in the "transfercmd()" method. Changed in version 3.2: *rest* parameter added. FTP.storlines(cmd, fp, callback=None) Store a file in line mode. *cmd* should be an appropriate "STOR" command (see "storbinary()"). Lines are read until EOF from the *file object* *fp* (opened in binary mode) using its "readline()" method to provide the data to be stored. *callback* is an optional single parameter callable that is called on each line after it is sent. FTP.transfercmd(cmd, rest=None) Initiate a transfer over the data connection. If the transfer is active, send an "EPRT" or "PORT" command and the transfer command specified by *cmd*, and accept the connection. If the server is passive, send an "EPSV" or "PASV" command, connect to it, and start the transfer command. Either way, return the socket for the connection. If optional *rest* is given, a "REST" command is sent to the server, passing *rest* as an argument. *rest* is usually a byte offset into the requested file, telling the server to restart sending the file’s bytes at the requested offset, skipping over the initial bytes. Note however that the "transfercmd()" method converts *rest* to a string with the *encoding* parameter specified at initialization, but no check is performed on the string’s contents. If the server does not recognize the "REST" command, an "error_reply" exception will be raised. If this happens, simply call "transfercmd()" without a *rest* argument. FTP.ntransfercmd(cmd, rest=None) Like "transfercmd()", but returns a tuple of the data connection and the expected size of the data. If the expected size could not be computed, "None" will be returned as the expected size. *cmd* and *rest* means the same thing as in "transfercmd()". FTP.mlsd(path="", facts=[]) List a directory in a standardized format by using "MLSD" command (**RFC 3659**). If *path* is omitted the current directory is assumed. *facts* is a list of strings representing the type of information desired (e.g. "["type", "size", "perm"]"). Return a generator object yielding a tuple of two elements for every file found in path. First element is the file name, the second one is a dictionary containing facts about the file name. Content of this dictionary might be limited by the *facts* argument but server is not guaranteed to return all requested facts. New in version 3.3. FTP.nlst(argument[, ...]) Return a list of file names as returned by the "NLST" command. The optional *argument* is a directory to list (default is the current server directory). Multiple arguments can be used to pass non- standard options to the "NLST" command. Note: If your server supports the command, "mlsd()" offers a better API. FTP.dir(argument[, ...]) Produce a directory listing as returned by the "LIST" command, printing it to standard output. The optional *argument* is a directory to list (default is the current server directory). Multiple arguments can be used to pass non-standard options to the "LIST" command. If the last argument is a function, it is used as a *callback* function as for "retrlines()"; the default prints to "sys.stdout". This method returns "None". Note: If your server supports the command, "mlsd()" offers a better API. FTP.rename(fromname, toname) Rename file *fromname* on the server to *toname*. FTP.delete(filename) Remove the file named *filename* from the server. If successful, returns the text of the response, otherwise raises "error_perm" on permission errors or "error_reply" on other errors. FTP.cwd(pathname) Set the current directory on the server. FTP.mkd(pathname) Create a new directory on the server. FTP.pwd() Return the pathname of the current directory on the server. FTP.rmd(dirname) Remove the directory named *dirname* on the server. FTP.size(filename) Request the size of the file named *filename* on the server. On success, the size of the file is returned as an integer, otherwise "None" is returned. Note that the "SIZE" command is not standardized, but is supported by many common server implementations. FTP.quit() Send a "QUIT" command to the server and close the connection. This is the “polite” way to close a connection, but it may raise an exception if the server responds with an error to the "QUIT" command. This implies a call to the "close()" method which renders the "FTP" instance useless for subsequent calls (see below). FTP.close() Close the connection unilaterally. This should not be applied to an already closed connection such as after a successful call to "quit()". After this call the "FTP" instance should not be used any more (after a call to "close()" or "quit()" you cannot reopen the connection by issuing another "login()" method). FTP_TLS Objects =============== "FTP_TLS" class inherits from "FTP", defining these additional objects: FTP_TLS.ssl_version The SSL version to use (defaults to "ssl.PROTOCOL_SSLv23"). FTP_TLS.auth() Set up a secure control connection by using TLS or SSL, depending on what is specified in the "ssl_version" attribute. Changed in version 3.4: The method now supports hostname check with "ssl.SSLContext.check_hostname" and *Server Name Indication* (see "ssl.HAS_SNI"). FTP_TLS.ccc() Revert control channel back to plaintext. This can be useful to take advantage of firewalls that know how to handle NAT with non- secure FTP without opening fixed ports. New in version 3.3. FTP_TLS.prot_p() Set up secure data connection. FTP_TLS.prot_c() Set up clear text data connection. Functional Programming Modules ****************************** The modules described in this chapter provide functions and classes that support a functional programming style, and general operations on callables. The following modules are documented in this chapter: * "itertools" — Functions creating iterators for efficient looping * Itertool functions * Itertools Recipes * "functools" — Higher-order functions and operations on callable objects * "partial" Objects * "operator" — Standard operators as functions * Mapping Operators to Functions * In-place Operators Built-in Functions ****************** The Python interpreter has a number of functions and types built into it that are always available. They are listed here in alphabetical order. +---------------------+-------------------+--------------------+--------------------+----------------------+ | | | Built-in Functions | | | |=====================|===================|====================|====================|======================| | "abs()" | "delattr()" | "hash()" | "memoryview()" | "set()" | +---------------------+-------------------+--------------------+--------------------+----------------------+ | "all()" | "dict()" | "help()" | "min()" | "setattr()" | +---------------------+-------------------+--------------------+--------------------+----------------------+ | "any()" | "dir()" | "hex()" | "next()" | "slice()" | +---------------------+-------------------+--------------------+--------------------+----------------------+ | "ascii()" | "divmod()" | "id()" | "object()" | "sorted()" | +---------------------+-------------------+--------------------+--------------------+----------------------+ | "bin()" | "enumerate()" | "input()" | "oct()" | "staticmethod()" | +---------------------+-------------------+--------------------+--------------------+----------------------+ | "bool()" | "eval()" | "int()" | "open()" | "str()" | +---------------------+-------------------+--------------------+--------------------+----------------------+ | "breakpoint()" | "exec()" | "isinstance()" | "ord()" | "sum()" | +---------------------+-------------------+--------------------+--------------------+----------------------+ | "bytearray()" | "filter()" | "issubclass()" | "pow()" | "super()" | +---------------------+-------------------+--------------------+--------------------+----------------------+ | "bytes()" | "float()" | "iter()" | "print()" | "tuple()" | +---------------------+-------------------+--------------------+--------------------+----------------------+ | "callable()" | "format()" | "len()" | "property()" | "type()" | +---------------------+-------------------+--------------------+--------------------+----------------------+ | "chr()" | "frozenset()" | "list()" | "range()" | "vars()" | +---------------------+-------------------+--------------------+--------------------+----------------------+ | "classmethod()" | "getattr()" | "locals()" | "repr()" | "zip()" | +---------------------+-------------------+--------------------+--------------------+----------------------+ | "compile()" | "globals()" | "map()" | "reversed()" | "__import__()" | +---------------------+-------------------+--------------------+--------------------+----------------------+ | "complex()" | "hasattr()" | "max()" | "round()" | | +---------------------+-------------------+--------------------+--------------------+----------------------+ abs(x) Return the absolute value of a number. The argument may be an integer, a floating point number, or an object implementing "__abs__()". If the argument is a complex number, its magnitude is returned. all(iterable) Return "True" if all elements of the *iterable* are true (or if the iterable is empty). Equivalent to: def all(iterable): for element in iterable: if not element: return False return True any(iterable) Return "True" if any element of the *iterable* is true. If the iterable is empty, return "False". Equivalent to: def any(iterable): for element in iterable: if element: return True return False ascii(object) As "repr()", return a string containing a printable representation of an object, but escape the non-ASCII characters in the string returned by "repr()" using "\x", "\u" or "\U" escapes. This generates a string similar to that returned by "repr()" in Python 2. bin(x) Convert an integer number to a binary string prefixed with “0b”. The result is a valid Python expression. If *x* is not a Python "int" object, it has to define an "__index__()" method that returns an integer. Some examples: >>> bin(3) '0b11' >>> bin(-10) '-0b1010' If prefix “0b” is desired or not, you can use either of the following ways. >>> format(14, '#b'), format(14, 'b') ('0b1110', '1110') >>> f'{14:#b}', f'{14:b}' ('0b1110', '1110') See also "format()" for more information. class bool([x]) Return a Boolean value, i.e. one of "True" or "False". *x* is converted using the standard truth testing procedure. If *x* is false or omitted, this returns "False"; otherwise it returns "True". The "bool" class is a subclass of "int" (see Numeric Types — int, float, complex). It cannot be subclassed further. Its only instances are "False" and "True" (see Boolean Values). Changed in version 3.7: *x* is now a positional-only parameter. breakpoint(*args, **kws) This function drops you into the debugger at the call site. Specifically, it calls "sys.breakpointhook()", passing "args" and "kws" straight through. By default, "sys.breakpointhook()" calls "pdb.set_trace()" expecting no arguments. In this case, it is purely a convenience function so you don’t have to explicitly import "pdb" or type as much code to enter the debugger. However, "sys.breakpointhook()" can be set to some other function and "breakpoint()" will automatically call that, allowing you to drop into the debugger of choice. Raises an auditing event "builtins.breakpoint" with argument "breakpointhook". New in version 3.7. class bytearray([source[, encoding[, errors]]]) Return a new array of bytes. The "bytearray" class is a mutable sequence of integers in the range 0 <= x < 256. It has most of the usual methods of mutable sequences, described in Mutable Sequence Types, as well as most methods that the "bytes" type has, see Bytes and Bytearray Operations. The optional *source* parameter can be used to initialize the array in a few different ways: * If it is a *string*, you must also give the *encoding* (and optionally, *errors*) parameters; "bytearray()" then converts the string to bytes using "str.encode()". * If it is an *integer*, the array will have that size and will be initialized with null bytes. * If it is an object conforming to the buffer interface, a read- only buffer of the object will be used to initialize the bytes array. * If it is an *iterable*, it must be an iterable of integers in the range "0 <= x < 256", which are used as the initial contents of the array. Without an argument, an array of size 0 is created. See also Binary Sequence Types — bytes, bytearray, memoryview and Bytearray Objects. class bytes([source[, encoding[, errors]]]) Return a new “bytes” object, which is an immutable sequence of integers in the range "0 <= x < 256". "bytes" is an immutable version of "bytearray" – it has the same non-mutating methods and the same indexing and slicing behavior. Accordingly, constructor arguments are interpreted as for "bytearray()". Bytes objects can also be created with literals, see String and Bytes literals. See also Binary Sequence Types — bytes, bytearray, memoryview, Bytes Objects, and Bytes and Bytearray Operations. callable(object) Return "True" if the *object* argument appears callable, "False" if not. If this returns "True", it is still possible that a call fails, but if it is "False", calling *object* will never succeed. Note that classes are callable (calling a class returns a new instance); instances are callable if their class has a "__call__()" method. New in version 3.2: This function was first removed in Python 3.0 and then brought back in Python 3.2. chr(i) Return the string representing a character whose Unicode code point is the integer *i*. For example, "chr(97)" returns the string "'a'", while "chr(8364)" returns the string "'€'". This is the inverse of "ord()". The valid range for the argument is from 0 through 1,114,111 (0x10FFFF in base 16). "ValueError" will be raised if *i* is outside that range. @classmethod Transform a method into a class method. A class method receives the class as implicit first argument, just like an instance method receives the instance. To declare a class method, use this idiom: class C: @classmethod def f(cls, arg1, arg2): ... The "@classmethod" form is a function *decorator* – see Function definitions for details. A class method can be called either on the class (such as "C.f()") or on an instance (such as "C().f()"). The instance is ignored except for its class. If a class method is called for a derived class, the derived class object is passed as the implied first argument. Class methods are different than C++ or Java static methods. If you want those, see "staticmethod()" in this section. For more information on class methods, see The standard type hierarchy. Changed in version 3.9: Class methods can now wrap other *descriptors* such as "property()". compile(source, filename, mode, flags=0, dont_inherit=False, optimize=-1) Compile the *source* into a code or AST object. Code objects can be executed by "exec()" or "eval()". *source* can either be a normal string, a byte string, or an AST object. Refer to the "ast" module documentation for information on how to work with AST objects. The *filename* argument should give the file from which the code was read; pass some recognizable value if it wasn’t read from a file ("''" is commonly used). The *mode* argument specifies what kind of code must be compiled; it can be "'exec'" if *source* consists of a sequence of statements, "'eval'" if it consists of a single expression, or "'single'" if it consists of a single interactive statement (in the latter case, expression statements that evaluate to something other than "None" will be printed). The optional arguments *flags* and *dont_inherit* control which compiler options should be activated and which future features should be allowed. If neither is present (or both are zero) the code is compiled with the same flags that affect the code that is calling "compile()". If the *flags* argument is given and *dont_inherit* is not (or is zero) then the compiler options and the future statements specified by the *flags* argument are used in addition to those that would be used anyway. If *dont_inherit* is a non-zero integer then the *flags* argument is it – the flags (future features and compiler options) in the surrounding code are ignored. Compiler options and future statements are specified by bits which can be bitwise ORed together to specify multiple options. The bitfield required to specify a given future feature can be found as the "compiler_flag" attribute on the "_Feature" instance in the "__future__" module. Compiler flags can be found in "ast" module, with "PyCF_" prefix. The argument *optimize* specifies the optimization level of the compiler; the default value of "-1" selects the optimization level of the interpreter as given by "-O" options. Explicit levels are "0" (no optimization; "__debug__" is true), "1" (asserts are removed, "__debug__" is false) or "2" (docstrings are removed too). This function raises "SyntaxError" if the compiled source is invalid, and "ValueError" if the source contains null bytes. If you want to parse Python code into its AST representation, see "ast.parse()". Raises an auditing event "compile" with arguments "source" and "filename". This event may also be raised by implicit compilation. Note: When compiling a string with multi-line code in "'single'" or "'eval'" mode, input must be terminated by at least one newline character. This is to facilitate detection of incomplete and complete statements in the "code" module. Warning: It is possible to crash the Python interpreter with a sufficiently large/complex string when compiling to an AST object due to stack depth limitations in Python’s AST compiler. Changed in version 3.2: Allowed use of Windows and Mac newlines. Also input in "'exec'" mode does not have to end in a newline anymore. Added the *optimize* parameter. Changed in version 3.5: Previously, "TypeError" was raised when null bytes were encountered in *source*. New in version 3.8: "ast.PyCF_ALLOW_TOP_LEVEL_AWAIT" can now be passed in flags to enable support for top-level "await", "async for", and "async with". class complex([real[, imag]]) Return a complex number with the value *real* + *imag**1j or convert a string or number to a complex number. If the first parameter is a string, it will be interpreted as a complex number and the function must be called without a second parameter. The second parameter can never be a string. Each argument may be any numeric type (including complex). If *imag* is omitted, it defaults to zero and the constructor serves as a numeric conversion like "int" and "float". If both arguments are omitted, returns "0j". For a general Python object "x", "complex(x)" delegates to "x.__complex__()". If "__complex__()" is not defined then it falls back to "__float__()". If "__float__()" is not defined then it falls back to "__index__()". Note: When converting from a string, the string must not contain whitespace around the central "+" or "-" operator. For example, "complex('1+2j')" is fine, but "complex('1 + 2j')" raises "ValueError". The complex type is described in Numeric Types — int, float, complex. Changed in version 3.6: Grouping digits with underscores as in code literals is allowed. Changed in version 3.8: Falls back to "__index__()" if "__complex__()" and "__float__()" are not defined. delattr(object, name) This is a relative of "setattr()". The arguments are an object and a string. The string must be the name of one of the object’s attributes. The function deletes the named attribute, provided the object allows it. For example, "delattr(x, 'foobar')" is equivalent to "del x.foobar". class dict(**kwarg) class dict(mapping, **kwarg) class dict(iterable, **kwarg) Create a new dictionary. The "dict" object is the dictionary class. See "dict" and Mapping Types — dict for documentation about this class. For other containers see the built-in "list", "set", and "tuple" classes, as well as the "collections" module. dir([object]) Without arguments, return the list of names in the current local scope. With an argument, attempt to return a list of valid attributes for that object. If the object has a method named "__dir__()", this method will be called and must return the list of attributes. This allows objects that implement a custom "__getattr__()" or "__getattribute__()" function to customize the way "dir()" reports their attributes. If the object does not provide "__dir__()", the function tries its best to gather information from the object’s "__dict__" attribute, if defined, and from its type object. The resulting list is not necessarily complete, and may be inaccurate when the object has a custom "__getattr__()". The default "dir()" mechanism behaves differently with different types of objects, as it attempts to produce the most relevant, rather than complete, information: * If the object is a module object, the list contains the names of the module’s attributes. * If the object is a type or class object, the list contains the names of its attributes, and recursively of the attributes of its bases. * Otherwise, the list contains the object’s attributes’ names, the names of its class’s attributes, and recursively of the attributes of its class’s base classes. The resulting list is sorted alphabetically. For example: >>> import struct >>> dir() # show the names in the module namespace ['__builtins__', '__name__', 'struct'] >>> dir(struct) # show the names in the struct module ['Struct', '__all__', '__builtins__', '__cached__', '__doc__', '__file__', '__initializing__', '__loader__', '__name__', '__package__', '_clearcache', 'calcsize', 'error', 'pack', 'pack_into', 'unpack', 'unpack_from'] >>> class Shape: ... def __dir__(self): ... return ['area', 'perimeter', 'location'] >>> s = Shape() >>> dir(s) ['area', 'location', 'perimeter'] Note: Because "dir()" is supplied primarily as a convenience for use at an interactive prompt, it tries to supply an interesting set of names more than it tries to supply a rigorously or consistently defined set of names, and its detailed behavior may change across releases. For example, metaclass attributes are not in the result list when the argument is a class. divmod(a, b) Take two (non complex) numbers as arguments and return a pair of numbers consisting of their quotient and remainder when using integer division. With mixed operand types, the rules for binary arithmetic operators apply. For integers, the result is the same as "(a // b, a % b)". For floating point numbers the result is "(q, a % b)", where *q* is usually "math.floor(a / b)" but may be 1 less than that. In any case "q * b + a % b" is very close to *a*, if "a % b" is non-zero it has the same sign as *b*, and "0 <= abs(a % b) < abs(b)". enumerate(iterable, start=0) Return an enumerate object. *iterable* must be a sequence, an *iterator*, or some other object which supports iteration. The "__next__()" method of the iterator returned by "enumerate()" returns a tuple containing a count (from *start* which defaults to 0) and the values obtained from iterating over *iterable*. >>> seasons = ['Spring', 'Summer', 'Fall', 'Winter'] >>> list(enumerate(seasons)) [(0, 'Spring'), (1, 'Summer'), (2, 'Fall'), (3, 'Winter')] >>> list(enumerate(seasons, start=1)) [(1, 'Spring'), (2, 'Summer'), (3, 'Fall'), (4, 'Winter')] Equivalent to: def enumerate(sequence, start=0): n = start for elem in sequence: yield n, elem n += 1 eval(expression[, globals[, locals]]) The arguments are a string and optional globals and locals. If provided, *globals* must be a dictionary. If provided, *locals* can be any mapping object. The *expression* argument is parsed and evaluated as a Python expression (technically speaking, a condition list) using the *globals* and *locals* dictionaries as global and local namespace. If the *globals* dictionary is present and does not contain a value for the key "__builtins__", a reference to the dictionary of the built-in module "builtins" is inserted under that key before *expression* is parsed. This means that *expression* normally has full access to the standard "builtins" module and restricted environments are propagated. If the *locals* dictionary is omitted it defaults to the *globals* dictionary. If both dictionaries are omitted, the expression is executed with the *globals* and *locals* in the environment where "eval()" is called. Note, *eval()* does not have access to the *nested scopes* (non-locals) in the enclosing environment. The return value is the result of the evaluated expression. Syntax errors are reported as exceptions. Example: >>> x = 1 >>> eval('x+1') 2 This function can also be used to execute arbitrary code objects (such as those created by "compile()"). In this case pass a code object instead of a string. If the code object has been compiled with "'exec'" as the *mode* argument, "eval()"’s return value will be "None". Hints: dynamic execution of statements is supported by the "exec()" function. The "globals()" and "locals()" functions returns the current global and local dictionary, respectively, which may be useful to pass around for use by "eval()" or "exec()". See "ast.literal_eval()" for a function that can safely evaluate strings with expressions containing only literals. Raises an auditing event "exec" with the code object as the argument. Code compilation events may also be raised. exec(object[, globals[, locals]]) This function supports dynamic execution of Python code. *object* must be either a string or a code object. If it is a string, the string is parsed as a suite of Python statements which is then executed (unless a syntax error occurs). [1] If it is a code object, it is simply executed. In all cases, the code that’s executed is expected to be valid as file input (see the section File input in the Reference Manual). Be aware that the "nonlocal", "yield", and "return" statements may not be used outside of function definitions even within the context of code passed to the "exec()" function. The return value is "None". In all cases, if the optional parts are omitted, the code is executed in the current scope. If only *globals* is provided, it must be a dictionary (and not a subclass of dictionary), which will be used for both the global and the local variables. If *globals* and *locals* are given, they are used for the global and local variables, respectively. If provided, *locals* can be any mapping object. Remember that at module level, globals and locals are the same dictionary. If exec gets two separate objects as *globals* and *locals*, the code will be executed as if it were embedded in a class definition. If the *globals* dictionary does not contain a value for the key "__builtins__", a reference to the dictionary of the built-in module "builtins" is inserted under that key. That way you can control what builtins are available to the executed code by inserting your own "__builtins__" dictionary into *globals* before passing it to "exec()". Raises an auditing event "exec" with the code object as the argument. Code compilation events may also be raised. Note: The built-in functions "globals()" and "locals()" return the current global and local dictionary, respectively, which may be useful to pass around for use as the second and third argument to "exec()". Note: The default *locals* act as described for function "locals()" below: modifications to the default *locals* dictionary should not be attempted. Pass an explicit *locals* dictionary if you need to see effects of the code on *locals* after function "exec()" returns. filter(function, iterable) Construct an iterator from those elements of *iterable* for which *function* returns true. *iterable* may be either a sequence, a container which supports iteration, or an iterator. If *function* is "None", the identity function is assumed, that is, all elements of *iterable* that are false are removed. Note that "filter(function, iterable)" is equivalent to the generator expression "(item for item in iterable if function(item))" if function is not "None" and "(item for item in iterable if item)" if function is "None". See "itertools.filterfalse()" for the complementary function that returns elements of *iterable* for which *function* returns false. class float([x]) Return a floating point number constructed from a number or string *x*. If the argument is a string, it should contain a decimal number, optionally preceded by a sign, and optionally embedded in whitespace. The optional sign may be "'+'" or "'-'"; a "'+'" sign has no effect on the value produced. The argument may also be a string representing a NaN (not-a-number), or a positive or negative infinity. More precisely, the input must conform to the following grammar after leading and trailing whitespace characters are removed: sign ::= "+" | "-" infinity ::= "Infinity" | "inf" nan ::= "nan" numeric_value ::= floatnumber | infinity | nan numeric_string ::= [sign] numeric_value Here "floatnumber" is the form of a Python floating-point literal, described in Floating point literals. Case is not significant, so, for example, “inf”, “Inf”, “INFINITY” and “iNfINity” are all acceptable spellings for positive infinity. Otherwise, if the argument is an integer or a floating point number, a floating point number with the same value (within Python’s floating point precision) is returned. If the argument is outside the range of a Python float, an "OverflowError" will be raised. For a general Python object "x", "float(x)" delegates to "x.__float__()". If "__float__()" is not defined then it falls back to "__index__()". If no argument is given, "0.0" is returned. Examples: >>> float('+1.23') 1.23 >>> float(' -12345\n') -12345.0 >>> float('1e-003') 0.001 >>> float('+1E6') 1000000.0 >>> float('-Infinity') -inf The float type is described in Numeric Types — int, float, complex. Changed in version 3.6: Grouping digits with underscores as in code literals is allowed. Changed in version 3.7: *x* is now a positional-only parameter. Changed in version 3.8: Falls back to "__index__()" if "__float__()" is not defined. format(value[, format_spec]) Convert a *value* to a “formatted” representation, as controlled by *format_spec*. The interpretation of *format_spec* will depend on the type of the *value* argument, however there is a standard formatting syntax that is used by most built-in types: Format Specification Mini-Language. The default *format_spec* is an empty string which usually gives the same effect as calling "str(value)". A call to "format(value, format_spec)" is translated to "type(value).__format__(value, format_spec)" which bypasses the instance dictionary when searching for the value’s "__format__()" method. A "TypeError" exception is raised if the method search reaches "object" and the *format_spec* is non-empty, or if either the *format_spec* or the return value are not strings. Changed in version 3.4: "object().__format__(format_spec)" raises "TypeError" if *format_spec* is not an empty string. class frozenset([iterable]) Return a new "frozenset" object, optionally with elements taken from *iterable*. "frozenset" is a built-in class. See "frozenset" and Set Types — set, frozenset for documentation about this class. For other containers see the built-in "set", "list", "tuple", and "dict" classes, as well as the "collections" module. getattr(object, name[, default]) Return the value of the named attribute of *object*. *name* must be a string. If the string is the name of one of the object’s attributes, the result is the value of that attribute. For example, "getattr(x, 'foobar')" is equivalent to "x.foobar". If the named attribute does not exist, *default* is returned if provided, otherwise "AttributeError" is raised. Note: Since private name mangling happens at compilation time, one must manually mangle a private attribute’s (attributes with two leading underscores) name in order to retrieve it with "getattr()". globals() Return the dictionary implementing the current module namespace. For code within functions, this is set when the function is defined and remains the same regardless of where the function is called. hasattr(object, name) The arguments are an object and a string. The result is "True" if the string is the name of one of the object’s attributes, "False" if not. (This is implemented by calling "getattr(object, name)" and seeing whether it raises an "AttributeError" or not.) hash(object) Return the hash value of the object (if it has one). Hash values are integers. They are used to quickly compare dictionary keys during a dictionary lookup. Numeric values that compare equal have the same hash value (even if they are of different types, as is the case for 1 and 1.0). Note: For objects with custom "__hash__()" methods, note that "hash()" truncates the return value based on the bit width of the host machine. See "__hash__()" for details. help([object]) Invoke the built-in help system. (This function is intended for interactive use.) If no argument is given, the interactive help system starts on the interpreter console. If the argument is a string, then the string is looked up as the name of a module, function, class, method, keyword, or documentation topic, and a help page is printed on the console. If the argument is any other kind of object, a help page on the object is generated. Note that if a slash(/) appears in the parameter list of a function, when invoking "help()", it means that the parameters prior to the slash are positional-only. For more info, see the FAQ entry on positional-only parameters. This function is added to the built-in namespace by the "site" module. Changed in version 3.4: Changes to "pydoc" and "inspect" mean that the reported signatures for callables are now more comprehensive and consistent. hex(x) Convert an integer number to a lowercase hexadecimal string prefixed with “0x”. If *x* is not a Python "int" object, it has to define an "__index__()" method that returns an integer. Some examples: >>> hex(255) '0xff' >>> hex(-42) '-0x2a' If you want to convert an integer number to an uppercase or lower hexadecimal string with prefix or not, you can use either of the following ways: >>> '%#x' % 255, '%x' % 255, '%X' % 255 ('0xff', 'ff', 'FF') >>> format(255, '#x'), format(255, 'x'), format(255, 'X') ('0xff', 'ff', 'FF') >>> f'{255:#x}', f'{255:x}', f'{255:X}' ('0xff', 'ff', 'FF') See also "format()" for more information. See also "int()" for converting a hexadecimal string to an integer using a base of 16. Note: To obtain a hexadecimal string representation for a float, use the "float.hex()" method. id(object) Return the “identity” of an object. This is an integer which is guaranteed to be unique and constant for this object during its lifetime. Two objects with non-overlapping lifetimes may have the same "id()" value. **CPython implementation detail:** This is the address of the object in memory. Raises an auditing event "builtins.id" with argument "id". input([prompt]) If the *prompt* argument is present, it is written to standard output without a trailing newline. The function then reads a line from input, converts it to a string (stripping a trailing newline), and returns that. When EOF is read, "EOFError" is raised. Example: >>> s = input('--> ') --> Monty Python's Flying Circus >>> s "Monty Python's Flying Circus" If the "readline" module was loaded, then "input()" will use it to provide elaborate line editing and history features. Raises an auditing event "builtins.input" with argument "prompt" before reading input Raises an auditing event "builtins.input/result" with the result after successfully reading input. class int([x]) class int(x, base=10) Return an integer object constructed from a number or string *x*, or return "0" if no arguments are given. If *x* defines "__int__()", "int(x)" returns "x.__int__()". If *x* defines "__index__()", it returns "x.__index__()". If *x* defines "__trunc__()", it returns "x.__trunc__()". For floating point numbers, this truncates towards zero. If *x* is not a number or if *base* is given, then *x* must be a string, "bytes", or "bytearray" instance representing an integer literal in radix *base*. Optionally, the literal can be preceded by "+" or "-" (with no space in between) and surrounded by whitespace. A base-n literal consists of the digits 0 to n-1, with "a" to "z" (or "A" to "Z") having values 10 to 35. The default *base* is 10. The allowed values are 0 and 2–36. Base-2, -8, and -16 literals can be optionally prefixed with "0b"/"0B", "0o"/"0O", or "0x"/"0X", as with integer literals in code. Base 0 means to interpret exactly as a code literal, so that the actual base is 2, 8, 10, or 16, and so that "int('010', 0)" is not legal, while "int('010')" is, as well as "int('010', 8)". The integer type is described in Numeric Types — int, float, complex. Changed in version 3.4: If *base* is not an instance of "int" and the *base* object has a "base.__index__" method, that method is called to obtain an integer for the base. Previous versions used "base.__int__" instead of "base.__index__". Changed in version 3.6: Grouping digits with underscores as in code literals is allowed. Changed in version 3.7: *x* is now a positional-only parameter. Changed in version 3.8: Falls back to "__index__()" if "__int__()" is not defined. Changed in version 3.9.14: "int" string inputs and string representations can be limited to help avoid denial of service attacks. A "ValueError" is raised when the limit is exceeded while converting a string *x* to an "int" or when converting an "int" into a string would exceed the limit. See the integer string conversion length limitation documentation. isinstance(object, classinfo) Return "True" if the *object* argument is an instance of the *classinfo* argument, or of a (direct, indirect or *virtual*) subclass thereof. If *object* is not an object of the given type, the function always returns "False". If *classinfo* is a tuple of type objects (or recursively, other such tuples), return "True" if *object* is an instance of any of the types. If *classinfo* is not a type or tuple of types and such tuples, a "TypeError" exception is raised. issubclass(class, classinfo) Return "True" if *class* is a subclass (direct, indirect or *virtual*) of *classinfo*. A class is considered a subclass of itself. *classinfo* may be a tuple of class objects (or recursively, other such tuples), in which case return "True" if *class* is a subclass of any entry in *classinfo*. In any other case, a "TypeError" exception is raised. iter(object[, sentinel]) Return an *iterator* object. The first argument is interpreted very differently depending on the presence of the second argument. Without a second argument, *object* must be a collection object which supports the iteration protocol (the "__iter__()" method), or it must support the sequence protocol (the "__getitem__()" method with integer arguments starting at "0"). If it does not support either of those protocols, "TypeError" is raised. If the second argument, *sentinel*, is given, then *object* must be a callable object. The iterator created in this case will call *object* with no arguments for each call to its "__next__()" method; if the value returned is equal to *sentinel*, "StopIteration" will be raised, otherwise the value will be returned. See also Iterator Types. One useful application of the second form of "iter()" is to build a block-reader. For example, reading fixed-width blocks from a binary database file until the end of file is reached: from functools import partial with open('mydata.db', 'rb') as f: for block in iter(partial(f.read, 64), b''): process_block(block) len(s) Return the length (the number of items) of an object. The argument may be a sequence (such as a string, bytes, tuple, list, or range) or a collection (such as a dictionary, set, or frozen set). **CPython implementation detail:** "len" raises "OverflowError" on lengths larger than "sys.maxsize", such as "range(2 ** 100)". class list([iterable]) Rather than being a function, "list" is actually a mutable sequence type, as documented in Lists and Sequence Types — list, tuple, range. locals() Update and return a dictionary representing the current local symbol table. Free variables are returned by "locals()" when it is called in function blocks, but not in class blocks. Note that at the module level, "locals()" and "globals()" are the same dictionary. Note: The contents of this dictionary should not be modified; changes may not affect the values of local and free variables used by the interpreter. map(function, iterable, ...) Return an iterator that applies *function* to every item of *iterable*, yielding the results. If additional *iterable* arguments are passed, *function* must take that many arguments and is applied to the items from all iterables in parallel. With multiple iterables, the iterator stops when the shortest iterable is exhausted. For cases where the function inputs are already arranged into argument tuples, see "itertools.starmap()". max(iterable, *[, key, default]) max(arg1, arg2, *args[, key]) Return the largest item in an iterable or the largest of two or more arguments. If one positional argument is provided, it should be an *iterable*. The largest item in the iterable is returned. If two or more positional arguments are provided, the largest of the positional arguments is returned. There are two optional keyword-only arguments. The *key* argument specifies a one-argument ordering function like that used for "list.sort()". The *default* argument specifies an object to return if the provided iterable is empty. If the iterable is empty and *default* is not provided, a "ValueError" is raised. If multiple items are maximal, the function returns the first one encountered. This is consistent with other sort-stability preserving tools such as "sorted(iterable, key=keyfunc, reverse=True)[0]" and "heapq.nlargest(1, iterable, key=keyfunc)". New in version 3.4: The *default* keyword-only argument. Changed in version 3.8: The *key* can be "None". class memoryview(object) Return a “memory view” object created from the given argument. See Memory Views for more information. min(iterable, *[, key, default]) min(arg1, arg2, *args[, key]) Return the smallest item in an iterable or the smallest of two or more arguments. If one positional argument is provided, it should be an *iterable*. The smallest item in the iterable is returned. If two or more positional arguments are provided, the smallest of the positional arguments is returned. There are two optional keyword-only arguments. The *key* argument specifies a one-argument ordering function like that used for "list.sort()". The *default* argument specifies an object to return if the provided iterable is empty. If the iterable is empty and *default* is not provided, a "ValueError" is raised. If multiple items are minimal, the function returns the first one encountered. This is consistent with other sort-stability preserving tools such as "sorted(iterable, key=keyfunc)[0]" and "heapq.nsmallest(1, iterable, key=keyfunc)". New in version 3.4: The *default* keyword-only argument. Changed in version 3.8: The *key* can be "None". next(iterator[, default]) Retrieve the next item from the *iterator* by calling its "__next__()" method. If *default* is given, it is returned if the iterator is exhausted, otherwise "StopIteration" is raised. class object Return a new featureless object. "object" is a base for all classes. It has the methods that are common to all instances of Python classes. This function does not accept any arguments. Note: "object" does *not* have a "__dict__", so you can’t assign arbitrary attributes to an instance of the "object" class. oct(x) Convert an integer number to an octal string prefixed with “0o”. The result is a valid Python expression. If *x* is not a Python "int" object, it has to define an "__index__()" method that returns an integer. For example: >>> oct(8) '0o10' >>> oct(-56) '-0o70' If you want to convert an integer number to octal string either with prefix “0o” or not, you can use either of the following ways. >>> '%#o' % 10, '%o' % 10 ('0o12', '12') >>> format(10, '#o'), format(10, 'o') ('0o12', '12') >>> f'{10:#o}', f'{10:o}' ('0o12', '12') See also "format()" for more information. open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None) Open *file* and return a corresponding *file object*. If the file cannot be opened, an "OSError" is raised. See Reading and Writing Files for more examples of how to use this function. *file* is a *path-like object* giving the pathname (absolute or relative to the current working directory) of the file to be opened or an integer file descriptor of the file to be wrapped. (If a file descriptor is given, it is closed when the returned I/O object is closed, unless *closefd* is set to "False".) *mode* is an optional string that specifies the mode in which the file is opened. It defaults to "'r'" which means open for reading in text mode. Other common values are "'w'" for writing (truncating the file if it already exists), "'x'" for exclusive creation and "'a'" for appending (which on *some* Unix systems, means that *all* writes append to the end of the file regardless of the current seek position). In text mode, if *encoding* is not specified the encoding used is platform dependent: "locale.getpreferredencoding(False)" is called to get the current locale encoding. (For reading and writing raw bytes use binary mode and leave *encoding* unspecified.) The available modes are: +-----------+-----------------------------------------------------------------+ | Character | Meaning | |===========|=================================================================| | "'r'" | open for reading (default) | +-----------+-----------------------------------------------------------------+ | "'w'" | open for writing, truncating the file first | +-----------+-----------------------------------------------------------------+ | "'x'" | open for exclusive creation, failing if the file already exists | +-----------+-----------------------------------------------------------------+ | "'a'" | open for writing, appending to the end of the file if it exists | +-----------+-----------------------------------------------------------------+ | "'b'" | binary mode | +-----------+-----------------------------------------------------------------+ | "'t'" | text mode (default) | +-----------+-----------------------------------------------------------------+ | "'+'" | open for updating (reading and writing) | +-----------+-----------------------------------------------------------------+ The default mode is "'r'" (open for reading text, synonym of "'rt'"). Modes "'w+'" and "'w+b'" open and truncate the file. Modes "'r+'" and "'r+b'" open the file with no truncation. As mentioned in the Overview, Python distinguishes between binary and text I/O. Files opened in binary mode (including "'b'" in the *mode* argument) return contents as "bytes" objects without any decoding. In text mode (the default, or when "'t'" is included in the *mode* argument), the contents of the file are returned as "str", the bytes having been first decoded using a platform- dependent encoding or using the specified *encoding* if given. There is an additional mode character permitted, "'U'", which no longer has any effect, and is considered deprecated. It previously enabled *universal newlines* in text mode, which became the default behaviour in Python 3.0. Refer to the documentation of the newline parameter for further details. Note: Python doesn’t depend on the underlying operating system’s notion of text files; all the processing is done by Python itself, and is therefore platform-independent. *buffering* is an optional integer used to set the buffering policy. Pass 0 to switch buffering off (only allowed in binary mode), 1 to select line buffering (only usable in text mode), and an integer > 1 to indicate the size in bytes of a fixed-size chunk buffer. Note that specifying a buffer size this way applies for binary buffered I/O, but "TextIOWrapper" (i.e., files opened with "mode='r+'") would have another buffering. To disable buffering in "TextIOWrapper", consider using the "write_through" flag for "io.TextIOWrapper.reconfigure()". When no *buffering* argument is given, the default buffering policy works as follows: * Binary files are buffered in fixed-size chunks; the size of the buffer is chosen using a heuristic trying to determine the underlying device’s “block size” and falling back on "io.DEFAULT_BUFFER_SIZE". On many systems, the buffer will typically be 4096 or 8192 bytes long. * “Interactive” text files (files for which "isatty()" returns "True") use line buffering. Other text files use the policy described above for binary files. *encoding* is the name of the encoding used to decode or encode the file. This should only be used in text mode. The default encoding is platform dependent (whatever "locale.getpreferredencoding()" returns), but any *text encoding* supported by Python can be used. See the "codecs" module for the list of supported encodings. *errors* is an optional string that specifies how encoding and decoding errors are to be handled—this cannot be used in binary mode. A variety of standard error handlers are available (listed under Error Handlers), though any error handling name that has been registered with "codecs.register_error()" is also valid. The standard names include: * "'strict'" to raise a "ValueError" exception if there is an encoding error. The default value of "None" has the same effect. * "'ignore'" ignores errors. Note that ignoring encoding errors can lead to data loss. * "'replace'" causes a replacement marker (such as "'?'") to be inserted where there is malformed data. * "'surrogateescape'" will represent any incorrect bytes as low surrogate code units ranging from U+DC80 to U+DCFF. These surrogate code units will then be turned back into the same bytes when the "surrogateescape" error handler is used when writing data. This is useful for processing files in an unknown encoding. * "'xmlcharrefreplace'" is only supported when writing to a file. Characters not supported by the encoding are replaced with the appropriate XML character reference "&#nnn;". * "'backslashreplace'" replaces malformed data by Python’s backslashed escape sequences. * "'namereplace'" (also only supported when writing) replaces unsupported characters with "\N{...}" escape sequences. *newline* controls how *universal newlines* mode works (it only applies to text mode). It can be "None", "''", "'\n'", "'\r'", and "'\r\n'". It works as follows: * When reading input from the stream, if *newline* is "None", universal newlines mode is enabled. Lines in the input can end in "'\n'", "'\r'", or "'\r\n'", and these are translated into "'\n'" before being returned to the caller. If it is "''", universal newlines mode is enabled, but line endings are returned to the caller untranslated. If it has any of the other legal values, input lines are only terminated by the given string, and the line ending is returned to the caller untranslated. * When writing output to the stream, if *newline* is "None", any "'\n'" characters written are translated to the system default line separator, "os.linesep". If *newline* is "''" or "'\n'", no translation takes place. If *newline* is any of the other legal values, any "'\n'" characters written are translated to the given string. If *closefd* is "False" and a file descriptor rather than a filename was given, the underlying file descriptor will be kept open when the file is closed. If a filename is given *closefd* must be "True" (the default) otherwise an error will be raised. A custom opener can be used by passing a callable as *opener*. The underlying file descriptor for the file object is then obtained by calling *opener* with (*file*, *flags*). *opener* must return an open file descriptor (passing "os.open" as *opener* results in functionality similar to passing "None"). The newly created file is non-inheritable. The following example uses the dir_fd parameter of the "os.open()" function to open a file relative to a given directory: >>> import os >>> dir_fd = os.open('somedir', os.O_RDONLY) >>> def opener(path, flags): ... return os.open(path, flags, dir_fd=dir_fd) ... >>> with open('spamspam.txt', 'w', opener=opener) as f: ... print('This will be written to somedir/spamspam.txt', file=f) ... >>> os.close(dir_fd) # don't leak a file descriptor The type of *file object* returned by the "open()" function depends on the mode. When "open()" is used to open a file in a text mode ("'w'", "'r'", "'wt'", "'rt'", etc.), it returns a subclass of "io.TextIOBase" (specifically "io.TextIOWrapper"). When used to open a file in a binary mode with buffering, the returned class is a subclass of "io.BufferedIOBase". The exact class varies: in read binary mode, it returns an "io.BufferedReader"; in write binary and append binary modes, it returns an "io.BufferedWriter", and in read/write mode, it returns an "io.BufferedRandom". When buffering is disabled, the raw stream, a subclass of "io.RawIOBase", "io.FileIO", is returned. See also the file handling modules, such as, "fileinput", "io" (where "open()" is declared), "os", "os.path", "tempfile", and "shutil". Raises an auditing event "open" with arguments "file", "mode", "flags". The "mode" and "flags" arguments may have been modified or inferred from the original call. Changed in version 3.3: * The *opener* parameter was added. * The "'x'" mode was added. * "IOError" used to be raised, it is now an alias of "OSError". * "FileExistsError" is now raised if the file opened in exclusive creation mode ("'x'") already exists. Changed in version 3.4: * The file is now non-inheritable. Deprecated since version 3.4, will be removed in version 3.10: The "'U'" mode. Changed in version 3.5: * If the system call is interrupted and the signal handler does not raise an exception, the function now retries the system call instead of raising an "InterruptedError" exception (see **PEP 475** for the rationale). * The "'namereplace'" error handler was added. Changed in version 3.6: * Support added to accept objects implementing "os.PathLike". * On Windows, opening a console buffer may return a subclass of "io.RawIOBase" other than "io.FileIO". ord(c) Given a string representing one Unicode character, return an integer representing the Unicode code point of that character. For example, "ord('a')" returns the integer "97" and "ord('€')" (Euro sign) returns "8364". This is the inverse of "chr()". pow(base, exp[, mod]) Return *base* to the power *exp*; if *mod* is present, return *base* to the power *exp*, modulo *mod* (computed more efficiently than "pow(base, exp) % mod"). The two-argument form "pow(base, exp)" is equivalent to using the power operator: "base**exp". The arguments must have numeric types. With mixed operand types, the coercion rules for binary arithmetic operators apply. For "int" operands, the result has the same type as the operands (after coercion) unless the second argument is negative; in that case, all arguments are converted to float and a float result is delivered. For example, "pow(10, 2)" returns "100", but "pow(10, -2)" returns "0.01". For a negative base of type "int" or "float" and a non- integral exponent, a complex result is delivered. For example, "pow(-9, 0.5)" returns a value close to "3j". For "int" operands *base* and *exp*, if *mod* is present, *mod* must also be of integer type and *mod* must be nonzero. If *mod* is present and *exp* is negative, *base* must be relatively prime to *mod*. In that case, "pow(inv_base, -exp, mod)" is returned, where *inv_base* is an inverse to *base* modulo *mod*. Here’s an example of computing an inverse for "38" modulo "97": >>> pow(38, -1, mod=97) 23 >>> 23 * 38 % 97 == 1 True Changed in version 3.8: For "int" operands, the three-argument form of "pow" now allows the second argument to be negative, permitting computation of modular inverses. Changed in version 3.8: Allow keyword arguments. Formerly, only positional arguments were supported. print(*objects, sep=' ', end='\n', file=sys.stdout, flush=False) Print *objects* to the text stream *file*, separated by *sep* and followed by *end*. *sep*, *end*, *file* and *flush*, if present, must be given as keyword arguments. All non-keyword arguments are converted to strings like "str()" does and written to the stream, separated by *sep* and followed by *end*. Both *sep* and *end* must be strings; they can also be "None", which means to use the default values. If no *objects* are given, "print()" will just write *end*. The *file* argument must be an object with a "write(string)" method; if it is not present or "None", "sys.stdout" will be used. Since printed arguments are converted to text strings, "print()" cannot be used with binary mode file objects. For these, use "file.write(...)" instead. Whether output is buffered is usually determined by *file*, but if the *flush* keyword argument is true, the stream is forcibly flushed. Changed in version 3.3: Added the *flush* keyword argument. class property(fget=None, fset=None, fdel=None, doc=None) Return a property attribute. *fget* is a function for getting an attribute value. *fset* is a function for setting an attribute value. *fdel* is a function for deleting an attribute value. And *doc* creates a docstring for the attribute. A typical use is to define a managed attribute "x": class C: def __init__(self): self._x = None def getx(self): return self._x def setx(self, value): self._x = value def delx(self): del self._x x = property(getx, setx, delx, "I'm the 'x' property.") If *c* is an instance of *C*, "c.x" will invoke the getter, "c.x = value" will invoke the setter and "del c.x" the deleter. If given, *doc* will be the docstring of the property attribute. Otherwise, the property will copy *fget*’s docstring (if it exists). This makes it possible to create read-only properties easily using "property()" as a *decorator*: class Parrot: def __init__(self): self._voltage = 100000 @property def voltage(self): """Get the current voltage.""" return self._voltage The "@property" decorator turns the "voltage()" method into a “getter” for a read-only attribute with the same name, and it sets the docstring for *voltage* to “Get the current voltage.” A property object has "getter", "setter", and "deleter" methods usable as decorators that create a copy of the property with the corresponding accessor function set to the decorated function. This is best explained with an example: class C: def __init__(self): self._x = None @property def x(self): """I'm the 'x' property.""" return self._x @x.setter def x(self, value): self._x = value @x.deleter def x(self): del self._x This code is exactly equivalent to the first example. Be sure to give the additional functions the same name as the original property ("x" in this case.) The returned property object also has the attributes "fget", "fset", and "fdel" corresponding to the constructor arguments. Changed in version 3.5: The docstrings of property objects are now writeable. class range(stop) class range(start, stop[, step]) Rather than being a function, "range" is actually an immutable sequence type, as documented in Ranges and Sequence Types — list, tuple, range. repr(object) Return a string containing a printable representation of an object. For many types, this function makes an attempt to return a string that would yield an object with the same value when passed to "eval()", otherwise the representation is a string enclosed in angle brackets that contains the name of the type of the object together with additional information often including the name and address of the object. A class can control what this function returns for its instances by defining a "__repr__()" method. reversed(seq) Return a reverse *iterator*. *seq* must be an object which has a "__reversed__()" method or supports the sequence protocol (the "__len__()" method and the "__getitem__()" method with integer arguments starting at "0"). round(number[, ndigits]) Return *number* rounded to *ndigits* precision after the decimal point. If *ndigits* is omitted or is "None", it returns the nearest integer to its input. For the built-in types supporting "round()", values are rounded to the closest multiple of 10 to the power minus *ndigits*; if two multiples are equally close, rounding is done toward the even choice (so, for example, both "round(0.5)" and "round(-0.5)" are "0", and "round(1.5)" is "2"). Any integer value is valid for *ndigits* (positive, zero, or negative). The return value is an integer if *ndigits* is omitted or "None". Otherwise the return value has the same type as *number*. For a general Python object "number", "round" delegates to "number.__round__". Note: The behavior of "round()" for floats can be surprising: for example, "round(2.675, 2)" gives "2.67" instead of the expected "2.68". This is not a bug: it’s a result of the fact that most decimal fractions can’t be represented exactly as a float. See Floating Point Arithmetic: Issues and Limitations for more information. class set([iterable]) Return a new "set" object, optionally with elements taken from *iterable*. "set" is a built-in class. See "set" and Set Types — set, frozenset for documentation about this class. For other containers see the built-in "frozenset", "list", "tuple", and "dict" classes, as well as the "collections" module. setattr(object, name, value) This is the counterpart of "getattr()". The arguments are an object, a string and an arbitrary value. The string may name an existing attribute or a new attribute. The function assigns the value to the attribute, provided the object allows it. For example, "setattr(x, 'foobar', 123)" is equivalent to "x.foobar = 123". Note: Since private name mangling happens at compilation time, one must manually mangle a private attribute’s (attributes with two leading underscores) name in order to set it with "setattr()". class slice(stop) class slice(start, stop[, step]) Return a *slice* object representing the set of indices specified by "range(start, stop, step)". The *start* and *step* arguments default to "None". Slice objects have read-only data attributes "start", "stop" and "step" which merely return the argument values (or their default). They have no other explicit functionality; however they are used by NumPy and other third party packages. Slice objects are also generated when extended indexing syntax is used. For example: "a[start:stop:step]" or "a[start:stop, i]". See "itertools.islice()" for an alternate version that returns an iterator. sorted(iterable, /, *, key=None, reverse=False) Return a new sorted list from the items in *iterable*. Has two optional arguments which must be specified as keyword arguments. *key* specifies a function of one argument that is used to extract a comparison key from each element in *iterable* (for example, "key=str.lower"). The default value is "None" (compare the elements directly). *reverse* is a boolean value. If set to "True", then the list elements are sorted as if each comparison were reversed. Use "functools.cmp_to_key()" to convert an old-style *cmp* function to a *key* function. The built-in "sorted()" function is guaranteed to be stable. A sort is stable if it guarantees not to change the relative order of elements that compare equal — this is helpful for sorting in multiple passes (for example, sort by department, then by salary grade). The sort algorithm uses only "<" comparisons between items. While defining an "__lt__()" method will suffice for sorting, **PEP 8** recommends that all six rich comparisons be implemented. This will help avoid bugs when using the same data with other ordering tools such as "max()" that rely on a different underlying method. Implementing all six comparisons also helps avoid confusion for mixed type comparisons which can call reflected the "__gt__()" method. For sorting examples and a brief sorting tutorial, see Sorting HOW TO. @staticmethod Transform a method into a static method. A static method does not receive an implicit first argument. To declare a static method, use this idiom: class C: @staticmethod def f(arg1, arg2, ...): ... The "@staticmethod" form is a function *decorator* – see Function definitions for details. A static method can be called either on the class (such as "C.f()") or on an instance (such as "C().f()"). Static methods in Python are similar to those found in Java or C++. Also see "classmethod()" for a variant that is useful for creating alternate class constructors. Like all decorators, it is also possible to call "staticmethod" as a regular function and do something with its result. This is needed in some cases where you need a reference to a function from a class body and you want to avoid the automatic transformation to instance method. For these cases, use this idiom: class C: builtin_open = staticmethod(open) For more information on static methods, see The standard type hierarchy. class str(object='') class str(object=b'', encoding='utf-8', errors='strict') Return a "str" version of *object*. See "str()" for details. "str" is the built-in string *class*. For general information about strings, see Text Sequence Type — str. sum(iterable, /, start=0) Sums *start* and the items of an *iterable* from left to right and returns the total. The *iterable*’s items are normally numbers, and the start value is not allowed to be a string. For some use cases, there are good alternatives to "sum()". The preferred, fast way to concatenate a sequence of strings is by calling "''.join(sequence)". To add floating point values with extended precision, see "math.fsum()". To concatenate a series of iterables, consider using "itertools.chain()". Changed in version 3.8: The *start* parameter can be specified as a keyword argument. super([type[, object-or-type]]) Return a proxy object that delegates method calls to a parent or sibling class of *type*. This is useful for accessing inherited methods that have been overridden in a class. The *object-or-type* determines the *method resolution order* to be searched. The search starts from the class right after the *type*. For example, if "__mro__" of *object-or-type* is "D -> B -> C -> A -> object" and the value of *type* is "B", then "super()" searches "C -> A -> object". The "__mro__" attribute of the *object-or-type* lists the method resolution search order used by both "getattr()" and "super()". The attribute is dynamic and can change whenever the inheritance hierarchy is updated. If the second argument is omitted, the super object returned is unbound. If the second argument is an object, "isinstance(obj, type)" must be true. If the second argument is a type, "issubclass(type2, type)" must be true (this is useful for classmethods). There are two typical use cases for *super*. In a class hierarchy with single inheritance, *super* can be used to refer to parent classes without naming them explicitly, thus making the code more maintainable. This use closely parallels the use of *super* in other programming languages. The second use case is to support cooperative multiple inheritance in a dynamic execution environment. This use case is unique to Python and is not found in statically compiled languages or languages that only support single inheritance. This makes it possible to implement “diamond diagrams” where multiple base classes implement the same method. Good design dictates that such implementations have the same calling signature in every case (because the order of calls is determined at runtime, because that order adapts to changes in the class hierarchy, and because that order can include sibling classes that are unknown prior to runtime). For both use cases, a typical superclass call looks like this: class C(B): def method(self, arg): super().method(arg) # This does the same thing as: # super(C, self).method(arg) In addition to method lookups, "super()" also works for attribute lookups. One possible use case for this is calling *descriptors* in a parent or sibling class. Note that "super()" is implemented as part of the binding process for explicit dotted attribute lookups such as "super().__getitem__(name)". It does so by implementing its own "__getattribute__()" method for searching classes in a predictable order that supports cooperative multiple inheritance. Accordingly, "super()" is undefined for implicit lookups using statements or operators such as "super()[name]". Also note that, aside from the zero argument form, "super()" is not limited to use inside methods. The two argument form specifies the arguments exactly and makes the appropriate references. The zero argument form only works inside a class definition, as the compiler fills in the necessary details to correctly retrieve the class being defined, as well as accessing the current instance for ordinary methods. For practical suggestions on how to design cooperative classes using "super()", see guide to using super(). class tuple([iterable]) Rather than being a function, "tuple" is actually an immutable sequence type, as documented in Tuples and Sequence Types — list, tuple, range. class type(object) class type(name, bases, dict, **kwds) With one argument, return the type of an *object*. The return value is a type object and generally the same object as returned by "object.__class__". The "isinstance()" built-in function is recommended for testing the type of an object, because it takes subclasses into account. With three arguments, return a new type object. This is essentially a dynamic form of the "class" statement. The *name* string is the class name and becomes the "__name__" attribute. The *bases* tuple contains the base classes and becomes the "__bases__" attribute; if empty, "object", the ultimate base of all classes, is added. The *dict* dictionary contains attribute and method definitions for the class body; it may be copied or wrapped before becoming the "__dict__" attribute. The following two statements create identical "type" objects: >>> class X: ... a = 1 ... >>> X = type('X', (), dict(a=1)) See also Type Objects. Keyword arguments provided to the three argument form are passed to the appropriate metaclass machinery (usually "__init_subclass__()") in the same way that keywords in a class definition (besides *metaclass*) would. See also Customizing class creation. Changed in version 3.6: Subclasses of "type" which don’t override "type.__new__" may no longer use the one-argument form to get the type of an object. vars([object]) Return the "__dict__" attribute for a module, class, instance, or any other object with a "__dict__" attribute. Objects such as modules and instances have an updateable "__dict__" attribute; however, other objects may have write restrictions on their "__dict__" attributes (for example, classes use a "types.MappingProxyType" to prevent direct dictionary updates). Without an argument, "vars()" acts like "locals()". Note, the locals dictionary is only useful for reads since updates to the locals dictionary are ignored. A "TypeError" exception is raised if an object is specified but it doesn’t have a "__dict__" attribute (for example, if its class defines the "__slots__" attribute). zip(*iterables) Make an iterator that aggregates elements from each of the iterables. Returns an iterator of tuples, where the *i*-th tuple contains the *i*-th element from each of the argument sequences or iterables. The iterator stops when the shortest input iterable is exhausted. With a single iterable argument, it returns an iterator of 1-tuples. With no arguments, it returns an empty iterator. Equivalent to: def zip(*iterables): # zip('ABCD', 'xy') --> Ax By sentinel = object() iterators = [iter(it) for it in iterables] while iterators: result = [] for it in iterators: elem = next(it, sentinel) if elem is sentinel: return result.append(elem) yield tuple(result) The left-to-right evaluation order of the iterables is guaranteed. This makes possible an idiom for clustering a data series into n-length groups using "zip(*[iter(s)]*n)". This repeats the *same* iterator "n" times so that each output tuple has the result of "n" calls to the iterator. This has the effect of dividing the input into n-length chunks. "zip()" should only be used with unequal length inputs when you don’t care about trailing, unmatched values from the longer iterables. If those values are important, use "itertools.zip_longest()" instead. "zip()" in conjunction with the "*" operator can be used to unzip a list: >>> x = [1, 2, 3] >>> y = [4, 5, 6] >>> zipped = zip(x, y) >>> list(zipped) [(1, 4), (2, 5), (3, 6)] >>> x2, y2 = zip(*zip(x, y)) >>> x == list(x2) and y == list(y2) True __import__(name, globals=None, locals=None, fromlist=(), level=0) Note: This is an advanced function that is not needed in everyday Python programming, unlike "importlib.import_module()". This function is invoked by the "import" statement. It can be replaced (by importing the "builtins" module and assigning to "builtins.__import__") in order to change semantics of the "import" statement, but doing so is **strongly** discouraged as it is usually simpler to use import hooks (see **PEP 302**) to attain the same goals and does not cause issues with code which assumes the default import implementation is in use. Direct use of "__import__()" is also discouraged in favor of "importlib.import_module()". The function imports the module *name*, potentially using the given *globals* and *locals* to determine how to interpret the name in a package context. The *fromlist* gives the names of objects or submodules that should be imported from the module given by *name*. The standard implementation does not use its *locals* argument at all, and uses its *globals* only to determine the package context of the "import" statement. *level* specifies whether to use absolute or relative imports. "0" (the default) means only perform absolute imports. Positive values for *level* indicate the number of parent directories to search relative to the directory of the module calling "__import__()" (see **PEP 328** for the details). When the *name* variable is of the form "package.module", normally, the top-level package (the name up till the first dot) is returned, *not* the module named by *name*. However, when a non-empty *fromlist* argument is given, the module named by *name* is returned. For example, the statement "import spam" results in bytecode resembling the following code: spam = __import__('spam', globals(), locals(), [], 0) The statement "import spam.ham" results in this call: spam = __import__('spam.ham', globals(), locals(), [], 0) Note how "__import__()" returns the toplevel module here because this is the object that is bound to a name by the "import" statement. On the other hand, the statement "from spam.ham import eggs, sausage as saus" results in _temp = __import__('spam.ham', globals(), locals(), ['eggs', 'sausage'], 0) eggs = _temp.eggs saus = _temp.sausage Here, the "spam.ham" module is returned from "__import__()". From this object, the names to import are retrieved and assigned to their respective names. If you simply want to import a module (potentially within a package) by name, use "importlib.import_module()". Changed in version 3.3: Negative values for *level* are no longer supported (which also changes the default value to 0). Changed in version 3.9: When the command line options "-E" or "-I" are being used, the environment variable "PYTHONCASEOK" is now ignored. -[ Footnotes ]- [1] Note that the parser only accepts the Unix-style end of line convention. If you are reading the code from a file, make sure to use newline conversion mode to convert Windows or Mac-style newlines. "functools" — Higher-order functions and operations on callable objects *********************************************************************** **Source code:** Lib/functools.py ====================================================================== The "functools" module is for higher-order functions: functions that act on or return other functions. In general, any callable object can be treated as a function for the purposes of this module. The "functools" module defines the following functions: @functools.cache(user_function) Simple lightweight unbounded function cache. Sometimes called “memoize”. Returns the same as "lru_cache(maxsize=None)", creating a thin wrapper around a dictionary lookup for the function arguments. Because it never needs to evict old values, this is smaller and faster than "lru_cache()" with a size limit. For example: @cache def factorial(n): return n * factorial(n-1) if n else 1 >>> factorial(10) # no previously cached result, makes 11 recursive calls 3628800 >>> factorial(5) # just looks up cached value result 120 >>> factorial(12) # makes two new recursive calls, the other 10 are cached 479001600 New in version 3.9. @functools.cached_property(func) Transform a method of a class into a property whose value is computed once and then cached as a normal attribute for the life of the instance. Similar to "property()", with the addition of caching. Useful for expensive computed properties of instances that are otherwise effectively immutable. Example: class DataSet: def __init__(self, sequence_of_numbers): self._data = tuple(sequence_of_numbers) @cached_property def stdev(self): return statistics.stdev(self._data) The mechanics of "cached_property()" are somewhat different from "property()". A regular property blocks attribute writes unless a setter is defined. In contrast, a *cached_property* allows writes. The *cached_property* decorator only runs on lookups and only when an attribute of the same name doesn’t exist. When it does run, the *cached_property* writes to the attribute with the same name. Subsequent attribute reads and writes take precedence over the *cached_property* method and it works like a normal attribute. The cached value can be cleared by deleting the attribute. This allows the *cached_property* method to run again. Note, this decorator interferes with the operation of **PEP 412** key-sharing dictionaries. This means that instance dictionaries can take more space than usual. Also, this decorator requires that the "__dict__" attribute on each instance be a mutable mapping. This means it will not work with some types, such as metaclasses (since the "__dict__" attributes on type instances are read-only proxies for the class namespace), and those that specify "__slots__" without including "__dict__" as one of the defined slots (as such classes don’t provide a "__dict__" attribute at all). If a mutable mapping is not available or if space-efficient key sharing is desired, an effect similar to "cached_property()" can be achieved by a stacking "property()" on top of "cache()": class DataSet: def __init__(self, sequence_of_numbers): self._data = sequence_of_numbers @property @cache def stdev(self): return statistics.stdev(self._data) New in version 3.8. functools.cmp_to_key(func) Transform an old-style comparison function to a *key function*. Used with tools that accept key functions (such as "sorted()", "min()", "max()", "heapq.nlargest()", "heapq.nsmallest()", "itertools.groupby()"). This function is primarily used as a transition tool for programs being converted from Python 2 which supported the use of comparison functions. A comparison function is any callable that accept two arguments, compares them, and returns a negative number for less-than, zero for equality, or a positive number for greater-than. A key function is a callable that accepts one argument and returns another value to be used as the sort key. Example: sorted(iterable, key=cmp_to_key(locale.strcoll)) # locale-aware sort order For sorting examples and a brief sorting tutorial, see Sorting HOW TO. New in version 3.2. @functools.lru_cache(user_function) @functools.lru_cache(maxsize=128, typed=False) Decorator to wrap a function with a memoizing callable that saves up to the *maxsize* most recent calls. It can save time when an expensive or I/O bound function is periodically called with the same arguments. Since a dictionary is used to cache results, the positional and keyword arguments to the function must be hashable. Distinct argument patterns may be considered to be distinct calls with separate cache entries. For example, *f(a=1, b=2)* and *f(b=2, a=1)* differ in their keyword argument order and may have two separate cache entries. If *user_function* is specified, it must be a callable. This allows the *lru_cache* decorator to be applied directly to a user function, leaving the *maxsize* at its default value of 128: @lru_cache def count_vowels(sentence): sentence = sentence.casefold() return sum(sentence.count(vowel) for vowel in 'aeiou') If *maxsize* is set to "None", the LRU feature is disabled and the cache can grow without bound. If *typed* is set to true, function arguments of different types will be cached separately. For example, "f(3)" and "f(3.0)" will be treated as distinct calls with distinct results. The wrapped function is instrumented with a "cache_parameters()" function that returns a new "dict" showing the values for *maxsize* and *typed*. This is for information purposes only. Mutating the values has no effect. To help measure the effectiveness of the cache and tune the *maxsize* parameter, the wrapped function is instrumented with a "cache_info()" function that returns a *named tuple* showing *hits*, *misses*, *maxsize* and *currsize*. In a multi-threaded environment, the hits and misses are approximate. The decorator also provides a "cache_clear()" function for clearing or invalidating the cache. The original underlying function is accessible through the "__wrapped__" attribute. This is useful for introspection, for bypassing the cache, or for rewrapping the function with a different cache. An LRU (least recently used) cache works best when the most recent calls are the best predictors of upcoming calls (for example, the most popular articles on a news server tend to change each day). The cache’s size limit assures that the cache does not grow without bound on long-running processes such as web servers. In general, the LRU cache should only be used when you want to reuse previously computed values. Accordingly, it doesn’t make sense to cache functions with side-effects, functions that need to create distinct mutable objects on each call, or impure functions such as time() or random(). Example of an LRU cache for static web content: @lru_cache(maxsize=32) def get_pep(num): 'Retrieve text of a Python Enhancement Proposal' resource = 'https://www.python.org/dev/peps/pep-%04d/' % num try: with urllib.request.urlopen(resource) as s: return s.read() except urllib.error.HTTPError: return 'Not Found' >>> for n in 8, 290, 308, 320, 8, 218, 320, 279, 289, 320, 9991: ... pep = get_pep(n) ... print(n, len(pep)) >>> get_pep.cache_info() CacheInfo(hits=3, misses=8, maxsize=32, currsize=8) Example of efficiently computing Fibonacci numbers using a cache to implement a dynamic programming technique: @lru_cache(maxsize=None) def fib(n): if n < 2: return n return fib(n-1) + fib(n-2) >>> [fib(n) for n in range(16)] [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610] >>> fib.cache_info() CacheInfo(hits=28, misses=16, maxsize=None, currsize=16) New in version 3.2. Changed in version 3.3: Added the *typed* option. Changed in version 3.8: Added the *user_function* option. New in version 3.9: Added the function "cache_parameters()" @functools.total_ordering Given a class defining one or more rich comparison ordering methods, this class decorator supplies the rest. This simplifies the effort involved in specifying all of the possible rich comparison operations: The class must define one of "__lt__()", "__le__()", "__gt__()", or "__ge__()". In addition, the class should supply an "__eq__()" method. For example: @total_ordering class Student: def _is_valid_operand(self, other): return (hasattr(other, "lastname") and hasattr(other, "firstname")) def __eq__(self, other): if not self._is_valid_operand(other): return NotImplemented return ((self.lastname.lower(), self.firstname.lower()) == (other.lastname.lower(), other.firstname.lower())) def __lt__(self, other): if not self._is_valid_operand(other): return NotImplemented return ((self.lastname.lower(), self.firstname.lower()) < (other.lastname.lower(), other.firstname.lower())) Note: While this decorator makes it easy to create well behaved totally ordered types, it *does* come at the cost of slower execution and more complex stack traces for the derived comparison methods. If performance benchmarking indicates this is a bottleneck for a given application, implementing all six rich comparison methods instead is likely to provide an easy speed boost. New in version 3.2. Changed in version 3.4: Returning NotImplemented from the underlying comparison function for unrecognised types is now supported. functools.partial(func, /, *args, **keywords) Return a new partial object which when called will behave like *func* called with the positional arguments *args* and keyword arguments *keywords*. If more arguments are supplied to the call, they are appended to *args*. If additional keyword arguments are supplied, they extend and override *keywords*. Roughly equivalent to: def partial(func, /, *args, **keywords): def newfunc(*fargs, **fkeywords): newkeywords = {**keywords, **fkeywords} return func(*args, *fargs, **newkeywords) newfunc.func = func newfunc.args = args newfunc.keywords = keywords return newfunc The "partial()" is used for partial function application which “freezes” some portion of a function’s arguments and/or keywords resulting in a new object with a simplified signature. For example, "partial()" can be used to create a callable that behaves like the "int()" function where the *base* argument defaults to two: >>> from functools import partial >>> basetwo = partial(int, base=2) >>> basetwo.__doc__ = 'Convert base 2 string to an int.' >>> basetwo('10010') 18 class functools.partialmethod(func, /, *args, **keywords) Return a new "partialmethod" descriptor which behaves like "partial" except that it is designed to be used as a method definition rather than being directly callable. *func* must be a *descriptor* or a callable (objects which are both, like normal functions, are handled as descriptors). When *func* is a descriptor (such as a normal Python function, "classmethod()", "staticmethod()", "abstractmethod()" or another instance of "partialmethod"), calls to "__get__" are delegated to the underlying descriptor, and an appropriate partial object returned as the result. When *func* is a non-descriptor callable, an appropriate bound method is created dynamically. This behaves like a normal Python function when used as a method: the *self* argument will be inserted as the first positional argument, even before the *args* and *keywords* supplied to the "partialmethod" constructor. Example: >>> class Cell: ... def __init__(self): ... self._alive = False ... @property ... def alive(self): ... return self._alive ... def set_state(self, state): ... self._alive = bool(state) ... set_alive = partialmethod(set_state, True) ... set_dead = partialmethod(set_state, False) ... >>> c = Cell() >>> c.alive False >>> c.set_alive() >>> c.alive True New in version 3.4. functools.reduce(function, iterable[, initializer]) Apply *function* of two arguments cumulatively to the items of *iterable*, from left to right, so as to reduce the iterable to a single value. For example, "reduce(lambda x, y: x+y, [1, 2, 3, 4, 5])" calculates "((((1+2)+3)+4)+5)". The left argument, *x*, is the accumulated value and the right argument, *y*, is the update value from the *iterable*. If the optional *initializer* is present, it is placed before the items of the iterable in the calculation, and serves as a default when the iterable is empty. If *initializer* is not given and *iterable* contains only one item, the first item is returned. Roughly equivalent to: def reduce(function, iterable, initializer=None): it = iter(iterable) if initializer is None: value = next(it) else: value = initializer for element in it: value = function(value, element) return value See "itertools.accumulate()" for an iterator that yields all intermediate values. @functools.singledispatch Transform a function into a *single-dispatch* *generic function*. To define a generic function, decorate it with the "@singledispatch" decorator. When defining a function using "@singledispatch", note that the dispatch happens on the type of the first argument: >>> from functools import singledispatch >>> @singledispatch ... def fun(arg, verbose=False): ... if verbose: ... print("Let me just say,", end=" ") ... print(arg) To add overloaded implementations to the function, use the "register()" attribute of the generic function, which can be used as a decorator. For functions annotated with types, the decorator will infer the type of the first argument automatically: >>> @fun.register ... def _(arg: int, verbose=False): ... if verbose: ... print("Strength in numbers, eh?", end=" ") ... print(arg) ... >>> @fun.register ... def _(arg: list, verbose=False): ... if verbose: ... print("Enumerate this:") ... for i, elem in enumerate(arg): ... print(i, elem) For code which doesn’t use type annotations, the appropriate type argument can be passed explicitly to the decorator itself: >>> @fun.register(complex) ... def _(arg, verbose=False): ... if verbose: ... print("Better than complicated.", end=" ") ... print(arg.real, arg.imag) ... To enable registering *lambdas* and pre-existing functions, the "register()" attribute can also be used in a functional form: >>> def nothing(arg, verbose=False): ... print("Nothing.") ... >>> fun.register(type(None), nothing) The "register()" attribute returns the undecorated function. This enables decorator stacking, "pickling", and the creation of unit tests for each variant independently: >>> @fun.register(float) ... @fun.register(Decimal) ... def fun_num(arg, verbose=False): ... if verbose: ... print("Half of your number:", end=" ") ... print(arg / 2) ... >>> fun_num is fun False When called, the generic function dispatches on the type of the first argument: >>> fun("Hello, world.") Hello, world. >>> fun("test.", verbose=True) Let me just say, test. >>> fun(42, verbose=True) Strength in numbers, eh? 42 >>> fun(['spam', 'spam', 'eggs', 'spam'], verbose=True) Enumerate this: 0 spam 1 spam 2 eggs 3 spam >>> fun(None) Nothing. >>> fun(1.23) 0.615 Where there is no registered implementation for a specific type, its method resolution order is used to find a more generic implementation. The original function decorated with "@singledispatch" is registered for the base "object" type, which means it is used if no better implementation is found. If an implementation is registered to an *abstract base class*, virtual subclasses of the base class will be dispatched to that implementation: >>> from collections.abc import Mapping >>> @fun.register ... def _(arg: Mapping, verbose=False): ... if verbose: ... print("Keys & Values") ... for key, value in arg.items(): ... print(key, "=>", value) ... >>> fun({"a": "b"}) a => b To check which implementation the generic function will choose for a given type, use the "dispatch()" attribute: >>> fun.dispatch(float) >>> fun.dispatch(dict) # note: default implementation To access all registered implementations, use the read-only "registry" attribute: >>> fun.registry.keys() dict_keys([, , , , , ]) >>> fun.registry[float] >>> fun.registry[object] New in version 3.4. Changed in version 3.7: The "register()" attribute now supports using type annotations. class functools.singledispatchmethod(func) Transform a method into a *single-dispatch* *generic function*. To define a generic method, decorate it with the "@singledispatchmethod" decorator. When defining a function using "@singledispatchmethod", note that the dispatch happens on the type of the first non-*self* or non-*cls* argument: class Negator: @singledispatchmethod def neg(self, arg): raise NotImplementedError("Cannot negate a") @neg.register def _(self, arg: int): return -arg @neg.register def _(self, arg: bool): return not arg "@singledispatchmethod" supports nesting with other decorators such as "@classmethod". Note that to allow for "dispatcher.register", "singledispatchmethod" must be the *outer most* decorator. Here is the "Negator" class with the "neg" methods bound to the class, rather than an instance of the class: class Negator: @singledispatchmethod @classmethod def neg(cls, arg): raise NotImplementedError("Cannot negate a") @neg.register @classmethod def _(cls, arg: int): return -arg @neg.register @classmethod def _(cls, arg: bool): return not arg The same pattern can be used for other similar decorators: "@staticmethod", "@abstractmethod", and others. New in version 3.8. functools.update_wrapper(wrapper, wrapped, assigned=WRAPPER_ASSIGNMENTS, updated=WRAPPER_UPDATES) Update a *wrapper* function to look like the *wrapped* function. The optional arguments are tuples to specify which attributes of the original function are assigned directly to the matching attributes on the wrapper function and which attributes of the wrapper function are updated with the corresponding attributes from the original function. The default values for these arguments are the module level constants "WRAPPER_ASSIGNMENTS" (which assigns to the wrapper function’s "__module__", "__name__", "__qualname__", "__annotations__" and "__doc__", the documentation string) and "WRAPPER_UPDATES" (which updates the wrapper function’s "__dict__", i.e. the instance dictionary). To allow access to the original function for introspection and other purposes (e.g. bypassing a caching decorator such as "lru_cache()"), this function automatically adds a "__wrapped__" attribute to the wrapper that refers to the function being wrapped. The main intended use for this function is in *decorator* functions which wrap the decorated function and return the wrapper. If the wrapper function is not updated, the metadata of the returned function will reflect the wrapper definition rather than the original function definition, which is typically less than helpful. "update_wrapper()" may be used with callables other than functions. Any attributes named in *assigned* or *updated* that are missing from the object being wrapped are ignored (i.e. this function will not attempt to set them on the wrapper function). "AttributeError" is still raised if the wrapper function itself is missing any attributes named in *updated*. New in version 3.2: Automatic addition of the "__wrapped__" attribute. New in version 3.2: Copying of the "__annotations__" attribute by default. Changed in version 3.2: Missing attributes no longer trigger an "AttributeError". Changed in version 3.4: The "__wrapped__" attribute now always refers to the wrapped function, even if that function defined a "__wrapped__" attribute. (see bpo-17482) @functools.wraps(wrapped, assigned=WRAPPER_ASSIGNMENTS, updated=WRAPPER_UPDATES) This is a convenience function for invoking "update_wrapper()" as a function decorator when defining a wrapper function. It is equivalent to "partial(update_wrapper, wrapped=wrapped, assigned=assigned, updated=updated)". For example: >>> from functools import wraps >>> def my_decorator(f): ... @wraps(f) ... def wrapper(*args, **kwds): ... print('Calling decorated function') ... return f(*args, **kwds) ... return wrapper ... >>> @my_decorator ... def example(): ... """Docstring""" ... print('Called example function') ... >>> example() Calling decorated function Called example function >>> example.__name__ 'example' >>> example.__doc__ 'Docstring' Without the use of this decorator factory, the name of the example function would have been "'wrapper'", and the docstring of the original "example()" would have been lost. "partial" Objects ================= "partial" objects are callable objects created by "partial()". They have three read-only attributes: partial.func A callable object or function. Calls to the "partial" object will be forwarded to "func" with new arguments and keywords. partial.args The leftmost positional arguments that will be prepended to the positional arguments provided to a "partial" object call. partial.keywords The keyword arguments that will be supplied when the "partial" object is called. "partial" objects are like "function" objects in that they are callable, weak referencable, and can have attributes. There are some important differences. For instance, the "__name__" and "__doc__" attributes are not created automatically. Also, "partial" objects defined in classes behave like static methods and do not transform into bound methods during instance attribute look-up. "gc" — Garbage Collector interface ********************************** ====================================================================== This module provides an interface to the optional garbage collector. It provides the ability to disable the collector, tune the collection frequency, and set debugging options. It also provides access to unreachable objects that the collector found but cannot free. Since the collector supplements the reference counting already used in Python, you can disable the collector if you are sure your program does not create reference cycles. Automatic collection can be disabled by calling "gc.disable()". To debug a leaking program call "gc.set_debug(gc.DEBUG_LEAK)". Notice that this includes "gc.DEBUG_SAVEALL", causing garbage-collected objects to be saved in gc.garbage for inspection. The "gc" module provides the following functions: gc.enable() Enable automatic garbage collection. gc.disable() Disable automatic garbage collection. gc.isenabled() Return "True" if automatic collection is enabled. gc.collect(generation=2) With no arguments, run a full collection. The optional argument *generation* may be an integer specifying which generation to collect (from 0 to 2). A "ValueError" is raised if the generation number is invalid. The number of unreachable objects found is returned. The free lists maintained for a number of built-in types are cleared whenever a full collection or collection of the highest generation (2) is run. Not all items in some free lists may be freed due to the particular implementation, in particular "float". gc.set_debug(flags) Set the garbage collection debugging flags. Debugging information will be written to "sys.stderr". See below for a list of debugging flags which can be combined using bit operations to control debugging. gc.get_debug() Return the debugging flags currently set. gc.get_objects(generation=None) Returns a list of all objects tracked by the collector, excluding the list returned. If *generation* is not None, return only the objects tracked by the collector that are in that generation. Changed in version 3.8: New *generation* parameter. Raises an auditing event "gc.get_objects" with argument "generation". gc.get_stats() Return a list of three per-generation dictionaries containing collection statistics since interpreter start. The number of keys may change in the future, but currently each dictionary will contain the following items: * "collections" is the number of times this generation was collected; * "collected" is the total number of objects collected inside this generation; * "uncollectable" is the total number of objects which were found to be uncollectable (and were therefore moved to the "garbage" list) inside this generation. New in version 3.4. gc.set_threshold(threshold0[, threshold1[, threshold2]]) Set the garbage collection thresholds (the collection frequency). Setting *threshold0* to zero disables collection. The GC classifies objects into three generations depending on how many collection sweeps they have survived. New objects are placed in the youngest generation (generation "0"). If an object survives a collection it is moved into the next older generation. Since generation "2" is the oldest generation, objects in that generation remain there after a collection. In order to decide when to run, the collector keeps track of the number object allocations and deallocations since the last collection. When the number of allocations minus the number of deallocations exceeds *threshold0*, collection starts. Initially only generation "0" is examined. If generation "0" has been examined more than *threshold1* times since generation "1" has been examined, then generation "1" is examined as well. With the third generation, things are a bit more complicated, see Collecting the oldest generation for more information. gc.get_count() Return the current collection counts as a tuple of "(count0, count1, count2)". gc.get_threshold() Return the current collection thresholds as a tuple of "(threshold0, threshold1, threshold2)". gc.get_referrers(*objs) Return the list of objects that directly refer to any of objs. This function will only locate those containers which support garbage collection; extension types which do refer to other objects but do not support garbage collection will not be found. Note that objects which have already been dereferenced, but which live in cycles and have not yet been collected by the garbage collector can be listed among the resulting referrers. To get only currently live objects, call "collect()" before calling "get_referrers()". Warning: Care must be taken when using objects returned by "get_referrers()" because some of them could still be under construction and hence in a temporarily invalid state. Avoid using "get_referrers()" for any purpose other than debugging. Raises an auditing event "gc.get_referrers" with argument "objs". gc.get_referents(*objs) Return a list of objects directly referred to by any of the arguments. The referents returned are those objects visited by the arguments’ C-level "tp_traverse" methods (if any), and may not be all objects actually directly reachable. "tp_traverse" methods are supported only by objects that support garbage collection, and are only required to visit objects that may be involved in a cycle. So, for example, if an integer is directly reachable from an argument, that integer object may or may not appear in the result list. Raises an auditing event "gc.get_referents" with argument "objs". gc.is_tracked(obj) Returns "True" if the object is currently tracked by the garbage collector, "False" otherwise. As a general rule, instances of atomic types aren’t tracked and instances of non-atomic types (containers, user-defined objects…) are. However, some type- specific optimizations can be present in order to suppress the garbage collector footprint of simple instances (e.g. dicts containing only atomic keys and values): >>> gc.is_tracked(0) False >>> gc.is_tracked("a") False >>> gc.is_tracked([]) True >>> gc.is_tracked({}) False >>> gc.is_tracked({"a": 1}) False >>> gc.is_tracked({"a": []}) True New in version 3.1. gc.is_finalized(obj) Returns "True" if the given object has been finalized by the garbage collector, "False" otherwise. >>> x = None >>> class Lazarus: ... def __del__(self): ... global x ... x = self ... >>> lazarus = Lazarus() >>> gc.is_finalized(lazarus) False >>> del lazarus >>> gc.is_finalized(x) True New in version 3.9. gc.freeze() Freeze all the objects tracked by gc - move them to a permanent generation and ignore all the future collections. This can be used before a POSIX fork() call to make the gc copy-on-write friendly or to speed up collection. Also collection before a POSIX fork() call may free pages for future allocation which can cause copy-on-write too so it’s advised to disable gc in parent process and freeze before fork and enable gc in child process. New in version 3.7. gc.unfreeze() Unfreeze the objects in the permanent generation, put them back into the oldest generation. New in version 3.7. gc.get_freeze_count() Return the number of objects in the permanent generation. New in version 3.7. The following variables are provided for read-only access (you can mutate the values but should not rebind them): gc.garbage A list of objects which the collector found to be unreachable but could not be freed (uncollectable objects). Starting with Python 3.4, this list should be empty most of the time, except when using instances of C extension types with a non-"NULL" "tp_del" slot. If "DEBUG_SAVEALL" is set, then all unreachable objects will be added to this list rather than freed. Changed in version 3.2: If this list is non-empty at *interpreter shutdown*, a "ResourceWarning" is emitted, which is silent by default. If "DEBUG_UNCOLLECTABLE" is set, in addition all uncollectable objects are printed. Changed in version 3.4: Following **PEP 442**, objects with a "__del__()" method don’t end up in "gc.garbage" anymore. gc.callbacks A list of callbacks that will be invoked by the garbage collector before and after collection. The callbacks will be called with two arguments, *phase* and *info*. *phase* can be one of two values: “start”: The garbage collection is about to start. “stop”: The garbage collection has finished. *info* is a dict providing more information for the callback. The following keys are currently defined: “generation”: The oldest generation being collected. “collected”: When *phase* is “stop”, the number of objects successfully collected. “uncollectable”: When *phase* is “stop”, the number of objects that could not be collected and were put in "garbage". Applications can add their own callbacks to this list. The primary use cases are: Gathering statistics about garbage collection, such as how often various generations are collected, and how long the collection takes. Allowing applications to identify and clear their own uncollectable types when they appear in "garbage". New in version 3.3. The following constants are provided for use with "set_debug()": gc.DEBUG_STATS Print statistics during collection. This information can be useful when tuning the collection frequency. gc.DEBUG_COLLECTABLE Print information on collectable objects found. gc.DEBUG_UNCOLLECTABLE Print information of uncollectable objects found (objects which are not reachable but cannot be freed by the collector). These objects will be added to the "garbage" list. Changed in version 3.2: Also print the contents of the "garbage" list at *interpreter shutdown*, if it isn’t empty. gc.DEBUG_SAVEALL When set, all unreachable objects found will be appended to *garbage* rather than being freed. This can be useful for debugging a leaking program. gc.DEBUG_LEAK The debugging flags necessary for the collector to print information about a leaking program (equal to "DEBUG_COLLECTABLE | DEBUG_UNCOLLECTABLE | DEBUG_SAVEALL"). "getopt" — C-style parser for command line options ************************************************** **Source code:** Lib/getopt.py Note: The "getopt" module is a parser for command line options whose API is designed to be familiar to users of the C "getopt()" function. Users who are unfamiliar with the C "getopt()" function or who would like to write less code and get better help and error messages should consider using the "argparse" module instead. ====================================================================== This module helps scripts to parse the command line arguments in "sys.argv". It supports the same conventions as the Unix "getopt()" function (including the special meanings of arguments of the form ‘"-"’ and ‘"--"‘). Long options similar to those supported by GNU software may be used as well via an optional third argument. This module provides two functions and an exception: getopt.getopt(args, shortopts, longopts=[]) Parses command line options and parameter list. *args* is the argument list to be parsed, without the leading reference to the running program. Typically, this means "sys.argv[1:]". *shortopts* is the string of option letters that the script wants to recognize, with options that require an argument followed by a colon ("':'"; i.e., the same format that Unix "getopt()" uses). Note: Unlike GNU "getopt()", after a non-option argument, all further arguments are considered also non-options. This is similar to the way non-GNU Unix systems work. *longopts*, if specified, must be a list of strings with the names of the long options which should be supported. The leading "'--'" characters should not be included in the option name. Long options which require an argument should be followed by an equal sign ("'='"). Optional arguments are not supported. To accept only long options, *shortopts* should be an empty string. Long options on the command line can be recognized so long as they provide a prefix of the option name that matches exactly one of the accepted options. For example, if *longopts* is "['foo', 'frob']", the option "--fo" will match as "--foo", but "--f" will not match uniquely, so "GetoptError" will be raised. The return value consists of two elements: the first is a list of "(option, value)" pairs; the second is the list of program arguments left after the option list was stripped (this is a trailing slice of *args*). Each option-and-value pair returned has the option as its first element, prefixed with a hyphen for short options (e.g., "'-x'") or two hyphens for long options (e.g., "'-- long-option'"), and the option argument as its second element, or an empty string if the option has no argument. The options occur in the list in the same order in which they were found, thus allowing multiple occurrences. Long and short options may be mixed. getopt.gnu_getopt(args, shortopts, longopts=[]) This function works like "getopt()", except that GNU style scanning mode is used by default. This means that option and non-option arguments may be intermixed. The "getopt()" function stops processing options as soon as a non-option argument is encountered. If the first character of the option string is "'+'", or if the environment variable "POSIXLY_CORRECT" is set, then option processing stops as soon as a non-option argument is encountered. exception getopt.GetoptError This is raised when an unrecognized option is found in the argument list or when an option requiring an argument is given none. The argument to the exception is a string indicating the cause of the error. For long options, an argument given to an option which does not require one will also cause this exception to be raised. The attributes "msg" and "opt" give the error message and related option; if there is no specific option to which the exception relates, "opt" is an empty string. exception getopt.error Alias for "GetoptError"; for backward compatibility. An example using only Unix style options: >>> import getopt >>> args = '-a -b -cfoo -d bar a1 a2'.split() >>> args ['-a', '-b', '-cfoo', '-d', 'bar', 'a1', 'a2'] >>> optlist, args = getopt.getopt(args, 'abc:d:') >>> optlist [('-a', ''), ('-b', ''), ('-c', 'foo'), ('-d', 'bar')] >>> args ['a1', 'a2'] Using long option names is equally easy: >>> s = '--condition=foo --testing --output-file abc.def -x a1 a2' >>> args = s.split() >>> args ['--condition=foo', '--testing', '--output-file', 'abc.def', '-x', 'a1', 'a2'] >>> optlist, args = getopt.getopt(args, 'x', [ ... 'condition=', 'output-file=', 'testing']) >>> optlist [('--condition', 'foo'), ('--testing', ''), ('--output-file', 'abc.def'), ('-x', '')] >>> args ['a1', 'a2'] In a script, typical usage is something like this: import getopt, sys def main(): try: opts, args = getopt.getopt(sys.argv[1:], "ho:v", ["help", "output="]) except getopt.GetoptError as err: # print help information and exit: print(err) # will print something like "option -a not recognized" usage() sys.exit(2) output = None verbose = False for o, a in opts: if o == "-v": verbose = True elif o in ("-h", "--help"): usage() sys.exit() elif o in ("-o", "--output"): output = a else: assert False, "unhandled option" # ... if __name__ == "__main__": main() Note that an equivalent command line interface could be produced with less code and more informative help and error messages by using the "argparse" module: import argparse if __name__ == '__main__': parser = argparse.ArgumentParser() parser.add_argument('-o', '--output') parser.add_argument('-v', dest='verbose', action='store_true') args = parser.parse_args() # ... do something with args.output ... # ... do something with args.verbose .. See also: Module "argparse" Alternative command line option and argument parsing library. "getpass" — Portable password input *********************************** **Source code:** Lib/getpass.py ====================================================================== The "getpass" module provides two functions: getpass.getpass(prompt='Password: ', stream=None) Prompt the user for a password without echoing. The user is prompted using the string *prompt*, which defaults to "'Password: '". On Unix, the prompt is written to the file-like object *stream* using the replace error handler if needed. *stream* defaults to the controlling terminal ("/dev/tty") or if that is unavailable to "sys.stderr" (this argument is ignored on Windows). If echo free input is unavailable getpass() falls back to printing a warning message to *stream* and reading from "sys.stdin" and issuing a "GetPassWarning". Note: If you call getpass from within IDLE, the input may be done in the terminal you launched IDLE from rather than the idle window itself. exception getpass.GetPassWarning A "UserWarning" subclass issued when password input may be echoed. getpass.getuser() Return the “login name” of the user. This function checks the environment variables "LOGNAME", "USER", "LNAME" and "USERNAME", in order, and returns the value of the first one which is set to a non-empty string. If none are set, the login name from the password database is returned on systems which support the "pwd" module, otherwise, an exception is raised. In general, this function should be preferred over "os.getlogin()". "gettext" — Multilingual internationalization services ****************************************************** **Source code:** Lib/gettext.py ====================================================================== The "gettext" module provides internationalization (I18N) and localization (L10N) services for your Python modules and applications. It supports both the GNU **gettext** message catalog API and a higher level, class-based API that may be more appropriate for Python files. The interface described below allows you to write your module and application messages in one natural language, and provide a catalog of translated messages for running under different natural languages. Some hints on localizing your Python modules and applications are also given. GNU **gettext** API =================== The "gettext" module defines the following API, which is very similar to the GNU **gettext** API. If you use this API you will affect the translation of your entire application globally. Often this is what you want if your application is monolingual, with the choice of language dependent on the locale of your user. If you are localizing a Python module, or if your application needs to switch languages on the fly, you probably want to use the class-based API instead. gettext.bindtextdomain(domain, localedir=None) Bind the *domain* to the locale directory *localedir*. More concretely, "gettext" will look for binary ".mo" files for the given domain using the path (on Unix): "*localedir*/*language*/LC_MESSAGES/*domain*.mo", where *language* is searched for in the environment variables "LANGUAGE", "LC_ALL", "LC_MESSAGES", and "LANG" respectively. If *localedir* is omitted or "None", then the current binding for *domain* is returned. [1] gettext.bind_textdomain_codeset(domain, codeset=None) Bind the *domain* to *codeset*, changing the encoding of byte strings returned by the "lgettext()", "ldgettext()", "lngettext()" and "ldngettext()" functions. If *codeset* is omitted, then the current binding is returned. Deprecated since version 3.8, will be removed in version 3.10. gettext.textdomain(domain=None) Change or query the current global domain. If *domain* is "None", then the current global domain is returned, otherwise the global domain is set to *domain*, which is returned. gettext.gettext(message) Return the localized translation of *message*, based on the current global domain, language, and locale directory. This function is usually aliased as "_()" in the local namespace (see examples below). gettext.dgettext(domain, message) Like "gettext()", but look the message up in the specified *domain*. gettext.ngettext(singular, plural, n) Like "gettext()", but consider plural forms. If a translation is found, apply the plural formula to *n*, and return the resulting message (some languages have more than two plural forms). If no translation is found, return *singular* if *n* is 1; return *plural* otherwise. The Plural formula is taken from the catalog header. It is a C or Python expression that has a free variable *n*; the expression evaluates to the index of the plural in the catalog. See the GNU gettext documentation for the precise syntax to be used in ".po" files and the formulas for a variety of languages. gettext.dngettext(domain, singular, plural, n) Like "ngettext()", but look the message up in the specified *domain*. gettext.pgettext(context, message) gettext.dpgettext(domain, context, message) gettext.npgettext(context, singular, plural, n) gettext.dnpgettext(domain, context, singular, plural, n) Similar to the corresponding functions without the "p" in the prefix (that is, "gettext()", "dgettext()", "ngettext()", "dngettext()"), but the translation is restricted to the given message *context*. New in version 3.8. gettext.lgettext(message) gettext.ldgettext(domain, message) gettext.lngettext(singular, plural, n) gettext.ldngettext(domain, singular, plural, n) Equivalent to the corresponding functions without the "l" prefix ("gettext()", "dgettext()", "ngettext()" and "dngettext()"), but the translation is returned as a byte string encoded in the preferred system encoding if no other encoding was explicitly set with "bind_textdomain_codeset()". Warning: These functions should be avoided in Python 3, because they return encoded bytes. It’s much better to use alternatives which return Unicode strings instead, since most Python applications will want to manipulate human readable text as strings instead of bytes. Further, it’s possible that you may get unexpected Unicode-related exceptions if there are encoding problems with the translated strings. Deprecated since version 3.8, will be removed in version 3.10. Note that GNU **gettext** also defines a "dcgettext()" method, but this was deemed not useful and so it is currently unimplemented. Here’s an example of typical usage for this API: import gettext gettext.bindtextdomain('myapplication', '/path/to/my/language/directory') gettext.textdomain('myapplication') _ = gettext.gettext # ... print(_('This is a translatable string.')) Class-based API =============== The class-based API of the "gettext" module gives you more flexibility and greater convenience than the GNU **gettext** API. It is the recommended way of localizing your Python applications and modules. "gettext" defines a "GNUTranslations" class which implements the parsing of GNU ".mo" format files, and has methods for returning strings. Instances of this class can also install themselves in the built-in namespace as the function "_()". gettext.find(domain, localedir=None, languages=None, all=False) This function implements the standard ".mo" file search algorithm. It takes a *domain*, identical to what "textdomain()" takes. Optional *localedir* is as in "bindtextdomain()". Optional *languages* is a list of strings, where each string is a language code. If *localedir* is not given, then the default system locale directory is used. [2] If *languages* is not given, then the following environment variables are searched: "LANGUAGE", "LC_ALL", "LC_MESSAGES", and "LANG". The first one returning a non-empty value is used for the *languages* variable. The environment variables should contain a colon separated list of languages, which will be split on the colon to produce the expected list of language code strings. "find()" then expands and normalizes the languages, and then iterates through them, searching for an existing file built of these components: "*localedir*/*language*/LC_MESSAGES/*domain*.mo" The first such file name that exists is returned by "find()". If no such file is found, then "None" is returned. If *all* is given, it returns a list of all file names, in the order in which they appear in the languages list or the environment variables. gettext.translation(domain, localedir=None, languages=None, class_=None, fallback=False, codeset=None) Return a "*Translations" instance based on the *domain*, *localedir*, and *languages*, which are first passed to "find()" to get a list of the associated ".mo" file paths. Instances with identical ".mo" file names are cached. The actual class instantiated is *class_* if provided, otherwise "GNUTranslations". The class’s constructor must take a single *file object* argument. If provided, *codeset* will change the charset used to encode translated strings in the "lgettext()" and "lngettext()" methods. If multiple files are found, later files are used as fallbacks for earlier ones. To allow setting the fallback, "copy.copy()" is used to clone each translation object from the cache; the actual instance data is still shared with the cache. If no ".mo" file is found, this function raises "OSError" if *fallback* is false (which is the default), and returns a "NullTranslations" instance if *fallback* is true. Changed in version 3.3: "IOError" used to be raised instead of "OSError". Deprecated since version 3.8, will be removed in version 3.10: The *codeset* parameter. gettext.install(domain, localedir=None, codeset=None, names=None) This installs the function "_()" in Python’s builtins namespace, based on *domain*, *localedir*, and *codeset* which are passed to the function "translation()". For the *names* parameter, please see the description of the translation object’s "install()" method. As seen below, you usually mark the strings in your application that are candidates for translation, by wrapping them in a call to the "_()" function, like this: print(_('This string will be translated.')) For convenience, you want the "_()" function to be installed in Python’s builtins namespace, so it is easily accessible in all modules of your application. Deprecated since version 3.8, will be removed in version 3.10: The *codeset* parameter. The "NullTranslations" class ---------------------------- Translation classes are what actually implement the translation of original source file message strings to translated message strings. The base class used by all translation classes is "NullTranslations"; this provides the basic interface you can use to write your own specialized translation classes. Here are the methods of "NullTranslations": class gettext.NullTranslations(fp=None) Takes an optional *file object* *fp*, which is ignored by the base class. Initializes “protected” instance variables *_info* and *_charset* which are set by derived classes, as well as *_fallback*, which is set through "add_fallback()". It then calls "self._parse(fp)" if *fp* is not "None". _parse(fp) No-op in the base class, this method takes file object *fp*, and reads the data from the file, initializing its message catalog. If you have an unsupported message catalog file format, you should override this method to parse your format. add_fallback(fallback) Add *fallback* as the fallback object for the current translation object. A translation object should consult the fallback if it cannot provide a translation for a given message. gettext(message) If a fallback has been set, forward "gettext()" to the fallback. Otherwise, return *message*. Overridden in derived classes. ngettext(singular, plural, n) If a fallback has been set, forward "ngettext()" to the fallback. Otherwise, return *singular* if *n* is 1; return *plural* otherwise. Overridden in derived classes. pgettext(context, message) If a fallback has been set, forward "pgettext()" to the fallback. Otherwise, return the translated message. Overridden in derived classes. New in version 3.8. npgettext(context, singular, plural, n) If a fallback has been set, forward "npgettext()" to the fallback. Otherwise, return the translated message. Overridden in derived classes. New in version 3.8. lgettext(message) lngettext(singular, plural, n) Equivalent to "gettext()" and "ngettext()", but the translation is returned as a byte string encoded in the preferred system encoding if no encoding was explicitly set with "set_output_charset()". Overridden in derived classes. Warning: These methods should be avoided in Python 3. See the warning for the "lgettext()" function. Deprecated since version 3.8, will be removed in version 3.10. info() Return the “protected” "_info" variable, a dictionary containing the metadata found in the message catalog file. charset() Return the encoding of the message catalog file. output_charset() Return the encoding used to return translated messages in "lgettext()" and "lngettext()". Deprecated since version 3.8, will be removed in version 3.10. set_output_charset(charset) Change the encoding used to return translated messages. Deprecated since version 3.8, will be removed in version 3.10. install(names=None) This method installs "gettext()" into the built-in namespace, binding it to "_". If the *names* parameter is given, it must be a sequence containing the names of functions you want to install in the builtins namespace in addition to "_()". Supported names are "'gettext'", "'ngettext'", "'pgettext'", "'npgettext'", "'lgettext'", and "'lngettext'". Note that this is only one way, albeit the most convenient way, to make the "_()" function available to your application. Because it affects the entire application globally, and specifically the built-in namespace, localized modules should never install "_()". Instead, they should use this code to make "_()" available to their module: import gettext t = gettext.translation('mymodule', ...) _ = t.gettext This puts "_()" only in the module’s global namespace and so only affects calls within this module. Changed in version 3.8: Added "'pgettext'" and "'npgettext'". The "GNUTranslations" class --------------------------- The "gettext" module provides one additional class derived from "NullTranslations": "GNUTranslations". This class overrides "_parse()" to enable reading GNU **gettext** format ".mo" files in both big-endian and little-endian format. "GNUTranslations" parses optional metadata out of the translation catalog. It is convention with GNU **gettext** to include metadata as the translation for the empty string. This metadata is in **RFC 822**-style "key: value" pairs, and should contain the "Project-Id- Version" key. If the key "Content-Type" is found, then the "charset" property is used to initialize the “protected” "_charset" instance variable, defaulting to "None" if not found. If the charset encoding is specified, then all message ids and message strings read from the catalog are converted to Unicode using this encoding, else ASCII is assumed. Since message ids are read as Unicode strings too, all "*gettext()" methods will assume message ids as Unicode strings, not byte strings. The entire set of key/value pairs are placed into a dictionary and set as the “protected” "_info" instance variable. If the ".mo" file’s magic number is invalid, the major version number is unexpected, or if other problems occur while reading the file, instantiating a "GNUTranslations" class can raise "OSError". class gettext.GNUTranslations The following methods are overridden from the base class implementation: gettext(message) Look up the *message* id in the catalog and return the corresponding message string, as a Unicode string. If there is no entry in the catalog for the *message* id, and a fallback has been set, the look up is forwarded to the fallback’s "gettext()" method. Otherwise, the *message* id is returned. ngettext(singular, plural, n) Do a plural-forms lookup of a message id. *singular* is used as the message id for purposes of lookup in the catalog, while *n* is used to determine which plural form to use. The returned message string is a Unicode string. If the message id is not found in the catalog, and a fallback is specified, the request is forwarded to the fallback’s "ngettext()" method. Otherwise, when *n* is 1 *singular* is returned, and *plural* is returned in all other cases. Here is an example: n = len(os.listdir('.')) cat = GNUTranslations(somefile) message = cat.ngettext( 'There is %(num)d file in this directory', 'There are %(num)d files in this directory', n) % {'num': n} pgettext(context, message) Look up the *context* and *message* id in the catalog and return the corresponding message string, as a Unicode string. If there is no entry in the catalog for the *message* id and *context*, and a fallback has been set, the look up is forwarded to the fallback’s "pgettext()" method. Otherwise, the *message* id is returned. New in version 3.8. npgettext(context, singular, plural, n) Do a plural-forms lookup of a message id. *singular* is used as the message id for purposes of lookup in the catalog, while *n* is used to determine which plural form to use. If the message id for *context* is not found in the catalog, and a fallback is specified, the request is forwarded to the fallback’s "npgettext()" method. Otherwise, when *n* is 1 *singular* is returned, and *plural* is returned in all other cases. New in version 3.8. lgettext(message) lngettext(singular, plural, n) Equivalent to "gettext()" and "ngettext()", but the translation is returned as a byte string encoded in the preferred system encoding if no encoding was explicitly set with "set_output_charset()". Warning: These methods should be avoided in Python 3. See the warning for the "lgettext()" function. Deprecated since version 3.8, will be removed in version 3.10. Solaris message catalog support ------------------------------- The Solaris operating system defines its own binary ".mo" file format, but since no documentation can be found on this format, it is not supported at this time. The Catalog constructor ----------------------- GNOME uses a version of the "gettext" module by James Henstridge, but this version has a slightly different API. Its documented usage was: import gettext cat = gettext.Catalog(domain, localedir) _ = cat.gettext print(_('hello world')) For compatibility with this older module, the function "Catalog()" is an alias for the "translation()" function described above. One difference between this module and Henstridge’s: his catalog objects supported access through a mapping API, but this appears to be unused and so is not currently supported. Internationalizing your programs and modules ============================================ Internationalization (I18N) refers to the operation by which a program is made aware of multiple languages. Localization (L10N) refers to the adaptation of your program, once internationalized, to the local language and cultural habits. In order to provide multilingual messages for your Python programs, you need to take the following steps: 1. prepare your program or module by specially marking translatable strings 2. run a suite of tools over your marked files to generate raw messages catalogs 3. create language-specific translations of the message catalogs 4. use the "gettext" module so that message strings are properly translated In order to prepare your code for I18N, you need to look at all the strings in your files. Any string that needs to be translated should be marked by wrapping it in "_('...')" — that is, a call to the function "_()". For example: filename = 'mylog.txt' message = _('writing a log message') with open(filename, 'w') as fp: fp.write(message) In this example, the string "'writing a log message'" is marked as a candidate for translation, while the strings "'mylog.txt'" and "'w'" are not. There are a few tools to extract the strings meant for translation. The original GNU **gettext** only supported C or C++ source code but its extended version **xgettext** scans code written in a number of languages, including Python, to find strings marked as translatable. Babel is a Python internationalization library that includes a "pybabel" script to extract and compile message catalogs. François Pinard’s program called **xpot** does a similar job and is available as part of his po-utils package. (Python also includes pure-Python versions of these programs, called **pygettext.py** and **msgfmt.py**; some Python distributions will install them for you. **pygettext.py** is similar to **xgettext**, but only understands Python source code and cannot handle other programming languages such as C or C++. **pygettext.py** supports a command-line interface similar to **xgettext**; for details on its use, run "pygettext.py --help". **msgfmt.py** is binary compatible with GNU **msgfmt**. With these two programs, you may not need the GNU **gettext** package to internationalize your Python applications.) **xgettext**, **pygettext**, and similar tools generate ".po" files that are message catalogs. They are structured human-readable files that contain every marked string in the source code, along with a placeholder for the translated versions of these strings. Copies of these ".po" files are then handed over to the individual human translators who write translations for every supported natural language. They send back the completed language-specific versions as a ".po" file that’s compiled into a machine-readable ".mo" binary catalog file using the **msgfmt** program. The ".mo" files are used by the "gettext" module for the actual translation processing at run-time. How you use the "gettext" module in your code depends on whether you are internationalizing a single module or your entire application. The next two sections will discuss each case. Localizing your module ---------------------- If you are localizing your module, you must take care not to make global changes, e.g. to the built-in namespace. You should not use the GNU **gettext** API but instead the class-based API. Let’s say your module is called “spam” and the module’s various natural language translation ".mo" files reside in "/usr/share/locale" in GNU **gettext** format. Here’s what you would put at the top of your module: import gettext t = gettext.translation('spam', '/usr/share/locale') _ = t.gettext Localizing your application --------------------------- If you are localizing your application, you can install the "_()" function globally into the built-in namespace, usually in the main driver file of your application. This will let all your application- specific files just use "_('...')" without having to explicitly install it in each file. In the simple case then, you need only add the following bit of code to the main driver file of your application: import gettext gettext.install('myapplication') If you need to set the locale directory, you can pass it into the "install()" function: import gettext gettext.install('myapplication', '/usr/share/locale') Changing languages on the fly ----------------------------- If your program needs to support many languages at the same time, you may want to create multiple translation instances and then switch between them explicitly, like so: import gettext lang1 = gettext.translation('myapplication', languages=['en']) lang2 = gettext.translation('myapplication', languages=['fr']) lang3 = gettext.translation('myapplication', languages=['de']) # start by using language1 lang1.install() # ... time goes by, user selects language 2 lang2.install() # ... more time goes by, user selects language 3 lang3.install() Deferred translations --------------------- In most coding situations, strings are translated where they are coded. Occasionally however, you need to mark strings for translation, but defer actual translation until later. A classic example is: animals = ['mollusk', 'albatross', 'rat', 'penguin', 'python', ] # ... for a in animals: print(a) Here, you want to mark the strings in the "animals" list as being translatable, but you don’t actually want to translate them until they are printed. Here is one way you can handle this situation: def _(message): return message animals = [_('mollusk'), _('albatross'), _('rat'), _('penguin'), _('python'), ] del _ # ... for a in animals: print(_(a)) This works because the dummy definition of "_()" simply returns the string unchanged. And this dummy definition will temporarily override any definition of "_()" in the built-in namespace (until the "del" command). Take care, though if you have a previous definition of "_()" in the local namespace. Note that the second use of "_()" will not identify “a” as being translatable to the **gettext** program, because the parameter is not a string literal. Another way to handle this is with the following example: def N_(message): return message animals = [N_('mollusk'), N_('albatross'), N_('rat'), N_('penguin'), N_('python'), ] # ... for a in animals: print(_(a)) In this case, you are marking translatable strings with the function "N_()", which won’t conflict with any definition of "_()". However, you will need to teach your message extraction program to look for translatable strings marked with "N_()". **xgettext**, **pygettext**, "pybabel extract", and **xpot** all support this through the use of the "-k" command-line switch. The choice of "N_()" here is totally arbitrary; it could have just as easily been "MarkThisStringForTranslation()". Acknowledgements ================ The following people contributed code, feedback, design suggestions, previous implementations, and valuable experience to the creation of this module: * Peter Funk * James Henstridge * Juan David Ibáñez Palomar * Marc-André Lemburg * Martin von Löwis * François Pinard * Barry Warsaw * Gustavo Niemeyer -[ Footnotes ]- [1] The default locale directory is system dependent; for example, on RedHat Linux it is "/usr/share/locale", but on Solaris it is "/usr/lib/locale". The "gettext" module does not try to support these system dependent defaults; instead its default is "*sys.base_prefix*/share/locale" (see "sys.base_prefix"). For this reason, it is always best to call "bindtextdomain()" with an explicit absolute path at the start of your application. [2] See the footnote for "bindtextdomain()" above. "glob" — Unix style pathname pattern expansion ********************************************** **Source code:** Lib/glob.py ====================================================================== The "glob" module finds all the pathnames matching a specified pattern according to the rules used by the Unix shell, although results are returned in arbitrary order. No tilde expansion is done, but "*", "?", and character ranges expressed with "[]" will be correctly matched. This is done by using the "os.scandir()" and "fnmatch.fnmatch()" functions in concert, and not by actually invoking a subshell. Note that files beginning with a dot (".") can only be matched by patterns that also start with a dot, unlike "fnmatch.fnmatch()" or "pathlib.Path.glob()". (For tilde and shell variable expansion, use "os.path.expanduser()" and "os.path.expandvars()".) For a literal match, wrap the meta-characters in brackets. For example, "'[?]'" matches the character "'?'". See also: The "pathlib" module offers high-level path objects. glob.glob(pathname, *, recursive=False) Return a possibly-empty list of path names that match *pathname*, which must be a string containing a path specification. *pathname* can be either absolute (like "/usr/src/Python-1.5/Makefile") or relative (like "../../Tools/*/*.gif"), and can contain shell-style wildcards. Broken symlinks are included in the results (as in the shell). Whether or not the results are sorted depends on the file system. If a file that satisfies conditions is removed or added during the call of this function, whether a path name for that file be included is unspecified. If *recursive* is true, the pattern “"**"” will match any files and zero or more directories, subdirectories and symbolic links to directories. If the pattern is followed by an "os.sep" or "os.altsep" then files will not match. Raises an auditing event "glob.glob" with arguments "pathname", "recursive". Note: Using the “"**"” pattern in large directory trees may consume an inordinate amount of time. Changed in version 3.5: Support for recursive globs using “"**"”. glob.iglob(pathname, *, recursive=False) Return an *iterator* which yields the same values as "glob()" without actually storing them all simultaneously. Raises an auditing event "glob.glob" with arguments "pathname", "recursive". glob.escape(pathname) Escape all special characters ("'?'", "'*'" and "'['"). This is useful if you want to match an arbitrary literal string that may have special characters in it. Special characters in drive/UNC sharepoints are not escaped, e.g. on Windows "escape('//?/c:/Quo vadis?.txt')" returns "'//?/c:/Quo vadis[?].txt'". New in version 3.4. For example, consider a directory containing the following files: "1.gif", "2.txt", "card.gif" and a subdirectory "sub" which contains only the file "3.txt". "glob()" will produce the following results. Notice how any leading components of the path are preserved. >>> import glob >>> glob.glob('./[0-9].*') ['./1.gif', './2.txt'] >>> glob.glob('*.gif') ['1.gif', 'card.gif'] >>> glob.glob('?.gif') ['1.gif'] >>> glob.glob('**/*.txt', recursive=True) ['2.txt', 'sub/3.txt'] >>> glob.glob('./**/', recursive=True) ['./', './sub/'] If the directory contains files starting with "." they won’t be matched by default. For example, consider a directory containing "card.gif" and ".card.gif": >>> import glob >>> glob.glob('*.gif') ['card.gif'] >>> glob.glob('.c*') ['.card.gif'] See also: Module "fnmatch" Shell-style filename (not path) expansion "graphlib" — Functionality to operate with graph-like structures **************************************************************** **Source code:** Lib/graphlib.py ====================================================================== class graphlib.TopologicalSorter(graph=None) Provides functionality to topologically sort a graph of hashable nodes. A topological order is a linear ordering of the vertices in a graph such that for every directed edge u -> v from vertex u to vertex v, vertex u comes before vertex v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this example, a topological ordering is just a valid sequence for the tasks. A complete topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph. If the optional *graph* argument is provided it must be a dictionary representing a directed acyclic graph where the keys are nodes and the values are iterables of all predecessors of that node in the graph (the nodes that have edges that point to the value in the key). Additional nodes can be added to the graph using the "add()" method. In the general case, the steps required to perform the sorting of a given graph are as follows: * Create an instance of the "TopologicalSorter" with an optional initial graph. * Add additional nodes to the graph. * Call "prepare()" on the graph. * While "is_active()" is "True", iterate over the nodes returned by "get_ready()" and process them. Call "done()" on each node as it finishes processing. In case just an immediate sorting of the nodes in the graph is required and no parallelism is involved, the convenience method "TopologicalSorter.static_order()" can be used directly: >>> graph = {"D": {"B", "C"}, "C": {"A"}, "B": {"A"}} >>> ts = TopologicalSorter(graph) >>> tuple(ts.static_order()) ('A', 'C', 'B', 'D') The class is designed to easily support parallel processing of the nodes as they become ready. For instance: topological_sorter = TopologicalSorter() # Add nodes to 'topological_sorter'... topological_sorter.prepare() while topological_sorter.is_active(): for node in topological_sorter.get_ready(): # Worker threads or processes take nodes to work on off the # 'task_queue' queue. task_queue.put(node) # When the work for a node is done, workers put the node in # 'finalized_tasks_queue' so we can get more nodes to work on. # The definition of 'is_active()' guarantees that, at this point, at # least one node has been placed on 'task_queue' that hasn't yet # been passed to 'done()', so this blocking 'get()' must (eventually) # succeed. After calling 'done()', we loop back to call 'get_ready()' # again, so put newly freed nodes on 'task_queue' as soon as # logically possible. node = finalized_tasks_queue.get() topological_sorter.done(node) add(node, *predecessors) Add a new node and its predecessors to the graph. Both the *node* and all elements in *predecessors* must be hashable. If called multiple times with the same node argument, the set of dependencies will be the union of all dependencies passed in. It is possible to add a node with no dependencies (*predecessors* is not provided) or to provide a dependency twice. If a node that has not been provided before is included among *predecessors* it will be automatically added to the graph with no predecessors of its own. Raises "ValueError" if called after "prepare()". prepare() Mark the graph as finished and check for cycles in the graph. If any cycle is detected, "CycleError" will be raised, but "get_ready()" can still be used to obtain as many nodes as possible until cycles block more progress. After a call to this function, the graph cannot be modified, and therefore no more nodes can be added using "add()". is_active() Returns "True" if more progress can be made and "False" otherwise. Progress can be made if cycles do not block the resolution and either there are still nodes ready that haven’t yet been returned by "TopologicalSorter.get_ready()" or the number of nodes marked "TopologicalSorter.done()" is less than the number that have been returned by "TopologicalSorter.get_ready()". The "__bool__()" method of this class defers to this function, so instead of: if ts.is_active(): ... it is possible to simply do: if ts: ... Raises "ValueError" if called without calling "prepare()" previously. done(*nodes) Marks a set of nodes returned by "TopologicalSorter.get_ready()" as processed, unblocking any successor of each node in *nodes* for being returned in the future by a call to "TopologicalSorter.get_ready()". Raises "ValueError" if any node in *nodes* has already been marked as processed by a previous call to this method or if a node was not added to the graph by using "TopologicalSorter.add()", if called without calling "prepare()" or if node has not yet been returned by "get_ready()". get_ready() Returns a "tuple" with all the nodes that are ready. Initially it returns all nodes with no predecessors, and once those are marked as processed by calling "TopologicalSorter.done()", further calls will return all new nodes that have all their predecessors already processed. Once no more progress can be made, empty tuples are returned. Raises "ValueError" if called without calling "prepare()" previously. static_order() Returns an iterator object which will iterate over nodes in a topological order. When using this method, "prepare()" and "done()" should not be called. This method is equivalent to: def static_order(self): self.prepare() while self.is_active(): node_group = self.get_ready() yield from node_group self.done(*node_group) The particular order that is returned may depend on the specific order in which the items were inserted in the graph. For example: >>> ts = TopologicalSorter() >>> ts.add(3, 2, 1) >>> ts.add(1, 0) >>> print([*ts.static_order()]) [2, 0, 1, 3] >>> ts2 = TopologicalSorter() >>> ts2.add(1, 0) >>> ts2.add(3, 2, 1) >>> print([*ts2.static_order()]) [0, 2, 1, 3] This is due to the fact that “0” and “2” are in the same level in the graph (they would have been returned in the same call to "get_ready()") and the order between them is determined by the order of insertion. If any cycle is detected, "CycleError" will be raised. New in version 3.9. Exceptions ========== The "graphlib" module defines the following exception classes: exception graphlib.CycleError Subclass of "ValueError" raised by "TopologicalSorter.prepare()" if cycles exist in the working graph. If multiple cycles exist, only one undefined choice among them will be reported and included in the exception. The detected cycle can be accessed via the second element in the "args" attribute of the exception instance and consists in a list of nodes, such that each node is, in the graph, an immediate predecessor of the next node in the list. In the reported list, the first and the last node will be the same, to make it clear that it is cyclic. "grp" — The group database ************************** ====================================================================== This module provides access to the Unix group database. It is available on all Unix versions. Group database entries are reported as a tuple-like object, whose attributes correspond to the members of the "group" structure (Attribute field below, see ""): +---------+-------------+-----------------------------------+ | Index | Attribute | Meaning | |=========|=============|===================================| | 0 | gr_name | the name of the group | +---------+-------------+-----------------------------------+ | 1 | gr_passwd | the (encrypted) group password; | | | | often empty | +---------+-------------+-----------------------------------+ | 2 | gr_gid | the numerical group ID | +---------+-------------+-----------------------------------+ | 3 | gr_mem | all the group member’s user | | | | names | +---------+-------------+-----------------------------------+ The gid is an integer, name and password are strings, and the member list is a list of strings. (Note that most users are not explicitly listed as members of the group they are in according to the password database. Check both databases to get complete membership information. Also note that a "gr_name" that starts with a "+" or "-" is likely to be a YP/NIS reference and may not be accessible via "getgrnam()" or "getgrgid()".) It defines the following items: grp.getgrgid(gid) Return the group database entry for the given numeric group ID. "KeyError" is raised if the entry asked for cannot be found. Deprecated since version 3.6: Since Python 3.6 the support of non- integer arguments like floats or strings in "getgrgid()" is deprecated. grp.getgrnam(name) Return the group database entry for the given group name. "KeyError" is raised if the entry asked for cannot be found. grp.getgrall() Return a list of all available group entries, in arbitrary order. See also: Module "pwd" An interface to the user database, similar to this. Module "spwd" An interface to the shadow password database, similar to this. "gzip" — Support for **gzip** files *********************************** **Source code:** Lib/gzip.py ====================================================================== This module provides a simple interface to compress and decompress files just like the GNU programs **gzip** and **gunzip** would. The data compression is provided by the "zlib" module. The "gzip" module provides the "GzipFile" class, as well as the "open()", "compress()" and "decompress()" convenience functions. The "GzipFile" class reads and writes **gzip**-format files, automatically compressing or decompressing the data so that it looks like an ordinary *file object*. Note that additional file formats which can be decompressed by the **gzip** and **gunzip** programs, such as those produced by **compress** and **pack**, are not supported by this module. The module defines the following items: gzip.open(filename, mode='rb', compresslevel=9, encoding=None, errors=None, newline=None) Open a gzip-compressed file in binary or text mode, returning a *file object*. The *filename* argument can be an actual filename (a "str" or "bytes" object), or an existing file object to read from or write to. The *mode* argument can be any of "'r'", "'rb'", "'a'", "'ab'", "'w'", "'wb'", "'x'" or "'xb'" for binary mode, or "'rt'", "'at'", "'wt'", or "'xt'" for text mode. The default is "'rb'". The *compresslevel* argument is an integer from 0 to 9, as for the "GzipFile" constructor. For binary mode, this function is equivalent to the "GzipFile" constructor: "GzipFile(filename, mode, compresslevel)". In this case, the *encoding*, *errors* and *newline* arguments must not be provided. For text mode, a "GzipFile" object is created, and wrapped in an "io.TextIOWrapper" instance with the specified encoding, error handling behavior, and line ending(s). Changed in version 3.3: Added support for *filename* being a file object, support for text mode, and the *encoding*, *errors* and *newline* arguments. Changed in version 3.4: Added support for the "'x'", "'xb'" and "'xt'" modes. Changed in version 3.6: Accepts a *path-like object*. exception gzip.BadGzipFile An exception raised for invalid gzip files. It inherits "OSError". "EOFError" and "zlib.error" can also be raised for invalid gzip files. New in version 3.8. class gzip.GzipFile(filename=None, mode=None, compresslevel=9, fileobj=None, mtime=None) Constructor for the "GzipFile" class, which simulates most of the methods of a *file object*, with the exception of the "truncate()" method. At least one of *fileobj* and *filename* must be given a non-trivial value. The new class instance is based on *fileobj*, which can be a regular file, an "io.BytesIO" object, or any other object which simulates a file. It defaults to "None", in which case *filename* is opened to provide a file object. When *fileobj* is not "None", the *filename* argument is only used to be included in the **gzip** file header, which may include the original filename of the uncompressed file. It defaults to the filename of *fileobj*, if discernible; otherwise, it defaults to the empty string, and in this case the original filename is not included in the header. The *mode* argument can be any of "'r'", "'rb'", "'a'", "'ab'", "'w'", "'wb'", "'x'", or "'xb'", depending on whether the file will be read or written. The default is the mode of *fileobj* if discernible; otherwise, the default is "'rb'". In future Python releases the mode of *fileobj* will not be used. It is better to always specify *mode* for writing. Note that the file is always opened in binary mode. To open a compressed file in text mode, use "open()" (or wrap your "GzipFile" with an "io.TextIOWrapper"). The *compresslevel* argument is an integer from "0" to "9" controlling the level of compression; "1" is fastest and produces the least compression, and "9" is slowest and produces the most compression. "0" is no compression. The default is "9". The *mtime* argument is an optional numeric timestamp to be written to the last modification time field in the stream when compressing. It should only be provided in compression mode. If omitted or "None", the current time is used. See the "mtime" attribute for more details. Calling a "GzipFile" object’s "close()" method does not close *fileobj*, since you might wish to append more material after the compressed data. This also allows you to pass an "io.BytesIO" object opened for writing as *fileobj*, and retrieve the resulting memory buffer using the "io.BytesIO" object’s "getvalue()" method. "GzipFile" supports the "io.BufferedIOBase" interface, including iteration and the "with" statement. Only the "truncate()" method isn’t implemented. "GzipFile" also provides the following method and attribute: peek(n) Read *n* uncompressed bytes without advancing the file position. At most one single read on the compressed stream is done to satisfy the call. The number of bytes returned may be more or less than requested. Note: While calling "peek()" does not change the file position of the "GzipFile", it may change the position of the underlying file object (e.g. if the "GzipFile" was constructed with the *fileobj* parameter). New in version 3.2. mtime When decompressing, the value of the last modification time field in the most recently read header may be read from this attribute, as an integer. The initial value before reading any headers is "None". All **gzip** compressed streams are required to contain this timestamp field. Some programs, such as **gunzip**, make use of the timestamp. The format is the same as the return value of "time.time()" and the "st_mtime" attribute of the object returned by "os.stat()". Changed in version 3.1: Support for the "with" statement was added, along with the *mtime* constructor argument and "mtime" attribute. Changed in version 3.2: Support for zero-padded and unseekable files was added. Changed in version 3.3: The "io.BufferedIOBase.read1()" method is now implemented. Changed in version 3.4: Added support for the "'x'" and "'xb'" modes. Changed in version 3.5: Added support for writing arbitrary *bytes- like objects*. The "read()" method now accepts an argument of "None". Changed in version 3.6: Accepts a *path-like object*. Deprecated since version 3.9: Opening "GzipFile" for writing without specifying the *mode* argument is deprecated. gzip.compress(data, compresslevel=9, *, mtime=None) Compress the *data*, returning a "bytes" object containing the compressed data. *compresslevel* and *mtime* have the same meaning as in the "GzipFile" constructor above. New in version 3.2. Changed in version 3.8: Added the *mtime* parameter for reproducible output. gzip.decompress(data) Decompress the *data*, returning a "bytes" object containing the uncompressed data. New in version 3.2. Examples of usage ================= Example of how to read a compressed file: import gzip with gzip.open('/home/joe/file.txt.gz', 'rb') as f: file_content = f.read() Example of how to create a compressed GZIP file: import gzip content = b"Lots of content here" with gzip.open('/home/joe/file.txt.gz', 'wb') as f: f.write(content) Example of how to GZIP compress an existing file: import gzip import shutil with open('/home/joe/file.txt', 'rb') as f_in: with gzip.open('/home/joe/file.txt.gz', 'wb') as f_out: shutil.copyfileobj(f_in, f_out) Example of how to GZIP compress a binary string: import gzip s_in = b"Lots of content here" s_out = gzip.compress(s_in) See also: Module "zlib" The basic data compression module needed to support the **gzip** file format. Command Line Interface ====================== The "gzip" module provides a simple command line interface to compress or decompress files. Once executed the "gzip" module keeps the input file(s). Changed in version 3.8: Add a new command line interface with a usage. By default, when you will execute the CLI, the default compression level is 6. Command line options -------------------- file If *file* is not specified, read from "sys.stdin". --fast Indicates the fastest compression method (less compression). --best Indicates the slowest compression method (best compression). -d, --decompress Decompress the given file. -h, --help Show the help message. "hashlib" — Secure hashes and message digests ********************************************* **Source code:** Lib/hashlib.py ====================================================================== This module implements a common interface to many different secure hash and message digest algorithms. Included are the FIPS secure hash algorithms SHA1, SHA224, SHA256, SHA384, and SHA512 (defined in FIPS 180-2) as well as RSA’s MD5 algorithm (defined in Internet **RFC 1321**). The terms “secure hash” and “message digest” are interchangeable. Older algorithms were called message digests. The modern term is secure hash. Note: If you want the adler32 or crc32 hash functions, they are available in the "zlib" module. Warning: Some algorithms have known hash collision weaknesses, refer to the “See also” section at the end. Hash algorithms =============== There is one constructor method named for each type of *hash*. All return a hash object with the same simple interface. For example: use "sha256()" to create a SHA-256 hash object. You can now feed this object with *bytes-like objects* (normally "bytes") using the "update()" method. At any point you can ask it for the *digest* of the concatenation of the data fed to it so far using the "digest()" or "hexdigest()" methods. Note: For better multithreading performance, the Python *GIL* is released for data larger than 2047 bytes at object creation or on update. Note: Feeding string objects into "update()" is not supported, as hashes work on bytes, not on characters. Constructors for hash algorithms that are always present in this module are "sha1()", "sha224()", "sha256()", "sha384()", "sha512()", "blake2b()", and "blake2s()". "md5()" is normally available as well, though it may be missing or blocked if you are using a rare “FIPS compliant” build of Python. Additional algorithms may also be available depending upon the OpenSSL library that Python uses on your platform. On most platforms the "sha3_224()", "sha3_256()", "sha3_384()", "sha3_512()", "shake_128()", "shake_256()" are also available. New in version 3.6: SHA3 (Keccak) and SHAKE constructors "sha3_224()", "sha3_256()", "sha3_384()", "sha3_512()", "shake_128()", "shake_256()". New in version 3.6: "blake2b()" and "blake2s()" were added. Changed in version 3.9: All hashlib constructors take a keyword-only argument *usedforsecurity* with default value "True". A false value allows the use of insecure and blocked hashing algorithms in restricted environments. "False" indicates that the hashing algorithm is not used in a security context, e.g. as a non-cryptographic one-way compression function.Hashlib now uses SHA3 and SHAKE from OpenSSL 1.1.1 and newer. For example, to obtain the digest of the byte string "b'Nobody inspects the spammish repetition'": >>> import hashlib >>> m = hashlib.sha256() >>> m.update(b"Nobody inspects") >>> m.update(b" the spammish repetition") >>> m.digest() b'\x03\x1e\xdd}Ae\x15\x93\xc5\xfe\\\x00o\xa5u+7\xfd\xdf\xf7\xbcN\x84:\xa6\xaf\x0c\x95\x0fK\x94\x06' >>> m.digest_size 32 >>> m.block_size 64 More condensed: >>> hashlib.sha224(b"Nobody inspects the spammish repetition").hexdigest() 'a4337bc45a8fc544c03f52dc550cd6e1e87021bc896588bd79e901e2' hashlib.new(name[, data], *, usedforsecurity=True) Is a generic constructor that takes the string *name* of the desired algorithm as its first parameter. It also exists to allow access to the above listed hashes as well as any other algorithms that your OpenSSL library may offer. The named constructors are much faster than "new()" and should be preferred. Using "new()" with an algorithm provided by OpenSSL: >>> h = hashlib.new('sha256') >>> h.update(b"Nobody inspects the spammish repetition") >>> h.hexdigest() '031edd7d41651593c5fe5c006fa5752b37fddff7bc4e843aa6af0c950f4b9406' Hashlib provides the following constant attributes: hashlib.algorithms_guaranteed A set containing the names of the hash algorithms guaranteed to be supported by this module on all platforms. Note that ‘md5’ is in this list despite some upstream vendors offering an odd “FIPS compliant” Python build that excludes it. New in version 3.2. hashlib.algorithms_available A set containing the names of the hash algorithms that are available in the running Python interpreter. These names will be recognized when passed to "new()". "algorithms_guaranteed" will always be a subset. The same algorithm may appear multiple times in this set under different names (thanks to OpenSSL). New in version 3.2. The following values are provided as constant attributes of the hash objects returned by the constructors: hash.digest_size The size of the resulting hash in bytes. hash.block_size The internal block size of the hash algorithm in bytes. A hash object has the following attributes: hash.name The canonical name of this hash, always lowercase and always suitable as a parameter to "new()" to create another hash of this type. Changed in version 3.4: The name attribute has been present in CPython since its inception, but until Python 3.4 was not formally specified, so may not exist on some platforms. A hash object has the following methods: hash.update(data) Update the hash object with the *bytes-like object*. Repeated calls are equivalent to a single call with the concatenation of all the arguments: "m.update(a); m.update(b)" is equivalent to "m.update(a+b)". Changed in version 3.1: The Python GIL is released to allow other threads to run while hash updates on data larger than 2047 bytes is taking place when using hash algorithms supplied by OpenSSL. hash.digest() Return the digest of the data passed to the "update()" method so far. This is a bytes object of size "digest_size" which may contain bytes in the whole range from 0 to 255. hash.hexdigest() Like "digest()" except the digest is returned as a string object of double length, containing only hexadecimal digits. This may be used to exchange the value safely in email or other non-binary environments. hash.copy() Return a copy (“clone”) of the hash object. This can be used to efficiently compute the digests of data sharing a common initial substring. SHAKE variable length digests ============================= The "shake_128()" and "shake_256()" algorithms provide variable length digests with length_in_bits//2 up to 128 or 256 bits of security. As such, their digest methods require a length. Maximum length is not limited by the SHAKE algorithm. shake.digest(length) Return the digest of the data passed to the "update()" method so far. This is a bytes object of size *length* which may contain bytes in the whole range from 0 to 255. shake.hexdigest(length) Like "digest()" except the digest is returned as a string object of double length, containing only hexadecimal digits. This may be used to exchange the value safely in email or other non-binary environments. Key derivation ============== Key derivation and key stretching algorithms are designed for secure password hashing. Naive algorithms such as "sha1(password)" are not resistant against brute-force attacks. A good password hashing function must be tunable, slow, and include a salt. hashlib.pbkdf2_hmac(hash_name, password, salt, iterations, dklen=None) The function provides PKCS#5 password-based key derivation function 2. It uses HMAC as pseudorandom function. The string *hash_name* is the desired name of the hash digest algorithm for HMAC, e.g. ‘sha1’ or ‘sha256’. *password* and *salt* are interpreted as buffers of bytes. Applications and libraries should limit *password* to a sensible length (e.g. 1024). *salt* should be about 16 or more bytes from a proper source, e.g. "os.urandom()". The number of *iterations* should be chosen based on the hash algorithm and computing power. As of 2013, at least 100,000 iterations of SHA-256 are suggested. *dklen* is the length of the derived key. If *dklen* is "None" then the digest size of the hash algorithm *hash_name* is used, e.g. 64 for SHA-512. >>> import hashlib >>> dk = hashlib.pbkdf2_hmac('sha256', b'password', b'salt', 100000) >>> dk.hex() '0394a2ede332c9a13eb82e9b24631604c31df978b4e2f0fbd2c549944f9d79a5' New in version 3.4. Note: A fast implementation of *pbkdf2_hmac* is available with OpenSSL. The Python implementation uses an inline version of "hmac". It is about three times slower and doesn’t release the GIL. hashlib.scrypt(password, *, salt, n, r, p, maxmem=0, dklen=64) The function provides scrypt password-based key derivation function as defined in **RFC 7914**. *password* and *salt* must be *bytes-like objects*. Applications and libraries should limit *password* to a sensible length (e.g. 1024). *salt* should be about 16 or more bytes from a proper source, e.g. "os.urandom()". *n* is the CPU/Memory cost factor, *r* the block size, *p* parallelization factor and *maxmem* limits memory (OpenSSL 1.1.0 defaults to 32 MiB). *dklen* is the length of the derived key. Availability: OpenSSL 1.1+. New in version 3.6. BLAKE2 ====== BLAKE2 is a cryptographic hash function defined in **RFC 7693** that comes in two flavors: * **BLAKE2b**, optimized for 64-bit platforms and produces digests of any size between 1 and 64 bytes, * **BLAKE2s**, optimized for 8- to 32-bit platforms and produces digests of any size between 1 and 32 bytes. BLAKE2 supports **keyed mode** (a faster and simpler replacement for HMAC), **salted hashing**, **personalization**, and **tree hashing**. Hash objects from this module follow the API of standard library’s "hashlib" objects. Creating hash objects --------------------- New hash objects are created by calling constructor functions: hashlib.blake2b(data=b'', *, digest_size=64, key=b'', salt=b'', person=b'', fanout=1, depth=1, leaf_size=0, node_offset=0, node_depth=0, inner_size=0, last_node=False, usedforsecurity=True) hashlib.blake2s(data=b'', *, digest_size=32, key=b'', salt=b'', person=b'', fanout=1, depth=1, leaf_size=0, node_offset=0, node_depth=0, inner_size=0, last_node=False, usedforsecurity=True) These functions return the corresponding hash objects for calculating BLAKE2b or BLAKE2s. They optionally take these general parameters: * *data*: initial chunk of data to hash, which must be *bytes-like object*. It can be passed only as positional argument. * *digest_size*: size of output digest in bytes. * *key*: key for keyed hashing (up to 64 bytes for BLAKE2b, up to 32 bytes for BLAKE2s). * *salt*: salt for randomized hashing (up to 16 bytes for BLAKE2b, up to 8 bytes for BLAKE2s). * *person*: personalization string (up to 16 bytes for BLAKE2b, up to 8 bytes for BLAKE2s). The following table shows limits for general parameters (in bytes): +---------+-------------+----------+-----------+-------------+ | Hash | digest_size | len(key) | len(salt) | len(person) | |=========|=============|==========|===========|=============| | BLAKE2b | 64 | 64 | 16 | 16 | +---------+-------------+----------+-----------+-------------+ | BLAKE2s | 32 | 32 | 8 | 8 | +---------+-------------+----------+-----------+-------------+ Note: BLAKE2 specification defines constant lengths for salt and personalization parameters, however, for convenience, this implementation accepts byte strings of any size up to the specified length. If the length of the parameter is less than specified, it is padded with zeros, thus, for example, "b'salt'" and "b'salt\x00'" is the same value. (This is not the case for *key*.) These sizes are available as module constants described below. Constructor functions also accept the following tree hashing parameters: * *fanout*: fanout (0 to 255, 0 if unlimited, 1 in sequential mode). * *depth*: maximal depth of tree (1 to 255, 255 if unlimited, 1 in sequential mode). * *leaf_size*: maximal byte length of leaf (0 to "2**32-1", 0 if unlimited or in sequential mode). * *node_offset*: node offset (0 to "2**64-1" for BLAKE2b, 0 to "2**48-1" for BLAKE2s, 0 for the first, leftmost, leaf, or in sequential mode). * *node_depth*: node depth (0 to 255, 0 for leaves, or in sequential mode). * *inner_size*: inner digest size (0 to 64 for BLAKE2b, 0 to 32 for BLAKE2s, 0 in sequential mode). * *last_node*: boolean indicating whether the processed node is the last one (*False* for sequential mode). [image: Explanation of tree mode parameters.][image] See section 2.10 in BLAKE2 specification for comprehensive review of tree hashing. Constants --------- blake2b.SALT_SIZE blake2s.SALT_SIZE Salt length (maximum length accepted by constructors). blake2b.PERSON_SIZE blake2s.PERSON_SIZE Personalization string length (maximum length accepted by constructors). blake2b.MAX_KEY_SIZE blake2s.MAX_KEY_SIZE Maximum key size. blake2b.MAX_DIGEST_SIZE blake2s.MAX_DIGEST_SIZE Maximum digest size that the hash function can output. Examples -------- Simple hashing ~~~~~~~~~~~~~~ To calculate hash of some data, you should first construct a hash object by calling the appropriate constructor function ("blake2b()" or "blake2s()"), then update it with the data by calling "update()" on the object, and, finally, get the digest out of the object by calling "digest()" (or "hexdigest()" for hex-encoded string). >>> from hashlib import blake2b >>> h = blake2b() >>> h.update(b'Hello world') >>> h.hexdigest() '6ff843ba685842aa82031d3f53c48b66326df7639a63d128974c5c14f31a0f33343a8c65551134ed1ae0f2b0dd2bb495dc81039e3eeb0aa1bb0388bbeac29183' As a shortcut, you can pass the first chunk of data to update directly to the constructor as the positional argument: >>> from hashlib import blake2b >>> blake2b(b'Hello world').hexdigest() '6ff843ba685842aa82031d3f53c48b66326df7639a63d128974c5c14f31a0f33343a8c65551134ed1ae0f2b0dd2bb495dc81039e3eeb0aa1bb0388bbeac29183' You can call "hash.update()" as many times as you need to iteratively update the hash: >>> from hashlib import blake2b >>> items = [b'Hello', b' ', b'world'] >>> h = blake2b() >>> for item in items: ... h.update(item) >>> h.hexdigest() '6ff843ba685842aa82031d3f53c48b66326df7639a63d128974c5c14f31a0f33343a8c65551134ed1ae0f2b0dd2bb495dc81039e3eeb0aa1bb0388bbeac29183' Using different digest sizes ~~~~~~~~~~~~~~~~~~~~~~~~~~~~ BLAKE2 has configurable size of digests up to 64 bytes for BLAKE2b and up to 32 bytes for BLAKE2s. For example, to replace SHA-1 with BLAKE2b without changing the size of output, we can tell BLAKE2b to produce 20-byte digests: >>> from hashlib import blake2b >>> h = blake2b(digest_size=20) >>> h.update(b'Replacing SHA1 with the more secure function') >>> h.hexdigest() 'd24f26cf8de66472d58d4e1b1774b4c9158b1f4c' >>> h.digest_size 20 >>> len(h.digest()) 20 Hash objects with different digest sizes have completely different outputs (shorter hashes are *not* prefixes of longer hashes); BLAKE2b and BLAKE2s produce different outputs even if the output length is the same: >>> from hashlib import blake2b, blake2s >>> blake2b(digest_size=10).hexdigest() '6fa1d8fcfd719046d762' >>> blake2b(digest_size=11).hexdigest() 'eb6ec15daf9546254f0809' >>> blake2s(digest_size=10).hexdigest() '1bf21a98c78a1c376ae9' >>> blake2s(digest_size=11).hexdigest() '567004bf96e4a25773ebf4' Keyed hashing ~~~~~~~~~~~~~ Keyed hashing can be used for authentication as a faster and simpler replacement for Hash-based message authentication code (HMAC). BLAKE2 can be securely used in prefix-MAC mode thanks to the indifferentiability property inherited from BLAKE. This example shows how to get a (hex-encoded) 128-bit authentication code for message "b'message data'" with key "b'pseudorandom key'": >>> from hashlib import blake2b >>> h = blake2b(key=b'pseudorandom key', digest_size=16) >>> h.update(b'message data') >>> h.hexdigest() '3d363ff7401e02026f4a4687d4863ced' As a practical example, a web application can symmetrically sign cookies sent to users and later verify them to make sure they weren’t tampered with: >>> from hashlib import blake2b >>> from hmac import compare_digest >>> >>> SECRET_KEY = b'pseudorandomly generated server secret key' >>> AUTH_SIZE = 16 >>> >>> def sign(cookie): ... h = blake2b(digest_size=AUTH_SIZE, key=SECRET_KEY) ... h.update(cookie) ... return h.hexdigest().encode('utf-8') >>> >>> def verify(cookie, sig): ... good_sig = sign(cookie) ... return compare_digest(good_sig, sig) >>> >>> cookie = b'user-alice' >>> sig = sign(cookie) >>> print("{0},{1}".format(cookie.decode('utf-8'), sig)) user-alice,b'43b3c982cf697e0c5ab22172d1ca7421' >>> verify(cookie, sig) True >>> verify(b'user-bob', sig) False >>> verify(cookie, b'0102030405060708090a0b0c0d0e0f00') False Even though there’s a native keyed hashing mode, BLAKE2 can, of course, be used in HMAC construction with "hmac" module: >>> import hmac, hashlib >>> m = hmac.new(b'secret key', digestmod=hashlib.blake2s) >>> m.update(b'message') >>> m.hexdigest() 'e3c8102868d28b5ff85fc35dda07329970d1a01e273c37481326fe0c861c8142' Randomized hashing ~~~~~~~~~~~~~~~~~~ By setting *salt* parameter users can introduce randomization to the hash function. Randomized hashing is useful for protecting against collision attacks on the hash function used in digital signatures. Randomized hashing is designed for situations where one party, the message preparer, generates all or part of a message to be signed by a second party, the message signer. If the message preparer is able to find cryptographic hash function collisions (i.e., two messages producing the same hash value), then they might prepare meaningful versions of the message that would produce the same hash value and digital signature, but with different results (e.g., transferring $1,000,000 to an account, rather than $10). Cryptographic hash functions have been designed with collision resistance as a major goal, but the current concentration on attacking cryptographic hash functions may result in a given cryptographic hash function providing less collision resistance than expected. Randomized hashing offers the signer additional protection by reducing the likelihood that a preparer can generate two or more messages that ultimately yield the same hash value during the digital signature generation process — even if it is practical to find collisions for the hash function. However, the use of randomized hashing may reduce the amount of security provided by a digital signature when all portions of the message are prepared by the signer. (NIST SP-800-106 “Randomized Hashing for Digital Signatures”) In BLAKE2 the salt is processed as a one-time input to the hash function during initialization, rather than as an input to each compression function. Warning: *Salted hashing* (or just hashing) with BLAKE2 or any other general- purpose cryptographic hash function, such as SHA-256, is not suitable for hashing passwords. See BLAKE2 FAQ for more information. >>> import os >>> from hashlib import blake2b >>> msg = b'some message' >>> # Calculate the first hash with a random salt. >>> salt1 = os.urandom(blake2b.SALT_SIZE) >>> h1 = blake2b(salt=salt1) >>> h1.update(msg) >>> # Calculate the second hash with a different random salt. >>> salt2 = os.urandom(blake2b.SALT_SIZE) >>> h2 = blake2b(salt=salt2) >>> h2.update(msg) >>> # The digests are different. >>> h1.digest() != h2.digest() True Personalization ~~~~~~~~~~~~~~~ Sometimes it is useful to force hash function to produce different digests for the same input for different purposes. Quoting the authors of the Skein hash function: We recommend that all application designers seriously consider doing this; we have seen many protocols where a hash that is computed in one part of the protocol can be used in an entirely different part because two hash computations were done on similar or related data, and the attacker can force the application to make the hash inputs the same. Personalizing each hash function used in the protocol summarily stops this type of attack. (The Skein Hash Function Family, p. 21) BLAKE2 can be personalized by passing bytes to the *person* argument: >>> from hashlib import blake2b >>> FILES_HASH_PERSON = b'MyApp Files Hash' >>> BLOCK_HASH_PERSON = b'MyApp Block Hash' >>> h = blake2b(digest_size=32, person=FILES_HASH_PERSON) >>> h.update(b'the same content') >>> h.hexdigest() '20d9cd024d4fb086aae819a1432dd2466de12947831b75c5a30cf2676095d3b4' >>> h = blake2b(digest_size=32, person=BLOCK_HASH_PERSON) >>> h.update(b'the same content') >>> h.hexdigest() 'cf68fb5761b9c44e7878bfb2c4c9aea52264a80b75005e65619778de59f383a3' Personalization together with the keyed mode can also be used to derive different keys from a single one. >>> from hashlib import blake2s >>> from base64 import b64decode, b64encode >>> orig_key = b64decode(b'Rm5EPJai72qcK3RGBpW3vPNfZy5OZothY+kHY6h21KM=') >>> enc_key = blake2s(key=orig_key, person=b'kEncrypt').digest() >>> mac_key = blake2s(key=orig_key, person=b'kMAC').digest() >>> print(b64encode(enc_key).decode('utf-8')) rbPb15S/Z9t+agffno5wuhB77VbRi6F9Iv2qIxU7WHw= >>> print(b64encode(mac_key).decode('utf-8')) G9GtHFE1YluXY1zWPlYk1e/nWfu0WSEb0KRcjhDeP/o= Tree mode ~~~~~~~~~ Here’s an example of hashing a minimal tree with two leaf nodes: 10 / \ 00 01 This example uses 64-byte internal digests, and returns the 32-byte final digest: >>> from hashlib import blake2b >>> >>> FANOUT = 2 >>> DEPTH = 2 >>> LEAF_SIZE = 4096 >>> INNER_SIZE = 64 >>> >>> buf = bytearray(6000) >>> >>> # Left leaf ... h00 = blake2b(buf[0:LEAF_SIZE], fanout=FANOUT, depth=DEPTH, ... leaf_size=LEAF_SIZE, inner_size=INNER_SIZE, ... node_offset=0, node_depth=0, last_node=False) >>> # Right leaf ... h01 = blake2b(buf[LEAF_SIZE:], fanout=FANOUT, depth=DEPTH, ... leaf_size=LEAF_SIZE, inner_size=INNER_SIZE, ... node_offset=1, node_depth=0, last_node=True) >>> # Root node ... h10 = blake2b(digest_size=32, fanout=FANOUT, depth=DEPTH, ... leaf_size=LEAF_SIZE, inner_size=INNER_SIZE, ... node_offset=0, node_depth=1, last_node=True) >>> h10.update(h00.digest()) >>> h10.update(h01.digest()) >>> h10.hexdigest() '3ad2a9b37c6070e374c7a8c508fe20ca86b6ed54e286e93a0318e95e881db5aa' Credits ------- BLAKE2 was designed by *Jean-Philippe Aumasson*, *Samuel Neves*, *Zooko Wilcox-O’Hearn*, and *Christian Winnerlein* based on SHA-3 finalist BLAKE created by *Jean-Philippe Aumasson*, *Luca Henzen*, *Willi Meier*, and *Raphael C.-W. Phan*. It uses core algorithm from ChaCha cipher designed by *Daniel J. Bernstein*. The stdlib implementation is based on pyblake2 module. It was written by *Dmitry Chestnykh* based on C implementation written by *Samuel Neves*. The documentation was copied from pyblake2 and written by *Dmitry Chestnykh*. The C code was partly rewritten for Python by *Christian Heimes*. The following public domain dedication applies for both C hash function implementation, extension code, and this documentation: To the extent possible under law, the author(s) have dedicated all copyright and related and neighboring rights to this software to the public domain worldwide. This software is distributed without any warranty. You should have received a copy of the CC0 Public Domain Dedication along with this software. If not, see https://creativecommons.org/publicdomain/zero/1.0/. The following people have helped with development or contributed their changes to the project and the public domain according to the Creative Commons Public Domain Dedication 1.0 Universal: * *Alexandr Sokolovskiy* See also: Module "hmac" A module to generate message authentication codes using hashes. Module "base64" Another way to encode binary hashes for non-binary environments. https://blake2.net Official BLAKE2 website. https://csrc.nist.gov/csrc/media/publications/fips/180/2/archive/20 02-08-01/documents/fips180-2.pdf The FIPS 180-2 publication on Secure Hash Algorithms. https://en.wikipedia.org/wiki/Cryptographic_hash_function#Cryptogra phic_hash_algorithms Wikipedia article with information on which algorithms have known issues and what that means regarding their use. https://www.ietf.org/rfc/rfc2898.txt PKCS #5: Password-Based Cryptography Specification Version 2.0 "heapq" — Heap queue algorithm ****************************** **Source code:** Lib/heapq.py ====================================================================== This module provides an implementation of the heap queue algorithm, also known as the priority queue algorithm. Heaps are binary trees for which every parent node has a value less than or equal to any of its children. This implementation uses arrays for which "heap[k] <= heap[2*k+1]" and "heap[k] <= heap[2*k+2]" for all *k*, counting elements from zero. For the sake of comparison, non-existing elements are considered to be infinite. The interesting property of a heap is that its smallest element is always the root, "heap[0]". The API below differs from textbook heap algorithms in two aspects: (a) We use zero-based indexing. This makes the relationship between the index for a node and the indexes for its children slightly less obvious, but is more suitable since Python uses zero-based indexing. (b) Our pop method returns the smallest item, not the largest (called a “min heap” in textbooks; a “max heap” is more common in texts because of its suitability for in-place sorting). These two make it possible to view the heap as a regular Python list without surprises: "heap[0]" is the smallest item, and "heap.sort()" maintains the heap invariant! To create a heap, use a list initialized to "[]", or you can transform a populated list into a heap via function "heapify()". The following functions are provided: heapq.heappush(heap, item) Push the value *item* onto the *heap*, maintaining the heap invariant. heapq.heappop(heap) Pop and return the smallest item from the *heap*, maintaining the heap invariant. If the heap is empty, "IndexError" is raised. To access the smallest item without popping it, use "heap[0]". heapq.heappushpop(heap, item) Push *item* on the heap, then pop and return the smallest item from the *heap*. The combined action runs more efficiently than "heappush()" followed by a separate call to "heappop()". heapq.heapify(x) Transform list *x* into a heap, in-place, in linear time. heapq.heapreplace(heap, item) Pop and return the smallest item from the *heap*, and also push the new *item*. The heap size doesn’t change. If the heap is empty, "IndexError" is raised. This one step operation is more efficient than a "heappop()" followed by "heappush()" and can be more appropriate when using a fixed-size heap. The pop/push combination always returns an element from the heap and replaces it with *item*. The value returned may be larger than the *item* added. If that isn’t desired, consider using "heappushpop()" instead. Its push/pop combination returns the smaller of the two values, leaving the larger value on the heap. The module also offers three general purpose functions based on heaps. heapq.merge(*iterables, key=None, reverse=False) Merge multiple sorted inputs into a single sorted output (for example, merge timestamped entries from multiple log files). Returns an *iterator* over the sorted values. Similar to "sorted(itertools.chain(*iterables))" but returns an iterable, does not pull the data into memory all at once, and assumes that each of the input streams is already sorted (smallest to largest). Has two optional arguments which must be specified as keyword arguments. *key* specifies a *key function* of one argument that is used to extract a comparison key from each input element. The default value is "None" (compare the elements directly). *reverse* is a boolean value. If set to "True", then the input elements are merged as if each comparison were reversed. To achieve behavior similar to "sorted(itertools.chain(*iterables), reverse=True)", all iterables must be sorted from largest to smallest. Changed in version 3.5: Added the optional *key* and *reverse* parameters. heapq.nlargest(n, iterable, key=None) Return a list with the *n* largest elements from the dataset defined by *iterable*. *key*, if provided, specifies a function of one argument that is used to extract a comparison key from each element in *iterable* (for example, "key=str.lower"). Equivalent to: "sorted(iterable, key=key, reverse=True)[:n]". heapq.nsmallest(n, iterable, key=None) Return a list with the *n* smallest elements from the dataset defined by *iterable*. *key*, if provided, specifies a function of one argument that is used to extract a comparison key from each element in *iterable* (for example, "key=str.lower"). Equivalent to: "sorted(iterable, key=key)[:n]". The latter two functions perform best for smaller values of *n*. For larger values, it is more efficient to use the "sorted()" function. Also, when "n==1", it is more efficient to use the built-in "min()" and "max()" functions. If repeated usage of these functions is required, consider turning the iterable into an actual heap. Basic Examples ============== A heapsort can be implemented by pushing all values onto a heap and then popping off the smallest values one at a time: >>> def heapsort(iterable): ... h = [] ... for value in iterable: ... heappush(h, value) ... return [heappop(h) for i in range(len(h))] ... >>> heapsort([1, 3, 5, 7, 9, 2, 4, 6, 8, 0]) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] This is similar to "sorted(iterable)", but unlike "sorted()", this implementation is not stable. Heap elements can be tuples. This is useful for assigning comparison values (such as task priorities) alongside the main record being tracked: >>> h = [] >>> heappush(h, (5, 'write code')) >>> heappush(h, (7, 'release product')) >>> heappush(h, (1, 'write spec')) >>> heappush(h, (3, 'create tests')) >>> heappop(h) (1, 'write spec') Priority Queue Implementation Notes =================================== A priority queue is common use for a heap, and it presents several implementation challenges: * Sort stability: how do you get two tasks with equal priorities to be returned in the order they were originally added? * Tuple comparison breaks for (priority, task) pairs if the priorities are equal and the tasks do not have a default comparison order. * If the priority of a task changes, how do you move it to a new position in the heap? * Or if a pending task needs to be deleted, how do you find it and remove it from the queue? A solution to the first two challenges is to store entries as 3-element list including the priority, an entry count, and the task. The entry count serves as a tie-breaker so that two tasks with the same priority are returned in the order they were added. And since no two entry counts are the same, the tuple comparison will never attempt to directly compare two tasks. Another solution to the problem of non-comparable tasks is to create a wrapper class that ignores the task item and only compares the priority field: from dataclasses import dataclass, field from typing import Any @dataclass(order=True) class PrioritizedItem: priority: int item: Any=field(compare=False) The remaining challenges revolve around finding a pending task and making changes to its priority or removing it entirely. Finding a task can be done with a dictionary pointing to an entry in the queue. Removing the entry or changing its priority is more difficult because it would break the heap structure invariants. So, a possible solution is to mark the entry as removed and add a new entry with the revised priority: pq = [] # list of entries arranged in a heap entry_finder = {} # mapping of tasks to entries REMOVED = '' # placeholder for a removed task counter = itertools.count() # unique sequence count def add_task(task, priority=0): 'Add a new task or update the priority of an existing task' if task in entry_finder: remove_task(task) count = next(counter) entry = [priority, count, task] entry_finder[task] = entry heappush(pq, entry) def remove_task(task): 'Mark an existing task as REMOVED. Raise KeyError if not found.' entry = entry_finder.pop(task) entry[-1] = REMOVED def pop_task(): 'Remove and return the lowest priority task. Raise KeyError if empty.' while pq: priority, count, task = heappop(pq) if task is not REMOVED: del entry_finder[task] return task raise KeyError('pop from an empty priority queue') Theory ====== Heaps are arrays for which "a[k] <= a[2*k+1]" and "a[k] <= a[2*k+2]" for all *k*, counting elements from 0. For the sake of comparison, non-existing elements are considered to be infinite. The interesting property of a heap is that "a[0]" is always its smallest element. The strange invariant above is meant to be an efficient memory representation for a tournament. The numbers below are *k*, not "a[k]": 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 In the tree above, each cell *k* is topping "2*k+1" and "2*k+2". In a usual binary tournament we see in sports, each cell is the winner over the two cells it tops, and we can trace the winner down the tree to see all opponents s/he had. However, in many computer applications of such tournaments, we do not need to trace the history of a winner. To be more memory efficient, when a winner is promoted, we try to replace it by something else at a lower level, and the rule becomes that a cell and the two cells it tops contain three different items, but the top cell “wins” over the two topped cells. If this heap invariant is protected at all time, index 0 is clearly the overall winner. The simplest algorithmic way to remove it and find the “next” winner is to move some loser (let’s say cell 30 in the diagram above) into the 0 position, and then percolate this new 0 down the tree, exchanging values, until the invariant is re-established. This is clearly logarithmic on the total number of items in the tree. By iterating over all items, you get an O(n log n) sort. A nice feature of this sort is that you can efficiently insert new items while the sort is going on, provided that the inserted items are not “better” than the last 0’th element you extracted. This is especially useful in simulation contexts, where the tree holds all incoming events, and the “win” condition means the smallest scheduled time. When an event schedules other events for execution, they are scheduled into the future, so they can easily go into the heap. So, a heap is a good structure for implementing schedulers (this is what I used for my MIDI sequencer :-). Various structures for implementing schedulers have been extensively studied, and heaps are good for this, as they are reasonably speedy, the speed is almost constant, and the worst case is not much different than the average case. However, there are other representations which are more efficient overall, yet the worst cases might be terrible. Heaps are also very useful in big disk sorts. You most probably all know that a big sort implies producing “runs” (which are pre-sorted sequences, whose size is usually related to the amount of CPU memory), followed by a merging passes for these runs, which merging is often very cleverly organised [1]. It is very important that the initial sort produces the longest runs possible. Tournaments are a good way to achieve that. If, using all the memory available to hold a tournament, you replace and percolate items that happen to fit the current run, you’ll produce runs which are twice the size of the memory for random input, and much better for input fuzzily ordered. Moreover, if you output the 0’th item on disk and get an input which may not fit in the current tournament (because the value “wins” over the last output value), it cannot fit in the heap, so the size of the heap decreases. The freed memory could be cleverly reused immediately for progressively building a second heap, which grows at exactly the same rate the first heap is melting. When the first heap completely vanishes, you switch heaps and start a new run. Clever and quite effective! In a word, heaps are useful memory structures to know. I use them in a few applications, and I think it is good to keep a ‘heap’ module around. :-) -[ Footnotes ]- [1] The disk balancing algorithms which are current, nowadays, are more annoying than clever, and this is a consequence of the seeking capabilities of the disks. On devices which cannot seek, like big tape drives, the story was quite different, and one had to be very clever to ensure (far in advance) that each tape movement will be the most effective possible (that is, will best participate at “progressing” the merge). Some tapes were even able to read backwards, and this was also used to avoid the rewinding time. Believe me, real good tape sorts were quite spectacular to watch! From all times, sorting has always been a Great Art! :-) "hmac" — Keyed-Hashing for Message Authentication ************************************************* **Source code:** Lib/hmac.py ====================================================================== This module implements the HMAC algorithm as described by **RFC 2104**. hmac.new(key, msg=None, digestmod='') Return a new hmac object. *key* is a bytes or bytearray object giving the secret key. If *msg* is present, the method call "update(msg)" is made. *digestmod* is the digest name, digest constructor or module for the HMAC object to use. It may be any name suitable to "hashlib.new()". Despite its argument position, it is required. Changed in version 3.4: Parameter *key* can be a bytes or bytearray object. Parameter *msg* can be of any type supported by "hashlib". Parameter *digestmod* can be the name of a hash algorithm. Deprecated since version 3.4, removed in version 3.8: MD5 as implicit default digest for *digestmod* is deprecated. The digestmod parameter is now required. Pass it as a keyword argument to avoid awkwardness when you do not have an initial msg. hmac.digest(key, msg, digest) Return digest of *msg* for given secret *key* and *digest*. The function is equivalent to "HMAC(key, msg, digest).digest()", but uses an optimized C or inline implementation, which is faster for messages that fit into memory. The parameters *key*, *msg*, and *digest* have the same meaning as in "new()". CPython implementation detail, the optimized C implementation is only used when *digest* is a string and name of a digest algorithm, which is supported by OpenSSL. New in version 3.7. An HMAC object has the following methods: HMAC.update(msg) Update the hmac object with *msg*. Repeated calls are equivalent to a single call with the concatenation of all the arguments: "m.update(a); m.update(b)" is equivalent to "m.update(a + b)". Changed in version 3.4: Parameter *msg* can be of any type supported by "hashlib". HMAC.digest() Return the digest of the bytes passed to the "update()" method so far. This bytes object will be the same length as the *digest_size* of the digest given to the constructor. It may contain non-ASCII bytes, including NUL bytes. Warning: When comparing the output of "digest()" to an externally-supplied digest during a verification routine, it is recommended to use the "compare_digest()" function instead of the "==" operator to reduce the vulnerability to timing attacks. HMAC.hexdigest() Like "digest()" except the digest is returned as a string twice the length containing only hexadecimal digits. This may be used to exchange the value safely in email or other non-binary environments. Warning: When comparing the output of "hexdigest()" to an externally- supplied digest during a verification routine, it is recommended to use the "compare_digest()" function instead of the "==" operator to reduce the vulnerability to timing attacks. HMAC.copy() Return a copy (“clone”) of the hmac object. This can be used to efficiently compute the digests of strings that share a common initial substring. A hash object has the following attributes: HMAC.digest_size The size of the resulting HMAC digest in bytes. HMAC.block_size The internal block size of the hash algorithm in bytes. New in version 3.4. HMAC.name The canonical name of this HMAC, always lowercase, e.g. "hmac-md5". New in version 3.4. Deprecated since version 3.9: The undocumented attributes "HMAC.digest_cons", "HMAC.inner", and "HMAC.outer" are internal implementation details and will be removed in Python 3.10. This module also provides the following helper function: hmac.compare_digest(a, b) Return "a == b". This function uses an approach designed to prevent timing analysis by avoiding content-based short circuiting behaviour, making it appropriate for cryptography. *a* and *b* must both be of the same type: either "str" (ASCII only, as e.g. returned by "HMAC.hexdigest()"), or a *bytes-like object*. Note: If *a* and *b* are of different lengths, or if an error occurs, a timing attack could theoretically reveal information about the types and lengths of *a* and *b*—but not their values. New in version 3.3. Changed in version 3.9: The function uses OpenSSL’s "CRYPTO_memcmp()" internally when available. See also: Module "hashlib" The Python module providing secure hash functions. "html.entities" — Definitions of HTML general entities ****************************************************** **Source code:** Lib/html/entities.py ====================================================================== This module defines four dictionaries, "html5", "name2codepoint", "codepoint2name", and "entitydefs". html.entities.html5 A dictionary that maps HTML5 named character references [1] to the equivalent Unicode character(s), e.g. "html5['gt;'] == '>'". Note that the trailing semicolon is included in the name (e.g. "'gt;'"), however some of the names are accepted by the standard even without the semicolon: in this case the name is present with and without the "';'". See also "html.unescape()". New in version 3.3. html.entities.entitydefs A dictionary mapping XHTML 1.0 entity definitions to their replacement text in ISO Latin-1. html.entities.name2codepoint A dictionary that maps HTML entity names to the Unicode code points. html.entities.codepoint2name A dictionary that maps Unicode code points to HTML entity names. -[ Footnotes ]- [1] See https://html.spec.whatwg.org/multipage/syntax.html#named- character-references "html.parser" — Simple HTML and XHTML parser ******************************************** **Source code:** Lib/html/parser.py ====================================================================== This module defines a class "HTMLParser" which serves as the basis for parsing text files formatted in HTML (HyperText Mark-up Language) and XHTML. class html.parser.HTMLParser(*, convert_charrefs=True) Create a parser instance able to parse invalid markup. If *convert_charrefs* is "True" (the default), all character references (except the ones in "script"/"style" elements) are automatically converted to the corresponding Unicode characters. An "HTMLParser" instance is fed HTML data and calls handler methods when start tags, end tags, text, comments, and other markup elements are encountered. The user should subclass "HTMLParser" and override its methods to implement the desired behavior. This parser does not check that end tags match start tags or call the end-tag handler for elements which are closed implicitly by closing an outer element. Changed in version 3.4: *convert_charrefs* keyword argument added. Changed in version 3.5: The default value for argument *convert_charrefs* is now "True". Example HTML Parser Application =============================== As a basic example, below is a simple HTML parser that uses the "HTMLParser" class to print out start tags, end tags, and data as they are encountered: from html.parser import HTMLParser class MyHTMLParser(HTMLParser): def handle_starttag(self, tag, attrs): print("Encountered a start tag:", tag) def handle_endtag(self, tag): print("Encountered an end tag :", tag) def handle_data(self, data): print("Encountered some data :", data) parser = MyHTMLParser() parser.feed('Test' '

Parse me!

') The output will then be: Encountered a start tag: html Encountered a start tag: head Encountered a start tag: title Encountered some data : Test Encountered an end tag : title Encountered an end tag : head Encountered a start tag: body Encountered a start tag: h1 Encountered some data : Parse me! Encountered an end tag : h1 Encountered an end tag : body Encountered an end tag : html "HTMLParser" Methods ==================== "HTMLParser" instances have the following methods: HTMLParser.feed(data) Feed some text to the parser. It is processed insofar as it consists of complete elements; incomplete data is buffered until more data is fed or "close()" is called. *data* must be "str". HTMLParser.close() Force processing of all buffered data as if it were followed by an end-of-file mark. This method may be redefined by a derived class to define additional processing at the end of the input, but the redefined version should always call the "HTMLParser" base class method "close()". HTMLParser.reset() Reset the instance. Loses all unprocessed data. This is called implicitly at instantiation time. HTMLParser.getpos() Return current line number and offset. HTMLParser.get_starttag_text() Return the text of the most recently opened start tag. This should not normally be needed for structured processing, but may be useful in dealing with HTML “as deployed” or for re-generating input with minimal changes (whitespace between attributes can be preserved, etc.). The following methods are called when data or markup elements are encountered and they are meant to be overridden in a subclass. The base class implementations do nothing (except for "handle_startendtag()"): HTMLParser.handle_starttag(tag, attrs) This method is called to handle the start tag of an element (e.g. "
"). The *tag* argument is the name of the tag converted to lower case. The *attrs* argument is a list of "(name, value)" pairs containing the attributes found inside the tag’s "<>" brackets. The *name* will be translated to lower case, and quotes in the *value* have been removed, and character and entity references have been replaced. For instance, for the tag "", this method would be called as "handle_starttag('a', [('href', 'https://www.cwi.nl/')])". All entity references from "html.entities" are replaced in the attribute values. HTMLParser.handle_endtag(tag) This method is called to handle the end tag of an element (e.g. "
"). The *tag* argument is the name of the tag converted to lower case. HTMLParser.handle_startendtag(tag, attrs) Similar to "handle_starttag()", but called when the parser encounters an XHTML-style empty tag (""). This method may be overridden by subclasses which require this particular lexical information; the default implementation simply calls "handle_starttag()" and "handle_endtag()". HTMLParser.handle_data(data) This method is called to process arbitrary data (e.g. text nodes and the content of "" and ""). HTMLParser.handle_entityref(name) This method is called to process a named character reference of the form "&name;" (e.g. ">"), where *name* is a general entity reference (e.g. "'gt'"). This method is never called if *convert_charrefs* is "True". HTMLParser.handle_charref(name) This method is called to process decimal and hexadecimal numeric character references of the form "&#NNN;" and "&#xNNN;". For example, the decimal equivalent for ">" is ">", whereas the hexadecimal is ">"; in this case the method will receive "'62'" or "'x3E'". This method is never called if *convert_charrefs* is "True". HTMLParser.handle_comment(data) This method is called when a comment is encountered (e.g. ""). For example, the comment "" will cause this method to be called with the argument "' comment '". The content of Internet Explorer conditional comments (condcoms) will also be sent to this method, so, for "", this method will receive "'[if IE 9]>IE9-specific content"). The *decl* parameter will be the entire contents of the declaration inside the "" markup (e.g. "'DOCTYPE html'"). HTMLParser.handle_pi(data) Method called when a processing instruction is encountered. The *data* parameter will contain the entire processing instruction. For example, for the processing instruction "", this method would be called as "handle_pi("proc color='red'")". It is intended to be overridden by a derived class; the base class implementation does nothing. Note: The "HTMLParser" class uses the SGML syntactic rules for processing instructions. An XHTML processing instruction using the trailing "'?'" will cause the "'?'" to be included in *data*. HTMLParser.unknown_decl(data) This method is called when an unrecognized declaration is read by the parser. The *data* parameter will be the entire contents of the declaration inside the "" markup. It is sometimes useful to be overridden by a derived class. The base class implementation does nothing. Examples ======== The following class implements a parser that will be used to illustrate more examples: from html.parser import HTMLParser from html.entities import name2codepoint class MyHTMLParser(HTMLParser): def handle_starttag(self, tag, attrs): print("Start tag:", tag) for attr in attrs: print(" attr:", attr) def handle_endtag(self, tag): print("End tag :", tag) def handle_data(self, data): print("Data :", data) def handle_comment(self, data): print("Comment :", data) def handle_entityref(self, name): c = chr(name2codepoint[name]) print("Named ent:", c) def handle_charref(self, name): if name.startswith('x'): c = chr(int(name[1:], 16)) else: c = chr(int(name)) print("Num ent :", c) def handle_decl(self, data): print("Decl :", data) parser = MyHTMLParser() Parsing a doctype: >>> parser.feed('') Decl : DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd" Parsing an element with a few attributes and a title: >>> parser.feed('The Python logo') Start tag: img attr: ('src', 'python-logo.png') attr: ('alt', 'The Python logo') >>> >>> parser.feed('

Python

') Start tag: h1 Data : Python End tag : h1 The content of "script" and "style" elements is returned as is, without further parsing: >>> parser.feed('') Start tag: style attr: ('type', 'text/css') Data : #python { color: green } End tag : style >>> parser.feed('') Start tag: script attr: ('type', 'text/javascript') Data : alert("hello!"); End tag : script Parsing comments: >>> parser.feed('' ... '') Comment : a comment Comment : [if IE 9]>IE-specific content'"): >>> parser.feed('>>>') Named ent: > Num ent : > Num ent : > Feeding incomplete chunks to "feed()" works, but "handle_data()" might be called more than once (unless *convert_charrefs* is set to "True"): >>> for chunk in ['buff', 'ered ', 'text']: ... parser.feed(chunk) ... Start tag: span Data : buff Data : ered Data : text End tag : span Parsing invalid HTML (e.g. unquoted attributes) also works: >>> parser.feed('

tag soup

') Start tag: p Start tag: a attr: ('class', 'link') attr: ('href', '#main') Data : tag soup End tag : p End tag : a "html" — HyperText Markup Language support ****************************************** **Source code:** Lib/html/__init__.py ====================================================================== This module defines utilities to manipulate HTML. html.escape(s, quote=True) Convert the characters "&", "<" and ">" in string *s* to HTML-safe sequences. Use this if you need to display text that might contain such characters in HTML. If the optional flag *quote* is true, the characters (""") and ("'") are also translated; this helps for inclusion in an HTML attribute value delimited by quotes, as in "". New in version 3.2. html.unescape(s) Convert all named and numeric character references (e.g. ">", ">", ">") in the string *s* to the corresponding Unicode characters. This function uses the rules defined by the HTML 5 standard for both valid and invalid character references, and the "list of HTML 5 named character references". New in version 3.4. ====================================================================== Submodules in the "html" package are: * "html.parser" – HTML/XHTML parser with lenient parsing mode * "html.entities" – HTML entity definitions "http.client" — HTTP protocol client ************************************ **Source code:** Lib/http/client.py ====================================================================== This module defines classes which implement the client side of the HTTP and HTTPS protocols. It is normally not used directly — the module "urllib.request" uses it to handle URLs that use HTTP and HTTPS. See also: The Requests package is recommended for a higher-level HTTP client interface. Note: HTTPS support is only available if Python was compiled with SSL support (through the "ssl" module). The module provides the following classes: class http.client.HTTPConnection(host, port=None[, timeout], source_address=None, blocksize=8192) An "HTTPConnection" instance represents one transaction with an HTTP server. It should be instantiated passing it a host and optional port number. If no port number is passed, the port is extracted from the host string if it has the form "host:port", else the default HTTP port (80) is used. If the optional *timeout* parameter is given, blocking operations (like connection attempts) will timeout after that many seconds (if it is not given, the global default timeout setting is used). The optional *source_address* parameter may be a tuple of a (host, port) to use as the source address the HTTP connection is made from. The optional *blocksize* parameter sets the buffer size in bytes for sending a file-like message body. For example, the following calls all create instances that connect to the server at the same host and port: >>> h1 = http.client.HTTPConnection('www.python.org') >>> h2 = http.client.HTTPConnection('www.python.org:80') >>> h3 = http.client.HTTPConnection('www.python.org', 80) >>> h4 = http.client.HTTPConnection('www.python.org', 80, timeout=10) Changed in version 3.2: *source_address* was added. Changed in version 3.4: The *strict* parameter was removed. HTTP 0.9-style “Simple Responses” are not longer supported. Changed in version 3.7: *blocksize* parameter was added. class http.client.HTTPSConnection(host, port=None, key_file=None, cert_file=None[, timeout], source_address=None, *, context=None, check_hostname=None, blocksize=8192) A subclass of "HTTPConnection" that uses SSL for communication with secure servers. Default port is "443". If *context* is specified, it must be a "ssl.SSLContext" instance describing the various SSL options. Please read Security considerations for more information on best practices. Changed in version 3.2: *source_address*, *context* and *check_hostname* were added. Changed in version 3.2: This class now supports HTTPS virtual hosts if possible (that is, if "ssl.HAS_SNI" is true). Changed in version 3.4: The *strict* parameter was removed. HTTP 0.9-style “Simple Responses” are no longer supported. Changed in version 3.4.3: This class now performs all the necessary certificate and hostname checks by default. To revert to the previous, unverified, behavior "ssl._create_unverified_context()" can be passed to the *context* parameter. Changed in version 3.8: This class now enables TLS 1.3 "ssl.SSLContext.post_handshake_auth" for the default *context* or when *cert_file* is passed with a custom *context*. Deprecated since version 3.6: *key_file* and *cert_file* are deprecated in favor of *context*. Please use "ssl.SSLContext.load_cert_chain()" instead, or let "ssl.create_default_context()" select the system’s trusted CA certificates for you.The *check_hostname* parameter is also deprecated; the "ssl.SSLContext.check_hostname" attribute of *context* should be used instead. class http.client.HTTPResponse(sock, debuglevel=0, method=None, url=None) Class whose instances are returned upon successful connection. Not instantiated directly by user. Changed in version 3.4: The *strict* parameter was removed. HTTP 0.9 style “Simple Responses” are no longer supported. This module provides the following function: http.client.parse_headers(fp) Parse the headers from a file pointer *fp* representing a HTTP request/response. The file has to be a "BufferedIOBase" reader (i.e. not text) and must provide a valid **RFC 2822** style header. This function returns an instance of "http.client.HTTPMessage" that holds the header fields, but no payload (the same as "HTTPResponse.msg" and "http.server.BaseHTTPRequestHandler.headers"). After returning, the file pointer *fp* is ready to read the HTTP body. Note: "parse_headers()" does not parse the start-line of a HTTP message; it only parses the "Name: value" lines. The file has to be ready to read these field lines, so the first line should already be consumed before calling the function. The following exceptions are raised as appropriate: exception http.client.HTTPException The base class of the other exceptions in this module. It is a subclass of "Exception". exception http.client.NotConnected A subclass of "HTTPException". exception http.client.InvalidURL A subclass of "HTTPException", raised if a port is given and is either non-numeric or empty. exception http.client.UnknownProtocol A subclass of "HTTPException". exception http.client.UnknownTransferEncoding A subclass of "HTTPException". exception http.client.UnimplementedFileMode A subclass of "HTTPException". exception http.client.IncompleteRead A subclass of "HTTPException". exception http.client.ImproperConnectionState A subclass of "HTTPException". exception http.client.CannotSendRequest A subclass of "ImproperConnectionState". exception http.client.CannotSendHeader A subclass of "ImproperConnectionState". exception http.client.ResponseNotReady A subclass of "ImproperConnectionState". exception http.client.BadStatusLine A subclass of "HTTPException". Raised if a server responds with a HTTP status code that we don’t understand. exception http.client.LineTooLong A subclass of "HTTPException". Raised if an excessively long line is received in the HTTP protocol from the server. exception http.client.RemoteDisconnected A subclass of "ConnectionResetError" and "BadStatusLine". Raised by "HTTPConnection.getresponse()" when the attempt to read the response results in no data read from the connection, indicating that the remote end has closed the connection. New in version 3.5: Previously, "BadStatusLine""('')" was raised. The constants defined in this module are: http.client.HTTP_PORT The default port for the HTTP protocol (always "80"). http.client.HTTPS_PORT The default port for the HTTPS protocol (always "443"). http.client.responses This dictionary maps the HTTP 1.1 status codes to the W3C names. Example: "http.client.responses[http.client.NOT_FOUND]" is "'Not Found'". See HTTP status codes for a list of HTTP status codes that are available in this module as constants. HTTPConnection Objects ====================== "HTTPConnection" instances have the following methods: HTTPConnection.request(method, url, body=None, headers={}, *, encode_chunked=False) This will send a request to the server using the HTTP request method *method* and the selector *url*. If *body* is specified, the specified data is sent after the headers are finished. It may be a "str", a *bytes-like object*, an open *file object*, or an iterable of "bytes". If *body* is a string, it is encoded as ISO-8859-1, the default for HTTP. If it is a bytes-like object, the bytes are sent as is. If it is a *file object*, the contents of the file is sent; this file object should support at least the "read()" method. If the file object is an instance of "io.TextIOBase", the data returned by the "read()" method will be encoded as ISO-8859-1, otherwise the data returned by "read()" is sent as is. If *body* is an iterable, the elements of the iterable are sent as is until the iterable is exhausted. The *headers* argument should be a mapping of extra HTTP headers to send with the request. If *headers* contains neither Content-Length nor Transfer-Encoding, but there is a request body, one of those header fields will be added automatically. If *body* is "None", the Content-Length header is set to "0" for methods that expect a body ("PUT", "POST", and "PATCH"). If *body* is a string or a bytes-like object that is not also a *file*, the Content-Length header is set to its length. Any other type of *body* (files and iterables in general) will be chunk-encoded, and the Transfer-Encoding header will automatically be set instead of Content-Length. The *encode_chunked* argument is only relevant if Transfer-Encoding is specified in *headers*. If *encode_chunked* is "False", the HTTPConnection object assumes that all encoding is handled by the calling code. If it is "True", the body will be chunk-encoded. Note: Chunked transfer encoding has been added to the HTTP protocol version 1.1. Unless the HTTP server is known to handle HTTP 1.1, the caller must either specify the Content-Length, or must pass a "str" or bytes-like object that is not also a file as the body representation. New in version 3.2: *body* can now be an iterable. Changed in version 3.6: If neither Content-Length nor Transfer- Encoding are set in *headers*, file and iterable *body* objects are now chunk-encoded. The *encode_chunked* argument was added. No attempt is made to determine the Content-Length for file objects. HTTPConnection.getresponse() Should be called after a request is sent to get the response from the server. Returns an "HTTPResponse" instance. Note: Note that you must have read the whole response before you can send a new request to the server. Changed in version 3.5: If a "ConnectionError" or subclass is raised, the "HTTPConnection" object will be ready to reconnect when a new request is sent. HTTPConnection.set_debuglevel(level) Set the debugging level. The default debug level is "0", meaning no debugging output is printed. Any value greater than "0" will cause all currently defined debug output to be printed to stdout. The "debuglevel" is passed to any new "HTTPResponse" objects that are created. New in version 3.1. HTTPConnection.set_tunnel(host, port=None, headers=None) Set the host and the port for HTTP Connect Tunnelling. This allows running the connection through a proxy server. The host and port arguments specify the endpoint of the tunneled connection (i.e. the address included in the CONNECT request, *not* the address of the proxy server). The headers argument should be a mapping of extra HTTP headers to send with the CONNECT request. For example, to tunnel through a HTTPS proxy server running locally on port 8080, we would pass the address of the proxy to the "HTTPSConnection" constructor, and the address of the host that we eventually want to reach to the "set_tunnel()" method: >>> import http.client >>> conn = http.client.HTTPSConnection("localhost", 8080) >>> conn.set_tunnel("www.python.org") >>> conn.request("HEAD","/index.html") New in version 3.2. HTTPConnection.connect() Connect to the server specified when the object was created. By default, this is called automatically when making a request if the client does not already have a connection. HTTPConnection.close() Close the connection to the server. HTTPConnection.blocksize Buffer size in bytes for sending a file-like message body. New in version 3.7. As an alternative to using the "request()" method described above, you can also send your request step by step, by using the four functions below. HTTPConnection.putrequest(method, url, skip_host=False, skip_accept_encoding=False) This should be the first call after the connection to the server has been made. It sends a line to the server consisting of the *method* string, the *url* string, and the HTTP version ("HTTP/1.1"). To disable automatic sending of "Host:" or "Accept- Encoding:" headers (for example to accept additional content encodings), specify *skip_host* or *skip_accept_encoding* with non- False values. HTTPConnection.putheader(header, argument[, ...]) Send an **RFC 822**-style header to the server. It sends a line to the server consisting of the header, a colon and a space, and the first argument. If more arguments are given, continuation lines are sent, each consisting of a tab and an argument. HTTPConnection.endheaders(message_body=None, *, encode_chunked=False) Send a blank line to the server, signalling the end of the headers. The optional *message_body* argument can be used to pass a message body associated with the request. If *encode_chunked* is "True", the result of each iteration of *message_body* will be chunk-encoded as specified in **RFC 7230**, Section 3.3.1. How the data is encoded is dependent on the type of *message_body*. If *message_body* implements the buffer interface the encoding will result in a single chunk. If *message_body* is a "collections.abc.Iterable", each iteration of *message_body* will result in a chunk. If *message_body* is a *file object*, each call to ".read()" will result in a chunk. The method automatically signals the end of the chunk-encoded data immediately after *message_body*. Note: Due to the chunked encoding specification, empty chunks yielded by an iterator body will be ignored by the chunk-encoder. This is to avoid premature termination of the read of the request by the target server due to malformed encoding. New in version 3.6: Chunked encoding support. The *encode_chunked* parameter was added. HTTPConnection.send(data) Send data to the server. This should be used directly only after the "endheaders()" method has been called and before "getresponse()" is called. HTTPResponse Objects ==================== An "HTTPResponse" instance wraps the HTTP response from the server. It provides access to the request headers and the entity body. The response is an iterable object and can be used in a with statement. Changed in version 3.5: The "io.BufferedIOBase" interface is now implemented and all of its reader operations are supported. HTTPResponse.read([amt]) Reads and returns the response body, or up to the next *amt* bytes. HTTPResponse.readinto(b) Reads up to the next len(b) bytes of the response body into the buffer *b*. Returns the number of bytes read. New in version 3.3. HTTPResponse.getheader(name, default=None) Return the value of the header *name*, or *default* if there is no header matching *name*. If there is more than one header with the name *name*, return all of the values joined by ‘, ‘. If ‘default’ is any iterable other than a single string, its elements are similarly returned joined by commas. HTTPResponse.getheaders() Return a list of (header, value) tuples. HTTPResponse.fileno() Return the "fileno" of the underlying socket. HTTPResponse.msg A "http.client.HTTPMessage" instance containing the response headers. "http.client.HTTPMessage" is a subclass of "email.message.Message". HTTPResponse.version HTTP protocol version used by server. 10 for HTTP/1.0, 11 for HTTP/1.1. HTTPResponse.url URL of the resource retrieved, commonly used to determine if a redirect was followed. HTTPResponse.headers Headers of the response in the form of an "email.message.EmailMessage" instance. HTTPResponse.status Status code returned by server. HTTPResponse.reason Reason phrase returned by server. HTTPResponse.debuglevel A debugging hook. If "debuglevel" is greater than zero, messages will be printed to stdout as the response is read and parsed. HTTPResponse.closed Is "True" if the stream is closed. HTTPResponse.geturl() Deprecated since version 3.9: Deprecated in favor of "url". HTTPResponse.info() Deprecated since version 3.9: Deprecated in favor of "headers". HTTPResponse.getstatus() Deprecated since version 3.9: Deprecated in favor of "status". Examples ======== Here is an example session that uses the "GET" method: >>> import http.client >>> conn = http.client.HTTPSConnection("www.python.org") >>> conn.request("GET", "/") >>> r1 = conn.getresponse() >>> print(r1.status, r1.reason) 200 OK >>> data1 = r1.read() # This will return entire content. >>> # The following example demonstrates reading data in chunks. >>> conn.request("GET", "/") >>> r1 = conn.getresponse() >>> while chunk := r1.read(200): ... print(repr(chunk)) b'\n 10 11 12 13 14 ..." | +--------------------+-------------------+---------------------------------------------------+-------------------------------------------+ | "cycle()" | p | p0, p1, … plast, p0, p1, … | "cycle('ABCD') --> A B C D A B C D ..." | +--------------------+-------------------+---------------------------------------------------+-------------------------------------------+ | "repeat()" | elem [,n] | elem, elem, elem, … endlessly or up to n times | "repeat(10, 3) --> 10 10 10" | +--------------------+-------------------+---------------------------------------------------+-------------------------------------------+ **Iterators terminating on the shortest input sequence:** +------------------------------+------------------------------+---------------------------------------------------+---------------------------------------------------------------+ | Iterator | Arguments | Results | Example | |==============================|==============================|===================================================|===============================================================| | "accumulate()" | p [,func] | p0, p0+p1, p0+p1+p2, … | "accumulate([1,2,3,4,5]) --> 1 3 6 10 15" | +------------------------------+------------------------------+---------------------------------------------------+---------------------------------------------------------------+ | "chain()" | p, q, … | p0, p1, … plast, q0, q1, … | "chain('ABC', 'DEF') --> A B C D E F" | +------------------------------+------------------------------+---------------------------------------------------+---------------------------------------------------------------+ | "chain.from_iterable()" | iterable | p0, p1, … plast, q0, q1, … | "chain.from_iterable(['ABC', 'DEF']) --> A B C D E F" | +------------------------------+------------------------------+---------------------------------------------------+---------------------------------------------------------------+ | "compress()" | data, selectors | (d[0] if s[0]), (d[1] if s[1]), … | "compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F" | +------------------------------+------------------------------+---------------------------------------------------+---------------------------------------------------------------+ | "dropwhile()" | pred, seq | seq[n], seq[n+1], starting when pred fails | "dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1" | +------------------------------+------------------------------+---------------------------------------------------+---------------------------------------------------------------+ | "filterfalse()" | pred, seq | elements of seq where pred(elem) is false | "filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8" | +------------------------------+------------------------------+---------------------------------------------------+---------------------------------------------------------------+ | "groupby()" | iterable[, key] | sub-iterators grouped by value of key(v) | | +------------------------------+------------------------------+---------------------------------------------------+---------------------------------------------------------------+ | "islice()" | seq, [start,] stop [, step] | elements from seq[start:stop:step] | "islice('ABCDEFG', 2, None) --> C D E F G" | +------------------------------+------------------------------+---------------------------------------------------+---------------------------------------------------------------+ | "starmap()" | func, seq | func(*seq[0]), func(*seq[1]), … | "starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000" | +------------------------------+------------------------------+---------------------------------------------------+---------------------------------------------------------------+ | "takewhile()" | pred, seq | seq[0], seq[1], until pred fails | "takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4" | +------------------------------+------------------------------+---------------------------------------------------+---------------------------------------------------------------+ | "tee()" | it, n | it1, it2, … itn splits one iterator into n | | +------------------------------+------------------------------+---------------------------------------------------+---------------------------------------------------------------+ | "zip_longest()" | p, q, … | (p[0], q[0]), (p[1], q[1]), … | "zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D-" | +------------------------------+------------------------------+---------------------------------------------------+---------------------------------------------------------------+ **Combinatoric iterators:** +------------------------------------------------+----------------------+---------------------------------------------------------------+ | Iterator | Arguments | Results | |================================================|======================|===============================================================| | "product()" | p, q, … [repeat=1] | cartesian product, equivalent to a nested for-loop | +------------------------------------------------+----------------------+---------------------------------------------------------------+ | "permutations()" | p[, r] | r-length tuples, all possible orderings, no repeated elements | +------------------------------------------------+----------------------+---------------------------------------------------------------+ | "combinations()" | p, r | r-length tuples, in sorted order, no repeated elements | +------------------------------------------------+----------------------+---------------------------------------------------------------+ | "combinations_with_replacement()" | p, r | r-length tuples, in sorted order, with repeated elements | +------------------------------------------------+----------------------+---------------------------------------------------------------+ +------------------------------------------------+---------------------------------------------------------------+ | Examples | Results | |================================================|===============================================================| | "product('ABCD', repeat=2)" | "AA AB AC AD BA BB BC BD CA CB CC CD DA DB DC DD" | +------------------------------------------------+---------------------------------------------------------------+ | "permutations('ABCD', 2)" | "AB AC AD BA BC BD CA CB CD DA DB DC" | +------------------------------------------------+---------------------------------------------------------------+ | "combinations('ABCD', 2)" | "AB AC AD BC BD CD" | +------------------------------------------------+---------------------------------------------------------------+ | "combinations_with_replacement('ABCD', 2)" | "AA AB AC AD BB BC BD CC CD DD" | +------------------------------------------------+---------------------------------------------------------------+ Itertool functions ================== The following module functions all construct and return iterators. Some provide streams of infinite length, so they should only be accessed by functions or loops that truncate the stream. itertools.accumulate(iterable[, func, *, initial=None]) Make an iterator that returns accumulated sums, or accumulated results of other binary functions (specified via the optional *func* argument). If *func* is supplied, it should be a function of two arguments. Elements of the input *iterable* may be any type that can be accepted as arguments to *func*. (For example, with the default operation of addition, elements may be any addable type including "Decimal" or "Fraction".) Usually, the number of elements output matches the input iterable. However, if the keyword argument *initial* is provided, the accumulation leads off with the *initial* value so that the output has one more element than the input iterable. Roughly equivalent to: def accumulate(iterable, func=operator.add, *, initial=None): 'Return running totals' # accumulate([1,2,3,4,5]) --> 1 3 6 10 15 # accumulate([1,2,3,4,5], initial=100) --> 100 101 103 106 110 115 # accumulate([1,2,3,4,5], operator.mul) --> 1 2 6 24 120 it = iter(iterable) total = initial if initial is None: try: total = next(it) except StopIteration: return yield total for element in it: total = func(total, element) yield total There are a number of uses for the *func* argument. It can be set to "min()" for a running minimum, "max()" for a running maximum, or "operator.mul()" for a running product. Amortization tables can be built by accumulating interest and applying payments. First-order recurrence relations can be modeled by supplying the initial value in the iterable and using only the accumulated total in *func* argument: >>> data = [3, 4, 6, 2, 1, 9, 0, 7, 5, 8] >>> list(accumulate(data, operator.mul)) # running product [3, 12, 72, 144, 144, 1296, 0, 0, 0, 0] >>> list(accumulate(data, max)) # running maximum [3, 4, 6, 6, 6, 9, 9, 9, 9, 9] # Amortize a 5% loan of 1000 with 4 annual payments of 90 >>> cashflows = [1000, -90, -90, -90, -90] >>> list(accumulate(cashflows, lambda bal, pmt: bal*1.05 + pmt)) [1000, 960.0, 918.0, 873.9000000000001, 827.5950000000001] # Chaotic recurrence relation https://en.wikipedia.org/wiki/Logistic_map >>> logistic_map = lambda x, _: r * x * (1 - x) >>> r = 3.8 >>> x0 = 0.4 >>> inputs = repeat(x0, 36) # only the initial value is used >>> [format(x, '.2f') for x in accumulate(inputs, logistic_map)] ['0.40', '0.91', '0.30', '0.81', '0.60', '0.92', '0.29', '0.79', '0.63', '0.88', '0.39', '0.90', '0.33', '0.84', '0.52', '0.95', '0.18', '0.57', '0.93', '0.25', '0.71', '0.79', '0.63', '0.88', '0.39', '0.91', '0.32', '0.83', '0.54', '0.95', '0.20', '0.60', '0.91', '0.30', '0.80', '0.60'] See "functools.reduce()" for a similar function that returns only the final accumulated value. New in version 3.2. Changed in version 3.3: Added the optional *func* parameter. Changed in version 3.8: Added the optional *initial* parameter. itertools.chain(*iterables) Make an iterator that returns elements from the first iterable until it is exhausted, then proceeds to the next iterable, until all of the iterables are exhausted. Used for treating consecutive sequences as a single sequence. Roughly equivalent to: def chain(*iterables): # chain('ABC', 'DEF') --> A B C D E F for it in iterables: for element in it: yield element classmethod chain.from_iterable(iterable) Alternate constructor for "chain()". Gets chained inputs from a single iterable argument that is evaluated lazily. Roughly equivalent to: def from_iterable(iterables): # chain.from_iterable(['ABC', 'DEF']) --> A B C D E F for it in iterables: for element in it: yield element itertools.combinations(iterable, r) Return *r* length subsequences of elements from the input *iterable*. The combination tuples are emitted in lexicographic ordering according to the order of the input *iterable*. So, if the input *iterable* is sorted, the combination tuples will be produced in sorted order. Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each combination. Roughly equivalent to: def combinations(iterable, r): # combinations('ABCD', 2) --> AB AC AD BC BD CD # combinations(range(4), 3) --> 012 013 023 123 pool = tuple(iterable) n = len(pool) if r > n: return indices = list(range(r)) yield tuple(pool[i] for i in indices) while True: for i in reversed(range(r)): if indices[i] != i + n - r: break else: return indices[i] += 1 for j in range(i+1, r): indices[j] = indices[j-1] + 1 yield tuple(pool[i] for i in indices) The code for "combinations()" can be also expressed as a subsequence of "permutations()" after filtering entries where the elements are not in sorted order (according to their position in the input pool): def combinations(iterable, r): pool = tuple(iterable) n = len(pool) for indices in permutations(range(n), r): if sorted(indices) == list(indices): yield tuple(pool[i] for i in indices) The number of items returned is "n! / r! / (n-r)!" when "0 <= r <= n" or zero when "r > n". itertools.combinations_with_replacement(iterable, r) Return *r* length subsequences of elements from the input *iterable* allowing individual elements to be repeated more than once. The combination tuples are emitted in lexicographic ordering according to the order of the input *iterable*. So, if the input *iterable* is sorted, the combination tuples will be produced in sorted order. Elements are treated as unique based on their position, not on their value. So if the input elements are unique, the generated combinations will also be unique. Roughly equivalent to: def combinations_with_replacement(iterable, r): # combinations_with_replacement('ABC', 2) --> AA AB AC BB BC CC pool = tuple(iterable) n = len(pool) if not n and r: return indices = [0] * r yield tuple(pool[i] for i in indices) while True: for i in reversed(range(r)): if indices[i] != n - 1: break else: return indices[i:] = [indices[i] + 1] * (r - i) yield tuple(pool[i] for i in indices) The code for "combinations_with_replacement()" can be also expressed as a subsequence of "product()" after filtering entries where the elements are not in sorted order (according to their position in the input pool): def combinations_with_replacement(iterable, r): pool = tuple(iterable) n = len(pool) for indices in product(range(n), repeat=r): if sorted(indices) == list(indices): yield tuple(pool[i] for i in indices) The number of items returned is "(n+r-1)! / r! / (n-1)!" when "n > 0". New in version 3.1. itertools.compress(data, selectors) Make an iterator that filters elements from *data* returning only those that have a corresponding element in *selectors* that evaluates to "True". Stops when either the *data* or *selectors* iterables has been exhausted. Roughly equivalent to: def compress(data, selectors): # compress('ABCDEF', [1,0,1,0,1,1]) --> A C E F return (d for d, s in zip(data, selectors) if s) New in version 3.1. itertools.count(start=0, step=1) Make an iterator that returns evenly spaced values starting with number *start*. Often used as an argument to "map()" to generate consecutive data points. Also, used with "zip()" to add sequence numbers. Roughly equivalent to: def count(start=0, step=1): # count(10) --> 10 11 12 13 14 ... # count(2.5, 0.5) -> 2.5 3.0 3.5 ... n = start while True: yield n n += step When counting with floating point numbers, better accuracy can sometimes be achieved by substituting multiplicative code such as: "(start + step * i for i in count())". Changed in version 3.1: Added *step* argument and allowed non- integer arguments. itertools.cycle(iterable) Make an iterator returning elements from the iterable and saving a copy of each. When the iterable is exhausted, return elements from the saved copy. Repeats indefinitely. Roughly equivalent to: def cycle(iterable): # cycle('ABCD') --> A B C D A B C D A B C D ... saved = [] for element in iterable: yield element saved.append(element) while saved: for element in saved: yield element Note, this member of the toolkit may require significant auxiliary storage (depending on the length of the iterable). itertools.dropwhile(predicate, iterable) Make an iterator that drops elements from the iterable as long as the predicate is true; afterwards, returns every element. Note, the iterator does not produce *any* output until the predicate first becomes false, so it may have a lengthy start-up time. Roughly equivalent to: def dropwhile(predicate, iterable): # dropwhile(lambda x: x<5, [1,4,6,4,1]) --> 6 4 1 iterable = iter(iterable) for x in iterable: if not predicate(x): yield x break for x in iterable: yield x itertools.filterfalse(predicate, iterable) Make an iterator that filters elements from iterable returning only those for which the predicate is "False". If *predicate* is "None", return the items that are false. Roughly equivalent to: def filterfalse(predicate, iterable): # filterfalse(lambda x: x%2, range(10)) --> 0 2 4 6 8 if predicate is None: predicate = bool for x in iterable: if not predicate(x): yield x itertools.groupby(iterable, key=None) Make an iterator that returns consecutive keys and groups from the *iterable*. The *key* is a function computing a key value for each element. If not specified or is "None", *key* defaults to an identity function and returns the element unchanged. Generally, the iterable needs to already be sorted on the same key function. The operation of "groupby()" is similar to the "uniq" filter in Unix. It generates a break or new group every time the value of the key function changes (which is why it is usually necessary to have sorted the data using the same key function). That behavior differs from SQL’s GROUP BY which aggregates common elements regardless of their input order. The returned group is itself an iterator that shares the underlying iterable with "groupby()". Because the source is shared, when the "groupby()" object is advanced, the previous group is no longer visible. So, if that data is needed later, it should be stored as a list: groups = [] uniquekeys = [] data = sorted(data, key=keyfunc) for k, g in groupby(data, keyfunc): groups.append(list(g)) # Store group iterator as a list uniquekeys.append(k) "groupby()" is roughly equivalent to: class groupby: # [k for k, g in groupby('AAAABBBCCDAABBB')] --> A B C D A B # [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D def __init__(self, iterable, key=None): if key is None: key = lambda x: x self.keyfunc = key self.it = iter(iterable) self.tgtkey = self.currkey = self.currvalue = object() def __iter__(self): return self def __next__(self): self.id = object() while self.currkey == self.tgtkey: self.currvalue = next(self.it) # Exit on StopIteration self.currkey = self.keyfunc(self.currvalue) self.tgtkey = self.currkey return (self.currkey, self._grouper(self.tgtkey, self.id)) def _grouper(self, tgtkey, id): while self.id is id and self.currkey == tgtkey: yield self.currvalue try: self.currvalue = next(self.it) except StopIteration: return self.currkey = self.keyfunc(self.currvalue) itertools.islice(iterable, stop) itertools.islice(iterable, start, stop[, step]) Make an iterator that returns selected elements from the iterable. If *start* is non-zero, then elements from the iterable are skipped until start is reached. Afterward, elements are returned consecutively unless *step* is set higher than one which results in items being skipped. If *stop* is "None", then iteration continues until the iterator is exhausted, if at all; otherwise, it stops at the specified position. Unlike regular slicing, "islice()" does not support negative values for *start*, *stop*, or *step*. Can be used to extract related fields from data where the internal structure has been flattened (for example, a multi-line report may list a name field on every third line). Roughly equivalent to: def islice(iterable, *args): # islice('ABCDEFG', 2) --> A B # islice('ABCDEFG', 2, 4) --> C D # islice('ABCDEFG', 2, None) --> C D E F G # islice('ABCDEFG', 0, None, 2) --> A C E G s = slice(*args) start, stop, step = s.start or 0, s.stop or sys.maxsize, s.step or 1 it = iter(range(start, stop, step)) try: nexti = next(it) except StopIteration: # Consume *iterable* up to the *start* position. for i, element in zip(range(start), iterable): pass return try: for i, element in enumerate(iterable): if i == nexti: yield element nexti = next(it) except StopIteration: # Consume to *stop*. for i, element in zip(range(i + 1, stop), iterable): pass If *start* is "None", then iteration starts at zero. If *step* is "None", then the step defaults to one. itertools.permutations(iterable, r=None) Return successive *r* length permutations of elements in the *iterable*. If *r* is not specified or is "None", then *r* defaults to the length of the *iterable* and all possible full-length permutations are generated. The permutation tuples are emitted in lexicographic ordering according to the order of the input *iterable*. So, if the input *iterable* is sorted, the combination tuples will be produced in sorted order. Elements are treated as unique based on their position, not on their value. So if the input elements are unique, there will be no repeat values in each permutation. Roughly equivalent to: def permutations(iterable, r=None): # permutations('ABCD', 2) --> AB AC AD BA BC BD CA CB CD DA DB DC # permutations(range(3)) --> 012 021 102 120 201 210 pool = tuple(iterable) n = len(pool) r = n if r is None else r if r > n: return indices = list(range(n)) cycles = list(range(n, n-r, -1)) yield tuple(pool[i] for i in indices[:r]) while n: for i in reversed(range(r)): cycles[i] -= 1 if cycles[i] == 0: indices[i:] = indices[i+1:] + indices[i:i+1] cycles[i] = n - i else: j = cycles[i] indices[i], indices[-j] = indices[-j], indices[i] yield tuple(pool[i] for i in indices[:r]) break else: return The code for "permutations()" can be also expressed as a subsequence of "product()", filtered to exclude entries with repeated elements (those from the same position in the input pool): def permutations(iterable, r=None): pool = tuple(iterable) n = len(pool) r = n if r is None else r for indices in product(range(n), repeat=r): if len(set(indices)) == r: yield tuple(pool[i] for i in indices) The number of items returned is "n! / (n-r)!" when "0 <= r <= n" or zero when "r > n". itertools.product(*iterables, repeat=1) Cartesian product of input iterables. Roughly equivalent to nested for-loops in a generator expression. For example, "product(A, B)" returns the same as "((x,y) for x in A for y in B)". The nested loops cycle like an odometer with the rightmost element advancing on every iteration. This pattern creates a lexicographic ordering so that if the input’s iterables are sorted, the product tuples are emitted in sorted order. To compute the product of an iterable with itself, specify the number of repetitions with the optional *repeat* keyword argument. For example, "product(A, repeat=4)" means the same as "product(A, A, A, A)". This function is roughly equivalent to the following code, except that the actual implementation does not build up intermediate results in memory: def product(*args, repeat=1): # product('ABCD', 'xy') --> Ax Ay Bx By Cx Cy Dx Dy # product(range(2), repeat=3) --> 000 001 010 011 100 101 110 111 pools = [tuple(pool) for pool in args] * repeat result = [[]] for pool in pools: result = [x+[y] for x in result for y in pool] for prod in result: yield tuple(prod) Before "product()" runs, it completely consumes the input iterables, keeping pools of values in memory to generate the products. Accordingly, it is only useful with finite inputs. itertools.repeat(object[, times]) Make an iterator that returns *object* over and over again. Runs indefinitely unless the *times* argument is specified. Used as argument to "map()" for invariant parameters to the called function. Also used with "zip()" to create an invariant part of a tuple record. Roughly equivalent to: def repeat(object, times=None): # repeat(10, 3) --> 10 10 10 if times is None: while True: yield object else: for i in range(times): yield object A common use for *repeat* is to supply a stream of constant values to *map* or *zip*: >>> list(map(pow, range(10), repeat(2))) [0, 1, 4, 9, 16, 25, 36, 49, 64, 81] itertools.starmap(function, iterable) Make an iterator that computes the function using arguments obtained from the iterable. Used instead of "map()" when argument parameters are already grouped in tuples from a single iterable (the data has been “pre-zipped”). The difference between "map()" and "starmap()" parallels the distinction between "function(a,b)" and "function(*c)". Roughly equivalent to: def starmap(function, iterable): # starmap(pow, [(2,5), (3,2), (10,3)]) --> 32 9 1000 for args in iterable: yield function(*args) itertools.takewhile(predicate, iterable) Make an iterator that returns elements from the iterable as long as the predicate is true. Roughly equivalent to: def takewhile(predicate, iterable): # takewhile(lambda x: x<5, [1,4,6,4,1]) --> 1 4 for x in iterable: if predicate(x): yield x else: break itertools.tee(iterable, n=2) Return *n* independent iterators from a single iterable. The following Python code helps explain what *tee* does (although the actual implementation is more complex and uses only a single underlying FIFO (first-in, first-out) queue). Roughly equivalent to: def tee(iterable, n=2): it = iter(iterable) deques = [collections.deque() for i in range(n)] def gen(mydeque): while True: if not mydeque: # when the local deque is empty try: newval = next(it) # fetch a new value and except StopIteration: return for d in deques: # load it to all the deques d.append(newval) yield mydeque.popleft() return tuple(gen(d) for d in deques) Once "tee()" has made a split, the original *iterable* should not be used anywhere else; otherwise, the *iterable* could get advanced without the tee objects being informed. "tee" iterators are not threadsafe. A "RuntimeError" may be raised when using simultaneously iterators returned by the same "tee()" call, even if the original *iterable* is threadsafe. This itertool may require significant auxiliary storage (depending on how much temporary data needs to be stored). In general, if one iterator uses most or all of the data before another iterator starts, it is faster to use "list()" instead of "tee()". itertools.zip_longest(*iterables, fillvalue=None) Make an iterator that aggregates elements from each of the iterables. If the iterables are of uneven length, missing values are filled-in with *fillvalue*. Iteration continues until the longest iterable is exhausted. Roughly equivalent to: def zip_longest(*args, fillvalue=None): # zip_longest('ABCD', 'xy', fillvalue='-') --> Ax By C- D- iterators = [iter(it) for it in args] num_active = len(iterators) if not num_active: return while True: values = [] for i, it in enumerate(iterators): try: value = next(it) except StopIteration: num_active -= 1 if not num_active: return iterators[i] = repeat(fillvalue) value = fillvalue values.append(value) yield tuple(values) If one of the iterables is potentially infinite, then the "zip_longest()" function should be wrapped with something that limits the number of calls (for example "islice()" or "takewhile()"). If not specified, *fillvalue* defaults to "None". Itertools Recipes ================= This section shows recipes for creating an extended toolset using the existing itertools as building blocks. Substantially all of these recipes and many, many others can be installed from the more-itertools project found on the Python Package Index: pip install more-itertools The extended tools offer the same high performance as the underlying toolset. The superior memory performance is kept by processing elements one at a time rather than bringing the whole iterable into memory all at once. Code volume is kept small by linking the tools together in a functional style which helps eliminate temporary variables. High speed is retained by preferring “vectorized” building blocks over the use of for-loops and *generator*s which incur interpreter overhead. def take(n, iterable): "Return first n items of the iterable as a list" return list(islice(iterable, n)) def prepend(value, iterator): "Prepend a single value in front of an iterator" # prepend(1, [2, 3, 4]) -> 1 2 3 4 return chain([value], iterator) def tabulate(function, start=0): "Return function(0), function(1), ..." return map(function, count(start)) def tail(n, iterable): "Return an iterator over the last n items" # tail(3, 'ABCDEFG') --> E F G return iter(collections.deque(iterable, maxlen=n)) def consume(iterator, n=None): "Advance the iterator n-steps ahead. If n is None, consume entirely." # Use functions that consume iterators at C speed. if n is None: # feed the entire iterator into a zero-length deque collections.deque(iterator, maxlen=0) else: # advance to the empty slice starting at position n next(islice(iterator, n, n), None) def nth(iterable, n, default=None): "Returns the nth item or a default value" return next(islice(iterable, n, None), default) def all_equal(iterable): "Returns True if all the elements are equal to each other" g = groupby(iterable) return next(g, True) and not next(g, False) def quantify(iterable, pred=bool): "Count how many times the predicate is true" return sum(map(pred, iterable)) def pad_none(iterable): """Returns the sequence elements and then returns None indefinitely. Useful for emulating the behavior of the built-in map() function. """ return chain(iterable, repeat(None)) def ncycles(iterable, n): "Returns the sequence elements n times" return chain.from_iterable(repeat(tuple(iterable), n)) def dotproduct(vec1, vec2): return sum(map(operator.mul, vec1, vec2)) def convolve(signal, kernel): # See: https://betterexplained.com/articles/intuitive-convolution/ # convolve(data, [0.25, 0.25, 0.25, 0.25]) --> Moving average (blur) # convolve(data, [1, -1]) --> 1st finite difference (1st derivative) # convolve(data, [1, -2, 1]) --> 2nd finite difference (2nd derivative) kernel = tuple(kernel)[::-1] n = len(kernel) window = collections.deque([0], maxlen=n) * n for x in chain(signal, repeat(0, n-1)): window.append(x) yield sum(map(operator.mul, kernel, window)) def flatten(list_of_lists): "Flatten one level of nesting" return chain.from_iterable(list_of_lists) def repeatfunc(func, times=None, *args): """Repeat calls to func with specified arguments. Example: repeatfunc(random.random) """ if times is None: return starmap(func, repeat(args)) return starmap(func, repeat(args, times)) def pairwise(iterable): "s -> (s0,s1), (s1,s2), (s2, s3), ..." a, b = tee(iterable) next(b, None) return zip(a, b) def grouper(iterable, n, fillvalue=None): "Collect data into fixed-length chunks or blocks" # grouper('ABCDEFG', 3, 'x') --> ABC DEF Gxx" args = [iter(iterable)] * n return zip_longest(*args, fillvalue=fillvalue) def roundrobin(*iterables): "roundrobin('ABC', 'D', 'EF') --> A D E B F C" # Recipe credited to George Sakkis num_active = len(iterables) nexts = cycle(iter(it).__next__ for it in iterables) while num_active: try: for next in nexts: yield next() except StopIteration: # Remove the iterator we just exhausted from the cycle. num_active -= 1 nexts = cycle(islice(nexts, num_active)) def partition(pred, iterable): "Use a predicate to partition entries into false entries and true entries" # partition(is_odd, range(10)) --> 0 2 4 6 8 and 1 3 5 7 9 t1, t2 = tee(iterable) return filterfalse(pred, t1), filter(pred, t2) def powerset(iterable): "powerset([1,2,3]) --> () (1,) (2,) (3,) (1,2) (1,3) (2,3) (1,2,3)" s = list(iterable) return chain.from_iterable(combinations(s, r) for r in range(len(s)+1)) def unique_everseen(iterable, key=None): "List unique elements, preserving order. Remember all elements ever seen." # unique_everseen('AAAABBBCCDAABBB') --> A B C D # unique_everseen('ABBCcAD', str.lower) --> A B C D seen = set() seen_add = seen.add if key is None: for element in filterfalse(seen.__contains__, iterable): seen_add(element) yield element else: for element in iterable: k = key(element) if k not in seen: seen_add(k) yield element def unique_justseen(iterable, key=None): "List unique elements, preserving order. Remember only the element just seen." # unique_justseen('AAAABBBCCDAABBB') --> A B C D A B # unique_justseen('ABBCcAD', str.lower) --> A B C A D return map(next, map(operator.itemgetter(1), groupby(iterable, key))) def iter_except(func, exception, first=None): """ Call a function repeatedly until an exception is raised. Converts a call-until-exception interface to an iterator interface. Like builtins.iter(func, sentinel) but uses an exception instead of a sentinel to end the loop. Examples: iter_except(functools.partial(heappop, h), IndexError) # priority queue iterator iter_except(d.popitem, KeyError) # non-blocking dict iterator iter_except(d.popleft, IndexError) # non-blocking deque iterator iter_except(q.get_nowait, Queue.Empty) # loop over a producer Queue iter_except(s.pop, KeyError) # non-blocking set iterator """ try: if first is not None: yield first() # For database APIs needing an initial cast to db.first() while True: yield func() except exception: pass def first_true(iterable, default=False, pred=None): """Returns the first true value in the iterable. If no true value is found, returns *default* If *pred* is not None, returns the first item for which pred(item) is true. """ # first_true([a,b,c], x) --> a or b or c or x # first_true([a,b], x, f) --> a if f(a) else b if f(b) else x return next(filter(pred, iterable), default) def random_product(*args, repeat=1): "Random selection from itertools.product(*args, **kwds)" pools = [tuple(pool) for pool in args] * repeat return tuple(map(random.choice, pools)) def random_permutation(iterable, r=None): "Random selection from itertools.permutations(iterable, r)" pool = tuple(iterable) r = len(pool) if r is None else r return tuple(random.sample(pool, r)) def random_combination(iterable, r): "Random selection from itertools.combinations(iterable, r)" pool = tuple(iterable) n = len(pool) indices = sorted(random.sample(range(n), r)) return tuple(pool[i] for i in indices) def random_combination_with_replacement(iterable, r): "Random selection from itertools.combinations_with_replacement(iterable, r)" pool = tuple(iterable) n = len(pool) indices = sorted(random.choices(range(n), k=r)) return tuple(pool[i] for i in indices) def nth_combination(iterable, r, index): "Equivalent to list(combinations(iterable, r))[index]" pool = tuple(iterable) n = len(pool) if r < 0 or r > n: raise ValueError c = 1 k = min(r, n-r) for i in range(1, k+1): c = c * (n - k + i) // i if index < 0: index += c if index < 0 or index >= c: raise IndexError result = [] while r: c, n, r = c*r//n, n-1, r-1 while index >= c: index -= c c, n = c*(n-r)//n, n-1 result.append(pool[-1-n]) return tuple(result) "json" — JSON encoder and decoder ********************************* **Source code:** Lib/json/__init__.py ====================================================================== JSON (JavaScript Object Notation), specified by **RFC 7159** (which obsoletes **RFC 4627**) and by ECMA-404, is a lightweight data interchange format inspired by JavaScript object literal syntax (although it is not a strict subset of JavaScript [1] ). Warning: Be cautious when parsing JSON data from untrusted sources. A malicious JSON string may cause the decoder to consume considerable CPU and memory resources. Limiting the size of data to be parsed is recommended. "json" exposes an API familiar to users of the standard library "marshal" and "pickle" modules. Encoding basic Python object hierarchies: >>> import json >>> json.dumps(['foo', {'bar': ('baz', None, 1.0, 2)}]) '["foo", {"bar": ["baz", null, 1.0, 2]}]' >>> print(json.dumps("\"foo\bar")) "\"foo\bar" >>> print(json.dumps('\u1234')) "\u1234" >>> print(json.dumps('\\')) "\\" >>> print(json.dumps({"c": 0, "b": 0, "a": 0}, sort_keys=True)) {"a": 0, "b": 0, "c": 0} >>> from io import StringIO >>> io = StringIO() >>> json.dump(['streaming API'], io) >>> io.getvalue() '["streaming API"]' Compact encoding: >>> import json >>> json.dumps([1, 2, 3, {'4': 5, '6': 7}], separators=(',', ':')) '[1,2,3,{"4":5,"6":7}]' Pretty printing: >>> import json >>> print(json.dumps({'4': 5, '6': 7}, sort_keys=True, indent=4)) { "4": 5, "6": 7 } Decoding JSON: >>> import json >>> json.loads('["foo", {"bar":["baz", null, 1.0, 2]}]') ['foo', {'bar': ['baz', None, 1.0, 2]}] >>> json.loads('"\\"foo\\bar"') '"foo\x08ar' >>> from io import StringIO >>> io = StringIO('["streaming API"]') >>> json.load(io) ['streaming API'] Specializing JSON object decoding: >>> import json >>> def as_complex(dct): ... if '__complex__' in dct: ... return complex(dct['real'], dct['imag']) ... return dct ... >>> json.loads('{"__complex__": true, "real": 1, "imag": 2}', ... object_hook=as_complex) (1+2j) >>> import decimal >>> json.loads('1.1', parse_float=decimal.Decimal) Decimal('1.1') Extending "JSONEncoder": >>> import json >>> class ComplexEncoder(json.JSONEncoder): ... def default(self, obj): ... if isinstance(obj, complex): ... return [obj.real, obj.imag] ... # Let the base class default method raise the TypeError ... return json.JSONEncoder.default(self, obj) ... >>> json.dumps(2 + 1j, cls=ComplexEncoder) '[2.0, 1.0]' >>> ComplexEncoder().encode(2 + 1j) '[2.0, 1.0]' >>> list(ComplexEncoder().iterencode(2 + 1j)) ['[2.0', ', 1.0', ']'] Using "json.tool" from the shell to validate and pretty-print: $ echo '{"json":"obj"}' | python -m json.tool { "json": "obj" } $ echo '{1.2:3.4}' | python -m json.tool Expecting property name enclosed in double quotes: line 1 column 2 (char 1) See Command Line Interface for detailed documentation. Note: JSON is a subset of YAML 1.2. The JSON produced by this module’s default settings (in particular, the default *separators* value) is also a subset of YAML 1.0 and 1.1. This module can thus also be used as a YAML serializer. Note: This module’s encoders and decoders preserve input and output order by default. Order is only lost if the underlying containers are unordered. Basic Usage =========== json.dump(obj, fp, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw) Serialize *obj* as a JSON formatted stream to *fp* (a ".write()"-supporting *file-like object*) using this conversion table. If *skipkeys* is true (default: "False"), then dict keys that are not of a basic type ("str", "int", "float", "bool", "None") will be skipped instead of raising a "TypeError". The "json" module always produces "str" objects, not "bytes" objects. Therefore, "fp.write()" must support "str" input. If *ensure_ascii* is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If *ensure_ascii* is false, these characters will be output as-is. If *check_circular* is false (default: "True"), then the circular reference check for container types will be skipped and a circular reference will result in an "RecursionError" (or worse). If *allow_nan* is false (default: "True"), then it will be a "ValueError" to serialize out of range "float" values ("nan", "inf", "-inf") in strict compliance of the JSON specification. If *allow_nan* is true, their JavaScript equivalents ("NaN", "Infinity", "-Infinity") will be used. If *indent* is a non-negative integer or string, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0, negative, or """" will only insert newlines. "None" (the default) selects the most compact representation. Using a positive integer indent indents that many spaces per level. If *indent* is a string (such as ""\t""), that string is used to indent each level. Changed in version 3.2: Allow strings for *indent* in addition to integers. If specified, *separators* should be an "(item_separator, key_separator)" tuple. The default is "(', ', ': ')" if *indent* is "None" and "(',', ': ')" otherwise. To get the most compact JSON representation, you should specify "(',', ':')" to eliminate whitespace. Changed in version 3.4: Use "(',', ': ')" as default if *indent* is not "None". If specified, *default* should be a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a "TypeError". If not specified, "TypeError" is raised. If *sort_keys* is true (default: "False"), then the output of dictionaries will be sorted by key. To use a custom "JSONEncoder" subclass (e.g. one that overrides the "default()" method to serialize additional types), specify it with the *cls* kwarg; otherwise "JSONEncoder" is used. Changed in version 3.6: All optional parameters are now keyword- only. Note: Unlike "pickle" and "marshal", JSON is not a framed protocol, so trying to serialize multiple objects with repeated calls to "dump()" using the same *fp* will result in an invalid JSON file. json.dumps(obj, *, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, cls=None, indent=None, separators=None, default=None, sort_keys=False, **kw) Serialize *obj* to a JSON formatted "str" using this conversion table. The arguments have the same meaning as in "dump()". Note: Keys in key/value pairs of JSON are always of the type "str". When a dictionary is converted into JSON, all the keys of the dictionary are coerced to strings. As a result of this, if a dictionary is converted into JSON and then back into a dictionary, the dictionary may not equal the original one. That is, "loads(dumps(x)) != x" if x has non-string keys. json.load(fp, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw) Deserialize *fp* (a ".read()"-supporting *text file* or *binary file* containing a JSON document) to a Python object using this conversion table. *object_hook* is an optional function that will be called with the result of any object literal decoded (a "dict"). The return value of *object_hook* will be used instead of the "dict". This feature can be used to implement custom decoders (e.g. JSON-RPC class hinting). *object_pairs_hook* is an optional function that will be called with the result of any object literal decoded with an ordered list of pairs. The return value of *object_pairs_hook* will be used instead of the "dict". This feature can be used to implement custom decoders. If *object_hook* is also defined, the *object_pairs_hook* takes priority. Changed in version 3.1: Added support for *object_pairs_hook*. *parse_float*, if specified, will be called with the string of every JSON float to be decoded. By default, this is equivalent to "float(num_str)". This can be used to use another datatype or parser for JSON floats (e.g. "decimal.Decimal"). *parse_int*, if specified, will be called with the string of every JSON int to be decoded. By default, this is equivalent to "int(num_str)". This can be used to use another datatype or parser for JSON integers (e.g. "float"). Changed in version 3.9.14: The default *parse_int* of "int()" now limits the maximum length of the integer string via the interpreter’s integer string conversion length limitation to help avoid denial of service attacks. *parse_constant*, if specified, will be called with one of the following strings: "'-Infinity'", "'Infinity'", "'NaN'". This can be used to raise an exception if invalid JSON numbers are encountered. Changed in version 3.1: *parse_constant* doesn’t get called on ‘null’, ‘true’, ‘false’ anymore. To use a custom "JSONDecoder" subclass, specify it with the "cls" kwarg; otherwise "JSONDecoder" is used. Additional keyword arguments will be passed to the constructor of the class. If the data being deserialized is not a valid JSON document, a "JSONDecodeError" will be raised. Changed in version 3.6: All optional parameters are now keyword- only. Changed in version 3.6: *fp* can now be a *binary file*. The input encoding should be UTF-8, UTF-16 or UTF-32. json.loads(s, *, cls=None, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, object_pairs_hook=None, **kw) Deserialize *s* (a "str", "bytes" or "bytearray" instance containing a JSON document) to a Python object using this conversion table. The other arguments have the same meaning as in "load()". If the data being deserialized is not a valid JSON document, a "JSONDecodeError" will be raised. Changed in version 3.6: *s* can now be of type "bytes" or "bytearray". The input encoding should be UTF-8, UTF-16 or UTF-32. Changed in version 3.9: The keyword argument *encoding* has been removed. Encoders and Decoders ===================== class json.JSONDecoder(*, object_hook=None, parse_float=None, parse_int=None, parse_constant=None, strict=True, object_pairs_hook=None) Simple JSON decoder. Performs the following translations in decoding by default: +-----------------+---------------------+ | JSON | Python | |=================|=====================| | object | dict | +-----------------+---------------------+ | array | list | +-----------------+---------------------+ | string | str | +-----------------+---------------------+ | number (int) | int | +-----------------+---------------------+ | number (real) | float | +-----------------+---------------------+ | true | True | +-----------------+---------------------+ | false | False | +-----------------+---------------------+ | null | None | +-----------------+---------------------+ It also understands "NaN", "Infinity", and "-Infinity" as their corresponding "float" values, which is outside the JSON spec. *object_hook*, if specified, will be called with the result of every JSON object decoded and its return value will be used in place of the given "dict". This can be used to provide custom deserializations (e.g. to support JSON-RPC class hinting). *object_pairs_hook*, if specified will be called with the result of every JSON object decoded with an ordered list of pairs. The return value of *object_pairs_hook* will be used instead of the "dict". This feature can be used to implement custom decoders. If *object_hook* is also defined, the *object_pairs_hook* takes priority. Changed in version 3.1: Added support for *object_pairs_hook*. *parse_float*, if specified, will be called with the string of every JSON float to be decoded. By default, this is equivalent to "float(num_str)". This can be used to use another datatype or parser for JSON floats (e.g. "decimal.Decimal"). *parse_int*, if specified, will be called with the string of every JSON int to be decoded. By default, this is equivalent to "int(num_str)". This can be used to use another datatype or parser for JSON integers (e.g. "float"). *parse_constant*, if specified, will be called with one of the following strings: "'-Infinity'", "'Infinity'", "'NaN'". This can be used to raise an exception if invalid JSON numbers are encountered. If *strict* is false ("True" is the default), then control characters will be allowed inside strings. Control characters in this context are those with character codes in the 0–31 range, including "'\t'" (tab), "'\n'", "'\r'" and "'\0'". If the data being deserialized is not a valid JSON document, a "JSONDecodeError" will be raised. Changed in version 3.6: All parameters are now keyword-only. decode(s) Return the Python representation of *s* (a "str" instance containing a JSON document). "JSONDecodeError" will be raised if the given JSON document is not valid. raw_decode(s) Decode a JSON document from *s* (a "str" beginning with a JSON document) and return a 2-tuple of the Python representation and the index in *s* where the document ended. This can be used to decode a JSON document from a string that may have extraneous data at the end. class json.JSONEncoder(*, skipkeys=False, ensure_ascii=True, check_circular=True, allow_nan=True, sort_keys=False, indent=None, separators=None, default=None) Extensible JSON encoder for Python data structures. Supports the following objects and types by default: +------------------------------------------+-----------------+ | Python | JSON | |==========================================|=================| | dict | object | +------------------------------------------+-----------------+ | list, tuple | array | +------------------------------------------+-----------------+ | str | string | +------------------------------------------+-----------------+ | int, float, int- & float-derived Enums | number | +------------------------------------------+-----------------+ | True | true | +------------------------------------------+-----------------+ | False | false | +------------------------------------------+-----------------+ | None | null | +------------------------------------------+-----------------+ Changed in version 3.4: Added support for int- and float-derived Enum classes. To extend this to recognize other objects, subclass and implement a "default()" method with another method that returns a serializable object for "o" if possible, otherwise it should call the superclass implementation (to raise "TypeError"). If *skipkeys* is false (the default), a "TypeError" will be raised when trying to encode keys that are not "str", "int", "float" or "None". If *skipkeys* is true, such items are simply skipped. If *ensure_ascii* is true (the default), the output is guaranteed to have all incoming non-ASCII characters escaped. If *ensure_ascii* is false, these characters will be output as-is. If *check_circular* is true (the default), then lists, dicts, and custom encoded objects will be checked for circular references during encoding to prevent an infinite recursion (which would cause an "RecursionError"). Otherwise, no such check takes place. If *allow_nan* is true (the default), then "NaN", "Infinity", and "-Infinity" will be encoded as such. This behavior is not JSON specification compliant, but is consistent with most JavaScript based encoders and decoders. Otherwise, it will be a "ValueError" to encode such floats. If *sort_keys* is true (default: "False"), then the output of dictionaries will be sorted by key; this is useful for regression tests to ensure that JSON serializations can be compared on a day- to-day basis. If *indent* is a non-negative integer or string, then JSON array elements and object members will be pretty-printed with that indent level. An indent level of 0, negative, or """" will only insert newlines. "None" (the default) selects the most compact representation. Using a positive integer indent indents that many spaces per level. If *indent* is a string (such as ""\t""), that string is used to indent each level. Changed in version 3.2: Allow strings for *indent* in addition to integers. If specified, *separators* should be an "(item_separator, key_separator)" tuple. The default is "(', ', ': ')" if *indent* is "None" and "(',', ': ')" otherwise. To get the most compact JSON representation, you should specify "(',', ':')" to eliminate whitespace. Changed in version 3.4: Use "(',', ': ')" as default if *indent* is not "None". If specified, *default* should be a function that gets called for objects that can’t otherwise be serialized. It should return a JSON encodable version of the object or raise a "TypeError". If not specified, "TypeError" is raised. Changed in version 3.6: All parameters are now keyword-only. default(o) Implement this method in a subclass such that it returns a serializable object for *o*, or calls the base implementation (to raise a "TypeError"). For example, to support arbitrary iterators, you could implement "default()" like this: def default(self, o): try: iterable = iter(o) except TypeError: pass else: return list(iterable) # Let the base class default method raise the TypeError return json.JSONEncoder.default(self, o) encode(o) Return a JSON string representation of a Python data structure, *o*. For example: >>> json.JSONEncoder().encode({"foo": ["bar", "baz"]}) '{"foo": ["bar", "baz"]}' iterencode(o) Encode the given object, *o*, and yield each string representation as available. For example: for chunk in json.JSONEncoder().iterencode(bigobject): mysocket.write(chunk) Exceptions ========== exception json.JSONDecodeError(msg, doc, pos) Subclass of "ValueError" with the following additional attributes: msg The unformatted error message. doc The JSON document being parsed. pos The start index of *doc* where parsing failed. lineno The line corresponding to *pos*. colno The column corresponding to *pos*. New in version 3.5. Standard Compliance and Interoperability ======================================== The JSON format is specified by **RFC 7159** and by ECMA-404. This section details this module’s level of compliance with the RFC. For simplicity, "JSONEncoder" and "JSONDecoder" subclasses, and parameters other than those explicitly mentioned, are not considered. This module does not comply with the RFC in a strict fashion, implementing some extensions that are valid JavaScript but not valid JSON. In particular: * Infinite and NaN number values are accepted and output; * Repeated names within an object are accepted, and only the value of the last name-value pair is used. Since the RFC permits RFC-compliant parsers to accept input texts that are not RFC-compliant, this module’s deserializer is technically RFC- compliant under default settings. Character Encodings ------------------- The RFC requires that JSON be represented using either UTF-8, UTF-16, or UTF-32, with UTF-8 being the recommended default for maximum interoperability. As permitted, though not required, by the RFC, this module’s serializer sets *ensure_ascii=True* by default, thus escaping the output so that the resulting strings only contain ASCII characters. Other than the *ensure_ascii* parameter, this module is defined strictly in terms of conversion between Python objects and "Unicode strings", and thus does not otherwise directly address the issue of character encodings. The RFC prohibits adding a byte order mark (BOM) to the start of a JSON text, and this module’s serializer does not add a BOM to its output. The RFC permits, but does not require, JSON deserializers to ignore an initial BOM in their input. This module’s deserializer raises a "ValueError" when an initial BOM is present. The RFC does not explicitly forbid JSON strings which contain byte sequences that don’t correspond to valid Unicode characters (e.g. unpaired UTF-16 surrogates), but it does note that they may cause interoperability problems. By default, this module accepts and outputs (when present in the original "str") code points for such sequences. Infinite and NaN Number Values ------------------------------ The RFC does not permit the representation of infinite or NaN number values. Despite that, by default, this module accepts and outputs "Infinity", "-Infinity", and "NaN" as if they were valid JSON number literal values: >>> # Neither of these calls raises an exception, but the results are not valid JSON >>> json.dumps(float('-inf')) '-Infinity' >>> json.dumps(float('nan')) 'NaN' >>> # Same when deserializing >>> json.loads('-Infinity') -inf >>> json.loads('NaN') nan In the serializer, the *allow_nan* parameter can be used to alter this behavior. In the deserializer, the *parse_constant* parameter can be used to alter this behavior. Repeated Names Within an Object ------------------------------- The RFC specifies that the names within a JSON object should be unique, but does not mandate how repeated names in JSON objects should be handled. By default, this module does not raise an exception; instead, it ignores all but the last name-value pair for a given name: >>> weird_json = '{"x": 1, "x": 2, "x": 3}' >>> json.loads(weird_json) {'x': 3} The *object_pairs_hook* parameter can be used to alter this behavior. Top-level Non-Object, Non-Array Values -------------------------------------- The old version of JSON specified by the obsolete **RFC 4627** required that the top-level value of a JSON text must be either a JSON object or array (Python "dict" or "list"), and could not be a JSON null, boolean, number, or string value. **RFC 7159** removed that restriction, and this module does not and has never implemented that restriction in either its serializer or its deserializer. Regardless, for maximum interoperability, you may wish to voluntarily adhere to the restriction yourself. Implementation Limitations -------------------------- Some JSON deserializer implementations may set limits on: * the size of accepted JSON texts * the maximum level of nesting of JSON objects and arrays * the range and precision of JSON numbers * the content and maximum length of JSON strings This module does not impose any such limits beyond those of the relevant Python datatypes themselves or the Python interpreter itself. When serializing to JSON, beware any such limitations in applications that may consume your JSON. In particular, it is common for JSON numbers to be deserialized into IEEE 754 double precision numbers and thus subject to that representation’s range and precision limitations. This is especially relevant when serializing Python "int" values of extremely large magnitude, or when serializing instances of “exotic” numerical types such as "decimal.Decimal". Command Line Interface ====================== **Source code:** Lib/json/tool.py ====================================================================== The "json.tool" module provides a simple command line interface to validate and pretty-print JSON objects. If the optional "infile" and "outfile" arguments are not specified, "sys.stdin" and "sys.stdout" will be used respectively: $ echo '{"json": "obj"}' | python -m json.tool { "json": "obj" } $ echo '{1.2:3.4}' | python -m json.tool Expecting property name enclosed in double quotes: line 1 column 2 (char 1) Changed in version 3.5: The output is now in the same order as the input. Use the "--sort-keys" option to sort the output of dictionaries alphabetically by key. Command line options -------------------- infile The JSON file to be validated or pretty-printed: $ python -m json.tool mp_films.json [ { "title": "And Now for Something Completely Different", "year": 1971 }, { "title": "Monty Python and the Holy Grail", "year": 1975 } ] If *infile* is not specified, read from "sys.stdin". outfile Write the output of the *infile* to the given *outfile*. Otherwise, write it to "sys.stdout". --sort-keys Sort the output of dictionaries alphabetically by key. New in version 3.5. --no-ensure-ascii Disable escaping of non-ascii characters, see "json.dumps()" for more information. New in version 3.9. --json-lines Parse every input line as separate JSON object. New in version 3.8. --indent, --tab, --no-indent, --compact Mutually exclusive options for whitespace control. New in version 3.9. -h, --help Show the help message. -[ Footnotes ]- [1] As noted in the errata for RFC 7159, JSON permits literal U+2028 (LINE SEPARATOR) and U+2029 (PARAGRAPH SEPARATOR) characters in strings, whereas JavaScript (as of ECMAScript Edition 5.1) does not. "keyword" — Testing for Python keywords *************************************** **Source code:** Lib/keyword.py ====================================================================== This module allows a Python program to determine if a string is a keyword. keyword.iskeyword(s) Return "True" if *s* is a Python keyword. keyword.kwlist Sequence containing all the keywords defined for the interpreter. If any keywords are defined to only be active when particular "__future__" statements are in effect, these will be included as well. keyword.issoftkeyword(s) Return "True" if *s* is a Python soft keyword. New in version 3.9. keyword.softkwlist Sequence containing all the soft keywords defined for the interpreter. If any soft keywords are defined to only be active when particular "__future__" statements are in effect, these will be included as well. New in version 3.9. Python Language Services ************************ Python provides a number of modules to assist in working with the Python language. These modules support tokenizing, parsing, syntax analysis, bytecode disassembly, and various other facilities. These modules include: * "parser" — Access Python parse trees * Creating ST Objects * Converting ST Objects * Queries on ST Objects * Exceptions and Error Handling * ST Objects * Example: Emulation of "compile()" * "ast" — Abstract Syntax Trees * Abstract Grammar * Node classes * Literals * Variables * Expressions * Subscripting * Comprehensions * Statements * Imports * Control flow * Function and class definitions * Async and await * "ast" Helpers * Compiler Flags * Command-Line Usage * "symtable" — Access to the compiler’s symbol tables * Generating Symbol Tables * Examining Symbol Tables * "symbol" — Constants used with Python parse trees * "token" — Constants used with Python parse trees * "keyword" — Testing for Python keywords * "tokenize" — Tokenizer for Python source * Tokenizing Input * Command-Line Usage * Examples * "tabnanny" — Detection of ambiguous indentation * "pyclbr" — Python module browser support * Function Objects * Class Objects * "py_compile" — Compile Python source files * "compileall" — Byte-compile Python libraries * Command-line use * Public functions * "dis" — Disassembler for Python bytecode * Bytecode analysis * Analysis functions * Python Bytecode Instructions * Opcode collections * "pickletools" — Tools for pickle developers * Command line usage * Command line options * Programmatic Interface "linecache" — Random access to text lines ***************************************** **Source code:** Lib/linecache.py ====================================================================== The "linecache" module allows one to get any line from a Python source file, while attempting to optimize internally, using a cache, the common case where many lines are read from a single file. This is used by the "traceback" module to retrieve source lines for inclusion in the formatted traceback. The "tokenize.open()" function is used to open files. This function uses "tokenize.detect_encoding()" to get the encoding of the file; in the absence of an encoding token, the file encoding defaults to UTF-8. The "linecache" module defines the following functions: linecache.getline(filename, lineno, module_globals=None) Get line *lineno* from file named *filename*. This function will never raise an exception — it will return "''" on errors (the terminating newline character will be included for lines that are found). If a file named *filename* is not found, the function first checks for a **PEP 302** "__loader__" in *module_globals*. If there is such a loader and it defines a "get_source" method, then that determines the source lines (if "get_source()" returns "None", then "''" is returned). Finally, if *filename* is a relative filename, it is looked up relative to the entries in the module search path, "sys.path". linecache.clearcache() Clear the cache. Use this function if you no longer need lines from files previously read using "getline()". linecache.checkcache(filename=None) Check the cache for validity. Use this function if files in the cache may have changed on disk, and you require the updated version. If *filename* is omitted, it will check all the entries in the cache. linecache.lazycache(filename, module_globals) Capture enough detail about a non-file-based module to permit getting its lines later via "getline()" even if *module_globals* is "None" in the later call. This avoids doing I/O until a line is actually needed, without having to carry the module globals around indefinitely. New in version 3.5. Example: >>> import linecache >>> linecache.getline(linecache.__file__, 8) 'import sys\n' "locale" — Internationalization services **************************************** **Source code:** Lib/locale.py ====================================================================== The "locale" module opens access to the POSIX locale database and functionality. The POSIX locale mechanism allows programmers to deal with certain cultural issues in an application, without requiring the programmer to know all the specifics of each country where the software is executed. The "locale" module is implemented on top of the "_locale" module, which in turn uses an ANSI C locale implementation if available. The "locale" module defines the following exception and functions: exception locale.Error Exception raised when the locale passed to "setlocale()" is not recognized. locale.setlocale(category, locale=None) If *locale* is given and not "None", "setlocale()" modifies the locale setting for the *category*. The available categories are listed in the data description below. *locale* may be a string, or an iterable of two strings (language code and encoding). If it’s an iterable, it’s converted to a locale name using the locale aliasing engine. An empty string specifies the user’s default settings. If the modification of the locale fails, the exception "Error" is raised. If successful, the new locale setting is returned. If *locale* is omitted or "None", the current setting for *category* is returned. "setlocale()" is not thread-safe on most systems. Applications typically start with a call of import locale locale.setlocale(locale.LC_ALL, '') This sets the locale for all categories to the user’s default setting (typically specified in the "LANG" environment variable). If the locale is not changed thereafter, using multithreading should not cause problems. locale.localeconv() Returns the database of the local conventions as a dictionary. This dictionary has the following strings as keys: +------------------------+---------------------------------------+----------------------------------+ | Category | Key | Meaning | |========================|=======================================|==================================| | "LC_NUMERIC" | "'decimal_point'" | Decimal point character. | +------------------------+---------------------------------------+----------------------------------+ | | "'grouping'" | Sequence of numbers specifying | | | | which relative positions the | | | | "'thousands_sep'" is expected. | | | | If the sequence is terminated | | | | with "CHAR_MAX", no further | | | | grouping is performed. If the | | | | sequence terminates with a "0", | | | | the last group size is | | | | repeatedly used. | +------------------------+---------------------------------------+----------------------------------+ | | "'thousands_sep'" | Character used between groups. | +------------------------+---------------------------------------+----------------------------------+ | "LC_MONETARY" | "'int_curr_symbol'" | International currency symbol. | +------------------------+---------------------------------------+----------------------------------+ | | "'currency_symbol'" | Local currency symbol. | +------------------------+---------------------------------------+----------------------------------+ | | "'p_cs_precedes/n_cs_precedes'" | Whether the currency symbol | | | | precedes the value (for positive | | | | resp. negative values). | +------------------------+---------------------------------------+----------------------------------+ | | "'p_sep_by_space/n_sep_by_space'" | Whether the currency symbol is | | | | separated from the value by a | | | | space (for positive resp. | | | | negative values). | +------------------------+---------------------------------------+----------------------------------+ | | "'mon_decimal_point'" | Decimal point used for monetary | | | | values. | +------------------------+---------------------------------------+----------------------------------+ | | "'frac_digits'" | Number of fractional digits used | | | | in local formatting of monetary | | | | values. | +------------------------+---------------------------------------+----------------------------------+ | | "'int_frac_digits'" | Number of fractional digits used | | | | in international formatting of | | | | monetary values. | +------------------------+---------------------------------------+----------------------------------+ | | "'mon_thousands_sep'" | Group separator used for | | | | monetary values. | +------------------------+---------------------------------------+----------------------------------+ | | "'mon_grouping'" | Equivalent to "'grouping'", used | | | | for monetary values. | +------------------------+---------------------------------------+----------------------------------+ | | "'positive_sign'" | Symbol used to annotate a | | | | positive monetary value. | +------------------------+---------------------------------------+----------------------------------+ | | "'negative_sign'" | Symbol used to annotate a | | | | negative monetary value. | +------------------------+---------------------------------------+----------------------------------+ | | "'p_sign_posn/n_sign_posn'" | The position of the sign (for | | | | positive resp. negative values), | | | | see below. | +------------------------+---------------------------------------+----------------------------------+ All numeric values can be set to "CHAR_MAX" to indicate that there is no value specified in this locale. The possible values for "'p_sign_posn'" and "'n_sign_posn'" are given below. +----------------+-------------------------------------------+ | Value | Explanation | |================|===========================================| | "0" | Currency and value are surrounded by | | | parentheses. | +----------------+-------------------------------------------+ | "1" | The sign should precede the value and | | | currency symbol. | +----------------+-------------------------------------------+ | "2" | The sign should follow the value and | | | currency symbol. | +----------------+-------------------------------------------+ | "3" | The sign should immediately precede the | | | value. | +----------------+-------------------------------------------+ | "4" | The sign should immediately follow the | | | value. | +----------------+-------------------------------------------+ | "CHAR_MAX" | Nothing is specified in this locale. | +----------------+-------------------------------------------+ The function sets temporarily the "LC_CTYPE" locale to the "LC_NUMERIC" locale or the "LC_MONETARY" locale if locales are different and numeric or monetary strings are non-ASCII. This temporary change affects other threads. Changed in version 3.7: The function now sets temporarily the "LC_CTYPE" locale to the "LC_NUMERIC" locale in some cases. locale.nl_langinfo(option) Return some locale-specific information as a string. This function is not available on all systems, and the set of possible options might also vary across platforms. The possible argument values are numbers, for which symbolic constants are available in the locale module. The "nl_langinfo()" function accepts one of the following keys. Most descriptions are taken from the corresponding description in the GNU C library. locale.CODESET Get a string with the name of the character encoding used in the selected locale. locale.D_T_FMT Get a string that can be used as a format string for "time.strftime()" to represent date and time in a locale- specific way. locale.D_FMT Get a string that can be used as a format string for "time.strftime()" to represent a date in a locale-specific way. locale.T_FMT Get a string that can be used as a format string for "time.strftime()" to represent a time in a locale-specific way. locale.T_FMT_AMPM Get a format string for "time.strftime()" to represent time in the am/pm format. DAY_1 ... DAY_7 Get the name of the n-th day of the week. Note: This follows the US convention of "DAY_1" being Sunday, not the international convention (ISO 8601) that Monday is the first day of the week. ABDAY_1 ... ABDAY_7 Get the abbreviated name of the n-th day of the week. MON_1 ... MON_12 Get the name of the n-th month. ABMON_1 ... ABMON_12 Get the abbreviated name of the n-th month. locale.RADIXCHAR Get the radix character (decimal dot, decimal comma, etc.). locale.THOUSEP Get the separator character for thousands (groups of three digits). locale.YESEXPR Get a regular expression that can be used with the regex function to recognize a positive response to a yes/no question. Note: The expression is in the syntax suitable for the "regex()" function from the C library, which might differ from the syntax used in "re". locale.NOEXPR Get a regular expression that can be used with the regex(3) function to recognize a negative response to a yes/no question. locale.CRNCYSTR Get the currency symbol, preceded by “-” if the symbol should appear before the value, “+” if the symbol should appear after the value, or “.” if the symbol should replace the radix character. locale.ERA Get a string that represents the era used in the current locale. Most locales do not define this value. An example of a locale which does define this value is the Japanese one. In Japan, the traditional representation of dates includes the name of the era corresponding to the then-emperor’s reign. Normally it should not be necessary to use this value directly. Specifying the "E" modifier in their format strings causes the "time.strftime()" function to use this information. The format of the returned string is not specified, and therefore you should not assume knowledge of it on different systems. locale.ERA_D_T_FMT Get a format string for "time.strftime()" to represent date and time in a locale-specific era-based way. locale.ERA_D_FMT Get a format string for "time.strftime()" to represent a date in a locale-specific era-based way. locale.ERA_T_FMT Get a format string for "time.strftime()" to represent a time in a locale-specific era-based way. locale.ALT_DIGITS Get a representation of up to 100 values used to represent the values 0 to 99. locale.getdefaultlocale([envvars]) Tries to determine the default locale settings and returns them as a tuple of the form "(language code, encoding)". According to POSIX, a program which has not called "setlocale(LC_ALL, '')" runs using the portable "'C'" locale. Calling "setlocale(LC_ALL, '')" lets it use the default locale as defined by the "LANG" variable. Since we do not want to interfere with the current locale setting we thus emulate the behavior in the way described above. To maintain compatibility with other platforms, not only the "LANG" variable is tested, but a list of variables given as envvars parameter. The first found to be defined will be used. *envvars* defaults to the search path used in GNU gettext; it must always contain the variable name "'LANG'". The GNU gettext search path contains "'LC_ALL'", "'LC_CTYPE'", "'LANG'" and "'LANGUAGE'", in that order. Except for the code "'C'", the language code corresponds to **RFC 1766**. *language code* and *encoding* may be "None" if their values cannot be determined. locale.getlocale(category=LC_CTYPE) Returns the current setting for the given locale category as sequence containing *language code*, *encoding*. *category* may be one of the "LC_*" values except "LC_ALL". It defaults to "LC_CTYPE". Except for the code "'C'", the language code corresponds to **RFC 1766**. *language code* and *encoding* may be "None" if their values cannot be determined. locale.getpreferredencoding(do_setlocale=True) Return the encoding used for text data, according to user preferences. User preferences are expressed differently on different systems, and might not be available programmatically on some systems, so this function only returns a guess. On some systems, it is necessary to invoke "setlocale()" to obtain the user preferences, so this function is not thread-safe. If invoking setlocale is not necessary or desired, *do_setlocale* should be set to "False". On Android or in the UTF-8 mode ("-X" "utf8" option), always return "'UTF-8'", the locale and the *do_setlocale* argument are ignored. Changed in version 3.7: The function now always returns "UTF-8" on Android or if the UTF-8 mode is enabled. locale.normalize(localename) Returns a normalized locale code for the given locale name. The returned locale code is formatted for use with "setlocale()". If normalization fails, the original name is returned unchanged. If the given encoding is not known, the function defaults to the default encoding for the locale code just like "setlocale()". locale.resetlocale(category=LC_ALL) Sets the locale for *category* to the default setting. The default setting is determined by calling "getdefaultlocale()". *category* defaults to "LC_ALL". locale.strcoll(string1, string2) Compares two strings according to the current "LC_COLLATE" setting. As any other compare function, returns a negative, or a positive value, or "0", depending on whether *string1* collates before or after *string2* or is equal to it. locale.strxfrm(string) Transforms a string to one that can be used in locale-aware comparisons. For example, "strxfrm(s1) < strxfrm(s2)" is equivalent to "strcoll(s1, s2) < 0". This function can be used when the same string is compared repeatedly, e.g. when collating a sequence of strings. locale.format_string(format, val, grouping=False, monetary=False) Formats a number *val* according to the current "LC_NUMERIC" setting. The format follows the conventions of the "%" operator. For floating point values, the decimal point is modified if appropriate. If *grouping* is true, also takes the grouping into account. If *monetary* is true, the conversion uses monetary thousands separator and grouping strings. Processes formatting specifiers as in "format % val", but takes the current locale settings into account. Changed in version 3.7: The *monetary* keyword parameter was added. locale.format(format, val, grouping=False, monetary=False) Please note that this function works like "format_string()" but will only work for exactly one "%char" specifier. For example, "'%f'" and "'%.0f'" are both valid specifiers, but "'%f KiB'" is not. For whole format strings, use "format_string()". Deprecated since version 3.7: Use "format_string()" instead. locale.currency(val, symbol=True, grouping=False, international=False) Formats a number *val* according to the current "LC_MONETARY" settings. The returned string includes the currency symbol if *symbol* is true, which is the default. If *grouping* is true (which is not the default), grouping is done with the value. If *international* is true (which is not the default), the international currency symbol is used. Note that this function will not work with the ‘C’ locale, so you have to set a locale via "setlocale()" first. locale.str(float) Formats a floating point number using the same format as the built- in function "str(float)", but takes the decimal point into account. locale.delocalize(string) Converts a string into a normalized number string, following the "LC_NUMERIC" settings. New in version 3.5. locale.atof(string, func=float) Converts a string to a number, following the "LC_NUMERIC" settings, by calling *func* on the result of calling "delocalize()" on *string*. locale.atoi(string) Converts a string to an integer, following the "LC_NUMERIC" conventions. locale.LC_CTYPE Locale category for the character type functions. Depending on the settings of this category, the functions of module "string" dealing with case change their behaviour. locale.LC_COLLATE Locale category for sorting strings. The functions "strcoll()" and "strxfrm()" of the "locale" module are affected. locale.LC_TIME Locale category for the formatting of time. The function "time.strftime()" follows these conventions. locale.LC_MONETARY Locale category for formatting of monetary values. The available options are available from the "localeconv()" function. locale.LC_MESSAGES Locale category for message display. Python currently does not support application specific locale-aware messages. Messages displayed by the operating system, like those returned by "os.strerror()" might be affected by this category. locale.LC_NUMERIC Locale category for formatting numbers. The functions "format()", "atoi()", "atof()" and "str()" of the "locale" module are affected by that category. All other numeric formatting operations are not affected. locale.LC_ALL Combination of all locale settings. If this flag is used when the locale is changed, setting the locale for all categories is attempted. If that fails for any category, no category is changed at all. When the locale is retrieved using this flag, a string indicating the setting for all categories is returned. This string can be later used to restore the settings. locale.CHAR_MAX This is a symbolic constant used for different values returned by "localeconv()". Example: >>> import locale >>> loc = locale.getlocale() # get current locale # use German locale; name might vary with platform >>> locale.setlocale(locale.LC_ALL, 'de_DE') >>> locale.strcoll('f\xe4n', 'foo') # compare a string containing an umlaut >>> locale.setlocale(locale.LC_ALL, '') # use user's preferred locale >>> locale.setlocale(locale.LC_ALL, 'C') # use default (C) locale >>> locale.setlocale(locale.LC_ALL, loc) # restore saved locale Background, details, hints, tips and caveats ============================================ The C standard defines the locale as a program-wide property that may be relatively expensive to change. On top of that, some implementation are broken in such a way that frequent locale changes may cause core dumps. This makes the locale somewhat painful to use correctly. Initially, when a program is started, the locale is the "C" locale, no matter what the user’s preferred locale is. There is one exception: the "LC_CTYPE" category is changed at startup to set the current locale encoding to the user’s preferred locale encoding. The program must explicitly say that it wants the user’s preferred locale settings for other categories by calling "setlocale(LC_ALL, '')". It is generally a bad idea to call "setlocale()" in some library routine, since as a side effect it affects the entire program. Saving and restoring it is almost as bad: it is expensive and affects other threads that happen to run before the settings have been restored. If, when coding a module for general use, you need a locale independent version of an operation that is affected by the locale (such as certain formats used with "time.strftime()"), you will have to find a way to do it without using the standard library routine. Even better is convincing yourself that using locale settings is okay. Only as a last resort should you document that your module is not compatible with non-"C" locale settings. The only way to perform numeric operations according to the locale is to use the special functions defined by this module: "atof()", "atoi()", "format()", "str()". There is no way to perform case conversions and character classifications according to the locale. For (Unicode) text strings these are done according to the character value only, while for byte strings, the conversions and classifications are done according to the ASCII value of the byte, and bytes whose high bit is set (i.e., non- ASCII bytes) are never converted or considered part of a character class such as letter or whitespace. For extension writers and programs that embed Python ==================================================== Extension modules should never call "setlocale()", except to find out what the current locale is. But since the return value can only be used portably to restore it, that is not very useful (except perhaps to find out whether or not the locale is "C"). When Python code uses the "locale" module to change the locale, this also affects the embedding application. If the embedding application doesn’t want this to happen, it should remove the "_locale" extension module (which does all the work) from the table of built-in modules in the "config.c" file, and make sure that the "_locale" module is not accessible as a shared library. Access to message catalogs ========================== locale.gettext(msg) locale.dgettext(domain, msg) locale.dcgettext(domain, msg, category) locale.textdomain(domain) locale.bindtextdomain(domain, dir) The locale module exposes the C library’s gettext interface on systems that provide this interface. It consists of the functions "gettext()", "dgettext()", "dcgettext()", "textdomain()", "bindtextdomain()", and "bind_textdomain_codeset()". These are similar to the same functions in the "gettext" module, but use the C library’s binary format for message catalogs, and the C library’s search algorithms for locating message catalogs. Python applications should normally find no need to invoke these functions, and should use "gettext" instead. A known exception to this rule are applications that link with additional C libraries which internally invoke "gettext()" or "dcgettext()". For these applications, it may be necessary to bind the text domain, so that the libraries can properly locate their message catalogs. "logging.config" — Logging configuration **************************************** **Source code:** Lib/logging/config.py Important ^^^^^^^^^ This page contains only reference information. For tutorials, please see * Basic Tutorial * Advanced Tutorial * Logging Cookbook ====================================================================== This section describes the API for configuring the logging module. Configuration functions ======================= The following functions configure the logging module. They are located in the "logging.config" module. Their use is optional — you can configure the logging module using these functions or by making calls to the main API (defined in "logging" itself) and defining handlers which are declared either in "logging" or "logging.handlers". logging.config.dictConfig(config) Takes the logging configuration from a dictionary. The contents of this dictionary are described in Configuration dictionary schema below. If an error is encountered during configuration, this function will raise a "ValueError", "TypeError", "AttributeError" or "ImportError" with a suitably descriptive message. The following is a (possibly incomplete) list of conditions which will raise an error: * A "level" which is not a string or which is a string not corresponding to an actual logging level. * A "propagate" value which is not a boolean. * An id which does not have a corresponding destination. * A non-existent handler id found during an incremental call. * An invalid logger name. * Inability to resolve to an internal or external object. Parsing is performed by the "DictConfigurator" class, whose constructor is passed the dictionary used for configuration, and has a "configure()" method. The "logging.config" module has a callable attribute "dictConfigClass" which is initially set to "DictConfigurator". You can replace the value of "dictConfigClass" with a suitable implementation of your own. "dictConfig()" calls "dictConfigClass" passing the specified dictionary, and then calls the "configure()" method on the returned object to put the configuration into effect: def dictConfig(config): dictConfigClass(config).configure() For example, a subclass of "DictConfigurator" could call "DictConfigurator.__init__()" in its own "__init__()", then set up custom prefixes which would be usable in the subsequent "configure()" call. "dictConfigClass" would be bound to this new subclass, and then "dictConfig()" could be called exactly as in the default, uncustomized state. New in version 3.2. logging.config.fileConfig(fname, defaults=None, disable_existing_loggers=True) Reads the logging configuration from a "configparser"-format file. The format of the file should be as described in Configuration file format. This function can be called several times from an application, allowing an end user to select from various pre-canned configurations (if the developer provides a mechanism to present the choices and load the chosen configuration). Parameters: * **fname** – A filename, or a file-like object, or an instance derived from "RawConfigParser". If a "RawConfigParser"-derived instance is passed, it is used as is. Otherwise, a "Configparser" is instantiated, and the configuration read by it from the object passed in "fname". If that has a "readline()" method, it is assumed to be a file-like object and read using "read_file()"; otherwise, it is assumed to be a filename and passed to "read()". * **defaults** – Defaults to be passed to the ConfigParser can be specified in this argument. * **disable_existing_loggers** – If specified as "False", loggers which exist when this call is made are left enabled. The default is "True" because this enables old behaviour in a backward-compatible way. This behaviour is to disable any existing non-root loggers unless they or their ancestors are explicitly named in the logging configuration. Changed in version 3.4: An instance of a subclass of "RawConfigParser" is now accepted as a value for "fname". This facilitates: * Use of a configuration file where logging configuration is just part of the overall application configuration. * Use of a configuration read from a file, and then modified by the using application (e.g. based on command-line parameters or other aspects of the runtime environment) before being passed to "fileConfig". logging.config.listen(port=DEFAULT_LOGGING_CONFIG_PORT, verify=None) Starts up a socket server on the specified port, and listens for new configurations. If no port is specified, the module’s default "DEFAULT_LOGGING_CONFIG_PORT" is used. Logging configurations will be sent as a file suitable for processing by "dictConfig()" or "fileConfig()". Returns a "Thread" instance on which you can call "start()" to start the server, and which you can "join()" when appropriate. To stop the server, call "stopListening()". The "verify" argument, if specified, should be a callable which should verify whether bytes received across the socket are valid and should be processed. This could be done by encrypting and/or signing what is sent across the socket, such that the "verify" callable can perform signature verification and/or decryption. The "verify" callable is called with a single argument - the bytes received across the socket - and should return the bytes to be processed, or "None" to indicate that the bytes should be discarded. The returned bytes could be the same as the passed in bytes (e.g. when only verification is done), or they could be completely different (perhaps if decryption were performed). To send a configuration to the socket, read in the configuration file and send it to the socket as a sequence of bytes preceded by a four-byte length string packed in binary using "struct.pack('>L', n)". Note: Because portions of the configuration are passed through "eval()", use of this function may open its users to a security risk. While the function only binds to a socket on "localhost", and so does not accept connections from remote machines, there are scenarios where untrusted code could be run under the account of the process which calls "listen()". Specifically, if the process calling "listen()" runs on a multi-user machine where users cannot trust each other, then a malicious user could arrange to run essentially arbitrary code in a victim user’s process, simply by connecting to the victim’s "listen()" socket and sending a configuration which runs whatever code the attacker wants to have executed in the victim’s process. This is especially easy to do if the default port is used, but not hard even if a different port is used. To avoid the risk of this happening, use the "verify" argument to "listen()" to prevent unrecognised configurations from being applied. Changed in version 3.4: The "verify" argument was added. Note: If you want to send configurations to the listener which don’t disable existing loggers, you will need to use a JSON format for the configuration, which will use "dictConfig()" for configuration. This method allows you to specify "disable_existing_loggers" as "False" in the configuration you send. logging.config.stopListening() Stops the listening server which was created with a call to "listen()". This is typically called before calling "join()" on the return value from "listen()". Security considerations ======================= The logging configuration functionality tries to offer convenience, and in part this is done by offering the ability to convert text in configuration files into Python objects used in logging configuration - for example, as described in User-defined objects. However, these same mechanisms (importing callables from user-defined modules and calling them with parameters from the configuration) could be used to invoke any code you like, and for this reason you should treat configuration files from untrusted sources with *extreme caution* and satisfy yourself that nothing bad can happen if you load them, before actually loading them. Configuration dictionary schema =============================== Describing a logging configuration requires listing the various objects to create and the connections between them; for example, you may create a handler named ‘console’ and then say that the logger named ‘startup’ will send its messages to the ‘console’ handler. These objects aren’t limited to those provided by the "logging" module because you might write your own formatter or handler class. The parameters to these classes may also need to include external objects such as "sys.stderr". The syntax for describing these objects and connections is defined in Object connections below. Dictionary Schema Details ------------------------- The dictionary passed to "dictConfig()" must contain the following keys: * *version* - to be set to an integer value representing the schema version. The only valid value at present is 1, but having this key allows the schema to evolve while still preserving backwards compatibility. All other keys are optional, but if present they will be interpreted as described below. In all cases below where a ‘configuring dict’ is mentioned, it will be checked for the special "'()'" key to see if a custom instantiation is required. If so, the mechanism described in User-defined objects below is used to create an instance; otherwise, the context is used to determine what to instantiate. * *formatters* - the corresponding value will be a dict in which each key is a formatter id and each value is a dict describing how to configure the corresponding "Formatter" instance. The configuring dict is searched for keys "format" and "datefmt" (with defaults of "None") and these are used to construct a "Formatter" instance. Changed in version 3.8: a "validate" key (with default of "True") can be added into the "formatters" section of the configuring dict, this is to validate the format. * *filters* - the corresponding value will be a dict in which each key is a filter id and each value is a dict describing how to configure the corresponding Filter instance. The configuring dict is searched for the key "name" (defaulting to the empty string) and this is used to construct a "logging.Filter" instance. * *handlers* - the corresponding value will be a dict in which each key is a handler id and each value is a dict describing how to configure the corresponding Handler instance. The configuring dict is searched for the following keys: * "class" (mandatory). This is the fully qualified name of the handler class. * "level" (optional). The level of the handler. * "formatter" (optional). The id of the formatter for this handler. * "filters" (optional). A list of ids of the filters for this handler. All *other* keys are passed through as keyword arguments to the handler’s constructor. For example, given the snippet: handlers: console: class : logging.StreamHandler formatter: brief level : INFO filters: [allow_foo] stream : ext://sys.stdout file: class : logging.handlers.RotatingFileHandler formatter: precise filename: logconfig.log maxBytes: 1024 backupCount: 3 the handler with id "console" is instantiated as a "logging.StreamHandler", using "sys.stdout" as the underlying stream. The handler with id "file" is instantiated as a "logging.handlers.RotatingFileHandler" with the keyword arguments "filename='logconfig.log', maxBytes=1024, backupCount=3". * *loggers* - the corresponding value will be a dict in which each key is a logger name and each value is a dict describing how to configure the corresponding Logger instance. The configuring dict is searched for the following keys: * "level" (optional). The level of the logger. * "propagate" (optional). The propagation setting of the logger. * "filters" (optional). A list of ids of the filters for this logger. * "handlers" (optional). A list of ids of the handlers for this logger. The specified loggers will be configured according to the level, propagation, filters and handlers specified. * *root* - this will be the configuration for the root logger. Processing of the configuration will be as for any logger, except that the "propagate" setting will not be applicable. * *incremental* - whether the configuration is to be interpreted as incremental to the existing configuration. This value defaults to "False", which means that the specified configuration replaces the existing configuration with the same semantics as used by the existing "fileConfig()" API. If the specified value is "True", the configuration is processed as described in the section on Incremental Configuration. * *disable_existing_loggers* - whether any existing non-root loggers are to be disabled. This setting mirrors the parameter of the same name in "fileConfig()". If absent, this parameter defaults to "True". This value is ignored if *incremental* is "True". Incremental Configuration ------------------------- It is difficult to provide complete flexibility for incremental configuration. For example, because objects such as filters and formatters are anonymous, once a configuration is set up, it is not possible to refer to such anonymous objects when augmenting a configuration. Furthermore, there is not a compelling case for arbitrarily altering the object graph of loggers, handlers, filters, formatters at run- time, once a configuration is set up; the verbosity of loggers and handlers can be controlled just by setting levels (and, in the case of loggers, propagation flags). Changing the object graph arbitrarily in a safe way is problematic in a multi-threaded environment; while not impossible, the benefits are not worth the complexity it adds to the implementation. Thus, when the "incremental" key of a configuration dict is present and is "True", the system will completely ignore any "formatters" and "filters" entries, and process only the "level" settings in the "handlers" entries, and the "level" and "propagate" settings in the "loggers" and "root" entries. Using a value in the configuration dict lets configurations to be sent over the wire as pickled dicts to a socket listener. Thus, the logging verbosity of a long-running application can be altered over time with no need to stop and restart the application. Object connections ------------------ The schema describes a set of logging objects - loggers, handlers, formatters, filters - which are connected to each other in an object graph. Thus, the schema needs to represent connections between the objects. For example, say that, once configured, a particular logger has attached to it a particular handler. For the purposes of this discussion, we can say that the logger represents the source, and the handler the destination, of a connection between the two. Of course in the configured objects this is represented by the logger holding a reference to the handler. In the configuration dict, this is done by giving each destination object an id which identifies it unambiguously, and then using the id in the source object’s configuration to indicate that a connection exists between the source and the destination object with that id. So, for example, consider the following YAML snippet: formatters: brief: # configuration for formatter with id 'brief' goes here precise: # configuration for formatter with id 'precise' goes here handlers: h1: #This is an id # configuration of handler with id 'h1' goes here formatter: brief h2: #This is another id # configuration of handler with id 'h2' goes here formatter: precise loggers: foo.bar.baz: # other configuration for logger 'foo.bar.baz' handlers: [h1, h2] (Note: YAML used here because it’s a little more readable than the equivalent Python source form for the dictionary.) The ids for loggers are the logger names which would be used programmatically to obtain a reference to those loggers, e.g. "foo.bar.baz". The ids for Formatters and Filters can be any string value (such as "brief", "precise" above) and they are transient, in that they are only meaningful for processing the configuration dictionary and used to determine connections between objects, and are not persisted anywhere when the configuration call is complete. The above snippet indicates that logger named "foo.bar.baz" should have two handlers attached to it, which are described by the handler ids "h1" and "h2". The formatter for "h1" is that described by id "brief", and the formatter for "h2" is that described by id "precise". User-defined objects -------------------- The schema supports user-defined objects for handlers, filters and formatters. (Loggers do not need to have different types for different instances, so there is no support in this configuration schema for user-defined logger classes.) Objects to be configured are described by dictionaries which detail their configuration. In some places, the logging system will be able to infer from the context how an object is to be instantiated, but when a user-defined object is to be instantiated, the system will not know how to do this. In order to provide complete flexibility for user-defined object instantiation, the user needs to provide a ‘factory’ - a callable which is called with a configuration dictionary and which returns the instantiated object. This is signalled by an absolute import path to the factory being made available under the special key "'()'". Here’s a concrete example: formatters: brief: format: '%(message)s' default: format: '%(asctime)s %(levelname)-8s %(name)-15s %(message)s' datefmt: '%Y-%m-%d %H:%M:%S' custom: (): my.package.customFormatterFactory bar: baz spam: 99.9 answer: 42 The above YAML snippet defines three formatters. The first, with id "brief", is a standard "logging.Formatter" instance with the specified format string. The second, with id "default", has a longer format and also defines the time format explicitly, and will result in a "logging.Formatter" initialized with those two format strings. Shown in Python source form, the "brief" and "default" formatters have configuration sub-dictionaries: { 'format' : '%(message)s' } and: { 'format' : '%(asctime)s %(levelname)-8s %(name)-15s %(message)s', 'datefmt' : '%Y-%m-%d %H:%M:%S' } respectively, and as these dictionaries do not contain the special key "'()'", the instantiation is inferred from the context: as a result, standard "logging.Formatter" instances are created. The configuration sub-dictionary for the third formatter, with id "custom", is: { '()' : 'my.package.customFormatterFactory', 'bar' : 'baz', 'spam' : 99.9, 'answer' : 42 } and this contains the special key "'()'", which means that user- defined instantiation is wanted. In this case, the specified factory callable will be used. If it is an actual callable it will be used directly - otherwise, if you specify a string (as in the example) the actual callable will be located using normal import mechanisms. The callable will be called with the **remaining** items in the configuration sub-dictionary as keyword arguments. In the above example, the formatter with id "custom" will be assumed to be returned by the call: my.package.customFormatterFactory(bar='baz', spam=99.9, answer=42) The key "'()'" has been used as the special key because it is not a valid keyword parameter name, and so will not clash with the names of the keyword arguments used in the call. The "'()'" also serves as a mnemonic that the corresponding value is a callable. Access to external objects -------------------------- There are times where a configuration needs to refer to objects external to the configuration, for example "sys.stderr". If the configuration dict is constructed using Python code, this is straightforward, but a problem arises when the configuration is provided via a text file (e.g. JSON, YAML). In a text file, there is no standard way to distinguish "sys.stderr" from the literal string "'sys.stderr'". To facilitate this distinction, the configuration system looks for certain special prefixes in string values and treat them specially. For example, if the literal string "'ext://sys.stderr'" is provided as a value in the configuration, then the "ext://" will be stripped off and the remainder of the value processed using normal import mechanisms. The handling of such prefixes is done in a way analogous to protocol handling: there is a generic mechanism to look for prefixes which match the regular expression "^(?P[a-z]+)://(?P.*)$" whereby, if the "prefix" is recognised, the "suffix" is processed in a prefix-dependent manner and the result of the processing replaces the string value. If the prefix is not recognised, then the string value will be left as-is. Access to internal objects -------------------------- As well as external objects, there is sometimes also a need to refer to objects in the configuration. This will be done implicitly by the configuration system for things that it knows about. For example, the string value "'DEBUG'" for a "level" in a logger or handler will automatically be converted to the value "logging.DEBUG", and the "handlers", "filters" and "formatter" entries will take an object id and resolve to the appropriate destination object. However, a more generic mechanism is needed for user-defined objects which are not known to the "logging" module. For example, consider "logging.handlers.MemoryHandler", which takes a "target" argument which is another handler to delegate to. Since the system already knows about this class, then in the configuration, the given "target" just needs to be the object id of the relevant target handler, and the system will resolve to the handler from the id. If, however, a user defines a "my.package.MyHandler" which has an "alternate" handler, the configuration system would not know that the "alternate" referred to a handler. To cater for this, a generic resolution system allows the user to specify: handlers: file: # configuration of file handler goes here custom: (): my.package.MyHandler alternate: cfg://handlers.file The literal string "'cfg://handlers.file'" will be resolved in an analogous way to strings with the "ext://" prefix, but looking in the configuration itself rather than the import namespace. The mechanism allows access by dot or by index, in a similar way to that provided by "str.format". Thus, given the following snippet: handlers: email: class: logging.handlers.SMTPHandler mailhost: localhost fromaddr: my_app@domain.tld toaddrs: - support_team@domain.tld - dev_team@domain.tld subject: Houston, we have a problem. in the configuration, the string "'cfg://handlers'" would resolve to the dict with key "handlers", the string "'cfg://handlers.email" would resolve to the dict with key "email" in the "handlers" dict, and so on. The string "'cfg://handlers.email.toaddrs[1]" would resolve to "'dev_team.domain.tld'" and the string "'cfg://handlers.email.toaddrs[0]'" would resolve to the value "'support_team@domain.tld'". The "subject" value could be accessed using either "'cfg://handlers.email.subject'" or, equivalently, "'cfg://handlers.email[subject]'". The latter form only needs to be used if the key contains spaces or non-alphanumeric characters. If an index value consists only of decimal digits, access will be attempted using the corresponding integer value, falling back to the string value if needed. Given a string "cfg://handlers.myhandler.mykey.123", this will resolve to "config_dict['handlers']['myhandler']['mykey']['123']". If the string is specified as "cfg://handlers.myhandler.mykey[123]", the system will attempt to retrieve the value from "config_dict['handlers']['myhandler']['mykey'][123]", and fall back to "config_dict['handlers']['myhandler']['mykey']['123']" if that fails. Import resolution and custom importers -------------------------------------- Import resolution, by default, uses the builtin "__import__()" function to do its importing. You may want to replace this with your own importing mechanism: if so, you can replace the "importer" attribute of the "DictConfigurator" or its superclass, the "BaseConfigurator" class. However, you need to be careful because of the way functions are accessed from classes via descriptors. If you are using a Python callable to do your imports, and you want to define it at class level rather than instance level, you need to wrap it with "staticmethod()". For example: from importlib import import_module from logging.config import BaseConfigurator BaseConfigurator.importer = staticmethod(import_module) You don’t need to wrap with "staticmethod()" if you’re setting the import callable on a configurator *instance*. Configuration file format ========================= The configuration file format understood by "fileConfig()" is based on "configparser" functionality. The file must contain sections called "[loggers]", "[handlers]" and "[formatters]" which identify by name the entities of each type which are defined in the file. For each such entity, there is a separate section which identifies how that entity is configured. Thus, for a logger named "log01" in the "[loggers]" section, the relevant configuration details are held in a section "[logger_log01]". Similarly, a handler called "hand01" in the "[handlers]" section will have its configuration held in a section called "[handler_hand01]", while a formatter called "form01" in the "[formatters]" section will have its configuration specified in a section called "[formatter_form01]". The root logger configuration must be specified in a section called "[logger_root]". Note: The "fileConfig()" API is older than the "dictConfig()" API and does not provide functionality to cover certain aspects of logging. For example, you cannot configure "Filter" objects, which provide for filtering of messages beyond simple integer levels, using "fileConfig()". If you need to have instances of "Filter" in your logging configuration, you will need to use "dictConfig()". Note that future enhancements to configuration functionality will be added to "dictConfig()", so it’s worth considering transitioning to this newer API when it’s convenient to do so. Examples of these sections in the file are given below. [loggers] keys=root,log02,log03,log04,log05,log06,log07 [handlers] keys=hand01,hand02,hand03,hand04,hand05,hand06,hand07,hand08,hand09 [formatters] keys=form01,form02,form03,form04,form05,form06,form07,form08,form09 The root logger must specify a level and a list of handlers. An example of a root logger section is given below. [logger_root] level=NOTSET handlers=hand01 The "level" entry can be one of "DEBUG, INFO, WARNING, ERROR, CRITICAL" or "NOTSET". For the root logger only, "NOTSET" means that all messages will be logged. Level values are "eval()"uated in the context of the "logging" package’s namespace. The "handlers" entry is a comma-separated list of handler names, which must appear in the "[handlers]" section. These names must appear in the "[handlers]" section and have corresponding sections in the configuration file. For loggers other than the root logger, some additional information is required. This is illustrated by the following example. [logger_parser] level=DEBUG handlers=hand01 propagate=1 qualname=compiler.parser The "level" and "handlers" entries are interpreted as for the root logger, except that if a non-root logger’s level is specified as "NOTSET", the system consults loggers higher up the hierarchy to determine the effective level of the logger. The "propagate" entry is set to 1 to indicate that messages must propagate to handlers higher up the logger hierarchy from this logger, or 0 to indicate that messages are **not** propagated to handlers up the hierarchy. The "qualname" entry is the hierarchical channel name of the logger, that is to say the name used by the application to get the logger. Sections which specify handler configuration are exemplified by the following. [handler_hand01] class=StreamHandler level=NOTSET formatter=form01 args=(sys.stdout,) The "class" entry indicates the handler’s class (as determined by "eval()" in the "logging" package’s namespace). The "level" is interpreted as for loggers, and "NOTSET" is taken to mean ‘log everything’. The "formatter" entry indicates the key name of the formatter for this handler. If blank, a default formatter ("logging._defaultFormatter") is used. If a name is specified, it must appear in the "[formatters]" section and have a corresponding section in the configuration file. The "args" entry, when "eval()"uated in the context of the "logging" package’s namespace, is the list of arguments to the constructor for the handler class. Refer to the constructors for the relevant handlers, or to the examples below, to see how typical entries are constructed. If not provided, it defaults to "()". The optional "kwargs" entry, when "eval()"uated in the context of the "logging" package’s namespace, is the keyword argument dict to the constructor for the handler class. If not provided, it defaults to "{}". [handler_hand02] class=FileHandler level=DEBUG formatter=form02 args=('python.log', 'w') [handler_hand03] class=handlers.SocketHandler level=INFO formatter=form03 args=('localhost', handlers.DEFAULT_TCP_LOGGING_PORT) [handler_hand04] class=handlers.DatagramHandler level=WARN formatter=form04 args=('localhost', handlers.DEFAULT_UDP_LOGGING_PORT) [handler_hand05] class=handlers.SysLogHandler level=ERROR formatter=form05 args=(('localhost', handlers.SYSLOG_UDP_PORT), handlers.SysLogHandler.LOG_USER) [handler_hand06] class=handlers.NTEventLogHandler level=CRITICAL formatter=form06 args=('Python Application', '', 'Application') [handler_hand07] class=handlers.SMTPHandler level=WARN formatter=form07 args=('localhost', 'from@abc', ['user1@abc', 'user2@xyz'], 'Logger Subject') kwargs={'timeout': 10.0} [handler_hand08] class=handlers.MemoryHandler level=NOTSET formatter=form08 target= args=(10, ERROR) [handler_hand09] class=handlers.HTTPHandler level=NOTSET formatter=form09 args=('localhost:9022', '/log', 'GET') kwargs={'secure': True} Sections which specify formatter configuration are typified by the following. [formatter_form01] format=F1 %(asctime)s %(levelname)s %(message)s datefmt= class=logging.Formatter The "format" entry is the overall format string, and the "datefmt" entry is the "strftime()"-compatible date/time format string. If empty, the package substitutes something which is almost equivalent to specifying the date format string "'%Y-%m-%d %H:%M:%S'". This format also specifies milliseconds, which are appended to the result of using the above format string, with a comma separator. An example time in this format is "2003-01-23 00:29:50,411". The "class" entry is optional. It indicates the name of the formatter’s class (as a dotted module and class name.) This option is useful for instantiating a "Formatter" subclass. Subclasses of "Formatter" can present exception tracebacks in an expanded or condensed format. Note: Due to the use of "eval()" as described above, there are potential security risks which result from using the "listen()" to send and receive configurations via sockets. The risks are limited to where multiple users with no mutual trust run code on the same machine; see the "listen()" documentation for more information. See also: Module "logging" API reference for the logging module. Module "logging.handlers" Useful handlers included with the logging module. "logging.handlers" — Logging handlers ************************************* **Source code:** Lib/logging/handlers.py Important ^^^^^^^^^ This page contains only reference information. For tutorials, please see * Basic Tutorial * Advanced Tutorial * Logging Cookbook ====================================================================== The following useful handlers are provided in the package. Note that three of the handlers ("StreamHandler", "FileHandler" and "NullHandler") are actually defined in the "logging" module itself, but have been documented here along with the other handlers. StreamHandler ============= The "StreamHandler" class, located in the core "logging" package, sends logging output to streams such as *sys.stdout*, *sys.stderr* or any file-like object (or, more precisely, any object which supports "write()" and "flush()" methods). class logging.StreamHandler(stream=None) Returns a new instance of the "StreamHandler" class. If *stream* is specified, the instance will use it for logging output; otherwise, *sys.stderr* will be used. emit(record) If a formatter is specified, it is used to format the record. The record is then written to the stream followed by "terminator". If exception information is present, it is formatted using "traceback.print_exception()" and appended to the stream. flush() Flushes the stream by calling its "flush()" method. Note that the "close()" method is inherited from "Handler" and so does no output, so an explicit "flush()" call may be needed at times. setStream(stream) Sets the instance’s stream to the specified value, if it is different. The old stream is flushed before the new stream is set. Parameters: **stream** – The stream that the handler should use. Returns: the old stream, if the stream was changed, or *None* if it wasn’t. New in version 3.7. terminator String used as the terminator when writing a formatted record to a stream. Default value is "'\n'". If you don’t want a newline termination, you can set the handler instance’s "terminator" attribute to the empty string. In earlier versions, the terminator was hardcoded as "'\n'". New in version 3.2. FileHandler =========== The "FileHandler" class, located in the core "logging" package, sends logging output to a disk file. It inherits the output functionality from "StreamHandler". class logging.FileHandler(filename, mode='a', encoding=None, delay=False, errors=None) Returns a new instance of the "FileHandler" class. The specified file is opened and used as the stream for logging. If *mode* is not specified, "'a'" is used. If *encoding* is not "None", it is used to open the file with that encoding. If *delay* is true, then file opening is deferred until the first call to "emit()". By default, the file grows indefinitely. If *errors* is specified, it’s used to determine how encoding errors are handled. Changed in version 3.6: As well as string values, "Path" objects are also accepted for the *filename* argument. Changed in version 3.9: The *errors* parameter was added. close() Closes the file. emit(record) Outputs the record to the file. NullHandler =========== New in version 3.1. The "NullHandler" class, located in the core "logging" package, does not do any formatting or output. It is essentially a ‘no-op’ handler for use by library developers. class logging.NullHandler Returns a new instance of the "NullHandler" class. emit(record) This method does nothing. handle(record) This method does nothing. createLock() This method returns "None" for the lock, since there is no underlying I/O to which access needs to be serialized. See Configuring Logging for a Library for more information on how to use "NullHandler". WatchedFileHandler ================== The "WatchedFileHandler" class, located in the "logging.handlers" module, is a "FileHandler" which watches the file it is logging to. If the file changes, it is closed and reopened using the file name. A file change can happen because of usage of programs such as *newsyslog* and *logrotate* which perform log file rotation. This handler, intended for use under Unix/Linux, watches the file to see if it has changed since the last emit. (A file is deemed to have changed if its device or inode have changed.) If the file has changed, the old file stream is closed, and the file opened to get a new stream. This handler is not appropriate for use under Windows, because under Windows open log files cannot be moved or renamed - logging opens the files with exclusive locks - and so there is no need for such a handler. Furthermore, *ST_INO* is not supported under Windows; "stat()" always returns zero for this value. class logging.handlers.WatchedFileHandler(filename, mode='a', encoding=None, delay=False, errors=None) Returns a new instance of the "WatchedFileHandler" class. The specified file is opened and used as the stream for logging. If *mode* is not specified, "'a'" is used. If *encoding* is not "None", it is used to open the file with that encoding. If *delay* is true, then file opening is deferred until the first call to "emit()". By default, the file grows indefinitely. If *errors* is provided, it determines how encoding errors are handled. Changed in version 3.6: As well as string values, "Path" objects are also accepted for the *filename* argument. Changed in version 3.9: The *errors* parameter was added. reopenIfNeeded() Checks to see if the file has changed. If it has, the existing stream is flushed and closed and the file opened again, typically as a precursor to outputting the record to the file. New in version 3.6. emit(record) Outputs the record to the file, but first calls "reopenIfNeeded()" to reopen the file if it has changed. BaseRotatingHandler =================== The "BaseRotatingHandler" class, located in the "logging.handlers" module, is the base class for the rotating file handlers, "RotatingFileHandler" and "TimedRotatingFileHandler". You should not need to instantiate this class, but it has attributes and methods you may need to override. class logging.handlers.BaseRotatingHandler(filename, mode, encoding=None, delay=False, errors=None) The parameters are as for "FileHandler". The attributes are: namer If this attribute is set to a callable, the "rotation_filename()" method delegates to this callable. The parameters passed to the callable are those passed to "rotation_filename()". Note: The namer function is called quite a few times during rollover, so it should be as simple and as fast as possible. It should also return the same output every time for a given input, otherwise the rollover behaviour may not work as expected.It’s also worth noting that care should be taken when using a namer to preserve certain attributes in the filename which are used during rotation. For example, "RotatingFileHandler" expects to have a set of log files whose names contain successive integers, so that rotation works as expected, and "TimedRotatingFileHandler" deletes old log files (based on the "backupCount" parameter passed to the handler’s initializer) by determining the oldest files to delete. For this to happen, the filenames should be sortable using the date/time portion of the filename, and a namer needs to respect this. (If a namer is wanted that doesn’t respect this scheme, it will need to be used in a subclass of "TimedRotatingFileHandler" which overrides the "getFilesToDelete()" method to fit in with the custom naming scheme.) New in version 3.3. rotator If this attribute is set to a callable, the "rotate()" method delegates to this callable. The parameters passed to the callable are those passed to "rotate()". New in version 3.3. rotation_filename(default_name) Modify the filename of a log file when rotating. This is provided so that a custom filename can be provided. The default implementation calls the ‘namer’ attribute of the handler, if it’s callable, passing the default name to it. If the attribute isn’t callable (the default is "None"), the name is returned unchanged. Parameters: **default_name** – The default name for the log file. New in version 3.3. rotate(source, dest) When rotating, rotate the current log. The default implementation calls the ‘rotator’ attribute of the handler, if it’s callable, passing the source and dest arguments to it. If the attribute isn’t callable (the default is "None"), the source is simply renamed to the destination. Parameters: * **source** – The source filename. This is normally the base filename, e.g. ‘test.log’. * **dest** – The destination filename. This is normally what the source is rotated to, e.g. ‘test.log.1’. New in version 3.3. The reason the attributes exist is to save you having to subclass - you can use the same callables for instances of "RotatingFileHandler" and "TimedRotatingFileHandler". If either the namer or rotator callable raises an exception, this will be handled in the same way as any other exception during an "emit()" call, i.e. via the "handleError()" method of the handler. If you need to make more significant changes to rotation processing, you can override the methods. For an example, see Using a rotator and namer to customize log rotation processing. RotatingFileHandler =================== The "RotatingFileHandler" class, located in the "logging.handlers" module, supports rotation of disk log files. class logging.handlers.RotatingFileHandler(filename, mode='a', maxBytes=0, backupCount=0, encoding=None, delay=False, errors=None) Returns a new instance of the "RotatingFileHandler" class. The specified file is opened and used as the stream for logging. If *mode* is not specified, "'a'" is used. If *encoding* is not "None", it is used to open the file with that encoding. If *delay* is true, then file opening is deferred until the first call to "emit()". By default, the file grows indefinitely. If *errors* is provided, it determines how encoding errors are handled. You can use the *maxBytes* and *backupCount* values to allow the file to *rollover* at a predetermined size. When the size is about to be exceeded, the file is closed and a new file is silently opened for output. Rollover occurs whenever the current log file is nearly *maxBytes* in length; but if either of *maxBytes* or *backupCount* is zero, rollover never occurs, so you generally want to set *backupCount* to at least 1, and have a non-zero *maxBytes*. When *backupCount* is non-zero, the system will save old log files by appending the extensions ‘.1’, ‘.2’ etc., to the filename. For example, with a *backupCount* of 5 and a base file name of "app.log", you would get "app.log", "app.log.1", "app.log.2", up to "app.log.5". The file being written to is always "app.log". When this file is filled, it is closed and renamed to "app.log.1", and if files "app.log.1", "app.log.2", etc. exist, then they are renamed to "app.log.2", "app.log.3" etc. respectively. Changed in version 3.6: As well as string values, "Path" objects are also accepted for the *filename* argument. Changed in version 3.9: The *errors* parameter was added. doRollover() Does a rollover, as described above. emit(record) Outputs the record to the file, catering for rollover as described previously. TimedRotatingFileHandler ======================== The "TimedRotatingFileHandler" class, located in the "logging.handlers" module, supports rotation of disk log files at certain timed intervals. class logging.handlers.TimedRotatingFileHandler(filename, when='h', interval=1, backupCount=0, encoding=None, delay=False, utc=False, atTime=None, errors=None) Returns a new instance of the "TimedRotatingFileHandler" class. The specified file is opened and used as the stream for logging. On rotating it also sets the filename suffix. Rotating happens based on the product of *when* and *interval*. You can use the *when* to specify the type of *interval*. The list of possible values is below. Note that they are not case sensitive. +------------------+------------------------------+---------------------------+ | Value | Type of interval | If/how *atTime* is used | |==================|==============================|===========================| | "'S'" | Seconds | Ignored | +------------------+------------------------------+---------------------------+ | "'M'" | Minutes | Ignored | +------------------+------------------------------+---------------------------+ | "'H'" | Hours | Ignored | +------------------+------------------------------+---------------------------+ | "'D'" | Days | Ignored | +------------------+------------------------------+---------------------------+ | "'W0'-'W6'" | Weekday (0=Monday) | Used to compute initial | | | | rollover time | +------------------+------------------------------+---------------------------+ | "'midnight'" | Roll over at midnight, if | Used to compute initial | | | *atTime* not specified, else | rollover time | | | at time *atTime* | | +------------------+------------------------------+---------------------------+ When using weekday-based rotation, specify ‘W0’ for Monday, ‘W1’ for Tuesday, and so on up to ‘W6’ for Sunday. In this case, the value passed for *interval* isn’t used. The system will save old log files by appending extensions to the filename. The extensions are date-and-time based, using the strftime format "%Y-%m-%d_%H-%M-%S" or a leading portion thereof, depending on the rollover interval. When computing the next rollover time for the first time (when the handler is created), the last modification time of an existing log file, or else the current time, is used to compute when the next rotation will occur. If the *utc* argument is true, times in UTC will be used; otherwise local time is used. If *backupCount* is nonzero, at most *backupCount* files will be kept, and if more would be created when rollover occurs, the oldest one is deleted. The deletion logic uses the interval to determine which files to delete, so changing the interval may leave old files lying around. If *delay* is true, then file opening is deferred until the first call to "emit()". If *atTime* is not "None", it must be a "datetime.time" instance which specifies the time of day when rollover occurs, for the cases where rollover is set to happen “at midnight” or “on a particular weekday”. Note that in these cases, the *atTime* value is effectively used to compute the *initial* rollover, and subsequent rollovers would be calculated via the normal interval calculation. If *errors* is specified, it’s used to determine how encoding errors are handled. Note: Calculation of the initial rollover time is done when the handler is initialised. Calculation of subsequent rollover times is done only when rollover occurs, and rollover occurs only when emitting output. If this is not kept in mind, it might lead to some confusion. For example, if an interval of “every minute” is set, that does not mean you will always see log files with times (in the filename) separated by a minute; if, during application execution, logging output is generated more frequently than once a minute, *then* you can expect to see log files with times separated by a minute. If, on the other hand, logging messages are only output once every five minutes (say), then there will be gaps in the file times corresponding to the minutes where no output (and hence no rollover) occurred. Changed in version 3.4: *atTime* parameter was added. Changed in version 3.6: As well as string values, "Path" objects are also accepted for the *filename* argument. Changed in version 3.9: The *errors* parameter was added. doRollover() Does a rollover, as described above. emit(record) Outputs the record to the file, catering for rollover as described above. getFilesToDelete() Returns a list of filenames which should be deleted as part of rollover. These are the absolute paths of the oldest backup log files written by the handler. SocketHandler ============= The "SocketHandler" class, located in the "logging.handlers" module, sends logging output to a network socket. The base class uses a TCP socket. class logging.handlers.SocketHandler(host, port) Returns a new instance of the "SocketHandler" class intended to communicate with a remote machine whose address is given by *host* and *port*. Changed in version 3.4: If "port" is specified as "None", a Unix domain socket is created using the value in "host" - otherwise, a TCP socket is created. close() Closes the socket. emit() Pickles the record’s attribute dictionary and writes it to the socket in binary format. If there is an error with the socket, silently drops the packet. If the connection was previously lost, re-establishes the connection. To unpickle the record at the receiving end into a "LogRecord", use the "makeLogRecord()" function. handleError() Handles an error which has occurred during "emit()". The most likely cause is a lost connection. Closes the socket so that we can retry on the next event. makeSocket() This is a factory method which allows subclasses to define the precise type of socket they want. The default implementation creates a TCP socket ("socket.SOCK_STREAM"). makePickle(record) Pickles the record’s attribute dictionary in binary format with a length prefix, and returns it ready for transmission across the socket. The details of this operation are equivalent to: data = pickle.dumps(record_attr_dict, 1) datalen = struct.pack('>L', len(data)) return datalen + data Note that pickles aren’t completely secure. If you are concerned about security, you may want to override this method to implement a more secure mechanism. For example, you can sign pickles using HMAC and then verify them on the receiving end, or alternatively you can disable unpickling of global objects on the receiving end. send(packet) Send a pickled byte-string *packet* to the socket. The format of the sent byte-string is as described in the documentation for "makePickle()". This function allows for partial sends, which can happen when the network is busy. createSocket() Tries to create a socket; on failure, uses an exponential back- off algorithm. On initial failure, the handler will drop the message it was trying to send. When subsequent messages are handled by the same instance, it will not try connecting until some time has passed. The default parameters are such that the initial delay is one second, and if after that delay the connection still can’t be made, the handler will double the delay each time up to a maximum of 30 seconds. This behaviour is controlled by the following handler attributes: * "retryStart" (initial delay, defaulting to 1.0 seconds). * "retryFactor" (multiplier, defaulting to 2.0). * "retryMax" (maximum delay, defaulting to 30.0 seconds). This means that if the remote listener starts up *after* the handler has been used, you could lose messages (since the handler won’t even attempt a connection until the delay has elapsed, but just silently drop messages during the delay period). DatagramHandler =============== The "DatagramHandler" class, located in the "logging.handlers" module, inherits from "SocketHandler" to support sending logging messages over UDP sockets. class logging.handlers.DatagramHandler(host, port) Returns a new instance of the "DatagramHandler" class intended to communicate with a remote machine whose address is given by *host* and *port*. Changed in version 3.4: If "port" is specified as "None", a Unix domain socket is created using the value in "host" - otherwise, a UDP socket is created. emit() Pickles the record’s attribute dictionary and writes it to the socket in binary format. If there is an error with the socket, silently drops the packet. To unpickle the record at the receiving end into a "LogRecord", use the "makeLogRecord()" function. makeSocket() The factory method of "SocketHandler" is here overridden to create a UDP socket ("socket.SOCK_DGRAM"). send(s) Send a pickled byte-string to a socket. The format of the sent byte-string is as described in the documentation for "SocketHandler.makePickle()". SysLogHandler ============= The "SysLogHandler" class, located in the "logging.handlers" module, supports sending logging messages to a remote or local Unix syslog. class logging.handlers.SysLogHandler(address=('localhost', SYSLOG_UDP_PORT), facility=LOG_USER, socktype=socket.SOCK_DGRAM) Returns a new instance of the "SysLogHandler" class intended to communicate with a remote Unix machine whose address is given by *address* in the form of a "(host, port)" tuple. If *address* is not specified, "('localhost', 514)" is used. The address is used to open a socket. An alternative to providing a "(host, port)" tuple is providing an address as a string, for example ‘/dev/log’. In this case, a Unix domain socket is used to send the message to the syslog. If *facility* is not specified, "LOG_USER" is used. The type of socket opened depends on the *socktype* argument, which defaults to "socket.SOCK_DGRAM" and thus opens a UDP socket. To open a TCP socket (for use with the newer syslog daemons such as rsyslog), specify a value of "socket.SOCK_STREAM". Note that if your server is not listening on UDP port 514, "SysLogHandler" may appear not to work. In that case, check what address you should be using for a domain socket - it’s system dependent. For example, on Linux it’s usually ‘/dev/log’ but on OS/X it’s ‘/var/run/syslog’. You’ll need to check your platform and use the appropriate address (you may need to do this check at runtime if your application needs to run on several platforms). On Windows, you pretty much have to use the UDP option. Changed in version 3.2: *socktype* was added. close() Closes the socket to the remote host. emit(record) The record is formatted, and then sent to the syslog server. If exception information is present, it is *not* sent to the server. Changed in version 3.2.1: (See: bpo-12168.) In earlier versions, the message sent to the syslog daemons was always terminated with a NUL byte, because early versions of these daemons expected a NUL terminated message - even though it’s not in the relevant specification (**RFC 5424**). More recent versions of these daemons don’t expect the NUL byte but strip it off if it’s there, and even more recent daemons (which adhere more closely to RFC 5424) pass the NUL byte on as part of the message.To enable easier handling of syslog messages in the face of all these differing daemon behaviours, the appending of the NUL byte has been made configurable, through the use of a class-level attribute, "append_nul". This defaults to "True" (preserving the existing behaviour) but can be set to "False" on a "SysLogHandler" instance in order for that instance to *not* append the NUL terminator. Changed in version 3.3: (See: bpo-12419.) In earlier versions, there was no facility for an “ident” or “tag” prefix to identify the source of the message. This can now be specified using a class-level attribute, defaulting to """" to preserve existing behaviour, but which can be overridden on a "SysLogHandler" instance in order for that instance to prepend the ident to every message handled. Note that the provided ident must be text, not bytes, and is prepended to the message exactly as is. encodePriority(facility, priority) Encodes the facility and priority into an integer. You can pass in strings or integers - if strings are passed, internal mapping dictionaries are used to convert them to integers. The symbolic "LOG_" values are defined in "SysLogHandler" and mirror the values defined in the "sys/syslog.h" header file. **Priorities** +----------------------------+-----------------+ | Name (string) | Symbolic value | |============================|=================| | "alert" | LOG_ALERT | +----------------------------+-----------------+ | "crit" or "critical" | LOG_CRIT | +----------------------------+-----------------+ | "debug" | LOG_DEBUG | +----------------------------+-----------------+ | "emerg" or "panic" | LOG_EMERG | +----------------------------+-----------------+ | "err" or "error" | LOG_ERR | +----------------------------+-----------------+ | "info" | LOG_INFO | +----------------------------+-----------------+ | "notice" | LOG_NOTICE | +----------------------------+-----------------+ | "warn" or "warning" | LOG_WARNING | +----------------------------+-----------------+ **Facilities** +-----------------+-----------------+ | Name (string) | Symbolic value | |=================|=================| | "auth" | LOG_AUTH | +-----------------+-----------------+ | "authpriv" | LOG_AUTHPRIV | +-----------------+-----------------+ | "cron" | LOG_CRON | +-----------------+-----------------+ | "daemon" | LOG_DAEMON | +-----------------+-----------------+ | "ftp" | LOG_FTP | +-----------------+-----------------+ | "kern" | LOG_KERN | +-----------------+-----------------+ | "lpr" | LOG_LPR | +-----------------+-----------------+ | "mail" | LOG_MAIL | +-----------------+-----------------+ | "news" | LOG_NEWS | +-----------------+-----------------+ | "syslog" | LOG_SYSLOG | +-----------------+-----------------+ | "user" | LOG_USER | +-----------------+-----------------+ | "uucp" | LOG_UUCP | +-----------------+-----------------+ | "local0" | LOG_LOCAL0 | +-----------------+-----------------+ | "local1" | LOG_LOCAL1 | +-----------------+-----------------+ | "local2" | LOG_LOCAL2 | +-----------------+-----------------+ | "local3" | LOG_LOCAL3 | +-----------------+-----------------+ | "local4" | LOG_LOCAL4 | +-----------------+-----------------+ | "local5" | LOG_LOCAL5 | +-----------------+-----------------+ | "local6" | LOG_LOCAL6 | +-----------------+-----------------+ | "local7" | LOG_LOCAL7 | +-----------------+-----------------+ mapPriority(levelname) Maps a logging level name to a syslog priority name. You may need to override this if you are using custom levels, or if the default algorithm is not suitable for your needs. The default algorithm maps "DEBUG", "INFO", "WARNING", "ERROR" and "CRITICAL" to the equivalent syslog names, and all other level names to ‘warning’. NTEventLogHandler ================= The "NTEventLogHandler" class, located in the "logging.handlers" module, supports sending logging messages to a local Windows NT, Windows 2000 or Windows XP event log. Before you can use it, you need Mark Hammond’s Win32 extensions for Python installed. class logging.handlers.NTEventLogHandler(appname, dllname=None, logtype='Application') Returns a new instance of the "NTEventLogHandler" class. The *appname* is used to define the application name as it appears in the event log. An appropriate registry entry is created using this name. The *dllname* should give the fully qualified pathname of a .dll or .exe which contains message definitions to hold in the log (if not specified, "'win32service.pyd'" is used - this is installed with the Win32 extensions and contains some basic placeholder message definitions. Note that use of these placeholders will make your event logs big, as the entire message source is held in the log. If you want slimmer logs, you have to pass in the name of your own .dll or .exe which contains the message definitions you want to use in the event log). The *logtype* is one of "'Application'", "'System'" or "'Security'", and defaults to "'Application'". close() At this point, you can remove the application name from the registry as a source of event log entries. However, if you do this, you will not be able to see the events as you intended in the Event Log Viewer - it needs to be able to access the registry to get the .dll name. The current version does not do this. emit(record) Determines the message ID, event category and event type, and then logs the message in the NT event log. getEventCategory(record) Returns the event category for the record. Override this if you want to specify your own categories. This version returns 0. getEventType(record) Returns the event type for the record. Override this if you want to specify your own types. This version does a mapping using the handler’s typemap attribute, which is set up in "__init__()" to a dictionary which contains mappings for "DEBUG", "INFO", "WARNING", "ERROR" and "CRITICAL". If you are using your own levels, you will either need to override this method or place a suitable dictionary in the handler’s *typemap* attribute. getMessageID(record) Returns the message ID for the record. If you are using your own messages, you could do this by having the *msg* passed to the logger being an ID rather than a format string. Then, in here, you could use a dictionary lookup to get the message ID. This version returns 1, which is the base message ID in "win32service.pyd". SMTPHandler =========== The "SMTPHandler" class, located in the "logging.handlers" module, supports sending logging messages to an email address via SMTP. class logging.handlers.SMTPHandler(mailhost, fromaddr, toaddrs, subject, credentials=None, secure=None, timeout=1.0) Returns a new instance of the "SMTPHandler" class. The instance is initialized with the from and to addresses and subject line of the email. The *toaddrs* should be a list of strings. To specify a non- standard SMTP port, use the (host, port) tuple format for the *mailhost* argument. If you use a string, the standard SMTP port is used. If your SMTP server requires authentication, you can specify a (username, password) tuple for the *credentials* argument. To specify the use of a secure protocol (TLS), pass in a tuple to the *secure* argument. This will only be used when authentication credentials are supplied. The tuple should be either an empty tuple, or a single-value tuple with the name of a keyfile, or a 2-value tuple with the names of the keyfile and certificate file. (This tuple is passed to the "smtplib.SMTP.starttls()" method.) A timeout can be specified for communication with the SMTP server using the *timeout* argument. New in version 3.3: The *timeout* argument was added. emit(record) Formats the record and sends it to the specified addressees. getSubject(record) If you want to specify a subject line which is record-dependent, override this method. MemoryHandler ============= The "MemoryHandler" class, located in the "logging.handlers" module, supports buffering of logging records in memory, periodically flushing them to a *target* handler. Flushing occurs whenever the buffer is full, or when an event of a certain severity or greater is seen. "MemoryHandler" is a subclass of the more general "BufferingHandler", which is an abstract class. This buffers logging records in memory. Whenever each record is added to the buffer, a check is made by calling "shouldFlush()" to see if the buffer should be flushed. If it should, then "flush()" is expected to do the flushing. class logging.handlers.BufferingHandler(capacity) Initializes the handler with a buffer of the specified capacity. Here, *capacity* means the number of logging records buffered. emit(record) Append the record to the buffer. If "shouldFlush()" returns true, call "flush()" to process the buffer. flush() You can override this to implement custom flushing behavior. This version just zaps the buffer to empty. shouldFlush(record) Return "True" if the buffer is up to capacity. This method can be overridden to implement custom flushing strategies. class logging.handlers.MemoryHandler(capacity, flushLevel=ERROR, target=None, flushOnClose=True) Returns a new instance of the "MemoryHandler" class. The instance is initialized with a buffer size of *capacity* (number of records buffered). If *flushLevel* is not specified, "ERROR" is used. If no *target* is specified, the target will need to be set using "setTarget()" before this handler does anything useful. If *flushOnClose* is specified as "False", then the buffer is *not* flushed when the handler is closed. If not specified or specified as "True", the previous behaviour of flushing the buffer will occur when the handler is closed. Changed in version 3.6: The *flushOnClose* parameter was added. close() Calls "flush()", sets the target to "None" and clears the buffer. flush() For a "MemoryHandler", flushing means just sending the buffered records to the target, if there is one. The buffer is also cleared when this happens. Override if you want different behavior. setTarget(target) Sets the target handler for this handler. shouldFlush(record) Checks for buffer full or a record at the *flushLevel* or higher. HTTPHandler =========== The "HTTPHandler" class, located in the "logging.handlers" module, supports sending logging messages to a Web server, using either "GET" or "POST" semantics. class logging.handlers.HTTPHandler(host, url, method='GET', secure=False, credentials=None, context=None) Returns a new instance of the "HTTPHandler" class. The *host* can be of the form "host:port", should you need to use a specific port number. If no *method* is specified, "GET" is used. If *secure* is true, a HTTPS connection will be used. The *context* parameter may be set to a "ssl.SSLContext" instance to configure the SSL settings used for the HTTPS connection. If *credentials* is specified, it should be a 2-tuple consisting of userid and password, which will be placed in a HTTP ‘Authorization’ header using Basic authentication. If you specify credentials, you should also specify secure=True so that your userid and password are not passed in cleartext across the wire. Changed in version 3.5: The *context* parameter was added. mapLogRecord(record) Provides a dictionary, based on "record", which is to be URL- encoded and sent to the web server. The default implementation just returns "record.__dict__". This method can be overridden if e.g. only a subset of "LogRecord" is to be sent to the web server, or if more specific customization of what’s sent to the server is required. emit(record) Sends the record to the Web server as a URL-encoded dictionary. The "mapLogRecord()" method is used to convert the record to the dictionary to be sent. Note: Since preparing a record for sending it to a Web server is not the same as a generic formatting operation, using "setFormatter()" to specify a "Formatter" for a "HTTPHandler" has no effect. Instead of calling "format()", this handler calls "mapLogRecord()" and then "urllib.parse.urlencode()" to encode the dictionary in a form suitable for sending to a Web server. QueueHandler ============ New in version 3.2. The "QueueHandler" class, located in the "logging.handlers" module, supports sending logging messages to a queue, such as those implemented in the "queue" or "multiprocessing" modules. Along with the "QueueListener" class, "QueueHandler" can be used to let handlers do their work on a separate thread from the one which does the logging. This is important in Web applications and also other service applications where threads servicing clients need to respond as quickly as possible, while any potentially slow operations (such as sending an email via "SMTPHandler") are done on a separate thread. class logging.handlers.QueueHandler(queue) Returns a new instance of the "QueueHandler" class. The instance is initialized with the queue to send messages to. The *queue* can be any queue-like object; it’s used as-is by the "enqueue()" method, which needs to know how to send messages to it. The queue is not *required* to have the task tracking API, which means that you can use "SimpleQueue" instances for *queue*. emit(record) Enqueues the result of preparing the LogRecord. Should an exception occur (e.g. because a bounded queue has filled up), the "handleError()" method is called to handle the error. This can result in the record silently being dropped (if "logging.raiseExceptions" is "False") or a message printed to "sys.stderr" (if "logging.raiseExceptions" is "True"). prepare(record) Prepares a record for queuing. The object returned by this method is enqueued. The base implementation formats the record to merge the message, arguments, and exception information, if present. It also removes unpickleable items from the record in-place. You might want to override this method if you want to convert the record to a dict or JSON string, or send a modified copy of the record while leaving the original intact. enqueue(record) Enqueues the record on the queue using "put_nowait()"; you may want to override this if you want to use blocking behaviour, or a timeout, or a customized queue implementation. QueueListener ============= New in version 3.2. The "QueueListener" class, located in the "logging.handlers" module, supports receiving logging messages from a queue, such as those implemented in the "queue" or "multiprocessing" modules. The messages are received from a queue in an internal thread and passed, on the same thread, to one or more handlers for processing. While "QueueListener" is not itself a handler, it is documented here because it works hand-in-hand with "QueueHandler". Along with the "QueueHandler" class, "QueueListener" can be used to let handlers do their work on a separate thread from the one which does the logging. This is important in Web applications and also other service applications where threads servicing clients need to respond as quickly as possible, while any potentially slow operations (such as sending an email via "SMTPHandler") are done on a separate thread. class logging.handlers.QueueListener(queue, *handlers, respect_handler_level=False) Returns a new instance of the "QueueListener" class. The instance is initialized with the queue to send messages to and a list of handlers which will handle entries placed on the queue. The queue can be any queue-like object; it’s passed as-is to the "dequeue()" method, which needs to know how to get messages from it. The queue is not *required* to have the task tracking API (though it’s used if available), which means that you can use "SimpleQueue" instances for *queue*. If "respect_handler_level" is "True", a handler’s level is respected (compared with the level for the message) when deciding whether to pass messages to that handler; otherwise, the behaviour is as in previous Python versions - to always pass each message to each handler. Changed in version 3.5: The "respect_handler_level" argument was added. dequeue(block) Dequeues a record and return it, optionally blocking. The base implementation uses "get()". You may want to override this method if you want to use timeouts or work with custom queue implementations. prepare(record) Prepare a record for handling. This implementation just returns the passed-in record. You may want to override this method if you need to do any custom marshalling or manipulation of the record before passing it to the handlers. handle(record) Handle a record. This just loops through the handlers offering them the record to handle. The actual object passed to the handlers is that which is returned from "prepare()". start() Starts the listener. This starts up a background thread to monitor the queue for LogRecords to process. stop() Stops the listener. This asks the thread to terminate, and then waits for it to do so. Note that if you don’t call this before your application exits, there may be some records still left on the queue, which won’t be processed. enqueue_sentinel() Writes a sentinel to the queue to tell the listener to quit. This implementation uses "put_nowait()". You may want to override this method if you want to use timeouts or work with custom queue implementations. New in version 3.3. See also: Module "logging" API reference for the logging module. Module "logging.config" Configuration API for the logging module. "logging" — Logging facility for Python *************************************** **Source code:** Lib/logging/__init__.py Important ^^^^^^^^^ This page contains the API reference information. For tutorial information and discussion of more advanced topics, see * Basic Tutorial * Advanced Tutorial * Logging Cookbook ====================================================================== This module defines functions and classes which implement a flexible event logging system for applications and libraries. The key benefit of having the logging API provided by a standard library module is that all Python modules can participate in logging, so your application log can include your own messages integrated with messages from third-party modules. The module provides a lot of functionality and flexibility. If you are unfamiliar with logging, the best way to get to grips with it is to see the tutorials (see the links on the right). The basic classes defined by the module, together with their functions, are listed below. * Loggers expose the interface that application code directly uses. * Handlers send the log records (created by loggers) to the appropriate destination. * Filters provide a finer grained facility for determining which log records to output. * Formatters specify the layout of log records in the final output. Logger Objects ============== Loggers have the following attributes and methods. Note that Loggers should *NEVER* be instantiated directly, but always through the module-level function "logging.getLogger(name)". Multiple calls to "getLogger()" with the same name will always return a reference to the same Logger object. The "name" is potentially a period-separated hierarchical value, like "foo.bar.baz" (though it could also be just plain "foo", for example). Loggers that are further down in the hierarchical list are children of loggers higher up in the list. For example, given a logger with a name of "foo", loggers with names of "foo.bar", "foo.bar.baz", and "foo.bam" are all descendants of "foo". The logger name hierarchy is analogous to the Python package hierarchy, and identical to it if you organise your loggers on a per-module basis using the recommended construction "logging.getLogger(__name__)". That’s because in a module, "__name__" is the module’s name in the Python package namespace. class logging.Logger propagate If this attribute evaluates to true, events logged to this logger will be passed to the handlers of higher level (ancestor) loggers, in addition to any handlers attached to this logger. Messages are passed directly to the ancestor loggers’ handlers - neither the level nor filters of the ancestor loggers in question are considered. If this evaluates to false, logging messages are not passed to the handlers of ancestor loggers. Spelling it out with an example: If the propagate attribute of the logger named "A.B.C" evaluates to true, any event logged to "A.B.C" via a method call such as "logging.getLogger('A.B.C').error(...)" will [subject to passing that logger’s level and filter settings] be passed in turn to any handlers attached to loggers named "A.B", "A" and the root logger, after first being passed to any handlers attached to "A.B.C". If any logger in the chain "A.B.C", "A.B", "A" has its "propagate" attribute set to false, then that is the last logger whose handlers are offered the event to handle, and propagation stops at that point. The constructor sets this attribute to "True". Note: If you attach a handler to a logger *and* one or more of its ancestors, it may emit the same record multiple times. In general, you should not need to attach a handler to more than one logger - if you just attach it to the appropriate logger which is highest in the logger hierarchy, then it will see all events logged by all descendant loggers, provided that their propagate setting is left set to "True". A common scenario is to attach handlers only to the root logger, and to let propagation take care of the rest. setLevel(level) Sets the threshold for this logger to *level*. Logging messages which are less severe than *level* will be ignored; logging messages which have severity *level* or higher will be emitted by whichever handler or handlers service this logger, unless a handler’s level has been set to a higher severity level than *level*. When a logger is created, the level is set to "NOTSET" (which causes all messages to be processed when the logger is the root logger, or delegation to the parent when the logger is a non- root logger). Note that the root logger is created with level "WARNING". The term ‘delegation to the parent’ means that if a logger has a level of NOTSET, its chain of ancestor loggers is traversed until either an ancestor with a level other than NOTSET is found, or the root is reached. If an ancestor is found with a level other than NOTSET, then that ancestor’s level is treated as the effective level of the logger where the ancestor search began, and is used to determine how a logging event is handled. If the root is reached, and it has a level of NOTSET, then all messages will be processed. Otherwise, the root’s level will be used as the effective level. See Logging Levels for a list of levels. Changed in version 3.2: The *level* parameter now accepts a string representation of the level such as ‘INFO’ as an alternative to the integer constants such as "INFO". Note, however, that levels are internally stored as integers, and methods such as e.g. "getEffectiveLevel()" and "isEnabledFor()" will return/expect to be passed integers. isEnabledFor(level) Indicates if a message of severity *level* would be processed by this logger. This method checks first the module-level level set by "logging.disable(level)" and then the logger’s effective level as determined by "getEffectiveLevel()". getEffectiveLevel() Indicates the effective level for this logger. If a value other than "NOTSET" has been set using "setLevel()", it is returned. Otherwise, the hierarchy is traversed towards the root until a value other than "NOTSET" is found, and that value is returned. The value returned is an integer, typically one of "logging.DEBUG", "logging.INFO" etc. getChild(suffix) Returns a logger which is a descendant to this logger, as determined by the suffix. Thus, "logging.getLogger('abc').getChild('def.ghi')" would return the same logger as would be returned by "logging.getLogger('abc.def.ghi')". This is a convenience method, useful when the parent logger is named using e.g. "__name__" rather than a literal string. New in version 3.2. debug(msg, *args, **kwargs) Logs a message with level "DEBUG" on this logger. The *msg* is the message format string, and the *args* are the arguments which are merged into *msg* using the string formatting operator. (Note that this means that you can use keywords in the format string, together with a single dictionary argument.) No % formatting operation is performed on *msg* when no *args* are supplied. There are four keyword arguments in *kwargs* which are inspected: *exc_info*, *stack_info*, *stacklevel* and *extra*. If *exc_info* does not evaluate as false, it causes exception information to be added to the logging message. If an exception tuple (in the format returned by "sys.exc_info()") or an exception instance is provided, it is used; otherwise, "sys.exc_info()" is called to get the exception information. The second optional keyword argument is *stack_info*, which defaults to "False". If true, stack information is added to the logging message, including the actual logging call. Note that this is not the same stack information as that displayed through specifying *exc_info*: The former is stack frames from the bottom of the stack up to the logging call in the current thread, whereas the latter is information about stack frames which have been unwound, following an exception, while searching for exception handlers. You can specify *stack_info* independently of *exc_info*, e.g. to just show how you got to a certain point in your code, even when no exceptions were raised. The stack frames are printed following a header line which says: Stack (most recent call last): This mimics the "Traceback (most recent call last):" which is used when displaying exception frames. The third optional keyword argument is *stacklevel*, which defaults to "1". If greater than 1, the corresponding number of stack frames are skipped when computing the line number and function name set in the "LogRecord" created for the logging event. This can be used in logging helpers so that the function name, filename and line number recorded are not the information for the helper function/method, but rather its caller. The name of this parameter mirrors the equivalent one in the "warnings" module. The fourth keyword argument is *extra* which can be used to pass a dictionary which is used to populate the __dict__ of the "LogRecord" created for the logging event with user-defined attributes. These custom attributes can then be used as you like. For example, they could be incorporated into logged messages. For example: FORMAT = '%(asctime)s %(clientip)-15s %(user)-8s %(message)s' logging.basicConfig(format=FORMAT) d = {'clientip': '192.168.0.1', 'user': 'fbloggs'} logger = logging.getLogger('tcpserver') logger.warning('Protocol problem: %s', 'connection reset', extra=d) would print something like 2006-02-08 22:20:02,165 192.168.0.1 fbloggs Protocol problem: connection reset The keys in the dictionary passed in *extra* should not clash with the keys used by the logging system. (See the "Formatter" documentation for more information on which keys are used by the logging system.) If you choose to use these attributes in logged messages, you need to exercise some care. In the above example, for instance, the "Formatter" has been set up with a format string which expects ‘clientip’ and ‘user’ in the attribute dictionary of the "LogRecord". If these are missing, the message will not be logged because a string formatting exception will occur. So in this case, you always need to pass the *extra* dictionary with these keys. While this might be annoying, this feature is intended for use in specialized circumstances, such as multi-threaded servers where the same code executes in many contexts, and interesting conditions which arise are dependent on this context (such as remote client IP address and authenticated user name, in the above example). In such circumstances, it is likely that specialized "Formatter"s would be used with particular "Handler"s. Changed in version 3.2: The *stack_info* parameter was added. Changed in version 3.5: The *exc_info* parameter can now accept exception instances. Changed in version 3.8: The *stacklevel* parameter was added. info(msg, *args, **kwargs) Logs a message with level "INFO" on this logger. The arguments are interpreted as for "debug()". warning(msg, *args, **kwargs) Logs a message with level "WARNING" on this logger. The arguments are interpreted as for "debug()". Note: There is an obsolete method "warn" which is functionally identical to "warning". As "warn" is deprecated, please do not use it - use "warning" instead. error(msg, *args, **kwargs) Logs a message with level "ERROR" on this logger. The arguments are interpreted as for "debug()". critical(msg, *args, **kwargs) Logs a message with level "CRITICAL" on this logger. The arguments are interpreted as for "debug()". log(level, msg, *args, **kwargs) Logs a message with integer level *level* on this logger. The other arguments are interpreted as for "debug()". exception(msg, *args, **kwargs) Logs a message with level "ERROR" on this logger. The arguments are interpreted as for "debug()". Exception info is added to the logging message. This method should only be called from an exception handler. addFilter(filter) Adds the specified filter *filter* to this logger. removeFilter(filter) Removes the specified filter *filter* from this logger. filter(record) Apply this logger’s filters to the record and return "True" if the record is to be processed. The filters are consulted in turn, until one of them returns a false value. If none of them return a false value, the record will be processed (passed to handlers). If one returns a false value, no further processing of the record occurs. addHandler(hdlr) Adds the specified handler *hdlr* to this logger. removeHandler(hdlr) Removes the specified handler *hdlr* from this logger. findCaller(stack_info=False, stacklevel=1) Finds the caller’s source filename and line number. Returns the filename, line number, function name and stack information as a 4-element tuple. The stack information is returned as "None" unless *stack_info* is "True". The *stacklevel* parameter is passed from code calling the "debug()" and other APIs. If greater than 1, the excess is used to skip stack frames before determining the values to be returned. This will generally be useful when calling logging APIs from helper/wrapper code, so that the information in the event log refers not to the helper/wrapper code, but to the code that calls it. handle(record) Handles a record by passing it to all handlers associated with this logger and its ancestors (until a false value of *propagate* is found). This method is used for unpickled records received from a socket, as well as those created locally. Logger-level filtering is applied using "filter()". makeRecord(name, level, fn, lno, msg, args, exc_info, func=None, extra=None, sinfo=None) This is a factory method which can be overridden in subclasses to create specialized "LogRecord" instances. hasHandlers() Checks to see if this logger has any handlers configured. This is done by looking for handlers in this logger and its parents in the logger hierarchy. Returns "True" if a handler was found, else "False". The method stops searching up the hierarchy whenever a logger with the ‘propagate’ attribute set to false is found - that will be the last logger which is checked for the existence of handlers. New in version 3.2. Changed in version 3.7: Loggers can now be pickled and unpickled. Logging Levels ============== The numeric values of logging levels are given in the following table. These are primarily of interest if you want to define your own levels, and need them to have specific values relative to the predefined levels. If you define a level with the same numeric value, it overwrites the predefined value; the predefined name is lost. +----------------+-----------------+ | Level | Numeric value | |================|=================| | "CRITICAL" | 50 | +----------------+-----------------+ | "ERROR" | 40 | +----------------+-----------------+ | "WARNING" | 30 | +----------------+-----------------+ | "INFO" | 20 | +----------------+-----------------+ | "DEBUG" | 10 | +----------------+-----------------+ | "NOTSET" | 0 | +----------------+-----------------+ Handler Objects =============== Handlers have the following attributes and methods. Note that "Handler" is never instantiated directly; this class acts as a base for more useful subclasses. However, the "__init__()" method in subclasses needs to call "Handler.__init__()". class logging.Handler __init__(level=NOTSET) Initializes the "Handler" instance by setting its level, setting the list of filters to the empty list and creating a lock (using "createLock()") for serializing access to an I/O mechanism. createLock() Initializes a thread lock which can be used to serialize access to underlying I/O functionality which may not be threadsafe. acquire() Acquires the thread lock created with "createLock()". release() Releases the thread lock acquired with "acquire()". setLevel(level) Sets the threshold for this handler to *level*. Logging messages which are less severe than *level* will be ignored. When a handler is created, the level is set to "NOTSET" (which causes all messages to be processed). See Logging Levels for a list of levels. Changed in version 3.2: The *level* parameter now accepts a string representation of the level such as ‘INFO’ as an alternative to the integer constants such as "INFO". setFormatter(fmt) Sets the "Formatter" for this handler to *fmt*. addFilter(filter) Adds the specified filter *filter* to this handler. removeFilter(filter) Removes the specified filter *filter* from this handler. filter(record) Apply this handler’s filters to the record and return "True" if the record is to be processed. The filters are consulted in turn, until one of them returns a false value. If none of them return a false value, the record will be emitted. If one returns a false value, the handler will not emit the record. flush() Ensure all logging output has been flushed. This version does nothing and is intended to be implemented by subclasses. close() Tidy up any resources used by the handler. This version does no output but removes the handler from an internal list of handlers which is closed when "shutdown()" is called. Subclasses should ensure that this gets called from overridden "close()" methods. handle(record) Conditionally emits the specified logging record, depending on filters which may have been added to the handler. Wraps the actual emission of the record with acquisition/release of the I/O thread lock. handleError(record) This method should be called from handlers when an exception is encountered during an "emit()" call. If the module-level attribute "raiseExceptions" is "False", exceptions get silently ignored. This is what is mostly wanted for a logging system - most users will not care about errors in the logging system, they are more interested in application errors. You could, however, replace this with a custom handler if you wish. The specified record is the one which was being processed when the exception occurred. (The default value of "raiseExceptions" is "True", as that is more useful during development). format(record) Do formatting for a record - if a formatter is set, use it. Otherwise, use the default formatter for the module. emit(record) Do whatever it takes to actually log the specified logging record. This version is intended to be implemented by subclasses and so raises a "NotImplementedError". For a list of handlers included as standard, see "logging.handlers". Formatter Objects ================= "Formatter" objects have the following attributes and methods. They are responsible for converting a "LogRecord" to (usually) a string which can be interpreted by either a human or an external system. The base "Formatter" allows a formatting string to be specified. If none is supplied, the default value of "'%(message)s'" is used, which just includes the message in the logging call. To have additional items of information in the formatted output (such as a timestamp), keep reading. A Formatter can be initialized with a format string which makes use of knowledge of the "LogRecord" attributes - such as the default value mentioned above making use of the fact that the user’s message and arguments are pre-formatted into a "LogRecord"’s *message* attribute. This format string contains standard Python %-style mapping keys. See section printf-style String Formatting for more information on string formatting. The useful mapping keys in a "LogRecord" are given in the section on LogRecord attributes. class logging.Formatter(fmt=None, datefmt=None, style='%', validate=True) Returns a new instance of the "Formatter" class. The instance is initialized with a format string for the message as a whole, as well as a format string for the date/time portion of a message. If no *fmt* is specified, "'%(message)s'" is used. If no *datefmt* is specified, a format is used which is described in the "formatTime()" documentation. The *style* parameter can be one of ‘%’, ‘{’ or ‘$’ and determines how the format string will be merged with its data: using one of %-formatting, "str.format()" or "string.Template". This only applies to the format string *fmt* (e.g. "'%(message)s'" or "{message}"), not to the actual log messages passed to "Logger.debug" etc; see Using particular formatting styles throughout your application for more information on using {- and $-formatting for log messages. Changed in version 3.2: The *style* parameter was added. Changed in version 3.8: The *validate* parameter was added. Incorrect or mismatched style and fmt will raise a "ValueError". For example: "logging.Formatter('%(asctime)s - %(message)s', style='{')". format(record) The record’s attribute dictionary is used as the operand to a string formatting operation. Returns the resulting string. Before formatting the dictionary, a couple of preparatory steps are carried out. The *message* attribute of the record is computed using *msg* % *args*. If the formatting string contains "'(asctime)'", "formatTime()" is called to format the event time. If there is exception information, it is formatted using "formatException()" and appended to the message. Note that the formatted exception information is cached in attribute *exc_text*. This is useful because the exception information can be pickled and sent across the wire, but you should be careful if you have more than one "Formatter" subclass which customizes the formatting of exception information. In this case, you will have to clear the cached value after a formatter has done its formatting, so that the next formatter to handle the event doesn’t use the cached value but recalculates it afresh. If stack information is available, it’s appended after the exception information, using "formatStack()" to transform it if necessary. formatTime(record, datefmt=None) This method should be called from "format()" by a formatter which wants to make use of a formatted time. This method can be overridden in formatters to provide for any specific requirement, but the basic behavior is as follows: if *datefmt* (a string) is specified, it is used with "time.strftime()" to format the creation time of the record. Otherwise, the format ‘%Y-%m-%d %H:%M:%S,uuu’ is used, where the uuu part is a millisecond value and the other letters are as per the "time.strftime()" documentation. An example time in this format is "2003-01-23 00:29:50,411". The resulting string is returned. This function uses a user-configurable function to convert the creation time to a tuple. By default, "time.localtime()" is used; to change this for a particular formatter instance, set the "converter" attribute to a function with the same signature as "time.localtime()" or "time.gmtime()". To change it for all formatters, for example if you want all logging times to be shown in GMT, set the "converter" attribute in the "Formatter" class. Changed in version 3.3: Previously, the default format was hard- coded as in this example: "2010-09-06 22:38:15,292" where the part before the comma is handled by a strptime format string ("'%Y-%m-%d %H:%M:%S'"), and the part after the comma is a millisecond value. Because strptime does not have a format placeholder for milliseconds, the millisecond value is appended using another format string, "'%s,%03d'" — and both of these format strings have been hardcoded into this method. With the change, these strings are defined as class-level attributes which can be overridden at the instance level when desired. The names of the attributes are "default_time_format" (for the strptime format string) and "default_msec_format" (for appending the millisecond value). Changed in version 3.9: The "default_msec_format" can be "None". formatException(exc_info) Formats the specified exception information (a standard exception tuple as returned by "sys.exc_info()") as a string. This default implementation just uses "traceback.print_exception()". The resulting string is returned. formatStack(stack_info) Formats the specified stack information (a string as returned by "traceback.print_stack()", but with the last newline removed) as a string. This default implementation just returns the input value. Filter Objects ============== "Filters" can be used by "Handlers" and "Loggers" for more sophisticated filtering than is provided by levels. The base filter class only allows events which are below a certain point in the logger hierarchy. For example, a filter initialized with ‘A.B’ will allow events logged by loggers ‘A.B’, ‘A.B.C’, ‘A.B.C.D’, ‘A.B.D’ etc. but not ‘A.BB’, ‘B.A.B’ etc. If initialized with the empty string, all events are passed. class logging.Filter(name='') Returns an instance of the "Filter" class. If *name* is specified, it names a logger which, together with its children, will have its events allowed through the filter. If *name* is the empty string, allows every event. filter(record) Is the specified record to be logged? Returns zero for no, nonzero for yes. If deemed appropriate, the record may be modified in-place by this method. Note that filters attached to handlers are consulted before an event is emitted by the handler, whereas filters attached to loggers are consulted whenever an event is logged (using "debug()", "info()", etc.), before sending an event to handlers. This means that events which have been generated by descendant loggers will not be filtered by a logger’s filter setting, unless the filter has also been applied to those descendant loggers. You don’t actually need to subclass "Filter": you can pass any instance which has a "filter" method with the same semantics. Changed in version 3.2: You don’t need to create specialized "Filter" classes, or use other classes with a "filter" method: you can use a function (or other callable) as a filter. The filtering logic will check to see if the filter object has a "filter" attribute: if it does, it’s assumed to be a "Filter" and its "filter()" method is called. Otherwise, it’s assumed to be a callable and called with the record as the single parameter. The returned value should conform to that returned by "filter()". Although filters are used primarily to filter records based on more sophisticated criteria than levels, they get to see every record which is processed by the handler or logger they’re attached to: this can be useful if you want to do things like counting how many records were processed by a particular logger or handler, or adding, changing or removing attributes in the "LogRecord" being processed. Obviously changing the LogRecord needs to be done with some care, but it does allow the injection of contextual information into logs (see Using Filters to impart contextual information). LogRecord Objects ================= "LogRecord" instances are created automatically by the "Logger" every time something is logged, and can be created manually via "makeLogRecord()" (for example, from a pickled event received over the wire). class logging.LogRecord(name, level, pathname, lineno, msg, args, exc_info, func=None, sinfo=None) Contains all the information pertinent to the event being logged. The primary information is passed in "msg" and "args", which are combined using "msg % args" to create the "message" field of the record. Parameters: * **name** – The name of the logger used to log the event represented by this LogRecord. Note that this name will always have this value, even though it may be emitted by a handler attached to a different (ancestor) logger. * **level** – The numeric level of the logging event (one of DEBUG, INFO etc.) Note that this is converted to *two* attributes of the LogRecord: "levelno" for the numeric value and "levelname" for the corresponding level name. * **pathname** – The full pathname of the source file where the logging call was made. * **lineno** – The line number in the source file where the logging call was made. * **msg** – The event description message, possibly a format string with placeholders for variable data. * **args** – Variable data to merge into the *msg* argument to obtain the event description. * **exc_info** – An exception tuple with the current exception information, or "None" if no exception information is available. * **func** – The name of the function or method from which the logging call was invoked. * **sinfo** – A text string representing stack information from the base of the stack in the current thread, up to the logging call. getMessage() Returns the message for this "LogRecord" instance after merging any user-supplied arguments with the message. If the user- supplied message argument to the logging call is not a string, "str()" is called on it to convert it to a string. This allows use of user-defined classes as messages, whose "__str__" method can return the actual format string to be used. Changed in version 3.2: The creation of a "LogRecord" has been made more configurable by providing a factory which is used to create the record. The factory can be set using "getLogRecordFactory()" and "setLogRecordFactory()" (see this for the factory’s signature). This functionality can be used to inject your own values into a "LogRecord" at creation time. You can use the following pattern: old_factory = logging.getLogRecordFactory() def record_factory(*args, **kwargs): record = old_factory(*args, **kwargs) record.custom_attribute = 0xdecafbad return record logging.setLogRecordFactory(record_factory) With this pattern, multiple factories could be chained, and as long as they don’t overwrite each other’s attributes or unintentionally overwrite the standard attributes listed above, there should be no surprises. LogRecord attributes ==================== The LogRecord has a number of attributes, most of which are derived from the parameters to the constructor. (Note that the names do not always correspond exactly between the LogRecord constructor parameters and the LogRecord attributes.) These attributes can be used to merge data from the record into the format string. The following table lists (in alphabetical order) the attribute names, their meanings and the corresponding placeholder in a %-style format string. If you are using {}-formatting ("str.format()"), you can use "{attrname}" as the placeholder in the format string. If you are using $-formatting ("string.Template"), use the form "${attrname}". In both cases, of course, replace "attrname" with the actual attribute name you want to use. In the case of {}-formatting, you can specify formatting flags by placing them after the attribute name, separated from it with a colon. For example: a placeholder of "{msecs:03d}" would format a millisecond value of "4" as "004". Refer to the "str.format()" documentation for full details on the options available to you. +------------------+---------------------------+-------------------------------------------------+ | Attribute name | Format | Description | |==================|===========================|=================================================| | args | You shouldn’t need to | The tuple of arguments merged into "msg" to | | | format this yourself. | produce "message", or a dict whose values are | | | | used for the merge (when there is only one | | | | argument, and it is a dictionary). | +------------------+---------------------------+-------------------------------------------------+ | asctime | "%(asctime)s" | Human-readable time when the "LogRecord" was | | | | created. By default this is of the form | | | | ‘2003-07-08 16:49:45,896’ (the numbers after | | | | the comma are millisecond portion of the time). | +------------------+---------------------------+-------------------------------------------------+ | created | "%(created)f" | Time when the "LogRecord" was created (as | | | | returned by "time.time()"). | +------------------+---------------------------+-------------------------------------------------+ | exc_info | You shouldn’t need to | Exception tuple (à la "sys.exc_info") or, if no | | | format this yourself. | exception has occurred, "None". | +------------------+---------------------------+-------------------------------------------------+ | filename | "%(filename)s" | Filename portion of "pathname". | +------------------+---------------------------+-------------------------------------------------+ | funcName | "%(funcName)s" | Name of function containing the logging call. | +------------------+---------------------------+-------------------------------------------------+ | levelname | "%(levelname)s" | Text logging level for the message ("'DEBUG'", | | | | "'INFO'", "'WARNING'", "'ERROR'", | | | | "'CRITICAL'"). | +------------------+---------------------------+-------------------------------------------------+ | levelno | "%(levelno)s" | Numeric logging level for the message ("DEBUG", | | | | "INFO", "WARNING", "ERROR", "CRITICAL"). | +------------------+---------------------------+-------------------------------------------------+ | lineno | "%(lineno)d" | Source line number where the logging call was | | | | issued (if available). | +------------------+---------------------------+-------------------------------------------------+ | message | "%(message)s" | The logged message, computed as "msg % args". | | | | This is set when "Formatter.format()" is | | | | invoked. | +------------------+---------------------------+-------------------------------------------------+ | module | "%(module)s" | Module (name portion of "filename"). | +------------------+---------------------------+-------------------------------------------------+ | msecs | "%(msecs)d" | Millisecond portion of the time when the | | | | "LogRecord" was created. | +------------------+---------------------------+-------------------------------------------------+ | msg | You shouldn’t need to | The format string passed in the original | | | format this yourself. | logging call. Merged with "args" to produce | | | | "message", or an arbitrary object (see Using | | | | arbitrary objects as messages). | +------------------+---------------------------+-------------------------------------------------+ | name | "%(name)s" | Name of the logger used to log the call. | +------------------+---------------------------+-------------------------------------------------+ | pathname | "%(pathname)s" | Full pathname of the source file where the | | | | logging call was issued (if available). | +------------------+---------------------------+-------------------------------------------------+ | process | "%(process)d" | Process ID (if available). | +------------------+---------------------------+-------------------------------------------------+ | processName | "%(processName)s" | Process name (if available). | +------------------+---------------------------+-------------------------------------------------+ | relativeCreated | "%(relativeCreated)d" | Time in milliseconds when the LogRecord was | | | | created, relative to the time the logging | | | | module was loaded. | +------------------+---------------------------+-------------------------------------------------+ | stack_info | You shouldn’t need to | Stack frame information (where available) from | | | format this yourself. | the bottom of the stack in the current thread, | | | | up to and including the stack frame of the | | | | logging call which resulted in the creation of | | | | this record. | +------------------+---------------------------+-------------------------------------------------+ | thread | "%(thread)d" | Thread ID (if available). | +------------------+---------------------------+-------------------------------------------------+ | threadName | "%(threadName)s" | Thread name (if available). | +------------------+---------------------------+-------------------------------------------------+ Changed in version 3.1: *processName* was added. LoggerAdapter Objects ===================== "LoggerAdapter" instances are used to conveniently pass contextual information into logging calls. For a usage example, see the section on adding contextual information to your logging output. class logging.LoggerAdapter(logger, extra) Returns an instance of "LoggerAdapter" initialized with an underlying "Logger" instance and a dict-like object. process(msg, kwargs) Modifies the message and/or keyword arguments passed to a logging call in order to insert contextual information. This implementation takes the object passed as *extra* to the constructor and adds it to *kwargs* using key ‘extra’. The return value is a (*msg*, *kwargs*) tuple which has the (possibly modified) versions of the arguments passed in. In addition to the above, "LoggerAdapter" supports the following methods of "Logger": "debug()", "info()", "warning()", "error()", "exception()", "critical()", "log()", "isEnabledFor()", "getEffectiveLevel()", "setLevel()" and "hasHandlers()". These methods have the same signatures as their counterparts in "Logger", so you can use the two types of instances interchangeably. Changed in version 3.2: The "isEnabledFor()", "getEffectiveLevel()", "setLevel()" and "hasHandlers()" methods were added to "LoggerAdapter". These methods delegate to the underlying logger. Changed in version 3.6: Attribute "manager" and method "_log()" were added, which delegate to the underlying logger and allow adapters to be nested. Thread Safety ============= The logging module is intended to be thread-safe without any special work needing to be done by its clients. It achieves this though using threading locks; there is one lock to serialize access to the module’s shared data, and each handler also creates a lock to serialize access to its underlying I/O. If you are implementing asynchronous signal handlers using the "signal" module, you may not be able to use logging from within such handlers. This is because lock implementations in the "threading" module are not always re-entrant, and so cannot be invoked from such signal handlers. Module-Level Functions ====================== In addition to the classes described above, there are a number of module-level functions. logging.getLogger(name=None) Return a logger with the specified name or, if name is "None", return a logger which is the root logger of the hierarchy. If specified, the name is typically a dot-separated hierarchical name like *‘a’*, *‘a.b’* or *‘a.b.c.d’*. Choice of these names is entirely up to the developer who is using logging. All calls to this function with a given name return the same logger instance. This means that logger instances never need to be passed between different parts of an application. logging.getLoggerClass() Return either the standard "Logger" class, or the last class passed to "setLoggerClass()". This function may be called from within a new class definition, to ensure that installing a customized "Logger" class will not undo customizations already applied by other code. For example: class MyLogger(logging.getLoggerClass()): # ... override behaviour here logging.getLogRecordFactory() Return a callable which is used to create a "LogRecord". New in version 3.2: This function has been provided, along with "setLogRecordFactory()", to allow developers more control over how the "LogRecord" representing a logging event is constructed. See "setLogRecordFactory()" for more information about the how the factory is called. logging.debug(msg, *args, **kwargs) Logs a message with level "DEBUG" on the root logger. The *msg* is the message format string, and the *args* are the arguments which are merged into *msg* using the string formatting operator. (Note that this means that you can use keywords in the format string, together with a single dictionary argument.) There are three keyword arguments in *kwargs* which are inspected: *exc_info* which, if it does not evaluate as false, causes exception information to be added to the logging message. If an exception tuple (in the format returned by "sys.exc_info()") or an exception instance is provided, it is used; otherwise, "sys.exc_info()" is called to get the exception information. The second optional keyword argument is *stack_info*, which defaults to "False". If true, stack information is added to the logging message, including the actual logging call. Note that this is not the same stack information as that displayed through specifying *exc_info*: The former is stack frames from the bottom of the stack up to the logging call in the current thread, whereas the latter is information about stack frames which have been unwound, following an exception, while searching for exception handlers. You can specify *stack_info* independently of *exc_info*, e.g. to just show how you got to a certain point in your code, even when no exceptions were raised. The stack frames are printed following a header line which says: Stack (most recent call last): This mimics the "Traceback (most recent call last):" which is used when displaying exception frames. The third optional keyword argument is *extra* which can be used to pass a dictionary which is used to populate the __dict__ of the LogRecord created for the logging event with user-defined attributes. These custom attributes can then be used as you like. For example, they could be incorporated into logged messages. For example: FORMAT = '%(asctime)s %(clientip)-15s %(user)-8s %(message)s' logging.basicConfig(format=FORMAT) d = {'clientip': '192.168.0.1', 'user': 'fbloggs'} logging.warning('Protocol problem: %s', 'connection reset', extra=d) would print something like: 2006-02-08 22:20:02,165 192.168.0.1 fbloggs Protocol problem: connection reset The keys in the dictionary passed in *extra* should not clash with the keys used by the logging system. (See the "Formatter" documentation for more information on which keys are used by the logging system.) If you choose to use these attributes in logged messages, you need to exercise some care. In the above example, for instance, the "Formatter" has been set up with a format string which expects ‘clientip’ and ‘user’ in the attribute dictionary of the LogRecord. If these are missing, the message will not be logged because a string formatting exception will occur. So in this case, you always need to pass the *extra* dictionary with these keys. While this might be annoying, this feature is intended for use in specialized circumstances, such as multi-threaded servers where the same code executes in many contexts, and interesting conditions which arise are dependent on this context (such as remote client IP address and authenticated user name, in the above example). In such circumstances, it is likely that specialized "Formatter"s would be used with particular "Handler"s. Changed in version 3.2: The *stack_info* parameter was added. logging.info(msg, *args, **kwargs) Logs a message with level "INFO" on the root logger. The arguments are interpreted as for "debug()". logging.warning(msg, *args, **kwargs) Logs a message with level "WARNING" on the root logger. The arguments are interpreted as for "debug()". Note: There is an obsolete function "warn" which is functionally identical to "warning". As "warn" is deprecated, please do not use it - use "warning" instead. logging.error(msg, *args, **kwargs) Logs a message with level "ERROR" on the root logger. The arguments are interpreted as for "debug()". logging.critical(msg, *args, **kwargs) Logs a message with level "CRITICAL" on the root logger. The arguments are interpreted as for "debug()". logging.exception(msg, *args, **kwargs) Logs a message with level "ERROR" on the root logger. The arguments are interpreted as for "debug()". Exception info is added to the logging message. This function should only be called from an exception handler. logging.log(level, msg, *args, **kwargs) Logs a message with level *level* on the root logger. The other arguments are interpreted as for "debug()". logging.disable(level=CRITICAL) Provides an overriding level *level* for all loggers which takes precedence over the logger’s own level. When the need arises to temporarily throttle logging output down across the whole application, this function can be useful. Its effect is to disable all logging calls of severity *level* and below, so that if you call it with a value of INFO, then all INFO and DEBUG events would be discarded, whereas those of severity WARNING and above would be processed according to the logger’s effective level. If "logging.disable(logging.NOTSET)" is called, it effectively removes this overriding level, so that logging output again depends on the effective levels of individual loggers. Note that if you have defined any custom logging level higher than "CRITICAL" (this is not recommended), you won’t be able to rely on the default value for the *level* parameter, but will have to explicitly supply a suitable value. Changed in version 3.7: The *level* parameter was defaulted to level "CRITICAL". See bpo-28524 for more information about this change. logging.addLevelName(level, levelName) Associates level *level* with text *levelName* in an internal dictionary, which is used to map numeric levels to a textual representation, for example when a "Formatter" formats a message. This function can also be used to define your own levels. The only constraints are that all levels used must be registered using this function, levels should be positive integers and they should increase in increasing order of severity. Note: If you are thinking of defining your own levels, please see the section on Custom Levels. logging.getLevelName(level) Returns the textual or numeric representation of logging level *level*. If *level* is one of the predefined levels "CRITICAL", "ERROR", "WARNING", "INFO" or "DEBUG" then you get the corresponding string. If you have associated levels with names using "addLevelName()" then the name you have associated with *level* is returned. If a numeric value corresponding to one of the defined levels is passed in, the corresponding string representation is returned. The *level* parameter also accepts a string representation of the level such as ‘INFO’. In such cases, this functions returns the corresponding numeric value of the level. If no matching numeric or string value is passed in, the string ‘Level %s’ % level is returned. Note: Levels are internally integers (as they need to be compared in the logging logic). This function is used to convert between an integer level and the level name displayed in the formatted log output by means of the "%(levelname)s" format specifier (see LogRecord attributes), and vice versa. Changed in version 3.4: In Python versions earlier than 3.4, this function could also be passed a text level, and would return the corresponding numeric value of the level. This undocumented behaviour was considered a mistake, and was removed in Python 3.4, but reinstated in 3.4.2 due to retain backward compatibility. logging.makeLogRecord(attrdict) Creates and returns a new "LogRecord" instance whose attributes are defined by *attrdict*. This function is useful for taking a pickled "LogRecord" attribute dictionary, sent over a socket, and reconstituting it as a "LogRecord" instance at the receiving end. logging.basicConfig(**kwargs) Does basic configuration for the logging system by creating a "StreamHandler" with a default "Formatter" and adding it to the root logger. The functions "debug()", "info()", "warning()", "error()" and "critical()" will call "basicConfig()" automatically if no handlers are defined for the root logger. This function does nothing if the root logger already has handlers configured, unless the keyword argument *force* is set to "True". Note: This function should be called from the main thread before other threads are started. In versions of Python prior to 2.7.1 and 3.2, if this function is called from multiple threads, it is possible (in rare circumstances) that a handler will be added to the root logger more than once, leading to unexpected results such as messages being duplicated in the log. The following keyword arguments are supported. +----------------+-----------------------------------------------+ | Format | Description | |================|===============================================| | *filename* | Specifies that a FileHandler be created, | | | using the specified filename, rather than a | | | StreamHandler. | +----------------+-----------------------------------------------+ | *filemode* | If *filename* is specified, open the file in | | | this mode. Defaults to "'a'". | +----------------+-----------------------------------------------+ | *format* | Use the specified format string for the | | | handler. Defaults to attributes "levelname", | | | "name" and "message" separated by colons. | +----------------+-----------------------------------------------+ | *datefmt* | Use the specified date/time format, as | | | accepted by "time.strftime()". | +----------------+-----------------------------------------------+ | *style* | If *format* is specified, use this style for | | | the format string. One of "'%'", "'{'" or | | | "'$'" for printf-style, "str.format()" or | | | "string.Template" respectively. Defaults to | | | "'%'". | +----------------+-----------------------------------------------+ | *level* | Set the root logger level to the specified | | | level. | +----------------+-----------------------------------------------+ | *stream* | Use the specified stream to initialize the | | | StreamHandler. Note that this argument is | | | incompatible with *filename* - if both are | | | present, a "ValueError" is raised. | +----------------+-----------------------------------------------+ | *handlers* | If specified, this should be an iterable of | | | already created handlers to add to the root | | | logger. Any handlers which don’t already have | | | a formatter set will be assigned the default | | | formatter created in this function. Note that | | | this argument is incompatible with *filename* | | | or *stream* - if both are present, a | | | "ValueError" is raised. | +----------------+-----------------------------------------------+ | *force* | If this keyword argument is specified as | | | true, any existing handlers attached to the | | | root logger are removed and closed, before | | | carrying out the configuration as specified | | | by the other arguments. | +----------------+-----------------------------------------------+ | *encoding* | If this keyword argument is specified along | | | with *filename*, its value is used when the | | | FileHandler is created, and thus used when | | | opening the output file. | +----------------+-----------------------------------------------+ | *errors* | If this keyword argument is specified along | | | with *filename*, its value is used when the | | | FileHandler is created, and thus used when | | | opening the output file. If not specified, | | | the value ‘backslashreplace’ is used. Note | | | that if "None" is specified, it will be | | | passed as such to func:*open*, which means | | | that it will be treated the same as passing | | | ‘errors’. | +----------------+-----------------------------------------------+ Changed in version 3.2: The *style* argument was added. Changed in version 3.3: The *handlers* argument was added. Additional checks were added to catch situations where incompatible arguments are specified (e.g. *handlers* together with *stream* or *filename*, or *stream* together with *filename*). Changed in version 3.8: The *force* argument was added. Changed in version 3.9: The *encoding* and *errors* arguments were added. logging.shutdown() Informs the logging system to perform an orderly shutdown by flushing and closing all handlers. This should be called at application exit and no further use of the logging system should be made after this call. When the logging module is imported, it registers this function as an exit handler (see "atexit"), so normally there’s no need to do that manually. logging.setLoggerClass(klass) Tells the logging system to use the class *klass* when instantiating a logger. The class should define "__init__()" such that only a name argument is required, and the "__init__()" should call "Logger.__init__()". This function is typically called before any loggers are instantiated by applications which need to use custom logger behavior. After this call, as at any other time, do not instantiate loggers directly using the subclass: continue to use the "logging.getLogger()" API to get your loggers. logging.setLogRecordFactory(factory) Set a callable which is used to create a "LogRecord". Parameters: **factory** – The factory callable to be used to instantiate a log record. New in version 3.2: This function has been provided, along with "getLogRecordFactory()", to allow developers more control over how the "LogRecord" representing a logging event is constructed. The factory has the following signature: "factory(name, level, fn, lno, msg, args, exc_info, func=None, sinfo=None, **kwargs)" name: The logger name. level: The logging level (numeric). fn: The full pathname of the file where the logging call was made. lno: The line number in the file where the logging call was made. msg: The logging message. args: The arguments for the logging message. exc_info: An exception tuple, or "None". func: The name of the function or method which invoked the logging call. sinfo: A stack traceback such as is provided by "traceback.print_stack()", showing the call hierarchy. kwargs: Additional keyword arguments. Module-Level Attributes ======================= logging.lastResort A “handler of last resort” is available through this attribute. This is a "StreamHandler" writing to "sys.stderr" with a level of "WARNING", and is used to handle logging events in the absence of any logging configuration. The end result is to just print the message to "sys.stderr". This replaces the earlier error message saying that “no handlers could be found for logger XYZ”. If you need the earlier behaviour for some reason, "lastResort" can be set to "None". New in version 3.2. Integration with the warnings module ==================================== The "captureWarnings()" function can be used to integrate "logging" with the "warnings" module. logging.captureWarnings(capture) This function is used to turn the capture of warnings by logging on and off. If *capture* is "True", warnings issued by the "warnings" module will be redirected to the logging system. Specifically, a warning will be formatted using "warnings.formatwarning()" and the resulting string logged to a logger named "'py.warnings'" with a severity of "WARNING". If *capture* is "False", the redirection of warnings to the logging system will stop, and warnings will be redirected to their original destinations (i.e. those in effect before "captureWarnings(True)" was called). See also: Module "logging.config" Configuration API for the logging module. Module "logging.handlers" Useful handlers included with the logging module. **PEP 282** - A Logging System The proposal which described this feature for inclusion in the Python standard library. Original Python logging package This is the original source for the "logging" package. The version of the package available from this site is suitable for use with Python 1.5.2, 2.1.x and 2.2.x, which do not include the "logging" package in the standard library. "lzma" — Compression using the LZMA algorithm ********************************************* New in version 3.3. **Source code:** Lib/lzma.py ====================================================================== This module provides classes and convenience functions for compressing and decompressing data using the LZMA compression algorithm. Also included is a file interface supporting the ".xz" and legacy ".lzma" file formats used by the **xz** utility, as well as raw compressed streams. The interface provided by this module is very similar to that of the "bz2" module. However, note that "LZMAFile" is *not* thread-safe, unlike "bz2.BZ2File", so if you need to use a single "LZMAFile" instance from multiple threads, it is necessary to protect it with a lock. exception lzma.LZMAError This exception is raised when an error occurs during compression or decompression, or while initializing the compressor/decompressor state. Reading and writing compressed files ==================================== lzma.open(filename, mode="rb", *, format=None, check=-1, preset=None, filters=None, encoding=None, errors=None, newline=None) Open an LZMA-compressed file in binary or text mode, returning a *file object*. The *filename* argument can be either an actual file name (given as a "str", "bytes" or *path-like* object), in which case the named file is opened, or it can be an existing file object to read from or write to. The *mode* argument can be any of ""r"", ""rb"", ""w"", ""wb"", ""x"", ""xb"", ""a"" or ""ab"" for binary mode, or ""rt"", ""wt"", ""xt"", or ""at"" for text mode. The default is ""rb"". When opening a file for reading, the *format* and *filters* arguments have the same meanings as for "LZMADecompressor". In this case, the *check* and *preset* arguments should not be used. When opening a file for writing, the *format*, *check*, *preset* and *filters* arguments have the same meanings as for "LZMACompressor". For binary mode, this function is equivalent to the "LZMAFile" constructor: "LZMAFile(filename, mode, ...)". In this case, the *encoding*, *errors* and *newline* arguments must not be provided. For text mode, a "LZMAFile" object is created, and wrapped in an "io.TextIOWrapper" instance with the specified encoding, error handling behavior, and line ending(s). Changed in version 3.4: Added support for the ""x"", ""xb"" and ""xt"" modes. Changed in version 3.6: Accepts a *path-like object*. class lzma.LZMAFile(filename=None, mode="r", *, format=None, check=-1, preset=None, filters=None) Open an LZMA-compressed file in binary mode. An "LZMAFile" can wrap an already-open *file object*, or operate directly on a named file. The *filename* argument specifies either the file object to wrap, or the name of the file to open (as a "str", "bytes" or *path-like* object). When wrapping an existing file object, the wrapped file will not be closed when the "LZMAFile" is closed. The *mode* argument can be either ""r"" for reading (default), ""w"" for overwriting, ""x"" for exclusive creation, or ""a"" for appending. These can equivalently be given as ""rb"", ""wb"", ""xb"" and ""ab"" respectively. If *filename* is a file object (rather than an actual file name), a mode of ""w"" does not truncate the file, and is instead equivalent to ""a"". When opening a file for reading, the input file may be the concatenation of multiple separate compressed streams. These are transparently decoded as a single logical stream. When opening a file for reading, the *format* and *filters* arguments have the same meanings as for "LZMADecompressor". In this case, the *check* and *preset* arguments should not be used. When opening a file for writing, the *format*, *check*, *preset* and *filters* arguments have the same meanings as for "LZMACompressor". "LZMAFile" supports all the members specified by "io.BufferedIOBase", except for "detach()" and "truncate()". Iteration and the "with" statement are supported. The following method is also provided: peek(size=-1) Return buffered data without advancing the file position. At least one byte of data will be returned, unless EOF has been reached. The exact number of bytes returned is unspecified (the *size* argument is ignored). Note: While calling "peek()" does not change the file position of the "LZMAFile", it may change the position of the underlying file object (e.g. if the "LZMAFile" was constructed by passing a file object for *filename*). Changed in version 3.4: Added support for the ""x"" and ""xb"" modes. Changed in version 3.5: The "read()" method now accepts an argument of "None". Changed in version 3.6: Accepts a *path-like object*. Compressing and decompressing data in memory ============================================ class lzma.LZMACompressor(format=FORMAT_XZ, check=-1, preset=None, filters=None) Create a compressor object, which can be used to compress data incrementally. For a more convenient way of compressing a single chunk of data, see "compress()". The *format* argument specifies what container format should be used. Possible values are: * "FORMAT_XZ": The ".xz" container format. This is the default format. * "FORMAT_ALONE": The legacy ".lzma" container format. This format is more limited than ".xz" – it does not support integrity checks or multiple filters. * "FORMAT_RAW": A raw data stream, not using any container format. This format specifier does not support integrity checks, and requires that you always specify a custom filter chain (for both compression and decompression). Additionally, data compressed in this manner cannot be decompressed using "FORMAT_AUTO" (see "LZMADecompressor"). The *check* argument specifies the type of integrity check to include in the compressed data. This check is used when decompressing, to ensure that the data has not been corrupted. Possible values are: * "CHECK_NONE": No integrity check. This is the default (and the only acceptable value) for "FORMAT_ALONE" and "FORMAT_RAW". * "CHECK_CRC32": 32-bit Cyclic Redundancy Check. * "CHECK_CRC64": 64-bit Cyclic Redundancy Check. This is the default for "FORMAT_XZ". * "CHECK_SHA256": 256-bit Secure Hash Algorithm. If the specified check is not supported, an "LZMAError" is raised. The compression settings can be specified either as a preset compression level (with the *preset* argument), or in detail as a custom filter chain (with the *filters* argument). The *preset* argument (if provided) should be an integer between "0" and "9" (inclusive), optionally OR-ed with the constant "PRESET_EXTREME". If neither *preset* nor *filters* are given, the default behavior is to use "PRESET_DEFAULT" (preset level "6"). Higher presets produce smaller output, but make the compression process slower. Note: In addition to being more CPU-intensive, compression with higher presets also requires much more memory (and produces output that needs more memory to decompress). With preset "9" for example, the overhead for an "LZMACompressor" object can be as high as 800 MiB. For this reason, it is generally best to stick with the default preset. The *filters* argument (if provided) should be a filter chain specifier. See Specifying custom filter chains for details. compress(data) Compress *data* (a "bytes" object), returning a "bytes" object containing compressed data for at least part of the input. Some of *data* may be buffered internally, for use in later calls to "compress()" and "flush()". The returned data should be concatenated with the output of any previous calls to "compress()". flush() Finish the compression process, returning a "bytes" object containing any data stored in the compressor’s internal buffers. The compressor cannot be used after this method has been called. class lzma.LZMADecompressor(format=FORMAT_AUTO, memlimit=None, filters=None) Create a decompressor object, which can be used to decompress data incrementally. For a more convenient way of decompressing an entire compressed stream at once, see "decompress()". The *format* argument specifies the container format that should be used. The default is "FORMAT_AUTO", which can decompress both ".xz" and ".lzma" files. Other possible values are "FORMAT_XZ", "FORMAT_ALONE", and "FORMAT_RAW". The *memlimit* argument specifies a limit (in bytes) on the amount of memory that the decompressor can use. When this argument is used, decompression will fail with an "LZMAError" if it is not possible to decompress the input within the given memory limit. The *filters* argument specifies the filter chain that was used to create the stream being decompressed. This argument is required if *format* is "FORMAT_RAW", but should not be used for other formats. See Specifying custom filter chains for more information about filter chains. Note: This class does not transparently handle inputs containing multiple compressed streams, unlike "decompress()" and "LZMAFile". To decompress a multi-stream input with "LZMADecompressor", you must create a new decompressor for each stream. decompress(data, max_length=-1) Decompress *data* (a *bytes-like object*), returning uncompressed data as bytes. Some of *data* may be buffered internally, for use in later calls to "decompress()". The returned data should be concatenated with the output of any previous calls to "decompress()". If *max_length* is nonnegative, returns at most *max_length* bytes of decompressed data. If this limit is reached and further output can be produced, the "needs_input" attribute will be set to "False". In this case, the next call to "decompress()" may provide *data* as "b''" to obtain more of the output. If all of the input data was decompressed and returned (either because this was less than *max_length* bytes, or because *max_length* was negative), the "needs_input" attribute will be set to "True". Attempting to decompress data after the end of stream is reached raises an *EOFError*. Any data found after the end of the stream is ignored and saved in the "unused_data" attribute. Changed in version 3.5: Added the *max_length* parameter. check The ID of the integrity check used by the input stream. This may be "CHECK_UNKNOWN" until enough of the input has been decoded to determine what integrity check it uses. eof "True" if the end-of-stream marker has been reached. unused_data Data found after the end of the compressed stream. Before the end of the stream is reached, this will be "b""". needs_input "False" if the "decompress()" method can provide more decompressed data before requiring new uncompressed input. New in version 3.5. lzma.compress(data, format=FORMAT_XZ, check=-1, preset=None, filters=None) Compress *data* (a "bytes" object), returning the compressed data as a "bytes" object. See "LZMACompressor" above for a description of the *format*, *check*, *preset* and *filters* arguments. lzma.decompress(data, format=FORMAT_AUTO, memlimit=None, filters=None) Decompress *data* (a "bytes" object), returning the uncompressed data as a "bytes" object. If *data* is the concatenation of multiple distinct compressed streams, decompress all of these streams, and return the concatenation of the results. See "LZMADecompressor" above for a description of the *format*, *memlimit* and *filters* arguments. Miscellaneous ============= lzma.is_check_supported(check) Return "True" if the given integrity check is supported on this system. "CHECK_NONE" and "CHECK_CRC32" are always supported. "CHECK_CRC64" and "CHECK_SHA256" may be unavailable if you are using a version of **liblzma** that was compiled with a limited feature set. Specifying custom filter chains =============================== A filter chain specifier is a sequence of dictionaries, where each dictionary contains the ID and options for a single filter. Each dictionary must contain the key ""id"", and may contain additional keys to specify filter-dependent options. Valid filter IDs are as follows: * Compression filters: * "FILTER_LZMA1" (for use with "FORMAT_ALONE") * "FILTER_LZMA2" (for use with "FORMAT_XZ" and "FORMAT_RAW") * Delta filter: * "FILTER_DELTA" * Branch-Call-Jump (BCJ) filters: * "FILTER_X86" * "FILTER_IA64" * "FILTER_ARM" * "FILTER_ARMTHUMB" * "FILTER_POWERPC" * "FILTER_SPARC" A filter chain can consist of up to 4 filters, and cannot be empty. The last filter in the chain must be a compression filter, and any other filters must be delta or BCJ filters. Compression filters support the following options (specified as additional entries in the dictionary representing the filter): * "preset": A compression preset to use as a source of default values for options that are not specified explicitly. * "dict_size": Dictionary size in bytes. This should be between 4 KiB and 1.5 GiB (inclusive). * "lc": Number of literal context bits. * "lp": Number of literal position bits. The sum "lc + lp" must be at most 4. * "pb": Number of position bits; must be at most 4. * "mode": "MODE_FAST" or "MODE_NORMAL". * "nice_len": What should be considered a “nice length” for a match. This should be 273 or less. * "mf": What match finder to use – "MF_HC3", "MF_HC4", "MF_BT2", "MF_BT3", or "MF_BT4". * "depth": Maximum search depth used by match finder. 0 (default) means to select automatically based on other filter options. The delta filter stores the differences between bytes, producing more repetitive input for the compressor in certain circumstances. It supports one option, "dist". This indicates the distance between bytes to be subtracted. The default is 1, i.e. take the differences between adjacent bytes. The BCJ filters are intended to be applied to machine code. They convert relative branches, calls and jumps in the code to use absolute addressing, with the aim of increasing the redundancy that can be exploited by the compressor. These filters support one option, "start_offset". This specifies the address that should be mapped to the beginning of the input data. The default is 0. Examples ======== Reading in a compressed file: import lzma with lzma.open("file.xz") as f: file_content = f.read() Creating a compressed file: import lzma data = b"Insert Data Here" with lzma.open("file.xz", "w") as f: f.write(data) Compressing data in memory: import lzma data_in = b"Insert Data Here" data_out = lzma.compress(data_in) Incremental compression: import lzma lzc = lzma.LZMACompressor() out1 = lzc.compress(b"Some data\n") out2 = lzc.compress(b"Another piece of data\n") out3 = lzc.compress(b"Even more data\n") out4 = lzc.flush() # Concatenate all the partial results: result = b"".join([out1, out2, out3, out4]) Writing compressed data to an already-open file: import lzma with open("file.xz", "wb") as f: f.write(b"This data will not be compressed\n") with lzma.open(f, "w") as lzf: lzf.write(b"This *will* be compressed\n") f.write(b"Not compressed\n") Creating a compressed file using a custom filter chain: import lzma my_filters = [ {"id": lzma.FILTER_DELTA, "dist": 5}, {"id": lzma.FILTER_LZMA2, "preset": 7 | lzma.PRESET_EXTREME}, ] with lzma.open("file.xz", "w", filters=my_filters) as f: f.write(b"blah blah blah") "mailbox" — Manipulate mailboxes in various formats *************************************************** **Source code:** Lib/mailbox.py ====================================================================== This module defines two classes, "Mailbox" and "Message", for accessing and manipulating on-disk mailboxes and the messages they contain. "Mailbox" offers a dictionary-like mapping from keys to messages. "Message" extends the "email.message" module’s "Message" class with format-specific state and behavior. Supported mailbox formats are Maildir, mbox, MH, Babyl, and MMDF. See also: Module "email" Represent and manipulate messages. "Mailbox" objects ================= class mailbox.Mailbox A mailbox, which may be inspected and modified. The "Mailbox" class defines an interface and is not intended to be instantiated. Instead, format-specific subclasses should inherit from "Mailbox" and your code should instantiate a particular subclass. The "Mailbox" interface is dictionary-like, with small keys corresponding to messages. Keys are issued by the "Mailbox" instance with which they will be used and are only meaningful to that "Mailbox" instance. A key continues to identify a message even if the corresponding message is modified, such as by replacing it with another message. Messages may be added to a "Mailbox" instance using the set-like method "add()" and removed using a "del" statement or the set-like methods "remove()" and "discard()". "Mailbox" interface semantics differ from dictionary semantics in some noteworthy ways. Each time a message is requested, a new representation (typically a "Message" instance) is generated based upon the current state of the mailbox. Similarly, when a message is added to a "Mailbox" instance, the provided message representation’s contents are copied. In neither case is a reference to the message representation kept by the "Mailbox" instance. The default "Mailbox" iterator iterates over message representations, not keys as the default dictionary iterator does. Moreover, modification of a mailbox during iteration is safe and well-defined. Messages added to the mailbox after an iterator is created will not be seen by the iterator. Messages removed from the mailbox before the iterator yields them will be silently skipped, though using a key from an iterator may result in a "KeyError" exception if the corresponding message is subsequently removed. Warning: Be very cautious when modifying mailboxes that might be simultaneously changed by some other process. The safest mailbox format to use for such tasks is Maildir; try to avoid using single-file formats such as mbox for concurrent writing. If you’re modifying a mailbox, you *must* lock it by calling the "lock()" and "unlock()" methods *before* reading any messages in the file or making any changes by adding or deleting a message. Failing to lock the mailbox runs the risk of losing messages or corrupting the entire mailbox. "Mailbox" instances have the following methods: add(message) Add *message* to the mailbox and return the key that has been assigned to it. Parameter *message* may be a "Message" instance, an "email.message.Message" instance, a string, a byte string, or a file-like object (which should be open in binary mode). If *message* is an instance of the appropriate format-specific "Message" subclass (e.g., if it’s an "mboxMessage" instance and this is an "mbox" instance), its format-specific information is used. Otherwise, reasonable defaults for format-specific information are used. Changed in version 3.2: Support for binary input was added. remove(key) __delitem__(key) discard(key) Delete the message corresponding to *key* from the mailbox. If no such message exists, a "KeyError" exception is raised if the method was called as "remove()" or "__delitem__()" but no exception is raised if the method was called as "discard()". The behavior of "discard()" may be preferred if the underlying mailbox format supports concurrent modification by other processes. __setitem__(key, message) Replace the message corresponding to *key* with *message*. Raise a "KeyError" exception if no message already corresponds to *key*. As with "add()", parameter *message* may be a "Message" instance, an "email.message.Message" instance, a string, a byte string, or a file-like object (which should be open in binary mode). If *message* is an instance of the appropriate format- specific "Message" subclass (e.g., if it’s an "mboxMessage" instance and this is an "mbox" instance), its format-specific information is used. Otherwise, the format-specific information of the message that currently corresponds to *key* is left unchanged. iterkeys() keys() Return an iterator over all keys if called as "iterkeys()" or return a list of keys if called as "keys()". itervalues() __iter__() values() Return an iterator over representations of all messages if called as "itervalues()" or "__iter__()" or return a list of such representations if called as "values()". The messages are represented as instances of the appropriate format-specific "Message" subclass unless a custom message factory was specified when the "Mailbox" instance was initialized. Note: The behavior of "__iter__()" is unlike that of dictionaries, which iterate over keys. iteritems() items() Return an iterator over (*key*, *message*) pairs, where *key* is a key and *message* is a message representation, if called as "iteritems()" or return a list of such pairs if called as "items()". The messages are represented as instances of the appropriate format-specific "Message" subclass unless a custom message factory was specified when the "Mailbox" instance was initialized. get(key, default=None) __getitem__(key) Return a representation of the message corresponding to *key*. If no such message exists, *default* is returned if the method was called as "get()" and a "KeyError" exception is raised if the method was called as "__getitem__()". The message is represented as an instance of the appropriate format-specific "Message" subclass unless a custom message factory was specified when the "Mailbox" instance was initialized. get_message(key) Return a representation of the message corresponding to *key* as an instance of the appropriate format-specific "Message" subclass, or raise a "KeyError" exception if no such message exists. get_bytes(key) Return a byte representation of the message corresponding to *key*, or raise a "KeyError" exception if no such message exists. New in version 3.2. get_string(key) Return a string representation of the message corresponding to *key*, or raise a "KeyError" exception if no such message exists. The message is processed through "email.message.Message" to convert it to a 7bit clean representation. get_file(key) Return a file-like representation of the message corresponding to *key*, or raise a "KeyError" exception if no such message exists. The file-like object behaves as if open in binary mode. This file should be closed once it is no longer needed. Changed in version 3.2: The file object really is a binary file; previously it was incorrectly returned in text mode. Also, the file-like object now supports the context management protocol: you can use a "with" statement to automatically close it. Note: Unlike other representations of messages, file-like representations are not necessarily independent of the "Mailbox" instance that created them or of the underlying mailbox. More specific documentation is provided by each subclass. __contains__(key) Return "True" if *key* corresponds to a message, "False" otherwise. __len__() Return a count of messages in the mailbox. clear() Delete all messages from the mailbox. pop(key, default=None) Return a representation of the message corresponding to *key* and delete the message. If no such message exists, return *default*. The message is represented as an instance of the appropriate format-specific "Message" subclass unless a custom message factory was specified when the "Mailbox" instance was initialized. popitem() Return an arbitrary (*key*, *message*) pair, where *key* is a key and *message* is a message representation, and delete the corresponding message. If the mailbox is empty, raise a "KeyError" exception. The message is represented as an instance of the appropriate format-specific "Message" subclass unless a custom message factory was specified when the "Mailbox" instance was initialized. update(arg) Parameter *arg* should be a *key*-to-*message* mapping or an iterable of (*key*, *message*) pairs. Updates the mailbox so that, for each given *key* and *message*, the message corresponding to *key* is set to *message* as if by using "__setitem__()". As with "__setitem__()", each *key* must already correspond to a message in the mailbox or else a "KeyError" exception will be raised, so in general it is incorrect for *arg* to be a "Mailbox" instance. Note: Unlike with dictionaries, keyword arguments are not supported. flush() Write any pending changes to the filesystem. For some "Mailbox" subclasses, changes are always written immediately and "flush()" does nothing, but you should still make a habit of calling this method. lock() Acquire an exclusive advisory lock on the mailbox so that other processes know not to modify it. An "ExternalClashError" is raised if the lock is not available. The particular locking mechanisms used depend upon the mailbox format. You should *always* lock the mailbox before making any modifications to its contents. unlock() Release the lock on the mailbox, if any. close() Flush the mailbox, unlock it if necessary, and close any open files. For some "Mailbox" subclasses, this method does nothing. "Maildir" --------- class mailbox.Maildir(dirname, factory=None, create=True) A subclass of "Mailbox" for mailboxes in Maildir format. Parameter *factory* is a callable object that accepts a file-like message representation (which behaves as if opened in binary mode) and returns a custom representation. If *factory* is "None", "MaildirMessage" is used as the default message representation. If *create* is "True", the mailbox is created if it does not exist. If *create* is "True" and the *dirname* path exists, it will be treated as an existing maildir without attempting to verify its directory layout. It is for historical reasons that *dirname* is named as such rather than *path*. Maildir is a directory-based mailbox format invented for the qmail mail transfer agent and now widely supported by other programs. Messages in a Maildir mailbox are stored in separate files within a common directory structure. This design allows Maildir mailboxes to be accessed and modified by multiple unrelated programs without data corruption, so file locking is unnecessary. Maildir mailboxes contain three subdirectories, namely: "tmp", "new", and "cur". Messages are created momentarily in the "tmp" subdirectory and then moved to the "new" subdirectory to finalize delivery. A mail user agent may subsequently move the message to the "cur" subdirectory and store information about the state of the message in a special “info” section appended to its file name. Folders of the style introduced by the Courier mail transfer agent are also supported. Any subdirectory of the main mailbox is considered a folder if "'.'" is the first character in its name. Folder names are represented by "Maildir" without the leading "'.'". Each folder is itself a Maildir mailbox but should not contain other folders. Instead, a logical nesting is indicated using "'.'" to delimit levels, e.g., “Archived.2005.07”. Note: The Maildir specification requires the use of a colon ("':'") in certain message file names. However, some operating systems do not permit this character in file names, If you wish to use a Maildir-like format on such an operating system, you should specify another character to use instead. The exclamation point ("'!'") is a popular choice. For example: import mailbox mailbox.Maildir.colon = '!' The "colon" attribute may also be set on a per-instance basis. "Maildir" instances have all of the methods of "Mailbox" in addition to the following: list_folders() Return a list of the names of all folders. get_folder(folder) Return a "Maildir" instance representing the folder whose name is *folder*. A "NoSuchMailboxError" exception is raised if the folder does not exist. add_folder(folder) Create a folder whose name is *folder* and return a "Maildir" instance representing it. remove_folder(folder) Delete the folder whose name is *folder*. If the folder contains any messages, a "NotEmptyError" exception will be raised and the folder will not be deleted. clean() Delete temporary files from the mailbox that have not been accessed in the last 36 hours. The Maildir specification says that mail-reading programs should do this occasionally. Some "Mailbox" methods implemented by "Maildir" deserve special remarks: add(message) __setitem__(key, message) update(arg) Warning: These methods generate unique file names based upon the current process ID. When using multiple threads, undetected name clashes may occur and cause corruption of the mailbox unless threads are coordinated to avoid using these methods to manipulate the same mailbox simultaneously. flush() All changes to Maildir mailboxes are immediately applied, so this method does nothing. lock() unlock() Maildir mailboxes do not support (or require) locking, so these methods do nothing. close() "Maildir" instances do not keep any open files and the underlying mailboxes do not support locking, so this method does nothing. get_file(key) Depending upon the host platform, it may not be possible to modify or remove the underlying message while the returned file remains open. See also: maildir man page from Courier A specification of the format. Describes a common extension for supporting folders. Using maildir format Notes on Maildir by its inventor. Includes an updated name- creation scheme and details on “info” semantics. "mbox" ------ class mailbox.mbox(path, factory=None, create=True) A subclass of "Mailbox" for mailboxes in mbox format. Parameter *factory* is a callable object that accepts a file-like message representation (which behaves as if opened in binary mode) and returns a custom representation. If *factory* is "None", "mboxMessage" is used as the default message representation. If *create* is "True", the mailbox is created if it does not exist. The mbox format is the classic format for storing mail on Unix systems. All messages in an mbox mailbox are stored in a single file with the beginning of each message indicated by a line whose first five characters are “From “. Several variations of the mbox format exist to address perceived shortcomings in the original. In the interest of compatibility, "mbox" implements the original format, which is sometimes referred to as *mboxo*. This means that the *Content-Length* header, if present, is ignored and that any occurrences of “From ” at the beginning of a line in a message body are transformed to “>From ” when storing the message, although occurrences of “>From ” are not transformed to “From ” when reading the message. Some "Mailbox" methods implemented by "mbox" deserve special remarks: get_file(key) Using the file after calling "flush()" or "close()" on the "mbox" instance may yield unpredictable results or raise an exception. lock() unlock() Three locking mechanisms are used—dot locking and, if available, the "flock()" and "lockf()" system calls. See also: mbox man page from tin A specification of the format, with details on locking. Configuring Netscape Mail on Unix: Why The Content-Length Format is Bad An argument for using the original mbox format rather than a variation. “mbox” is a family of several mutually incompatible mailbox formats A history of mbox variations. "MH" ---- class mailbox.MH(path, factory=None, create=True) A subclass of "Mailbox" for mailboxes in MH format. Parameter *factory* is a callable object that accepts a file-like message representation (which behaves as if opened in binary mode) and returns a custom representation. If *factory* is "None", "MHMessage" is used as the default message representation. If *create* is "True", the mailbox is created if it does not exist. MH is a directory-based mailbox format invented for the MH Message Handling System, a mail user agent. Each message in an MH mailbox resides in its own file. An MH mailbox may contain other MH mailboxes (called *folders*) in addition to messages. Folders may be nested indefinitely. MH mailboxes also support *sequences*, which are named lists used to logically group messages without moving them to sub-folders. Sequences are defined in a file called ".mh_sequences" in each folder. The "MH" class manipulates MH mailboxes, but it does not attempt to emulate all of **mh**’s behaviors. In particular, it does not modify and is not affected by the "context" or ".mh_profile" files that are used by **mh** to store its state and configuration. "MH" instances have all of the methods of "Mailbox" in addition to the following: list_folders() Return a list of the names of all folders. get_folder(folder) Return an "MH" instance representing the folder whose name is *folder*. A "NoSuchMailboxError" exception is raised if the folder does not exist. add_folder(folder) Create a folder whose name is *folder* and return an "MH" instance representing it. remove_folder(folder) Delete the folder whose name is *folder*. If the folder contains any messages, a "NotEmptyError" exception will be raised and the folder will not be deleted. get_sequences() Return a dictionary of sequence names mapped to key lists. If there are no sequences, the empty dictionary is returned. set_sequences(sequences) Re-define the sequences that exist in the mailbox based upon *sequences*, a dictionary of names mapped to key lists, like returned by "get_sequences()". pack() Rename messages in the mailbox as necessary to eliminate gaps in numbering. Entries in the sequences list are updated correspondingly. Note: Already-issued keys are invalidated by this operation and should not be subsequently used. Some "Mailbox" methods implemented by "MH" deserve special remarks: remove(key) __delitem__(key) discard(key) These methods immediately delete the message. The MH convention of marking a message for deletion by prepending a comma to its name is not used. lock() unlock() Three locking mechanisms are used—dot locking and, if available, the "flock()" and "lockf()" system calls. For MH mailboxes, locking the mailbox means locking the ".mh_sequences" file and, only for the duration of any operations that affect them, locking individual message files. get_file(key) Depending upon the host platform, it may not be possible to remove the underlying message while the returned file remains open. flush() All changes to MH mailboxes are immediately applied, so this method does nothing. close() "MH" instances do not keep any open files, so this method is equivalent to "unlock()". See also: nmh - Message Handling System Home page of **nmh**, an updated version of the original **mh**. MH & nmh: Email for Users & Programmers A GPL-licensed book on **mh** and **nmh**, with some information on the mailbox format. "Babyl" ------- class mailbox.Babyl(path, factory=None, create=True) A subclass of "Mailbox" for mailboxes in Babyl format. Parameter *factory* is a callable object that accepts a file-like message representation (which behaves as if opened in binary mode) and returns a custom representation. If *factory* is "None", "BabylMessage" is used as the default message representation. If *create* is "True", the mailbox is created if it does not exist. Babyl is a single-file mailbox format used by the Rmail mail user agent included with Emacs. The beginning of a message is indicated by a line containing the two characters Control-Underscore ("'\037'") and Control-L ("'\014'"). The end of a message is indicated by the start of the next message or, in the case of the last message, a line containing a Control-Underscore ("'\037'") character. Messages in a Babyl mailbox have two sets of headers, original headers and so-called visible headers. Visible headers are typically a subset of the original headers that have been reformatted or abridged to be more attractive. Each message in a Babyl mailbox also has an accompanying list of *labels*, or short strings that record extra information about the message, and a list of all user-defined labels found in the mailbox is kept in the Babyl options section. "Babyl" instances have all of the methods of "Mailbox" in addition to the following: get_labels() Return a list of the names of all user-defined labels used in the mailbox. Note: The actual messages are inspected to determine which labels exist in the mailbox rather than consulting the list of labels in the Babyl options section, but the Babyl section is updated whenever the mailbox is modified. Some "Mailbox" methods implemented by "Babyl" deserve special remarks: get_file(key) In Babyl mailboxes, the headers of a message are not stored contiguously with the body of the message. To generate a file- like representation, the headers and body are copied together into an "io.BytesIO" instance, which has an API identical to that of a file. As a result, the file-like object is truly independent of the underlying mailbox but does not save memory compared to a string representation. lock() unlock() Three locking mechanisms are used—dot locking and, if available, the "flock()" and "lockf()" system calls. See also: Format of Version 5 Babyl Files A specification of the Babyl format. Reading Mail with Rmail The Rmail manual, with some information on Babyl semantics. "MMDF" ------ class mailbox.MMDF(path, factory=None, create=True) A subclass of "Mailbox" for mailboxes in MMDF format. Parameter *factory* is a callable object that accepts a file-like message representation (which behaves as if opened in binary mode) and returns a custom representation. If *factory* is "None", "MMDFMessage" is used as the default message representation. If *create* is "True", the mailbox is created if it does not exist. MMDF is a single-file mailbox format invented for the Multichannel Memorandum Distribution Facility, a mail transfer agent. Each message is in the same form as an mbox message but is bracketed before and after by lines containing four Control-A ("'\001'") characters. As with the mbox format, the beginning of each message is indicated by a line whose first five characters are “From “, but additional occurrences of “From ” are not transformed to “>From ” when storing messages because the extra message separator lines prevent mistaking such occurrences for the starts of subsequent messages. Some "Mailbox" methods implemented by "MMDF" deserve special remarks: get_file(key) Using the file after calling "flush()" or "close()" on the "MMDF" instance may yield unpredictable results or raise an exception. lock() unlock() Three locking mechanisms are used—dot locking and, if available, the "flock()" and "lockf()" system calls. See also: mmdf man page from tin A specification of MMDF format from the documentation of tin, a newsreader. MMDF A Wikipedia article describing the Multichannel Memorandum Distribution Facility. "Message" objects ================= class mailbox.Message(message=None) A subclass of the "email.message" module’s "Message". Subclasses of "mailbox.Message" add mailbox-format-specific state and behavior. If *message* is omitted, the new instance is created in a default, empty state. If *message* is an "email.message.Message" instance, its contents are copied; furthermore, any format-specific information is converted insofar as possible if *message* is a "Message" instance. If *message* is a string, a byte string, or a file, it should contain an **RFC 2822**-compliant message, which is read and parsed. Files should be open in binary mode, but text mode files are accepted for backward compatibility. The format-specific state and behaviors offered by subclasses vary, but in general it is only the properties that are not specific to a particular mailbox that are supported (although presumably the properties are specific to a particular mailbox format). For example, file offsets for single-file mailbox formats and file names for directory-based mailbox formats are not retained, because they are only applicable to the original mailbox. But state such as whether a message has been read by the user or marked as important is retained, because it applies to the message itself. There is no requirement that "Message" instances be used to represent messages retrieved using "Mailbox" instances. In some situations, the time and memory required to generate "Message" representations might not be acceptable. For such situations, "Mailbox" instances also offer string and file-like representations, and a custom message factory may be specified when a "Mailbox" instance is initialized. "MaildirMessage" ---------------- class mailbox.MaildirMessage(message=None) A message with Maildir-specific behaviors. Parameter *message* has the same meaning as with the "Message" constructor. Typically, a mail user agent application moves all of the messages in the "new" subdirectory to the "cur" subdirectory after the first time the user opens and closes the mailbox, recording that the messages are old whether or not they’ve actually been read. Each message in "cur" has an “info” section added to its file name to store information about its state. (Some mail readers may also add an “info” section to messages in "new".) The “info” section may take one of two forms: it may contain “2,” followed by a list of standardized flags (e.g., “2,FR”) or it may contain “1,” followed by so-called experimental information. Standard flags for Maildir messages are as follows: +--------+-----------+----------------------------------+ | Flag | Meaning | Explanation | |========|===========|==================================| | D | Draft | Under composition | +--------+-----------+----------------------------------+ | F | Flagged | Marked as important | +--------+-----------+----------------------------------+ | P | Passed | Forwarded, resent, or bounced | +--------+-----------+----------------------------------+ | R | Replied | Replied to | +--------+-----------+----------------------------------+ | S | Seen | Read | +--------+-----------+----------------------------------+ | T | Trashed | Marked for subsequent deletion | +--------+-----------+----------------------------------+ "MaildirMessage" instances offer the following methods: get_subdir() Return either “new” (if the message should be stored in the "new" subdirectory) or “cur” (if the message should be stored in the "cur" subdirectory). Note: A message is typically moved from "new" to "cur" after its mailbox has been accessed, whether or not the message is has been read. A message "msg" has been read if ""S" in msg.get_flags()" is "True". set_subdir(subdir) Set the subdirectory the message should be stored in. Parameter *subdir* must be either “new” or “cur”. get_flags() Return a string specifying the flags that are currently set. If the message complies with the standard Maildir format, the result is the concatenation in alphabetical order of zero or one occurrence of each of "'D'", "'F'", "'P'", "'R'", "'S'", and "'T'". The empty string is returned if no flags are set or if “info” contains experimental semantics. set_flags(flags) Set the flags specified by *flags* and unset all others. add_flag(flag) Set the flag(s) specified by *flag* without changing other flags. To add more than one flag at a time, *flag* may be a string of more than one character. The current “info” is overwritten whether or not it contains experimental information rather than flags. remove_flag(flag) Unset the flag(s) specified by *flag* without changing other flags. To remove more than one flag at a time, *flag* maybe a string of more than one character. If “info” contains experimental information rather than flags, the current “info” is not modified. get_date() Return the delivery date of the message as a floating-point number representing seconds since the epoch. set_date(date) Set the delivery date of the message to *date*, a floating-point number representing seconds since the epoch. get_info() Return a string containing the “info” for a message. This is useful for accessing and modifying “info” that is experimental (i.e., not a list of flags). set_info(info) Set “info” to *info*, which should be a string. When a "MaildirMessage" instance is created based upon an "mboxMessage" or "MMDFMessage" instance, the *Status* and *X-Status* headers are omitted and the following conversions take place: +----------------------+------------------------------------------------+ | Resulting state | "mboxMessage" or "MMDFMessage" state | |======================|================================================| | “cur” subdirectory | O flag | +----------------------+------------------------------------------------+ | F flag | F flag | +----------------------+------------------------------------------------+ | R flag | A flag | +----------------------+------------------------------------------------+ | S flag | R flag | +----------------------+------------------------------------------------+ | T flag | D flag | +----------------------+------------------------------------------------+ When a "MaildirMessage" instance is created based upon an "MHMessage" instance, the following conversions take place: +---------------------------------+----------------------------+ | Resulting state | "MHMessage" state | |=================================|============================| | “cur” subdirectory | “unseen” sequence | +---------------------------------+----------------------------+ | “cur” subdirectory and S flag | no “unseen” sequence | +---------------------------------+----------------------------+ | F flag | “flagged” sequence | +---------------------------------+----------------------------+ | R flag | “replied” sequence | +---------------------------------+----------------------------+ When a "MaildirMessage" instance is created based upon a "BabylMessage" instance, the following conversions take place: +---------------------------------+---------------------------------+ | Resulting state | "BabylMessage" state | |=================================|=================================| | “cur” subdirectory | “unseen” label | +---------------------------------+---------------------------------+ | “cur” subdirectory and S flag | no “unseen” label | +---------------------------------+---------------------------------+ | P flag | “forwarded” or “resent” label | +---------------------------------+---------------------------------+ | R flag | “answered” label | +---------------------------------+---------------------------------+ | T flag | “deleted” label | +---------------------------------+---------------------------------+ "mboxMessage" ------------- class mailbox.mboxMessage(message=None) A message with mbox-specific behaviors. Parameter *message* has the same meaning as with the "Message" constructor. Messages in an mbox mailbox are stored together in a single file. The sender’s envelope address and the time of delivery are typically stored in a line beginning with “From ” that is used to indicate the start of a message, though there is considerable variation in the exact format of this data among mbox implementations. Flags that indicate the state of the message, such as whether it has been read or marked as important, are typically stored in *Status* and *X-Status* headers. Conventional flags for mbox messages are as follows: +--------+------------+----------------------------------+ | Flag | Meaning | Explanation | |========|============|==================================| | R | Read | Read | +--------+------------+----------------------------------+ | O | Old | Previously detected by MUA | +--------+------------+----------------------------------+ | D | Deleted | Marked for subsequent deletion | +--------+------------+----------------------------------+ | F | Flagged | Marked as important | +--------+------------+----------------------------------+ | A | Answered | Replied to | +--------+------------+----------------------------------+ The “R” and “O” flags are stored in the *Status* header, and the “D”, “F”, and “A” flags are stored in the *X-Status* header. The flags and headers typically appear in the order mentioned. "mboxMessage" instances offer the following methods: get_from() Return a string representing the “From ” line that marks the start of the message in an mbox mailbox. The leading “From ” and the trailing newline are excluded. set_from(from_, time_=None) Set the “From ” line to *from_*, which should be specified without a leading “From ” or trailing newline. For convenience, *time_* may be specified and will be formatted appropriately and appended to *from_*. If *time_* is specified, it should be a "time.struct_time" instance, a tuple suitable for passing to "time.strftime()", or "True" (to use "time.gmtime()"). get_flags() Return a string specifying the flags that are currently set. If the message complies with the conventional format, the result is the concatenation in the following order of zero or one occurrence of each of "'R'", "'O'", "'D'", "'F'", and "'A'". set_flags(flags) Set the flags specified by *flags* and unset all others. Parameter *flags* should be the concatenation in any order of zero or more occurrences of each of "'R'", "'O'", "'D'", "'F'", and "'A'". add_flag(flag) Set the flag(s) specified by *flag* without changing other flags. To add more than one flag at a time, *flag* may be a string of more than one character. remove_flag(flag) Unset the flag(s) specified by *flag* without changing other flags. To remove more than one flag at a time, *flag* maybe a string of more than one character. When an "mboxMessage" instance is created based upon a "MaildirMessage" instance, a “From ” line is generated based upon the "MaildirMessage" instance’s delivery date, and the following conversions take place: +-------------------+---------------------------------+ | Resulting state | "MaildirMessage" state | |===================|=================================| | R flag | S flag | +-------------------+---------------------------------+ | O flag | “cur” subdirectory | +-------------------+---------------------------------+ | D flag | T flag | +-------------------+---------------------------------+ | F flag | F flag | +-------------------+---------------------------------+ | A flag | R flag | +-------------------+---------------------------------+ When an "mboxMessage" instance is created based upon an "MHMessage" instance, the following conversions take place: +---------------------+----------------------------+ | Resulting state | "MHMessage" state | |=====================|============================| | R flag and O flag | no “unseen” sequence | +---------------------+----------------------------+ | O flag | “unseen” sequence | +---------------------+----------------------------+ | F flag | “flagged” sequence | +---------------------+----------------------------+ | A flag | “replied” sequence | +---------------------+----------------------------+ When an "mboxMessage" instance is created based upon a "BabylMessage" instance, the following conversions take place: +---------------------+-------------------------------+ | Resulting state | "BabylMessage" state | |=====================|===============================| | R flag and O flag | no “unseen” label | +---------------------+-------------------------------+ | O flag | “unseen” label | +---------------------+-------------------------------+ | D flag | “deleted” label | +---------------------+-------------------------------+ | A flag | “answered” label | +---------------------+-------------------------------+ When a "Message" instance is created based upon an "MMDFMessage" instance, the “From ” line is copied and all flags directly correspond: +-------------------+------------------------------+ | Resulting state | "MMDFMessage" state | |===================|==============================| | R flag | R flag | +-------------------+------------------------------+ | O flag | O flag | +-------------------+------------------------------+ | D flag | D flag | +-------------------+------------------------------+ | F flag | F flag | +-------------------+------------------------------+ | A flag | A flag | +-------------------+------------------------------+ "MHMessage" ----------- class mailbox.MHMessage(message=None) A message with MH-specific behaviors. Parameter *message* has the same meaning as with the "Message" constructor. MH messages do not support marks or flags in the traditional sense, but they do support sequences, which are logical groupings of arbitrary messages. Some mail reading programs (although not the standard **mh** and **nmh**) use sequences in much the same way flags are used with other formats, as follows: +------------+--------------------------------------------+ | Sequence | Explanation | |============|============================================| | unseen | Not read, but previously detected by MUA | +------------+--------------------------------------------+ | replied | Replied to | +------------+--------------------------------------------+ | flagged | Marked as important | +------------+--------------------------------------------+ "MHMessage" instances offer the following methods: get_sequences() Return a list of the names of sequences that include this message. set_sequences(sequences) Set the list of sequences that include this message. add_sequence(sequence) Add *sequence* to the list of sequences that include this message. remove_sequence(sequence) Remove *sequence* from the list of sequences that include this message. When an "MHMessage" instance is created based upon a "MaildirMessage" instance, the following conversions take place: +----------------------+---------------------------------+ | Resulting state | "MaildirMessage" state | |======================|=================================| | “unseen” sequence | no S flag | +----------------------+---------------------------------+ | “replied” sequence | R flag | +----------------------+---------------------------------+ | “flagged” sequence | F flag | +----------------------+---------------------------------+ When an "MHMessage" instance is created based upon an "mboxMessage" or "MMDFMessage" instance, the *Status* and *X-Status* headers are omitted and the following conversions take place: +----------------------+------------------------------------------------+ | Resulting state | "mboxMessage" or "MMDFMessage" state | |======================|================================================| | “unseen” sequence | no R flag | +----------------------+------------------------------------------------+ | “replied” sequence | A flag | +----------------------+------------------------------------------------+ | “flagged” sequence | F flag | +----------------------+------------------------------------------------+ When an "MHMessage" instance is created based upon a "BabylMessage" instance, the following conversions take place: +----------------------+-------------------------------+ | Resulting state | "BabylMessage" state | |======================|===============================| | “unseen” sequence | “unseen” label | +----------------------+-------------------------------+ | “replied” sequence | “answered” label | +----------------------+-------------------------------+ "BabylMessage" -------------- class mailbox.BabylMessage(message=None) A message with Babyl-specific behaviors. Parameter *message* has the same meaning as with the "Message" constructor. Certain message labels, called *attributes*, are defined by convention to have special meanings. The attributes are as follows: +-------------+--------------------------------------------+ | Label | Explanation | |=============|============================================| | unseen | Not read, but previously detected by MUA | +-------------+--------------------------------------------+ | deleted | Marked for subsequent deletion | +-------------+--------------------------------------------+ | filed | Copied to another file or mailbox | +-------------+--------------------------------------------+ | answered | Replied to | +-------------+--------------------------------------------+ | forwarded | Forwarded | +-------------+--------------------------------------------+ | edited | Modified by the user | +-------------+--------------------------------------------+ | resent | Resent | +-------------+--------------------------------------------+ By default, Rmail displays only visible headers. The "BabylMessage" class, though, uses the original headers because they are more complete. Visible headers may be accessed explicitly if desired. "BabylMessage" instances offer the following methods: get_labels() Return a list of labels on the message. set_labels(labels) Set the list of labels on the message to *labels*. add_label(label) Add *label* to the list of labels on the message. remove_label(label) Remove *label* from the list of labels on the message. get_visible() Return an "Message" instance whose headers are the message’s visible headers and whose body is empty. set_visible(visible) Set the message’s visible headers to be the same as the headers in *message*. Parameter *visible* should be a "Message" instance, an "email.message.Message" instance, a string, or a file-like object (which should be open in text mode). update_visible() When a "BabylMessage" instance’s original headers are modified, the visible headers are not automatically modified to correspond. This method updates the visible headers as follows: each visible header with a corresponding original header is set to the value of the original header, each visible header without a corresponding original header is removed, and any of *Date*, *From*, *Reply-To*, *To*, *CC*, and *Subject* that are present in the original headers but not the visible headers are added to the visible headers. When a "BabylMessage" instance is created based upon a "MaildirMessage" instance, the following conversions take place: +---------------------+---------------------------------+ | Resulting state | "MaildirMessage" state | |=====================|=================================| | “unseen” label | no S flag | +---------------------+---------------------------------+ | “deleted” label | T flag | +---------------------+---------------------------------+ | “answered” label | R flag | +---------------------+---------------------------------+ | “forwarded” label | P flag | +---------------------+---------------------------------+ When a "BabylMessage" instance is created based upon an "mboxMessage" or "MMDFMessage" instance, the *Status* and *X-Status* headers are omitted and the following conversions take place: +--------------------+------------------------------------------------+ | Resulting state | "mboxMessage" or "MMDFMessage" state | |====================|================================================| | “unseen” label | no R flag | +--------------------+------------------------------------------------+ | “deleted” label | D flag | +--------------------+------------------------------------------------+ | “answered” label | A flag | +--------------------+------------------------------------------------+ When a "BabylMessage" instance is created based upon an "MHMessage" instance, the following conversions take place: +--------------------+----------------------------+ | Resulting state | "MHMessage" state | |====================|============================| | “unseen” label | “unseen” sequence | +--------------------+----------------------------+ | “answered” label | “replied” sequence | +--------------------+----------------------------+ "MMDFMessage" ------------- class mailbox.MMDFMessage(message=None) A message with MMDF-specific behaviors. Parameter *message* has the same meaning as with the "Message" constructor. As with message in an mbox mailbox, MMDF messages are stored with the sender’s address and the delivery date in an initial line beginning with “From “. Likewise, flags that indicate the state of the message are typically stored in *Status* and *X-Status* headers. Conventional flags for MMDF messages are identical to those of mbox message and are as follows: +--------+------------+----------------------------------+ | Flag | Meaning | Explanation | |========|============|==================================| | R | Read | Read | +--------+------------+----------------------------------+ | O | Old | Previously detected by MUA | +--------+------------+----------------------------------+ | D | Deleted | Marked for subsequent deletion | +--------+------------+----------------------------------+ | F | Flagged | Marked as important | +--------+------------+----------------------------------+ | A | Answered | Replied to | +--------+------------+----------------------------------+ The “R” and “O” flags are stored in the *Status* header, and the “D”, “F”, and “A” flags are stored in the *X-Status* header. The flags and headers typically appear in the order mentioned. "MMDFMessage" instances offer the following methods, which are identical to those offered by "mboxMessage": get_from() Return a string representing the “From ” line that marks the start of the message in an mbox mailbox. The leading “From ” and the trailing newline are excluded. set_from(from_, time_=None) Set the “From ” line to *from_*, which should be specified without a leading “From ” or trailing newline. For convenience, *time_* may be specified and will be formatted appropriately and appended to *from_*. If *time_* is specified, it should be a "time.struct_time" instance, a tuple suitable for passing to "time.strftime()", or "True" (to use "time.gmtime()"). get_flags() Return a string specifying the flags that are currently set. If the message complies with the conventional format, the result is the concatenation in the following order of zero or one occurrence of each of "'R'", "'O'", "'D'", "'F'", and "'A'". set_flags(flags) Set the flags specified by *flags* and unset all others. Parameter *flags* should be the concatenation in any order of zero or more occurrences of each of "'R'", "'O'", "'D'", "'F'", and "'A'". add_flag(flag) Set the flag(s) specified by *flag* without changing other flags. To add more than one flag at a time, *flag* may be a string of more than one character. remove_flag(flag) Unset the flag(s) specified by *flag* without changing other flags. To remove more than one flag at a time, *flag* maybe a string of more than one character. When an "MMDFMessage" instance is created based upon a "MaildirMessage" instance, a “From ” line is generated based upon the "MaildirMessage" instance’s delivery date, and the following conversions take place: +-------------------+---------------------------------+ | Resulting state | "MaildirMessage" state | |===================|=================================| | R flag | S flag | +-------------------+---------------------------------+ | O flag | “cur” subdirectory | +-------------------+---------------------------------+ | D flag | T flag | +-------------------+---------------------------------+ | F flag | F flag | +-------------------+---------------------------------+ | A flag | R flag | +-------------------+---------------------------------+ When an "MMDFMessage" instance is created based upon an "MHMessage" instance, the following conversions take place: +---------------------+----------------------------+ | Resulting state | "MHMessage" state | |=====================|============================| | R flag and O flag | no “unseen” sequence | +---------------------+----------------------------+ | O flag | “unseen” sequence | +---------------------+----------------------------+ | F flag | “flagged” sequence | +---------------------+----------------------------+ | A flag | “replied” sequence | +---------------------+----------------------------+ When an "MMDFMessage" instance is created based upon a "BabylMessage" instance, the following conversions take place: +---------------------+-------------------------------+ | Resulting state | "BabylMessage" state | |=====================|===============================| | R flag and O flag | no “unseen” label | +---------------------+-------------------------------+ | O flag | “unseen” label | +---------------------+-------------------------------+ | D flag | “deleted” label | +---------------------+-------------------------------+ | A flag | “answered” label | +---------------------+-------------------------------+ When an "MMDFMessage" instance is created based upon an "mboxMessage" instance, the “From ” line is copied and all flags directly correspond: +-------------------+------------------------------+ | Resulting state | "mboxMessage" state | |===================|==============================| | R flag | R flag | +-------------------+------------------------------+ | O flag | O flag | +-------------------+------------------------------+ | D flag | D flag | +-------------------+------------------------------+ | F flag | F flag | +-------------------+------------------------------+ | A flag | A flag | +-------------------+------------------------------+ Exceptions ========== The following exception classes are defined in the "mailbox" module: exception mailbox.Error The based class for all other module-specific exceptions. exception mailbox.NoSuchMailboxError Raised when a mailbox is expected but is not found, such as when instantiating a "Mailbox" subclass with a path that does not exist (and with the *create* parameter set to "False"), or when opening a folder that does not exist. exception mailbox.NotEmptyError Raised when a mailbox is not empty but is expected to be, such as when deleting a folder that contains messages. exception mailbox.ExternalClashError Raised when some mailbox-related condition beyond the control of the program causes it to be unable to proceed, such as when failing to acquire a lock that another program already holds a lock, or when a uniquely-generated file name already exists. exception mailbox.FormatError Raised when the data in a file cannot be parsed, such as when an "MH" instance attempts to read a corrupted ".mh_sequences" file. Examples ======== A simple example of printing the subjects of all messages in a mailbox that seem interesting: import mailbox for message in mailbox.mbox('~/mbox'): subject = message['subject'] # Could possibly be None. if subject and 'python' in subject.lower(): print(subject) To copy all mail from a Babyl mailbox to an MH mailbox, converting all of the format-specific information that can be converted: import mailbox destination = mailbox.MH('~/Mail') destination.lock() for message in mailbox.Babyl('~/RMAIL'): destination.add(mailbox.MHMessage(message)) destination.flush() destination.unlock() This example sorts mail from several mailing lists into different mailboxes, being careful to avoid mail corruption due to concurrent modification by other programs, mail loss due to interruption of the program, or premature termination due to malformed messages in the mailbox: import mailbox import email.errors list_names = ('python-list', 'python-dev', 'python-bugs') boxes = {name: mailbox.mbox('~/email/%s' % name) for name in list_names} inbox = mailbox.Maildir('~/Maildir', factory=None) for key in inbox.iterkeys(): try: message = inbox[key] except email.errors.MessageParseError: continue # The message is malformed. Just leave it. for name in list_names: list_id = message['list-id'] if list_id and name in list_id: # Get mailbox to use box = boxes[name] # Write copy to disk before removing original. # If there's a crash, you might duplicate a message, but # that's better than losing a message completely. box.lock() box.add(message) box.flush() box.unlock() # Remove original message inbox.lock() inbox.discard(key) inbox.flush() inbox.unlock() break # Found destination, so stop looking. for box in boxes.itervalues(): box.close() "mailcap" — Mailcap file handling ********************************* **Source code:** Lib/mailcap.py Deprecated since version 3.11: The "mailcap" module is deprecated (see **PEP 594** for details). The "mimetypes" module provides an alternative. ====================================================================== Mailcap files are used to configure how MIME-aware applications such as mail readers and Web browsers react to files with different MIME types. (The name “mailcap” is derived from the phrase “mail capability”.) For example, a mailcap file might contain a line like "video/mpeg; xmpeg %s". Then, if the user encounters an email message or Web document with the MIME type *video/mpeg*, "%s" will be replaced by a filename (usually one belonging to a temporary file) and the **xmpeg** program can be automatically started to view the file. The mailcap format is documented in **RFC 1524**, “A User Agent Configuration Mechanism For Multimedia Mail Format Information”, but is not an Internet standard. However, mailcap files are supported on most Unix systems. mailcap.findmatch(caps, MIMEtype, key='view', filename='/dev/null', plist=[]) Return a 2-tuple; the first element is a string containing the command line to be executed (which can be passed to "os.system()"), and the second element is the mailcap entry for a given MIME type. If no matching MIME type can be found, "(None, None)" is returned. *key* is the name of the field desired, which represents the type of activity to be performed; the default value is ‘view’, since in the most common case you simply want to view the body of the MIME- typed data. Other possible values might be ‘compose’ and ‘edit’, if you wanted to create a new body of the given MIME type or alter the existing body data. See **RFC 1524** for a complete list of these fields. *filename* is the filename to be substituted for "%s" in the command line; the default value is "'/dev/null'" which is almost certainly not what you want, so usually you’ll override it by specifying a filename. *plist* can be a list containing named parameters; the default value is simply an empty list. Each entry in the list must be a string containing the parameter name, an equals sign ("'='"), and the parameter’s value. Mailcap entries can contain named parameters like "%{foo}", which will be replaced by the value of the parameter named ‘foo’. For example, if the command line "showpartial %{id} %{number} %{total}" was in a mailcap file, and *plist* was set to "['id=1', 'number=2', 'total=3']", the resulting command line would be "'showpartial 1 2 3'". In a mailcap file, the “test” field can optionally be specified to test some external condition (such as the machine architecture, or the window system in use) to determine whether or not the mailcap line applies. "findmatch()" will automatically check such conditions and skip the entry if the check fails. Changed in version 3.9.16: To prevent security issues with shell metacharacters (symbols that have special effects in a shell command line), "findmatch" will refuse to inject ASCII characters other than alphanumerics and "@+=:,./-_" into the returned command line.If a disallowed character appears in *filename*, "findmatch" will always return "(None, None)" as if no entry was found. If such a character appears elsewhere (a value in *plist* or in *MIMEtype*), "findmatch" will ignore all mailcap entries which use that value. A "warning" will be raised in either case. mailcap.getcaps() Returns a dictionary mapping MIME types to a list of mailcap file entries. This dictionary must be passed to the "findmatch()" function. An entry is stored as a list of dictionaries, but it shouldn’t be necessary to know the details of this representation. The information is derived from all of the mailcap files found on the system. Settings in the user’s mailcap file "$HOME/.mailcap" will override settings in the system mailcap files "/etc/mailcap", "/usr/etc/mailcap", and "/usr/local/etc/mailcap". An example usage: >>> import mailcap >>> d = mailcap.getcaps() >>> mailcap.findmatch(d, 'video/mpeg', filename='tmp1223') ('xmpeg tmp1223', {'view': 'xmpeg %s'}) Structured Markup Processing Tools ********************************** Python supports a variety of modules to work with various forms of structured data markup. This includes modules to work with the Standard Generalized Markup Language (SGML) and the Hypertext Markup Language (HTML), and several interfaces for working with the Extensible Markup Language (XML). * "html" — HyperText Markup Language support * "html.parser" — Simple HTML and XHTML parser * Example HTML Parser Application * "HTMLParser" Methods * Examples * "html.entities" — Definitions of HTML general entities * XML Processing Modules * XML vulnerabilities * The "defusedxml" Package * "xml.etree.ElementTree" — The ElementTree XML API * Tutorial * XML tree and elements * Parsing XML * Pull API for non-blocking parsing * Finding interesting elements * Modifying an XML File * Building XML documents * Parsing XML with Namespaces * XPath support * Example * Supported XPath syntax * Reference * Functions * XInclude support * Example * Reference * Functions * Element Objects * ElementTree Objects * QName Objects * TreeBuilder Objects * XMLParser Objects * XMLPullParser Objects * Exceptions * "xml.dom" — The Document Object Model API * Module Contents * Objects in the DOM * DOMImplementation Objects * Node Objects * NodeList Objects * DocumentType Objects * Document Objects * Element Objects * Attr Objects * NamedNodeMap Objects * Comment Objects * Text and CDATASection Objects * ProcessingInstruction Objects * Exceptions * Conformance * Type Mapping * Accessor Methods * "xml.dom.minidom" — Minimal DOM implementation * DOM Objects * DOM Example * minidom and the DOM standard * "xml.dom.pulldom" — Support for building partial DOM trees * DOMEventStream Objects * "xml.sax" — Support for SAX2 parsers * SAXException Objects * "xml.sax.handler" — Base classes for SAX handlers * ContentHandler Objects * DTDHandler Objects * EntityResolver Objects * ErrorHandler Objects * "xml.sax.saxutils" — SAX Utilities * "xml.sax.xmlreader" — Interface for XML parsers * XMLReader Objects * IncrementalParser Objects * Locator Objects * InputSource Objects * The "Attributes" Interface * The "AttributesNS" Interface * "xml.parsers.expat" — Fast XML parsing using Expat * XMLParser Objects * ExpatError Exceptions * Example * Content Model Descriptions * Expat error constants "marshal" — Internal Python object serialization ************************************************ ====================================================================== This module contains functions that can read and write Python values in a binary format. The format is specific to Python, but independent of machine architecture issues (e.g., you can write a Python value to a file on a PC, transport the file to a Sun, and read it back there). Details of the format are undocumented on purpose; it may change between Python versions (although it rarely does). [1] This is not a general “persistence” module. For general persistence and transfer of Python objects through RPC calls, see the modules "pickle" and "shelve". The "marshal" module exists mainly to support reading and writing the “pseudo-compiled” code for Python modules of ".pyc" files. Therefore, the Python maintainers reserve the right to modify the marshal format in backward incompatible ways should the need arise. If you’re serializing and de-serializing Python objects, use the "pickle" module instead – the performance is comparable, version independence is guaranteed, and pickle supports a substantially wider range of objects than marshal. Warning: The "marshal" module is not intended to be secure against erroneous or maliciously constructed data. Never unmarshal data received from an untrusted or unauthenticated source. Not all Python object types are supported; in general, only objects whose value is independent from a particular invocation of Python can be written and read by this module. The following types are supported: booleans, integers, floating point numbers, complex numbers, strings, bytes, bytearrays, tuples, lists, sets, frozensets, dictionaries, and code objects, where it should be understood that tuples, lists, sets, frozensets and dictionaries are only supported as long as the values contained therein are themselves supported. The singletons "None", "Ellipsis" and "StopIteration" can also be marshalled and unmarshalled. For format *version* lower than 3, recursive lists, sets and dictionaries cannot be written (see below). There are functions that read/write files as well as functions operating on bytes-like objects. The module defines these functions: marshal.dump(value, file[, version]) Write the value on the open file. The value must be a supported type. The file must be a writeable *binary file*. If the value has (or contains an object that has) an unsupported type, a "ValueError" exception is raised — but garbage data will also be written to the file. The object will not be properly read back by "load()". The *version* argument indicates the data format that "dump" should use (see below). Raises an auditing event "marshal.dumps" with arguments "value", "version". marshal.load(file) Read one value from the open file and return it. If no valid value is read (e.g. because the data has a different Python version’s incompatible marshal format), raise "EOFError", "ValueError" or "TypeError". The file must be a readable *binary file*. Raises an auditing event "marshal.load" with no arguments. Note: If an object containing an unsupported type was marshalled with "dump()", "load()" will substitute "None" for the unmarshallable type. Changed in version 3.9.7: This call used to raise a "code.__new__" audit event for each code object. Now it raises a single "marshal.load" event for the entire load operation. marshal.dumps(value[, version]) Return the bytes object that would be written to a file by "dump(value, file)". The value must be a supported type. Raise a "ValueError" exception if value has (or contains an object that has) an unsupported type. The *version* argument indicates the data format that "dumps" should use (see below). Raises an auditing event "marshal.dumps" with arguments "value", "version". marshal.loads(bytes) Convert the *bytes-like object* to a value. If no valid value is found, raise "EOFError", "ValueError" or "TypeError". Extra bytes in the input are ignored. Raises an auditing event "marshal.loads" with argument "bytes". Changed in version 3.9.7: This call used to raise a "code.__new__" audit event for each code object. Now it raises a single "marshal.loads" event for the entire load operation. In addition, the following constants are defined: marshal.version Indicates the format that the module uses. Version 0 is the historical format, version 1 shares interned strings and version 2 uses a binary format for floating point numbers. Version 3 adds support for object instancing and recursion. The current version is 4. -[ Footnotes ]- [1] The name of this module stems from a bit of terminology used by the designers of Modula-3 (amongst others), who use the term “marshalling” for shipping of data around in a self-contained form. Strictly speaking, “to marshal” means to convert some data from internal to external form (in an RPC buffer for instance) and “unmarshalling” for the reverse process. "math" — Mathematical functions ******************************* ====================================================================== This module provides access to the mathematical functions defined by the C standard. These functions cannot be used with complex numbers; use the functions of the same name from the "cmath" module if you require support for complex numbers. The distinction between functions which support complex numbers and those which don’t is made since most users do not want to learn quite as much mathematics as required to understand complex numbers. Receiving an exception instead of a complex result allows earlier detection of the unexpected complex number used as a parameter, so that the programmer can determine how and why it was generated in the first place. The following functions are provided by this module. Except when explicitly noted otherwise, all return values are floats. Number-theoretic and representation functions ============================================= math.ceil(x) Return the ceiling of *x*, the smallest integer greater than or equal to *x*. If *x* is not a float, delegates to "x.__ceil__", which should return an "Integral" value. math.comb(n, k) Return the number of ways to choose *k* items from *n* items without repetition and without order. Evaluates to "n! / (k! * (n - k)!)" when "k <= n" and evaluates to zero when "k > n". Also called the binomial coefficient because it is equivalent to the coefficient of k-th term in polynomial expansion of the expression "(1 + x) ** n". Raises "TypeError" if either of the arguments are not integers. Raises "ValueError" if either of the arguments are negative. New in version 3.8. math.copysign(x, y) Return a float with the magnitude (absolute value) of *x* but the sign of *y*. On platforms that support signed zeros, "copysign(1.0, -0.0)" returns *-1.0*. math.fabs(x) Return the absolute value of *x*. math.factorial(x) Return *x* factorial as an integer. Raises "ValueError" if *x* is not integral or is negative. Deprecated since version 3.9: Accepting floats with integral values (like "5.0") is deprecated. math.floor(x) Return the floor of *x*, the largest integer less than or equal to *x*. If *x* is not a float, delegates to "x.__floor__", which should return an "Integral" value. math.fmod(x, y) Return "fmod(x, y)", as defined by the platform C library. Note that the Python expression "x % y" may not return the same result. The intent of the C standard is that "fmod(x, y)" be exactly (mathematically; to infinite precision) equal to "x - n*y" for some integer *n* such that the result has the same sign as *x* and magnitude less than "abs(y)". Python’s "x % y" returns a result with the sign of *y* instead, and may not be exactly computable for float arguments. For example, "fmod(-1e-100, 1e100)" is "-1e-100", but the result of Python’s "-1e-100 % 1e100" is "1e100-1e-100", which cannot be represented exactly as a float, and rounds to the surprising "1e100". For this reason, function "fmod()" is generally preferred when working with floats, while Python’s "x % y" is preferred when working with integers. math.frexp(x) Return the mantissa and exponent of *x* as the pair "(m, e)". *m* is a float and *e* is an integer such that "x == m * 2**e" exactly. If *x* is zero, returns "(0.0, 0)", otherwise "0.5 <= abs(m) < 1". This is used to “pick apart” the internal representation of a float in a portable way. math.fsum(iterable) Return an accurate floating point sum of values in the iterable. Avoids loss of precision by tracking multiple intermediate partial sums: >>> sum([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1]) 0.9999999999999999 >>> fsum([.1, .1, .1, .1, .1, .1, .1, .1, .1, .1]) 1.0 The algorithm’s accuracy depends on IEEE-754 arithmetic guarantees and the typical case where the rounding mode is half-even. On some non-Windows builds, the underlying C library uses extended precision addition and may occasionally double-round an intermediate sum causing it to be off in its least significant bit. For further discussion and two alternative approaches, see the ASPN cookbook recipes for accurate floating point summation. math.gcd(*integers) Return the greatest common divisor of the specified integer arguments. If any of the arguments is nonzero, then the returned value is the largest positive integer that is a divisor of all arguments. If all arguments are zero, then the returned value is "0". "gcd()" without arguments returns "0". New in version 3.5. Changed in version 3.9: Added support for an arbitrary number of arguments. Formerly, only two arguments were supported. math.isclose(a, b, *, rel_tol=1e-09, abs_tol=0.0) Return "True" if the values *a* and *b* are close to each other and "False" otherwise. Whether or not two values are considered close is determined according to given absolute and relative tolerances. *rel_tol* is the relative tolerance – it is the maximum allowed difference between *a* and *b*, relative to the larger absolute value of *a* or *b*. For example, to set a tolerance of 5%, pass "rel_tol=0.05". The default tolerance is "1e-09", which assures that the two values are the same within about 9 decimal digits. *rel_tol* must be greater than zero. *abs_tol* is the minimum absolute tolerance – useful for comparisons near zero. *abs_tol* must be at least zero. If no errors occur, the result will be: "abs(a-b) <= max(rel_tol * max(abs(a), abs(b)), abs_tol)". The IEEE 754 special values of "NaN", "inf", and "-inf" will be handled according to IEEE rules. Specifically, "NaN" is not considered close to any other value, including "NaN". "inf" and "-inf" are only considered close to themselves. New in version 3.5. See also: **PEP 485** – A function for testing approximate equality math.isfinite(x) Return "True" if *x* is neither an infinity nor a NaN, and "False" otherwise. (Note that "0.0" *is* considered finite.) New in version 3.2. math.isinf(x) Return "True" if *x* is a positive or negative infinity, and "False" otherwise. math.isnan(x) Return "True" if *x* is a NaN (not a number), and "False" otherwise. math.isqrt(n) Return the integer square root of the nonnegative integer *n*. This is the floor of the exact square root of *n*, or equivalently the greatest integer *a* such that *a*² ≤ *n*. For some applications, it may be more convenient to have the least integer *a* such that *n* ≤ *a*², or in other words the ceiling of the exact square root of *n*. For positive *n*, this can be computed using "a = 1 + isqrt(n - 1)". New in version 3.8. math.lcm(*integers) Return the least common multiple of the specified integer arguments. If all arguments are nonzero, then the returned value is the smallest positive integer that is a multiple of all arguments. If any of the arguments is zero, then the returned value is "0". "lcm()" without arguments returns "1". New in version 3.9. math.ldexp(x, i) Return "x * (2**i)". This is essentially the inverse of function "frexp()". math.modf(x) Return the fractional and integer parts of *x*. Both results carry the sign of *x* and are floats. math.nextafter(x, y) Return the next floating-point value after *x* towards *y*. If *x* is equal to *y*, return *y*. Examples: * "math.nextafter(x, math.inf)" goes up: towards positive infinity. * "math.nextafter(x, -math.inf)" goes down: towards minus infinity. * "math.nextafter(x, 0.0)" goes towards zero. * "math.nextafter(x, math.copysign(math.inf, x))" goes away from zero. See also "math.ulp()". New in version 3.9. math.perm(n, k=None) Return the number of ways to choose *k* items from *n* items without repetition and with order. Evaluates to "n! / (n - k)!" when "k <= n" and evaluates to zero when "k > n". If *k* is not specified or is None, then *k* defaults to *n* and the function returns "n!". Raises "TypeError" if either of the arguments are not integers. Raises "ValueError" if either of the arguments are negative. New in version 3.8. math.prod(iterable, *, start=1) Calculate the product of all the elements in the input *iterable*. The default *start* value for the product is "1". When the iterable is empty, return the start value. This function is intended specifically for use with numeric values and may reject non-numeric types. New in version 3.8. math.remainder(x, y) Return the IEEE 754-style remainder of *x* with respect to *y*. For finite *x* and finite nonzero *y*, this is the difference "x - n*y", where "n" is the closest integer to the exact value of the quotient "x / y". If "x / y" is exactly halfway between two consecutive integers, the nearest *even* integer is used for "n". The remainder "r = remainder(x, y)" thus always satisfies "abs(r) <= 0.5 * abs(y)". Special cases follow IEEE 754: in particular, "remainder(x, math.inf)" is *x* for any finite *x*, and "remainder(x, 0)" and "remainder(math.inf, x)" raise "ValueError" for any non-NaN *x*. If the result of the remainder operation is zero, that zero will have the same sign as *x*. On platforms using IEEE 754 binary floating-point, the result of this operation is always exactly representable: no rounding error is introduced. New in version 3.7. math.trunc(x) Return *x* with the fractional part removed, leaving the integer part. This rounds toward 0: "trunc()" is equivalent to "floor()" for positive *x*, and equivalent to "ceil()" for negative *x*. If *x* is not a float, delegates to "x.__trunc__", which should return an "Integral" value. math.ulp(x) Return the value of the least significant bit of the float *x*: * If *x* is a NaN (not a number), return *x*. * If *x* is negative, return "ulp(-x)". * If *x* is a positive infinity, return *x*. * If *x* is equal to zero, return the smallest positive *denormalized* representable float (smaller than the minimum positive *normalized* float, "sys.float_info.min"). * If *x* is equal to the largest positive representable float, return the value of the least significant bit of *x*, such that the first float smaller than *x* is "x - ulp(x)". * Otherwise (*x* is a positive finite number), return the value of the least significant bit of *x*, such that the first float bigger than *x* is "x + ulp(x)". ULP stands for “Unit in the Last Place”. See also "math.nextafter()" and "sys.float_info.epsilon". New in version 3.9. Note that "frexp()" and "modf()" have a different call/return pattern than their C equivalents: they take a single argument and return a pair of values, rather than returning their second return value through an ‘output parameter’ (there is no such thing in Python). For the "ceil()", "floor()", and "modf()" functions, note that *all* floating-point numbers of sufficiently large magnitude are exact integers. Python floats typically carry no more than 53 bits of precision (the same as the platform C double type), in which case any float *x* with "abs(x) >= 2**52" necessarily has no fractional bits. Power and logarithmic functions =============================== math.exp(x) Return *e* raised to the power *x*, where *e* = 2.718281… is the base of natural logarithms. This is usually more accurate than "math.e ** x" or "pow(math.e, x)". math.expm1(x) Return *e* raised to the power *x*, minus 1. Here *e* is the base of natural logarithms. For small floats *x*, the subtraction in "exp(x) - 1" can result in a significant loss of precision; the "expm1()" function provides a way to compute this quantity to full precision: >>> from math import exp, expm1 >>> exp(1e-5) - 1 # gives result accurate to 11 places 1.0000050000069649e-05 >>> expm1(1e-5) # result accurate to full precision 1.0000050000166668e-05 New in version 3.2. math.log(x[, base]) With one argument, return the natural logarithm of *x* (to base *e*). With two arguments, return the logarithm of *x* to the given *base*, calculated as "log(x)/log(base)". math.log1p(x) Return the natural logarithm of *1+x* (base *e*). The result is calculated in a way which is accurate for *x* near zero. math.log2(x) Return the base-2 logarithm of *x*. This is usually more accurate than "log(x, 2)". New in version 3.3. See also: "int.bit_length()" returns the number of bits necessary to represent an integer in binary, excluding the sign and leading zeros. math.log10(x) Return the base-10 logarithm of *x*. This is usually more accurate than "log(x, 10)". math.pow(x, y) Return "x" raised to the power "y". Exceptional cases follow Annex ‘F’ of the C99 standard as far as possible. In particular, "pow(1.0, x)" and "pow(x, 0.0)" always return "1.0", even when "x" is a zero or a NaN. If both "x" and "y" are finite, "x" is negative, and "y" is not an integer then "pow(x, y)" is undefined, and raises "ValueError". Unlike the built-in "**" operator, "math.pow()" converts both its arguments to type "float". Use "**" or the built-in "pow()" function for computing exact integer powers. math.sqrt(x) Return the square root of *x*. Trigonometric functions ======================= math.acos(x) Return the arc cosine of *x*, in radians. The result is between "0" and "pi". math.asin(x) Return the arc sine of *x*, in radians. The result is between "-pi/2" and "pi/2". math.atan(x) Return the arc tangent of *x*, in radians. The result is between "-pi/2" and "pi/2". math.atan2(y, x) Return "atan(y / x)", in radians. The result is between "-pi" and "pi". The vector in the plane from the origin to point "(x, y)" makes this angle with the positive X axis. The point of "atan2()" is that the signs of both inputs are known to it, so it can compute the correct quadrant for the angle. For example, "atan(1)" and "atan2(1, 1)" are both "pi/4", but "atan2(-1, -1)" is "-3*pi/4". math.cos(x) Return the cosine of *x* radians. math.dist(p, q) Return the Euclidean distance between two points *p* and *q*, each given as a sequence (or iterable) of coordinates. The two points must have the same dimension. Roughly equivalent to: sqrt(sum((px - qx) ** 2.0 for px, qx in zip(p, q))) New in version 3.8. math.hypot(*coordinates) Return the Euclidean norm, "sqrt(sum(x**2 for x in coordinates))". This is the length of the vector from the origin to the point given by the coordinates. For a two dimensional point "(x, y)", this is equivalent to computing the hypotenuse of a right triangle using the Pythagorean theorem, "sqrt(x*x + y*y)". Changed in version 3.8: Added support for n-dimensional points. Formerly, only the two dimensional case was supported. math.sin(x) Return the sine of *x* radians. math.tan(x) Return the tangent of *x* radians. Angular conversion ================== math.degrees(x) Convert angle *x* from radians to degrees. math.radians(x) Convert angle *x* from degrees to radians. Hyperbolic functions ==================== Hyperbolic functions are analogs of trigonometric functions that are based on hyperbolas instead of circles. math.acosh(x) Return the inverse hyperbolic cosine of *x*. math.asinh(x) Return the inverse hyperbolic sine of *x*. math.atanh(x) Return the inverse hyperbolic tangent of *x*. math.cosh(x) Return the hyperbolic cosine of *x*. math.sinh(x) Return the hyperbolic sine of *x*. math.tanh(x) Return the hyperbolic tangent of *x*. Special functions ================= math.erf(x) Return the error function at *x*. The "erf()" function can be used to compute traditional statistical functions such as the cumulative standard normal distribution: def phi(x): 'Cumulative distribution function for the standard normal distribution' return (1.0 + erf(x / sqrt(2.0))) / 2.0 New in version 3.2. math.erfc(x) Return the complementary error function at *x*. The complementary error function is defined as "1.0 - erf(x)". It is used for large values of *x* where a subtraction from one would cause a loss of significance. New in version 3.2. math.gamma(x) Return the Gamma function at *x*. New in version 3.2. math.lgamma(x) Return the natural logarithm of the absolute value of the Gamma function at *x*. New in version 3.2. Constants ========= math.pi The mathematical constant *π* = 3.141592…, to available precision. math.e The mathematical constant *e* = 2.718281…, to available precision. math.tau The mathematical constant *τ* = 6.283185…, to available precision. Tau is a circle constant equal to 2*π*, the ratio of a circle’s circumference to its radius. To learn more about Tau, check out Vi Hart’s video Pi is (still) Wrong, and start celebrating Tau day by eating twice as much pie! New in version 3.6. math.inf A floating-point positive infinity. (For negative infinity, use "-math.inf".) Equivalent to the output of "float('inf')". New in version 3.5. math.nan A floating-point “not a number” (NaN) value. Equivalent to the output of "float('nan')". Due to the requirements of the IEEE-754 standard, "math.nan" and "float('nan')" are not considered to equal to any other numeric value, including themselves. To check whether a number is a NaN, use the "isnan()" function to test for NaNs instead of "is" or "==". Example: >>> import math >>> math.nan == math.nan False >>> float('nan') == float('nan') False >>> math.isnan(math.nan) True >>> math.isnan(float('nan')) True New in version 3.5. **CPython implementation detail:** The "math" module consists mostly of thin wrappers around the platform C math library functions. Behavior in exceptional cases follows Annex F of the C99 standard where appropriate. The current implementation will raise "ValueError" for invalid operations like "sqrt(-1.0)" or "log(0.0)" (where C99 Annex F recommends signaling invalid operation or divide-by-zero), and "OverflowError" for results that overflow (for example, "exp(1000.0)"). A NaN will not be returned from any of the functions above unless one or more of the input arguments was a NaN; in that case, most functions will return a NaN, but (again following C99 Annex F) there are some exceptions to this rule, for example "pow(float('nan'), 0.0)" or "hypot(float('nan'), float('inf'))". Note that Python makes no effort to distinguish signaling NaNs from quiet NaNs, and behavior for signaling NaNs remains unspecified. Typical behavior is to treat all NaNs as though they were quiet. See also: Module "cmath" Complex number versions of many of these functions. "mimetypes" — Map filenames to MIME types ***************************************** **Source code:** Lib/mimetypes.py ====================================================================== The "mimetypes" module converts between a filename or URL and the MIME type associated with the filename extension. Conversions are provided from filename to MIME type and from MIME type to filename extension; encodings are not supported for the latter conversion. The module provides one class and a number of convenience functions. The functions are the normal interface to this module, but some applications may be interested in the class as well. The functions described below provide the primary interface for this module. If the module has not been initialized, they will call "init()" if they rely on the information "init()" sets up. mimetypes.guess_type(url, strict=True) Guess the type of a file based on its filename, path or URL, given by *url*. URL can be a string or a *path-like object*. The return value is a tuple "(type, encoding)" where *type* is "None" if the type can’t be guessed (missing or unknown suffix) or a string of the form "'type/subtype'", usable for a MIME *content- type* header. *encoding* is "None" for no encoding or the name of the program used to encode (e.g. **compress** or **gzip**). The encoding is suitable for use as a *Content-Encoding* header, **not** as a *Content-Transfer-Encoding* header. The mappings are table driven. Encoding suffixes are case sensitive; type suffixes are first tried case sensitively, then case insensitively. The optional *strict* argument is a flag specifying whether the list of known MIME types is limited to only the official types registered with IANA. When *strict* is "True" (the default), only the IANA types are supported; when *strict* is "False", some additional non-standard but commonly used MIME types are also recognized. Changed in version 3.8: Added support for url being a *path-like object*. mimetypes.guess_all_extensions(type, strict=True) Guess the extensions for a file based on its MIME type, given by *type*. The return value is a list of strings giving all possible filename extensions, including the leading dot ("'.'"). The extensions are not guaranteed to have been associated with any particular data stream, but would be mapped to the MIME type *type* by "guess_type()". The optional *strict* argument has the same meaning as with the "guess_type()" function. mimetypes.guess_extension(type, strict=True) Guess the extension for a file based on its MIME type, given by *type*. The return value is a string giving a filename extension, including the leading dot ("'.'"). The extension is not guaranteed to have been associated with any particular data stream, but would be mapped to the MIME type *type* by "guess_type()". If no extension can be guessed for *type*, "None" is returned. The optional *strict* argument has the same meaning as with the "guess_type()" function. Some additional functions and data items are available for controlling the behavior of the module. mimetypes.init(files=None) Initialize the internal data structures. If given, *files* must be a sequence of file names which should be used to augment the default type map. If omitted, the file names to use are taken from "knownfiles"; on Windows, the current registry settings are loaded. Each file named in *files* or "knownfiles" takes precedence over those named before it. Calling "init()" repeatedly is allowed. Specifying an empty list for *files* will prevent the system defaults from being applied: only the well-known values will be present from a built-in list. If *files* is "None" the internal data structure is completely rebuilt to its initial default value. This is a stable operation and will produce the same results when called multiple times. Changed in version 3.2: Previously, Windows registry settings were ignored. mimetypes.read_mime_types(filename) Load the type map given in the file *filename*, if it exists. The type map is returned as a dictionary mapping filename extensions, including the leading dot ("'.'"), to strings of the form "'type/subtype'". If the file *filename* does not exist or cannot be read, "None" is returned. mimetypes.add_type(type, ext, strict=True) Add a mapping from the MIME type *type* to the extension *ext*. When the extension is already known, the new type will replace the old one. When the type is already known the extension will be added to the list of known extensions. When *strict* is "True" (the default), the mapping will be added to the official MIME types, otherwise to the non-standard ones. mimetypes.inited Flag indicating whether or not the global data structures have been initialized. This is set to "True" by "init()". mimetypes.knownfiles List of type map file names commonly installed. These files are typically named "mime.types" and are installed in different locations by different packages. mimetypes.suffix_map Dictionary mapping suffixes to suffixes. This is used to allow recognition of encoded files for which the encoding and the type are indicated by the same extension. For example, the ".tgz" extension is mapped to ".tar.gz" to allow the encoding and type to be recognized separately. mimetypes.encodings_map Dictionary mapping filename extensions to encoding types. mimetypes.types_map Dictionary mapping filename extensions to MIME types. mimetypes.common_types Dictionary mapping filename extensions to non-standard, but commonly found MIME types. An example usage of the module: >>> import mimetypes >>> mimetypes.init() >>> mimetypes.knownfiles ['/etc/mime.types', '/etc/httpd/mime.types', ... ] >>> mimetypes.suffix_map['.tgz'] '.tar.gz' >>> mimetypes.encodings_map['.gz'] 'gzip' >>> mimetypes.types_map['.tgz'] 'application/x-tar-gz' MimeTypes Objects ================= The "MimeTypes" class may be useful for applications which may want more than one MIME-type database; it provides an interface similar to the one of the "mimetypes" module. class mimetypes.MimeTypes(filenames=(), strict=True) This class represents a MIME-types database. By default, it provides access to the same database as the rest of this module. The initial database is a copy of that provided by the module, and may be extended by loading additional "mime.types"-style files into the database using the "read()" or "readfp()" methods. The mapping dictionaries may also be cleared before loading additional data if the default data is not desired. The optional *filenames* parameter can be used to cause additional files to be loaded “on top” of the default database. suffix_map Dictionary mapping suffixes to suffixes. This is used to allow recognition of encoded files for which the encoding and the type are indicated by the same extension. For example, the ".tgz" extension is mapped to ".tar.gz" to allow the encoding and type to be recognized separately. This is initially a copy of the global "suffix_map" defined in the module. encodings_map Dictionary mapping filename extensions to encoding types. This is initially a copy of the global "encodings_map" defined in the module. types_map Tuple containing two dictionaries, mapping filename extensions to MIME types: the first dictionary is for the non-standards types and the second one is for the standard types. They are initialized by "common_types" and "types_map". types_map_inv Tuple containing two dictionaries, mapping MIME types to a list of filename extensions: the first dictionary is for the non- standards types and the second one is for the standard types. They are initialized by "common_types" and "types_map". guess_extension(type, strict=True) Similar to the "guess_extension()" function, using the tables stored as part of the object. guess_type(url, strict=True) Similar to the "guess_type()" function, using the tables stored as part of the object. guess_all_extensions(type, strict=True) Similar to the "guess_all_extensions()" function, using the tables stored as part of the object. read(filename, strict=True) Load MIME information from a file named *filename*. This uses "readfp()" to parse the file. If *strict* is "True", information will be added to list of standard types, else to the list of non-standard types. readfp(fp, strict=True) Load MIME type information from an open file *fp*. The file must have the format of the standard "mime.types" files. If *strict* is "True", information will be added to the list of standard types, else to the list of non-standard types. read_windows_registry(strict=True) Load MIME type information from the Windows registry. Availability: Windows. If *strict* is "True", information will be added to the list of standard types, else to the list of non-standard types. New in version 3.2. Miscellaneous Services ********************** The modules described in this chapter provide miscellaneous services that are available in all Python versions. Here’s an overview: * "formatter" — Generic output formatting * The Formatter Interface * Formatter Implementations * The Writer Interface * Writer Implementations Multimedia Services ******************* The modules described in this chapter implement various algorithms or interfaces that are mainly useful for multimedia applications. They are available at the discretion of the installation. Here’s an overview: * "wave" — Read and write WAV files * Wave_read Objects * Wave_write Objects * "colorsys" — Conversions between color systems "mmap" — Memory-mapped file support *********************************** ====================================================================== Memory-mapped file objects behave like both "bytearray" and like *file objects*. You can use mmap objects in most places where "bytearray" are expected; for example, you can use the "re" module to search through a memory-mapped file. You can also change a single byte by doing "obj[index] = 97", or change a subsequence by assigning to a slice: "obj[i1:i2] = b'...'". You can also read and write data starting at the current file position, and "seek()" through the file to different positions. A memory-mapped file is created by the "mmap" constructor, which is different on Unix and on Windows. In either case you must provide a file descriptor for a file opened for update. If you wish to map an existing Python file object, use its "fileno()" method to obtain the correct value for the *fileno* parameter. Otherwise, you can open the file using the "os.open()" function, which returns a file descriptor directly (the file still needs to be closed when done). Note: If you want to create a memory-mapping for a writable, buffered file, you should "flush()" the file first. This is necessary to ensure that local modifications to the buffers are actually available to the mapping. For both the Unix and Windows versions of the constructor, *access* may be specified as an optional keyword parameter. *access* accepts one of four values: "ACCESS_READ", "ACCESS_WRITE", or "ACCESS_COPY" to specify read-only, write-through or copy-on-write memory respectively, or "ACCESS_DEFAULT" to defer to *prot*. *access* can be used on both Unix and Windows. If *access* is not specified, Windows mmap returns a write-through mapping. The initial memory values for all three access types are taken from the specified file. Assignment to an "ACCESS_READ" memory map raises a "TypeError" exception. Assignment to an "ACCESS_WRITE" memory map affects both memory and the underlying file. Assignment to an "ACCESS_COPY" memory map affects memory but does not update the underlying file. Changed in version 3.7: Added "ACCESS_DEFAULT" constant. To map anonymous memory, -1 should be passed as the fileno along with the length. class mmap.mmap(fileno, length, tagname=None, access=ACCESS_DEFAULT[, offset]) **(Windows version)** Maps *length* bytes from the file specified by the file handle *fileno*, and creates a mmap object. If *length* is larger than the current size of the file, the file is extended to contain *length* bytes. If *length* is "0", the maximum length of the map is the current size of the file, except that if the file is empty Windows raises an exception (you cannot create an empty mapping on Windows). *tagname*, if specified and not "None", is a string giving a tag name for the mapping. Windows allows you to have many different mappings against the same file. If you specify the name of an existing tag, that tag is opened, otherwise a new tag of this name is created. If this parameter is omitted or "None", the mapping is created without a name. Avoiding the use of the tag parameter will assist in keeping your code portable between Unix and Windows. *offset* may be specified as a non-negative integer offset. mmap references will be relative to the offset from the beginning of the file. *offset* defaults to 0. *offset* must be a multiple of the "ALLOCATIONGRANULARITY". Raises an auditing event "mmap.__new__" with arguments "fileno", "length", "access", "offset". class mmap.mmap(fileno, length, flags=MAP_SHARED, prot=PROT_WRITE|PROT_READ, access=ACCESS_DEFAULT[, offset]) **(Unix version)** Maps *length* bytes from the file specified by the file descriptor *fileno*, and returns a mmap object. If *length* is "0", the maximum length of the map will be the current size of the file when "mmap" is called. *flags* specifies the nature of the mapping. "MAP_PRIVATE" creates a private copy-on-write mapping, so changes to the contents of the mmap object will be private to this process, and "MAP_SHARED" creates a mapping that’s shared with all other processes mapping the same areas of the file. The default value is "MAP_SHARED". *prot*, if specified, gives the desired memory protection; the two most useful values are "PROT_READ" and "PROT_WRITE", to specify that the pages may be read or written. *prot* defaults to "PROT_READ | PROT_WRITE". *access* may be specified in lieu of *flags* and *prot* as an optional keyword parameter. It is an error to specify both *flags*, *prot* and *access*. See the description of *access* above for information on how to use this parameter. *offset* may be specified as a non-negative integer offset. mmap references will be relative to the offset from the beginning of the file. *offset* defaults to 0. *offset* must be a multiple of "ALLOCATIONGRANULARITY" which is equal to "PAGESIZE" on Unix systems. To ensure validity of the created memory mapping the file specified by the descriptor *fileno* is internally automatically synchronized with physical backing store on macOS and OpenVMS. This example shows a simple way of using "mmap": import mmap # write a simple example file with open("hello.txt", "wb") as f: f.write(b"Hello Python!\n") with open("hello.txt", "r+b") as f: # memory-map the file, size 0 means whole file mm = mmap.mmap(f.fileno(), 0) # read content via standard file methods print(mm.readline()) # prints b"Hello Python!\n" # read content via slice notation print(mm[:5]) # prints b"Hello" # update content using slice notation; # note that new content must have same size mm[6:] = b" world!\n" # ... and read again using standard file methods mm.seek(0) print(mm.readline()) # prints b"Hello world!\n" # close the map mm.close() "mmap" can also be used as a context manager in a "with" statement: import mmap with mmap.mmap(-1, 13) as mm: mm.write(b"Hello world!") New in version 3.2: Context manager support. The next example demonstrates how to create an anonymous map and exchange data between the parent and child processes: import mmap import os mm = mmap.mmap(-1, 13) mm.write(b"Hello world!") pid = os.fork() if pid == 0: # In a child process mm.seek(0) print(mm.readline()) mm.close() Raises an auditing event "mmap.__new__" with arguments "fileno", "length", "access", "offset". Memory-mapped file objects support the following methods: close() Closes the mmap. Subsequent calls to other methods of the object will result in a ValueError exception being raised. This will not close the open file. closed "True" if the file is closed. New in version 3.2. find(sub[, start[, end]]) Returns the lowest index in the object where the subsequence *sub* is found, such that *sub* is contained in the range [*start*, *end*]. Optional arguments *start* and *end* are interpreted as in slice notation. Returns "-1" on failure. Changed in version 3.5: Writable *bytes-like object* is now accepted. flush([offset[, size]]) Flushes changes made to the in-memory copy of a file back to disk. Without use of this call there is no guarantee that changes are written back before the object is destroyed. If *offset* and *size* are specified, only changes to the given range of bytes will be flushed to disk; otherwise, the whole extent of the mapping is flushed. *offset* must be a multiple of the "PAGESIZE" or "ALLOCATIONGRANULARITY". "None" is returned to indicate success. An exception is raised when the call failed. Changed in version 3.8: Previously, a nonzero value was returned on success; zero was returned on error under Windows. A zero value was returned on success; an exception was raised on error under Unix. madvise(option[, start[, length]]) Send advice *option* to the kernel about the memory region beginning at *start* and extending *length* bytes. *option* must be one of the MADV_* constants available on the system. If *start* and *length* are omitted, the entire mapping is spanned. On some systems (including Linux), *start* must be a multiple of the "PAGESIZE". Availability: Systems with the "madvise()" system call. New in version 3.8. move(dest, src, count) Copy the *count* bytes starting at offset *src* to the destination index *dest*. If the mmap was created with "ACCESS_READ", then calls to move will raise a "TypeError" exception. read([n]) Return a "bytes" containing up to *n* bytes starting from the current file position. If the argument is omitted, "None" or negative, return all bytes from the current file position to the end of the mapping. The file position is updated to point after the bytes that were returned. Changed in version 3.3: Argument can be omitted or "None". read_byte() Returns a byte at the current file position as an integer, and advances the file position by 1. readline() Returns a single line, starting at the current file position and up to the next newline. The file position is updated to point after the bytes that were returned. resize(newsize) Resizes the map and the underlying file, if any. If the mmap was created with "ACCESS_READ" or "ACCESS_COPY", resizing the map will raise a "TypeError" exception. rfind(sub[, start[, end]]) Returns the highest index in the object where the subsequence *sub* is found, such that *sub* is contained in the range [*start*, *end*]. Optional arguments *start* and *end* are interpreted as in slice notation. Returns "-1" on failure. Changed in version 3.5: Writable *bytes-like object* is now accepted. seek(pos[, whence]) Set the file’s current position. *whence* argument is optional and defaults to "os.SEEK_SET" or "0" (absolute file positioning); other values are "os.SEEK_CUR" or "1" (seek relative to the current position) and "os.SEEK_END" or "2" (seek relative to the file’s end). size() Return the length of the file, which can be larger than the size of the memory-mapped area. tell() Returns the current position of the file pointer. write(bytes) Write the bytes in *bytes* into memory at the current position of the file pointer and return the number of bytes written (never less than "len(bytes)", since if the write fails, a "ValueError" will be raised). The file position is updated to point after the bytes that were written. If the mmap was created with "ACCESS_READ", then writing to it will raise a "TypeError" exception. Changed in version 3.5: Writable *bytes-like object* is now accepted. Changed in version 3.6: The number of bytes written is now returned. write_byte(byte) Write the integer *byte* into memory at the current position of the file pointer; the file position is advanced by "1". If the mmap was created with "ACCESS_READ", then writing to it will raise a "TypeError" exception. MADV_* Constants ================ mmap.MADV_NORMAL mmap.MADV_RANDOM mmap.MADV_SEQUENTIAL mmap.MADV_WILLNEED mmap.MADV_DONTNEED mmap.MADV_REMOVE mmap.MADV_DONTFORK mmap.MADV_DOFORK mmap.MADV_HWPOISON mmap.MADV_MERGEABLE mmap.MADV_UNMERGEABLE mmap.MADV_SOFT_OFFLINE mmap.MADV_HUGEPAGE mmap.MADV_NOHUGEPAGE mmap.MADV_DONTDUMP mmap.MADV_DODUMP mmap.MADV_FREE mmap.MADV_NOSYNC mmap.MADV_AUTOSYNC mmap.MADV_NOCORE mmap.MADV_CORE mmap.MADV_PROTECT These options can be passed to "mmap.madvise()". Not every option will be present on every system. Availability: Systems with the madvise() system call. New in version 3.8. "modulefinder" — Find modules used by a script ********************************************** **Source code:** Lib/modulefinder.py ====================================================================== This module provides a "ModuleFinder" class that can be used to determine the set of modules imported by a script. "modulefinder.py" can also be run as a script, giving the filename of a Python script as its argument, after which a report of the imported modules will be printed. modulefinder.AddPackagePath(pkg_name, path) Record that the package named *pkg_name* can be found in the specified *path*. modulefinder.ReplacePackage(oldname, newname) Allows specifying that the module named *oldname* is in fact the package named *newname*. class modulefinder.ModuleFinder(path=None, debug=0, excludes=[], replace_paths=[]) This class provides "run_script()" and "report()" methods to determine the set of modules imported by a script. *path* can be a list of directories to search for modules; if not specified, "sys.path" is used. *debug* sets the debugging level; higher values make the class print debugging messages about what it’s doing. *excludes* is a list of module names to exclude from the analysis. *replace_paths* is a list of "(oldpath, newpath)" tuples that will be replaced in module paths. report() Print a report to standard output that lists the modules imported by the script and their paths, as well as modules that are missing or seem to be missing. run_script(pathname) Analyze the contents of the *pathname* file, which must contain Python code. modules A dictionary mapping module names to modules. See Example usage of ModuleFinder. Example usage of "ModuleFinder" =============================== The script that is going to get analyzed later on (bacon.py): import re, itertools try: import baconhameggs except ImportError: pass try: import guido.python.ham except ImportError: pass The script that will output the report of bacon.py: from modulefinder import ModuleFinder finder = ModuleFinder() finder.run_script('bacon.py') print('Loaded modules:') for name, mod in finder.modules.items(): print('%s: ' % name, end='') print(','.join(list(mod.globalnames.keys())[:3])) print('-'*50) print('Modules not imported:') print('\n'.join(finder.badmodules.keys())) Sample output (may vary depending on the architecture): Loaded modules: _types: copyreg: _inverted_registry,_slotnames,__all__ sre_compile: isstring,_sre,_optimize_unicode _sre: sre_constants: REPEAT_ONE,makedict,AT_END_LINE sys: re: __module__,finditer,_expand itertools: __main__: re,itertools,baconhameggs sre_parse: _PATTERNENDERS,SRE_FLAG_UNICODE array: types: __module__,IntType,TypeType --------------------------------------------------- Modules not imported: guido.python.ham baconhameggs Importing Modules ***************** The modules described in this chapter provide new ways to import other Python modules and hooks for customizing the import process. The full list of modules described in this chapter is: * "zipimport" — Import modules from Zip archives * zipimporter Objects * Examples * "pkgutil" — Package extension utility * "modulefinder" — Find modules used by a script * Example usage of "ModuleFinder" * "runpy" — Locating and executing Python modules * "importlib" — The implementation of "import" * Introduction * Functions * "importlib.abc" – Abstract base classes related to import * "importlib.resources" – Resources * "importlib.machinery" – Importers and path hooks * "importlib.util" – Utility code for importers * Examples * Importing programmatically * Checking if a module can be imported * Importing a source file directly * Setting up an importer * Approximating "importlib.import_module()" * Using "importlib.metadata" * Overview * Functional API * Entry points * Distribution metadata * Distribution versions * Distribution files * Distribution requirements * Distributions * Extending the search algorithm "msilib" — Read and write Microsoft Installer files *************************************************** **Source code:** Lib/msilib/__init__.py Deprecated since version 3.11: The "msilib" module is deprecated (see **PEP 594** for details). ====================================================================== The "msilib" supports the creation of Microsoft Installer (".msi") files. Because these files often contain an embedded “cabinet” file (".cab"), it also exposes an API to create CAB files. Support for reading ".cab" files is currently not implemented; read support for the ".msi" database is possible. This package aims to provide complete access to all tables in an ".msi" file, therefore, it is a fairly low-level API. Two primary applications of this package are the "distutils" command "bdist_msi", and the creation of Python installer package itself (although that currently uses a different version of "msilib"). The package contents can be roughly split into four parts: low-level CAB routines, low-level MSI routines, higher-level MSI routines, and standard table structures. msilib.FCICreate(cabname, files) Create a new CAB file named *cabname*. *files* must be a list of tuples, each containing the name of the file on disk, and the name of the file inside the CAB file. The files are added to the CAB file in the order they appear in the list. All files are added into a single CAB file, using the MSZIP compression algorithm. Callbacks to Python for the various steps of MSI creation are currently not exposed. msilib.UuidCreate() Return the string representation of a new unique identifier. This wraps the Windows API functions "UuidCreate()" and "UuidToString()". msilib.OpenDatabase(path, persist) Return a new database object by calling MsiOpenDatabase. *path* is the file name of the MSI file; *persist* can be one of the constants "MSIDBOPEN_CREATEDIRECT", "MSIDBOPEN_CREATE", "MSIDBOPEN_DIRECT", "MSIDBOPEN_READONLY", or "MSIDBOPEN_TRANSACT", and may include the flag "MSIDBOPEN_PATCHFILE". See the Microsoft documentation for the meaning of these flags; depending on the flags, an existing database is opened, or a new one created. msilib.CreateRecord(count) Return a new record object by calling "MSICreateRecord()". *count* is the number of fields of the record. msilib.init_database(name, schema, ProductName, ProductCode, ProductVersion, Manufacturer) Create and return a new database *name*, initialize it with *schema*, and set the properties *ProductName*, *ProductCode*, *ProductVersion*, and *Manufacturer*. *schema* must be a module object containing "tables" and "_Validation_records" attributes; typically, "msilib.schema" should be used. The database will contain just the schema and the validation records when this function returns. msilib.add_data(database, table, records) Add all *records* to the table named *table* in *database*. The *table* argument must be one of the predefined tables in the MSI schema, e.g. "'Feature'", "'File'", "'Component'", "'Dialog'", "'Control'", etc. *records* should be a list of tuples, each one containing all fields of a record according to the schema of the table. For optional fields, "None" can be passed. Field values can be ints, strings, or instances of the Binary class. class msilib.Binary(filename) Represents entries in the Binary table; inserting such an object using "add_data()" reads the file named *filename* into the table. msilib.add_tables(database, module) Add all table content from *module* to *database*. *module* must contain an attribute *tables* listing all tables for which content should be added, and one attribute per table that has the actual content. This is typically used to install the sequence tables. msilib.add_stream(database, name, path) Add the file *path* into the "_Stream" table of *database*, with the stream name *name*. msilib.gen_uuid() Return a new UUID, in the format that MSI typically requires (i.e. in curly braces, and with all hexdigits in upper-case). See also: FCICreate UuidCreate UuidToString Database Objects ================ Database.OpenView(sql) Return a view object, by calling "MSIDatabaseOpenView()". *sql* is the SQL statement to execute. Database.Commit() Commit the changes pending in the current transaction, by calling "MSIDatabaseCommit()". Database.GetSummaryInformation(count) Return a new summary information object, by calling "MsiGetSummaryInformation()". *count* is the maximum number of updated values. Database.Close() Close the database object, through "MsiCloseHandle()". New in version 3.7. See also: MSIDatabaseOpenView MSIDatabaseCommit MSIGetSummaryInformation MsiCloseHandle View Objects ============ View.Execute(params) Execute the SQL query of the view, through "MSIViewExecute()". If *params* is not "None", it is a record describing actual values of the parameter tokens in the query. View.GetColumnInfo(kind) Return a record describing the columns of the view, through calling "MsiViewGetColumnInfo()". *kind* can be either "MSICOLINFO_NAMES" or "MSICOLINFO_TYPES". View.Fetch() Return a result record of the query, through calling "MsiViewFetch()". View.Modify(kind, data) Modify the view, by calling "MsiViewModify()". *kind* can be one of "MSIMODIFY_SEEK", "MSIMODIFY_REFRESH", "MSIMODIFY_INSERT", "MSIMODIFY_UPDATE", "MSIMODIFY_ASSIGN", "MSIMODIFY_REPLACE", "MSIMODIFY_MERGE", "MSIMODIFY_DELETE", "MSIMODIFY_INSERT_TEMPORARY", "MSIMODIFY_VALIDATE", "MSIMODIFY_VALIDATE_NEW", "MSIMODIFY_VALIDATE_FIELD", or "MSIMODIFY_VALIDATE_DELETE". *data* must be a record describing the new data. View.Close() Close the view, through "MsiViewClose()". See also: MsiViewExecute MSIViewGetColumnInfo MsiViewFetch MsiViewModify MsiViewClose Summary Information Objects =========================== SummaryInformation.GetProperty(field) Return a property of the summary, through "MsiSummaryInfoGetProperty()". *field* is the name of the property, and can be one of the constants "PID_CODEPAGE", "PID_TITLE", "PID_SUBJECT", "PID_AUTHOR", "PID_KEYWORDS", "PID_COMMENTS", "PID_TEMPLATE", "PID_LASTAUTHOR", "PID_REVNUMBER", "PID_LASTPRINTED", "PID_CREATE_DTM", "PID_LASTSAVE_DTM", "PID_PAGECOUNT", "PID_WORDCOUNT", "PID_CHARCOUNT", "PID_APPNAME", or "PID_SECURITY". SummaryInformation.GetPropertyCount() Return the number of summary properties, through "MsiSummaryInfoGetPropertyCount()". SummaryInformation.SetProperty(field, value) Set a property through "MsiSummaryInfoSetProperty()". *field* can have the same values as in "GetProperty()", *value* is the new value of the property. Possible value types are integer and string. SummaryInformation.Persist() Write the modified properties to the summary information stream, using "MsiSummaryInfoPersist()". See also: MsiSummaryInfoGetProperty MsiSummaryInfoGetPropertyCount MsiSummaryInfoSetProperty MsiSummaryInfoPersist Record Objects ============== Record.GetFieldCount() Return the number of fields of the record, through "MsiRecordGetFieldCount()". Record.GetInteger(field) Return the value of *field* as an integer where possible. *field* must be an integer. Record.GetString(field) Return the value of *field* as a string where possible. *field* must be an integer. Record.SetString(field, value) Set *field* to *value* through "MsiRecordSetString()". *field* must be an integer; *value* a string. Record.SetStream(field, value) Set *field* to the contents of the file named *value*, through "MsiRecordSetStream()". *field* must be an integer; *value* a string. Record.SetInteger(field, value) Set *field* to *value* through "MsiRecordSetInteger()". Both *field* and *value* must be an integer. Record.ClearData() Set all fields of the record to 0, through "MsiRecordClearData()". See also: MsiRecordGetFieldCount MsiRecordSetString MsiRecordSetStream MsiRecordSetInteger MsiRecordClearData Errors ====== All wrappers around MSI functions raise "MSIError"; the string inside the exception will contain more detail. CAB Objects =========== class msilib.CAB(name) The class "CAB" represents a CAB file. During MSI construction, files will be added simultaneously to the "Files" table, and to a CAB file. Then, when all files have been added, the CAB file can be written, then added to the MSI file. *name* is the name of the CAB file in the MSI file. append(full, file, logical) Add the file with the pathname *full* to the CAB file, under the name *logical*. If there is already a file named *logical*, a new file name is created. Return the index of the file in the CAB file, and the new name of the file inside the CAB file. commit(database) Generate a CAB file, add it as a stream to the MSI file, put it into the "Media" table, and remove the generated file from the disk. Directory Objects ================= class msilib.Directory(database, cab, basedir, physical, logical, default[, componentflags]) Create a new directory in the Directory table. There is a current component at each point in time for the directory, which is either explicitly created through "start_component()", or implicitly when files are added for the first time. Files are added into the current component, and into the cab file. To create a directory, a base directory object needs to be specified (can be "None"), the path to the physical directory, and a logical directory name. *default* specifies the DefaultDir slot in the directory table. *componentflags* specifies the default flags that new components get. start_component(component=None, feature=None, flags=None, keyfile=None, uuid=None) Add an entry to the Component table, and make this component the current component for this directory. If no component name is given, the directory name is used. If no *feature* is given, the current feature is used. If no *flags* are given, the directory’s default flags are used. If no *keyfile* is given, the KeyPath is left null in the Component table. add_file(file, src=None, version=None, language=None) Add a file to the current component of the directory, starting a new one if there is no current component. By default, the file name in the source and the file table will be identical. If the *src* file is specified, it is interpreted relative to the current directory. Optionally, a *version* and a *language* can be specified for the entry in the File table. glob(pattern, exclude=None) Add a list of files to the current component as specified in the glob pattern. Individual files can be excluded in the *exclude* list. remove_pyc() Remove ".pyc" files on uninstall. See also: Directory Table File Table Component Table FeatureComponents Table Features ======== class msilib.Feature(db, id, title, desc, display, level=1, parent=None, directory=None, attributes=0) Add a new record to the "Feature" table, using the values *id*, *parent.id*, *title*, *desc*, *display*, *level*, *directory*, and *attributes*. The resulting feature object can be passed to the "start_component()" method of "Directory". set_current() Make this feature the current feature of "msilib". New components are automatically added to the default feature, unless a feature is explicitly specified. See also: Feature Table GUI classes =========== "msilib" provides several classes that wrap the GUI tables in an MSI database. However, no standard user interface is provided; use "bdist_msi" to create MSI files with a user-interface for installing Python packages. class msilib.Control(dlg, name) Base class of the dialog controls. *dlg* is the dialog object the control belongs to, and *name* is the control’s name. event(event, argument, condition=1, ordering=None) Make an entry into the "ControlEvent" table for this control. mapping(event, attribute) Make an entry into the "EventMapping" table for this control. condition(action, condition) Make an entry into the "ControlCondition" table for this control. class msilib.RadioButtonGroup(dlg, name, property) Create a radio button control named *name*. *property* is the installer property that gets set when a radio button is selected. add(name, x, y, width, height, text, value=None) Add a radio button named *name* to the group, at the coordinates *x*, *y*, *width*, *height*, and with the label *text*. If *value* is "None", it defaults to *name*. class msilib.Dialog(db, name, x, y, w, h, attr, title, first, default, cancel) Return a new "Dialog" object. An entry in the "Dialog" table is made, with the specified coordinates, dialog attributes, title, name of the first, default, and cancel controls. control(name, type, x, y, width, height, attributes, property, text, control_next, help) Return a new "Control" object. An entry in the "Control" table is made with the specified parameters. This is a generic method; for specific types, specialized methods are provided. text(name, x, y, width, height, attributes, text) Add and return a "Text" control. bitmap(name, x, y, width, height, text) Add and return a "Bitmap" control. line(name, x, y, width, height) Add and return a "Line" control. pushbutton(name, x, y, width, height, attributes, text, next_control) Add and return a "PushButton" control. radiogroup(name, x, y, width, height, attributes, property, text, next_control) Add and return a "RadioButtonGroup" control. checkbox(name, x, y, width, height, attributes, property, text, next_control) Add and return a "CheckBox" control. See also: Dialog Table Control Table Control Types ControlCondition Table ControlEvent Table EventMapping Table RadioButton Table Precomputed tables ================== "msilib" provides a few subpackages that contain only schema and table definitions. Currently, these definitions are based on MSI version 2.0. msilib.schema This is the standard MSI schema for MSI 2.0, with the *tables* variable providing a list of table definitions, and *_Validation_records* providing the data for MSI validation. msilib.sequence This module contains table contents for the standard sequence tables: *AdminExecuteSequence*, *AdminUISequence*, *AdvtExecuteSequence*, *InstallExecuteSequence*, and *InstallUISequence*. msilib.text This module contains definitions for the UIText and ActionText tables, for the standard installer actions. "msvcrt" — Useful routines from the MS VC++ runtime *************************************************** ====================================================================== These functions provide access to some useful capabilities on Windows platforms. Some higher-level modules use these functions to build the Windows implementations of their services. For example, the "getpass" module uses this in the implementation of the "getpass()" function. Further documentation on these functions can be found in the Platform API documentation. The module implements both the normal and wide char variants of the console I/O api. The normal API deals only with ASCII characters and is of limited use for internationalized applications. The wide char API should be used where ever possible. Changed in version 3.3: Operations in this module now raise "OSError" where "IOError" was raised. File Operations =============== msvcrt.locking(fd, mode, nbytes) Lock part of a file based on file descriptor *fd* from the C runtime. Raises "OSError" on failure. The locked region of the file extends from the current file position for *nbytes* bytes, and may continue beyond the end of the file. *mode* must be one of the "LK_*" constants listed below. Multiple regions in a file may be locked at the same time, but may not overlap. Adjacent regions are not merged; they must be unlocked individually. Raises an auditing event "msvcrt.locking" with arguments "fd", "mode", "nbytes". msvcrt.LK_LOCK msvcrt.LK_RLCK Locks the specified bytes. If the bytes cannot be locked, the program immediately tries again after 1 second. If, after 10 attempts, the bytes cannot be locked, "OSError" is raised. msvcrt.LK_NBLCK msvcrt.LK_NBRLCK Locks the specified bytes. If the bytes cannot be locked, "OSError" is raised. msvcrt.LK_UNLCK Unlocks the specified bytes, which must have been previously locked. msvcrt.setmode(fd, flags) Set the line-end translation mode for the file descriptor *fd*. To set it to text mode, *flags* should be "os.O_TEXT"; for binary, it should be "os.O_BINARY". msvcrt.open_osfhandle(handle, flags) Create a C runtime file descriptor from the file handle *handle*. The *flags* parameter should be a bitwise OR of "os.O_APPEND", "os.O_RDONLY", and "os.O_TEXT". The returned file descriptor may be used as a parameter to "os.fdopen()" to create a file object. Raises an auditing event "msvcrt.open_osfhandle" with arguments "handle", "flags". msvcrt.get_osfhandle(fd) Return the file handle for the file descriptor *fd*. Raises "OSError" if *fd* is not recognized. Raises an auditing event "msvcrt.get_osfhandle" with argument "fd". Console I/O =========== msvcrt.kbhit() Return "True" if a keypress is waiting to be read. msvcrt.getch() Read a keypress and return the resulting character as a byte string. Nothing is echoed to the console. This call will block if a keypress is not already available, but will not wait for "Enter" to be pressed. If the pressed key was a special function key, this will return "'\000'" or "'\xe0'"; the next call will return the keycode. The "Control-C" keypress cannot be read with this function. msvcrt.getwch() Wide char variant of "getch()", returning a Unicode value. msvcrt.getche() Similar to "getch()", but the keypress will be echoed if it represents a printable character. msvcrt.getwche() Wide char variant of "getche()", returning a Unicode value. msvcrt.putch(char) Print the byte string *char* to the console without buffering. msvcrt.putwch(unicode_char) Wide char variant of "putch()", accepting a Unicode value. msvcrt.ungetch(char) Cause the byte string *char* to be “pushed back” into the console buffer; it will be the next character read by "getch()" or "getche()". msvcrt.ungetwch(unicode_char) Wide char variant of "ungetch()", accepting a Unicode value. Other Functions =============== msvcrt.heapmin() Force the "malloc()" heap to clean itself up and return unused blocks to the operating system. On failure, this raises "OSError". "multiprocessing.shared_memory" — Provides shared memory for direct access across processes ******************************************************************************************** **Source code:** Lib/multiprocessing/shared_memory.py New in version 3.8. ====================================================================== This module provides a class, "SharedMemory", for the allocation and management of shared memory to be accessed by one or more processes on a multicore or symmetric multiprocessor (SMP) machine. To assist with the life-cycle management of shared memory especially across distinct processes, a "BaseManager" subclass, "SharedMemoryManager", is also provided in the "multiprocessing.managers" module. In this module, shared memory refers to “System V style” shared memory blocks (though is not necessarily implemented explicitly as such) and does not refer to “distributed shared memory”. This style of shared memory permits distinct processes to potentially read and write to a common (or shared) region of volatile memory. Processes are conventionally limited to only have access to their own process memory space but shared memory permits the sharing of data between processes, avoiding the need to instead send messages between processes containing that data. Sharing data directly via memory can provide significant performance benefits compared to sharing data via disk or socket or other communications requiring the serialization/deserialization and copying of data. class multiprocessing.shared_memory.SharedMemory(name=None, create=False, size=0) Creates a new shared memory block or attaches to an existing shared memory block. Each shared memory block is assigned a unique name. In this way, one process can create a shared memory block with a particular name and a different process can attach to that same shared memory block using that same name. As a resource for sharing data across processes, shared memory blocks may outlive the original process that created them. When one process no longer needs access to a shared memory block that might still be needed by other processes, the "close()" method should be called. When a shared memory block is no longer needed by any process, the "unlink()" method should be called to ensure proper cleanup. *name* is the unique name for the requested shared memory, specified as a string. When creating a new shared memory block, if "None" (the default) is supplied for the name, a novel name will be generated. *create* controls whether a new shared memory block is created ("True") or an existing shared memory block is attached ("False"). *size* specifies the requested number of bytes when creating a new shared memory block. Because some platforms choose to allocate chunks of memory based upon that platform’s memory page size, the exact size of the shared memory block may be larger or equal to the size requested. When attaching to an existing shared memory block, the "size" parameter is ignored. close() Closes access to the shared memory from this instance. In order to ensure proper cleanup of resources, all instances should call "close()" once the instance is no longer needed. Note that calling "close()" does not cause the shared memory block itself to be destroyed. unlink() Requests that the underlying shared memory block be destroyed. In order to ensure proper cleanup of resources, "unlink()" should be called once (and only once) across all processes which have need for the shared memory block. After requesting its destruction, a shared memory block may or may not be immediately destroyed and this behavior may differ across platforms. Attempts to access data inside the shared memory block after "unlink()" has been called may result in memory access errors. Note: the last process relinquishing its hold on a shared memory block may call "unlink()" and "close()" in either order. buf A memoryview of contents of the shared memory block. name Read-only access to the unique name of the shared memory block. size Read-only access to size in bytes of the shared memory block. The following example demonstrates low-level use of "SharedMemory" instances: >>> from multiprocessing import shared_memory >>> shm_a = shared_memory.SharedMemory(create=True, size=10) >>> type(shm_a.buf) >>> buffer = shm_a.buf >>> len(buffer) 10 >>> buffer[:4] = bytearray([22, 33, 44, 55]) # Modify multiple at once >>> buffer[4] = 100 # Modify single byte at a time >>> # Attach to an existing shared memory block >>> shm_b = shared_memory.SharedMemory(shm_a.name) >>> import array >>> array.array('b', shm_b.buf[:5]) # Copy the data into a new array.array array('b', [22, 33, 44, 55, 100]) >>> shm_b.buf[:5] = b'howdy' # Modify via shm_b using bytes >>> bytes(shm_a.buf[:5]) # Access via shm_a b'howdy' >>> shm_b.close() # Close each SharedMemory instance >>> shm_a.close() >>> shm_a.unlink() # Call unlink only once to release the shared memory The following example demonstrates a practical use of the "SharedMemory" class with NumPy arrays, accessing the same "numpy.ndarray" from two distinct Python shells: >>> # In the first Python interactive shell >>> import numpy as np >>> a = np.array([1, 1, 2, 3, 5, 8]) # Start with an existing NumPy array >>> from multiprocessing import shared_memory >>> shm = shared_memory.SharedMemory(create=True, size=a.nbytes) >>> # Now create a NumPy array backed by shared memory >>> b = np.ndarray(a.shape, dtype=a.dtype, buffer=shm.buf) >>> b[:] = a[:] # Copy the original data into shared memory >>> b array([1, 1, 2, 3, 5, 8]) >>> type(b) >>> type(a) >>> shm.name # We did not specify a name so one was chosen for us 'psm_21467_46075' >>> # In either the same shell or a new Python shell on the same machine >>> import numpy as np >>> from multiprocessing import shared_memory >>> # Attach to the existing shared memory block >>> existing_shm = shared_memory.SharedMemory(name='psm_21467_46075') >>> # Note that a.shape is (6,) and a.dtype is np.int64 in this example >>> c = np.ndarray((6,), dtype=np.int64, buffer=existing_shm.buf) >>> c array([1, 1, 2, 3, 5, 8]) >>> c[-1] = 888 >>> c array([ 1, 1, 2, 3, 5, 888]) >>> # Back in the first Python interactive shell, b reflects this change >>> b array([ 1, 1, 2, 3, 5, 888]) >>> # Clean up from within the second Python shell >>> del c # Unnecessary; merely emphasizing the array is no longer used >>> existing_shm.close() >>> # Clean up from within the first Python shell >>> del b # Unnecessary; merely emphasizing the array is no longer used >>> shm.close() >>> shm.unlink() # Free and release the shared memory block at the very end class multiprocessing.managers.SharedMemoryManager([address[, authkey]]) A subclass of "BaseManager" which can be used for the management of shared memory blocks across processes. A call to "start()" on a "SharedMemoryManager" instance causes a new process to be started. This new process’s sole purpose is to manage the life cycle of all shared memory blocks created through it. To trigger the release of all shared memory blocks managed by that process, call "shutdown()" on the instance. This triggers a "SharedMemory.unlink()" call on all of the "SharedMemory" objects managed by that process and then stops the process itself. By creating "SharedMemory" instances through a "SharedMemoryManager", we avoid the need to manually track and trigger the freeing of shared memory resources. This class provides methods for creating and returning "SharedMemory" instances and for creating a list-like object ("ShareableList") backed by shared memory. Refer to "multiprocessing.managers.BaseManager" for a description of the inherited *address* and *authkey* optional input arguments and how they may be used to connect to an existing "SharedMemoryManager" service from other processes. SharedMemory(size) Create and return a new "SharedMemory" object with the specified "size" in bytes. ShareableList(sequence) Create and return a new "ShareableList" object, initialized by the values from the input "sequence". The following example demonstrates the basic mechanisms of a "SharedMemoryManager": >>> from multiprocessing.managers import SharedMemoryManager >>> smm = SharedMemoryManager() >>> smm.start() # Start the process that manages the shared memory blocks >>> sl = smm.ShareableList(range(4)) >>> sl ShareableList([0, 1, 2, 3], name='psm_6572_7512') >>> raw_shm = smm.SharedMemory(size=128) >>> another_sl = smm.ShareableList('alpha') >>> another_sl ShareableList(['a', 'l', 'p', 'h', 'a'], name='psm_6572_12221') >>> smm.shutdown() # Calls unlink() on sl, raw_shm, and another_sl The following example depicts a potentially more convenient pattern for using "SharedMemoryManager" objects via the "with" statement to ensure that all shared memory blocks are released after they are no longer needed: >>> with SharedMemoryManager() as smm: ... sl = smm.ShareableList(range(2000)) ... # Divide the work among two processes, storing partial results in sl ... p1 = Process(target=do_work, args=(sl, 0, 1000)) ... p2 = Process(target=do_work, args=(sl, 1000, 2000)) ... p1.start() ... p2.start() # A multiprocessing.Pool might be more efficient ... p1.join() ... p2.join() # Wait for all work to complete in both processes ... total_result = sum(sl) # Consolidate the partial results now in sl When using a "SharedMemoryManager" in a "with" statement, the shared memory blocks created using that manager are all released when the "with" statement’s code block finishes execution. class multiprocessing.shared_memory.ShareableList(sequence=None, *, name=None) Provides a mutable list-like object where all values stored within are stored in a shared memory block. This constrains storable values to only the "int", "float", "bool", "str" (less than 10M bytes each), "bytes" (less than 10M bytes each), and "None" built- in data types. It also notably differs from the built-in "list" type in that these lists can not change their overall length (i.e. no append, insert, etc.) and do not support the dynamic creation of new "ShareableList" instances via slicing. *sequence* is used in populating a new "ShareableList" full of values. Set to "None" to instead attach to an already existing "ShareableList" by its unique shared memory name. *name* is the unique name for the requested shared memory, as described in the definition for "SharedMemory". When attaching to an existing "ShareableList", specify its shared memory block’s unique name while leaving "sequence" set to "None". count(value) Returns the number of occurrences of "value". index(value) Returns first index position of "value". Raises "ValueError" if "value" is not present. format Read-only attribute containing the "struct" packing format used by all currently stored values. shm The "SharedMemory" instance where the values are stored. The following example demonstrates basic use of a "ShareableList" instance: >>> from multiprocessing import shared_memory >>> a = shared_memory.ShareableList(['howdy', b'HoWdY', -273.154, 100, None, True, 42]) >>> [ type(entry) for entry in a ] [, , , , , , ] >>> a[2] -273.154 >>> a[2] = -78.5 >>> a[2] -78.5 >>> a[2] = 'dry ice' # Changing data types is supported as well >>> a[2] 'dry ice' >>> a[2] = 'larger than previously allocated storage space' Traceback (most recent call last): ... ValueError: exceeds available storage for existing str >>> a[2] 'dry ice' >>> len(a) 7 >>> a.index(42) 6 >>> a.count(b'howdy') 0 >>> a.count(b'HoWdY') 1 >>> a.shm.close() >>> a.shm.unlink() >>> del a # Use of a ShareableList after call to unlink() is unsupported The following example depicts how one, two, or many processes may access the same "ShareableList" by supplying the name of the shared memory block behind it: >>> b = shared_memory.ShareableList(range(5)) # In a first process >>> c = shared_memory.ShareableList(name=b.shm.name) # In a second process >>> c ShareableList([0, 1, 2, 3, 4], name='...') >>> c[-1] = -999 >>> b[-1] -999 >>> b.shm.close() >>> c.shm.close() >>> c.shm.unlink() "multiprocessing" — Process-based parallelism ********************************************* **Source code:** Lib/multiprocessing/ ====================================================================== Introduction ============ "multiprocessing" is a package that supports spawning processes using an API similar to the "threading" module. The "multiprocessing" package offers both local and remote concurrency, effectively side- stepping the *Global Interpreter Lock* by using subprocesses instead of threads. Due to this, the "multiprocessing" module allows the programmer to fully leverage multiple processors on a given machine. It runs on both Unix and Windows. The "multiprocessing" module also introduces APIs which do not have analogs in the "threading" module. A prime example of this is the "Pool" object which offers a convenient means of parallelizing the execution of a function across multiple input values, distributing the input data across processes (data parallelism). The following example demonstrates the common practice of defining such functions in a module so that child processes can successfully import that module. This basic example of data parallelism using "Pool", from multiprocessing import Pool def f(x): return x*x if __name__ == '__main__': with Pool(5) as p: print(p.map(f, [1, 2, 3])) will print to standard output [1, 4, 9] The "Process" class ------------------- In "multiprocessing", processes are spawned by creating a "Process" object and then calling its "start()" method. "Process" follows the API of "threading.Thread". A trivial example of a multiprocess program is from multiprocessing import Process def f(name): print('hello', name) if __name__ == '__main__': p = Process(target=f, args=('bob',)) p.start() p.join() To show the individual process IDs involved, here is an expanded example: from multiprocessing import Process import os def info(title): print(title) print('module name:', __name__) print('parent process:', os.getppid()) print('process id:', os.getpid()) def f(name): info('function f') print('hello', name) if __name__ == '__main__': info('main line') p = Process(target=f, args=('bob',)) p.start() p.join() For an explanation of why the "if __name__ == '__main__'" part is necessary, see Programming guidelines. Contexts and start methods -------------------------- Depending on the platform, "multiprocessing" supports three ways to start a process. These *start methods* are *spawn* The parent process starts a fresh python interpreter process. The child process will only inherit those resources necessary to run the process object’s "run()" method. In particular, unnecessary file descriptors and handles from the parent process will not be inherited. Starting a process using this method is rather slow compared to using *fork* or *forkserver*. Available on Unix and Windows. The default on Windows and macOS. *fork* The parent process uses "os.fork()" to fork the Python interpreter. The child process, when it begins, is effectively identical to the parent process. All resources of the parent are inherited by the child process. Note that safely forking a multithreaded process is problematic. Available on Unix only. The default on Unix. *forkserver* When the program starts and selects the *forkserver* start method, a server process is started. From then on, whenever a new process is needed, the parent process connects to the server and requests that it fork a new process. The fork server process is single threaded so it is safe for it to use "os.fork()". No unnecessary resources are inherited. Available on Unix platforms which support passing file descriptors over Unix pipes. Changed in version 3.8: On macOS, the *spawn* start method is now the default. The *fork* start method should be considered unsafe as it can lead to crashes of the subprocess. See bpo-33725. Changed in version 3.4: *spawn* added on all unix platforms, and *forkserver* added for some unix platforms. Child processes no longer inherit all of the parents inheritable handles on Windows. On Unix using the *spawn* or *forkserver* start methods will also start a *resource tracker* process which tracks the unlinked named system resources (such as named semaphores or "SharedMemory" objects) created by processes of the program. When all processes have exited the resource tracker unlinks any remaining tracked object. Usually there should be none, but if a process was killed by a signal there may be some “leaked” resources. (Neither leaked semaphores nor shared memory segments will be automatically unlinked until the next reboot. This is problematic for both objects because the system allows only a limited number of named semaphores, and shared memory segments occupy some space in the main memory.) To select a start method you use the "set_start_method()" in the "if __name__ == '__main__'" clause of the main module. For example: import multiprocessing as mp def foo(q): q.put('hello') if __name__ == '__main__': mp.set_start_method('spawn') q = mp.Queue() p = mp.Process(target=foo, args=(q,)) p.start() print(q.get()) p.join() "set_start_method()" should not be used more than once in the program. Alternatively, you can use "get_context()" to obtain a context object. Context objects have the same API as the multiprocessing module, and allow one to use multiple start methods in the same program. import multiprocessing as mp def foo(q): q.put('hello') if __name__ == '__main__': ctx = mp.get_context('spawn') q = ctx.Queue() p = ctx.Process(target=foo, args=(q,)) p.start() print(q.get()) p.join() Note that objects related to one context may not be compatible with processes for a different context. In particular, locks created using the *fork* context cannot be passed to processes started using the *spawn* or *forkserver* start methods. A library which wants to use a particular start method should probably use "get_context()" to avoid interfering with the choice of the library user. Warning: The "'spawn'" and "'forkserver'" start methods cannot currently be used with “frozen” executables (i.e., binaries produced by packages like **PyInstaller** and **cx_Freeze**) on Unix. The "'fork'" start method does work. Exchanging objects between processes ------------------------------------ "multiprocessing" supports two types of communication channel between processes: **Queues** The "Queue" class is a near clone of "queue.Queue". For example: from multiprocessing import Process, Queue def f(q): q.put([42, None, 'hello']) if __name__ == '__main__': q = Queue() p = Process(target=f, args=(q,)) p.start() print(q.get()) # prints "[42, None, 'hello']" p.join() Queues are thread and process safe. **Pipes** The "Pipe()" function returns a pair of connection objects connected by a pipe which by default is duplex (two-way). For example: from multiprocessing import Process, Pipe def f(conn): conn.send([42, None, 'hello']) conn.close() if __name__ == '__main__': parent_conn, child_conn = Pipe() p = Process(target=f, args=(child_conn,)) p.start() print(parent_conn.recv()) # prints "[42, None, 'hello']" p.join() The two connection objects returned by "Pipe()" represent the two ends of the pipe. Each connection object has "send()" and "recv()" methods (among others). Note that data in a pipe may become corrupted if two processes (or threads) try to read from or write to the *same* end of the pipe at the same time. Of course there is no risk of corruption from processes using different ends of the pipe at the same time. Synchronization between processes --------------------------------- "multiprocessing" contains equivalents of all the synchronization primitives from "threading". For instance one can use a lock to ensure that only one process prints to standard output at a time: from multiprocessing import Process, Lock def f(l, i): l.acquire() try: print('hello world', i) finally: l.release() if __name__ == '__main__': lock = Lock() for num in range(10): Process(target=f, args=(lock, num)).start() Without using the lock output from the different processes is liable to get all mixed up. Sharing state between processes ------------------------------- As mentioned above, when doing concurrent programming it is usually best to avoid using shared state as far as possible. This is particularly true when using multiple processes. However, if you really do need to use some shared data then "multiprocessing" provides a couple of ways of doing so. **Shared memory** Data can be stored in a shared memory map using "Value" or "Array". For example, the following code from multiprocessing import Process, Value, Array def f(n, a): n.value = 3.1415927 for i in range(len(a)): a[i] = -a[i] if __name__ == '__main__': num = Value('d', 0.0) arr = Array('i', range(10)) p = Process(target=f, args=(num, arr)) p.start() p.join() print(num.value) print(arr[:]) will print 3.1415927 [0, -1, -2, -3, -4, -5, -6, -7, -8, -9] The "'d'" and "'i'" arguments used when creating "num" and "arr" are typecodes of the kind used by the "array" module: "'d'" indicates a double precision float and "'i'" indicates a signed integer. These shared objects will be process and thread-safe. For more flexibility in using shared memory one can use the "multiprocessing.sharedctypes" module which supports the creation of arbitrary ctypes objects allocated from shared memory. **Server process** A manager object returned by "Manager()" controls a server process which holds Python objects and allows other processes to manipulate them using proxies. A manager returned by "Manager()" will support types "list", "dict", "Namespace", "Lock", "RLock", "Semaphore", "BoundedSemaphore", "Condition", "Event", "Barrier", "Queue", "Value" and "Array". For example, from multiprocessing import Process, Manager def f(d, l): d[1] = '1' d['2'] = 2 d[0.25] = None l.reverse() if __name__ == '__main__': with Manager() as manager: d = manager.dict() l = manager.list(range(10)) p = Process(target=f, args=(d, l)) p.start() p.join() print(d) print(l) will print {0.25: None, 1: '1', '2': 2} [9, 8, 7, 6, 5, 4, 3, 2, 1, 0] Server process managers are more flexible than using shared memory objects because they can be made to support arbitrary object types. Also, a single manager can be shared by processes on different computers over a network. They are, however, slower than using shared memory. Using a pool of workers ----------------------- The "Pool" class represents a pool of worker processes. It has methods which allows tasks to be offloaded to the worker processes in a few different ways. For example: from multiprocessing import Pool, TimeoutError import time import os def f(x): return x*x if __name__ == '__main__': # start 4 worker processes with Pool(processes=4) as pool: # print "[0, 1, 4,..., 81]" print(pool.map(f, range(10))) # print same numbers in arbitrary order for i in pool.imap_unordered(f, range(10)): print(i) # evaluate "f(20)" asynchronously res = pool.apply_async(f, (20,)) # runs in *only* one process print(res.get(timeout=1)) # prints "400" # evaluate "os.getpid()" asynchronously res = pool.apply_async(os.getpid, ()) # runs in *only* one process print(res.get(timeout=1)) # prints the PID of that process # launching multiple evaluations asynchronously *may* use more processes multiple_results = [pool.apply_async(os.getpid, ()) for i in range(4)] print([res.get(timeout=1) for res in multiple_results]) # make a single worker sleep for 10 secs res = pool.apply_async(time.sleep, (10,)) try: print(res.get(timeout=1)) except TimeoutError: print("We lacked patience and got a multiprocessing.TimeoutError") print("For the moment, the pool remains available for more work") # exiting the 'with'-block has stopped the pool print("Now the pool is closed and no longer available") Note that the methods of a pool should only ever be used by the process which created it. Note: Functionality within this package requires that the "__main__" module be importable by the children. This is covered in Programming guidelines however it is worth pointing out here. This means that some examples, such as the "multiprocessing.pool.Pool" examples will not work in the interactive interpreter. For example: >>> from multiprocessing import Pool >>> p = Pool(5) >>> def f(x): ... return x*x ... >>> with p: ... p.map(f, [1,2,3]) Process PoolWorker-1: Process PoolWorker-2: Process PoolWorker-3: Traceback (most recent call last): Traceback (most recent call last): Traceback (most recent call last): AttributeError: 'module' object has no attribute 'f' AttributeError: 'module' object has no attribute 'f' AttributeError: 'module' object has no attribute 'f' (If you try this it will actually output three full tracebacks interleaved in a semi-random fashion, and then you may have to stop the parent process somehow.) Reference ========= The "multiprocessing" package mostly replicates the API of the "threading" module. "Process" and exceptions ------------------------ class multiprocessing.Process(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None) Process objects represent activity that is run in a separate process. The "Process" class has equivalents of all the methods of "threading.Thread". The constructor should always be called with keyword arguments. *group* should always be "None"; it exists solely for compatibility with "threading.Thread". *target* is the callable object to be invoked by the "run()" method. It defaults to "None", meaning nothing is called. *name* is the process name (see "name" for more details). *args* is the argument tuple for the target invocation. *kwargs* is a dictionary of keyword arguments for the target invocation. If provided, the keyword-only *daemon* argument sets the process "daemon" flag to "True" or "False". If "None" (the default), this flag will be inherited from the creating process. By default, no arguments are passed to *target*. If a subclass overrides the constructor, it must make sure it invokes the base class constructor ("Process.__init__()") before doing anything else to the process. Changed in version 3.3: Added the *daemon* argument. run() Method representing the process’s activity. You may override this method in a subclass. The standard "run()" method invokes the callable object passed to the object’s constructor as the target argument, if any, with sequential and keyword arguments taken from the *args* and *kwargs* arguments, respectively. start() Start the process’s activity. This must be called at most once per process object. It arranges for the object’s "run()" method to be invoked in a separate process. join([timeout]) If the optional argument *timeout* is "None" (the default), the method blocks until the process whose "join()" method is called terminates. If *timeout* is a positive number, it blocks at most *timeout* seconds. Note that the method returns "None" if its process terminates or if the method times out. Check the process’s "exitcode" to determine if it terminated. A process can be joined many times. A process cannot join itself because this would cause a deadlock. It is an error to attempt to join a process before it has been started. name The process’s name. The name is a string used for identification purposes only. It has no semantics. Multiple processes may be given the same name. The initial name is set by the constructor. If no explicit name is provided to the constructor, a name of the form ‘Process-N_1:N_2:…:N_k’ is constructed, where each N_k is the N-th child of its parent. is_alive() Return whether the process is alive. Roughly, a process object is alive from the moment the "start()" method returns until the child process terminates. daemon The process’s daemon flag, a Boolean value. This must be set before "start()" is called. The initial value is inherited from the creating process. When a process exits, it attempts to terminate all of its daemonic child processes. Note that a daemonic process is not allowed to create child processes. Otherwise a daemonic process would leave its children orphaned if it gets terminated when its parent process exits. Additionally, these are **not** Unix daemons or services, they are normal processes that will be terminated (and not joined) if non-daemonic processes have exited. In addition to the "threading.Thread" API, "Process" objects also support the following attributes and methods: pid Return the process ID. Before the process is spawned, this will be "None". exitcode The child’s exit code. This will be "None" if the process has not yet terminated. If the child’s "run()" method returned normally, the exit code will be 0. If it terminated via "sys.exit()" with an integer argument *N*, the exit code will be *N*. If the child terminated due to an exception not caught within "run()", the exit code will be 1. If it was terminated by signal *N*, the exit code will be the negative value *-N*. authkey The process’s authentication key (a byte string). When "multiprocessing" is initialized the main process is assigned a random string using "os.urandom()". When a "Process" object is created, it will inherit the authentication key of its parent process, although this may be changed by setting "authkey" to another byte string. See Authentication keys. sentinel A numeric handle of a system object which will become “ready” when the process ends. You can use this value if you want to wait on several events at once using "multiprocessing.connection.wait()". Otherwise calling "join()" is simpler. On Windows, this is an OS handle usable with the "WaitForSingleObject" and "WaitForMultipleObjects" family of API calls. On Unix, this is a file descriptor usable with primitives from the "select" module. New in version 3.3. terminate() Terminate the process. On Unix this is done using the "SIGTERM" signal; on Windows "TerminateProcess()" is used. Note that exit handlers and finally clauses, etc., will not be executed. Note that descendant processes of the process will *not* be terminated – they will simply become orphaned. Warning: If this method is used when the associated process is using a pipe or queue then the pipe or queue is liable to become corrupted and may become unusable by other process. Similarly, if the process has acquired a lock or semaphore etc. then terminating it is liable to cause other processes to deadlock. kill() Same as "terminate()" but using the "SIGKILL" signal on Unix. New in version 3.7. close() Close the "Process" object, releasing all resources associated with it. "ValueError" is raised if the underlying process is still running. Once "close()" returns successfully, most other methods and attributes of the "Process" object will raise "ValueError". New in version 3.7. Note that the "start()", "join()", "is_alive()", "terminate()" and "exitcode" methods should only be called by the process that created the process object. Example usage of some of the methods of "Process": >>> import multiprocessing, time, signal >>> p = multiprocessing.Process(target=time.sleep, args=(1000,)) >>> print(p, p.is_alive()) False >>> p.start() >>> print(p, p.is_alive()) True >>> p.terminate() >>> time.sleep(0.1) >>> print(p, p.is_alive()) False >>> p.exitcode == -signal.SIGTERM True exception multiprocessing.ProcessError The base class of all "multiprocessing" exceptions. exception multiprocessing.BufferTooShort Exception raised by "Connection.recv_bytes_into()" when the supplied buffer object is too small for the message read. If "e" is an instance of "BufferTooShort" then "e.args[0]" will give the message as a byte string. exception multiprocessing.AuthenticationError Raised when there is an authentication error. exception multiprocessing.TimeoutError Raised by methods with a timeout when the timeout expires. Pipes and Queues ---------------- When using multiple processes, one generally uses message passing for communication between processes and avoids having to use any synchronization primitives like locks. For passing messages one can use "Pipe()" (for a connection between two processes) or a queue (which allows multiple producers and consumers). The "Queue", "SimpleQueue" and "JoinableQueue" types are multi- producer, multi-consumer FIFO (first-in, first-out) queues modelled on the "queue.Queue" class in the standard library. They differ in that "Queue" lacks the "task_done()" and "join()" methods introduced into Python 2.5’s "queue.Queue" class. If you use "JoinableQueue" then you **must** call "JoinableQueue.task_done()" for each task removed from the queue or else the semaphore used to count the number of unfinished tasks may eventually overflow, raising an exception. Note that one can also create a shared queue by using a manager object – see Managers. Note: "multiprocessing" uses the usual "queue.Empty" and "queue.Full" exceptions to signal a timeout. They are not available in the "multiprocessing" namespace so you need to import them from "queue". Note: When an object is put on a queue, the object is pickled and a background thread later flushes the pickled data to an underlying pipe. This has some consequences which are a little surprising, but should not cause any practical difficulties – if they really bother you then you can instead use a queue created with a manager. 1. After putting an object on an empty queue there may be an infinitesimal delay before the queue’s "empty()" method returns "False" and "get_nowait()" can return without raising "queue.Empty". 2. If multiple processes are enqueuing objects, it is possible for the objects to be received at the other end out-of-order. However, objects enqueued by the same process will always be in the expected order with respect to each other. Warning: If a process is killed using "Process.terminate()" or "os.kill()" while it is trying to use a "Queue", then the data in the queue is likely to become corrupted. This may cause any other process to get an exception when it tries to use the queue later on. Warning: As mentioned above, if a child process has put items on a queue (and it has not used "JoinableQueue.cancel_join_thread"), then that process will not terminate until all buffered items have been flushed to the pipe.This means that if you try joining that process you may get a deadlock unless you are sure that all items which have been put on the queue have been consumed. Similarly, if the child process is non-daemonic then the parent process may hang on exit when it tries to join all its non-daemonic children.Note that a queue created using a manager does not have this issue. See Programming guidelines. For an example of the usage of queues for interprocess communication see Examples. multiprocessing.Pipe([duplex]) Returns a pair "(conn1, conn2)" of "Connection" objects representing the ends of a pipe. If *duplex* is "True" (the default) then the pipe is bidirectional. If *duplex* is "False" then the pipe is unidirectional: "conn1" can only be used for receiving messages and "conn2" can only be used for sending messages. class multiprocessing.Queue([maxsize]) Returns a process shared queue implemented using a pipe and a few locks/semaphores. When a process first puts an item on the queue a feeder thread is started which transfers objects from a buffer into the pipe. The usual "queue.Empty" and "queue.Full" exceptions from the standard library’s "queue" module are raised to signal timeouts. "Queue" implements all the methods of "queue.Queue" except for "task_done()" and "join()". qsize() Return the approximate size of the queue. Because of multithreading/multiprocessing semantics, this number is not reliable. Note that this may raise "NotImplementedError" on Unix platforms like macOS where "sem_getvalue()" is not implemented. empty() Return "True" if the queue is empty, "False" otherwise. Because of multithreading/multiprocessing semantics, this is not reliable. full() Return "True" if the queue is full, "False" otherwise. Because of multithreading/multiprocessing semantics, this is not reliable. put(obj[, block[, timeout]]) Put obj into the queue. If the optional argument *block* is "True" (the default) and *timeout* is "None" (the default), block if necessary until a free slot is available. If *timeout* is a positive number, it blocks at most *timeout* seconds and raises the "queue.Full" exception if no free slot was available within that time. Otherwise (*block* is "False"), put an item on the queue if a free slot is immediately available, else raise the "queue.Full" exception (*timeout* is ignored in that case). Changed in version 3.8: If the queue is closed, "ValueError" is raised instead of "AssertionError". put_nowait(obj) Equivalent to "put(obj, False)". get([block[, timeout]]) Remove and return an item from the queue. If optional args *block* is "True" (the default) and *timeout* is "None" (the default), block if necessary until an item is available. If *timeout* is a positive number, it blocks at most *timeout* seconds and raises the "queue.Empty" exception if no item was available within that time. Otherwise (block is "False"), return an item if one is immediately available, else raise the "queue.Empty" exception (*timeout* is ignored in that case). Changed in version 3.8: If the queue is closed, "ValueError" is raised instead of "OSError". get_nowait() Equivalent to "get(False)". "multiprocessing.Queue" has a few additional methods not found in "queue.Queue". These methods are usually unnecessary for most code: close() Indicate that no more data will be put on this queue by the current process. The background thread will quit once it has flushed all buffered data to the pipe. This is called automatically when the queue is garbage collected. join_thread() Join the background thread. This can only be used after "close()" has been called. It blocks until the background thread exits, ensuring that all data in the buffer has been flushed to the pipe. By default if a process is not the creator of the queue then on exit it will attempt to join the queue’s background thread. The process can call "cancel_join_thread()" to make "join_thread()" do nothing. cancel_join_thread() Prevent "join_thread()" from blocking. In particular, this prevents the background thread from being joined automatically when the process exits – see "join_thread()". A better name for this method might be "allow_exit_without_flush()". It is likely to cause enqueued data to lost, and you almost certainly will not need to use it. It is really only there if you need the current process to exit immediately without waiting to flush enqueued data to the underlying pipe, and you don’t care about lost data. Note: This class’s functionality requires a functioning shared semaphore implementation on the host operating system. Without one, the functionality in this class will be disabled, and attempts to instantiate a "Queue" will result in an "ImportError". See bpo-3770 for additional information. The same holds true for any of the specialized queue types listed below. class multiprocessing.SimpleQueue It is a simplified "Queue" type, very close to a locked "Pipe". close() Close the queue: release internal resources. A queue must not be used anymore after it is closed. For example, "get()", "put()" and "empty()" methods must no longer be called. New in version 3.9. empty() Return "True" if the queue is empty, "False" otherwise. get() Remove and return an item from the queue. put(item) Put *item* into the queue. class multiprocessing.JoinableQueue([maxsize]) "JoinableQueue", a "Queue" subclass, is a queue which additionally has "task_done()" and "join()" methods. task_done() Indicate that a formerly enqueued task is complete. Used by queue consumers. For each "get()" used to fetch a task, a subsequent call to "task_done()" tells the queue that the processing on the task is complete. If a "join()" is currently blocking, it will resume when all items have been processed (meaning that a "task_done()" call was received for every item that had been "put()" into the queue). Raises a "ValueError" if called more times than there were items placed in the queue. join() Block until all items in the queue have been gotten and processed. The count of unfinished tasks goes up whenever an item is added to the queue. The count goes down whenever a consumer calls "task_done()" to indicate that the item was retrieved and all work on it is complete. When the count of unfinished tasks drops to zero, "join()" unblocks. Miscellaneous ------------- multiprocessing.active_children() Return list of all live children of the current process. Calling this has the side effect of “joining” any processes which have already finished. multiprocessing.cpu_count() Return the number of CPUs in the system. This number is not equivalent to the number of CPUs the current process can use. The number of usable CPUs can be obtained with "len(os.sched_getaffinity(0))" When the number of CPUs cannot be determined a "NotImplementedError" is raised. See also: "os.cpu_count()" multiprocessing.current_process() Return the "Process" object corresponding to the current process. An analogue of "threading.current_thread()". multiprocessing.parent_process() Return the "Process" object corresponding to the parent process of the "current_process()". For the main process, "parent_process" will be "None". New in version 3.8. multiprocessing.freeze_support() Add support for when a program which uses "multiprocessing" has been frozen to produce a Windows executable. (Has been tested with **py2exe**, **PyInstaller** and **cx_Freeze**.) One needs to call this function straight after the "if __name__ == '__main__'" line of the main module. For example: from multiprocessing import Process, freeze_support def f(): print('hello world!') if __name__ == '__main__': freeze_support() Process(target=f).start() If the "freeze_support()" line is omitted then trying to run the frozen executable will raise "RuntimeError". Calling "freeze_support()" has no effect when invoked on any operating system other than Windows. In addition, if the module is being run normally by the Python interpreter on Windows (the program has not been frozen), then "freeze_support()" has no effect. multiprocessing.get_all_start_methods() Returns a list of the supported start methods, the first of which is the default. The possible start methods are "'fork'", "'spawn'" and "'forkserver'". On Windows only "'spawn'" is available. On Unix "'fork'" and "'spawn'" are always supported, with "'fork'" being the default. New in version 3.4. multiprocessing.get_context(method=None) Return a context object which has the same attributes as the "multiprocessing" module. If *method* is "None" then the default context is returned. Otherwise *method* should be "'fork'", "'spawn'", "'forkserver'". "ValueError" is raised if the specified start method is not available. New in version 3.4. multiprocessing.get_start_method(allow_none=False) Return the name of start method used for starting processes. If the start method has not been fixed and *allow_none* is false, then the start method is fixed to the default and the name is returned. If the start method has not been fixed and *allow_none* is true then "None" is returned. The return value can be "'fork'", "'spawn'", "'forkserver'" or "None". "'fork'" is the default on Unix, while "'spawn'" is the default on Windows and macOS. Changed in version 3.8: On macOS, the *spawn* start method is now the default. The *fork* start method should be considered unsafe as it can lead to crashes of the subprocess. See bpo-33725. New in version 3.4. multiprocessing.set_executable(executable) Set the path of the Python interpreter to use when starting a child process. (By default "sys.executable" is used). Embedders will probably need to do some thing like set_executable(os.path.join(sys.exec_prefix, 'pythonw.exe')) before they can create child processes. Changed in version 3.4: Now supported on Unix when the "'spawn'" start method is used. multiprocessing.set_start_method(method) Set the method which should be used to start child processes. *method* can be "'fork'", "'spawn'" or "'forkserver'". Note that this should be called at most once, and it should be protected inside the "if __name__ == '__main__'" clause of the main module. New in version 3.4. Note: "multiprocessing" contains no analogues of "threading.active_count()", "threading.enumerate()", "threading.settrace()", "threading.setprofile()", "threading.Timer", or "threading.local". Connection Objects ------------------ Connection objects allow the sending and receiving of picklable objects or strings. They can be thought of as message oriented connected sockets. Connection objects are usually created using "Pipe" – see also Listeners and Clients. class multiprocessing.connection.Connection send(obj) Send an object to the other end of the connection which should be read using "recv()". The object must be picklable. Very large pickles (approximately 32 MiB+, though it depends on the OS) may raise a "ValueError" exception. recv() Return an object sent from the other end of the connection using "send()". Blocks until there is something to receive. Raises "EOFError" if there is nothing left to receive and the other end was closed. fileno() Return the file descriptor or handle used by the connection. close() Close the connection. This is called automatically when the connection is garbage collected. poll([timeout]) Return whether there is any data available to be read. If *timeout* is not specified then it will return immediately. If *timeout* is a number then this specifies the maximum time in seconds to block. If *timeout* is "None" then an infinite timeout is used. Note that multiple connection objects may be polled at once by using "multiprocessing.connection.wait()". send_bytes(buffer[, offset[, size]]) Send byte data from a *bytes-like object* as a complete message. If *offset* is given then data is read from that position in *buffer*. If *size* is given then that many bytes will be read from buffer. Very large buffers (approximately 32 MiB+, though it depends on the OS) may raise a "ValueError" exception recv_bytes([maxlength]) Return a complete message of byte data sent from the other end of the connection as a string. Blocks until there is something to receive. Raises "EOFError" if there is nothing left to receive and the other end has closed. If *maxlength* is specified and the message is longer than *maxlength* then "OSError" is raised and the connection will no longer be readable. Changed in version 3.3: This function used to raise "IOError", which is now an alias of "OSError". recv_bytes_into(buffer[, offset]) Read into *buffer* a complete message of byte data sent from the other end of the connection and return the number of bytes in the message. Blocks until there is something to receive. Raises "EOFError" if there is nothing left to receive and the other end was closed. *buffer* must be a writable *bytes-like object*. If *offset* is given then the message will be written into the buffer from that position. Offset must be a non-negative integer less than the length of *buffer* (in bytes). If the buffer is too short then a "BufferTooShort" exception is raised and the complete message is available as "e.args[0]" where "e" is the exception instance. Changed in version 3.3: Connection objects themselves can now be transferred between processes using "Connection.send()" and "Connection.recv()". New in version 3.3: Connection objects now support the context management protocol – see Context Manager Types. "__enter__()" returns the connection object, and "__exit__()" calls "close()". For example: >>> from multiprocessing import Pipe >>> a, b = Pipe() >>> a.send([1, 'hello', None]) >>> b.recv() [1, 'hello', None] >>> b.send_bytes(b'thank you') >>> a.recv_bytes() b'thank you' >>> import array >>> arr1 = array.array('i', range(5)) >>> arr2 = array.array('i', [0] * 10) >>> a.send_bytes(arr1) >>> count = b.recv_bytes_into(arr2) >>> assert count == len(arr1) * arr1.itemsize >>> arr2 array('i', [0, 1, 2, 3, 4, 0, 0, 0, 0, 0]) Warning: The "Connection.recv()" method automatically unpickles the data it receives, which can be a security risk unless you can trust the process which sent the message.Therefore, unless the connection object was produced using "Pipe()" you should only use the "recv()" and "send()" methods after performing some sort of authentication. See Authentication keys. Warning: If a process is killed while it is trying to read or write to a pipe then the data in the pipe is likely to become corrupted, because it may become impossible to be sure where the message boundaries lie. Synchronization primitives -------------------------- Generally synchronization primitives are not as necessary in a multiprocess program as they are in a multithreaded program. See the documentation for "threading" module. Note that one can also create synchronization primitives by using a manager object – see Managers. class multiprocessing.Barrier(parties[, action[, timeout]]) A barrier object: a clone of "threading.Barrier". New in version 3.3. class multiprocessing.BoundedSemaphore([value]) A bounded semaphore object: a close analog of "threading.BoundedSemaphore". A solitary difference from its close analog exists: its "acquire" method’s first argument is named *block*, as is consistent with "Lock.acquire()". Note: On macOS, this is indistinguishable from "Semaphore" because "sem_getvalue()" is not implemented on that platform. class multiprocessing.Condition([lock]) A condition variable: an alias for "threading.Condition". If *lock* is specified then it should be a "Lock" or "RLock" object from "multiprocessing". Changed in version 3.3: The "wait_for()" method was added. class multiprocessing.Event A clone of "threading.Event". class multiprocessing.Lock A non-recursive lock object: a close analog of "threading.Lock". Once a process or thread has acquired a lock, subsequent attempts to acquire it from any process or thread will block until it is released; any process or thread may release it. The concepts and behaviors of "threading.Lock" as it applies to threads are replicated here in "multiprocessing.Lock" as it applies to either processes or threads, except as noted. Note that "Lock" is actually a factory function which returns an instance of "multiprocessing.synchronize.Lock" initialized with a default context. "Lock" supports the *context manager* protocol and thus may be used in "with" statements. acquire(block=True, timeout=None) Acquire a lock, blocking or non-blocking. With the *block* argument set to "True" (the default), the method call will block until the lock is in an unlocked state, then set it to locked and return "True". Note that the name of this first argument differs from that in "threading.Lock.acquire()". With the *block* argument set to "False", the method call does not block. If the lock is currently in a locked state, return "False"; otherwise set the lock to a locked state and return "True". When invoked with a positive, floating-point value for *timeout*, block for at most the number of seconds specified by *timeout* as long as the lock can not be acquired. Invocations with a negative value for *timeout* are equivalent to a *timeout* of zero. Invocations with a *timeout* value of "None" (the default) set the timeout period to infinite. Note that the treatment of negative or "None" values for *timeout* differs from the implemented behavior in "threading.Lock.acquire()". The *timeout* argument has no practical implications if the *block* argument is set to "False" and is thus ignored. Returns "True" if the lock has been acquired or "False" if the timeout period has elapsed. release() Release a lock. This can be called from any process or thread, not only the process or thread which originally acquired the lock. Behavior is the same as in "threading.Lock.release()" except that when invoked on an unlocked lock, a "ValueError" is raised. class multiprocessing.RLock A recursive lock object: a close analog of "threading.RLock". A recursive lock must be released by the process or thread that acquired it. Once a process or thread has acquired a recursive lock, the same process or thread may acquire it again without blocking; that process or thread must release it once for each time it has been acquired. Note that "RLock" is actually a factory function which returns an instance of "multiprocessing.synchronize.RLock" initialized with a default context. "RLock" supports the *context manager* protocol and thus may be used in "with" statements. acquire(block=True, timeout=None) Acquire a lock, blocking or non-blocking. When invoked with the *block* argument set to "True", block until the lock is in an unlocked state (not owned by any process or thread) unless the lock is already owned by the current process or thread. The current process or thread then takes ownership of the lock (if it does not already have ownership) and the recursion level inside the lock increments by one, resulting in a return value of "True". Note that there are several differences in this first argument’s behavior compared to the implementation of "threading.RLock.acquire()", starting with the name of the argument itself. When invoked with the *block* argument set to "False", do not block. If the lock has already been acquired (and thus is owned) by another process or thread, the current process or thread does not take ownership and the recursion level within the lock is not changed, resulting in a return value of "False". If the lock is in an unlocked state, the current process or thread takes ownership and the recursion level is incremented, resulting in a return value of "True". Use and behaviors of the *timeout* argument are the same as in "Lock.acquire()". Note that some of these behaviors of *timeout* differ from the implemented behaviors in "threading.RLock.acquire()". release() Release a lock, decrementing the recursion level. If after the decrement the recursion level is zero, reset the lock to unlocked (not owned by any process or thread) and if any other processes or threads are blocked waiting for the lock to become unlocked, allow exactly one of them to proceed. If after the decrement the recursion level is still nonzero, the lock remains locked and owned by the calling process or thread. Only call this method when the calling process or thread owns the lock. An "AssertionError" is raised if this method is called by a process or thread other than the owner or if the lock is in an unlocked (unowned) state. Note that the type of exception raised in this situation differs from the implemented behavior in "threading.RLock.release()". class multiprocessing.Semaphore([value]) A semaphore object: a close analog of "threading.Semaphore". A solitary difference from its close analog exists: its "acquire" method’s first argument is named *block*, as is consistent with "Lock.acquire()". Note: On macOS, "sem_timedwait" is unsupported, so calling "acquire()" with a timeout will emulate that function’s behavior using a sleeping loop. Note: If the SIGINT signal generated by "Ctrl-C" arrives while the main thread is blocked by a call to "BoundedSemaphore.acquire()", "Lock.acquire()", "RLock.acquire()", "Semaphore.acquire()", "Condition.acquire()" or "Condition.wait()" then the call will be immediately interrupted and "KeyboardInterrupt" will be raised.This differs from the behaviour of "threading" where SIGINT will be ignored while the equivalent blocking calls are in progress. Note: Some of this package’s functionality requires a functioning shared semaphore implementation on the host operating system. Without one, the "multiprocessing.synchronize" module will be disabled, and attempts to import it will result in an "ImportError". See bpo-3770 for additional information. Shared "ctypes" Objects ----------------------- It is possible to create shared objects using shared memory which can be inherited by child processes. multiprocessing.Value(typecode_or_type, *args, lock=True) Return a "ctypes" object allocated from shared memory. By default the return value is actually a synchronized wrapper for the object. The object itself can be accessed via the *value* attribute of a "Value". *typecode_or_type* determines the type of the returned object: it is either a ctypes type or a one character typecode of the kind used by the "array" module. **args* is passed on to the constructor for the type. If *lock* is "True" (the default) then a new recursive lock object is created to synchronize access to the value. If *lock* is a "Lock" or "RLock" object then that will be used to synchronize access to the value. If *lock* is "False" then access to the returned object will not be automatically protected by a lock, so it will not necessarily be “process-safe”. Operations like "+=" which involve a read and write are not atomic. So if, for instance, you want to atomically increment a shared value it is insufficient to just do counter.value += 1 Assuming the associated lock is recursive (which it is by default) you can instead do with counter.get_lock(): counter.value += 1 Note that *lock* is a keyword-only argument. multiprocessing.Array(typecode_or_type, size_or_initializer, *, lock=True) Return a ctypes array allocated from shared memory. By default the return value is actually a synchronized wrapper for the array. *typecode_or_type* determines the type of the elements of the returned array: it is either a ctypes type or a one character typecode of the kind used by the "array" module. If *size_or_initializer* is an integer, then it determines the length of the array, and the array will be initially zeroed. Otherwise, *size_or_initializer* is a sequence which is used to initialize the array and whose length determines the length of the array. If *lock* is "True" (the default) then a new lock object is created to synchronize access to the value. If *lock* is a "Lock" or "RLock" object then that will be used to synchronize access to the value. If *lock* is "False" then access to the returned object will not be automatically protected by a lock, so it will not necessarily be “process-safe”. Note that *lock* is a keyword only argument. Note that an array of "ctypes.c_char" has *value* and *raw* attributes which allow one to use it to store and retrieve strings. The "multiprocessing.sharedctypes" module ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The "multiprocessing.sharedctypes" module provides functions for allocating "ctypes" objects from shared memory which can be inherited by child processes. Note: Although it is possible to store a pointer in shared memory remember that this will refer to a location in the address space of a specific process. However, the pointer is quite likely to be invalid in the context of a second process and trying to dereference the pointer from the second process may cause a crash. multiprocessing.sharedctypes.RawArray(typecode_or_type, size_or_initializer) Return a ctypes array allocated from shared memory. *typecode_or_type* determines the type of the elements of the returned array: it is either a ctypes type or a one character typecode of the kind used by the "array" module. If *size_or_initializer* is an integer then it determines the length of the array, and the array will be initially zeroed. Otherwise *size_or_initializer* is a sequence which is used to initialize the array and whose length determines the length of the array. Note that setting and getting an element is potentially non-atomic – use "Array()" instead to make sure that access is automatically synchronized using a lock. multiprocessing.sharedctypes.RawValue(typecode_or_type, *args) Return a ctypes object allocated from shared memory. *typecode_or_type* determines the type of the returned object: it is either a ctypes type or a one character typecode of the kind used by the "array" module. **args* is passed on to the constructor for the type. Note that setting and getting the value is potentially non-atomic – use "Value()" instead to make sure that access is automatically synchronized using a lock. Note that an array of "ctypes.c_char" has "value" and "raw" attributes which allow one to use it to store and retrieve strings – see documentation for "ctypes". multiprocessing.sharedctypes.Array(typecode_or_type, size_or_initializer, *, lock=True) The same as "RawArray()" except that depending on the value of *lock* a process-safe synchronization wrapper may be returned instead of a raw ctypes array. If *lock* is "True" (the default) then a new lock object is created to synchronize access to the value. If *lock* is a "Lock" or "RLock" object then that will be used to synchronize access to the value. If *lock* is "False" then access to the returned object will not be automatically protected by a lock, so it will not necessarily be “process-safe”. Note that *lock* is a keyword-only argument. multiprocessing.sharedctypes.Value(typecode_or_type, *args, lock=True) The same as "RawValue()" except that depending on the value of *lock* a process-safe synchronization wrapper may be returned instead of a raw ctypes object. If *lock* is "True" (the default) then a new lock object is created to synchronize access to the value. If *lock* is a "Lock" or "RLock" object then that will be used to synchronize access to the value. If *lock* is "False" then access to the returned object will not be automatically protected by a lock, so it will not necessarily be “process-safe”. Note that *lock* is a keyword-only argument. multiprocessing.sharedctypes.copy(obj) Return a ctypes object allocated from shared memory which is a copy of the ctypes object *obj*. multiprocessing.sharedctypes.synchronized(obj[, lock]) Return a process-safe wrapper object for a ctypes object which uses *lock* to synchronize access. If *lock* is "None" (the default) then a "multiprocessing.RLock" object is created automatically. A synchronized wrapper will have two methods in addition to those of the object it wraps: "get_obj()" returns the wrapped object and "get_lock()" returns the lock object used for synchronization. Note that accessing the ctypes object through the wrapper can be a lot slower than accessing the raw ctypes object. Changed in version 3.5: Synchronized objects support the *context manager* protocol. The table below compares the syntax for creating shared ctypes objects from shared memory with the normal ctypes syntax. (In the table "MyStruct" is some subclass of "ctypes.Structure".) +----------------------+----------------------------+-----------------------------+ | ctypes | sharedctypes using type | sharedctypes using typecode | |======================|============================|=============================| | c_double(2.4) | RawValue(c_double, 2.4) | RawValue(‘d’, 2.4) | +----------------------+----------------------------+-----------------------------+ | MyStruct(4, 6) | RawValue(MyStruct, 4, 6) | | +----------------------+----------------------------+-----------------------------+ | (c_short * 7)() | RawArray(c_short, 7) | RawArray(‘h’, 7) | +----------------------+----------------------------+-----------------------------+ | (c_int * 3)(9, 2, 8) | RawArray(c_int, (9, 2, 8)) | RawArray(‘i’, (9, 2, 8)) | +----------------------+----------------------------+-----------------------------+ Below is an example where a number of ctypes objects are modified by a child process: from multiprocessing import Process, Lock from multiprocessing.sharedctypes import Value, Array from ctypes import Structure, c_double class Point(Structure): _fields_ = [('x', c_double), ('y', c_double)] def modify(n, x, s, A): n.value **= 2 x.value **= 2 s.value = s.value.upper() for a in A: a.x **= 2 a.y **= 2 if __name__ == '__main__': lock = Lock() n = Value('i', 7) x = Value(c_double, 1.0/3.0, lock=False) s = Array('c', b'hello world', lock=lock) A = Array(Point, [(1.875,-6.25), (-5.75,2.0), (2.375,9.5)], lock=lock) p = Process(target=modify, args=(n, x, s, A)) p.start() p.join() print(n.value) print(x.value) print(s.value) print([(a.x, a.y) for a in A]) The results printed are 49 0.1111111111111111 HELLO WORLD [(3.515625, 39.0625), (33.0625, 4.0), (5.640625, 90.25)] Managers -------- Managers provide a way to create data which can be shared between different processes, including sharing over a network between processes running on different machines. A manager object controls a server process which manages *shared objects*. Other processes can access the shared objects by using proxies. multiprocessing.Manager() Returns a started "SyncManager" object which can be used for sharing objects between processes. The returned manager object corresponds to a spawned child process and has methods which will create shared objects and return corresponding proxies. Manager processes will be shutdown as soon as they are garbage collected or their parent process exits. The manager classes are defined in the "multiprocessing.managers" module: class multiprocessing.managers.BaseManager([address[, authkey]]) Create a BaseManager object. Once created one should call "start()" or "get_server().serve_forever()" to ensure that the manager object refers to a started manager process. *address* is the address on which the manager process listens for new connections. If *address* is "None" then an arbitrary one is chosen. *authkey* is the authentication key which will be used to check the validity of incoming connections to the server process. If *authkey* is "None" then "current_process().authkey" is used. Otherwise *authkey* is used and it must be a byte string. start([initializer[, initargs]]) Start a subprocess to start the manager. If *initializer* is not "None" then the subprocess will call "initializer(*initargs)" when it starts. get_server() Returns a "Server" object which represents the actual server under the control of the Manager. The "Server" object supports the "serve_forever()" method: >>> from multiprocessing.managers import BaseManager >>> manager = BaseManager(address=('', 50000), authkey=b'abc') >>> server = manager.get_server() >>> server.serve_forever() "Server" additionally has an "address" attribute. connect() Connect a local manager object to a remote manager process: >>> from multiprocessing.managers import BaseManager >>> m = BaseManager(address=('127.0.0.1', 50000), authkey=b'abc') >>> m.connect() shutdown() Stop the process used by the manager. This is only available if "start()" has been used to start the server process. This can be called multiple times. register(typeid[, callable[, proxytype[, exposed[, method_to_typeid[, create_method]]]]]) A classmethod which can be used for registering a type or callable with the manager class. *typeid* is a “type identifier” which is used to identify a particular type of shared object. This must be a string. *callable* is a callable used for creating objects for this type identifier. If a manager instance will be connected to the server using the "connect()" method, or if the *create_method* argument is "False" then this can be left as "None". *proxytype* is a subclass of "BaseProxy" which is used to create proxies for shared objects with this *typeid*. If "None" then a proxy class is created automatically. *exposed* is used to specify a sequence of method names which proxies for this typeid should be allowed to access using "BaseProxy._callmethod()". (If *exposed* is "None" then "proxytype._exposed_" is used instead if it exists.) In the case where no exposed list is specified, all “public methods” of the shared object will be accessible. (Here a “public method” means any attribute which has a "__call__()" method and whose name does not begin with "'_'".) *method_to_typeid* is a mapping used to specify the return type of those exposed methods which should return a proxy. It maps method names to typeid strings. (If *method_to_typeid* is "None" then "proxytype._method_to_typeid_" is used instead if it exists.) If a method’s name is not a key of this mapping or if the mapping is "None" then the object returned by the method will be copied by value. *create_method* determines whether a method should be created with name *typeid* which can be used to tell the server process to create a new shared object and return a proxy for it. By default it is "True". "BaseManager" instances also have one read-only property: address The address used by the manager. Changed in version 3.3: Manager objects support the context management protocol – see Context Manager Types. "__enter__()" starts the server process (if it has not already started) and then returns the manager object. "__exit__()" calls "shutdown()".In previous versions "__enter__()" did not start the manager’s server process if it was not already started. class multiprocessing.managers.SyncManager A subclass of "BaseManager" which can be used for the synchronization of processes. Objects of this type are returned by "multiprocessing.Manager()". Its methods create and return Proxy Objects for a number of commonly used data types to be synchronized across processes. This notably includes shared lists and dictionaries. Barrier(parties[, action[, timeout]]) Create a shared "threading.Barrier" object and return a proxy for it. New in version 3.3. BoundedSemaphore([value]) Create a shared "threading.BoundedSemaphore" object and return a proxy for it. Condition([lock]) Create a shared "threading.Condition" object and return a proxy for it. If *lock* is supplied then it should be a proxy for a "threading.Lock" or "threading.RLock" object. Changed in version 3.3: The "wait_for()" method was added. Event() Create a shared "threading.Event" object and return a proxy for it. Lock() Create a shared "threading.Lock" object and return a proxy for it. Namespace() Create a shared "Namespace" object and return a proxy for it. Queue([maxsize]) Create a shared "queue.Queue" object and return a proxy for it. RLock() Create a shared "threading.RLock" object and return a proxy for it. Semaphore([value]) Create a shared "threading.Semaphore" object and return a proxy for it. Array(typecode, sequence) Create an array and return a proxy for it. Value(typecode, value) Create an object with a writable "value" attribute and return a proxy for it. dict() dict(mapping) dict(sequence) Create a shared "dict" object and return a proxy for it. list() list(sequence) Create a shared "list" object and return a proxy for it. Changed in version 3.6: Shared objects are capable of being nested. For example, a shared container object such as a shared list can contain other shared objects which will all be managed and synchronized by the "SyncManager". class multiprocessing.managers.Namespace A type that can register with "SyncManager". A namespace object has no public methods, but does have writable attributes. Its representation shows the values of its attributes. However, when using a proxy for a namespace object, an attribute beginning with "'_'" will be an attribute of the proxy and not an attribute of the referent: >>> manager = multiprocessing.Manager() >>> Global = manager.Namespace() >>> Global.x = 10 >>> Global.y = 'hello' >>> Global._z = 12.3 # this is an attribute of the proxy >>> print(Global) Namespace(x=10, y='hello') Customized managers ~~~~~~~~~~~~~~~~~~~ To create one’s own manager, one creates a subclass of "BaseManager" and uses the "register()" classmethod to register new types or callables with the manager class. For example: from multiprocessing.managers import BaseManager class MathsClass: def add(self, x, y): return x + y def mul(self, x, y): return x * y class MyManager(BaseManager): pass MyManager.register('Maths', MathsClass) if __name__ == '__main__': with MyManager() as manager: maths = manager.Maths() print(maths.add(4, 3)) # prints 7 print(maths.mul(7, 8)) # prints 56 Using a remote manager ~~~~~~~~~~~~~~~~~~~~~~ It is possible to run a manager server on one machine and have clients use it from other machines (assuming that the firewalls involved allow it). Running the following commands creates a server for a single shared queue which remote clients can access: >>> from multiprocessing.managers import BaseManager >>> from queue import Queue >>> queue = Queue() >>> class QueueManager(BaseManager): pass >>> QueueManager.register('get_queue', callable=lambda:queue) >>> m = QueueManager(address=('', 50000), authkey=b'abracadabra') >>> s = m.get_server() >>> s.serve_forever() One client can access the server as follows: >>> from multiprocessing.managers import BaseManager >>> class QueueManager(BaseManager): pass >>> QueueManager.register('get_queue') >>> m = QueueManager(address=('foo.bar.org', 50000), authkey=b'abracadabra') >>> m.connect() >>> queue = m.get_queue() >>> queue.put('hello') Another client can also use it: >>> from multiprocessing.managers import BaseManager >>> class QueueManager(BaseManager): pass >>> QueueManager.register('get_queue') >>> m = QueueManager(address=('foo.bar.org', 50000), authkey=b'abracadabra') >>> m.connect() >>> queue = m.get_queue() >>> queue.get() 'hello' Local processes can also access that queue, using the code from above on the client to access it remotely: >>> from multiprocessing import Process, Queue >>> from multiprocessing.managers import BaseManager >>> class Worker(Process): ... def __init__(self, q): ... self.q = q ... super().__init__() ... def run(self): ... self.q.put('local hello') ... >>> queue = Queue() >>> w = Worker(queue) >>> w.start() >>> class QueueManager(BaseManager): pass ... >>> QueueManager.register('get_queue', callable=lambda: queue) >>> m = QueueManager(address=('', 50000), authkey=b'abracadabra') >>> s = m.get_server() >>> s.serve_forever() Proxy Objects ------------- A proxy is an object which *refers* to a shared object which lives (presumably) in a different process. The shared object is said to be the *referent* of the proxy. Multiple proxy objects may have the same referent. A proxy object has methods which invoke corresponding methods of its referent (although not every method of the referent will necessarily be available through the proxy). In this way, a proxy can be used just like its referent can: >>> from multiprocessing import Manager >>> manager = Manager() >>> l = manager.list([i*i for i in range(10)]) >>> print(l) [0, 1, 4, 9, 16, 25, 36, 49, 64, 81] >>> print(repr(l)) >>> l[4] 16 >>> l[2:5] [4, 9, 16] Notice that applying "str()" to a proxy will return the representation of the referent, whereas applying "repr()" will return the representation of the proxy. An important feature of proxy objects is that they are picklable so they can be passed between processes. As such, a referent can contain Proxy Objects. This permits nesting of these managed lists, dicts, and other Proxy Objects: >>> a = manager.list() >>> b = manager.list() >>> a.append(b) # referent of a now contains referent of b >>> print(a, b) [] [] >>> b.append('hello') >>> print(a[0], b) ['hello'] ['hello'] Similarly, dict and list proxies may be nested inside one another: >>> l_outer = manager.list([ manager.dict() for i in range(2) ]) >>> d_first_inner = l_outer[0] >>> d_first_inner['a'] = 1 >>> d_first_inner['b'] = 2 >>> l_outer[1]['c'] = 3 >>> l_outer[1]['z'] = 26 >>> print(l_outer[0]) {'a': 1, 'b': 2} >>> print(l_outer[1]) {'c': 3, 'z': 26} If standard (non-proxy) "list" or "dict" objects are contained in a referent, modifications to those mutable values will not be propagated through the manager because the proxy has no way of knowing when the values contained within are modified. However, storing a value in a container proxy (which triggers a "__setitem__" on the proxy object) does propagate through the manager and so to effectively modify such an item, one could re-assign the modified value to the container proxy: # create a list proxy and append a mutable object (a dictionary) lproxy = manager.list() lproxy.append({}) # now mutate the dictionary d = lproxy[0] d['a'] = 1 d['b'] = 2 # at this point, the changes to d are not yet synced, but by # updating the dictionary, the proxy is notified of the change lproxy[0] = d This approach is perhaps less convenient than employing nested Proxy Objects for most use cases but also demonstrates a level of control over the synchronization. Note: The proxy types in "multiprocessing" do nothing to support comparisons by value. So, for instance, we have: >>> manager.list([1,2,3]) == [1,2,3] False One should just use a copy of the referent instead when making comparisons. class multiprocessing.managers.BaseProxy Proxy objects are instances of subclasses of "BaseProxy". _callmethod(methodname[, args[, kwds]]) Call and return the result of a method of the proxy’s referent. If "proxy" is a proxy whose referent is "obj" then the expression proxy._callmethod(methodname, args, kwds) will evaluate the expression getattr(obj, methodname)(*args, **kwds) in the manager’s process. The returned value will be a copy of the result of the call or a proxy to a new shared object – see documentation for the *method_to_typeid* argument of "BaseManager.register()". If an exception is raised by the call, then is re-raised by "_callmethod()". If some other exception is raised in the manager’s process then this is converted into a "RemoteError" exception and is raised by "_callmethod()". Note in particular that an exception will be raised if *methodname* has not been *exposed*. An example of the usage of "_callmethod()": >>> l = manager.list(range(10)) >>> l._callmethod('__len__') 10 >>> l._callmethod('__getitem__', (slice(2, 7),)) # equivalent to l[2:7] [2, 3, 4, 5, 6] >>> l._callmethod('__getitem__', (20,)) # equivalent to l[20] Traceback (most recent call last): ... IndexError: list index out of range _getvalue() Return a copy of the referent. If the referent is unpicklable then this will raise an exception. __repr__() Return a representation of the proxy object. __str__() Return the representation of the referent. Cleanup ~~~~~~~ A proxy object uses a weakref callback so that when it gets garbage collected it deregisters itself from the manager which owns its referent. A shared object gets deleted from the manager process when there are no longer any proxies referring to it. Process Pools ------------- One can create a pool of processes which will carry out tasks submitted to it with the "Pool" class. class multiprocessing.pool.Pool([processes[, initializer[, initargs[, maxtasksperchild[, context]]]]]) A process pool object which controls a pool of worker processes to which jobs can be submitted. It supports asynchronous results with timeouts and callbacks and has a parallel map implementation. *processes* is the number of worker processes to use. If *processes* is "None" then the number returned by "os.cpu_count()" is used. If *initializer* is not "None" then each worker process will call "initializer(*initargs)" when it starts. *maxtasksperchild* is the number of tasks a worker process can complete before it will exit and be replaced with a fresh worker process, to enable unused resources to be freed. The default *maxtasksperchild* is "None", which means worker processes will live as long as the pool. *context* can be used to specify the context used for starting the worker processes. Usually a pool is created using the function "multiprocessing.Pool()" or the "Pool()" method of a context object. In both cases *context* is set appropriately. Note that the methods of the pool object should only be called by the process which created the pool. Warning: "multiprocessing.pool" objects have internal resources that need to be properly managed (like any other resource) by using the pool as a context manager or by calling "close()" and "terminate()" manually. Failure to do this can lead to the process hanging on finalization.Note that is **not correct** to rely on the garbage colletor to destroy the pool as CPython does not assure that the finalizer of the pool will be called (see "object.__del__()" for more information). New in version 3.2: *maxtasksperchild* New in version 3.4: *context* Note: Worker processes within a "Pool" typically live for the complete duration of the Pool’s work queue. A frequent pattern found in other systems (such as Apache, mod_wsgi, etc) to free resources held by workers is to allow a worker within a pool to complete only a set amount of work before being exiting, being cleaned up and a new process spawned to replace the old one. The *maxtasksperchild* argument to the "Pool" exposes this ability to the end user. apply(func[, args[, kwds]]) Call *func* with arguments *args* and keyword arguments *kwds*. It blocks until the result is ready. Given this blocks, "apply_async()" is better suited for performing work in parallel. Additionally, *func* is only executed in one of the workers of the pool. apply_async(func[, args[, kwds[, callback[, error_callback]]]]) A variant of the "apply()" method which returns a "AsyncResult" object. If *callback* is specified then it should be a callable which accepts a single argument. When the result becomes ready *callback* is applied to it, that is unless the call failed, in which case the *error_callback* is applied instead. If *error_callback* is specified then it should be a callable which accepts a single argument. If the target function fails, then the *error_callback* is called with the exception instance. Callbacks should complete immediately since otherwise the thread which handles the results will get blocked. map(func, iterable[, chunksize]) A parallel equivalent of the "map()" built-in function (it supports only one *iterable* argument though, for multiple iterables see "starmap()"). It blocks until the result is ready. This method chops the iterable into a number of chunks which it submits to the process pool as separate tasks. The (approximate) size of these chunks can be specified by setting *chunksize* to a positive integer. Note that it may cause high memory usage for very long iterables. Consider using "imap()" or "imap_unordered()" with explicit *chunksize* option for better efficiency. map_async(func, iterable[, chunksize[, callback[, error_callback]]]) A variant of the "map()" method which returns a "AsyncResult" object. If *callback* is specified then it should be a callable which accepts a single argument. When the result becomes ready *callback* is applied to it, that is unless the call failed, in which case the *error_callback* is applied instead. If *error_callback* is specified then it should be a callable which accepts a single argument. If the target function fails, then the *error_callback* is called with the exception instance. Callbacks should complete immediately since otherwise the thread which handles the results will get blocked. imap(func, iterable[, chunksize]) A lazier version of "map()". The *chunksize* argument is the same as the one used by the "map()" method. For very long iterables using a large value for *chunksize* can make the job complete **much** faster than using the default value of "1". Also if *chunksize* is "1" then the "next()" method of the iterator returned by the "imap()" method has an optional *timeout* parameter: "next(timeout)" will raise "multiprocessing.TimeoutError" if the result cannot be returned within *timeout* seconds. imap_unordered(func, iterable[, chunksize]) The same as "imap()" except that the ordering of the results from the returned iterator should be considered arbitrary. (Only when there is only one worker process is the order guaranteed to be “correct”.) starmap(func, iterable[, chunksize]) Like "map()" except that the elements of the *iterable* are expected to be iterables that are unpacked as arguments. Hence an *iterable* of "[(1,2), (3, 4)]" results in "[func(1,2), func(3,4)]". New in version 3.3. starmap_async(func, iterable[, chunksize[, callback[, error_callback]]]) A combination of "starmap()" and "map_async()" that iterates over *iterable* of iterables and calls *func* with the iterables unpacked. Returns a result object. New in version 3.3. close() Prevents any more tasks from being submitted to the pool. Once all the tasks have been completed the worker processes will exit. terminate() Stops the worker processes immediately without completing outstanding work. When the pool object is garbage collected "terminate()" will be called immediately. join() Wait for the worker processes to exit. One must call "close()" or "terminate()" before using "join()". New in version 3.3: Pool objects now support the context management protocol – see Context Manager Types. "__enter__()" returns the pool object, and "__exit__()" calls "terminate()". class multiprocessing.pool.AsyncResult The class of the result returned by "Pool.apply_async()" and "Pool.map_async()". get([timeout]) Return the result when it arrives. If *timeout* is not "None" and the result does not arrive within *timeout* seconds then "multiprocessing.TimeoutError" is raised. If the remote call raised an exception then that exception will be reraised by "get()". wait([timeout]) Wait until the result is available or until *timeout* seconds pass. ready() Return whether the call has completed. successful() Return whether the call completed without raising an exception. Will raise "ValueError" if the result is not ready. Changed in version 3.7: If the result is not ready, "ValueError" is raised instead of "AssertionError". The following example demonstrates the use of a pool: from multiprocessing import Pool import time def f(x): return x*x if __name__ == '__main__': with Pool(processes=4) as pool: # start 4 worker processes result = pool.apply_async(f, (10,)) # evaluate "f(10)" asynchronously in a single process print(result.get(timeout=1)) # prints "100" unless your computer is *very* slow print(pool.map(f, range(10))) # prints "[0, 1, 4,..., 81]" it = pool.imap(f, range(10)) print(next(it)) # prints "0" print(next(it)) # prints "1" print(it.next(timeout=1)) # prints "4" unless your computer is *very* slow result = pool.apply_async(time.sleep, (10,)) print(result.get(timeout=1)) # raises multiprocessing.TimeoutError Listeners and Clients --------------------- Usually message passing between processes is done using queues or by using "Connection" objects returned by "Pipe()". However, the "multiprocessing.connection" module allows some extra flexibility. It basically gives a high level message oriented API for dealing with sockets or Windows named pipes. It also has support for *digest authentication* using the "hmac" module, and for polling multiple connections at the same time. multiprocessing.connection.deliver_challenge(connection, authkey) Send a randomly generated message to the other end of the connection and wait for a reply. If the reply matches the digest of the message using *authkey* as the key then a welcome message is sent to the other end of the connection. Otherwise "AuthenticationError" is raised. multiprocessing.connection.answer_challenge(connection, authkey) Receive a message, calculate the digest of the message using *authkey* as the key, and then send the digest back. If a welcome message is not received, then "AuthenticationError" is raised. multiprocessing.connection.Client(address[, family[, authkey]]) Attempt to set up a connection to the listener which is using address *address*, returning a "Connection". The type of the connection is determined by *family* argument, but this can generally be omitted since it can usually be inferred from the format of *address*. (See Address Formats) If *authkey* is given and not None, it should be a byte string and will be used as the secret key for an HMAC-based authentication challenge. No authentication is done if *authkey* is None. "AuthenticationError" is raised if authentication fails. See Authentication keys. class multiprocessing.connection.Listener([address[, family[, backlog[, authkey]]]]) A wrapper for a bound socket or Windows named pipe which is ‘listening’ for connections. *address* is the address to be used by the bound socket or named pipe of the listener object. Note: If an address of ‘0.0.0.0’ is used, the address will not be a connectable end point on Windows. If you require a connectable end-point, you should use ‘127.0.0.1’. *family* is the type of socket (or named pipe) to use. This can be one of the strings "'AF_INET'" (for a TCP socket), "'AF_UNIX'" (for a Unix domain socket) or "'AF_PIPE'" (for a Windows named pipe). Of these only the first is guaranteed to be available. If *family* is "None" then the family is inferred from the format of *address*. If *address* is also "None" then a default is chosen. This default is the family which is assumed to be the fastest available. See Address Formats. Note that if *family* is "'AF_UNIX'" and address is "None" then the socket will be created in a private temporary directory created using "tempfile.mkstemp()". If the listener object uses a socket then *backlog* (1 by default) is passed to the "listen()" method of the socket once it has been bound. If *authkey* is given and not None, it should be a byte string and will be used as the secret key for an HMAC-based authentication challenge. No authentication is done if *authkey* is None. "AuthenticationError" is raised if authentication fails. See Authentication keys. accept() Accept a connection on the bound socket or named pipe of the listener object and return a "Connection" object. If authentication is attempted and fails, then "AuthenticationError" is raised. close() Close the bound socket or named pipe of the listener object. This is called automatically when the listener is garbage collected. However it is advisable to call it explicitly. Listener objects have the following read-only properties: address The address which is being used by the Listener object. last_accepted The address from which the last accepted connection came. If this is unavailable then it is "None". New in version 3.3: Listener objects now support the context management protocol – see Context Manager Types. "__enter__()" returns the listener object, and "__exit__()" calls "close()". multiprocessing.connection.wait(object_list, timeout=None) Wait till an object in *object_list* is ready. Returns the list of those objects in *object_list* which are ready. If *timeout* is a float then the call blocks for at most that many seconds. If *timeout* is "None" then it will block for an unlimited period. A negative timeout is equivalent to a zero timeout. For both Unix and Windows, an object can appear in *object_list* if it is * a readable "Connection" object; * a connected and readable "socket.socket" object; or * the "sentinel" attribute of a "Process" object. A connection or socket object is ready when there is data available to be read from it, or the other end has been closed. **Unix**: "wait(object_list, timeout)" almost equivalent "select.select(object_list, [], [], timeout)". The difference is that, if "select.select()" is interrupted by a signal, it can raise "OSError" with an error number of "EINTR", whereas "wait()" will not. **Windows**: An item in *object_list* must either be an integer handle which is waitable (according to the definition used by the documentation of the Win32 function "WaitForMultipleObjects()") or it can be an object with a "fileno()" method which returns a socket handle or pipe handle. (Note that pipe handles and socket handles are **not** waitable handles.) New in version 3.3. **Examples** The following server code creates a listener which uses "'secret password'" as an authentication key. It then waits for a connection and sends some data to the client: from multiprocessing.connection import Listener from array import array address = ('localhost', 6000) # family is deduced to be 'AF_INET' with Listener(address, authkey=b'secret password') as listener: with listener.accept() as conn: print('connection accepted from', listener.last_accepted) conn.send([2.25, None, 'junk', float]) conn.send_bytes(b'hello') conn.send_bytes(array('i', [42, 1729])) The following code connects to the server and receives some data from the server: from multiprocessing.connection import Client from array import array address = ('localhost', 6000) with Client(address, authkey=b'secret password') as conn: print(conn.recv()) # => [2.25, None, 'junk', float] print(conn.recv_bytes()) # => 'hello' arr = array('i', [0, 0, 0, 0, 0]) print(conn.recv_bytes_into(arr)) # => 8 print(arr) # => array('i', [42, 1729, 0, 0, 0]) The following code uses "wait()" to wait for messages from multiple processes at once: import time, random from multiprocessing import Process, Pipe, current_process from multiprocessing.connection import wait def foo(w): for i in range(10): w.send((i, current_process().name)) w.close() if __name__ == '__main__': readers = [] for i in range(4): r, w = Pipe(duplex=False) readers.append(r) p = Process(target=foo, args=(w,)) p.start() # We close the writable end of the pipe now to be sure that # p is the only process which owns a handle for it. This # ensures that when p closes its handle for the writable end, # wait() will promptly report the readable end as being ready. w.close() while readers: for r in wait(readers): try: msg = r.recv() except EOFError: readers.remove(r) else: print(msg) Address Formats ~~~~~~~~~~~~~~~ * An "'AF_INET'" address is a tuple of the form "(hostname, port)" where *hostname* is a string and *port* is an integer. * An "'AF_UNIX'" address is a string representing a filename on the filesystem. * An "'AF_PIPE'" address is a string of the form "r'\.\pipe{PipeName}'". To use "Client()" to connect to a named pipe on a remote computer called *ServerName* one should use an address of the form "r'\*ServerName*\pipe{PipeName}'" instead. Note that any string beginning with two backslashes is assumed by default to be an "'AF_PIPE'" address rather than an "'AF_UNIX'" address. Authentication keys ------------------- When one uses "Connection.recv", the data received is automatically unpickled. Unfortunately unpickling data from an untrusted source is a security risk. Therefore "Listener" and "Client()" use the "hmac" module to provide digest authentication. An authentication key is a byte string which can be thought of as a password: once a connection is established both ends will demand proof that the other knows the authentication key. (Demonstrating that both ends are using the same key does **not** involve sending the key over the connection.) If authentication is requested but no authentication key is specified then the return value of "current_process().authkey" is used (see "Process"). This value will be automatically inherited by any "Process" object that the current process creates. This means that (by default) all processes of a multi-process program will share a single authentication key which can be used when setting up connections between themselves. Suitable authentication keys can also be generated by using "os.urandom()". Logging ------- Some support for logging is available. Note, however, that the "logging" package does not use process shared locks so it is possible (depending on the handler type) for messages from different processes to get mixed up. multiprocessing.get_logger() Returns the logger used by "multiprocessing". If necessary, a new one will be created. When first created the logger has level "logging.NOTSET" and no default handler. Messages sent to this logger will not by default propagate to the root logger. Note that on Windows child processes will only inherit the level of the parent process’s logger – any other customization of the logger will not be inherited. multiprocessing.log_to_stderr(level=None) This function performs a call to "get_logger()" but in addition to returning the logger created by get_logger, it adds a handler which sends output to "sys.stderr" using format "'[%(levelname)s/%(processName)s] %(message)s'". You can modify "levelname" of the logger by passing a "level" argument. Below is an example session with logging turned on: >>> import multiprocessing, logging >>> logger = multiprocessing.log_to_stderr() >>> logger.setLevel(logging.INFO) >>> logger.warning('doomed') [WARNING/MainProcess] doomed >>> m = multiprocessing.Manager() [INFO/SyncManager-...] child process calling self.run() [INFO/SyncManager-...] created temp directory /.../pymp-... [INFO/SyncManager-...] manager serving at '/.../listener-...' >>> del m [INFO/MainProcess] sending shutdown message to manager [INFO/SyncManager-...] manager exiting with exitcode 0 For a full table of logging levels, see the "logging" module. The "multiprocessing.dummy" module ---------------------------------- "multiprocessing.dummy" replicates the API of "multiprocessing" but is no more than a wrapper around the "threading" module. In particular, the "Pool" function provided by "multiprocessing.dummy" returns an instance of "ThreadPool", which is a subclass of "Pool" that supports all the same method calls but uses a pool of worker threads rather than worker processes. class multiprocessing.pool.ThreadPool([processes[, initializer[, initargs]]]) A thread pool object which controls a pool of worker threads to which jobs can be submitted. "ThreadPool" instances are fully interface compatible with "Pool" instances, and their resources must also be properly managed, either by using the pool as a context manager or by calling "close()" and "terminate()" manually. *processes* is the number of worker threads to use. If *processes* is "None" then the number returned by "os.cpu_count()" is used. If *initializer* is not "None" then each worker process will call "initializer(*initargs)" when it starts. Unlike "Pool", *maxtasksperchild* and *context* cannot be provided. Note: A "ThreadPool" shares the same interface as "Pool", which is designed around a pool of processes and predates the introduction of the "concurrent.futures" module. As such, it inherits some operations that don’t make sense for a pool backed by threads, and it has its own type for representing the status of asynchronous jobs, "AsyncResult", that is not understood by any other libraries.Users should generally prefer to use "concurrent.futures.ThreadPoolExecutor", which has a simpler interface that was designed around threads from the start, and which returns "concurrent.futures.Future" instances that are compatible with many other libraries, including "asyncio". Programming guidelines ====================== There are certain guidelines and idioms which should be adhered to when using "multiprocessing". All start methods ----------------- The following applies to all start methods. Avoid shared state As far as possible one should try to avoid shifting large amounts of data between processes. It is probably best to stick to using queues or pipes for communication between processes rather than using the lower level synchronization primitives. Picklability Ensure that the arguments to the methods of proxies are picklable. Thread safety of proxies Do not use a proxy object from more than one thread unless you protect it with a lock. (There is never a problem with different processes using the *same* proxy.) Joining zombie processes On Unix when a process finishes but has not been joined it becomes a zombie. There should never be very many because each time a new process starts (or "active_children()" is called) all completed processes which have not yet been joined will be joined. Also calling a finished process’s "Process.is_alive" will join the process. Even so it is probably good practice to explicitly join all the processes that you start. Better to inherit than pickle/unpickle When using the *spawn* or *forkserver* start methods many types from "multiprocessing" need to be picklable so that child processes can use them. However, one should generally avoid sending shared objects to other processes using pipes or queues. Instead you should arrange the program so that a process which needs access to a shared resource created elsewhere can inherit it from an ancestor process. Avoid terminating processes Using the "Process.terminate" method to stop a process is liable to cause any shared resources (such as locks, semaphores, pipes and queues) currently being used by the process to become broken or unavailable to other processes. Therefore it is probably best to only consider using "Process.terminate" on processes which never use any shared resources. Joining processes that use queues Bear in mind that a process that has put items in a queue will wait before terminating until all the buffered items are fed by the “feeder” thread to the underlying pipe. (The child process can call the "Queue.cancel_join_thread" method of the queue to avoid this behaviour.) This means that whenever you use a queue you need to make sure that all items which have been put on the queue will eventually be removed before the process is joined. Otherwise you cannot be sure that processes which have put items on the queue will terminate. Remember also that non-daemonic processes will be joined automatically. An example which will deadlock is the following: from multiprocessing import Process, Queue def f(q): q.put('X' * 1000000) if __name__ == '__main__': queue = Queue() p = Process(target=f, args=(queue,)) p.start() p.join() # this deadlocks obj = queue.get() A fix here would be to swap the last two lines (or simply remove the "p.join()" line). Explicitly pass resources to child processes On Unix using the *fork* start method, a child process can make use of a shared resource created in a parent process using a global resource. However, it is better to pass the object as an argument to the constructor for the child process. Apart from making the code (potentially) compatible with Windows and the other start methods this also ensures that as long as the child process is still alive the object will not be garbage collected in the parent process. This might be important if some resource is freed when the object is garbage collected in the parent process. So for instance from multiprocessing import Process, Lock def f(): ... do something using "lock" ... if __name__ == '__main__': lock = Lock() for i in range(10): Process(target=f).start() should be rewritten as from multiprocessing import Process, Lock def f(l): ... do something using "l" ... if __name__ == '__main__': lock = Lock() for i in range(10): Process(target=f, args=(lock,)).start() Beware of replacing "sys.stdin" with a “file like object” "multiprocessing" originally unconditionally called: os.close(sys.stdin.fileno()) in the "multiprocessing.Process._bootstrap()" method — this resulted in issues with processes-in-processes. This has been changed to: sys.stdin.close() sys.stdin = open(os.open(os.devnull, os.O_RDONLY), closefd=False) Which solves the fundamental issue of processes colliding with each other resulting in a bad file descriptor error, but introduces a potential danger to applications which replace "sys.stdin()" with a “file-like object” with output buffering. This danger is that if multiple processes call "close()" on this file-like object, it could result in the same data being flushed to the object multiple times, resulting in corruption. If you write a file-like object and implement your own caching, you can make it fork-safe by storing the pid whenever you append to the cache, and discarding the cache when the pid changes. For example: @property def cache(self): pid = os.getpid() if pid != self._pid: self._pid = pid self._cache = [] return self._cache For more information, see bpo-5155, bpo-5313 and bpo-5331 The *spawn* and *forkserver* start methods ------------------------------------------ There are a few extra restriction which don’t apply to the *fork* start method. More picklability Ensure that all arguments to "Process.__init__()" are picklable. Also, if you subclass "Process" then make sure that instances will be picklable when the "Process.start" method is called. Global variables Bear in mind that if code run in a child process tries to access a global variable, then the value it sees (if any) may not be the same as the value in the parent process at the time that "Process.start" was called. However, global variables which are just module level constants cause no problems. Safe importing of main module Make sure that the main module can be safely imported by a new Python interpreter without causing unintended side effects (such a starting a new process). For example, using the *spawn* or *forkserver* start method running the following module would fail with a "RuntimeError": from multiprocessing import Process def foo(): print('hello') p = Process(target=foo) p.start() Instead one should protect the “entry point” of the program by using "if __name__ == '__main__':" as follows: from multiprocessing import Process, freeze_support, set_start_method def foo(): print('hello') if __name__ == '__main__': freeze_support() set_start_method('spawn') p = Process(target=foo) p.start() (The "freeze_support()" line can be omitted if the program will be run normally instead of frozen.) This allows the newly spawned Python interpreter to safely import the module and then run the module’s "foo()" function. Similar restrictions apply if a pool or manager is created in the main module. Examples ======== Demonstration of how to create and use customized managers and proxies: from multiprocessing import freeze_support from multiprocessing.managers import BaseManager, BaseProxy import operator ## class Foo: def f(self): print('you called Foo.f()') def g(self): print('you called Foo.g()') def _h(self): print('you called Foo._h()') # A simple generator function def baz(): for i in range(10): yield i*i # Proxy type for generator objects class GeneratorProxy(BaseProxy): _exposed_ = ['__next__'] def __iter__(self): return self def __next__(self): return self._callmethod('__next__') # Function to return the operator module def get_operator_module(): return operator ## class MyManager(BaseManager): pass # register the Foo class; make `f()` and `g()` accessible via proxy MyManager.register('Foo1', Foo) # register the Foo class; make `g()` and `_h()` accessible via proxy MyManager.register('Foo2', Foo, exposed=('g', '_h')) # register the generator function baz; use `GeneratorProxy` to make proxies MyManager.register('baz', baz, proxytype=GeneratorProxy) # register get_operator_module(); make public functions accessible via proxy MyManager.register('operator', get_operator_module) ## def test(): manager = MyManager() manager.start() print('-' * 20) f1 = manager.Foo1() f1.f() f1.g() assert not hasattr(f1, '_h') assert sorted(f1._exposed_) == sorted(['f', 'g']) print('-' * 20) f2 = manager.Foo2() f2.g() f2._h() assert not hasattr(f2, 'f') assert sorted(f2._exposed_) == sorted(['g', '_h']) print('-' * 20) it = manager.baz() for i in it: print('<%d>' % i, end=' ') print() print('-' * 20) op = manager.operator() print('op.add(23, 45) =', op.add(23, 45)) print('op.pow(2, 94) =', op.pow(2, 94)) print('op._exposed_ =', op._exposed_) ## if __name__ == '__main__': freeze_support() test() Using "Pool": import multiprocessing import time import random import sys # # Functions used by test code # def calculate(func, args): result = func(*args) return '%s says that %s%s = %s' % ( multiprocessing.current_process().name, func.__name__, args, result ) def calculatestar(args): return calculate(*args) def mul(a, b): time.sleep(0.5 * random.random()) return a * b def plus(a, b): time.sleep(0.5 * random.random()) return a + b def f(x): return 1.0 / (x - 5.0) def pow3(x): return x ** 3 def noop(x): pass # # Test code # def test(): PROCESSES = 4 print('Creating pool with %d processes\n' % PROCESSES) with multiprocessing.Pool(PROCESSES) as pool: # # Tests # TASKS = [(mul, (i, 7)) for i in range(10)] + \ [(plus, (i, 8)) for i in range(10)] results = [pool.apply_async(calculate, t) for t in TASKS] imap_it = pool.imap(calculatestar, TASKS) imap_unordered_it = pool.imap_unordered(calculatestar, TASKS) print('Ordered results using pool.apply_async():') for r in results: print('\t', r.get()) print() print('Ordered results using pool.imap():') for x in imap_it: print('\t', x) print() print('Unordered results using pool.imap_unordered():') for x in imap_unordered_it: print('\t', x) print() print('Ordered results using pool.map() --- will block till complete:') for x in pool.map(calculatestar, TASKS): print('\t', x) print() # # Test error handling # print('Testing error handling:') try: print(pool.apply(f, (5,))) except ZeroDivisionError: print('\tGot ZeroDivisionError as expected from pool.apply()') else: raise AssertionError('expected ZeroDivisionError') try: print(pool.map(f, list(range(10)))) except ZeroDivisionError: print('\tGot ZeroDivisionError as expected from pool.map()') else: raise AssertionError('expected ZeroDivisionError') try: print(list(pool.imap(f, list(range(10))))) except ZeroDivisionError: print('\tGot ZeroDivisionError as expected from list(pool.imap())') else: raise AssertionError('expected ZeroDivisionError') it = pool.imap(f, list(range(10))) for i in range(10): try: x = next(it) except ZeroDivisionError: if i == 5: pass except StopIteration: break else: if i == 5: raise AssertionError('expected ZeroDivisionError') assert i == 9 print('\tGot ZeroDivisionError as expected from IMapIterator.next()') print() # # Testing timeouts # print('Testing ApplyResult.get() with timeout:', end=' ') res = pool.apply_async(calculate, TASKS[0]) while 1: sys.stdout.flush() try: sys.stdout.write('\n\t%s' % res.get(0.02)) break except multiprocessing.TimeoutError: sys.stdout.write('.') print() print() print('Testing IMapIterator.next() with timeout:', end=' ') it = pool.imap(calculatestar, TASKS) while 1: sys.stdout.flush() try: sys.stdout.write('\n\t%s' % it.next(0.02)) except StopIteration: break except multiprocessing.TimeoutError: sys.stdout.write('.') print() print() if __name__ == '__main__': multiprocessing.freeze_support() test() An example showing how to use queues to feed tasks to a collection of worker processes and collect the results: import time import random from multiprocessing import Process, Queue, current_process, freeze_support # # Function run by worker processes # def worker(input, output): for func, args in iter(input.get, 'STOP'): result = calculate(func, args) output.put(result) # # Function used to calculate result # def calculate(func, args): result = func(*args) return '%s says that %s%s = %s' % \ (current_process().name, func.__name__, args, result) # # Functions referenced by tasks # def mul(a, b): time.sleep(0.5*random.random()) return a * b def plus(a, b): time.sleep(0.5*random.random()) return a + b # # # def test(): NUMBER_OF_PROCESSES = 4 TASKS1 = [(mul, (i, 7)) for i in range(20)] TASKS2 = [(plus, (i, 8)) for i in range(10)] # Create queues task_queue = Queue() done_queue = Queue() # Submit tasks for task in TASKS1: task_queue.put(task) # Start worker processes for i in range(NUMBER_OF_PROCESSES): Process(target=worker, args=(task_queue, done_queue)).start() # Get and print results print('Unordered results:') for i in range(len(TASKS1)): print('\t', done_queue.get()) # Add more tasks using `put()` for task in TASKS2: task_queue.put(task) # Get and print some more results for i in range(len(TASKS2)): print('\t', done_queue.get()) # Tell child processes to stop for i in range(NUMBER_OF_PROCESSES): task_queue.put('STOP') if __name__ == '__main__': freeze_support() test() Internet Data Handling ********************** This chapter describes modules which support handling data formats commonly used on the Internet. * "email" — An email and MIME handling package * "email.message": Representing an email message * "email.parser": Parsing email messages * FeedParser API * Parser API * Additional notes * "email.generator": Generating MIME documents * "email.policy": Policy Objects * "email.errors": Exception and Defect classes * "email.headerregistry": Custom Header Objects * "email.contentmanager": Managing MIME Content * Content Manager Instances * "email": Examples * "email.message.Message": Representing an email message using the "compat32" API * "email.mime": Creating email and MIME objects from scratch * "email.header": Internationalized headers * "email.charset": Representing character sets * "email.encoders": Encoders * "email.utils": Miscellaneous utilities * "email.iterators": Iterators * "json" — JSON encoder and decoder * Basic Usage * Encoders and Decoders * Exceptions * Standard Compliance and Interoperability * Character Encodings * Infinite and NaN Number Values * Repeated Names Within an Object * Top-level Non-Object, Non-Array Values * Implementation Limitations * Command Line Interface * Command line options * "mailbox" — Manipulate mailboxes in various formats * "Mailbox" objects * "Maildir" * "mbox" * "MH" * "Babyl" * "MMDF" * "Message" objects * "MaildirMessage" * "mboxMessage" * "MHMessage" * "BabylMessage" * "MMDFMessage" * Exceptions * Examples * "mimetypes" — Map filenames to MIME types * MimeTypes Objects * "base64" — Base16, Base32, Base64, Base85 Data Encodings * "binhex" — Encode and decode binhex4 files * Notes * "binascii" — Convert between binary and ASCII * "quopri" — Encode and decode MIME quoted-printable data "netrc" — netrc file processing ******************************* **Source code:** Lib/netrc.py ====================================================================== The "netrc" class parses and encapsulates the netrc file format used by the Unix **ftp** program and other FTP clients. class netrc.netrc([file]) A "netrc" instance or subclass instance encapsulates data from a netrc file. The initialization argument, if present, specifies the file to parse. If no argument is given, the file ".netrc" in the user’s home directory – as determined by "os.path.expanduser()" – will be read. Otherwise, a "FileNotFoundError" exception will be raised. Parse errors will raise "NetrcParseError" with diagnostic information including the file name, line number, and terminating token. If no argument is specified on a POSIX system, the presence of passwords in the ".netrc" file will raise a "NetrcParseError" if the file ownership or permissions are insecure (owned by a user other than the user running the process, or accessible for read or write by any other user). This implements security behavior equivalent to that of ftp and other programs that use ".netrc". Changed in version 3.4: Added the POSIX permission check. Changed in version 3.7: "os.path.expanduser()" is used to find the location of the ".netrc" file when *file* is not passed as argument. exception netrc.NetrcParseError Exception raised by the "netrc" class when syntactical errors are encountered in source text. Instances of this exception provide three interesting attributes: "msg" is a textual explanation of the error, "filename" is the name of the source file, and "lineno" gives the line number on which the error was found. netrc Objects ============= A "netrc" instance has the following methods: netrc.authenticators(host) Return a 3-tuple "(login, account, password)" of authenticators for *host*. If the netrc file did not contain an entry for the given host, return the tuple associated with the ‘default’ entry. If neither matching host nor default entry is available, return "None". netrc.__repr__() Dump the class data as a string in the format of a netrc file. (This discards comments and may reorder the entries.) Instances of "netrc" have public instance variables: netrc.hosts Dictionary mapping host names to "(login, account, password)" tuples. The ‘default’ entry, if any, is represented as a pseudo- host by that name. netrc.macros Dictionary mapping macro names to string lists. Note: Passwords are limited to a subset of the ASCII character set. All ASCII punctuation is allowed in passwords, however, note that whitespace and non-printable characters are not allowed in passwords. This is a limitation of the way the .netrc file is parsed and may be removed in the future. "nis" — Interface to Sun’s NIS (Yellow Pages) ********************************************* Deprecated since version 3.11: The "nis" module is deprecated (see **PEP 594** for details). ====================================================================== The "nis" module gives a thin wrapper around the NIS library, useful for central administration of several hosts. Because NIS exists only on Unix systems, this module is only available for Unix. The "nis" module defines the following functions: nis.match(key, mapname, domain=default_domain) Return the match for *key* in map *mapname*, or raise an error ("nis.error") if there is none. Both should be strings, *key* is 8-bit clean. Return value is an arbitrary array of bytes (may contain "NULL" and other joys). Note that *mapname* is first checked if it is an alias to another name. The *domain* argument allows overriding the NIS domain used for the lookup. If unspecified, lookup is in the default NIS domain. nis.cat(mapname, domain=default_domain) Return a dictionary mapping *key* to *value* such that "match(key, mapname)==value". Note that both keys and values of the dictionary are arbitrary arrays of bytes. Note that *mapname* is first checked if it is an alias to another name. The *domain* argument allows overriding the NIS domain used for the lookup. If unspecified, lookup is in the default NIS domain. nis.maps(domain=default_domain) Return a list of all valid maps. The *domain* argument allows overriding the NIS domain used for the lookup. If unspecified, lookup is in the default NIS domain. nis.get_default_domain() Return the system default NIS domain. The "nis" module defines the following exception: exception nis.error An error raised when a NIS function returns an error code. "nntplib" — NNTP protocol client ******************************** **Source code:** Lib/nntplib.py Deprecated since version 3.11: The "nntplib" module is deprecated (see **PEP 594** for details). ====================================================================== This module defines the class "NNTP" which implements the client side of the Network News Transfer Protocol. It can be used to implement a news reader or poster, or automated news processors. It is compatible with **RFC 3977** as well as the older **RFC 977** and **RFC 2980**. Here are two small examples of how it can be used. To list some statistics about a newsgroup and print the subjects of the last 10 articles: >>> s = nntplib.NNTP('news.gmane.io') >>> resp, count, first, last, name = s.group('gmane.comp.python.committers') >>> print('Group', name, 'has', count, 'articles, range', first, 'to', last) Group gmane.comp.python.committers has 1096 articles, range 1 to 1096 >>> resp, overviews = s.over((last - 9, last)) >>> for id, over in overviews: ... print(id, nntplib.decode_header(over['subject'])) ... 1087 Re: Commit privileges for Łukasz Langa 1088 Re: 3.2 alpha 2 freeze 1089 Re: 3.2 alpha 2 freeze 1090 Re: Commit privileges for Łukasz Langa 1091 Re: Commit privileges for Łukasz Langa 1092 Updated ssh key 1093 Re: Updated ssh key 1094 Re: Updated ssh key 1095 Hello fellow committers! 1096 Re: Hello fellow committers! >>> s.quit() '205 Bye!' To post an article from a binary file (this assumes that the article has valid headers, and that you have right to post on the particular newsgroup): >>> s = nntplib.NNTP('news.gmane.io') >>> f = open('article.txt', 'rb') >>> s.post(f) '240 Article posted successfully.' >>> s.quit() '205 Bye!' The module itself defines the following classes: class nntplib.NNTP(host, port=119, user=None, password=None, readermode=None, usenetrc=False[, timeout]) Return a new "NNTP" object, representing a connection to the NNTP server running on host *host*, listening at port *port*. An optional *timeout* can be specified for the socket connection. If the optional *user* and *password* are provided, or if suitable credentials are present in "/.netrc" and the optional flag *usenetrc* is true, the "AUTHINFO USER" and "AUTHINFO PASS" commands are used to identify and authenticate the user to the server. If the optional flag *readermode* is true, then a "mode reader" command is sent before authentication is performed. Reader mode is sometimes necessary if you are connecting to an NNTP server on the local machine and intend to call reader-specific commands, such as "group". If you get unexpected "NNTPPermanentError"s, you might need to set *readermode*. The "NNTP" class supports the "with" statement to unconditionally consume "OSError" exceptions and to close the NNTP connection when done, e.g.: >>> from nntplib import NNTP >>> with NNTP('news.gmane.io') as n: ... n.group('gmane.comp.python.committers') ... ('211 1755 1 1755 gmane.comp.python.committers', 1755, 1, 1755, 'gmane.comp.python.committers') >>> Raises an auditing event "nntplib.connect" with arguments "self", "host", "port". All commands will raise an auditing event "nntplib.putline" with arguments "self" and "line", where "line" is the bytes about to be sent to the remote host. Changed in version 3.2: *usenetrc* is now "False" by default. Changed in version 3.3: Support for the "with" statement was added. Changed in version 3.9: If the *timeout* parameter is set to be zero, it will raise a "ValueError" to prevent the creation of a non-blocking socket. class nntplib.NNTP_SSL(host, port=563, user=None, password=None, ssl_context=None, readermode=None, usenetrc=False[, timeout]) Return a new "NNTP_SSL" object, representing an encrypted connection to the NNTP server running on host *host*, listening at port *port*. "NNTP_SSL" objects have the same methods as "NNTP" objects. If *port* is omitted, port 563 (NNTPS) is used. *ssl_context* is also optional, and is a "SSLContext" object. Please read Security considerations for best practices. All other parameters behave the same as for "NNTP". Note that SSL-on-563 is discouraged per **RFC 4642**, in favor of STARTTLS as described below. However, some servers only support the former. Raises an auditing event "nntplib.connect" with arguments "self", "host", "port". All commands will raise an auditing event "nntplib.putline" with arguments "self" and "line", where "line" is the bytes about to be sent to the remote host. New in version 3.2. Changed in version 3.4: The class now supports hostname check with "ssl.SSLContext.check_hostname" and *Server Name Indication* (see "ssl.HAS_SNI"). Changed in version 3.9: If the *timeout* parameter is set to be zero, it will raise a "ValueError" to prevent the creation of a non-blocking socket. exception nntplib.NNTPError Derived from the standard exception "Exception", this is the base class for all exceptions raised by the "nntplib" module. Instances of this class have the following attribute: response The response of the server if available, as a "str" object. exception nntplib.NNTPReplyError Exception raised when an unexpected reply is received from the server. exception nntplib.NNTPTemporaryError Exception raised when a response code in the range 400–499 is received. exception nntplib.NNTPPermanentError Exception raised when a response code in the range 500–599 is received. exception nntplib.NNTPProtocolError Exception raised when a reply is received from the server that does not begin with a digit in the range 1–5. exception nntplib.NNTPDataError Exception raised when there is some error in the response data. NNTP Objects ============ When connected, "NNTP" and "NNTP_SSL" objects support the following methods and attributes. Attributes ---------- NNTP.nntp_version An integer representing the version of the NNTP protocol supported by the server. In practice, this should be "2" for servers advertising **RFC 3977** compliance and "1" for others. New in version 3.2. NNTP.nntp_implementation A string describing the software name and version of the NNTP server, or "None" if not advertised by the server. New in version 3.2. Methods ------- The *response* that is returned as the first item in the return tuple of almost all methods is the server’s response: a string beginning with a three-digit code. If the server’s response indicates an error, the method raises one of the above exceptions. Many of the following methods take an optional keyword-only argument *file*. When the *file* argument is supplied, it must be either a *file object* opened for binary writing, or the name of an on-disk file to be written to. The method will then write any data returned by the server (except for the response line and the terminating dot) to the file; any list of lines, tuples or objects that the method normally returns will be empty. Changed in version 3.2: Many of the following methods have been reworked and fixed, which makes them incompatible with their 3.1 counterparts. NNTP.quit() Send a "QUIT" command and close the connection. Once this method has been called, no other methods of the NNTP object should be called. NNTP.getwelcome() Return the welcome message sent by the server in reply to the initial connection. (This message sometimes contains disclaimers or help information that may be relevant to the user.) NNTP.getcapabilities() Return the **RFC 3977** capabilities advertised by the server, as a "dict" instance mapping capability names to (possibly empty) lists of values. On legacy servers which don’t understand the "CAPABILITIES" command, an empty dictionary is returned instead. >>> s = NNTP('news.gmane.io') >>> 'POST' in s.getcapabilities() True New in version 3.2. NNTP.login(user=None, password=None, usenetrc=True) Send "AUTHINFO" commands with the user name and password. If *user* and *password* are "None" and *usenetrc* is true, credentials from "~/.netrc" will be used if possible. Unless intentionally delayed, login is normally performed during the "NNTP" object initialization and separately calling this function is unnecessary. To force authentication to be delayed, you must not set *user* or *password* when creating the object, and must set *usenetrc* to False. New in version 3.2. NNTP.starttls(context=None) Send a "STARTTLS" command. This will enable encryption on the NNTP connection. The *context* argument is optional and should be a "ssl.SSLContext" object. Please read Security considerations for best practices. Note that this may not be done after authentication information has been transmitted, and authentication occurs by default if possible during a "NNTP" object initialization. See "NNTP.login()" for information on suppressing this behavior. New in version 3.2. Changed in version 3.4: The method now supports hostname check with "ssl.SSLContext.check_hostname" and *Server Name Indication* (see "ssl.HAS_SNI"). NNTP.newgroups(date, *, file=None) Send a "NEWGROUPS" command. The *date* argument should be a "datetime.date" or "datetime.datetime" object. Return a pair "(response, groups)" where *groups* is a list representing the groups that are new since the given *date*. If *file* is supplied, though, then *groups* will be empty. >>> from datetime import date, timedelta >>> resp, groups = s.newgroups(date.today() - timedelta(days=3)) >>> len(groups) 85 >>> groups[0] GroupInfo(group='gmane.network.tor.devel', last='4', first='1', flag='m') NNTP.newnews(group, date, *, file=None) Send a "NEWNEWS" command. Here, *group* is a group name or "'*'", and *date* has the same meaning as for "newgroups()". Return a pair "(response, articles)" where *articles* is a list of message ids. This command is frequently disabled by NNTP server administrators. NNTP.list(group_pattern=None, *, file=None) Send a "LIST" or "LIST ACTIVE" command. Return a pair "(response, list)" where *list* is a list of tuples representing all the groups available from this NNTP server, optionally matching the pattern string *group_pattern*. Each tuple has the form "(group, last, first, flag)", where *group* is a group name, *last* and *first* are the last and first article numbers, and *flag* usually takes one of these values: * "y": Local postings and articles from peers are allowed. * "m": The group is moderated and all postings must be approved. * "n": No local postings are allowed, only articles from peers. * "j": Articles from peers are filed in the junk group instead. * "x": No local postings, and articles from peers are ignored. * "=foo.bar": Articles are filed in the "foo.bar" group instead. If *flag* has another value, then the status of the newsgroup should be considered unknown. This command can return very large results, especially if *group_pattern* is not specified. It is best to cache the results offline unless you really need to refresh them. Changed in version 3.2: *group_pattern* was added. NNTP.descriptions(grouppattern) Send a "LIST NEWSGROUPS" command, where *grouppattern* is a wildmat string as specified in **RFC 3977** (it’s essentially the same as DOS or UNIX shell wildcard strings). Return a pair "(response, descriptions)", where *descriptions* is a dictionary mapping group names to textual descriptions. >>> resp, descs = s.descriptions('gmane.comp.python.*') >>> len(descs) 295 >>> descs.popitem() ('gmane.comp.python.bio.general', 'BioPython discussion list (Moderated)') NNTP.description(group) Get a description for a single group *group*. If more than one group matches (if ‘group’ is a real wildmat string), return the first match. If no group matches, return an empty string. This elides the response code from the server. If the response code is needed, use "descriptions()". NNTP.group(name) Send a "GROUP" command, where *name* is the group name. The group is selected as the current group, if it exists. Return a tuple "(response, count, first, last, name)" where *count* is the (estimated) number of articles in the group, *first* is the first article number in the group, *last* is the last article number in the group, and *name* is the group name. NNTP.over(message_spec, *, file=None) Send an "OVER" command, or an "XOVER" command on legacy servers. *message_spec* can be either a string representing a message id, or a "(first, last)" tuple of numbers indicating a range of articles in the current group, or a "(first, None)" tuple indicating a range of articles starting from *first* to the last article in the current group, or "None" to select the current article in the current group. Return a pair "(response, overviews)". *overviews* is a list of "(article_number, overview)" tuples, one for each article selected by *message_spec*. Each *overview* is a dictionary with the same number of items, but this number depends on the server. These items are either message headers (the key is then the lower-cased header name) or metadata items (the key is then the metadata name prepended with "":""). The following items are guaranteed to be present by the NNTP specification: * the "subject", "from", "date", "message-id" and "references" headers * the ":bytes" metadata: the number of bytes in the entire raw article (including headers and body) * the ":lines" metadata: the number of lines in the article body The value of each item is either a string, or "None" if not present. It is advisable to use the "decode_header()" function on header values when they may contain non-ASCII characters: >>> _, _, first, last, _ = s.group('gmane.comp.python.devel') >>> resp, overviews = s.over((last, last)) >>> art_num, over = overviews[0] >>> art_num 117216 >>> list(over.keys()) ['xref', 'from', ':lines', ':bytes', 'references', 'date', 'message-id', 'subject'] >>> over['from'] '=?UTF-8?B?Ik1hcnRpbiB2LiBMw7Z3aXMi?= ' >>> nntplib.decode_header(over['from']) '"Martin v. Löwis" ' New in version 3.2. NNTP.help(*, file=None) Send a "HELP" command. Return a pair "(response, list)" where *list* is a list of help strings. NNTP.stat(message_spec=None) Send a "STAT" command, where *message_spec* is either a message id (enclosed in "'<'" and "'>'") or an article number in the current group. If *message_spec* is omitted or "None", the current article in the current group is considered. Return a triple "(response, number, id)" where *number* is the article number and *id* is the message id. >>> _, _, first, last, _ = s.group('gmane.comp.python.devel') >>> resp, number, message_id = s.stat(first) >>> number, message_id (9099, '<20030112190404.GE29873@epoch.metaslash.com>') NNTP.next() Send a "NEXT" command. Return as for "stat()". NNTP.last() Send a "LAST" command. Return as for "stat()". NNTP.article(message_spec=None, *, file=None) Send an "ARTICLE" command, where *message_spec* has the same meaning as for "stat()". Return a tuple "(response, info)" where *info* is a "namedtuple" with three attributes *number*, *message_id* and *lines* (in that order). *number* is the article number in the group (or 0 if the information is not available), *message_id* the message id as a string, and *lines* a list of lines (without terminating newlines) comprising the raw message including headers and body. >>> resp, info = s.article('<20030112190404.GE29873@epoch.metaslash.com>') >>> info.number 0 >>> info.message_id '<20030112190404.GE29873@epoch.metaslash.com>' >>> len(info.lines) 65 >>> info.lines[0] b'Path: main.gmane.org!not-for-mail' >>> info.lines[1] b'From: Neal Norwitz ' >>> info.lines[-3:] [b'There is a patch for 2.3 as well as 2.2.', b'', b'Neal'] NNTP.head(message_spec=None, *, file=None) Same as "article()", but sends a "HEAD" command. The *lines* returned (or written to *file*) will only contain the message headers, not the body. NNTP.body(message_spec=None, *, file=None) Same as "article()", but sends a "BODY" command. The *lines* returned (or written to *file*) will only contain the message body, not the headers. NNTP.post(data) Post an article using the "POST" command. The *data* argument is either a *file object* opened for binary reading, or any iterable of bytes objects (representing raw lines of the article to be posted). It should represent a well-formed news article, including the required headers. The "post()" method automatically escapes lines beginning with "." and appends the termination line. If the method succeeds, the server’s response is returned. If the server refuses posting, a "NNTPReplyError" is raised. NNTP.ihave(message_id, data) Send an "IHAVE" command. *message_id* is the id of the message to send to the server (enclosed in "'<'" and "'>'"). The *data* parameter and the return value are the same as for "post()". NNTP.date() Return a pair "(response, date)". *date* is a "datetime" object containing the current date and time of the server. NNTP.slave() Send a "SLAVE" command. Return the server’s *response*. NNTP.set_debuglevel(level) Set the instance’s debugging level. This controls the amount of debugging output printed. The default, "0", produces no debugging output. A value of "1" produces a moderate amount of debugging output, generally a single line per request or response. A value of "2" or higher produces the maximum amount of debugging output, logging each line sent and received on the connection (including message text). The following are optional NNTP extensions defined in **RFC 2980**. Some of them have been superseded by newer commands in **RFC 3977**. NNTP.xhdr(hdr, str, *, file=None) Send an "XHDR" command. The *hdr* argument is a header keyword, e.g. "'subject'". The *str* argument should have the form "'first- last'" where *first* and *last* are the first and last article numbers to search. Return a pair "(response, list)", where *list* is a list of pairs "(id, text)", where *id* is an article number (as a string) and *text* is the text of the requested header for that article. If the *file* parameter is supplied, then the output of the "XHDR" command is stored in a file. If *file* is a string, then the method will open a file with that name, write to it then close it. If *file* is a *file object*, then it will start calling "write()" on it to store the lines of the command output. If *file* is supplied, then the returned *list* is an empty list. NNTP.xover(start, end, *, file=None) Send an "XOVER" command. *start* and *end* are article numbers delimiting the range of articles to select. The return value is the same of for "over()". It is recommended to use "over()" instead, since it will automatically use the newer "OVER" command if available. Utility functions ================= The module also defines the following utility function: nntplib.decode_header(header_str) Decode a header value, un-escaping any escaped non-ASCII characters. *header_str* must be a "str" object. The unescaped value is returned. Using this function is recommended to display some headers in a human readable form: >>> decode_header("Some subject") 'Some subject' >>> decode_header("=?ISO-8859-15?Q?D=E9buter_en_Python?=") 'Débuter en Python' >>> decode_header("Re: =?UTF-8?B?cHJvYmzDqG1lIGRlIG1hdHJpY2U=?=") 'Re: problème de matrice' "numbers" — Numeric abstract base classes ***************************************** **Source code:** Lib/numbers.py ====================================================================== The "numbers" module (**PEP 3141**) defines a hierarchy of numeric *abstract base classes* which progressively define more operations. None of the types defined in this module are intended to be instantiated. class numbers.Number The root of the numeric hierarchy. If you just want to check if an argument *x* is a number, without caring what kind, use "isinstance(x, Number)". The numeric tower ================= class numbers.Complex Subclasses of this type describe complex numbers and include the operations that work on the built-in "complex" type. These are: conversions to "complex" and "bool", "real", "imag", "+", "-", "*", "/", "**", "abs()", "conjugate()", "==", and "!=". All except "-" and "!=" are abstract. real Abstract. Retrieves the real component of this number. imag Abstract. Retrieves the imaginary component of this number. abstractmethod conjugate() Abstract. Returns the complex conjugate. For example, "(1+3j).conjugate() == (1-3j)". class numbers.Real To "Complex", "Real" adds the operations that work on real numbers. In short, those are: a conversion to "float", "math.trunc()", "round()", "math.floor()", "math.ceil()", "divmod()", "//", "%", "<", "<=", ">", and ">=". Real also provides defaults for "complex()", "real", "imag", and "conjugate()". class numbers.Rational Subtypes "Real" and adds "numerator" and "denominator" properties, which should be in lowest terms. With these, it provides a default for "float()". numerator Abstract. denominator Abstract. class numbers.Integral Subtypes "Rational" and adds a conversion to "int". Provides defaults for "float()", "numerator", and "denominator". Adds abstract methods for "pow()" with modulus and bit-string operations: "<<", ">>", "&", "^", "|", "~". Notes for type implementors =========================== Implementors should be careful to make equal numbers equal and hash them to the same values. This may be subtle if there are two different extensions of the real numbers. For example, "fractions.Fraction" implements "hash()" as follows: def __hash__(self): if self.denominator == 1: # Get integers right. return hash(self.numerator) # Expensive check, but definitely correct. if self == float(self): return hash(float(self)) else: # Use tuple's hash to avoid a high collision rate on # simple fractions. return hash((self.numerator, self.denominator)) Adding More Numeric ABCs ------------------------ There are, of course, more possible ABCs for numbers, and this would be a poor hierarchy if it precluded the possibility of adding those. You can add "MyFoo" between "Complex" and "Real" with: class MyFoo(Complex): ... MyFoo.register(Real) Implementing the arithmetic operations -------------------------------------- We want to implement the arithmetic operations so that mixed-mode operations either call an implementation whose author knew about the types of both arguments, or convert both to the nearest built in type and do the operation there. For subtypes of "Integral", this means that "__add__()" and "__radd__()" should be defined as: class MyIntegral(Integral): def __add__(self, other): if isinstance(other, MyIntegral): return do_my_adding_stuff(self, other) elif isinstance(other, OtherTypeIKnowAbout): return do_my_other_adding_stuff(self, other) else: return NotImplemented def __radd__(self, other): if isinstance(other, MyIntegral): return do_my_adding_stuff(other, self) elif isinstance(other, OtherTypeIKnowAbout): return do_my_other_adding_stuff(other, self) elif isinstance(other, Integral): return int(other) + int(self) elif isinstance(other, Real): return float(other) + float(self) elif isinstance(other, Complex): return complex(other) + complex(self) else: return NotImplemented There are 5 different cases for a mixed-type operation on subclasses of "Complex". I’ll refer to all of the above code that doesn’t refer to "MyIntegral" and "OtherTypeIKnowAbout" as “boilerplate”. "a" will be an instance of "A", which is a subtype of "Complex" ("a : A <: Complex"), and "b : B <: Complex". I’ll consider "a + b": 1. If "A" defines an "__add__()" which accepts "b", all is well. 2. If "A" falls back to the boilerplate code, and it were to return a value from "__add__()", we’d miss the possibility that "B" defines a more intelligent "__radd__()", so the boilerplate should return "NotImplemented" from "__add__()". (Or "A" may not implement "__add__()" at all.) 3. Then "B"’s "__radd__()" gets a chance. If it accepts "a", all is well. 4. If it falls back to the boilerplate, there are no more possible methods to try, so this is where the default implementation should live. 5. If "B <: A", Python tries "B.__radd__" before "A.__add__". This is ok, because it was implemented with knowledge of "A", so it can handle those instances before delegating to "Complex". If "A <: Complex" and "B <: Real" without sharing any other knowledge, then the appropriate shared operation is the one involving the built in "complex", and both "__radd__()" s land there, so "a+b == b+a". Because most of the operations on any given type will be very similar, it can be useful to define a helper function which generates the forward and reverse instances of any given operator. For example, "fractions.Fraction" uses: def _operator_fallbacks(monomorphic_operator, fallback_operator): def forward(a, b): if isinstance(b, (int, Fraction)): return monomorphic_operator(a, b) elif isinstance(b, float): return fallback_operator(float(a), b) elif isinstance(b, complex): return fallback_operator(complex(a), b) else: return NotImplemented forward.__name__ = '__' + fallback_operator.__name__ + '__' forward.__doc__ = monomorphic_operator.__doc__ def reverse(b, a): if isinstance(a, Rational): # Includes ints. return monomorphic_operator(a, b) elif isinstance(a, numbers.Real): return fallback_operator(float(a), float(b)) elif isinstance(a, numbers.Complex): return fallback_operator(complex(a), complex(b)) else: return NotImplemented reverse.__name__ = '__r' + fallback_operator.__name__ + '__' reverse.__doc__ = monomorphic_operator.__doc__ return forward, reverse def _add(a, b): """a + b""" return Fraction(a.numerator * b.denominator + b.numerator * a.denominator, a.denominator * b.denominator) __add__, __radd__ = _operator_fallbacks(_add, operator.add) # ... Numeric and Mathematical Modules ******************************** The modules described in this chapter provide numeric and math-related functions and data types. The "numbers" module defines an abstract hierarchy of numeric types. The "math" and "cmath" modules contain various mathematical functions for floating-point and complex numbers. The "decimal" module supports exact representations of decimal numbers, using arbitrary precision arithmetic. The following modules are documented in this chapter: * "numbers" — Numeric abstract base classes * The numeric tower * Notes for type implementors * Adding More Numeric ABCs * Implementing the arithmetic operations * "math" — Mathematical functions * Number-theoretic and representation functions * Power and logarithmic functions * Trigonometric functions * Angular conversion * Hyperbolic functions * Special functions * Constants * "cmath" — Mathematical functions for complex numbers * Conversions to and from polar coordinates * Power and logarithmic functions * Trigonometric functions * Hyperbolic functions * Classification functions * Constants * "decimal" — Decimal fixed point and floating point arithmetic * Quick-start Tutorial * Decimal objects * Logical operands * Context objects * Constants * Rounding modes * Signals * Floating Point Notes * Mitigating round-off error with increased precision * Special values * Working with threads * Recipes * Decimal FAQ * "fractions" — Rational numbers * "random" — Generate pseudo-random numbers * Bookkeeping functions * Functions for bytes * Functions for integers * Functions for sequences * Real-valued distributions * Alternative Generator * Notes on Reproducibility * Examples * Recipes * "statistics" — Mathematical statistics functions * Averages and measures of central location * Measures of spread * Function details * Exceptions * "NormalDist" objects * "NormalDist" Examples and Recipes "operator" — Standard operators as functions ******************************************** **Source code:** Lib/operator.py ====================================================================== The "operator" module exports a set of efficient functions corresponding to the intrinsic operators of Python. For example, "operator.add(x, y)" is equivalent to the expression "x+y". Many function names are those used for special methods, without the double underscores. For backward compatibility, many of these have a variant with the double underscores kept. The variants without the double underscores are preferred for clarity. The functions fall into categories that perform object comparisons, logical operations, mathematical operations and sequence operations. The object comparison functions are useful for all objects, and are named after the rich comparison operators they support: operator.lt(a, b) operator.le(a, b) operator.eq(a, b) operator.ne(a, b) operator.ge(a, b) operator.gt(a, b) operator.__lt__(a, b) operator.__le__(a, b) operator.__eq__(a, b) operator.__ne__(a, b) operator.__ge__(a, b) operator.__gt__(a, b) Perform “rich comparisons” between *a* and *b*. Specifically, "lt(a, b)" is equivalent to "a < b", "le(a, b)" is equivalent to "a <= b", "eq(a, b)" is equivalent to "a == b", "ne(a, b)" is equivalent to "a != b", "gt(a, b)" is equivalent to "a > b" and "ge(a, b)" is equivalent to "a >= b". Note that these functions can return any value, which may or may not be interpretable as a Boolean value. See Comparisons for more information about rich comparisons. The logical operations are also generally applicable to all objects, and support truth tests, identity tests, and boolean operations: operator.not_(obj) operator.__not__(obj) Return the outcome of "not" *obj*. (Note that there is no "__not__()" method for object instances; only the interpreter core defines this operation. The result is affected by the "__bool__()" and "__len__()" methods.) operator.truth(obj) Return "True" if *obj* is true, and "False" otherwise. This is equivalent to using the "bool" constructor. operator.is_(a, b) Return "a is b". Tests object identity. operator.is_not(a, b) Return "a is not b". Tests object identity. The mathematical and bitwise operations are the most numerous: operator.abs(obj) operator.__abs__(obj) Return the absolute value of *obj*. operator.add(a, b) operator.__add__(a, b) Return "a + b", for *a* and *b* numbers. operator.and_(a, b) operator.__and__(a, b) Return the bitwise and of *a* and *b*. operator.floordiv(a, b) operator.__floordiv__(a, b) Return "a // b". operator.index(a) operator.__index__(a) Return *a* converted to an integer. Equivalent to "a.__index__()". operator.inv(obj) operator.invert(obj) operator.__inv__(obj) operator.__invert__(obj) Return the bitwise inverse of the number *obj*. This is equivalent to "~obj". operator.lshift(a, b) operator.__lshift__(a, b) Return *a* shifted left by *b*. operator.mod(a, b) operator.__mod__(a, b) Return "a % b". operator.mul(a, b) operator.__mul__(a, b) Return "a * b", for *a* and *b* numbers. operator.matmul(a, b) operator.__matmul__(a, b) Return "a @ b". New in version 3.5. operator.neg(obj) operator.__neg__(obj) Return *obj* negated ("-obj"). operator.or_(a, b) operator.__or__(a, b) Return the bitwise or of *a* and *b*. operator.pos(obj) operator.__pos__(obj) Return *obj* positive ("+obj"). operator.pow(a, b) operator.__pow__(a, b) Return "a ** b", for *a* and *b* numbers. operator.rshift(a, b) operator.__rshift__(a, b) Return *a* shifted right by *b*. operator.sub(a, b) operator.__sub__(a, b) Return "a - b". operator.truediv(a, b) operator.__truediv__(a, b) Return "a / b" where 2/3 is .66 rather than 0. This is also known as “true” division. operator.xor(a, b) operator.__xor__(a, b) Return the bitwise exclusive or of *a* and *b*. Operations which work with sequences (some of them with mappings too) include: operator.concat(a, b) operator.__concat__(a, b) Return "a + b" for *a* and *b* sequences. operator.contains(a, b) operator.__contains__(a, b) Return the outcome of the test "b in a". Note the reversed operands. operator.countOf(a, b) Return the number of occurrences of *b* in *a*. operator.delitem(a, b) operator.__delitem__(a, b) Remove the value of *a* at index *b*. operator.getitem(a, b) operator.__getitem__(a, b) Return the value of *a* at index *b*. operator.indexOf(a, b) Return the index of the first of occurrence of *b* in *a*. operator.setitem(a, b, c) operator.__setitem__(a, b, c) Set the value of *a* at index *b* to *c*. operator.length_hint(obj, default=0) Return an estimated length for the object *o*. First try to return its actual length, then an estimate using "object.__length_hint__()", and finally return the default value. New in version 3.4. The "operator" module also defines tools for generalized attribute and item lookups. These are useful for making fast field extractors as arguments for "map()", "sorted()", "itertools.groupby()", or other functions that expect a function argument. operator.attrgetter(attr) operator.attrgetter(*attrs) Return a callable object that fetches *attr* from its operand. If more than one attribute is requested, returns a tuple of attributes. The attribute names can also contain dots. For example: * After "f = attrgetter('name')", the call "f(b)" returns "b.name". * After "f = attrgetter('name', 'date')", the call "f(b)" returns "(b.name, b.date)". * After "f = attrgetter('name.first', 'name.last')", the call "f(b)" returns "(b.name.first, b.name.last)". Equivalent to: def attrgetter(*items): if any(not isinstance(item, str) for item in items): raise TypeError('attribute name must be a string') if len(items) == 1: attr = items[0] def g(obj): return resolve_attr(obj, attr) else: def g(obj): return tuple(resolve_attr(obj, attr) for attr in items) return g def resolve_attr(obj, attr): for name in attr.split("."): obj = getattr(obj, name) return obj operator.itemgetter(item) operator.itemgetter(*items) Return a callable object that fetches *item* from its operand using the operand’s "__getitem__()" method. If multiple items are specified, returns a tuple of lookup values. For example: * After "f = itemgetter(2)", the call "f(r)" returns "r[2]". * After "g = itemgetter(2, 5, 3)", the call "g(r)" returns "(r[2], r[5], r[3])". Equivalent to: def itemgetter(*items): if len(items) == 1: item = items[0] def g(obj): return obj[item] else: def g(obj): return tuple(obj[item] for item in items) return g The items can be any type accepted by the operand’s "__getitem__()" method. Dictionaries accept any hashable value. Lists, tuples, and strings accept an index or a slice: >>> itemgetter(1)('ABCDEFG') 'B' >>> itemgetter(1, 3, 5)('ABCDEFG') ('B', 'D', 'F') >>> itemgetter(slice(2, None))('ABCDEFG') 'CDEFG' >>> soldier = dict(rank='captain', name='dotterbart') >>> itemgetter('rank')(soldier) 'captain' Example of using "itemgetter()" to retrieve specific fields from a tuple record: >>> inventory = [('apple', 3), ('banana', 2), ('pear', 5), ('orange', 1)] >>> getcount = itemgetter(1) >>> list(map(getcount, inventory)) [3, 2, 5, 1] >>> sorted(inventory, key=getcount) [('orange', 1), ('banana', 2), ('apple', 3), ('pear', 5)] operator.methodcaller(name, /, *args, **kwargs) Return a callable object that calls the method *name* on its operand. If additional arguments and/or keyword arguments are given, they will be given to the method as well. For example: * After "f = methodcaller('name')", the call "f(b)" returns "b.name()". * After "f = methodcaller('name', 'foo', bar=1)", the call "f(b)" returns "b.name('foo', bar=1)". Equivalent to: def methodcaller(name, /, *args, **kwargs): def caller(obj): return getattr(obj, name)(*args, **kwargs) return caller Mapping Operators to Functions ============================== This table shows how abstract operations correspond to operator symbols in the Python syntax and the functions in the "operator" module. +-------------------------+---------------------------+-----------------------------------------+ | Operation | Syntax | Function | |=========================|===========================|=========================================| | Addition | "a + b" | "add(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Concatenation | "seq1 + seq2" | "concat(seq1, seq2)" | +-------------------------+---------------------------+-----------------------------------------+ | Containment Test | "obj in seq" | "contains(seq, obj)" | +-------------------------+---------------------------+-----------------------------------------+ | Division | "a / b" | "truediv(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Division | "a // b" | "floordiv(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Bitwise And | "a & b" | "and_(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Bitwise Exclusive Or | "a ^ b" | "xor(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Bitwise Inversion | "~ a" | "invert(a)" | +-------------------------+---------------------------+-----------------------------------------+ | Bitwise Or | "a | b" | "or_(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Exponentiation | "a ** b" | "pow(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Identity | "a is b" | "is_(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Identity | "a is not b" | "is_not(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Indexed Assignment | "obj[k] = v" | "setitem(obj, k, v)" | +-------------------------+---------------------------+-----------------------------------------+ | Indexed Deletion | "del obj[k]" | "delitem(obj, k)" | +-------------------------+---------------------------+-----------------------------------------+ | Indexing | "obj[k]" | "getitem(obj, k)" | +-------------------------+---------------------------+-----------------------------------------+ | Left Shift | "a << b" | "lshift(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Modulo | "a % b" | "mod(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Multiplication | "a * b" | "mul(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Matrix Multiplication | "a @ b" | "matmul(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Negation (Arithmetic) | "- a" | "neg(a)" | +-------------------------+---------------------------+-----------------------------------------+ | Negation (Logical) | "not a" | "not_(a)" | +-------------------------+---------------------------+-----------------------------------------+ | Positive | "+ a" | "pos(a)" | +-------------------------+---------------------------+-----------------------------------------+ | Right Shift | "a >> b" | "rshift(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Slice Assignment | "seq[i:j] = values" | "setitem(seq, slice(i, j), values)" | +-------------------------+---------------------------+-----------------------------------------+ | Slice Deletion | "del seq[i:j]" | "delitem(seq, slice(i, j))" | +-------------------------+---------------------------+-----------------------------------------+ | Slicing | "seq[i:j]" | "getitem(seq, slice(i, j))" | +-------------------------+---------------------------+-----------------------------------------+ | String Formatting | "s % obj" | "mod(s, obj)" | +-------------------------+---------------------------+-----------------------------------------+ | Subtraction | "a - b" | "sub(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Truth Test | "obj" | "truth(obj)" | +-------------------------+---------------------------+-----------------------------------------+ | Ordering | "a < b" | "lt(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Ordering | "a <= b" | "le(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Equality | "a == b" | "eq(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Difference | "a != b" | "ne(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Ordering | "a >= b" | "ge(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ | Ordering | "a > b" | "gt(a, b)" | +-------------------------+---------------------------+-----------------------------------------+ In-place Operators ================== Many operations have an “in-place” version. Listed below are functions providing a more primitive access to in-place operators than the usual syntax does; for example, the *statement* "x += y" is equivalent to "x = operator.iadd(x, y)". Another way to put it is to say that "z = operator.iadd(x, y)" is equivalent to the compound statement "z = x; z += y". In those examples, note that when an in-place method is called, the computation and assignment are performed in two separate steps. The in-place functions listed below only do the first step, calling the in-place method. The second step, assignment, is not handled. For immutable targets such as strings, numbers, and tuples, the updated value is computed, but not assigned back to the input variable: >>> a = 'hello' >>> iadd(a, ' world') 'hello world' >>> a 'hello' For mutable targets such as lists and dictionaries, the in-place method will perform the update, so no subsequent assignment is necessary: >>> s = ['h', 'e', 'l', 'l', 'o'] >>> iadd(s, [' ', 'w', 'o', 'r', 'l', 'd']) ['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'] >>> s ['h', 'e', 'l', 'l', 'o', ' ', 'w', 'o', 'r', 'l', 'd'] operator.iadd(a, b) operator.__iadd__(a, b) "a = iadd(a, b)" is equivalent to "a += b". operator.iand(a, b) operator.__iand__(a, b) "a = iand(a, b)" is equivalent to "a &= b". operator.iconcat(a, b) operator.__iconcat__(a, b) "a = iconcat(a, b)" is equivalent to "a += b" for *a* and *b* sequences. operator.ifloordiv(a, b) operator.__ifloordiv__(a, b) "a = ifloordiv(a, b)" is equivalent to "a //= b". operator.ilshift(a, b) operator.__ilshift__(a, b) "a = ilshift(a, b)" is equivalent to "a <<= b". operator.imod(a, b) operator.__imod__(a, b) "a = imod(a, b)" is equivalent to "a %= b". operator.imul(a, b) operator.__imul__(a, b) "a = imul(a, b)" is equivalent to "a *= b". operator.imatmul(a, b) operator.__imatmul__(a, b) "a = imatmul(a, b)" is equivalent to "a @= b". New in version 3.5. operator.ior(a, b) operator.__ior__(a, b) "a = ior(a, b)" is equivalent to "a |= b". operator.ipow(a, b) operator.__ipow__(a, b) "a = ipow(a, b)" is equivalent to "a **= b". operator.irshift(a, b) operator.__irshift__(a, b) "a = irshift(a, b)" is equivalent to "a >>= b". operator.isub(a, b) operator.__isub__(a, b) "a = isub(a, b)" is equivalent to "a -= b". operator.itruediv(a, b) operator.__itruediv__(a, b) "a = itruediv(a, b)" is equivalent to "a /= b". operator.ixor(a, b) operator.__ixor__(a, b) "a = ixor(a, b)" is equivalent to "a ^= b". "optparse" — Parser for command line options ******************************************** **Source code:** Lib/optparse.py Deprecated since version 3.2: The "optparse" module is deprecated and will not be developed further; development will continue with the "argparse" module. ====================================================================== "optparse" is a more convenient, flexible, and powerful library for parsing command-line options than the old "getopt" module. "optparse" uses a more declarative style of command-line parsing: you create an instance of "OptionParser", populate it with options, and parse the command line. "optparse" allows users to specify options in the conventional GNU/POSIX syntax, and additionally generates usage and help messages for you. Here’s an example of using "optparse" in a simple script: from optparse import OptionParser ... parser = OptionParser() parser.add_option("-f", "--file", dest="filename", help="write report to FILE", metavar="FILE") parser.add_option("-q", "--quiet", action="store_false", dest="verbose", default=True, help="don't print status messages to stdout") (options, args) = parser.parse_args() With these few lines of code, users of your script can now do the “usual thing” on the command-line, for example: --file=outfile -q As it parses the command line, "optparse" sets attributes of the "options" object returned by "parse_args()" based on user-supplied command-line values. When "parse_args()" returns from parsing this command line, "options.filename" will be ""outfile"" and "options.verbose" will be "False". "optparse" supports both long and short options, allows short options to be merged together, and allows options to be associated with their arguments in a variety of ways. Thus, the following command lines are all equivalent to the above example: -f outfile --quiet --quiet --file outfile -q -foutfile -qfoutfile Additionally, users can run one of the following -h --help and "optparse" will print out a brief summary of your script’s options: Usage: [options] Options: -h, --help show this help message and exit -f FILE, --file=FILE write report to FILE -q, --quiet don't print status messages to stdout where the value of *yourscript* is determined at runtime (normally from "sys.argv[0]"). Background ========== "optparse" was explicitly designed to encourage the creation of programs with straightforward, conventional command-line interfaces. To that end, it supports only the most common command-line syntax and semantics conventionally used under Unix. If you are unfamiliar with these conventions, read this section to acquaint yourself with them. Terminology ----------- argument a string entered on the command-line, and passed by the shell to "execl()" or "execv()". In Python, arguments are elements of "sys.argv[1:]" ("sys.argv[0]" is the name of the program being executed). Unix shells also use the term “word”. It is occasionally desirable to substitute an argument list other than "sys.argv[1:]", so you should read “argument” as “an element of "sys.argv[1:]", or of some other list provided as a substitute for "sys.argv[1:]"”. option an argument used to supply extra information to guide or customize the execution of a program. There are many different syntaxes for options; the traditional Unix syntax is a hyphen (“-”) followed by a single letter, e.g. "-x" or "-F". Also, traditional Unix syntax allows multiple options to be merged into a single argument, e.g. "-x -F" is equivalent to "-xF". The GNU project introduced "--" followed by a series of hyphen-separated words, e.g. "--file" or " --dry-run". These are the only two option syntaxes provided by "optparse". Some other option syntaxes that the world has seen include: * a hyphen followed by a few letters, e.g. "-pf" (this is *not* the same as multiple options merged into a single argument) * a hyphen followed by a whole word, e.g. "-file" (this is technically equivalent to the previous syntax, but they aren’t usually seen in the same program) * a plus sign followed by a single letter, or a few letters, or a word, e.g. "+f", "+rgb" * a slash followed by a letter, or a few letters, or a word, e.g. "/f", "/file" These option syntaxes are not supported by "optparse", and they never will be. This is deliberate: the first three are non- standard on any environment, and the last only makes sense if you’re exclusively targeting Windows or certain legacy platforms (e.g. VMS, MS-DOS). option argument an argument that follows an option, is closely associated with that option, and is consumed from the argument list when that option is. With "optparse", option arguments may either be in a separate argument from their option: -f foo --file foo or included in the same argument: -ffoo --file=foo Typically, a given option either takes an argument or it doesn’t. Lots of people want an “optional option arguments” feature, meaning that some options will take an argument if they see it, and won’t if they don’t. This is somewhat controversial, because it makes parsing ambiguous: if "-a" takes an optional argument and "-b" is another option entirely, how do we interpret "-ab"? Because of this ambiguity, "optparse" does not support this feature. positional argument something leftover in the argument list after options have been parsed, i.e. after options and their arguments have been parsed and removed from the argument list. required option an option that must be supplied on the command-line; note that the phrase “required option” is self-contradictory in English. "optparse" doesn’t prevent you from implementing required options, but doesn’t give you much help at it either. For example, consider this hypothetical command-line: prog -v --report report.txt foo bar "-v" and "--report" are both options. Assuming that "--report" takes one argument, "report.txt" is an option argument. "foo" and "bar" are positional arguments. What are options for? --------------------- Options are used to provide extra information to tune or customize the execution of a program. In case it wasn’t clear, options are usually *optional*. A program should be able to run just fine with no options whatsoever. (Pick a random program from the Unix or GNU toolsets. Can it run without any options at all and still make sense? The main exceptions are "find", "tar", and "dd"—all of which are mutant oddballs that have been rightly criticized for their non-standard syntax and confusing interfaces.) Lots of people want their programs to have “required options”. Think about it. If it’s required, then it’s *not optional*! If there is a piece of information that your program absolutely requires in order to run successfully, that’s what positional arguments are for. As an example of good command-line interface design, consider the humble "cp" utility, for copying files. It doesn’t make much sense to try to copy files without supplying a destination and at least one source. Hence, "cp" fails if you run it with no arguments. However, it has a flexible, useful syntax that does not require any options at all: cp SOURCE DEST cp SOURCE ... DEST-DIR You can get pretty far with just that. Most "cp" implementations provide a bunch of options to tweak exactly how the files are copied: you can preserve mode and modification time, avoid following symlinks, ask before clobbering existing files, etc. But none of this distracts from the core mission of "cp", which is to copy either one file to another, or several files to another directory. What are positional arguments for? ---------------------------------- Positional arguments are for those pieces of information that your program absolutely, positively requires to run. A good user interface should have as few absolute requirements as possible. If your program requires 17 distinct pieces of information in order to run successfully, it doesn’t much matter *how* you get that information from the user—most people will give up and walk away before they successfully run the program. This applies whether the user interface is a command-line, a configuration file, or a GUI: if you make that many demands on your users, most of them will simply give up. In short, try to minimize the amount of information that users are absolutely required to supply—use sensible defaults whenever possible. Of course, you also want to make your programs reasonably flexible. That’s what options are for. Again, it doesn’t matter if they are entries in a config file, widgets in the “Preferences” dialog of a GUI, or command-line options—the more options you implement, the more flexible your program is, and the more complicated its implementation becomes. Too much flexibility has drawbacks as well, of course; too many options can overwhelm users and make your code much harder to maintain. Tutorial ======== While "optparse" is quite flexible and powerful, it’s also straightforward to use in most cases. This section covers the code patterns that are common to any "optparse"-based program. First, you need to import the OptionParser class; then, early in the main program, create an OptionParser instance: from optparse import OptionParser ... parser = OptionParser() Then you can start defining options. The basic syntax is: parser.add_option(opt_str, ..., attr=value, ...) Each option has one or more option strings, such as "-f" or "--file", and several option attributes that tell "optparse" what to expect and what to do when it encounters that option on the command line. Typically, each option will have one short option string and one long option string, e.g.: parser.add_option("-f", "--file", ...) You’re free to define as many short option strings and as many long option strings as you like (including zero), as long as there is at least one option string overall. The option strings passed to "OptionParser.add_option()" are effectively labels for the option defined by that call. For brevity, we will frequently refer to *encountering an option* on the command line; in reality, "optparse" encounters *option strings* and looks up options from them. Once all of your options are defined, instruct "optparse" to parse your program’s command line: (options, args) = parser.parse_args() (If you like, you can pass a custom argument list to "parse_args()", but that’s rarely necessary: by default it uses "sys.argv[1:]".) "parse_args()" returns two values: * "options", an object containing values for all of your options—e.g. if "--file" takes a single string argument, then "options.file" will be the filename supplied by the user, or "None" if the user did not supply that option * "args", the list of positional arguments leftover after parsing options This tutorial section only covers the four most important option attributes: "action", "type", "dest" (destination), and "help". Of these, "action" is the most fundamental. Understanding option actions ---------------------------- Actions tell "optparse" what to do when it encounters an option on the command line. There is a fixed set of actions hard-coded into "optparse"; adding new actions is an advanced topic covered in section Extending optparse. Most actions tell "optparse" to store a value in some variable—for example, take a string from the command line and store it in an attribute of "options". If you don’t specify an option action, "optparse" defaults to "store". The store action ---------------- The most common option action is "store", which tells "optparse" to take the next argument (or the remainder of the current argument), ensure that it is of the correct type, and store it to your chosen destination. For example: parser.add_option("-f", "--file", action="store", type="string", dest="filename") Now let’s make up a fake command line and ask "optparse" to parse it: args = ["-f", "foo.txt"] (options, args) = parser.parse_args(args) When "optparse" sees the option string "-f", it consumes the next argument, "foo.txt", and stores it in "options.filename". So, after this call to "parse_args()", "options.filename" is ""foo.txt"". Some other option types supported by "optparse" are "int" and "float". Here’s an option that expects an integer argument: parser.add_option("-n", type="int", dest="num") Note that this option has no long option string, which is perfectly acceptable. Also, there’s no explicit action, since the default is "store". Let’s parse another fake command-line. This time, we’ll jam the option argument right up against the option: since "-n42" (one argument) is equivalent to "-n 42" (two arguments), the code (options, args) = parser.parse_args(["-n42"]) print(options.num) will print "42". If you don’t specify a type, "optparse" assumes "string". Combined with the fact that the default action is "store", that means our first example can be a lot shorter: parser.add_option("-f", "--file", dest="filename") If you don’t supply a destination, "optparse" figures out a sensible default from the option strings: if the first long option string is " --foo-bar", then the default destination is "foo_bar". If there are no long option strings, "optparse" looks at the first short option string: the default destination for "-f" is "f". "optparse" also includes the built-in "complex" type. Adding types is covered in section Extending optparse. Handling boolean (flag) options ------------------------------- Flag options—set a variable to true or false when a particular option is seen—are quite common. "optparse" supports them with two separate actions, "store_true" and "store_false". For example, you might have a "verbose" flag that is turned on with "-v" and off with "-q": parser.add_option("-v", action="store_true", dest="verbose") parser.add_option("-q", action="store_false", dest="verbose") Here we have two different options with the same destination, which is perfectly OK. (It just means you have to be a bit careful when setting default values—see below.) When "optparse" encounters "-v" on the command line, it sets "options.verbose" to "True"; when it encounters "-q", "options.verbose" is set to "False". Other actions ------------- Some other actions supported by "optparse" are: ""store_const"" store a constant value ""append"" append this option’s argument to a list ""count"" increment a counter by one ""callback"" call a specified function These are covered in section Reference Guide, and section Option Callbacks. Default values -------------- All of the above examples involve setting some variable (the “destination”) when certain command-line options are seen. What happens if those options are never seen? Since we didn’t supply any defaults, they are all set to "None". This is usually fine, but sometimes you want more control. "optparse" lets you supply a default value for each destination, which is assigned before the command line is parsed. First, consider the verbose/quiet example. If we want "optparse" to set "verbose" to "True" unless "-q" is seen, then we can do this: parser.add_option("-v", action="store_true", dest="verbose", default=True) parser.add_option("-q", action="store_false", dest="verbose") Since default values apply to the *destination* rather than to any particular option, and these two options happen to have the same destination, this is exactly equivalent: parser.add_option("-v", action="store_true", dest="verbose") parser.add_option("-q", action="store_false", dest="verbose", default=True) Consider this: parser.add_option("-v", action="store_true", dest="verbose", default=False) parser.add_option("-q", action="store_false", dest="verbose", default=True) Again, the default value for "verbose" will be "True": the last default value supplied for any particular destination is the one that counts. A clearer way to specify default values is the "set_defaults()" method of OptionParser, which you can call at any time before calling "parse_args()": parser.set_defaults(verbose=True) parser.add_option(...) (options, args) = parser.parse_args() As before, the last value specified for a given option destination is the one that counts. For clarity, try to use one method or the other of setting default values, not both. Generating help --------------- "optparse"’s ability to generate help and usage text automatically is useful for creating user-friendly command-line interfaces. All you have to do is supply a "help" value for each option, and optionally a short usage message for your whole program. Here’s an OptionParser populated with user-friendly (documented) options: usage = "usage: %prog [options] arg1 arg2" parser = OptionParser(usage=usage) parser.add_option("-v", "--verbose", action="store_true", dest="verbose", default=True, help="make lots of noise [default]") parser.add_option("-q", "--quiet", action="store_false", dest="verbose", help="be vewwy quiet (I'm hunting wabbits)") parser.add_option("-f", "--filename", metavar="FILE", help="write output to FILE") parser.add_option("-m", "--mode", default="intermediate", help="interaction mode: novice, intermediate, " "or expert [default: %default]") If "optparse" encounters either "-h" or "--help" on the command-line, or if you just call "parser.print_help()", it prints the following to standard output: Usage: [options] arg1 arg2 Options: -h, --help show this help message and exit -v, --verbose make lots of noise [default] -q, --quiet be vewwy quiet (I'm hunting wabbits) -f FILE, --filename=FILE write output to FILE -m MODE, --mode=MODE interaction mode: novice, intermediate, or expert [default: intermediate] (If the help output is triggered by a help option, "optparse" exits after printing the help text.) There’s a lot going on here to help "optparse" generate the best possible help message: * the script defines its own usage message: usage = "usage: %prog [options] arg1 arg2" "optparse" expands "%prog" in the usage string to the name of the current program, i.e. "os.path.basename(sys.argv[0])". The expanded string is then printed before the detailed option help. If you don’t supply a usage string, "optparse" uses a bland but sensible default: ""Usage: %prog [options]"", which is fine if your script doesn’t take any positional arguments. * every option defines a help string, and doesn’t worry about line- wrapping—"optparse" takes care of wrapping lines and making the help output look good. * options that take a value indicate this fact in their automatically- generated help message, e.g. for the “mode” option: -m MODE, --mode=MODE Here, “MODE” is called the meta-variable: it stands for the argument that the user is expected to supply to "-m"/"--mode". By default, "optparse" converts the destination variable name to uppercase and uses that for the meta-variable. Sometimes, that’s not what you want—for example, the "--filename" option explicitly sets "metavar="FILE"", resulting in this automatically-generated option description: -f FILE, --filename=FILE This is important for more than just saving space, though: the manually written help text uses the meta-variable "FILE" to clue the user in that there’s a connection between the semi-formal syntax "-f FILE" and the informal semantic description “write output to FILE”. This is a simple but effective way to make your help text a lot clearer and more useful for end users. * options that have a default value can include "%default" in the help string—"optparse" will replace it with "str()" of the option’s default value. If an option has no default value (or the default value is "None"), "%default" expands to "none". Grouping Options ~~~~~~~~~~~~~~~~ When dealing with many options, it is convenient to group these options for better help output. An "OptionParser" can contain several option groups, each of which can contain several options. An option group is obtained using the class "OptionGroup": class optparse.OptionGroup(parser, title, description=None) where * parser is the "OptionParser" instance the group will be inserted in to * title is the group title * description, optional, is a long description of the group "OptionGroup" inherits from "OptionContainer" (like "OptionParser") and so the "add_option()" method can be used to add an option to the group. Once all the options are declared, using the "OptionParser" method "add_option_group()" the group is added to the previously defined parser. Continuing with the parser defined in the previous section, adding an "OptionGroup" to a parser is easy: group = OptionGroup(parser, "Dangerous Options", "Caution: use these options at your own risk. " "It is believed that some of them bite.") group.add_option("-g", action="store_true", help="Group option.") parser.add_option_group(group) This would result in the following help output: Usage: [options] arg1 arg2 Options: -h, --help show this help message and exit -v, --verbose make lots of noise [default] -q, --quiet be vewwy quiet (I'm hunting wabbits) -f FILE, --filename=FILE write output to FILE -m MODE, --mode=MODE interaction mode: novice, intermediate, or expert [default: intermediate] Dangerous Options: Caution: use these options at your own risk. It is believed that some of them bite. -g Group option. A bit more complete example might involve using more than one group: still extending the previous example: group = OptionGroup(parser, "Dangerous Options", "Caution: use these options at your own risk. " "It is believed that some of them bite.") group.add_option("-g", action="store_true", help="Group option.") parser.add_option_group(group) group = OptionGroup(parser, "Debug Options") group.add_option("-d", "--debug", action="store_true", help="Print debug information") group.add_option("-s", "--sql", action="store_true", help="Print all SQL statements executed") group.add_option("-e", action="store_true", help="Print every action done") parser.add_option_group(group) that results in the following output: Usage: [options] arg1 arg2 Options: -h, --help show this help message and exit -v, --verbose make lots of noise [default] -q, --quiet be vewwy quiet (I'm hunting wabbits) -f FILE, --filename=FILE write output to FILE -m MODE, --mode=MODE interaction mode: novice, intermediate, or expert [default: intermediate] Dangerous Options: Caution: use these options at your own risk. It is believed that some of them bite. -g Group option. Debug Options: -d, --debug Print debug information -s, --sql Print all SQL statements executed -e Print every action done Another interesting method, in particular when working programmatically with option groups is: OptionParser.get_option_group(opt_str) Return the "OptionGroup" to which the short or long option string *opt_str* (e.g. "'-o'" or "'--option'") belongs. If there’s no such "OptionGroup", return "None". Printing a version string ------------------------- Similar to the brief usage string, "optparse" can also print a version string for your program. You have to supply the string as the "version" argument to OptionParser: parser = OptionParser(usage="%prog [-f] [-q]", version="%prog 1.0") "%prog" is expanded just like it is in "usage". Apart from that, "version" can contain anything you like. When you supply it, "optparse" automatically adds a "--version" option to your parser. If it encounters this option on the command line, it expands your "version" string (by replacing "%prog"), prints it to stdout, and exits. For example, if your script is called "/usr/bin/foo": $ /usr/bin/foo --version foo 1.0 The following two methods can be used to print and get the "version" string: OptionParser.print_version(file=None) Print the version message for the current program ("self.version") to *file* (default stdout). As with "print_usage()", any occurrence of "%prog" in "self.version" is replaced with the name of the current program. Does nothing if "self.version" is empty or undefined. OptionParser.get_version() Same as "print_version()" but returns the version string instead of printing it. How "optparse" handles errors ----------------------------- There are two broad classes of errors that "optparse" has to worry about: programmer errors and user errors. Programmer errors are usually erroneous calls to "OptionParser.add_option()", e.g. invalid option strings, unknown option attributes, missing option attributes, etc. These are dealt with in the usual way: raise an exception (either "optparse.OptionError" or "TypeError") and let the program crash. Handling user errors is much more important, since they are guaranteed to happen no matter how stable your code is. "optparse" can automatically detect some user errors, such as bad option arguments (passing "-n 4x" where "-n" takes an integer argument), missing arguments ("-n" at the end of the command line, where "-n" takes an argument of any type). Also, you can call "OptionParser.error()" to signal an application-defined error condition: (options, args) = parser.parse_args() ... if options.a and options.b: parser.error("options -a and -b are mutually exclusive") In either case, "optparse" handles the error the same way: it prints the program’s usage message and an error message to standard error and exits with error status 2. Consider the first example above, where the user passes "4x" to an option that takes an integer: $ /usr/bin/foo -n 4x Usage: foo [options] foo: error: option -n: invalid integer value: '4x' Or, where the user fails to pass a value at all: $ /usr/bin/foo -n Usage: foo [options] foo: error: -n option requires an argument "optparse"-generated error messages take care always to mention the option involved in the error; be sure to do the same when calling "OptionParser.error()" from your application code. If "optparse"’s default error-handling behaviour does not suit your needs, you’ll need to subclass OptionParser and override its "exit()" and/or "error()" methods. Putting it all together ----------------------- Here’s what "optparse"-based scripts usually look like: from optparse import OptionParser ... def main(): usage = "usage: %prog [options] arg" parser = OptionParser(usage) parser.add_option("-f", "--file", dest="filename", help="read data from FILENAME") parser.add_option("-v", "--verbose", action="store_true", dest="verbose") parser.add_option("-q", "--quiet", action="store_false", dest="verbose") ... (options, args) = parser.parse_args() if len(args) != 1: parser.error("incorrect number of arguments") if options.verbose: print("reading %s..." % options.filename) ... if __name__ == "__main__": main() Reference Guide =============== Creating the parser ------------------- The first step in using "optparse" is to create an OptionParser instance. class optparse.OptionParser(...) The OptionParser constructor has no required arguments, but a number of optional keyword arguments. You should always pass them as keyword arguments, i.e. do not rely on the order in which the arguments are declared. "usage" (default: ""%prog [options]"") The usage summary to print when your program is run incorrectly or with a help option. When "optparse" prints the usage string, it expands "%prog" to "os.path.basename(sys.argv[0])" (or to "prog" if you passed that keyword argument). To suppress a usage message, pass the special value "optparse.SUPPRESS_USAGE". "option_list" (default: "[]") A list of Option objects to populate the parser with. The options in "option_list" are added after any options in "standard_option_list" (a class attribute that may be set by OptionParser subclasses), but before any version or help options. Deprecated; use "add_option()" after creating the parser instead. "option_class" (default: optparse.Option) Class to use when adding options to the parser in "add_option()". "version" (default: "None") A version string to print when the user supplies a version option. If you supply a true value for "version", "optparse" automatically adds a version option with the single option string "--version". The substring "%prog" is expanded the same as for "usage". "conflict_handler" (default: ""error"") Specifies what to do when options with conflicting option strings are added to the parser; see section Conflicts between options. "description" (default: "None") A paragraph of text giving a brief overview of your program. "optparse" reformats this paragraph to fit the current terminal width and prints it when the user requests help (after "usage", but before the list of options). "formatter" (default: a new "IndentedHelpFormatter") An instance of optparse.HelpFormatter that will be used for printing help text. "optparse" provides two concrete classes for this purpose: IndentedHelpFormatter and TitledHelpFormatter. "add_help_option" (default: "True") If true, "optparse" will add a help option (with option strings "-h" and "--help") to the parser. "prog" The string to use when expanding "%prog" in "usage" and "version" instead of "os.path.basename(sys.argv[0])". "epilog" (default: "None") A paragraph of help text to print after the option help. Populating the parser --------------------- There are several ways to populate the parser with options. The preferred way is by using "OptionParser.add_option()", as shown in section Tutorial. "add_option()" can be called in one of two ways: * pass it an Option instance (as returned by "make_option()") * pass it any combination of positional and keyword arguments that are acceptable to "make_option()" (i.e., to the Option constructor), and it will create the Option instance for you The other alternative is to pass a list of pre-constructed Option instances to the OptionParser constructor, as in: option_list = [ make_option("-f", "--filename", action="store", type="string", dest="filename"), make_option("-q", "--quiet", action="store_false", dest="verbose"), ] parser = OptionParser(option_list=option_list) ("make_option()" is a factory function for creating Option instances; currently it is an alias for the Option constructor. A future version of "optparse" may split Option into several classes, and "make_option()" will pick the right class to instantiate. Do not instantiate Option directly.) Defining options ---------------- Each Option instance represents a set of synonymous command-line option strings, e.g. "-f" and "--file". You can specify any number of short or long option strings, but you must specify at least one overall option string. The canonical way to create an "Option" instance is with the "add_option()" method of "OptionParser". OptionParser.add_option(option) OptionParser.add_option(*opt_str, attr=value, ...) To define an option with only a short option string: parser.add_option("-f", attr=value, ...) And to define an option with only a long option string: parser.add_option("--foo", attr=value, ...) The keyword arguments define attributes of the new Option object. The most important option attribute is "action", and it largely determines which other attributes are relevant or required. If you pass irrelevant option attributes, or fail to pass required ones, "optparse" raises an "OptionError" exception explaining your mistake. An option’s *action* determines what "optparse" does when it encounters this option on the command-line. The standard option actions hard-coded into "optparse" are: ""store"" store this option’s argument (default) ""store_const"" store a constant value ""store_true"" store "True" ""store_false"" store "False" ""append"" append this option’s argument to a list ""append_const"" append a constant value to a list ""count"" increment a counter by one ""callback"" call a specified function ""help"" print a usage message including all options and the documentation for them (If you don’t supply an action, the default is ""store"". For this action, you may also supply "type" and "dest" option attributes; see Standard option actions.) As you can see, most actions involve storing or updating a value somewhere. "optparse" always creates a special object for this, conventionally called "options" (it happens to be an instance of "optparse.Values"). Option arguments (and various other values) are stored as attributes of this object, according to the "dest" (destination) option attribute. For example, when you call parser.parse_args() one of the first things "optparse" does is create the "options" object: options = Values() If one of the options in this parser is defined with parser.add_option("-f", "--file", action="store", type="string", dest="filename") and the command-line being parsed includes any of the following: -ffoo -f foo --file=foo --file foo then "optparse", on seeing this option, will do the equivalent of options.filename = "foo" The "type" and "dest" option attributes are almost as important as "action", but "action" is the only one that makes sense for *all* options. Option attributes ----------------- The following option attributes may be passed as keyword arguments to "OptionParser.add_option()". If you pass an option attribute that is not relevant to a particular option, or fail to pass a required option attribute, "optparse" raises "OptionError". Option.action (default: ""store"") Determines "optparse"’s behaviour when this option is seen on the command line; the available options are documented here. Option.type (default: ""string"") The argument type expected by this option (e.g., ""string"" or ""int""); the available option types are documented here. Option.dest (default: derived from option strings) If the option’s action implies writing or modifying a value somewhere, this tells "optparse" where to write it: "dest" names an attribute of the "options" object that "optparse" builds as it parses the command line. Option.default The value to use for this option’s destination if the option is not seen on the command line. See also "OptionParser.set_defaults()". Option.nargs (default: 1) How many arguments of type "type" should be consumed when this option is seen. If > 1, "optparse" will store a tuple of values to "dest". Option.const For actions that store a constant value, the constant value to store. Option.choices For options of type ""choice"", the list of strings the user may choose from. Option.callback For options with action ""callback"", the callable to call when this option is seen. See section Option Callbacks for detail on the arguments passed to the callable. Option.callback_args Option.callback_kwargs Additional positional and keyword arguments to pass to "callback" after the four standard callback arguments. Option.help Help text to print for this option when listing all available options after the user supplies a "help" option (such as "--help"). If no help text is supplied, the option will be listed without help text. To hide this option, use the special value "optparse.SUPPRESS_HELP". Option.metavar (default: derived from option strings) Stand-in for the option argument(s) to use when printing help text. See section Tutorial for an example. Standard option actions ----------------------- The various option actions all have slightly different requirements and effects. Most actions have several relevant option attributes which you may specify to guide "optparse"’s behaviour; a few have required attributes, which you must specify for any option using that action. * ""store"" [relevant: "type", "dest", "nargs", "choices"] The option must be followed by an argument, which is converted to a value according to "type" and stored in "dest". If "nargs" > 1, multiple arguments will be consumed from the command line; all will be converted according to "type" and stored to "dest" as a tuple. See the Standard option types section. If "choices" is supplied (a list or tuple of strings), the type defaults to ""choice"". If "type" is not supplied, it defaults to ""string"". If "dest" is not supplied, "optparse" derives a destination from the first long option string (e.g., "--foo-bar" implies "foo_bar"). If there are no long option strings, "optparse" derives a destination from the first short option string (e.g., "-f" implies "f"). Example: parser.add_option("-f") parser.add_option("-p", type="float", nargs=3, dest="point") As it parses the command line -f foo.txt -p 1 -3.5 4 -fbar.txt "optparse" will set options.f = "foo.txt" options.point = (1.0, -3.5, 4.0) options.f = "bar.txt" * ""store_const"" [required: "const"; relevant: "dest"] The value "const" is stored in "dest". Example: parser.add_option("-q", "--quiet", action="store_const", const=0, dest="verbose") parser.add_option("-v", "--verbose", action="store_const", const=1, dest="verbose") parser.add_option("--noisy", action="store_const", const=2, dest="verbose") If "--noisy" is seen, "optparse" will set options.verbose = 2 * ""store_true"" [relevant: "dest"] A special case of ""store_const"" that stores "True" to "dest". * ""store_false"" [relevant: "dest"] Like ""store_true"", but stores "False". Example: parser.add_option("--clobber", action="store_true", dest="clobber") parser.add_option("--no-clobber", action="store_false", dest="clobber") * ""append"" [relevant: "type", "dest", "nargs", "choices"] The option must be followed by an argument, which is appended to the list in "dest". If no default value for "dest" is supplied, an empty list is automatically created when "optparse" first encounters this option on the command-line. If "nargs" > 1, multiple arguments are consumed, and a tuple of length "nargs" is appended to "dest". The defaults for "type" and "dest" are the same as for the ""store"" action. Example: parser.add_option("-t", "--tracks", action="append", type="int") If "-t3" is seen on the command-line, "optparse" does the equivalent of: options.tracks = [] options.tracks.append(int("3")) If, a little later on, "--tracks=4" is seen, it does: options.tracks.append(int("4")) The "append" action calls the "append" method on the current value of the option. This means that any default value specified must have an "append" method. It also means that if the default value is non-empty, the default elements will be present in the parsed value for the option, with any values from the command line appended after those default values: >>> parser.add_option("--files", action="append", default=['~/.mypkg/defaults']) >>> opts, args = parser.parse_args(['--files', 'overrides.mypkg']) >>> opts.files ['~/.mypkg/defaults', 'overrides.mypkg'] * ""append_const"" [required: "const"; relevant: "dest"] Like ""store_const"", but the value "const" is appended to "dest"; as with ""append"", "dest" defaults to "None", and an empty list is automatically created the first time the option is encountered. * ""count"" [relevant: "dest"] Increment the integer stored at "dest". If no default value is supplied, "dest" is set to zero before being incremented the first time. Example: parser.add_option("-v", action="count", dest="verbosity") The first time "-v" is seen on the command line, "optparse" does the equivalent of: options.verbosity = 0 options.verbosity += 1 Every subsequent occurrence of "-v" results in options.verbosity += 1 * ""callback"" [required: "callback"; relevant: "type", "nargs", "callback_args", "callback_kwargs"] Call the function specified by "callback", which is called as func(option, opt_str, value, parser, *args, **kwargs) See section Option Callbacks for more detail. * ""help"" Prints a complete help message for all the options in the current option parser. The help message is constructed from the "usage" string passed to OptionParser’s constructor and the "help" string passed to every option. If no "help" string is supplied for an option, it will still be listed in the help message. To omit an option entirely, use the special value "optparse.SUPPRESS_HELP". "optparse" automatically adds a "help" option to all OptionParsers, so you do not normally need to create one. Example: from optparse import OptionParser, SUPPRESS_HELP # usually, a help option is added automatically, but that can # be suppressed using the add_help_option argument parser = OptionParser(add_help_option=False) parser.add_option("-h", "--help", action="help") parser.add_option("-v", action="store_true", dest="verbose", help="Be moderately verbose") parser.add_option("--file", dest="filename", help="Input file to read data from") parser.add_option("--secret", help=SUPPRESS_HELP) If "optparse" sees either "-h" or "--help" on the command line, it will print something like the following help message to stdout (assuming "sys.argv[0]" is ""foo.py""): Usage: foo.py [options] Options: -h, --help Show this help message and exit -v Be moderately verbose --file=FILENAME Input file to read data from After printing the help message, "optparse" terminates your process with "sys.exit(0)". * ""version"" Prints the version number supplied to the OptionParser to stdout and exits. The version number is actually formatted and printed by the "print_version()" method of OptionParser. Generally only relevant if the "version" argument is supplied to the OptionParser constructor. As with "help" options, you will rarely create "version" options, since "optparse" automatically adds them when needed. Standard option types --------------------- "optparse" has five built-in option types: ""string"", ""int"", ""choice"", ""float"" and ""complex"". If you need to add new option types, see section Extending optparse. Arguments to string options are not checked or converted in any way: the text on the command line is stored in the destination (or passed to the callback) as-is. Integer arguments (type ""int"") are parsed as follows: * if the number starts with "0x", it is parsed as a hexadecimal number * if the number starts with "0", it is parsed as an octal number * if the number starts with "0b", it is parsed as a binary number * otherwise, the number is parsed as a decimal number The conversion is done by calling "int()" with the appropriate base (2, 8, 10, or 16). If this fails, so will "optparse", although with a more useful error message. ""float"" and ""complex"" option arguments are converted directly with "float()" and "complex()", with similar error-handling. ""choice"" options are a subtype of ""string"" options. The "choices" option attribute (a sequence of strings) defines the set of allowed option arguments. "optparse.check_choice()" compares user-supplied option arguments against this master list and raises "OptionValueError" if an invalid string is given. Parsing arguments ----------------- The whole point of creating and populating an OptionParser is to call its "parse_args()" method: (options, args) = parser.parse_args(args=None, values=None) where the input parameters are "args" the list of arguments to process (default: "sys.argv[1:]") "values" an "optparse.Values" object to store option arguments in (default: a new instance of "Values") – if you give an existing object, the option defaults will not be initialized on it and the return values are "options" the same object that was passed in as "values", or the optparse.Values instance created by "optparse" "args" the leftover positional arguments after all options have been processed The most common usage is to supply neither keyword argument. If you supply "values", it will be modified with repeated "setattr()" calls (roughly one for every option argument stored to an option destination) and returned by "parse_args()". If "parse_args()" encounters any errors in the argument list, it calls the OptionParser’s "error()" method with an appropriate end-user error message. This ultimately terminates your process with an exit status of 2 (the traditional Unix exit status for command-line errors). Querying and manipulating your option parser -------------------------------------------- The default behavior of the option parser can be customized slightly, and you can also poke around your option parser and see what’s there. OptionParser provides several methods to help you out: OptionParser.disable_interspersed_args() Set parsing to stop on the first non-option. For example, if "-a" and "-b" are both simple options that take no arguments, "optparse" normally accepts this syntax: prog -a arg1 -b arg2 and treats it as equivalent to prog -a -b arg1 arg2 To disable this feature, call "disable_interspersed_args()". This restores traditional Unix syntax, where option parsing stops with the first non-option argument. Use this if you have a command processor which runs another command which has options of its own and you want to make sure these options don’t get confused. For example, each command might have a different set of options. OptionParser.enable_interspersed_args() Set parsing to not stop on the first non-option, allowing interspersing switches with command arguments. This is the default behavior. OptionParser.get_option(opt_str) Returns the Option instance with the option string *opt_str*, or "None" if no options have that option string. OptionParser.has_option(opt_str) Return "True" if the OptionParser has an option with option string *opt_str* (e.g., "-q" or "--verbose"). OptionParser.remove_option(opt_str) If the "OptionParser" has an option corresponding to *opt_str*, that option is removed. If that option provided any other option strings, all of those option strings become invalid. If *opt_str* does not occur in any option belonging to this "OptionParser", raises "ValueError". Conflicts between options ------------------------- If you’re not careful, it’s easy to define options with conflicting option strings: parser.add_option("-n", "--dry-run", ...) ... parser.add_option("-n", "--noisy", ...) (This is particularly true if you’ve defined your own OptionParser subclass with some standard options.) Every time you add an option, "optparse" checks for conflicts with existing options. If it finds any, it invokes the current conflict- handling mechanism. You can set the conflict-handling mechanism either in the constructor: parser = OptionParser(..., conflict_handler=handler) or with a separate call: parser.set_conflict_handler(handler) The available conflict handlers are: ""error"" (default) assume option conflicts are a programming error and raise "OptionConflictError" ""resolve"" resolve option conflicts intelligently (see below) As an example, let’s define an "OptionParser" that resolves conflicts intelligently and add conflicting options to it: parser = OptionParser(conflict_handler="resolve") parser.add_option("-n", "--dry-run", ..., help="do no harm") parser.add_option("-n", "--noisy", ..., help="be noisy") At this point, "optparse" detects that a previously-added option is already using the "-n" option string. Since "conflict_handler" is ""resolve"", it resolves the situation by removing "-n" from the earlier option’s list of option strings. Now "--dry-run" is the only way for the user to activate that option. If the user asks for help, the help message will reflect that: Options: --dry-run do no harm ... -n, --noisy be noisy It’s possible to whittle away the option strings for a previously- added option until there are none left, and the user has no way of invoking that option from the command-line. In that case, "optparse" removes that option completely, so it doesn’t show up in help text or anywhere else. Carrying on with our existing OptionParser: parser.add_option("--dry-run", ..., help="new dry-run option") At this point, the original "-n"/"--dry-run" option is no longer accessible, so "optparse" removes it, leaving this help text: Options: ... -n, --noisy be noisy --dry-run new dry-run option Cleanup ------- OptionParser instances have several cyclic references. This should not be a problem for Python’s garbage collector, but you may wish to break the cyclic references explicitly by calling "destroy()" on your OptionParser once you are done with it. This is particularly useful in long-running applications where large object graphs are reachable from your OptionParser. Other methods ------------- OptionParser supports several other public methods: OptionParser.set_usage(usage) Set the usage string according to the rules described above for the "usage" constructor keyword argument. Passing "None" sets the default usage string; use "optparse.SUPPRESS_USAGE" to suppress a usage message. OptionParser.print_usage(file=None) Print the usage message for the current program ("self.usage") to *file* (default stdout). Any occurrence of the string "%prog" in "self.usage" is replaced with the name of the current program. Does nothing if "self.usage" is empty or not defined. OptionParser.get_usage() Same as "print_usage()" but returns the usage string instead of printing it. OptionParser.set_defaults(dest=value, ...) Set default values for several option destinations at once. Using "set_defaults()" is the preferred way to set default values for options, since multiple options can share the same destination. For example, if several “mode” options all set the same destination, any one of them can set the default, and the last one wins: parser.add_option("--advanced", action="store_const", dest="mode", const="advanced", default="novice") # overridden below parser.add_option("--novice", action="store_const", dest="mode", const="novice", default="advanced") # overrides above setting To avoid this confusion, use "set_defaults()": parser.set_defaults(mode="advanced") parser.add_option("--advanced", action="store_const", dest="mode", const="advanced") parser.add_option("--novice", action="store_const", dest="mode", const="novice") Option Callbacks ================ When "optparse"’s built-in actions and types aren’t quite enough for your needs, you have two choices: extend "optparse" or define a callback option. Extending "optparse" is more general, but overkill for a lot of simple cases. Quite often a simple callback is all you need. There are two steps to defining a callback option: * define the option itself using the ""callback"" action * write the callback; this is a function (or method) that takes at least four arguments, as described below Defining a callback option -------------------------- As always, the easiest way to define a callback option is by using the "OptionParser.add_option()" method. Apart from "action", the only option attribute you must specify is "callback", the function to call: parser.add_option("-c", action="callback", callback=my_callback) "callback" is a function (or other callable object), so you must have already defined "my_callback()" when you create this callback option. In this simple case, "optparse" doesn’t even know if "-c" takes any arguments, which usually means that the option takes no arguments—the mere presence of "-c" on the command-line is all it needs to know. In some circumstances, though, you might want your callback to consume an arbitrary number of command-line arguments. This is where writing callbacks gets tricky; it’s covered later in this section. "optparse" always passes four particular arguments to your callback, and it will only pass additional arguments if you specify them via "callback_args" and "callback_kwargs". Thus, the minimal callback function signature is: def my_callback(option, opt, value, parser): The four arguments to a callback are described below. There are several other option attributes that you can supply when you define a callback option: "type" has its usual meaning: as with the ""store"" or ""append"" actions, it instructs "optparse" to consume one argument and convert it to "type". Rather than storing the converted value(s) anywhere, though, "optparse" passes it to your callback function. "nargs" also has its usual meaning: if it is supplied and > 1, "optparse" will consume "nargs" arguments, each of which must be convertible to "type". It then passes a tuple of converted values to your callback. "callback_args" a tuple of extra positional arguments to pass to the callback "callback_kwargs" a dictionary of extra keyword arguments to pass to the callback How callbacks are called ------------------------ All callbacks are called as follows: func(option, opt_str, value, parser, *args, **kwargs) where "option" is the Option instance that’s calling the callback "opt_str" is the option string seen on the command-line that’s triggering the callback. (If an abbreviated long option was used, "opt_str" will be the full, canonical option string—e.g. if the user puts "--foo" on the command-line as an abbreviation for "--foobar", then "opt_str" will be ""--foobar"".) "value" is the argument to this option seen on the command-line. "optparse" will only expect an argument if "type" is set; the type of "value" will be the type implied by the option’s type. If "type" for this option is "None" (no argument expected), then "value" will be "None". If "nargs" > 1, "value" will be a tuple of values of the appropriate type. "parser" is the OptionParser instance driving the whole thing, mainly useful because you can access some other interesting data through its instance attributes: "parser.largs" the current list of leftover arguments, ie. arguments that have been consumed but are neither options nor option arguments. Feel free to modify "parser.largs", e.g. by adding more arguments to it. (This list will become "args", the second return value of "parse_args()".) "parser.rargs" the current list of remaining arguments, ie. with "opt_str" and "value" (if applicable) removed, and only the arguments following them still there. Feel free to modify "parser.rargs", e.g. by consuming more arguments. "parser.values" the object where option values are by default stored (an instance of optparse.OptionValues). This lets callbacks use the same mechanism as the rest of "optparse" for storing option values; you don’t need to mess around with globals or closures. You can also access or modify the value(s) of any options already encountered on the command-line. "args" is a tuple of arbitrary positional arguments supplied via the "callback_args" option attribute. "kwargs" is a dictionary of arbitrary keyword arguments supplied via "callback_kwargs". Raising errors in a callback ---------------------------- The callback function should raise "OptionValueError" if there are any problems with the option or its argument(s). "optparse" catches this and terminates the program, printing the error message you supply to stderr. Your message should be clear, concise, accurate, and mention the option at fault. Otherwise, the user will have a hard time figuring out what they did wrong. Callback example 1: trivial callback ------------------------------------ Here’s an example of a callback option that takes no arguments, and simply records that the option was seen: def record_foo_seen(option, opt_str, value, parser): parser.values.saw_foo = True parser.add_option("--foo", action="callback", callback=record_foo_seen) Of course, you could do that with the ""store_true"" action. Callback example 2: check option order -------------------------------------- Here’s a slightly more interesting example: record the fact that "-a" is seen, but blow up if it comes after "-b" in the command-line. def check_order(option, opt_str, value, parser): if parser.values.b: raise OptionValueError("can't use -a after -b") parser.values.a = 1 ... parser.add_option("-a", action="callback", callback=check_order) parser.add_option("-b", action="store_true", dest="b") Callback example 3: check option order (generalized) ---------------------------------------------------- If you want to re-use this callback for several similar options (set a flag, but blow up if "-b" has already been seen), it needs a bit of work: the error message and the flag that it sets must be generalized. def check_order(option, opt_str, value, parser): if parser.values.b: raise OptionValueError("can't use %s after -b" % opt_str) setattr(parser.values, option.dest, 1) ... parser.add_option("-a", action="callback", callback=check_order, dest='a') parser.add_option("-b", action="store_true", dest="b") parser.add_option("-c", action="callback", callback=check_order, dest='c') Callback example 4: check arbitrary condition --------------------------------------------- Of course, you could put any condition in there—you’re not limited to checking the values of already-defined options. For example, if you have options that should not be called when the moon is full, all you have to do is this: def check_moon(option, opt_str, value, parser): if is_moon_full(): raise OptionValueError("%s option invalid when moon is full" % opt_str) setattr(parser.values, option.dest, 1) ... parser.add_option("--foo", action="callback", callback=check_moon, dest="foo") (The definition of "is_moon_full()" is left as an exercise for the reader.) Callback example 5: fixed arguments ----------------------------------- Things get slightly more interesting when you define callback options that take a fixed number of arguments. Specifying that a callback option takes arguments is similar to defining a ""store"" or ""append"" option: if you define "type", then the option takes one argument that must be convertible to that type; if you further define "nargs", then the option takes "nargs" arguments. Here’s an example that just emulates the standard ""store"" action: def store_value(option, opt_str, value, parser): setattr(parser.values, option.dest, value) ... parser.add_option("--foo", action="callback", callback=store_value, type="int", nargs=3, dest="foo") Note that "optparse" takes care of consuming 3 arguments and converting them to integers for you; all you have to do is store them. (Or whatever; obviously you don’t need a callback for this example.) Callback example 6: variable arguments -------------------------------------- Things get hairy when you want an option to take a variable number of arguments. For this case, you must write a callback, as "optparse" doesn’t provide any built-in capabilities for it. And you have to deal with certain intricacies of conventional Unix command-line parsing that "optparse" normally handles for you. In particular, callbacks should implement the conventional rules for bare "--" and "-" arguments: * either "--" or "-" can be option arguments * bare "--" (if not the argument to some option): halt command-line processing and discard the "--" * bare "-" (if not the argument to some option): halt command-line processing but keep the "-" (append it to "parser.largs") If you want an option that takes a variable number of arguments, there are several subtle, tricky issues to worry about. The exact implementation you choose will be based on which trade-offs you’re willing to make for your application (which is why "optparse" doesn’t support this sort of thing directly). Nevertheless, here’s a stab at a callback for an option with variable arguments: def vararg_callback(option, opt_str, value, parser): assert value is None value = [] def floatable(str): try: float(str) return True except ValueError: return False for arg in parser.rargs: # stop on --foo like options if arg[:2] == "--" and len(arg) > 2: break # stop on -a, but not on -3 or -3.0 if arg[:1] == "-" and len(arg) > 1 and not floatable(arg): break value.append(arg) del parser.rargs[:len(value)] setattr(parser.values, option.dest, value) ... parser.add_option("-c", "--callback", dest="vararg_attr", action="callback", callback=vararg_callback) Extending "optparse" ==================== Since the two major controlling factors in how "optparse" interprets command-line options are the action and type of each option, the most likely direction of extension is to add new actions and new types. Adding new types ---------------- To add new types, you need to define your own subclass of "optparse"’s "Option" class. This class has a couple of attributes that define "optparse"’s types: "TYPES" and "TYPE_CHECKER". Option.TYPES A tuple of type names; in your subclass, simply define a new tuple "TYPES" that builds on the standard one. Option.TYPE_CHECKER A dictionary mapping type names to type-checking functions. A type-checking function has the following signature: def check_mytype(option, opt, value) where "option" is an "Option" instance, "opt" is an option string (e.g., "-f"), and "value" is the string from the command line that must be checked and converted to your desired type. "check_mytype()" should return an object of the hypothetical type "mytype". The value returned by a type-checking function will wind up in the OptionValues instance returned by "OptionParser.parse_args()", or be passed to a callback as the "value" parameter. Your type-checking function should raise "OptionValueError" if it encounters any problems. "OptionValueError" takes a single string argument, which is passed as-is to "OptionParser"’s "error()" method, which in turn prepends the program name and the string ""error:"" and prints everything to stderr before terminating the process. Here’s a silly example that demonstrates adding a ""complex"" option type to parse Python-style complex numbers on the command line. (This is even sillier than it used to be, because "optparse" 1.3 added built-in support for complex numbers, but never mind.) First, the necessary imports: from copy import copy from optparse import Option, OptionValueError You need to define your type-checker first, since it’s referred to later (in the "TYPE_CHECKER" class attribute of your Option subclass): def check_complex(option, opt, value): try: return complex(value) except ValueError: raise OptionValueError( "option %s: invalid complex value: %r" % (opt, value)) Finally, the Option subclass: class MyOption (Option): TYPES = Option.TYPES + ("complex",) TYPE_CHECKER = copy(Option.TYPE_CHECKER) TYPE_CHECKER["complex"] = check_complex (If we didn’t make a "copy()" of "Option.TYPE_CHECKER", we would end up modifying the "TYPE_CHECKER" attribute of "optparse"’s Option class. This being Python, nothing stops you from doing that except good manners and common sense.) That’s it! Now you can write a script that uses the new option type just like any other "optparse"-based script, except you have to instruct your OptionParser to use MyOption instead of Option: parser = OptionParser(option_class=MyOption) parser.add_option("-c", type="complex") Alternately, you can build your own option list and pass it to OptionParser; if you don’t use "add_option()" in the above way, you don’t need to tell OptionParser which option class to use: option_list = [MyOption("-c", action="store", type="complex", dest="c")] parser = OptionParser(option_list=option_list) Adding new actions ------------------ Adding new actions is a bit trickier, because you have to understand that "optparse" has a couple of classifications for actions: “store” actions actions that result in "optparse" storing a value to an attribute of the current OptionValues instance; these options require a "dest" attribute to be supplied to the Option constructor. “typed” actions actions that take a value from the command line and expect it to be of a certain type; or rather, a string that can be converted to a certain type. These options require a "type" attribute to the Option constructor. These are overlapping sets: some default “store” actions are ""store"", ""store_const"", ""append"", and ""count"", while the default “typed” actions are ""store"", ""append"", and ""callback"". When you add an action, you need to categorize it by listing it in at least one of the following class attributes of Option (all are lists of strings): Option.ACTIONS All actions must be listed in ACTIONS. Option.STORE_ACTIONS “store” actions are additionally listed here. Option.TYPED_ACTIONS “typed” actions are additionally listed here. Option.ALWAYS_TYPED_ACTIONS Actions that always take a type (i.e. whose options always take a value) are additionally listed here. The only effect of this is that "optparse" assigns the default type, ""string"", to options with no explicit type whose action is listed in "ALWAYS_TYPED_ACTIONS". In order to actually implement your new action, you must override Option’s "take_action()" method and add a case that recognizes your action. For example, let’s add an ""extend"" action. This is similar to the standard ""append"" action, but instead of taking a single value from the command-line and appending it to an existing list, ""extend"" will take multiple values in a single comma-delimited string, and extend an existing list with them. That is, if "--names" is an ""extend"" option of type ""string"", the command line --names=foo,bar --names blah --names ding,dong would result in a list ["foo", "bar", "blah", "ding", "dong"] Again we define a subclass of Option: class MyOption(Option): ACTIONS = Option.ACTIONS + ("extend",) STORE_ACTIONS = Option.STORE_ACTIONS + ("extend",) TYPED_ACTIONS = Option.TYPED_ACTIONS + ("extend",) ALWAYS_TYPED_ACTIONS = Option.ALWAYS_TYPED_ACTIONS + ("extend",) def take_action(self, action, dest, opt, value, values, parser): if action == "extend": lvalue = value.split(",") values.ensure_value(dest, []).extend(lvalue) else: Option.take_action( self, action, dest, opt, value, values, parser) Features of note: * ""extend"" both expects a value on the command-line and stores that value somewhere, so it goes in both "STORE_ACTIONS" and "TYPED_ACTIONS". * to ensure that "optparse" assigns the default type of ""string"" to ""extend"" actions, we put the ""extend"" action in "ALWAYS_TYPED_ACTIONS" as well. * "MyOption.take_action()" implements just this one new action, and passes control back to "Option.take_action()" for the standard "optparse" actions. * "values" is an instance of the optparse_parser.Values class, which provides the very useful "ensure_value()" method. "ensure_value()" is essentially "getattr()" with a safety valve; it is called as values.ensure_value(attr, value) If the "attr" attribute of "values" doesn’t exist or is "None", then ensure_value() first sets it to "value", and then returns ‘value. This is very handy for actions like ""extend"", ""append"", and ""count"", all of which accumulate data in a variable and expect that variable to be of a certain type (a list for the first two, an integer for the latter). Using "ensure_value()" means that scripts using your action don’t have to worry about setting a default value for the option destinations in question; they can just leave the default as "None" and "ensure_value()" will take care of getting it right when it’s needed. "os.path" — Common pathname manipulations ***************************************** **Source code:** Lib/posixpath.py (for POSIX) and Lib/ntpath.py (for Windows). ====================================================================== This module implements some useful functions on pathnames. To read or write files see "open()", and for accessing the filesystem see the "os" module. The path parameters can be passed as strings, or bytes, or any object implementing the "os.PathLike" protocol. Unlike a unix shell, Python does not do any *automatic* path expansions. Functions such as "expanduser()" and "expandvars()" can be invoked explicitly when an application desires shell-like path expansion. (See also the "glob" module.) See also: The "pathlib" module offers high-level path objects. Note: All of these functions accept either only bytes or only string objects as their parameters. The result is an object of the same type, if a path or file name is returned. Note: Since different operating systems have different path name conventions, there are several versions of this module in the standard library. The "os.path" module is always the path module suitable for the operating system Python is running on, and therefore usable for local paths. However, you can also import and use the individual modules if you want to manipulate a path that is *always* in one of the different formats. They all have the same interface: * "posixpath" for UNIX-style paths * "ntpath" for Windows paths Changed in version 3.8: "exists()", "lexists()", "isdir()", "isfile()", "islink()", and "ismount()" now return "False" instead of raising an exception for paths that contain characters or bytes unrepresentable at the OS level. os.path.abspath(path) Return a normalized absolutized version of the pathname *path*. On most platforms, this is equivalent to calling the function "normpath()" as follows: "normpath(join(os.getcwd(), path))". Changed in version 3.6: Accepts a *path-like object*. os.path.basename(path) Return the base name of pathname *path*. This is the second element of the pair returned by passing *path* to the function "split()". Note that the result of this function is different from the Unix **basename** program; where **basename** for "'/foo/bar/'" returns "'bar'", the "basename()" function returns an empty string ("''"). Changed in version 3.6: Accepts a *path-like object*. os.path.commonpath(paths) Return the longest common sub-path of each pathname in the sequence *paths*. Raise "ValueError" if *paths* contain both absolute and relative pathnames, the *paths* are on the different drives or if *paths* is empty. Unlike "commonprefix()", this returns a valid path. Availability: Unix, Windows. New in version 3.5. Changed in version 3.6: Accepts a sequence of *path-like objects*. os.path.commonprefix(list) Return the longest path prefix (taken character-by-character) that is a prefix of all paths in *list*. If *list* is empty, return the empty string ("''"). Note: This function may return invalid paths because it works a character at a time. To obtain a valid path, see "commonpath()". >>> os.path.commonprefix(['/usr/lib', '/usr/local/lib']) '/usr/l' >>> os.path.commonpath(['/usr/lib', '/usr/local/lib']) '/usr' Changed in version 3.6: Accepts a *path-like object*. os.path.dirname(path) Return the directory name of pathname *path*. This is the first element of the pair returned by passing *path* to the function "split()". Changed in version 3.6: Accepts a *path-like object*. os.path.exists(path) Return "True" if *path* refers to an existing path or an open file descriptor. Returns "False" for broken symbolic links. On some platforms, this function may return "False" if permission is not granted to execute "os.stat()" on the requested file, even if the *path* physically exists. Changed in version 3.3: *path* can now be an integer: "True" is returned if it is an open file descriptor, "False" otherwise. Changed in version 3.6: Accepts a *path-like object*. os.path.lexists(path) Return "True" if *path* refers to an existing path. Returns "True" for broken symbolic links. Equivalent to "exists()" on platforms lacking "os.lstat()". Changed in version 3.6: Accepts a *path-like object*. os.path.expanduser(path) On Unix and Windows, return the argument with an initial component of "~" or "~user" replaced by that *user*’s home directory. On Unix, an initial "~" is replaced by the environment variable "HOME" if it is set; otherwise the current user’s home directory is looked up in the password directory through the built-in module "pwd". An initial "~user" is looked up directly in the password directory. On Windows, "USERPROFILE" will be used if set, otherwise a combination of "HOMEPATH" and "HOMEDRIVE" will be used. An initial "~user" is handled by stripping the last directory component from the created user path derived above. If the expansion fails or if the path does not begin with a tilde, the path is returned unchanged. Changed in version 3.6: Accepts a *path-like object*. Changed in version 3.8: No longer uses "HOME" on Windows. os.path.expandvars(path) Return the argument with environment variables expanded. Substrings of the form "$name" or "${name}" are replaced by the value of environment variable *name*. Malformed variable names and references to non-existing variables are left unchanged. On Windows, "%name%" expansions are supported in addition to "$name" and "${name}". Changed in version 3.6: Accepts a *path-like object*. os.path.getatime(path) Return the time of last access of *path*. The return value is a floating point number giving the number of seconds since the epoch (see the "time" module). Raise "OSError" if the file does not exist or is inaccessible. os.path.getmtime(path) Return the time of last modification of *path*. The return value is a floating point number giving the number of seconds since the epoch (see the "time" module). Raise "OSError" if the file does not exist or is inaccessible. Changed in version 3.6: Accepts a *path-like object*. os.path.getctime(path) Return the system’s ctime which, on some systems (like Unix) is the time of the last metadata change, and, on others (like Windows), is the creation time for *path*. The return value is a number giving the number of seconds since the epoch (see the "time" module). Raise "OSError" if the file does not exist or is inaccessible. Changed in version 3.6: Accepts a *path-like object*. os.path.getsize(path) Return the size, in bytes, of *path*. Raise "OSError" if the file does not exist or is inaccessible. Changed in version 3.6: Accepts a *path-like object*. os.path.isabs(path) Return "True" if *path* is an absolute pathname. On Unix, that means it begins with a slash, on Windows that it begins with a (back)slash after chopping off a potential drive letter. Changed in version 3.6: Accepts a *path-like object*. os.path.isfile(path) Return "True" if *path* is an "existing" regular file. This follows symbolic links, so both "islink()" and "isfile()" can be true for the same path. Changed in version 3.6: Accepts a *path-like object*. os.path.isdir(path) Return "True" if *path* is an "existing" directory. This follows symbolic links, so both "islink()" and "isdir()" can be true for the same path. Changed in version 3.6: Accepts a *path-like object*. os.path.islink(path) Return "True" if *path* refers to an "existing" directory entry that is a symbolic link. Always "False" if symbolic links are not supported by the Python runtime. Changed in version 3.6: Accepts a *path-like object*. os.path.ismount(path) Return "True" if pathname *path* is a *mount point*: a point in a file system where a different file system has been mounted. On POSIX, the function checks whether *path*’s parent, "*path*/..", is on a different device than *path*, or whether "*path*/.." and *path* point to the same i-node on the same device — this should detect mount points for all Unix and POSIX variants. It is not able to reliably detect bind mounts on the same filesystem. On Windows, a drive letter root and a share UNC are always mount points, and for any other path "GetVolumePathName" is called to see if it is different from the input path. New in version 3.4: Support for detecting non-root mount points on Windows. Changed in version 3.6: Accepts a *path-like object*. os.path.join(path, *paths) Join one or more path components intelligently. The return value is the concatenation of *path* and any members of **paths* with exactly one directory separator following each non-empty part except the last, meaning that the result will only end in a separator if the last part is empty. If a component is an absolute path, all previous components are thrown away and joining continues from the absolute path component. On Windows, the drive letter is not reset when an absolute path component (e.g., "r'\foo'") is encountered. If a component contains a drive letter, all previous components are thrown away and the drive letter is reset. Note that since there is a current directory for each drive, "os.path.join("c:", "foo")" represents a path relative to the current directory on drive "C:" ("c:foo"), not "c:\foo". Changed in version 3.6: Accepts a *path-like object* for *path* and *paths*. os.path.normcase(path) Normalize the case of a pathname. On Windows, convert all characters in the pathname to lowercase, and also convert forward slashes to backward slashes. On other operating systems, return the path unchanged. Changed in version 3.6: Accepts a *path-like object*. os.path.normpath(path) Normalize a pathname by collapsing redundant separators and up- level references so that "A//B", "A/B/", "A/./B" and "A/foo/../B" all become "A/B". This string manipulation may change the meaning of a path that contains symbolic links. On Windows, it converts forward slashes to backward slashes. To normalize case, use "normcase()". Note: On POSIX systems, in accordance with IEEE Std 1003.1 2013 Edition; 4.13 Pathname Resolution, if a pathname begins with exactly two slashes, the first component following the leading characters may be interpreted in an implementation-defined manner, although more than two leading characters shall be treated as a single character. Changed in version 3.6: Accepts a *path-like object*. os.path.realpath(path, *, strict=False) Return the canonical path of the specified filename, eliminating any symbolic links encountered in the path (if they are supported by the operating system). By default, the path is evaluated up to the first component that does not exist, is a symlink loop, or whose evaluation raises "OSError". All such components are appended unchanged to the existing part of the path. Some errors that are handled this way include “access denied”, “not a directory”, or “bad argument to internal function”. Thus, the resulting path may be missing or inaccessible, may still contain links or loops, and may traverse non-directories. This behavior can be modified by keyword arguments: If *strict* is "True", the first error encountered when evaluating the path is re-raised. In particular, "FileNotFoundError" is raised if *path* does not exist, or another "OSError" if it is otherwise inaccessible. If *strict* is "os.path.ALLOW_MISSING", errors other than "FileNotFoundError" are re-raised (as with "strict=True"). Thus, the returned path will not contain any symbolic links, but the named file and some of its parent directories may be missing. Note: This function emulates the operating system’s procedure for making a path canonical, which differs slightly between Windows and UNIX with respect to how links and subsequent path components interact.Operating system APIs make paths canonical as needed, so it’s not normally necessary to call this function. Changed in version 3.6: Accepts a *path-like object*. Changed in version 3.8: Symbolic links and junctions are now resolved on Windows. Changed in version 3.9.23: The *strict* parameter was added. Changed in version 3.9.23 (unreleased): The "ALLOW_MISSING" value for the *strict* parameter was added. os.path.ALLOW_MISSING Special value used for the *strict* argument in "realpath()". New in version 3.9.23 (unreleased). os.path.relpath(path, start=os.curdir) Return a relative filepath to *path* either from the current directory or from an optional *start* directory. This is a path computation: the filesystem is not accessed to confirm the existence or nature of *path* or *start*. On Windows, "ValueError" is raised when *path* and *start* are on different drives. *start* defaults to "os.curdir". Availability: Unix, Windows. Changed in version 3.6: Accepts a *path-like object*. os.path.samefile(path1, path2) Return "True" if both pathname arguments refer to the same file or directory. This is determined by the device number and i-node number and raises an exception if an "os.stat()" call on either pathname fails. Availability: Unix, Windows. Changed in version 3.2: Added Windows support. Changed in version 3.4: Windows now uses the same implementation as all other platforms. Changed in version 3.6: Accepts a *path-like object*. os.path.sameopenfile(fp1, fp2) Return "True" if the file descriptors *fp1* and *fp2* refer to the same file. Availability: Unix, Windows. Changed in version 3.2: Added Windows support. Changed in version 3.6: Accepts a *path-like object*. os.path.samestat(stat1, stat2) Return "True" if the stat tuples *stat1* and *stat2* refer to the same file. These structures may have been returned by "os.fstat()", "os.lstat()", or "os.stat()". This function implements the underlying comparison used by "samefile()" and "sameopenfile()". Availability: Unix, Windows. Changed in version 3.4: Added Windows support. Changed in version 3.6: Accepts a *path-like object*. os.path.split(path) Split the pathname *path* into a pair, "(head, tail)" where *tail* is the last pathname component and *head* is everything leading up to that. The *tail* part will never contain a slash; if *path* ends in a slash, *tail* will be empty. If there is no slash in *path*, *head* will be empty. If *path* is empty, both *head* and *tail* are empty. Trailing slashes are stripped from *head* unless it is the root (one or more slashes only). In all cases, "join(head, tail)" returns a path to the same location as *path* (but the strings may differ). Also see the functions "dirname()" and "basename()". Changed in version 3.6: Accepts a *path-like object*. os.path.splitdrive(path) Split the pathname *path* into a pair "(drive, tail)" where *drive* is either a mount point or the empty string. On systems which do not use drive specifications, *drive* will always be the empty string. In all cases, "drive + tail" will be the same as *path*. On Windows, splits a pathname into drive/UNC sharepoint and relative path. If the path contains a drive letter, drive will contain everything up to and including the colon: >>> splitdrive("c:/dir") ("c:", "/dir") If the path contains a UNC path, drive will contain the host name and share, up to but not including the fourth separator: >>> splitdrive("//host/computer/dir") ("//host/computer", "/dir") Changed in version 3.6: Accepts a *path-like object*. os.path.splitext(path) Split the pathname *path* into a pair "(root, ext)" such that "root + ext == path", and the extension, *ext*, is empty or begins with a period and contains at most one period. If the path contains no extension, *ext* will be "''": >>> splitext('bar') ('bar', '') If the path contains an extension, then *ext* will be set to this extension, including the leading period. Note that previous periods will be ignored: >>> splitext('foo.bar.exe') ('foo.bar', '.exe') >>> splitext('/foo/bar.exe') ('/foo/bar', '.exe') Leading periods of the last component of the path are considered to be part of the root: >>> splitext('.cshrc') ('.cshrc', '') >>> splitext('/foo/....jpg') ('/foo/....jpg', '') Changed in version 3.6: Accepts a *path-like object*. os.path.supports_unicode_filenames "True" if arbitrary Unicode strings can be used as file names (within limitations imposed by the file system). "os" — Miscellaneous operating system interfaces ************************************************ **Source code:** Lib/os.py ====================================================================== This module provides a portable way of using operating system dependent functionality. If you just want to read or write a file see "open()", if you want to manipulate paths, see the "os.path" module, and if you want to read all the lines in all the files on the command line see the "fileinput" module. For creating temporary files and directories see the "tempfile" module, and for high-level file and directory handling see the "shutil" module. Notes on the availability of these functions: * The design of all built-in operating system dependent modules of Python is such that as long as the same functionality is available, it uses the same interface; for example, the function "os.stat(path)" returns stat information about *path* in the same format (which happens to have originated with the POSIX interface). * Extensions peculiar to a particular operating system are also available through the "os" module, but using them is of course a threat to portability. * All functions accepting path or file names accept both bytes and string objects, and result in an object of the same type, if a path or file name is returned. * On VxWorks, os.fork, os.execv and os.spawn*p* are not supported. Note: All functions in this module raise "OSError" (or subclasses thereof) in the case of invalid or inaccessible file names and paths, or other arguments that have the correct type, but are not accepted by the operating system. exception os.error An alias for the built-in "OSError" exception. os.name The name of the operating system dependent module imported. The following names have currently been registered: "'posix'", "'nt'", "'java'". See also: "sys.platform" has a finer granularity. "os.uname()" gives system-dependent version information. The "platform" module provides detailed checks for the system’s identity. File Names, Command Line Arguments, and Environment Variables ============================================================= In Python, file names, command line arguments, and environment variables are represented using the string type. On some systems, decoding these strings to and from bytes is necessary before passing them to the operating system. Python uses the file system encoding to perform this conversion (see "sys.getfilesystemencoding()"). Changed in version 3.1: On some systems, conversion using the file system encoding may fail. In this case, Python uses the surrogateescape encoding error handler, which means that undecodable bytes are replaced by a Unicode character U+DCxx on decoding, and these are again translated to the original byte on encoding. The file system encoding must guarantee to successfully decode all bytes below 128. If the file system encoding fails to provide this guarantee, API functions may raise UnicodeErrors. Process Parameters ================== These functions and data items provide information and operate on the current process and user. os.ctermid() Return the filename corresponding to the controlling terminal of the process. Availability: Unix. os.environ A *mapping* object where keys and values are strings that represent the process environment. For example, "environ['HOME']" is the pathname of your home directory (on some platforms), and is equivalent to "getenv("HOME")" in C. This mapping is captured the first time the "os" module is imported, typically during Python startup as part of processing "site.py". Changes to the environment made after this time are not reflected in "os.environ", except for changes made by modifying "os.environ" directly. This mapping may be used to modify the environment as well as query the environment. "putenv()" will be called automatically when the mapping is modified. On Unix, keys and values use "sys.getfilesystemencoding()" and "'surrogateescape'" error handler. Use "environb" if you would like to use a different encoding. Note: Calling "putenv()" directly does not change "os.environ", so it’s better to modify "os.environ". Note: On some platforms, including FreeBSD and macOS, setting "environ" may cause memory leaks. Refer to the system documentation for "putenv()". You can delete items in this mapping to unset environment variables. "unsetenv()" will be called automatically when an item is deleted from "os.environ", and when one of the "pop()" or "clear()" methods is called. Changed in version 3.9: Updated to support **PEP 584**’s merge ("|") and update ("|=") operators. os.environb Bytes version of "environ": a *mapping* object where both keys and values are "bytes" objects representing the process environment. "environ" and "environb" are synchronized (modifying "environb" updates "environ", and vice versa). "environb" is only available if "supports_bytes_environ" is "True". New in version 3.2. Changed in version 3.9: Updated to support **PEP 584**’s merge ("|") and update ("|=") operators. os.chdir(path) os.fchdir(fd) os.getcwd() These functions are described in Files and Directories. os.fsencode(filename) Encode *path-like* *filename* to the filesystem encoding with "'surrogateescape'" error handler, or "'strict'" on Windows; return "bytes" unchanged. "fsdecode()" is the reverse function. New in version 3.2. Changed in version 3.6: Support added to accept objects implementing the "os.PathLike" interface. os.fsdecode(filename) Decode the *path-like* *filename* from the filesystem encoding with "'surrogateescape'" error handler, or "'strict'" on Windows; return "str" unchanged. "fsencode()" is the reverse function. New in version 3.2. Changed in version 3.6: Support added to accept objects implementing the "os.PathLike" interface. os.fspath(path) Return the file system representation of the path. If "str" or "bytes" is passed in, it is returned unchanged. Otherwise "__fspath__()" is called and its value is returned as long as it is a "str" or "bytes" object. In all other cases, "TypeError" is raised. New in version 3.6. class os.PathLike An *abstract base class* for objects representing a file system path, e.g. "pathlib.PurePath". New in version 3.6. abstractmethod __fspath__() Return the file system path representation of the object. The method should only return a "str" or "bytes" object, with the preference being for "str". os.getenv(key, default=None) Return the value of the environment variable *key* if it exists, or *default* if it doesn’t. *key*, *default* and the result are str. Note that since "getenv()" uses "os.environ", the mapping of "getenv()" is similarly also captured on import, and the function may not reflect future environment changes. On Unix, keys and values are decoded with "sys.getfilesystemencoding()" and "'surrogateescape'" error handler. Use "os.getenvb()" if you would like to use a different encoding. Availability: most flavors of Unix, Windows. os.getenvb(key, default=None) Return the value of the environment variable *key* if it exists, or *default* if it doesn’t. *key*, *default* and the result are bytes. Note that since "getenvb()" uses "os.environb", the mapping of "getenvb()" is similarly also captured on import, and the function may not reflect future environment changes. "getenvb()" is only available if "supports_bytes_environ" is "True". Availability: most flavors of Unix. New in version 3.2. os.get_exec_path(env=None) Returns the list of directories that will be searched for a named executable, similar to a shell, when launching a process. *env*, when specified, should be an environment variable dictionary to lookup the PATH in. By default, when *env* is "None", "environ" is used. New in version 3.2. os.getegid() Return the effective group id of the current process. This corresponds to the “set id” bit on the file being executed in the current process. Availability: Unix. os.geteuid() Return the current process’s effective user id. Availability: Unix. os.getgid() Return the real group id of the current process. Availability: Unix. os.getgrouplist(user, group) Return list of group ids that *user* belongs to. If *group* is not in the list, it is included; typically, *group* is specified as the group ID field from the password record for *user*, because that group ID will otherwise be potentially omitted. Availability: Unix. New in version 3.3. os.getgroups() Return list of supplemental group ids associated with the current process. Availability: Unix. Note: On macOS, "getgroups()" behavior differs somewhat from other Unix platforms. If the Python interpreter was built with a deployment target of "10.5" or earlier, "getgroups()" returns the list of effective group ids associated with the current user process; this list is limited to a system-defined number of entries, typically 16, and may be modified by calls to "setgroups()" if suitably privileged. If built with a deployment target greater than "10.5", "getgroups()" returns the current group access list for the user associated with the effective user id of the process; the group access list may change over the lifetime of the process, it is not affected by calls to "setgroups()", and its length is not limited to 16. The deployment target value, "MACOSX_DEPLOYMENT_TARGET", can be obtained with "sysconfig.get_config_var()". os.getlogin() Return the name of the user logged in on the controlling terminal of the process. For most purposes, it is more useful to use "getpass.getuser()" since the latter checks the environment variables "LOGNAME" or "USERNAME" to find out who the user is, and falls back to "pwd.getpwuid(os.getuid())[0]" to get the login name of the current real user id. Availability: Unix, Windows. os.getpgid(pid) Return the process group id of the process with process id *pid*. If *pid* is 0, the process group id of the current process is returned. Availability: Unix. os.getpgrp() Return the id of the current process group. Availability: Unix. os.getpid() Return the current process id. os.getppid() Return the parent’s process id. When the parent process has exited, on Unix the id returned is the one of the init process (1), on Windows it is still the same id, which may be already reused by another process. Availability: Unix, Windows. Changed in version 3.2: Added support for Windows. os.getpriority(which, who) Get program scheduling priority. The value *which* is one of "PRIO_PROCESS", "PRIO_PGRP", or "PRIO_USER", and *who* is interpreted relative to *which* (a process identifier for "PRIO_PROCESS", process group identifier for "PRIO_PGRP", and a user ID for "PRIO_USER"). A zero value for *who* denotes (respectively) the calling process, the process group of the calling process, or the real user ID of the calling process. Availability: Unix. New in version 3.3. os.PRIO_PROCESS os.PRIO_PGRP os.PRIO_USER Parameters for the "getpriority()" and "setpriority()" functions. Availability: Unix. New in version 3.3. os.getresuid() Return a tuple (ruid, euid, suid) denoting the current process’s real, effective, and saved user ids. Availability: Unix. New in version 3.2. os.getresgid() Return a tuple (rgid, egid, sgid) denoting the current process’s real, effective, and saved group ids. Availability: Unix. New in version 3.2. os.getuid() Return the current process’s real user id. Availability: Unix. os.initgroups(username, gid) Call the system initgroups() to initialize the group access list with all of the groups of which the specified username is a member, plus the specified group id. Availability: Unix. New in version 3.2. os.putenv(key, value) Set the environment variable named *key* to the string *value*. Such changes to the environment affect subprocesses started with "os.system()", "popen()" or "fork()" and "execv()". Assignments to items in "os.environ" are automatically translated into corresponding calls to "putenv()"; however, calls to "putenv()" don’t update "os.environ", so it is actually preferable to assign to items of "os.environ". This also applies to "getenv()" and "getenvb()", which respectively use "os.environ" and "os.environb" in their implementations. Note: On some platforms, including FreeBSD and macOS, setting "environ" may cause memory leaks. Refer to the system documentation for "putenv()". Raises an auditing event "os.putenv" with arguments "key", "value". Changed in version 3.9: The function is now always available. os.setegid(egid) Set the current process’s effective group id. Availability: Unix. os.seteuid(euid) Set the current process’s effective user id. Availability: Unix. os.setgid(gid) Set the current process’ group id. Availability: Unix. os.setgroups(groups) Set the list of supplemental group ids associated with the current process to *groups*. *groups* must be a sequence, and each element must be an integer identifying a group. This operation is typically available only to the superuser. Availability: Unix. Note: On macOS, the length of *groups* may not exceed the system- defined maximum number of effective group ids, typically 16. See the documentation for "getgroups()" for cases where it may not return the same group list set by calling setgroups(). os.setpgrp() Call the system call "setpgrp()" or "setpgrp(0, 0)" depending on which version is implemented (if any). See the Unix manual for the semantics. Availability: Unix. os.setpgid(pid, pgrp) Call the system call "setpgid()" to set the process group id of the process with id *pid* to the process group with id *pgrp*. See the Unix manual for the semantics. Availability: Unix. os.setpriority(which, who, priority) Set program scheduling priority. The value *which* is one of "PRIO_PROCESS", "PRIO_PGRP", or "PRIO_USER", and *who* is interpreted relative to *which* (a process identifier for "PRIO_PROCESS", process group identifier for "PRIO_PGRP", and a user ID for "PRIO_USER"). A zero value for *who* denotes (respectively) the calling process, the process group of the calling process, or the real user ID of the calling process. *priority* is a value in the range -20 to 19. The default priority is 0; lower priorities cause more favorable scheduling. Availability: Unix. New in version 3.3. os.setregid(rgid, egid) Set the current process’s real and effective group ids. Availability: Unix. os.setresgid(rgid, egid, sgid) Set the current process’s real, effective, and saved group ids. Availability: Unix. New in version 3.2. os.setresuid(ruid, euid, suid) Set the current process’s real, effective, and saved user ids. Availability: Unix. New in version 3.2. os.setreuid(ruid, euid) Set the current process’s real and effective user ids. Availability: Unix. os.getsid(pid) Call the system call "getsid()". See the Unix manual for the semantics. Availability: Unix. os.setsid() Call the system call "setsid()". See the Unix manual for the semantics. Availability: Unix. os.setuid(uid) Set the current process’s user id. Availability: Unix. os.strerror(code) Return the error message corresponding to the error code in *code*. On platforms where "strerror()" returns "NULL" when given an unknown error number, "ValueError" is raised. os.supports_bytes_environ "True" if the native OS type of the environment is bytes (eg. "False" on Windows). New in version 3.2. os.umask(mask) Set the current numeric umask and return the previous umask. os.uname() Returns information identifying the current operating system. The return value is an object with five attributes: * "sysname" - operating system name * "nodename" - name of machine on network (implementation-defined) * "release" - operating system release * "version" - operating system version * "machine" - hardware identifier For backwards compatibility, this object is also iterable, behaving like a five-tuple containing "sysname", "nodename", "release", "version", and "machine" in that order. Some systems truncate "nodename" to 8 characters or to the leading component; a better way to get the hostname is "socket.gethostname()" or even "socket.gethostbyaddr(socket.gethostname())". Availability: recent flavors of Unix. Changed in version 3.3: Return type changed from a tuple to a tuple-like object with named attributes. os.unsetenv(key) Unset (delete) the environment variable named *key*. Such changes to the environment affect subprocesses started with "os.system()", "popen()" or "fork()" and "execv()". Deletion of items in "os.environ" is automatically translated into a corresponding call to "unsetenv()"; however, calls to "unsetenv()" don’t update "os.environ", so it is actually preferable to delete items of "os.environ". Raises an auditing event "os.unsetenv" with argument "key". Changed in version 3.9: The function is now always available and is also available on Windows. File Object Creation ==================== These functions create new *file objects*. (See also "open()" for opening file descriptors.) os.fdopen(fd, *args, **kwargs) Return an open file object connected to the file descriptor *fd*. This is an alias of the "open()" built-in function and accepts the same arguments. The only difference is that the first argument of "fdopen()" must always be an integer. File Descriptor Operations ========================== These functions operate on I/O streams referenced using file descriptors. File descriptors are small integers corresponding to a file that has been opened by the current process. For example, standard input is usually file descriptor 0, standard output is 1, and standard error is 2. Further files opened by a process will then be assigned 3, 4, 5, and so forth. The name “file descriptor” is slightly deceptive; on Unix platforms, sockets and pipes are also referenced by file descriptors. The "fileno()" method can be used to obtain the file descriptor associated with a *file object* when required. Note that using the file descriptor directly will bypass the file object methods, ignoring aspects such as internal buffering of data. os.close(fd) Close file descriptor *fd*. Note: This function is intended for low-level I/O and must be applied to a file descriptor as returned by "os.open()" or "pipe()". To close a “file object” returned by the built-in function "open()" or by "popen()" or "fdopen()", use its "close()" method. os.closerange(fd_low, fd_high) Close all file descriptors from *fd_low* (inclusive) to *fd_high* (exclusive), ignoring errors. Equivalent to (but much faster than): for fd in range(fd_low, fd_high): try: os.close(fd) except OSError: pass os.copy_file_range(src, dst, count, offset_src=None, offset_dst=None) Copy *count* bytes from file descriptor *src*, starting from offset *offset_src*, to file descriptor *dst*, starting from offset *offset_dst*. If *offset_src* is None, then *src* is read from the current position; respectively for *offset_dst*. The files pointed by *src* and *dst* must reside in the same filesystem, otherwise an "OSError" is raised with "errno" set to "errno.EXDEV". This copy is done without the additional cost of transferring data from the kernel to user space and then back into the kernel. Additionally, some filesystems could implement extra optimizations. The copy is done as if both files are opened as binary. The return value is the amount of bytes copied. This could be less than the amount requested. Availability: Linux kernel >= 4.5 or glibc >= 2.27. New in version 3.8. os.device_encoding(fd) Return a string describing the encoding of the device associated with *fd* if it is connected to a terminal; else return "None". os.dup(fd) Return a duplicate of file descriptor *fd*. The new file descriptor is non-inheritable. On Windows, when duplicating a standard stream (0: stdin, 1: stdout, 2: stderr), the new file descriptor is inheritable. Changed in version 3.4: The new file descriptor is now non- inheritable. os.dup2(fd, fd2, inheritable=True) Duplicate file descriptor *fd* to *fd2*, closing the latter first if necessary. Return *fd2*. The new file descriptor is inheritable by default or non-inheritable if *inheritable* is "False". Changed in version 3.4: Add the optional *inheritable* parameter. Changed in version 3.7: Return *fd2* on success. Previously, "None" was always returned. os.fchmod(fd, mode) Change the mode of the file given by *fd* to the numeric *mode*. See the docs for "chmod()" for possible values of *mode*. As of Python 3.3, this is equivalent to "os.chmod(fd, mode)". Raises an auditing event "os.chmod" with arguments "path", "mode", "dir_fd". Availability: Unix. os.fchown(fd, uid, gid) Change the owner and group id of the file given by *fd* to the numeric *uid* and *gid*. To leave one of the ids unchanged, set it to -1. See "chown()". As of Python 3.3, this is equivalent to "os.chown(fd, uid, gid)". Raises an auditing event "os.chown" with arguments "path", "uid", "gid", "dir_fd". Availability: Unix. os.fdatasync(fd) Force write of file with filedescriptor *fd* to disk. Does not force update of metadata. Availability: Unix. Note: This function is not available on MacOS. os.fpathconf(fd, name) Return system configuration information relevant to an open file. *name* specifies the configuration value to retrieve; it may be a string which is the name of a defined system value; these names are specified in a number of standards (POSIX.1, Unix 95, Unix 98, and others). Some platforms define additional names as well. The names known to the host operating system are given in the "pathconf_names" dictionary. For configuration variables not included in that mapping, passing an integer for *name* is also accepted. If *name* is a string and is not known, "ValueError" is raised. If a specific value for *name* is not supported by the host system, even if it is included in "pathconf_names", an "OSError" is raised with "errno.EINVAL" for the error number. As of Python 3.3, this is equivalent to "os.pathconf(fd, name)". Availability: Unix. os.fstat(fd) Get the status of the file descriptor *fd*. Return a "stat_result" object. As of Python 3.3, this is equivalent to "os.stat(fd)". See also: The "stat()" function. os.fstatvfs(fd) Return information about the filesystem containing the file associated with file descriptor *fd*, like "statvfs()". As of Python 3.3, this is equivalent to "os.statvfs(fd)". Availability: Unix. os.fsync(fd) Force write of file with filedescriptor *fd* to disk. On Unix, this calls the native "fsync()" function; on Windows, the MS "_commit()" function. If you’re starting with a buffered Python *file object* *f*, first do "f.flush()", and then do "os.fsync(f.fileno())", to ensure that all internal buffers associated with *f* are written to disk. Availability: Unix, Windows. os.ftruncate(fd, length) Truncate the file corresponding to file descriptor *fd*, so that it is at most *length* bytes in size. As of Python 3.3, this is equivalent to "os.truncate(fd, length)". Raises an auditing event "os.truncate" with arguments "fd", "length". Availability: Unix, Windows. Changed in version 3.5: Added support for Windows os.get_blocking(fd) Get the blocking mode of the file descriptor: "False" if the "O_NONBLOCK" flag is set, "True" if the flag is cleared. See also "set_blocking()" and "socket.socket.setblocking()". Availability: Unix. New in version 3.5. os.isatty(fd) Return "True" if the file descriptor *fd* is open and connected to a tty(-like) device, else "False". os.lockf(fd, cmd, len) Apply, test or remove a POSIX lock on an open file descriptor. *fd* is an open file descriptor. *cmd* specifies the command to use - one of "F_LOCK", "F_TLOCK", "F_ULOCK" or "F_TEST". *len* specifies the section of the file to lock. Raises an auditing event "os.lockf" with arguments "fd", "cmd", "len". Availability: Unix. New in version 3.3. os.F_LOCK os.F_TLOCK os.F_ULOCK os.F_TEST Flags that specify what action "lockf()" will take. Availability: Unix. New in version 3.3. os.lseek(fd, pos, how) Set the current position of file descriptor *fd* to position *pos*, modified by *how*: "SEEK_SET" or "0" to set the position relative to the beginning of the file; "SEEK_CUR" or "1" to set it relative to the current position; "SEEK_END" or "2" to set it relative to the end of the file. Return the new cursor position in bytes, starting from the beginning. os.SEEK_SET os.SEEK_CUR os.SEEK_END Parameters to the "lseek()" function. Their values are 0, 1, and 2, respectively. New in version 3.3: Some operating systems could support additional values, like "os.SEEK_HOLE" or "os.SEEK_DATA". os.open(path, flags, mode=0o777, *, dir_fd=None) Open the file *path* and set various flags according to *flags* and possibly its mode according to *mode*. When computing *mode*, the current umask value is first masked out. Return the file descriptor for the newly opened file. The new file descriptor is non-inheritable. For a description of the flag and mode values, see the C run-time documentation; flag constants (like "O_RDONLY" and "O_WRONLY") are defined in the "os" module. In particular, on Windows adding "O_BINARY" is needed to open files in binary mode. This function can support paths relative to directory descriptors with the *dir_fd* parameter. Raises an auditing event "open" with arguments "path", "mode", "flags". Changed in version 3.4: The new file descriptor is now non- inheritable. Note: This function is intended for low-level I/O. For normal usage, use the built-in function "open()", which returns a *file object* with "read()" and "write()" methods (and many more). To wrap a file descriptor in a file object, use "fdopen()". New in version 3.3: The *dir_fd* argument. Changed in version 3.5: If the system call is interrupted and the signal handler does not raise an exception, the function now retries the system call instead of raising an "InterruptedError" exception (see **PEP 475** for the rationale). Changed in version 3.6: Accepts a *path-like object*. The following constants are options for the *flags* parameter to the "open()" function. They can be combined using the bitwise OR operator "|". Some of them are not available on all platforms. For descriptions of their availability and use, consult the *open(2)* manual page on Unix or the MSDN on Windows. os.O_RDONLY os.O_WRONLY os.O_RDWR os.O_APPEND os.O_CREAT os.O_EXCL os.O_TRUNC The above constants are available on Unix and Windows. os.O_DSYNC os.O_RSYNC os.O_SYNC os.O_NDELAY os.O_NONBLOCK os.O_NOCTTY os.O_CLOEXEC The above constants are only available on Unix. Changed in version 3.3: Add "O_CLOEXEC" constant. os.O_BINARY os.O_NOINHERIT os.O_SHORT_LIVED os.O_TEMPORARY os.O_RANDOM os.O_SEQUENTIAL os.O_TEXT The above constants are only available on Windows. os.O_ASYNC os.O_DIRECT os.O_DIRECTORY os.O_NOFOLLOW os.O_NOATIME os.O_PATH os.O_TMPFILE os.O_SHLOCK os.O_EXLOCK The above constants are extensions and not present if they are not defined by the C library. Changed in version 3.4: Add "O_PATH" on systems that support it. Add "O_TMPFILE", only available on Linux Kernel 3.11 or newer. os.openpty() Open a new pseudo-terminal pair. Return a pair of file descriptors "(master, slave)" for the pty and the tty, respectively. The new file descriptors are non-inheritable. For a (slightly) more portable approach, use the "pty" module. Availability: some flavors of Unix. Changed in version 3.4: The new file descriptors are now non- inheritable. os.pipe() Create a pipe. Return a pair of file descriptors "(r, w)" usable for reading and writing, respectively. The new file descriptor is non-inheritable. Availability: Unix, Windows. Changed in version 3.4: The new file descriptors are now non- inheritable. os.pipe2(flags) Create a pipe with *flags* set atomically. *flags* can be constructed by ORing together one or more of these values: "O_NONBLOCK", "O_CLOEXEC". Return a pair of file descriptors "(r, w)" usable for reading and writing, respectively. Availability: some flavors of Unix. New in version 3.3. os.posix_fallocate(fd, offset, len) Ensures that enough disk space is allocated for the file specified by *fd* starting from *offset* and continuing for *len* bytes. Availability: Unix. New in version 3.3. os.posix_fadvise(fd, offset, len, advice) Announces an intention to access data in a specific pattern thus allowing the kernel to make optimizations. The advice applies to the region of the file specified by *fd* starting at *offset* and continuing for *len* bytes. *advice* is one of "POSIX_FADV_NORMAL", "POSIX_FADV_SEQUENTIAL", "POSIX_FADV_RANDOM", "POSIX_FADV_NOREUSE", "POSIX_FADV_WILLNEED" or "POSIX_FADV_DONTNEED". Availability: Unix. New in version 3.3. os.POSIX_FADV_NORMAL os.POSIX_FADV_SEQUENTIAL os.POSIX_FADV_RANDOM os.POSIX_FADV_NOREUSE os.POSIX_FADV_WILLNEED os.POSIX_FADV_DONTNEED Flags that can be used in *advice* in "posix_fadvise()" that specify the access pattern that is likely to be used. Availability: Unix. New in version 3.3. os.pread(fd, n, offset) Read at most *n* bytes from file descriptor *fd* at a position of *offset*, leaving the file offset unchanged. Return a bytestring containing the bytes read. If the end of the file referred to by *fd* has been reached, an empty bytes object is returned. Availability: Unix. New in version 3.3. os.preadv(fd, buffers, offset, flags=0) Read from a file descriptor *fd* at a position of *offset* into mutable *bytes-like objects* *buffers*, leaving the file offset unchanged. Transfer data into each buffer until it is full and then move on to the next buffer in the sequence to hold the rest of the data. The flags argument contains a bitwise OR of zero or more of the following flags: * "RWF_HIPRI" * "RWF_NOWAIT" Return the total number of bytes actually read which can be less than the total capacity of all the objects. The operating system may set a limit ("sysconf()" value "'SC_IOV_MAX'") on the number of buffers that can be used. Combine the functionality of "os.readv()" and "os.pread()". Availability: Linux 2.6.30 and newer, FreeBSD 6.0 and newer, OpenBSD 2.7 and newer, AIX 7.1 and newer. Using flags requires Linux 4.6 or newer. New in version 3.7. os.RWF_NOWAIT Do not wait for data which is not immediately available. If this flag is specified, the system call will return instantly if it would have to read data from the backing storage or wait for a lock. If some data was successfully read, it will return the number of bytes read. If no bytes were read, it will return "-1" and set errno to "errno.EAGAIN". Availability: Linux 4.14 and newer. New in version 3.7. os.RWF_HIPRI High priority read/write. Allows block-based filesystems to use polling of the device, which provides lower latency, but may use additional resources. Currently, on Linux, this feature is usable only on a file descriptor opened using the "O_DIRECT" flag. Availability: Linux 4.6 and newer. New in version 3.7. os.pwrite(fd, str, offset) Write the bytestring in *str* to file descriptor *fd* at position of *offset*, leaving the file offset unchanged. Return the number of bytes actually written. Availability: Unix. New in version 3.3. os.pwritev(fd, buffers, offset, flags=0) Write the *buffers* contents to file descriptor *fd* at a offset *offset*, leaving the file offset unchanged. *buffers* must be a sequence of *bytes-like objects*. Buffers are processed in array order. Entire contents of the first buffer is written before proceeding to the second, and so on. The flags argument contains a bitwise OR of zero or more of the following flags: * "RWF_DSYNC" * "RWF_SYNC" Return the total number of bytes actually written. The operating system may set a limit ("sysconf()" value "'SC_IOV_MAX'") on the number of buffers that can be used. Combine the functionality of "os.writev()" and "os.pwrite()". Availability: Linux 2.6.30 and newer, FreeBSD 6.0 and newer, OpenBSD 2.7 and newer, AIX 7.1 and newer. Using flags requires Linux 4.7 or newer. New in version 3.7. os.RWF_DSYNC Provide a per-write equivalent of the "O_DSYNC" "open(2)" flag. This flag effect applies only to the data range written by the system call. Availability: Linux 4.7 and newer. New in version 3.7. os.RWF_SYNC Provide a per-write equivalent of the "O_SYNC" "open(2)" flag. This flag effect applies only to the data range written by the system call. Availability: Linux 4.7 and newer. New in version 3.7. os.read(fd, n) Read at most *n* bytes from file descriptor *fd*. Return a bytestring containing the bytes read. If the end of the file referred to by *fd* has been reached, an empty bytes object is returned. Note: This function is intended for low-level I/O and must be applied to a file descriptor as returned by "os.open()" or "pipe()". To read a “file object” returned by the built-in function "open()" or by "popen()" or "fdopen()", or "sys.stdin", use its "read()" or "readline()" methods. Changed in version 3.5: If the system call is interrupted and the signal handler does not raise an exception, the function now retries the system call instead of raising an "InterruptedError" exception (see **PEP 475** for the rationale). os.sendfile(out_fd, in_fd, offset, count) os.sendfile(out_fd, in_fd, offset, count, headers=(), trailers=(), flags=0) Copy *count* bytes from file descriptor *in_fd* to file descriptor *out_fd* starting at *offset*. Return the number of bytes sent. When EOF is reached return "0". The first function notation is supported by all platforms that define "sendfile()". On Linux, if *offset* is given as "None", the bytes are read from the current position of *in_fd* and the position of *in_fd* is updated. The second case may be used on macOS and FreeBSD where *headers* and *trailers* are arbitrary sequences of buffers that are written before and after the data from *in_fd* is written. It returns the same as the first case. On macOS and FreeBSD, a value of "0" for *count* specifies to send until the end of *in_fd* is reached. All platforms support sockets as *out_fd* file descriptor, and some platforms allow other types (e.g. regular file, pipe) as well. Cross-platform applications should not use *headers*, *trailers* and *flags* arguments. Availability: Unix. Note: For a higher-level wrapper of "sendfile()", see "socket.socket.sendfile()". New in version 3.3. Changed in version 3.9: Parameters *out* and *in* was renamed to *out_fd* and *in_fd*. os.set_blocking(fd, blocking) Set the blocking mode of the specified file descriptor. Set the "O_NONBLOCK" flag if blocking is "False", clear the flag otherwise. See also "get_blocking()" and "socket.socket.setblocking()". Availability: Unix. New in version 3.5. os.SF_NODISKIO os.SF_MNOWAIT os.SF_SYNC Parameters to the "sendfile()" function, if the implementation supports them. Availability: Unix. New in version 3.3. os.readv(fd, buffers) Read from a file descriptor *fd* into a number of mutable *bytes- like objects* *buffers*. Transfer data into each buffer until it is full and then move on to the next buffer in the sequence to hold the rest of the data. Return the total number of bytes actually read which can be less than the total capacity of all the objects. The operating system may set a limit ("sysconf()" value "'SC_IOV_MAX'") on the number of buffers that can be used. Availability: Unix. New in version 3.3. os.tcgetpgrp(fd) Return the process group associated with the terminal given by *fd* (an open file descriptor as returned by "os.open()"). Availability: Unix. os.tcsetpgrp(fd, pg) Set the process group associated with the terminal given by *fd* (an open file descriptor as returned by "os.open()") to *pg*. Availability: Unix. os.ttyname(fd) Return a string which specifies the terminal device associated with file descriptor *fd*. If *fd* is not associated with a terminal device, an exception is raised. Availability: Unix. os.write(fd, str) Write the bytestring in *str* to file descriptor *fd*. Return the number of bytes actually written. Note: This function is intended for low-level I/O and must be applied to a file descriptor as returned by "os.open()" or "pipe()". To write a “file object” returned by the built-in function "open()" or by "popen()" or "fdopen()", or "sys.stdout" or "sys.stderr", use its "write()" method. Changed in version 3.5: If the system call is interrupted and the signal handler does not raise an exception, the function now retries the system call instead of raising an "InterruptedError" exception (see **PEP 475** for the rationale). os.writev(fd, buffers) Write the contents of *buffers* to file descriptor *fd*. *buffers* must be a sequence of *bytes-like objects*. Buffers are processed in array order. Entire contents of the first buffer is written before proceeding to the second, and so on. Returns the total number of bytes actually written. The operating system may set a limit ("sysconf()" value "'SC_IOV_MAX'") on the number of buffers that can be used. Availability: Unix. New in version 3.3. Querying the size of a terminal ------------------------------- New in version 3.3. os.get_terminal_size(fd=STDOUT_FILENO) Return the size of the terminal window as "(columns, lines)", tuple of type "terminal_size". The optional argument "fd" (default "STDOUT_FILENO", or standard output) specifies which file descriptor should be queried. If the file descriptor is not connected to a terminal, an "OSError" is raised. "shutil.get_terminal_size()" is the high-level function which should normally be used, "os.get_terminal_size" is the low-level implementation. Availability: Unix, Windows. class os.terminal_size A subclass of tuple, holding "(columns, lines)" of the terminal window size. columns Width of the terminal window in characters. lines Height of the terminal window in characters. Inheritance of File Descriptors ------------------------------- New in version 3.4. A file descriptor has an “inheritable” flag which indicates if the file descriptor can be inherited by child processes. Since Python 3.4, file descriptors created by Python are non-inheritable by default. On UNIX, non-inheritable file descriptors are closed in child processes at the execution of a new program, other file descriptors are inherited. On Windows, non-inheritable handles and file descriptors are closed in child processes, except for standard streams (file descriptors 0, 1 and 2: stdin, stdout and stderr), which are always inherited. Using "spawn*" functions, all inheritable handles and all inheritable file descriptors are inherited. Using the "subprocess" module, all file descriptors except standard streams are closed, and inheritable handles are only inherited if the *close_fds* parameter is "False". os.get_inheritable(fd) Get the “inheritable” flag of the specified file descriptor (a boolean). os.set_inheritable(fd, inheritable) Set the “inheritable” flag of the specified file descriptor. os.get_handle_inheritable(handle) Get the “inheritable” flag of the specified handle (a boolean). Availability: Windows. os.set_handle_inheritable(handle, inheritable) Set the “inheritable” flag of the specified handle. Availability: Windows. Files and Directories ===================== On some Unix platforms, many of these functions support one or more of these features: * **specifying a file descriptor:** Normally the *path* argument provided to functions in the "os" module must be a string specifying a file path. However, some functions now alternatively accept an open file descriptor for their *path* argument. The function will then operate on the file referred to by the descriptor. (For POSIX systems, Python will call the variant of the function prefixed with "f" (e.g. call "fchdir" instead of "chdir").) You can check whether or not *path* can be specified as a file descriptor for a particular function on your platform using "os.supports_fd". If this functionality is unavailable, using it will raise a "NotImplementedError". If the function also supports *dir_fd* or *follow_symlinks* arguments, it’s an error to specify one of those when supplying *path* as a file descriptor. * **paths relative to directory descriptors:** If *dir_fd* is not "None", it should be a file descriptor referring to a directory, and the path to operate on should be relative; path will then be relative to that directory. If the path is absolute, *dir_fd* is ignored. (For POSIX systems, Python will call the variant of the function with an "at" suffix and possibly prefixed with "f" (e.g. call "faccessat" instead of "access"). You can check whether or not *dir_fd* is supported for a particular function on your platform using "os.supports_dir_fd". If it’s unavailable, using it will raise a "NotImplementedError". * **not following symlinks:** If *follow_symlinks* is "False", and the last element of the path to operate on is a symbolic link, the function will operate on the symbolic link itself rather than the file pointed to by the link. (For POSIX systems, Python will call the "l..." variant of the function.) You can check whether or not *follow_symlinks* is supported for a particular function on your platform using "os.supports_follow_symlinks". If it’s unavailable, using it will raise a "NotImplementedError". os.access(path, mode, *, dir_fd=None, effective_ids=False, follow_symlinks=True) Use the real uid/gid to test for access to *path*. Note that most operations will use the effective uid/gid, therefore this routine can be used in a suid/sgid environment to test if the invoking user has the specified access to *path*. *mode* should be "F_OK" to test the existence of *path*, or it can be the inclusive OR of one or more of "R_OK", "W_OK", and "X_OK" to test permissions. Return "True" if access is allowed, "False" if not. See the Unix man page *access(2)* for more information. This function can support specifying paths relative to directory descriptors and not following symlinks. If *effective_ids* is "True", "access()" will perform its access checks using the effective uid/gid instead of the real uid/gid. *effective_ids* may not be supported on your platform; you can check whether or not it is available using "os.supports_effective_ids". If it is unavailable, using it will raise a "NotImplementedError". Note: Using "access()" to check if a user is authorized to e.g. open a file before actually doing so using "open()" creates a security hole, because the user might exploit the short time interval between checking and opening the file to manipulate it. It’s preferable to use *EAFP* techniques. For example: if os.access("myfile", os.R_OK): with open("myfile") as fp: return fp.read() return "some default data" is better written as: try: fp = open("myfile") except PermissionError: return "some default data" else: with fp: return fp.read() Note: I/O operations may fail even when "access()" indicates that they would succeed, particularly for operations on network filesystems which may have permissions semantics beyond the usual POSIX permission-bit model. Changed in version 3.3: Added the *dir_fd*, *effective_ids*, and *follow_symlinks* parameters. Changed in version 3.6: Accepts a *path-like object*. os.F_OK os.R_OK os.W_OK os.X_OK Values to pass as the *mode* parameter of "access()" to test the existence, readability, writability and executability of *path*, respectively. os.chdir(path) Change the current working directory to *path*. This function can support specifying a file descriptor. The descriptor must refer to an opened directory, not an open file. This function can raise "OSError" and subclasses such as "FileNotFoundError", "PermissionError", and "NotADirectoryError". Raises an auditing event "os.chdir" with argument "path". New in version 3.3: Added support for specifying *path* as a file descriptor on some platforms. Changed in version 3.6: Accepts a *path-like object*. os.chflags(path, flags, *, follow_symlinks=True) Set the flags of *path* to the numeric *flags*. *flags* may take a combination (bitwise OR) of the following values (as defined in the "stat" module): * "stat.UF_NODUMP" * "stat.UF_IMMUTABLE" * "stat.UF_APPEND" * "stat.UF_OPAQUE" * "stat.UF_NOUNLINK" * "stat.UF_COMPRESSED" * "stat.UF_HIDDEN" * "stat.SF_ARCHIVED" * "stat.SF_IMMUTABLE" * "stat.SF_APPEND" * "stat.SF_NOUNLINK" * "stat.SF_SNAPSHOT" This function can support not following symlinks. Raises an auditing event "os.chflags" with arguments "path", "flags". Availability: Unix. New in version 3.3: The *follow_symlinks* argument. Changed in version 3.6: Accepts a *path-like object*. os.chmod(path, mode, *, dir_fd=None, follow_symlinks=True) Change the mode of *path* to the numeric *mode*. *mode* may take one of the following values (as defined in the "stat" module) or bitwise ORed combinations of them: * "stat.S_ISUID" * "stat.S_ISGID" * "stat.S_ENFMT" * "stat.S_ISVTX" * "stat.S_IREAD" * "stat.S_IWRITE" * "stat.S_IEXEC" * "stat.S_IRWXU" * "stat.S_IRUSR" * "stat.S_IWUSR" * "stat.S_IXUSR" * "stat.S_IRWXG" * "stat.S_IRGRP" * "stat.S_IWGRP" * "stat.S_IXGRP" * "stat.S_IRWXO" * "stat.S_IROTH" * "stat.S_IWOTH" * "stat.S_IXOTH" This function can support specifying a file descriptor, paths relative to directory descriptors and not following symlinks. Note: Although Windows supports "chmod()", you can only set the file’s read-only flag with it (via the "stat.S_IWRITE" and "stat.S_IREAD" constants or a corresponding integer value). All other bits are ignored. Raises an auditing event "os.chmod" with arguments "path", "mode", "dir_fd". New in version 3.3: Added support for specifying *path* as an open file descriptor, and the *dir_fd* and *follow_symlinks* arguments. Changed in version 3.6: Accepts a *path-like object*. os.chown(path, uid, gid, *, dir_fd=None, follow_symlinks=True) Change the owner and group id of *path* to the numeric *uid* and *gid*. To leave one of the ids unchanged, set it to -1. This function can support specifying a file descriptor, paths relative to directory descriptors and not following symlinks. See "shutil.chown()" for a higher-level function that accepts names in addition to numeric ids. Raises an auditing event "os.chown" with arguments "path", "uid", "gid", "dir_fd". Availability: Unix. New in version 3.3: Added support for specifying *path* as an open file descriptor, and the *dir_fd* and *follow_symlinks* arguments. Changed in version 3.6: Supports a *path-like object*. os.chroot(path) Change the root directory of the current process to *path*. Availability: Unix. Changed in version 3.6: Accepts a *path-like object*. os.fchdir(fd) Change the current working directory to the directory represented by the file descriptor *fd*. The descriptor must refer to an opened directory, not an open file. As of Python 3.3, this is equivalent to "os.chdir(fd)". Raises an auditing event "os.chdir" with argument "path". Availability: Unix. os.getcwd() Return a string representing the current working directory. os.getcwdb() Return a bytestring representing the current working directory. Changed in version 3.8: The function now uses the UTF-8 encoding on Windows, rather than the ANSI code page: see **PEP 529** for the rationale. The function is no longer deprecated on Windows. os.lchflags(path, flags) Set the flags of *path* to the numeric *flags*, like "chflags()", but do not follow symbolic links. As of Python 3.3, this is equivalent to "os.chflags(path, flags, follow_symlinks=False)". Raises an auditing event "os.chflags" with arguments "path", "flags". Availability: Unix. Changed in version 3.6: Accepts a *path-like object*. os.lchmod(path, mode) Change the mode of *path* to the numeric *mode*. If path is a symlink, this affects the symlink rather than the target. See the docs for "chmod()" for possible values of *mode*. As of Python 3.3, this is equivalent to "os.chmod(path, mode, follow_symlinks=False)". Raises an auditing event "os.chmod" with arguments "path", "mode", "dir_fd". Availability: Unix. Changed in version 3.6: Accepts a *path-like object*. os.lchown(path, uid, gid) Change the owner and group id of *path* to the numeric *uid* and *gid*. This function will not follow symbolic links. As of Python 3.3, this is equivalent to "os.chown(path, uid, gid, follow_symlinks=False)". Raises an auditing event "os.chown" with arguments "path", "uid", "gid", "dir_fd". Availability: Unix. Changed in version 3.6: Accepts a *path-like object*. os.link(src, dst, *, src_dir_fd=None, dst_dir_fd=None, follow_symlinks=True) Create a hard link pointing to *src* named *dst*. This function can support specifying *src_dir_fd* and/or *dst_dir_fd* to supply paths relative to directory descriptors, and not following symlinks. Raises an auditing event "os.link" with arguments "src", "dst", "src_dir_fd", "dst_dir_fd". Availability: Unix, Windows. Changed in version 3.2: Added Windows support. New in version 3.3: Added the *src_dir_fd*, *dst_dir_fd*, and *follow_symlinks* arguments. Changed in version 3.6: Accepts a *path-like object* for *src* and *dst*. os.listdir(path='.') Return a list containing the names of the entries in the directory given by *path*. The list is in arbitrary order, and does not include the special entries "'.'" and "'..'" even if they are present in the directory. If a file is removed from or added to the directory during the call of this function, whether a name for that file be included is unspecified. *path* may be a *path-like object*. If *path* is of type "bytes" (directly or indirectly through the "PathLike" interface), the filenames returned will also be of type "bytes"; in all other circumstances, they will be of type "str". This function can also support specifying a file descriptor; the file descriptor must refer to a directory. Raises an auditing event "os.listdir" with argument "path". Note: To encode "str" filenames to "bytes", use "fsencode()". See also: The "scandir()" function returns directory entries along with file attribute information, giving better performance for many common use cases. Changed in version 3.2: The *path* parameter became optional. New in version 3.3: Added support for specifying *path* as an open file descriptor. Changed in version 3.6: Accepts a *path-like object*. os.lstat(path, *, dir_fd=None) Perform the equivalent of an "lstat()" system call on the given path. Similar to "stat()", but does not follow symbolic links. Return a "stat_result" object. On platforms that do not support symbolic links, this is an alias for "stat()". As of Python 3.3, this is equivalent to "os.stat(path, dir_fd=dir_fd, follow_symlinks=False)". This function can also support paths relative to directory descriptors. See also: The "stat()" function. Changed in version 3.2: Added support for Windows 6.0 (Vista) symbolic links. Changed in version 3.3: Added the *dir_fd* parameter. Changed in version 3.6: Accepts a *path-like object*. Changed in version 3.8: On Windows, now opens reparse points that represent another path (name surrogates), including symbolic links and directory junctions. Other kinds of reparse points are resolved by the operating system as for "stat()". os.mkdir(path, mode=0o777, *, dir_fd=None) Create a directory named *path* with numeric mode *mode*. If the directory already exists, "FileExistsError" is raised. If a parent directory in the path does not exist, "FileNotFoundError" is raised. On some systems, *mode* is ignored. Where it is used, the current umask value is first masked out. If bits other than the last 9 (i.e. the last 3 digits of the octal representation of the *mode*) are set, their meaning is platform-dependent. On some platforms, they are ignored and you should call "chmod()" explicitly to set them. On Windows, a *mode* of "0o700" is specifically handled to apply access control to the new directory such that only the current user and administrators have access. Other values of *mode* are ignored. This function can also support paths relative to directory descriptors. It is also possible to create temporary directories; see the "tempfile" module’s "tempfile.mkdtemp()" function. Raises an auditing event "os.mkdir" with arguments "path", "mode", "dir_fd". New in version 3.3: The *dir_fd* argument. Changed in version 3.6: Accepts a *path-like object*. Changed in version 3.9.20: Windows now handles a *mode* of "0o700". os.makedirs(name, mode=0o777, exist_ok=False) Recursive directory creation function. Like "mkdir()", but makes all intermediate-level directories needed to contain the leaf directory. The *mode* parameter is passed to "mkdir()" for creating the leaf directory; see the mkdir() description for how it is interpreted. To set the file permission bits of any newly-created parent directories you can set the umask before invoking "makedirs()". The file permission bits of existing parent directories are not changed. If *exist_ok* is "False" (the default), an "FileExistsError" is raised if the target directory already exists. Note: "makedirs()" will become confused if the path elements to create include "pardir" (eg. “..” on UNIX systems). This function handles UNC paths correctly. Raises an auditing event "os.mkdir" with arguments "path", "mode", "dir_fd". New in version 3.2: The *exist_ok* parameter. Changed in version 3.4.1: Before Python 3.4.1, if *exist_ok* was "True" and the directory existed, "makedirs()" would still raise an error if *mode* did not match the mode of the existing directory. Since this behavior was impossible to implement safely, it was removed in Python 3.4.1. See bpo-21082. Changed in version 3.6: Accepts a *path-like object*. Changed in version 3.7: The *mode* argument no longer affects the file permission bits of newly-created intermediate-level directories. os.mkfifo(path, mode=0o666, *, dir_fd=None) Create a FIFO (a named pipe) named *path* with numeric mode *mode*. The current umask value is first masked out from the mode. This function can also support paths relative to directory descriptors. FIFOs are pipes that can be accessed like regular files. FIFOs exist until they are deleted (for example with "os.unlink()"). Generally, FIFOs are used as rendezvous between “client” and “server” type processes: the server opens the FIFO for reading, and the client opens it for writing. Note that "mkfifo()" doesn’t open the FIFO — it just creates the rendezvous point. Availability: Unix. New in version 3.3: The *dir_fd* argument. Changed in version 3.6: Accepts a *path-like object*. os.mknod(path, mode=0o600, device=0, *, dir_fd=None) Create a filesystem node (file, device special file or named pipe) named *path*. *mode* specifies both the permissions to use and the type of node to be created, being combined (bitwise OR) with one of "stat.S_IFREG", "stat.S_IFCHR", "stat.S_IFBLK", and "stat.S_IFIFO" (those constants are available in "stat"). For "stat.S_IFCHR" and "stat.S_IFBLK", *device* defines the newly created device special file (probably using "os.makedev()"), otherwise it is ignored. This function can also support paths relative to directory descriptors. Availability: Unix. New in version 3.3: The *dir_fd* argument. Changed in version 3.6: Accepts a *path-like object*. os.major(device) Extract the device major number from a raw device number (usually the "st_dev" or "st_rdev" field from "stat"). os.minor(device) Extract the device minor number from a raw device number (usually the "st_dev" or "st_rdev" field from "stat"). os.makedev(major, minor) Compose a raw device number from the major and minor device numbers. os.pathconf(path, name) Return system configuration information relevant to a named file. *name* specifies the configuration value to retrieve; it may be a string which is the name of a defined system value; these names are specified in a number of standards (POSIX.1, Unix 95, Unix 98, and others). Some platforms define additional names as well. The names known to the host operating system are given in the "pathconf_names" dictionary. For configuration variables not included in that mapping, passing an integer for *name* is also accepted. If *name* is a string and is not known, "ValueError" is raised. If a specific value for *name* is not supported by the host system, even if it is included in "pathconf_names", an "OSError" is raised with "errno.EINVAL" for the error number. This function can support specifying a file descriptor. Availability: Unix. Changed in version 3.6: Accepts a *path-like object*. os.pathconf_names Dictionary mapping names accepted by "pathconf()" and "fpathconf()" to the integer values defined for those names by the host operating system. This can be used to determine the set of names known to the system. Availability: Unix. os.readlink(path, *, dir_fd=None) Return a string representing the path to which the symbolic link points. The result may be either an absolute or relative pathname; if it is relative, it may be converted to an absolute pathname using "os.path.join(os.path.dirname(path), result)". If the *path* is a string object (directly or indirectly through a "PathLike" interface), the result will also be a string object, and the call may raise a UnicodeDecodeError. If the *path* is a bytes object (direct or indirectly), the result will be a bytes object. This function can also support paths relative to directory descriptors. When trying to resolve a path that may contain links, use "realpath()" to properly handle recursion and platform differences. Availability: Unix, Windows. Changed in version 3.2: Added support for Windows 6.0 (Vista) symbolic links. New in version 3.3: The *dir_fd* argument. Changed in version 3.6: Accepts a *path-like object* on Unix. Changed in version 3.8: Accepts a *path-like object* and a bytes object on Windows. Changed in version 3.8: Added support for directory junctions, and changed to return the substitution path (which typically includes "\\?\" prefix) rather than the optional “print name” field that was previously returned. os.remove(path, *, dir_fd=None) Remove (delete) the file *path*. If *path* is a directory, an "IsADirectoryError" is raised. Use "rmdir()" to remove directories. If the file does not exist, a "FileNotFoundError" is raised. This function can support paths relative to directory descriptors. On Windows, attempting to remove a file that is in use causes an exception to be raised; on Unix, the directory entry is removed but the storage allocated to the file is not made available until the original file is no longer in use. This function is semantically identical to "unlink()". Raises an auditing event "os.remove" with arguments "path", "dir_fd". New in version 3.3: The *dir_fd* argument. Changed in version 3.6: Accepts a *path-like object*. os.removedirs(name) Remove directories recursively. Works like "rmdir()" except that, if the leaf directory is successfully removed, "removedirs()" tries to successively remove every parent directory mentioned in *path* until an error is raised (which is ignored, because it generally means that a parent directory is not empty). For example, "os.removedirs('foo/bar/baz')" will first remove the directory "'foo/bar/baz'", and then remove "'foo/bar'" and "'foo'" if they are empty. Raises "OSError" if the leaf directory could not be successfully removed. Raises an auditing event "os.remove" with arguments "path", "dir_fd". Changed in version 3.6: Accepts a *path-like object*. os.rename(src, dst, *, src_dir_fd=None, dst_dir_fd=None) Rename the file or directory *src* to *dst*. If *dst* exists, the operation will fail with an "OSError" subclass in a number of cases: On Windows, if *dst* exists a "FileExistsError" is always raised. On Unix, if *src* is a file and *dst* is a directory or vice-versa, an "IsADirectoryError" or a "NotADirectoryError" will be raised respectively. If both are directories and *dst* is empty, *dst* will be silently replaced. If *dst* is a non-empty directory, an "OSError" is raised. If both are files, *dst* it will be replaced silently if the user has permission. The operation may fail on some Unix flavors if *src* and *dst* are on different filesystems. If successful, the renaming will be an atomic operation (this is a POSIX requirement). This function can support specifying *src_dir_fd* and/or *dst_dir_fd* to supply paths relative to directory descriptors. If you want cross-platform overwriting of the destination, use "replace()". Raises an auditing event "os.rename" with arguments "src", "dst", "src_dir_fd", "dst_dir_fd". New in version 3.3: The *src_dir_fd* and *dst_dir_fd* arguments. Changed in version 3.6: Accepts a *path-like object* for *src* and *dst*. os.renames(old, new) Recursive directory or file renaming function. Works like "rename()", except creation of any intermediate directories needed to make the new pathname good is attempted first. After the rename, directories corresponding to rightmost path segments of the old name will be pruned away using "removedirs()". Note: This function can fail with the new directory structure made if you lack permissions needed to remove the leaf directory or file. Raises an auditing event "os.rename" with arguments "src", "dst", "src_dir_fd", "dst_dir_fd". Changed in version 3.6: Accepts a *path-like object* for *old* and *new*. os.replace(src, dst, *, src_dir_fd=None, dst_dir_fd=None) Rename the file or directory *src* to *dst*. If *dst* is a non- empty directory, "OSError" will be raised. If *dst* exists and is a file, it will be replaced silently if the user has permission. The operation may fail if *src* and *dst* are on different filesystems. If successful, the renaming will be an atomic operation (this is a POSIX requirement). This function can support specifying *src_dir_fd* and/or *dst_dir_fd* to supply paths relative to directory descriptors. Raises an auditing event "os.rename" with arguments "src", "dst", "src_dir_fd", "dst_dir_fd". New in version 3.3. Changed in version 3.6: Accepts a *path-like object* for *src* and *dst*. os.rmdir(path, *, dir_fd=None) Remove (delete) the directory *path*. If the directory does not exist or is not empty, an "FileNotFoundError" or an "OSError" is raised respectively. In order to remove whole directory trees, "shutil.rmtree()" can be used. This function can support paths relative to directory descriptors. Raises an auditing event "os.rmdir" with arguments "path", "dir_fd". New in version 3.3: The *dir_fd* parameter. Changed in version 3.6: Accepts a *path-like object*. os.scandir(path='.') Return an iterator of "os.DirEntry" objects corresponding to the entries in the directory given by *path*. The entries are yielded in arbitrary order, and the special entries "'.'" and "'..'" are not included. If a file is removed from or added to the directory after creating the iterator, whether an entry for that file be included is unspecified. Using "scandir()" instead of "listdir()" can significantly increase the performance of code that also needs file type or file attribute information, because "os.DirEntry" objects expose this information if the operating system provides it when scanning a directory. All "os.DirEntry" methods may perform a system call, but "is_dir()" and "is_file()" usually only require a system call for symbolic links; "os.DirEntry.stat()" always requires a system call on Unix but only requires one for symbolic links on Windows. *path* may be a *path-like object*. If *path* is of type "bytes" (directly or indirectly through the "PathLike" interface), the type of the "name" and "path" attributes of each "os.DirEntry" will be "bytes"; in all other circumstances, they will be of type "str". This function can also support specifying a file descriptor; the file descriptor must refer to a directory. Raises an auditing event "os.scandir" with argument "path". The "scandir()" iterator supports the *context manager* protocol and has the following method: scandir.close() Close the iterator and free acquired resources. This is called automatically when the iterator is exhausted or garbage collected, or when an error happens during iterating. However it is advisable to call it explicitly or use the "with" statement. New in version 3.6. The following example shows a simple use of "scandir()" to display all the files (excluding directories) in the given *path* that don’t start with "'.'". The "entry.is_file()" call will generally not make an additional system call: with os.scandir(path) as it: for entry in it: if not entry.name.startswith('.') and entry.is_file(): print(entry.name) Note: On Unix-based systems, "scandir()" uses the system’s opendir() and readdir() functions. On Windows, it uses the Win32 FindFirstFileW and FindNextFileW functions. New in version 3.5. New in version 3.6: Added support for the *context manager* protocol and the "close()" method. If a "scandir()" iterator is neither exhausted nor explicitly closed a "ResourceWarning" will be emitted in its destructor.The function accepts a *path-like object*. Changed in version 3.7: Added support for file descriptors on Unix. class os.DirEntry Object yielded by "scandir()" to expose the file path and other file attributes of a directory entry. "scandir()" will provide as much of this information as possible without making additional system calls. When a "stat()" or "lstat()" system call is made, the "os.DirEntry" object will cache the result. "os.DirEntry" instances are not intended to be stored in long-lived data structures; if you know the file metadata has changed or if a long time has elapsed since calling "scandir()", call "os.stat(entry.path)" to fetch up-to-date information. Because the "os.DirEntry" methods can make operating system calls, they may also raise "OSError". If you need very fine-grained control over errors, you can catch "OSError" when calling one of the "os.DirEntry" methods and handle as appropriate. To be directly usable as a *path-like object*, "os.DirEntry" implements the "PathLike" interface. Attributes and methods on a "os.DirEntry" instance are as follows: name The entry’s base filename, relative to the "scandir()" *path* argument. The "name" attribute will be "bytes" if the "scandir()" *path* argument is of type "bytes" and "str" otherwise. Use "fsdecode()" to decode byte filenames. path The entry’s full path name: equivalent to "os.path.join(scandir_path, entry.name)" where *scandir_path* is the "scandir()" *path* argument. The path is only absolute if the "scandir()" *path* argument was absolute. If the "scandir()" *path* argument was a file descriptor, the "path" attribute is the same as the "name" attribute. The "path" attribute will be "bytes" if the "scandir()" *path* argument is of type "bytes" and "str" otherwise. Use "fsdecode()" to decode byte filenames. inode() Return the inode number of the entry. The result is cached on the "os.DirEntry" object. Use "os.stat(entry.path, follow_symlinks=False).st_ino" to fetch up- to-date information. On the first, uncached call, a system call is required on Windows but not on Unix. is_dir(*, follow_symlinks=True) Return "True" if this entry is a directory or a symbolic link pointing to a directory; return "False" if the entry is or points to any other kind of file, or if it doesn’t exist anymore. If *follow_symlinks* is "False", return "True" only if this entry is a directory (without following symlinks); return "False" if the entry is any other kind of file or if it doesn’t exist anymore. The result is cached on the "os.DirEntry" object, with a separate cache for *follow_symlinks* "True" and "False". Call "os.stat()" along with "stat.S_ISDIR()" to fetch up-to-date information. On the first, uncached call, no system call is required in most cases. Specifically, for non-symlinks, neither Windows or Unix require a system call, except on certain Unix file systems, such as network file systems, that return "dirent.d_type == DT_UNKNOWN". If the entry is a symlink, a system call will be required to follow the symlink unless *follow_symlinks* is "False". This method can raise "OSError", such as "PermissionError", but "FileNotFoundError" is caught and not raised. is_file(*, follow_symlinks=True) Return "True" if this entry is a file or a symbolic link pointing to a file; return "False" if the entry is or points to a directory or other non-file entry, or if it doesn’t exist anymore. If *follow_symlinks* is "False", return "True" only if this entry is a file (without following symlinks); return "False" if the entry is a directory or other non-file entry, or if it doesn’t exist anymore. The result is cached on the "os.DirEntry" object. Caching, system calls made, and exceptions raised are as per "is_dir()". is_symlink() Return "True" if this entry is a symbolic link (even if broken); return "False" if the entry points to a directory or any kind of file, or if it doesn’t exist anymore. The result is cached on the "os.DirEntry" object. Call "os.path.islink()" to fetch up-to-date information. On the first, uncached call, no system call is required in most cases. Specifically, neither Windows or Unix require a system call, except on certain Unix file systems, such as network file systems, that return "dirent.d_type == DT_UNKNOWN". This method can raise "OSError", such as "PermissionError", but "FileNotFoundError" is caught and not raised. stat(*, follow_symlinks=True) Return a "stat_result" object for this entry. This method follows symbolic links by default; to stat a symbolic link add the "follow_symlinks=False" argument. On Unix, this method always requires a system call. On Windows, it only requires a system call if *follow_symlinks* is "True" and the entry is a reparse point (for example, a symbolic link or directory junction). On Windows, the "st_ino", "st_dev" and "st_nlink" attributes of the "stat_result" are always set to zero. Call "os.stat()" to get these attributes. The result is cached on the "os.DirEntry" object, with a separate cache for *follow_symlinks* "True" and "False". Call "os.stat()" to fetch up-to-date information. Note that there is a nice correspondence between several attributes and methods of "os.DirEntry" and of "pathlib.Path". In particular, the "name" attribute has the same meaning, as do the "is_dir()", "is_file()", "is_symlink()" and "stat()" methods. New in version 3.5. Changed in version 3.6: Added support for the "PathLike" interface. Added support for "bytes" paths on Windows. os.stat(path, *, dir_fd=None, follow_symlinks=True) Get the status of a file or a file descriptor. Perform the equivalent of a "stat()" system call on the given path. *path* may be specified as either a string or bytes – directly or indirectly through the "PathLike" interface – or as an open file descriptor. Return a "stat_result" object. This function normally follows symlinks; to stat a symlink add the argument "follow_symlinks=False", or use "lstat()". This function can support specifying a file descriptor and not following symlinks. On Windows, passing "follow_symlinks=False" will disable following all name-surrogate reparse points, which includes symlinks and directory junctions. Other types of reparse points that do not resemble links or that the operating system is unable to follow will be opened directly. When following a chain of multiple links, this may result in the original link being returned instead of the non-link that prevented full traversal. To obtain stat results for the final path in this case, use the "os.path.realpath()" function to resolve the path name as far as possible and call "lstat()" on the result. This does not apply to dangling symlinks or junction points, which will raise the usual exceptions. Example: >>> import os >>> statinfo = os.stat('somefile.txt') >>> statinfo os.stat_result(st_mode=33188, st_ino=7876932, st_dev=234881026, st_nlink=1, st_uid=501, st_gid=501, st_size=264, st_atime=1297230295, st_mtime=1297230027, st_ctime=1297230027) >>> statinfo.st_size 264 See also: "fstat()" and "lstat()" functions. New in version 3.3: Added the *dir_fd* and *follow_symlinks* arguments, specifying a file descriptor instead of a path. Changed in version 3.6: Accepts a *path-like object*. Changed in version 3.8: On Windows, all reparse points that can be resolved by the operating system are now followed, and passing "follow_symlinks=False" disables following all name surrogate reparse points. If the operating system reaches a reparse point that it is not able to follow, *stat* now returns the information for the original path as if "follow_symlinks=False" had been specified instead of raising an error. class os.stat_result Object whose attributes correspond roughly to the members of the "stat" structure. It is used for the result of "os.stat()", "os.fstat()" and "os.lstat()". Attributes: st_mode File mode: file type and file mode bits (permissions). st_ino Platform dependent, but if non-zero, uniquely identifies the file for a given value of "st_dev". Typically: * the inode number on Unix, * the file index on Windows st_dev Identifier of the device on which this file resides. st_nlink Number of hard links. st_uid User identifier of the file owner. st_gid Group identifier of the file owner. st_size Size of the file in bytes, if it is a regular file or a symbolic link. The size of a symbolic link is the length of the pathname it contains, without a terminating null byte. Timestamps: st_atime Time of most recent access expressed in seconds. st_mtime Time of most recent content modification expressed in seconds. st_ctime Platform dependent: * the time of most recent metadata change on Unix, * the time of creation on Windows, expressed in seconds. st_atime_ns Time of most recent access expressed in nanoseconds as an integer. st_mtime_ns Time of most recent content modification expressed in nanoseconds as an integer. st_ctime_ns Platform dependent: * the time of most recent metadata change on Unix, * the time of creation on Windows, expressed in nanoseconds as an integer. Note: The exact meaning and resolution of the "st_atime", "st_mtime", and "st_ctime" attributes depend on the operating system and the file system. For example, on Windows systems using the FAT or FAT32 file systems, "st_mtime" has 2-second resolution, and "st_atime" has only 1-day resolution. See your operating system documentation for details.Similarly, although "st_atime_ns", "st_mtime_ns", and "st_ctime_ns" are always expressed in nanoseconds, many systems do not provide nanosecond precision. On systems that do provide nanosecond precision, the floating- point object used to store "st_atime", "st_mtime", and "st_ctime" cannot preserve all of it, and as such will be slightly inexact. If you need the exact timestamps you should always use "st_atime_ns", "st_mtime_ns", and "st_ctime_ns". On some Unix systems (such as Linux), the following attributes may also be available: st_blocks Number of 512-byte blocks allocated for file. This may be smaller than "st_size"/512 when the file has holes. st_blksize “Preferred” blocksize for efficient file system I/O. Writing to a file in smaller chunks may cause an inefficient read-modify- rewrite. st_rdev Type of device if an inode device. st_flags User defined flags for file. On other Unix systems (such as FreeBSD), the following attributes may be available (but may be only filled out if root tries to use them): st_gen File generation number. st_birthtime Time of file creation. On Solaris and derivatives, the following attributes may also be available: st_fstype String that uniquely identifies the type of the filesystem that contains the file. On macOS systems, the following attributes may also be available: st_rsize Real size of the file. st_creator Creator of the file. st_type File type. On Windows systems, the following attributes are also available: st_file_attributes Windows file attributes: "dwFileAttributes" member of the "BY_HANDLE_FILE_INFORMATION" structure returned by "GetFileInformationByHandle()". See the "FILE_ATTRIBUTE_*" constants in the "stat" module. st_reparse_tag When "st_file_attributes" has the "FILE_ATTRIBUTE_REPARSE_POINT" set, this field contains the tag identifying the type of reparse point. See the "IO_REPARSE_TAG_*" constants in the "stat" module. The standard module "stat" defines functions and constants that are useful for extracting information from a "stat" structure. (On Windows, some items are filled with dummy values.) For backward compatibility, a "stat_result" instance is also accessible as a tuple of at least 10 integers giving the most important (and portable) members of the "stat" structure, in the order "st_mode", "st_ino", "st_dev", "st_nlink", "st_uid", "st_gid", "st_size", "st_atime", "st_mtime", "st_ctime". More items may be added at the end by some implementations. For compatibility with older Python versions, accessing "stat_result" as a tuple always returns integers. New in version 3.3: Added the "st_atime_ns", "st_mtime_ns", and "st_ctime_ns" members. New in version 3.5: Added the "st_file_attributes" member on Windows. Changed in version 3.5: Windows now returns the file index as "st_ino" when available. New in version 3.7: Added the "st_fstype" member to Solaris/derivatives. New in version 3.8: Added the "st_reparse_tag" member on Windows. Changed in version 3.8: On Windows, the "st_mode" member now identifies special files as "S_IFCHR", "S_IFIFO" or "S_IFBLK" as appropriate. os.statvfs(path) Perform a "statvfs()" system call on the given path. The return value is an object whose attributes describe the filesystem on the given path, and correspond to the members of the "statvfs" structure, namely: "f_bsize", "f_frsize", "f_blocks", "f_bfree", "f_bavail", "f_files", "f_ffree", "f_favail", "f_flag", "f_namemax", "f_fsid". Two module-level constants are defined for the "f_flag" attribute’s bit-flags: if "ST_RDONLY" is set, the filesystem is mounted read- only, and if "ST_NOSUID" is set, the semantics of setuid/setgid bits are disabled or not supported. Additional module-level constants are defined for GNU/glibc based systems. These are "ST_NODEV" (disallow access to device special files), "ST_NOEXEC" (disallow program execution), "ST_SYNCHRONOUS" (writes are synced at once), "ST_MANDLOCK" (allow mandatory locks on an FS), "ST_WRITE" (write on file/directory/symlink), "ST_APPEND" (append-only file), "ST_IMMUTABLE" (immutable file), "ST_NOATIME" (do not update access times), "ST_NODIRATIME" (do not update directory access times), "ST_RELATIME" (update atime relative to mtime/ctime). This function can support specifying a file descriptor. Availability: Unix. Changed in version 3.2: The "ST_RDONLY" and "ST_NOSUID" constants were added. New in version 3.3: Added support for specifying *path* as an open file descriptor. Changed in version 3.4: The "ST_NODEV", "ST_NOEXEC", "ST_SYNCHRONOUS", "ST_MANDLOCK", "ST_WRITE", "ST_APPEND", "ST_IMMUTABLE", "ST_NOATIME", "ST_NODIRATIME", and "ST_RELATIME" constants were added. Changed in version 3.6: Accepts a *path-like object*. New in version 3.7: Added "f_fsid". os.supports_dir_fd A "set" object indicating which functions in the "os" module accept an open file descriptor for their *dir_fd* parameter. Different platforms provide different features, and the underlying functionality Python uses to implement the *dir_fd* parameter is not available on all platforms Python supports. For consistency’s sake, functions that may support *dir_fd* always allow specifying the parameter, but will throw an exception if the functionality is used when it’s not locally available. (Specifying "None" for *dir_fd* is always supported on all platforms.) To check whether a particular function accepts an open file descriptor for its *dir_fd* parameter, use the "in" operator on "supports_dir_fd". As an example, this expression evaluates to "True" if "os.stat()" accepts open file descriptors for *dir_fd* on the local platform: os.stat in os.supports_dir_fd Currently *dir_fd* parameters only work on Unix platforms; none of them work on Windows. New in version 3.3. os.supports_effective_ids A "set" object indicating whether "os.access()" permits specifying "True" for its *effective_ids* parameter on the local platform. (Specifying "False" for *effective_ids* is always supported on all platforms.) If the local platform supports it, the collection will contain "os.access()"; otherwise it will be empty. This expression evaluates to "True" if "os.access()" supports "effective_ids=True" on the local platform: os.access in os.supports_effective_ids Currently *effective_ids* is only supported on Unix platforms; it does not work on Windows. New in version 3.3. os.supports_fd A "set" object indicating which functions in the "os" module permit specifying their *path* parameter as an open file descriptor on the local platform. Different platforms provide different features, and the underlying functionality Python uses to accept open file descriptors as *path* arguments is not available on all platforms Python supports. To determine whether a particular function permits specifying an open file descriptor for its *path* parameter, use the "in" operator on "supports_fd". As an example, this expression evaluates to "True" if "os.chdir()" accepts open file descriptors for *path* on your local platform: os.chdir in os.supports_fd New in version 3.3. os.supports_follow_symlinks A "set" object indicating which functions in the "os" module accept "False" for their *follow_symlinks* parameter on the local platform. Different platforms provide different features, and the underlying functionality Python uses to implement *follow_symlinks* is not available on all platforms Python supports. For consistency’s sake, functions that may support *follow_symlinks* always allow specifying the parameter, but will throw an exception if the functionality is used when it’s not locally available. (Specifying "True" for *follow_symlinks* is always supported on all platforms.) To check whether a particular function accepts "False" for its *follow_symlinks* parameter, use the "in" operator on "supports_follow_symlinks". As an example, this expression evaluates to "True" if you may specify "follow_symlinks=False" when calling "os.stat()" on the local platform: os.stat in os.supports_follow_symlinks New in version 3.3. os.symlink(src, dst, target_is_directory=False, *, dir_fd=None) Create a symbolic link pointing to *src* named *dst*. On Windows, a symlink represents either a file or a directory, and does not morph to the target dynamically. If the target is present, the type of the symlink will be created to match. Otherwise, the symlink will be created as a directory if *target_is_directory* is "True" or a file symlink (the default) otherwise. On non-Windows platforms, *target_is_directory* is ignored. This function can support paths relative to directory descriptors. Note: On newer versions of Windows 10, unprivileged accounts can create symlinks if Developer Mode is enabled. When Developer Mode is not available/enabled, the *SeCreateSymbolicLinkPrivilege* privilege is required, or the process must be run as an administrator."OSError" is raised when the function is called by an unprivileged user. Raises an auditing event "os.symlink" with arguments "src", "dst", "dir_fd". Availability: Unix, Windows. Changed in version 3.2: Added support for Windows 6.0 (Vista) symbolic links. New in version 3.3: Added the *dir_fd* argument, and now allow *target_is_directory* on non-Windows platforms. Changed in version 3.6: Accepts a *path-like object* for *src* and *dst*. Changed in version 3.8: Added support for unelevated symlinks on Windows with Developer Mode. os.sync() Force write of everything to disk. Availability: Unix. New in version 3.3. os.truncate(path, length) Truncate the file corresponding to *path*, so that it is at most *length* bytes in size. This function can support specifying a file descriptor. Raises an auditing event "os.truncate" with arguments "path", "length". Availability: Unix, Windows. New in version 3.3. Changed in version 3.5: Added support for Windows Changed in version 3.6: Accepts a *path-like object*. os.unlink(path, *, dir_fd=None) Remove (delete) the file *path*. This function is semantically identical to "remove()"; the "unlink" name is its traditional Unix name. Please see the documentation for "remove()" for further information. Raises an auditing event "os.remove" with arguments "path", "dir_fd". New in version 3.3: The *dir_fd* parameter. Changed in version 3.6: Accepts a *path-like object*. os.utime(path, times=None, *[, ns], dir_fd=None, follow_symlinks=True) Set the access and modified times of the file specified by *path*. "utime()" takes two optional parameters, *times* and *ns*. These specify the times set on *path* and are used as follows: * If *ns* is specified, it must be a 2-tuple of the form "(atime_ns, mtime_ns)" where each member is an int expressing nanoseconds. * If *times* is not "None", it must be a 2-tuple of the form "(atime, mtime)" where each member is an int or float expressing seconds. * If *times* is "None" and *ns* is unspecified, this is equivalent to specifying "ns=(atime_ns, mtime_ns)" where both times are the current time. It is an error to specify tuples for both *times* and *ns*. Note that the exact times you set here may not be returned by a subsequent "stat()" call, depending on the resolution with which your operating system records access and modification times; see "stat()". The best way to preserve exact times is to use the *st_atime_ns* and *st_mtime_ns* fields from the "os.stat()" result object with the *ns* parameter to *utime*. This function can support specifying a file descriptor, paths relative to directory descriptors and not following symlinks. Raises an auditing event "os.utime" with arguments "path", "times", "ns", "dir_fd". New in version 3.3: Added support for specifying *path* as an open file descriptor, and the *dir_fd*, *follow_symlinks*, and *ns* parameters. Changed in version 3.6: Accepts a *path-like object*. os.walk(top, topdown=True, onerror=None, followlinks=False) Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For each directory in the tree rooted at directory *top* (including *top* itself), it yields a 3-tuple "(dirpath, dirnames, filenames)". *dirpath* is a string, the path to the directory. *dirnames* is a list of the names of the subdirectories in *dirpath* (excluding "'.'" and "'..'"). *filenames* is a list of the names of the non- directory files in *dirpath*. Note that the names in the lists contain no path components. To get a full path (which begins with *top*) to a file or directory in *dirpath*, do "os.path.join(dirpath, name)". Whether or not the lists are sorted depends on the file system. If a file is removed from or added to the *dirpath* directory during generating the lists, whether a name for that file be included is unspecified. If optional argument *topdown* is "True" or not specified, the triple for a directory is generated before the triples for any of its subdirectories (directories are generated top-down). If *topdown* is "False", the triple for a directory is generated after the triples for all of its subdirectories (directories are generated bottom-up). No matter the value of *topdown*, the list of subdirectories is retrieved before the tuples for the directory and its subdirectories are generated. When *topdown* is "True", the caller can modify the *dirnames* list in-place (perhaps using "del" or slice assignment), and "walk()" will only recurse into the subdirectories whose names remain in *dirnames*; this can be used to prune the search, impose a specific order of visiting, or even to inform "walk()" about directories the caller creates or renames before it resumes "walk()" again. Modifying *dirnames* when *topdown* is "False" has no effect on the behavior of the walk, because in bottom-up mode the directories in *dirnames* are generated before *dirpath* itself is generated. By default, errors from the "scandir()" call are ignored. If optional argument *onerror* is specified, it should be a function; it will be called with one argument, an "OSError" instance. It can report the error to continue with the walk, or raise the exception to abort the walk. Note that the filename is available as the "filename" attribute of the exception object. By default, "walk()" will not walk down into symbolic links that resolve to directories. Set *followlinks* to "True" to visit directories pointed to by symlinks, on systems that support them. Note: Be aware that setting *followlinks* to "True" can lead to infinite recursion if a link points to a parent directory of itself. "walk()" does not keep track of the directories it visited already. Note: If you pass a relative pathname, don’t change the current working directory between resumptions of "walk()". "walk()" never changes the current directory, and assumes that its caller doesn’t either. This example displays the number of bytes taken by non-directory files in each directory under the starting directory, except that it doesn’t look under any CVS subdirectory: import os from os.path import join, getsize for root, dirs, files in os.walk('python/Lib/email'): print(root, "consumes", end=" ") print(sum(getsize(join(root, name)) for name in files), end=" ") print("bytes in", len(files), "non-directory files") if 'CVS' in dirs: dirs.remove('CVS') # don't visit CVS directories In the next example (simple implementation of "shutil.rmtree()"), walking the tree bottom-up is essential, "rmdir()" doesn’t allow deleting a directory before the directory is empty: # Delete everything reachable from the directory named in "top", # assuming there are no symbolic links. # CAUTION: This is dangerous! For example, if top == '/', it # could delete all your disk files. import os for root, dirs, files in os.walk(top, topdown=False): for name in files: os.remove(os.path.join(root, name)) for name in dirs: os.rmdir(os.path.join(root, name)) Raises an auditing event "os.walk" with arguments "top", "topdown", "onerror", "followlinks". Changed in version 3.5: This function now calls "os.scandir()" instead of "os.listdir()", making it faster by reducing the number of calls to "os.stat()". Changed in version 3.6: Accepts a *path-like object*. os.fwalk(top='.', topdown=True, onerror=None, *, follow_symlinks=False, dir_fd=None) This behaves exactly like "walk()", except that it yields a 4-tuple "(dirpath, dirnames, filenames, dirfd)", and it supports "dir_fd". *dirpath*, *dirnames* and *filenames* are identical to "walk()" output, and *dirfd* is a file descriptor referring to the directory *dirpath*. This function always supports paths relative to directory descriptors and not following symlinks. Note however that, unlike other functions, the "fwalk()" default value for *follow_symlinks* is "False". Note: Since "fwalk()" yields file descriptors, those are only valid until the next iteration step, so you should duplicate them (e.g. with "dup()") if you want to keep them longer. This example displays the number of bytes taken by non-directory files in each directory under the starting directory, except that it doesn’t look under any CVS subdirectory: import os for root, dirs, files, rootfd in os.fwalk('python/Lib/email'): print(root, "consumes", end="") print(sum([os.stat(name, dir_fd=rootfd).st_size for name in files]), end="") print("bytes in", len(files), "non-directory files") if 'CVS' in dirs: dirs.remove('CVS') # don't visit CVS directories In the next example, walking the tree bottom-up is essential: "rmdir()" doesn’t allow deleting a directory before the directory is empty: # Delete everything reachable from the directory named in "top", # assuming there are no symbolic links. # CAUTION: This is dangerous! For example, if top == '/', it # could delete all your disk files. import os for root, dirs, files, rootfd in os.fwalk(top, topdown=False): for name in files: os.unlink(name, dir_fd=rootfd) for name in dirs: os.rmdir(name, dir_fd=rootfd) Raises an auditing event "os.fwalk" with arguments "top", "topdown", "onerror", "follow_symlinks", "dir_fd". Availability: Unix. New in version 3.3. Changed in version 3.6: Accepts a *path-like object*. Changed in version 3.7: Added support for "bytes" paths. os.memfd_create(name[, flags=os.MFD_CLOEXEC]) Create an anonymous file and return a file descriptor that refers to it. *flags* must be one of the "os.MFD_*" constants available on the system (or a bitwise ORed combination of them). By default, the new file descriptor is non-inheritable. The name supplied in *name* is used as a filename and will be displayed as the target of the corresponding symbolic link in the directory "/proc/self/fd/". The displayed name is always prefixed with "memfd:" and serves only for debugging purposes. Names do not affect the behavior of the file descriptor, and as such multiple files can have the same name without any side effects. Availability: Linux 3.17 or newer with glibc 2.27 or newer. New in version 3.8. os.MFD_CLOEXEC os.MFD_ALLOW_SEALING os.MFD_HUGETLB os.MFD_HUGE_SHIFT os.MFD_HUGE_MASK os.MFD_HUGE_64KB os.MFD_HUGE_512KB os.MFD_HUGE_1MB os.MFD_HUGE_2MB os.MFD_HUGE_8MB os.MFD_HUGE_16MB os.MFD_HUGE_32MB os.MFD_HUGE_256MB os.MFD_HUGE_512MB os.MFD_HUGE_1GB os.MFD_HUGE_2GB os.MFD_HUGE_16GB These flags can be passed to "memfd_create()". Availability: Linux 3.17 or newer with glibc 2.27 or newer. The "MFD_HUGE*" flags are only available since Linux 4.14. New in version 3.8. Linux extended attributes ------------------------- New in version 3.3. These functions are all available on Linux only. os.getxattr(path, attribute, *, follow_symlinks=True) Return the value of the extended filesystem attribute *attribute* for *path*. *attribute* can be bytes or str (directly or indirectly through the "PathLike" interface). If it is str, it is encoded with the filesystem encoding. This function can support specifying a file descriptor and not following symlinks. Raises an auditing event "os.getxattr" with arguments "path", "attribute". Changed in version 3.6: Accepts a *path-like object* for *path* and *attribute*. os.listxattr(path=None, *, follow_symlinks=True) Return a list of the extended filesystem attributes on *path*. The attributes in the list are represented as strings decoded with the filesystem encoding. If *path* is "None", "listxattr()" will examine the current directory. This function can support specifying a file descriptor and not following symlinks. Raises an auditing event "os.listxattr" with argument "path". Changed in version 3.6: Accepts a *path-like object*. os.removexattr(path, attribute, *, follow_symlinks=True) Removes the extended filesystem attribute *attribute* from *path*. *attribute* should be bytes or str (directly or indirectly through the "PathLike" interface). If it is a string, it is encoded with the filesystem encoding. This function can support specifying a file descriptor and not following symlinks. Raises an auditing event "os.removexattr" with arguments "path", "attribute". Changed in version 3.6: Accepts a *path-like object* for *path* and *attribute*. os.setxattr(path, attribute, value, flags=0, *, follow_symlinks=True) Set the extended filesystem attribute *attribute* on *path* to *value*. *attribute* must be a bytes or str with no embedded NULs (directly or indirectly through the "PathLike" interface). If it is a str, it is encoded with the filesystem encoding. *flags* may be "XATTR_REPLACE" or "XATTR_CREATE". If "XATTR_REPLACE" is given and the attribute does not exist, "ENODATA" will be raised. If "XATTR_CREATE" is given and the attribute already exists, the attribute will not be created and "EEXISTS" will be raised. This function can support specifying a file descriptor and not following symlinks. Note: A bug in Linux kernel versions less than 2.6.39 caused the flags argument to be ignored on some filesystems. Raises an auditing event "os.setxattr" with arguments "path", "attribute", "value", "flags". Changed in version 3.6: Accepts a *path-like object* for *path* and *attribute*. os.XATTR_SIZE_MAX The maximum size the value of an extended attribute can be. Currently, this is 64 KiB on Linux. os.XATTR_CREATE This is a possible value for the flags argument in "setxattr()". It indicates the operation must create an attribute. os.XATTR_REPLACE This is a possible value for the flags argument in "setxattr()". It indicates the operation must replace an existing attribute. Process Management ================== These functions may be used to create and manage processes. The various "exec*" functions take a list of arguments for the new program loaded into the process. In each case, the first of these arguments is passed to the new program as its own name rather than as an argument a user may have typed on a command line. For the C programmer, this is the "argv[0]" passed to a program’s "main()". For example, "os.execv('/bin/echo', ['foo', 'bar'])" will only print "bar" on standard output; "foo" will seem to be ignored. os.abort() Generate a "SIGABRT" signal to the current process. On Unix, the default behavior is to produce a core dump; on Windows, the process immediately returns an exit code of "3". Be aware that calling this function will not call the Python signal handler registered for "SIGABRT" with "signal.signal()". os.add_dll_directory(path) Add a path to the DLL search path. This search path is used when resolving dependencies for imported extension modules (the module itself is resolved through "sys.path"), and also by "ctypes". Remove the directory by calling **close()** on the returned object or using it in a "with" statement. See the Microsoft documentation for more information about how DLLs are loaded. Raises an auditing event "os.add_dll_directory" with argument "path". Availability: Windows. New in version 3.8: Previous versions of CPython would resolve DLLs using the default behavior for the current process. This led to inconsistencies, such as only sometimes searching "PATH" or the current working directory, and OS functions such as "AddDllDirectory" having no effect.In 3.8, the two primary ways DLLs are loaded now explicitly override the process-wide behavior to ensure consistency. See the porting notes for information on updating libraries. os.execl(path, arg0, arg1, ...) os.execle(path, arg0, arg1, ..., env) os.execlp(file, arg0, arg1, ...) os.execlpe(file, arg0, arg1, ..., env) os.execv(path, args) os.execve(path, args, env) os.execvp(file, args) os.execvpe(file, args, env) These functions all execute a new program, replacing the current process; they do not return. On Unix, the new executable is loaded into the current process, and will have the same process id as the caller. Errors will be reported as "OSError" exceptions. The current process is replaced immediately. Open file objects and descriptors are not flushed, so if there may be data buffered on these open files, you should flush them using "sys.stdout.flush()" or "os.fsync()" before calling an "exec*" function. The “l” and “v” variants of the "exec*" functions differ in how command-line arguments are passed. The “l” variants are perhaps the easiest to work with if the number of parameters is fixed when the code is written; the individual parameters simply become additional parameters to the "execl*()" functions. The “v” variants are good when the number of parameters is variable, with the arguments being passed in a list or tuple as the *args* parameter. In either case, the arguments to the child process should start with the name of the command being run, but this is not enforced. The variants which include a “p” near the end ("execlp()", "execlpe()", "execvp()", and "execvpe()") will use the "PATH" environment variable to locate the program *file*. When the environment is being replaced (using one of the "exec*e" variants, discussed in the next paragraph), the new environment is used as the source of the "PATH" variable. The other variants, "execl()", "execle()", "execv()", and "execve()", will not use the "PATH" variable to locate the executable; *path* must contain an appropriate absolute or relative path. For "execle()", "execlpe()", "execve()", and "execvpe()" (note that these all end in “e”), the *env* parameter must be a mapping which is used to define the environment variables for the new process (these are used instead of the current process’ environment); the functions "execl()", "execlp()", "execv()", and "execvp()" all cause the new process to inherit the environment of the current process. For "execve()" on some platforms, *path* may also be specified as an open file descriptor. This functionality may not be supported on your platform; you can check whether or not it is available using "os.supports_fd". If it is unavailable, using it will raise a "NotImplementedError". Raises an auditing event "os.exec" with arguments "path", "args", "env". Availability: Unix, Windows. New in version 3.3: Added support for specifying *path* as an open file descriptor for "execve()". Changed in version 3.6: Accepts a *path-like object*. os._exit(n) Exit the process with status *n*, without calling cleanup handlers, flushing stdio buffers, etc. Note: The standard way to exit is "sys.exit(n)". "_exit()" should normally only be used in the child process after a "fork()". The following exit codes are defined and can be used with "_exit()", although they are not required. These are typically used for system programs written in Python, such as a mail server’s external command delivery program. Note: Some of these may not be available on all Unix platforms, since there is some variation. These constants are defined where they are defined by the underlying platform. os.EX_OK Exit code that means no error occurred. Availability: Unix. os.EX_USAGE Exit code that means the command was used incorrectly, such as when the wrong number of arguments are given. Availability: Unix. os.EX_DATAERR Exit code that means the input data was incorrect. Availability: Unix. os.EX_NOINPUT Exit code that means an input file did not exist or was not readable. Availability: Unix. os.EX_NOUSER Exit code that means a specified user did not exist. Availability: Unix. os.EX_NOHOST Exit code that means a specified host did not exist. Availability: Unix. os.EX_UNAVAILABLE Exit code that means that a required service is unavailable. Availability: Unix. os.EX_SOFTWARE Exit code that means an internal software error was detected. Availability: Unix. os.EX_OSERR Exit code that means an operating system error was detected, such as the inability to fork or create a pipe. Availability: Unix. os.EX_OSFILE Exit code that means some system file did not exist, could not be opened, or had some other kind of error. Availability: Unix. os.EX_CANTCREAT Exit code that means a user specified output file could not be created. Availability: Unix. os.EX_IOERR Exit code that means that an error occurred while doing I/O on some file. Availability: Unix. os.EX_TEMPFAIL Exit code that means a temporary failure occurred. This indicates something that may not really be an error, such as a network connection that couldn’t be made during a retryable operation. Availability: Unix. os.EX_PROTOCOL Exit code that means that a protocol exchange was illegal, invalid, or not understood. Availability: Unix. os.EX_NOPERM Exit code that means that there were insufficient permissions to perform the operation (but not intended for file system problems). Availability: Unix. os.EX_CONFIG Exit code that means that some kind of configuration error occurred. Availability: Unix. os.EX_NOTFOUND Exit code that means something like “an entry was not found”. Availability: Unix. os.fork() Fork a child process. Return "0" in the child and the child’s process id in the parent. If an error occurs "OSError" is raised. Note that some platforms including FreeBSD <= 6.3 and Cygwin have known issues when using "fork()" from a thread. Raises an auditing event "os.fork" with no arguments. Changed in version 3.8: Calling "fork()" in a subinterpreter is no longer supported ("RuntimeError" is raised). Warning: See "ssl" for applications that use the SSL module with fork(). Availability: Unix. os.forkpty() Fork a child process, using a new pseudo-terminal as the child’s controlling terminal. Return a pair of "(pid, fd)", where *pid* is "0" in the child, the new child’s process id in the parent, and *fd* is the file descriptor of the master end of the pseudo- terminal. For a more portable approach, use the "pty" module. If an error occurs "OSError" is raised. Raises an auditing event "os.forkpty" with no arguments. Changed in version 3.8: Calling "forkpty()" in a subinterpreter is no longer supported ("RuntimeError" is raised). Availability: some flavors of Unix. os.kill(pid, sig) Send signal *sig* to the process *pid*. Constants for the specific signals available on the host platform are defined in the "signal" module. Windows: The "signal.CTRL_C_EVENT" and "signal.CTRL_BREAK_EVENT" signals are special signals which can only be sent to console processes which share a common console window, e.g., some subprocesses. Any other value for *sig* will cause the process to be unconditionally killed by the TerminateProcess API, and the exit code will be set to *sig*. The Windows version of "kill()" additionally takes process handles to be killed. See also "signal.pthread_kill()". Raises an auditing event "os.kill" with arguments "pid", "sig". New in version 3.2: Windows support. os.killpg(pgid, sig) Send the signal *sig* to the process group *pgid*. Raises an auditing event "os.killpg" with arguments "pgid", "sig". Availability: Unix. os.nice(increment) Add *increment* to the process’s “niceness”. Return the new niceness. Availability: Unix. os.pidfd_open(pid, flags=0) Return a file descriptor referring to the process *pid*. This descriptor can be used to perform process management without races and signals. The *flags* argument is provided for future extensions; no flag values are currently defined. See the *pidfd_open(2)* man page for more details. Availability: Linux 5.3+ New in version 3.9. os.plock(op) Lock program segments into memory. The value of *op* (defined in "") determines which segments are locked. Availability: Unix. os.popen(cmd, mode='r', buffering=-1) Open a pipe to or from command *cmd*. The return value is an open file object connected to the pipe, which can be read or written depending on whether *mode* is "'r'" (default) or "'w'". The *buffering* argument has the same meaning as the corresponding argument to the built-in "open()" function. The returned file object reads or writes text strings rather than bytes. The "close" method returns "None" if the subprocess exited successfully, or the subprocess’s return code if there was an error. On POSIX systems, if the return code is positive it represents the return value of the process left-shifted by one byte. If the return code is negative, the process was terminated by the signal given by the negated value of the return code. (For example, the return value might be "- signal.SIGKILL" if the subprocess was killed.) On Windows systems, the return value contains the signed integer return code from the child process. On Unix, "waitstatus_to_exitcode()" can be used to convert the "close" method result (exit status) into an exit code if it is not "None". On Windows, the "close" method result is directly the exit code (or "None"). This is implemented using "subprocess.Popen"; see that class’s documentation for more powerful ways to manage and communicate with subprocesses. os.posix_spawn(path, argv, env, *, file_actions=None, setpgroup=None, resetids=False, setsid=False, setsigmask=(), setsigdef=(), scheduler=None) Wraps the "posix_spawn()" C library API for use from Python. Most users should use "subprocess.run()" instead of "posix_spawn()". The positional-only arguments *path*, *args*, and *env* are similar to "execve()". The *path* parameter is the path to the executable file. The *path* should contain a directory. Use "posix_spawnp()" to pass an executable file without directory. The *file_actions* argument may be a sequence of tuples describing actions to take on specific file descriptors in the child process between the C library implementation’s "fork()" and "exec()" steps. The first item in each tuple must be one of the three type indicator listed below describing the remaining tuple elements: os.POSIX_SPAWN_OPEN ("os.POSIX_SPAWN_OPEN", *fd*, *path*, *flags*, *mode*) Performs "os.dup2(os.open(path, flags, mode), fd)". os.POSIX_SPAWN_CLOSE ("os.POSIX_SPAWN_CLOSE", *fd*) Performs "os.close(fd)". os.POSIX_SPAWN_DUP2 ("os.POSIX_SPAWN_DUP2", *fd*, *new_fd*) Performs "os.dup2(fd, new_fd)". These tuples correspond to the C library "posix_spawn_file_actions_addopen()", "posix_spawn_file_actions_addclose()", and "posix_spawn_file_actions_adddup2()" API calls used to prepare for the "posix_spawn()" call itself. The *setpgroup* argument will set the process group of the child to the value specified. If the value specified is 0, the child’s process group ID will be made the same as its process ID. If the value of *setpgroup* is not set, the child will inherit the parent’s process group ID. This argument corresponds to the C library "POSIX_SPAWN_SETPGROUP" flag. If the *resetids* argument is "True" it will reset the effective UID and GID of the child to the real UID and GID of the parent process. If the argument is "False", then the child retains the effective UID and GID of the parent. In either case, if the set- user-ID and set-group-ID permission bits are enabled on the executable file, their effect will override the setting of the effective UID and GID. This argument corresponds to the C library "POSIX_SPAWN_RESETIDS" flag. If the *setsid* argument is "True", it will create a new session ID for *posix_spawn*. *setsid* requires "POSIX_SPAWN_SETSID" or "POSIX_SPAWN_SETSID_NP" flag. Otherwise, "NotImplementedError" is raised. The *setsigmask* argument will set the signal mask to the signal set specified. If the parameter is not used, then the child inherits the parent’s signal mask. This argument corresponds to the C library "POSIX_SPAWN_SETSIGMASK" flag. The *sigdef* argument will reset the disposition of all signals in the set specified. This argument corresponds to the C library "POSIX_SPAWN_SETSIGDEF" flag. The *scheduler* argument must be a tuple containing the (optional) scheduler policy and an instance of "sched_param" with the scheduler parameters. A value of "None" in the place of the scheduler policy indicates that is not being provided. This argument is a combination of the C library "POSIX_SPAWN_SETSCHEDPARAM" and "POSIX_SPAWN_SETSCHEDULER" flags. Raises an auditing event "os.posix_spawn" with arguments "path", "argv", "env". New in version 3.8. Availability: Unix. os.posix_spawnp(path, argv, env, *, file_actions=None, setpgroup=None, resetids=False, setsid=False, setsigmask=(), setsigdef=(), scheduler=None) Wraps the "posix_spawnp()" C library API for use from Python. Similar to "posix_spawn()" except that the system searches for the *executable* file in the list of directories specified by the "PATH" environment variable (in the same way as for "execvp(3)"). Raises an auditing event "os.posix_spawn" with arguments "path", "argv", "env". New in version 3.8. Availability: See "posix_spawn()" documentation. os.register_at_fork(*, before=None, after_in_parent=None, after_in_child=None) Register callables to be executed when a new child process is forked using "os.fork()" or similar process cloning APIs. The parameters are optional and keyword-only. Each specifies a different call point. * *before* is a function called before forking a child process. * *after_in_parent* is a function called from the parent process after forking a child process. * *after_in_child* is a function called from the child process. These calls are only made if control is expected to return to the Python interpreter. A typical "subprocess" launch will not trigger them as the child is not going to re-enter the interpreter. Functions registered for execution before forking are called in reverse registration order. Functions registered for execution after forking (either in the parent or in the child) are called in registration order. Note that "fork()" calls made by third-party C code may not call those functions, unless it explicitly calls "PyOS_BeforeFork()", "PyOS_AfterFork_Parent()" and "PyOS_AfterFork_Child()". There is no way to unregister a function. Availability: Unix. New in version 3.7. os.spawnl(mode, path, ...) os.spawnle(mode, path, ..., env) os.spawnlp(mode, file, ...) os.spawnlpe(mode, file, ..., env) os.spawnv(mode, path, args) os.spawnve(mode, path, args, env) os.spawnvp(mode, file, args) os.spawnvpe(mode, file, args, env) Execute the program *path* in a new process. (Note that the "subprocess" module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using these functions. Check especially the Replacing Older Functions with the subprocess Module section.) If *mode* is "P_NOWAIT", this function returns the process id of the new process; if *mode* is "P_WAIT", returns the process’s exit code if it exits normally, or "-signal", where *signal* is the signal that killed the process. On Windows, the process id will actually be the process handle, so can be used with the "waitpid()" function. Note on VxWorks, this function doesn’t return "-signal" when the new process is killed. Instead it raises OSError exception. The “l” and “v” variants of the "spawn*" functions differ in how command-line arguments are passed. The “l” variants are perhaps the easiest to work with if the number of parameters is fixed when the code is written; the individual parameters simply become additional parameters to the "spawnl*()" functions. The “v” variants are good when the number of parameters is variable, with the arguments being passed in a list or tuple as the *args* parameter. In either case, the arguments to the child process must start with the name of the command being run. The variants which include a second “p” near the end ("spawnlp()", "spawnlpe()", "spawnvp()", and "spawnvpe()") will use the "PATH" environment variable to locate the program *file*. When the environment is being replaced (using one of the "spawn*e" variants, discussed in the next paragraph), the new environment is used as the source of the "PATH" variable. The other variants, "spawnl()", "spawnle()", "spawnv()", and "spawnve()", will not use the "PATH" variable to locate the executable; *path* must contain an appropriate absolute or relative path. For "spawnle()", "spawnlpe()", "spawnve()", and "spawnvpe()" (note that these all end in “e”), the *env* parameter must be a mapping which is used to define the environment variables for the new process (they are used instead of the current process’ environment); the functions "spawnl()", "spawnlp()", "spawnv()", and "spawnvp()" all cause the new process to inherit the environment of the current process. Note that keys and values in the *env* dictionary must be strings; invalid keys or values will cause the function to fail, with a return value of "127". As an example, the following calls to "spawnlp()" and "spawnvpe()" are equivalent: import os os.spawnlp(os.P_WAIT, 'cp', 'cp', 'index.html', '/dev/null') L = ['cp', 'index.html', '/dev/null'] os.spawnvpe(os.P_WAIT, 'cp', L, os.environ) Raises an auditing event "os.spawn" with arguments "mode", "path", "args", "env". Availability: Unix, Windows. "spawnlp()", "spawnlpe()", "spawnvp()" and "spawnvpe()" are not available on Windows. "spawnle()" and "spawnve()" are not thread-safe on Windows; we advise you to use the "subprocess" module instead. Changed in version 3.6: Accepts a *path-like object*. os.P_NOWAIT os.P_NOWAITO Possible values for the *mode* parameter to the "spawn*" family of functions. If either of these values is given, the "spawn*()" functions will return as soon as the new process has been created, with the process id as the return value. Availability: Unix, Windows. os.P_WAIT Possible value for the *mode* parameter to the "spawn*" family of functions. If this is given as *mode*, the "spawn*()" functions will not return until the new process has run to completion and will return the exit code of the process the run is successful, or "-signal" if a signal kills the process. Availability: Unix, Windows. os.P_DETACH os.P_OVERLAY Possible values for the *mode* parameter to the "spawn*" family of functions. These are less portable than those listed above. "P_DETACH" is similar to "P_NOWAIT", but the new process is detached from the console of the calling process. If "P_OVERLAY" is used, the current process will be replaced; the "spawn*" function will not return. Availability: Windows. os.startfile(path[, operation]) Start a file with its associated application. When *operation* is not specified or "'open'", this acts like double-clicking the file in Windows Explorer, or giving the file name as an argument to the **start** command from the interactive command shell: the file is opened with whatever application (if any) its extension is associated. When another *operation* is given, it must be a “command verb” that specifies what should be done with the file. Common verbs documented by Microsoft are "'print'" and "'edit'" (to be used on files) as well as "'explore'" and "'find'" (to be used on directories). "startfile()" returns as soon as the associated application is launched. There is no option to wait for the application to close, and no way to retrieve the application’s exit status. The *path* parameter is relative to the current directory. If you want to use an absolute path, make sure the first character is not a slash ("'/'"); the underlying Win32 "ShellExecute()" function doesn’t work if it is. Use the "os.path.normpath()" function to ensure that the path is properly encoded for Win32. To reduce interpreter startup overhead, the Win32 "ShellExecute()" function is not resolved until this function is first called. If the function cannot be resolved, "NotImplementedError" will be raised. Raises an auditing event "os.startfile" with arguments "path", "operation". Availability: Windows. os.system(command) Execute the command (a string) in a subshell. This is implemented by calling the Standard C function "system()", and has the same limitations. Changes to "sys.stdin", etc. are not reflected in the environment of the executed command. If *command* generates any output, it will be sent to the interpreter standard output stream. The C standard does not specify the meaning of the return value of the C function, so the return value of the Python function is system-dependent. On Unix, the return value is the exit status of the process encoded in the format specified for "wait()". On Windows, the return value is that returned by the system shell after running *command*. The shell is given by the Windows environment variable "COMSPEC": it is usually **cmd.exe**, which returns the exit status of the command run; on systems using a non- native shell, consult your shell documentation. The "subprocess" module provides more powerful facilities for spawning new processes and retrieving their results; using that module is preferable to using this function. See the Replacing Older Functions with the subprocess Module section in the "subprocess" documentation for some helpful recipes. On Unix, "waitstatus_to_exitcode()" can be used to convert the result (exit status) into an exit code. On Windows, the result is directly the exit code. Raises an auditing event "os.system" with argument "command". Availability: Unix, Windows. os.times() Returns the current global process times. The return value is an object with five attributes: * "user" - user time * "system" - system time * "children_user" - user time of all child processes * "children_system" - system time of all child processes * "elapsed" - elapsed real time since a fixed point in the past For backwards compatibility, this object also behaves like a five- tuple containing "user", "system", "children_user", "children_system", and "elapsed" in that order. See the Unix manual page *times(2)* and *times(3)* manual page on Unix or the GetProcessTimes MSDN on Windows. On Windows, only "user" and "system" are known; the other attributes are zero. Availability: Unix, Windows. Changed in version 3.3: Return type changed from a tuple to a tuple-like object with named attributes. os.wait() Wait for completion of a child process, and return a tuple containing its pid and exit status indication: a 16-bit number, whose low byte is the signal number that killed the process, and whose high byte is the exit status (if the signal number is zero); the high bit of the low byte is set if a core file was produced. "waitstatus_to_exitcode()" can be used to convert the exit status into an exit code. Availability: Unix. See also: "waitpid()" can be used to wait for the completion of a specific child process and has more options. os.waitid(idtype, id, options) Wait for the completion of one or more child processes. *idtype* can be "P_PID", "P_PGID", "P_ALL", or "P_PIDFD" on Linux. *id* specifies the pid to wait on. *options* is constructed from the ORing of one or more of "WEXITED", "WSTOPPED" or "WCONTINUED" and additionally may be ORed with "WNOHANG" or "WNOWAIT". The return value is an object representing the data contained in the "siginfo_t" structure, namely: "si_pid", "si_uid", "si_signo", "si_status", "si_code" or "None" if "WNOHANG" is specified and there are no children in a waitable state. Availability: Unix. New in version 3.3. os.P_PID os.P_PGID os.P_ALL These are the possible values for *idtype* in "waitid()". They affect how *id* is interpreted. Availability: Unix. New in version 3.3. os.P_PIDFD This is a Linux-specific *idtype* that indicates that *id* is a file descriptor that refers to a process. Availability: Linux 5.4+ New in version 3.9. os.WEXITED os.WSTOPPED os.WNOWAIT Flags that can be used in *options* in "waitid()" that specify what child signal to wait for. Availability: Unix. New in version 3.3. os.CLD_EXITED os.CLD_KILLED os.CLD_DUMPED os.CLD_TRAPPED os.CLD_STOPPED os.CLD_CONTINUED These are the possible values for "si_code" in the result returned by "waitid()". Availability: Unix. New in version 3.3. Changed in version 3.9: Added "CLD_KILLED" and "CLD_STOPPED" values. os.waitpid(pid, options) The details of this function differ on Unix and Windows. On Unix: Wait for completion of a child process given by process id *pid*, and return a tuple containing its process id and exit status indication (encoded as for "wait()"). The semantics of the call are affected by the value of the integer *options*, which should be "0" for normal operation. If *pid* is greater than "0", "waitpid()" requests status information for that specific process. If *pid* is "0", the request is for the status of any child in the process group of the current process. If *pid* is "-1", the request pertains to any child of the current process. If *pid* is less than "-1", status is requested for any process in the process group "-pid" (the absolute value of *pid*). An "OSError" is raised with the value of errno when the syscall returns -1. On Windows: Wait for completion of a process given by process handle *pid*, and return a tuple containing *pid*, and its exit status shifted left by 8 bits (shifting makes cross-platform use of the function easier). A *pid* less than or equal to "0" has no special meaning on Windows, and raises an exception. The value of integer *options* has no effect. *pid* can refer to any process whose id is known, not necessarily a child process. The "spawn*" functions called with "P_NOWAIT" return suitable process handles. "waitstatus_to_exitcode()" can be used to convert the exit status into an exit code. Changed in version 3.5: If the system call is interrupted and the signal handler does not raise an exception, the function now retries the system call instead of raising an "InterruptedError" exception (see **PEP 475** for the rationale). os.wait3(options) Similar to "waitpid()", except no process id argument is given and a 3-element tuple containing the child’s process id, exit status indication, and resource usage information is returned. Refer to "resource"."getrusage()" for details on resource usage information. The option argument is the same as that provided to "waitpid()" and "wait4()". "waitstatus_to_exitcode()" can be used to convert the exit status into an exitcode. Availability: Unix. os.wait4(pid, options) Similar to "waitpid()", except a 3-element tuple, containing the child’s process id, exit status indication, and resource usage information is returned. Refer to "resource"."getrusage()" for details on resource usage information. The arguments to "wait4()" are the same as those provided to "waitpid()". "waitstatus_to_exitcode()" can be used to convert the exit status into an exitcode. Availability: Unix. os.waitstatus_to_exitcode(status) Convert a wait status to an exit code. On Unix: * If the process exited normally (if "WIFEXITED(status)" is true), return the process exit status (return "WEXITSTATUS(status)"): result greater than or equal to 0. * If the process was terminated by a signal (if "WIFSIGNALED(status)" is true), return "-signum" where *signum* is the number of the signal that caused the process to terminate (return "-WTERMSIG(status)"): result less than 0. * Otherwise, raise a "ValueError". On Windows, return *status* shifted right by 8 bits. On Unix, if the process is being traced or if "waitpid()" was called with "WUNTRACED" option, the caller must first check if "WIFSTOPPED(status)" is true. This function must not be called if "WIFSTOPPED(status)" is true. See also: "WIFEXITED()", "WEXITSTATUS()", "WIFSIGNALED()", "WTERMSIG()", "WIFSTOPPED()", "WSTOPSIG()" functions. New in version 3.9. os.WNOHANG The option for "waitpid()" to return immediately if no child process status is available immediately. The function returns "(0, 0)" in this case. Availability: Unix. os.WCONTINUED This option causes child processes to be reported if they have been continued from a job control stop since their status was last reported. Availability: some Unix systems. os.WUNTRACED This option causes child processes to be reported if they have been stopped but their current state has not been reported since they were stopped. Availability: Unix. The following functions take a process status code as returned by "system()", "wait()", or "waitpid()" as a parameter. They may be used to determine the disposition of a process. os.WCOREDUMP(status) Return "True" if a core dump was generated for the process, otherwise return "False". This function should be employed only if "WIFSIGNALED()" is true. Availability: Unix. os.WIFCONTINUED(status) Return "True" if a stopped child has been resumed by delivery of "SIGCONT" (if the process has been continued from a job control stop), otherwise return "False". See "WCONTINUED" option. Availability: Unix. os.WIFSTOPPED(status) Return "True" if the process was stopped by delivery of a signal, otherwise return "False". "WIFSTOPPED()" only returns "True" if the "waitpid()" call was done using "WUNTRACED" option or when the process is being traced (see *ptrace(2)*). Availability: Unix. os.WIFSIGNALED(status) Return "True" if the process was terminated by a signal, otherwise return "False". Availability: Unix. os.WIFEXITED(status) Return "True" if the process exited terminated normally, that is, by calling "exit()" or "_exit()", or by returning from "main()"; otherwise return "False". Availability: Unix. os.WEXITSTATUS(status) Return the process exit status. This function should be employed only if "WIFEXITED()" is true. Availability: Unix. os.WSTOPSIG(status) Return the signal which caused the process to stop. This function should be employed only if "WIFSTOPPED()" is true. Availability: Unix. os.WTERMSIG(status) Return the number of the signal that caused the process to terminate. This function should be employed only if "WIFSIGNALED()" is true. Availability: Unix. Interface to the scheduler ========================== These functions control how a process is allocated CPU time by the operating system. They are only available on some Unix platforms. For more detailed information, consult your Unix manpages. New in version 3.3. The following scheduling policies are exposed if they are supported by the operating system. os.SCHED_OTHER The default scheduling policy. os.SCHED_BATCH Scheduling policy for CPU-intensive processes that tries to preserve interactivity on the rest of the computer. os.SCHED_IDLE Scheduling policy for extremely low priority background tasks. os.SCHED_SPORADIC Scheduling policy for sporadic server programs. os.SCHED_FIFO A First In First Out scheduling policy. os.SCHED_RR A round-robin scheduling policy. os.SCHED_RESET_ON_FORK This flag can be OR’ed with any other scheduling policy. When a process with this flag set forks, its child’s scheduling policy and priority are reset to the default. class os.sched_param(sched_priority) This class represents tunable scheduling parameters used in "sched_setparam()", "sched_setscheduler()", and "sched_getparam()". It is immutable. At the moment, there is only one possible parameter: sched_priority The scheduling priority for a scheduling policy. os.sched_get_priority_min(policy) Get the minimum priority value for *policy*. *policy* is one of the scheduling policy constants above. os.sched_get_priority_max(policy) Get the maximum priority value for *policy*. *policy* is one of the scheduling policy constants above. os.sched_setscheduler(pid, policy, param) Set the scheduling policy for the process with PID *pid*. A *pid* of 0 means the calling process. *policy* is one of the scheduling policy constants above. *param* is a "sched_param" instance. os.sched_getscheduler(pid) Return the scheduling policy for the process with PID *pid*. A *pid* of 0 means the calling process. The result is one of the scheduling policy constants above. os.sched_setparam(pid, param) Set the scheduling parameters for the process with PID *pid*. A *pid* of 0 means the calling process. *param* is a "sched_param" instance. os.sched_getparam(pid) Return the scheduling parameters as a "sched_param" instance for the process with PID *pid*. A *pid* of 0 means the calling process. os.sched_rr_get_interval(pid) Return the round-robin quantum in seconds for the process with PID *pid*. A *pid* of 0 means the calling process. os.sched_yield() Voluntarily relinquish the CPU. os.sched_setaffinity(pid, mask) Restrict the process with PID *pid* (or the current process if zero) to a set of CPUs. *mask* is an iterable of integers representing the set of CPUs to which the process should be restricted. os.sched_getaffinity(pid) Return the set of CPUs the process with PID *pid* (or the current process if zero) is restricted to. Miscellaneous System Information ================================ os.confstr(name) Return string-valued system configuration values. *name* specifies the configuration value to retrieve; it may be a string which is the name of a defined system value; these names are specified in a number of standards (POSIX, Unix 95, Unix 98, and others). Some platforms define additional names as well. The names known to the host operating system are given as the keys of the "confstr_names" dictionary. For configuration variables not included in that mapping, passing an integer for *name* is also accepted. If the configuration value specified by *name* isn’t defined, "None" is returned. If *name* is a string and is not known, "ValueError" is raised. If a specific value for *name* is not supported by the host system, even if it is included in "confstr_names", an "OSError" is raised with "errno.EINVAL" for the error number. Availability: Unix. os.confstr_names Dictionary mapping names accepted by "confstr()" to the integer values defined for those names by the host operating system. This can be used to determine the set of names known to the system. Availability: Unix. os.cpu_count() Return the number of CPUs in the system. Returns "None" if undetermined. This number is not equivalent to the number of CPUs the current process can use. The number of usable CPUs can be obtained with "len(os.sched_getaffinity(0))" New in version 3.4. os.getloadavg() Return the number of processes in the system run queue averaged over the last 1, 5, and 15 minutes or raises "OSError" if the load average was unobtainable. Availability: Unix. os.sysconf(name) Return integer-valued system configuration values. If the configuration value specified by *name* isn’t defined, "-1" is returned. The comments regarding the *name* parameter for "confstr()" apply here as well; the dictionary that provides information on the known names is given by "sysconf_names". Availability: Unix. os.sysconf_names Dictionary mapping names accepted by "sysconf()" to the integer values defined for those names by the host operating system. This can be used to determine the set of names known to the system. Availability: Unix. The following data values are used to support path manipulation operations. These are defined for all platforms. Higher-level operations on pathnames are defined in the "os.path" module. os.curdir The constant string used by the operating system to refer to the current directory. This is "'.'" for Windows and POSIX. Also available via "os.path". os.pardir The constant string used by the operating system to refer to the parent directory. This is "'..'" for Windows and POSIX. Also available via "os.path". os.sep The character used by the operating system to separate pathname components. This is "'/'" for POSIX and "'\\'" for Windows. Note that knowing this is not sufficient to be able to parse or concatenate pathnames — use "os.path.split()" and "os.path.join()" — but it is occasionally useful. Also available via "os.path". os.altsep An alternative character used by the operating system to separate pathname components, or "None" if only one separator character exists. This is set to "'/'" on Windows systems where "sep" is a backslash. Also available via "os.path". os.extsep The character which separates the base filename from the extension; for example, the "'.'" in "os.py". Also available via "os.path". os.pathsep The character conventionally used by the operating system to separate search path components (as in "PATH"), such as "':'" for POSIX or "';'" for Windows. Also available via "os.path". os.defpath The default search path used by "exec*p*" and "spawn*p*" if the environment doesn’t have a "'PATH'" key. Also available via "os.path". os.linesep The string used to separate (or, rather, terminate) lines on the current platform. This may be a single character, such as "'\n'" for POSIX, or multiple characters, for example, "'\r\n'" for Windows. Do not use *os.linesep* as a line terminator when writing files opened in text mode (the default); use a single "'\n'" instead, on all platforms. os.devnull The file path of the null device. For example: "'/dev/null'" for POSIX, "'nul'" for Windows. Also available via "os.path". os.RTLD_LAZY os.RTLD_NOW os.RTLD_GLOBAL os.RTLD_LOCAL os.RTLD_NODELETE os.RTLD_NOLOAD os.RTLD_DEEPBIND Flags for use with the "setdlopenflags()" and "getdlopenflags()" functions. See the Unix manual page *dlopen(3)* for what the different flags mean. New in version 3.3. Random numbers ============== os.getrandom(size, flags=0) Get up to *size* random bytes. The function can return less bytes than requested. These bytes can be used to seed user-space random number generators or for cryptographic purposes. "getrandom()" relies on entropy gathered from device drivers and other sources of environmental noise. Unnecessarily reading large quantities of data will have a negative impact on other users of the "/dev/random" and "/dev/urandom" devices. The flags argument is a bit mask that can contain zero or more of the following values ORed together: "os.GRND_RANDOM" and "GRND_NONBLOCK". See also the Linux getrandom() manual page. Availability: Linux 3.17 and newer. New in version 3.6. os.urandom(size) Return a bytestring of *size* random bytes suitable for cryptographic use. This function returns random bytes from an OS-specific randomness source. The returned data should be unpredictable enough for cryptographic applications, though its exact quality depends on the OS implementation. On Linux, if the "getrandom()" syscall is available, it is used in blocking mode: block until the system urandom entropy pool is initialized (128 bits of entropy are collected by the kernel). See the **PEP 524** for the rationale. On Linux, the "getrandom()" function can be used to get random bytes in non-blocking mode (using the "GRND_NONBLOCK" flag) or to poll until the system urandom entropy pool is initialized. On a Unix-like system, random bytes are read from the "/dev/urandom" device. If the "/dev/urandom" device is not available or not readable, the "NotImplementedError" exception is raised. On Windows, it will use "CryptGenRandom()". See also: The "secrets" module provides higher level functions. For an easy-to-use interface to the random number generator provided by your platform, please see "random.SystemRandom". Changed in version 3.6.0: On Linux, "getrandom()" is now used in blocking mode to increase the security. Changed in version 3.5.2: On Linux, if the "getrandom()" syscall blocks (the urandom entropy pool is not initialized yet), fall back on reading "/dev/urandom". Changed in version 3.5: On Linux 3.17 and newer, the "getrandom()" syscall is now used when available. On OpenBSD 5.6 and newer, the C "getentropy()" function is now used. These functions avoid the usage of an internal file descriptor. os.GRND_NONBLOCK By default, when reading from "/dev/random", "getrandom()" blocks if no random bytes are available, and when reading from "/dev/urandom", it blocks if the entropy pool has not yet been initialized. If the "GRND_NONBLOCK" flag is set, then "getrandom()" does not block in these cases, but instead immediately raises "BlockingIOError". New in version 3.6. os.GRND_RANDOM If this bit is set, then random bytes are drawn from the "/dev/random" pool instead of the "/dev/urandom" pool. New in version 3.6. "ossaudiodev" — Access to OSS-compatible audio devices ****************************************************** Deprecated since version 3.11: The "ossaudiodev" module is deprecated (see **PEP 594** for details). ====================================================================== This module allows you to access the OSS (Open Sound System) audio interface. OSS is available for a wide range of open-source and commercial Unices, and is the standard audio interface for Linux and recent versions of FreeBSD. Changed in version 3.3: Operations in this module now raise "OSError" where "IOError" was raised. See also: Open Sound System Programmer’s Guide the official documentation for the OSS C API The module defines a large number of constants supplied by the OSS device driver; see "" on either Linux or FreeBSD for a listing. "ossaudiodev" defines the following variables and functions: exception ossaudiodev.OSSAudioError This exception is raised on certain errors. The argument is a string describing what went wrong. (If "ossaudiodev" receives an error from a system call such as "open()", "write()", or "ioctl()", it raises "OSError". Errors detected directly by "ossaudiodev" result in "OSSAudioError".) (For backwards compatibility, the exception class is also available as "ossaudiodev.error".) ossaudiodev.open(mode) ossaudiodev.open(device, mode) Open an audio device and return an OSS audio device object. This object supports many file-like methods, such as "read()", "write()", and "fileno()" (although there are subtle differences between conventional Unix read/write semantics and those of OSS audio devices). It also supports a number of audio-specific methods; see below for the complete list of methods. *device* is the audio device filename to use. If it is not specified, this module first looks in the environment variable "AUDIODEV" for a device to use. If not found, it falls back to "/dev/dsp". *mode* is one of "'r'" for read-only (record) access, "'w'" for write-only (playback) access and "'rw'" for both. Since many sound cards only allow one process to have the recorder or player open at a time, it is a good idea to open the device only for the activity needed. Further, some sound cards are half-duplex: they can be opened for reading or writing, but not both at once. Note the unusual calling syntax: the *first* argument is optional, and the second is required. This is a historical artifact for compatibility with the older "linuxaudiodev" module which "ossaudiodev" supersedes. ossaudiodev.openmixer([device]) Open a mixer device and return an OSS mixer device object. *device* is the mixer device filename to use. If it is not specified, this module first looks in the environment variable "MIXERDEV" for a device to use. If not found, it falls back to "/dev/mixer". Audio Device Objects ==================== Before you can write to or read from an audio device, you must call three methods in the correct order: 1. "setfmt()" to set the output format 2. "channels()" to set the number of channels 3. "speed()" to set the sample rate Alternately, you can use the "setparameters()" method to set all three audio parameters at once. This is more convenient, but may not be as flexible in all cases. The audio device objects returned by "open()" define the following methods and (read-only) attributes: oss_audio_device.close() Explicitly close the audio device. When you are done writing to or reading from an audio device, you should explicitly close it. A closed device cannot be used again. oss_audio_device.fileno() Return the file descriptor associated with the device. oss_audio_device.read(size) Read *size* bytes from the audio input and return them as a Python string. Unlike most Unix device drivers, OSS audio devices in blocking mode (the default) will block "read()" until the entire requested amount of data is available. oss_audio_device.write(data) Write a *bytes-like object* *data* to the audio device and return the number of bytes written. If the audio device is in blocking mode (the default), the entire data is always written (again, this is different from usual Unix device semantics). If the device is in non-blocking mode, some data may not be written—see "writeall()". Changed in version 3.5: Writable *bytes-like object* is now accepted. oss_audio_device.writeall(data) Write a *bytes-like object* *data* to the audio device: waits until the audio device is able to accept data, writes as much data as it will accept, and repeats until *data* has been completely written. If the device is in blocking mode (the default), this has the same effect as "write()"; "writeall()" is only useful in non-blocking mode. Has no return value, since the amount of data written is always equal to the amount of data supplied. Changed in version 3.5: Writable *bytes-like object* is now accepted. Changed in version 3.2: Audio device objects also support the context management protocol, i.e. they can be used in a "with" statement. The following methods each map to exactly one "ioctl()" system call. The correspondence is obvious: for example, "setfmt()" corresponds to the "SNDCTL_DSP_SETFMT" ioctl, and "sync()" to "SNDCTL_DSP_SYNC" (this can be useful when consulting the OSS documentation). If the underlying "ioctl()" fails, they all raise "OSError". oss_audio_device.nonblock() Put the device into non-blocking mode. Once in non-blocking mode, there is no way to return it to blocking mode. oss_audio_device.getfmts() Return a bitmask of the audio output formats supported by the soundcard. Some of the formats supported by OSS are: +---------------------------+-----------------------------------------------+ | Format | Description | |===========================|===============================================| | "AFMT_MU_LAW" | a logarithmic encoding (used by Sun ".au" | | | files and "/dev/audio") | +---------------------------+-----------------------------------------------+ | "AFMT_A_LAW" | a logarithmic encoding | +---------------------------+-----------------------------------------------+ | "AFMT_IMA_ADPCM" | a 4:1 compressed format defined by the | | | Interactive Multimedia Association | +---------------------------+-----------------------------------------------+ | "AFMT_U8" | Unsigned, 8-bit audio | +---------------------------+-----------------------------------------------+ | "AFMT_S16_LE" | Signed, 16-bit audio, little-endian byte | | | order (as used by Intel processors) | +---------------------------+-----------------------------------------------+ | "AFMT_S16_BE" | Signed, 16-bit audio, big-endian byte order | | | (as used by 68k, PowerPC, Sparc) | +---------------------------+-----------------------------------------------+ | "AFMT_S8" | Signed, 8 bit audio | +---------------------------+-----------------------------------------------+ | "AFMT_U16_LE" | Unsigned, 16-bit little-endian audio | +---------------------------+-----------------------------------------------+ | "AFMT_U16_BE" | Unsigned, 16-bit big-endian audio | +---------------------------+-----------------------------------------------+ Consult the OSS documentation for a full list of audio formats, and note that most devices support only a subset of these formats. Some older devices only support "AFMT_U8"; the most common format used today is "AFMT_S16_LE". oss_audio_device.setfmt(format) Try to set the current audio format to *format*—see "getfmts()" for a list. Returns the audio format that the device was set to, which may not be the requested format. May also be used to return the current audio format—do this by passing an “audio format” of "AFMT_QUERY". oss_audio_device.channels(nchannels) Set the number of output channels to *nchannels*. A value of 1 indicates monophonic sound, 2 stereophonic. Some devices may have more than 2 channels, and some high-end devices may not support mono. Returns the number of channels the device was set to. oss_audio_device.speed(samplerate) Try to set the audio sampling rate to *samplerate* samples per second. Returns the rate actually set. Most sound devices don’t support arbitrary sampling rates. Common rates are: +---------+---------------------------------------------+ | Rate | Description | |=========|=============================================| | 8000 | default rate for "/dev/audio" | +---------+---------------------------------------------+ | 11025 | speech recording | +---------+---------------------------------------------+ | 22050 | | +---------+---------------------------------------------+ | 44100 | CD quality audio (at 16 bits/sample and 2 | | | channels) | +---------+---------------------------------------------+ | 96000 | DVD quality audio (at 24 bits/sample) | +---------+---------------------------------------------+ oss_audio_device.sync() Wait until the sound device has played every byte in its buffer. (This happens implicitly when the device is closed.) The OSS documentation recommends closing and re-opening the device rather than using "sync()". oss_audio_device.reset() Immediately stop playing or recording and return the device to a state where it can accept commands. The OSS documentation recommends closing and re-opening the device after calling "reset()". oss_audio_device.post() Tell the driver that there is likely to be a pause in the output, making it possible for the device to handle the pause more intelligently. You might use this after playing a spot sound effect, before waiting for user input, or before doing disk I/O. The following convenience methods combine several ioctls, or one ioctl and some simple calculations. oss_audio_device.setparameters(format, nchannels, samplerate[, strict=False]) Set the key audio sampling parameters—sample format, number of channels, and sampling rate—in one method call. *format*, *nchannels*, and *samplerate* should be as specified in the "setfmt()", "channels()", and "speed()" methods. If *strict* is true, "setparameters()" checks to see if each parameter was actually set to the requested value, and raises "OSSAudioError" if not. Returns a tuple (*format*, *nchannels*, *samplerate*) indicating the parameter values that were actually set by the device driver (i.e., the same as the return values of "setfmt()", "channels()", and "speed()"). For example, (fmt, channels, rate) = dsp.setparameters(fmt, channels, rate) is equivalent to fmt = dsp.setfmt(fmt) channels = dsp.channels(channels) rate = dsp.rate(rate) oss_audio_device.bufsize() Returns the size of the hardware buffer, in samples. oss_audio_device.obufcount() Returns the number of samples that are in the hardware buffer yet to be played. oss_audio_device.obuffree() Returns the number of samples that could be queued into the hardware buffer to be played without blocking. Audio device objects also support several read-only attributes: oss_audio_device.closed Boolean indicating whether the device has been closed. oss_audio_device.name String containing the name of the device file. oss_audio_device.mode The I/O mode for the file, either ""r"", ""rw"", or ""w"". Mixer Device Objects ==================== The mixer object provides two file-like methods: oss_mixer_device.close() This method closes the open mixer device file. Any further attempts to use the mixer after this file is closed will raise an "OSError". oss_mixer_device.fileno() Returns the file handle number of the open mixer device file. Changed in version 3.2: Mixer objects also support the context management protocol. The remaining methods are specific to audio mixing: oss_mixer_device.controls() This method returns a bitmask specifying the available mixer controls (“Control” being a specific mixable “channel”, such as "SOUND_MIXER_PCM" or "SOUND_MIXER_SYNTH"). This bitmask indicates a subset of all available mixer controls—the "SOUND_MIXER_*" constants defined at module level. To determine if, for example, the current mixer object supports a PCM mixer, use the following Python code: mixer=ossaudiodev.openmixer() if mixer.controls() & (1 << ossaudiodev.SOUND_MIXER_PCM): # PCM is supported ... code ... For most purposes, the "SOUND_MIXER_VOLUME" (master volume) and "SOUND_MIXER_PCM" controls should suffice—but code that uses the mixer should be flexible when it comes to choosing mixer controls. On the Gravis Ultrasound, for example, "SOUND_MIXER_VOLUME" does not exist. oss_mixer_device.stereocontrols() Returns a bitmask indicating stereo mixer controls. If a bit is set, the corresponding control is stereo; if it is unset, the control is either monophonic or not supported by the mixer (use in combination with "controls()" to determine which). See the code example for the "controls()" function for an example of getting data from a bitmask. oss_mixer_device.reccontrols() Returns a bitmask specifying the mixer controls that may be used to record. See the code example for "controls()" for an example of reading from a bitmask. oss_mixer_device.get(control) Returns the volume of a given mixer control. The returned volume is a 2-tuple "(left_volume,right_volume)". Volumes are specified as numbers from 0 (silent) to 100 (full volume). If the control is monophonic, a 2-tuple is still returned, but both volumes are the same. Raises "OSSAudioError" if an invalid control is specified, or "OSError" if an unsupported control is specified. oss_mixer_device.set(control, (left, right)) Sets the volume for a given mixer control to "(left,right)". "left" and "right" must be ints and between 0 (silent) and 100 (full volume). On success, the new volume is returned as a 2-tuple. Note that this may not be exactly the same as the volume specified, because of the limited resolution of some soundcard’s mixers. Raises "OSSAudioError" if an invalid mixer control was specified, or if the specified volumes were out-of-range. oss_mixer_device.get_recsrc() This method returns a bitmask indicating which control(s) are currently being used as a recording source. oss_mixer_device.set_recsrc(bitmask) Call this function to specify a recording source. Returns a bitmask indicating the new recording source (or sources) if successful; raises "OSError" if an invalid source was specified. To set the current recording source to the microphone input: mixer.setrecsrc (1 << ossaudiodev.SOUND_MIXER_MIC) "parser" — Access Python parse trees ************************************ ====================================================================== The "parser" module provides an interface to Python’s internal parser and byte-code compiler. The primary purpose for this interface is to allow Python code to edit the parse tree of a Python expression and create executable code from this. This is better than trying to parse and modify an arbitrary Python code fragment as a string because parsing is performed in a manner identical to the code forming the application. It is also faster. Warning: The parser module is deprecated and will be removed in future versions of Python. For the majority of use cases you can leverage the Abstract Syntax Tree (AST) generation and compilation stage, using the "ast" module. There are a few things to note about this module which are important to making use of the data structures created. This is not a tutorial on editing the parse trees for Python code, but some examples of using the "parser" module are presented. Most importantly, a good understanding of the Python grammar processed by the internal parser is required. For full information on the language syntax, refer to The Python Language Reference. The parser itself is created from a grammar specification defined in the file "Grammar/Grammar" in the standard Python distribution. The parse trees stored in the ST objects created by this module are the actual output from the internal parser when created by the "expr()" or "suite()" functions, described below. The ST objects created by "sequence2st()" faithfully simulate those structures. Be aware that the values of the sequences which are considered “correct” will vary from one version of Python to another as the formal grammar for the language is revised. However, transporting code from one Python version to another as source text will always allow correct parse trees to be created in the target version, with the only restriction being that migrating to an older version of the interpreter will not support more recent language constructs. The parse trees are not typically compatible from one version to another, though source code has usually been forward-compatible within a major release series. Each element of the sequences returned by "st2list()" or "st2tuple()" has a simple form. Sequences representing non-terminal elements in the grammar always have a length greater than one. The first element is an integer which identifies a production in the grammar. These integers are given symbolic names in the C header file "Include/graminit.h" and the Python module "symbol". Each additional element of the sequence represents a component of the production as recognized in the input string: these are always sequences which have the same form as the parent. An important aspect of this structure which should be noted is that keywords used to identify the parent node type, such as the keyword "if" in an "if_stmt", are included in the node tree without any special treatment. For example, the "if" keyword is represented by the tuple "(1, 'if')", where "1" is the numeric value associated with all "NAME" tokens, including variable and function names defined by the user. In an alternate form returned when line number information is requested, the same token might be represented as "(1, 'if', 12)", where the "12" represents the line number at which the terminal symbol was found. Terminal elements are represented in much the same way, but without any child elements and the addition of the source text which was identified. The example of the "if" keyword above is representative. The various types of terminal symbols are defined in the C header file "Include/token.h" and the Python module "token". The ST objects are not required to support the functionality of this module, but are provided for three purposes: to allow an application to amortize the cost of processing complex parse trees, to provide a parse tree representation which conserves memory space when compared to the Python list or tuple representation, and to ease the creation of additional modules in C which manipulate parse trees. A simple “wrapper” class may be created in Python to hide the use of ST objects. The "parser" module defines functions for a few distinct purposes. The most important purposes are to create ST objects and to convert ST objects to other representations such as parse trees and compiled code objects, but there are also functions which serve to query the type of parse tree represented by an ST object. See also: Module "symbol" Useful constants representing internal nodes of the parse tree. Module "token" Useful constants representing leaf nodes of the parse tree and functions for testing node values. Creating ST Objects =================== ST objects may be created from source code or from a parse tree. When creating an ST object from source, different functions are used to create the "'eval'" and "'exec'" forms. parser.expr(source) The "expr()" function parses the parameter *source* as if it were an input to "compile(source, 'file.py', 'eval')". If the parse succeeds, an ST object is created to hold the internal parse tree representation, otherwise an appropriate exception is raised. parser.suite(source) The "suite()" function parses the parameter *source* as if it were an input to "compile(source, 'file.py', 'exec')". If the parse succeeds, an ST object is created to hold the internal parse tree representation, otherwise an appropriate exception is raised. parser.sequence2st(sequence) This function accepts a parse tree represented as a sequence and builds an internal representation if possible. If it can validate that the tree conforms to the Python grammar and all nodes are valid node types in the host version of Python, an ST object is created from the internal representation and returned to the called. If there is a problem creating the internal representation, or if the tree cannot be validated, a "ParserError" exception is raised. An ST object created this way should not be assumed to compile correctly; normal exceptions raised by compilation may still be initiated when the ST object is passed to "compilest()". This may indicate problems not related to syntax (such as a "MemoryError" exception), but may also be due to constructs such as the result of parsing "del f(0)", which escapes the Python parser but is checked by the bytecode compiler. Sequences representing terminal tokens may be represented as either two-element lists of the form "(1, 'name')" or as three-element lists of the form "(1, 'name', 56)". If the third element is present, it is assumed to be a valid line number. The line number may be specified for any subset of the terminal symbols in the input tree. parser.tuple2st(sequence) This is the same function as "sequence2st()". This entry point is maintained for backward compatibility. Converting ST Objects ===================== ST objects, regardless of the input used to create them, may be converted to parse trees represented as list- or tuple- trees, or may be compiled into executable code objects. Parse trees may be extracted with or without line numbering information. parser.st2list(st, line_info=False, col_info=False) This function accepts an ST object from the caller in *st* and returns a Python list representing the equivalent parse tree. The resulting list representation can be used for inspection or the creation of a new parse tree in list form. This function does not fail so long as memory is available to build the list representation. If the parse tree will only be used for inspection, "st2tuple()" should be used instead to reduce memory consumption and fragmentation. When the list representation is required, this function is significantly faster than retrieving a tuple representation and converting that to nested lists. If *line_info* is true, line number information will be included for all terminal tokens as a third element of the list representing the token. Note that the line number provided specifies the line on which the token *ends*. This information is omitted if the flag is false or omitted. parser.st2tuple(st, line_info=False, col_info=False) This function accepts an ST object from the caller in *st* and returns a Python tuple representing the equivalent parse tree. Other than returning a tuple instead of a list, this function is identical to "st2list()". If *line_info* is true, line number information will be included for all terminal tokens as a third element of the list representing the token. This information is omitted if the flag is false or omitted. parser.compilest(st, filename='') The Python byte compiler can be invoked on an ST object to produce code objects which can be used as part of a call to the built-in "exec()" or "eval()" functions. This function provides the interface to the compiler, passing the internal parse tree from *st* to the parser, using the source file name specified by the *filename* parameter. The default value supplied for *filename* indicates that the source was an ST object. Compiling an ST object may result in exceptions related to compilation; an example would be a "SyntaxError" caused by the parse tree for "del f(0)": this statement is considered legal within the formal grammar for Python but is not a legal language construct. The "SyntaxError" raised for this condition is actually generated by the Python byte-compiler normally, which is why it can be raised at this point by the "parser" module. Most causes of compilation failure can be diagnosed programmatically by inspection of the parse tree. Queries on ST Objects ===================== Two functions are provided which allow an application to determine if an ST was created as an expression or a suite. Neither of these functions can be used to determine if an ST was created from source code via "expr()" or "suite()" or from a parse tree via "sequence2st()". parser.isexpr(st) When *st* represents an "'eval'" form, this function returns "True", otherwise it returns "False". This is useful, since code objects normally cannot be queried for this information using existing built-in functions. Note that the code objects created by "compilest()" cannot be queried like this either, and are identical to those created by the built-in "compile()" function. parser.issuite(st) This function mirrors "isexpr()" in that it reports whether an ST object represents an "'exec'" form, commonly known as a “suite.” It is not safe to assume that this function is equivalent to "not isexpr(st)", as additional syntactic fragments may be supported in the future. Exceptions and Error Handling ============================= The parser module defines a single exception, but may also pass other built-in exceptions from other portions of the Python runtime environment. See each function for information about the exceptions it can raise. exception parser.ParserError Exception raised when a failure occurs within the parser module. This is generally produced for validation failures rather than the built-in "SyntaxError" raised during normal parsing. The exception argument is either a string describing the reason of the failure or a tuple containing a sequence causing the failure from a parse tree passed to "sequence2st()" and an explanatory string. Calls to "sequence2st()" need to be able to handle either type of exception, while calls to other functions in the module will only need to be aware of the simple string values. Note that the functions "compilest()", "expr()", and "suite()" may raise exceptions which are normally raised by the parsing and compilation process. These include the built in exceptions "MemoryError", "OverflowError", "SyntaxError", and "SystemError". In these cases, these exceptions carry all the meaning normally associated with them. Refer to the descriptions of each function for detailed information. ST Objects ========== Ordered and equality comparisons are supported between ST objects. Pickling of ST objects (using the "pickle" module) is also supported. parser.STType The type of the objects returned by "expr()", "suite()" and "sequence2st()". ST objects have the following methods: ST.compile(filename='') Same as "compilest(st, filename)". ST.isexpr() Same as "isexpr(st)". ST.issuite() Same as "issuite(st)". ST.tolist(line_info=False, col_info=False) Same as "st2list(st, line_info, col_info)". ST.totuple(line_info=False, col_info=False) Same as "st2tuple(st, line_info, col_info)". Example: Emulation of "compile()" ================================= While many useful operations may take place between parsing and bytecode generation, the simplest operation is to do nothing. For this purpose, using the "parser" module to produce an intermediate data structure is equivalent to the code >>> code = compile('a + 5', 'file.py', 'eval') >>> a = 5 >>> eval(code) 10 The equivalent operation using the "parser" module is somewhat longer, and allows the intermediate internal parse tree to be retained as an ST object: >>> import parser >>> st = parser.expr('a + 5') >>> code = st.compile('file.py') >>> a = 5 >>> eval(code) 10 An application which needs both ST and code objects can package this code into readily available functions: import parser def load_suite(source_string): st = parser.suite(source_string) return st, st.compile() def load_expression(source_string): st = parser.expr(source_string) return st, st.compile() "pathlib" — Object-oriented filesystem paths ******************************************** New in version 3.4. **Source code:** Lib/pathlib.py ====================================================================== This module offers classes representing filesystem paths with semantics appropriate for different operating systems. Path classes are divided between pure paths, which provide purely computational operations without I/O, and concrete paths, which inherit from pure paths but also provide I/O operations. [image] If you’ve never used this module before or just aren’t sure which class is right for your task, "Path" is most likely what you need. It instantiates a concrete path for the platform the code is running on. Pure paths are useful in some special cases; for example: 1. If you want to manipulate Windows paths on a Unix machine (or vice versa). You cannot instantiate a "WindowsPath" when running on Unix, but you can instantiate "PureWindowsPath". 2. You want to make sure that your code only manipulates paths without actually accessing the OS. In this case, instantiating one of the pure classes may be useful since those simply don’t have any OS- accessing operations. See also: **PEP 428**: The pathlib module – object-oriented filesystem paths. See also: For low-level path manipulation on strings, you can also use the "os.path" module. Basic use ========= Importing the main class: >>> from pathlib import Path Listing subdirectories: >>> p = Path('.') >>> [x for x in p.iterdir() if x.is_dir()] [PosixPath('.hg'), PosixPath('docs'), PosixPath('dist'), PosixPath('__pycache__'), PosixPath('build')] Listing Python source files in this directory tree: >>> list(p.glob('**/*.py')) [PosixPath('test_pathlib.py'), PosixPath('setup.py'), PosixPath('pathlib.py'), PosixPath('docs/conf.py'), PosixPath('build/lib/pathlib.py')] Navigating inside a directory tree: >>> p = Path('/etc') >>> q = p / 'init.d' / 'reboot' >>> q PosixPath('/etc/init.d/reboot') >>> q.resolve() PosixPath('/etc/rc.d/init.d/halt') Querying path properties: >>> q.exists() True >>> q.is_dir() False Opening a file: >>> with q.open() as f: f.readline() ... '#!/bin/bash\n' Pure paths ========== Pure path objects provide path-handling operations which don’t actually access a filesystem. There are three ways to access these classes, which we also call *flavours*: class pathlib.PurePath(*pathsegments) A generic class that represents the system’s path flavour (instantiating it creates either a "PurePosixPath" or a "PureWindowsPath"): >>> PurePath('setup.py') # Running on a Unix machine PurePosixPath('setup.py') Each element of *pathsegments* can be either a string representing a path segment, an object implementing the "os.PathLike" interface which returns a string, or another path object: >>> PurePath('foo', 'some/path', 'bar') PurePosixPath('foo/some/path/bar') >>> PurePath(Path('foo'), Path('bar')) PurePosixPath('foo/bar') When *pathsegments* is empty, the current directory is assumed: >>> PurePath() PurePosixPath('.') When several absolute paths are given, the last is taken as an anchor (mimicking "os.path.join()"’s behaviour): >>> PurePath('/etc', '/usr', 'lib64') PurePosixPath('/usr/lib64') >>> PureWindowsPath('c:/Windows', 'd:bar') PureWindowsPath('d:bar') However, in a Windows path, changing the local root doesn’t discard the previous drive setting: >>> PureWindowsPath('c:/Windows', '/Program Files') PureWindowsPath('c:/Program Files') Spurious slashes and single dots are collapsed, but double dots ("'..'") are not, since this would change the meaning of a path in the face of symbolic links: >>> PurePath('foo//bar') PurePosixPath('foo/bar') >>> PurePath('foo/./bar') PurePosixPath('foo/bar') >>> PurePath('foo/../bar') PurePosixPath('foo/../bar') (a naïve approach would make "PurePosixPath('foo/../bar')" equivalent to "PurePosixPath('bar')", which is wrong if "foo" is a symbolic link to another directory) Pure path objects implement the "os.PathLike" interface, allowing them to be used anywhere the interface is accepted. Changed in version 3.6: Added support for the "os.PathLike" interface. class pathlib.PurePosixPath(*pathsegments) A subclass of "PurePath", this path flavour represents non-Windows filesystem paths: >>> PurePosixPath('/etc') PurePosixPath('/etc') *pathsegments* is specified similarly to "PurePath". class pathlib.PureWindowsPath(*pathsegments) A subclass of "PurePath", this path flavour represents Windows filesystem paths: >>> PureWindowsPath('c:/Program Files/') PureWindowsPath('c:/Program Files') *pathsegments* is specified similarly to "PurePath". Regardless of the system you’re running on, you can instantiate all of these classes, since they don’t provide any operation that does system calls. General properties ------------------ Paths are immutable and hashable. Paths of a same flavour are comparable and orderable. These properties respect the flavour’s case-folding semantics: >>> PurePosixPath('foo') == PurePosixPath('FOO') False >>> PureWindowsPath('foo') == PureWindowsPath('FOO') True >>> PureWindowsPath('FOO') in { PureWindowsPath('foo') } True >>> PureWindowsPath('C:') < PureWindowsPath('d:') True Paths of a different flavour compare unequal and cannot be ordered: >>> PureWindowsPath('foo') == PurePosixPath('foo') False >>> PureWindowsPath('foo') < PurePosixPath('foo') Traceback (most recent call last): File "", line 1, in TypeError: '<' not supported between instances of 'PureWindowsPath' and 'PurePosixPath' Operators --------- The slash operator helps create child paths, similarly to "os.path.join()": >>> p = PurePath('/etc') >>> p PurePosixPath('/etc') >>> p / 'init.d' / 'apache2' PurePosixPath('/etc/init.d/apache2') >>> q = PurePath('bin') >>> '/usr' / q PurePosixPath('/usr/bin') A path object can be used anywhere an object implementing "os.PathLike" is accepted: >>> import os >>> p = PurePath('/etc') >>> os.fspath(p) '/etc' The string representation of a path is the raw filesystem path itself (in native form, e.g. with backslashes under Windows), which you can pass to any function taking a file path as a string: >>> p = PurePath('/etc') >>> str(p) '/etc' >>> p = PureWindowsPath('c:/Program Files') >>> str(p) 'c:\\Program Files' Similarly, calling "bytes" on a path gives the raw filesystem path as a bytes object, as encoded by "os.fsencode()": >>> bytes(p) b'/etc' Note: Calling "bytes" is only recommended under Unix. Under Windows, the unicode form is the canonical representation of filesystem paths. Accessing individual parts -------------------------- To access the individual “parts” (components) of a path, use the following property: PurePath.parts A tuple giving access to the path’s various components: >>> p = PurePath('/usr/bin/python3') >>> p.parts ('/', 'usr', 'bin', 'python3') >>> p = PureWindowsPath('c:/Program Files/PSF') >>> p.parts ('c:\\', 'Program Files', 'PSF') (note how the drive and local root are regrouped in a single part) Methods and properties ---------------------- Pure paths provide the following methods and properties: PurePath.drive A string representing the drive letter or name, if any: >>> PureWindowsPath('c:/Program Files/').drive 'c:' >>> PureWindowsPath('/Program Files/').drive '' >>> PurePosixPath('/etc').drive '' UNC shares are also considered drives: >>> PureWindowsPath('//host/share/foo.txt').drive '\\\\host\\share' PurePath.root A string representing the (local or global) root, if any: >>> PureWindowsPath('c:/Program Files/').root '\\' >>> PureWindowsPath('c:Program Files/').root '' >>> PurePosixPath('/etc').root '/' UNC shares always have a root: >>> PureWindowsPath('//host/share').root '\\' PurePath.anchor The concatenation of the drive and root: >>> PureWindowsPath('c:/Program Files/').anchor 'c:\\' >>> PureWindowsPath('c:Program Files/').anchor 'c:' >>> PurePosixPath('/etc').anchor '/' >>> PureWindowsPath('//host/share').anchor '\\\\host\\share\\' PurePath.parents An immutable sequence providing access to the logical ancestors of the path: >>> p = PureWindowsPath('c:/foo/bar/setup.py') >>> p.parents[0] PureWindowsPath('c:/foo/bar') >>> p.parents[1] PureWindowsPath('c:/foo') >>> p.parents[2] PureWindowsPath('c:/') PurePath.parent The logical parent of the path: >>> p = PurePosixPath('/a/b/c/d') >>> p.parent PurePosixPath('/a/b/c') You cannot go past an anchor, or empty path: >>> p = PurePosixPath('/') >>> p.parent PurePosixPath('/') >>> p = PurePosixPath('.') >>> p.parent PurePosixPath('.') Note: This is a purely lexical operation, hence the following behaviour: >>> p = PurePosixPath('foo/..') >>> p.parent PurePosixPath('foo') If you want to walk an arbitrary filesystem path upwards, it is recommended to first call "Path.resolve()" so as to resolve symlinks and eliminate *“..”* components. PurePath.name A string representing the final path component, excluding the drive and root, if any: >>> PurePosixPath('my/library/setup.py').name 'setup.py' UNC drive names are not considered: >>> PureWindowsPath('//some/share/setup.py').name 'setup.py' >>> PureWindowsPath('//some/share').name '' PurePath.suffix The file extension of the final component, if any: >>> PurePosixPath('my/library/setup.py').suffix '.py' >>> PurePosixPath('my/library.tar.gz').suffix '.gz' >>> PurePosixPath('my/library').suffix '' PurePath.suffixes A list of the path’s file extensions: >>> PurePosixPath('my/library.tar.gar').suffixes ['.tar', '.gar'] >>> PurePosixPath('my/library.tar.gz').suffixes ['.tar', '.gz'] >>> PurePosixPath('my/library').suffixes [] PurePath.stem The final path component, without its suffix: >>> PurePosixPath('my/library.tar.gz').stem 'library.tar' >>> PurePosixPath('my/library.tar').stem 'library' >>> PurePosixPath('my/library').stem 'library' PurePath.as_posix() Return a string representation of the path with forward slashes ("/"): >>> p = PureWindowsPath('c:\\windows') >>> str(p) 'c:\\windows' >>> p.as_posix() 'c:/windows' PurePath.as_uri() Represent the path as a "file" URI. "ValueError" is raised if the path isn’t absolute. >>> p = PurePosixPath('/etc/passwd') >>> p.as_uri() 'file:///etc/passwd' >>> p = PureWindowsPath('c:/Windows') >>> p.as_uri() 'file:///c:/Windows' PurePath.is_absolute() Return whether the path is absolute or not. A path is considered absolute if it has both a root and (if the flavour allows) a drive: >>> PurePosixPath('/a/b').is_absolute() True >>> PurePosixPath('a/b').is_absolute() False >>> PureWindowsPath('c:/a/b').is_absolute() True >>> PureWindowsPath('/a/b').is_absolute() False >>> PureWindowsPath('c:').is_absolute() False >>> PureWindowsPath('//some/share').is_absolute() True PurePath.is_relative_to(*other) Return whether or not this path is relative to the *other* path. >>> p = PurePath('/etc/passwd') >>> p.is_relative_to('/etc') True >>> p.is_relative_to('/usr') False New in version 3.9. PurePath.is_reserved() With "PureWindowsPath", return "True" if the path is considered reserved under Windows, "False" otherwise. With "PurePosixPath", "False" is always returned. >>> PureWindowsPath('nul').is_reserved() True >>> PurePosixPath('nul').is_reserved() False File system calls on reserved paths can fail mysteriously or have unintended effects. PurePath.joinpath(*other) Calling this method is equivalent to combining the path with each of the *other* arguments in turn: >>> PurePosixPath('/etc').joinpath('passwd') PurePosixPath('/etc/passwd') >>> PurePosixPath('/etc').joinpath(PurePosixPath('passwd')) PurePosixPath('/etc/passwd') >>> PurePosixPath('/etc').joinpath('init.d', 'apache2') PurePosixPath('/etc/init.d/apache2') >>> PureWindowsPath('c:').joinpath('/Program Files') PureWindowsPath('c:/Program Files') PurePath.match(pattern) Match this path against the provided glob-style pattern. Return "True" if matching is successful, "False" otherwise. If *pattern* is relative, the path can be either relative or absolute, and matching is done from the right: >>> PurePath('a/b.py').match('*.py') True >>> PurePath('/a/b/c.py').match('b/*.py') True >>> PurePath('/a/b/c.py').match('a/*.py') False If *pattern* is absolute, the path must be absolute, and the whole path must match: >>> PurePath('/a.py').match('/*.py') True >>> PurePath('a/b.py').match('/*.py') False As with other methods, case-sensitivity follows platform defaults: >>> PurePosixPath('b.py').match('*.PY') False >>> PureWindowsPath('b.py').match('*.PY') True PurePath.relative_to(*other) Compute a version of this path relative to the path represented by *other*. If it’s impossible, ValueError is raised: >>> p = PurePosixPath('/etc/passwd') >>> p.relative_to('/') PurePosixPath('etc/passwd') >>> p.relative_to('/etc') PurePosixPath('passwd') >>> p.relative_to('/usr') Traceback (most recent call last): File "", line 1, in File "pathlib.py", line 694, in relative_to .format(str(self), str(formatted))) ValueError: '/etc/passwd' is not in the subpath of '/usr' OR one path is relative and the other absolute. NOTE: This function is part of "PurePath" and works with strings. It does not check or access the underlying file structure. PurePath.with_name(name) Return a new path with the "name" changed. If the original path doesn’t have a name, ValueError is raised: >>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz') >>> p.with_name('setup.py') PureWindowsPath('c:/Downloads/setup.py') >>> p = PureWindowsPath('c:/') >>> p.with_name('setup.py') Traceback (most recent call last): File "", line 1, in File "/home/antoine/cpython/default/Lib/pathlib.py", line 751, in with_name raise ValueError("%r has an empty name" % (self,)) ValueError: PureWindowsPath('c:/') has an empty name PurePath.with_stem(stem) Return a new path with the "stem" changed. If the original path doesn’t have a name, ValueError is raised: >>> p = PureWindowsPath('c:/Downloads/draft.txt') >>> p.with_stem('final') PureWindowsPath('c:/Downloads/final.txt') >>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz') >>> p.with_stem('lib') PureWindowsPath('c:/Downloads/lib.gz') >>> p = PureWindowsPath('c:/') >>> p.with_stem('') Traceback (most recent call last): File "", line 1, in File "/home/antoine/cpython/default/Lib/pathlib.py", line 861, in with_stem return self.with_name(stem + self.suffix) File "/home/antoine/cpython/default/Lib/pathlib.py", line 851, in with_name raise ValueError("%r has an empty name" % (self,)) ValueError: PureWindowsPath('c:/') has an empty name New in version 3.9. PurePath.with_suffix(suffix) Return a new path with the "suffix" changed. If the original path doesn’t have a suffix, the new *suffix* is appended instead. If the *suffix* is an empty string, the original suffix is removed: >>> p = PureWindowsPath('c:/Downloads/pathlib.tar.gz') >>> p.with_suffix('.bz2') PureWindowsPath('c:/Downloads/pathlib.tar.bz2') >>> p = PureWindowsPath('README') >>> p.with_suffix('.txt') PureWindowsPath('README.txt') >>> p = PureWindowsPath('README.txt') >>> p.with_suffix('') PureWindowsPath('README') Concrete paths ============== Concrete paths are subclasses of the pure path classes. In addition to operations provided by the latter, they also provide methods to do system calls on path objects. There are three ways to instantiate concrete paths: class pathlib.Path(*pathsegments) A subclass of "PurePath", this class represents concrete paths of the system’s path flavour (instantiating it creates either a "PosixPath" or a "WindowsPath"): >>> Path('setup.py') PosixPath('setup.py') *pathsegments* is specified similarly to "PurePath". class pathlib.PosixPath(*pathsegments) A subclass of "Path" and "PurePosixPath", this class represents concrete non-Windows filesystem paths: >>> PosixPath('/etc') PosixPath('/etc') *pathsegments* is specified similarly to "PurePath". class pathlib.WindowsPath(*pathsegments) A subclass of "Path" and "PureWindowsPath", this class represents concrete Windows filesystem paths: >>> WindowsPath('c:/Program Files/') WindowsPath('c:/Program Files') *pathsegments* is specified similarly to "PurePath". You can only instantiate the class flavour that corresponds to your system (allowing system calls on non-compatible path flavours could lead to bugs or failures in your application): >>> import os >>> os.name 'posix' >>> Path('setup.py') PosixPath('setup.py') >>> PosixPath('setup.py') PosixPath('setup.py') >>> WindowsPath('setup.py') Traceback (most recent call last): File "", line 1, in File "pathlib.py", line 798, in __new__ % (cls.__name__,)) NotImplementedError: cannot instantiate 'WindowsPath' on your system Methods ------- Concrete paths provide the following methods in addition to pure paths methods. Many of these methods can raise an "OSError" if a system call fails (for example because the path doesn’t exist). Changed in version 3.8: "exists()", "is_dir()", "is_file()", "is_mount()", "is_symlink()", "is_block_device()", "is_char_device()", "is_fifo()", "is_socket()" now return "False" instead of raising an exception for paths that contain characters unrepresentable at the OS level. classmethod Path.cwd() Return a new path object representing the current directory (as returned by "os.getcwd()"): >>> Path.cwd() PosixPath('/home/antoine/pathlib') classmethod Path.home() Return a new path object representing the user’s home directory (as returned by "os.path.expanduser()" with "~" construct): >>> Path.home() PosixPath('/home/antoine') Note that unlike "os.path.expanduser()", on POSIX systems a "KeyError" or "RuntimeError" will be raised, and on Windows systems a "RuntimeError" will be raised if home directory can’t be resolved. New in version 3.5. Path.stat() Return a "os.stat_result" object containing information about this path, like "os.stat()". The result is looked up at each call to this method. >>> p = Path('setup.py') >>> p.stat().st_size 956 >>> p.stat().st_mtime 1327883547.852554 Path.chmod(mode) Change the file mode and permissions, like "os.chmod()": >>> p = Path('setup.py') >>> p.stat().st_mode 33277 >>> p.chmod(0o444) >>> p.stat().st_mode 33060 Path.exists() Whether the path points to an existing file or directory: >>> Path('.').exists() True >>> Path('setup.py').exists() True >>> Path('/etc').exists() True >>> Path('nonexistentfile').exists() False Note: If the path points to a symlink, "exists()" returns whether the symlink *points to* an existing file or directory. Path.expanduser() Return a new path with expanded "~" and "~user" constructs, as returned by "os.path.expanduser()": >>> p = PosixPath('~/films/Monty Python') >>> p.expanduser() PosixPath('/home/eric/films/Monty Python') Note that unlike "os.path.expanduser()", on POSIX systems a "KeyError" or "RuntimeError" will be raised, and on Windows systems a "RuntimeError" will be raised if home directory can’t be resolved. New in version 3.5. Path.glob(pattern) Glob the given relative *pattern* in the directory represented by this path, yielding all matching files (of any kind): >>> sorted(Path('.').glob('*.py')) [PosixPath('pathlib.py'), PosixPath('setup.py'), PosixPath('test_pathlib.py')] >>> sorted(Path('.').glob('*/*.py')) [PosixPath('docs/conf.py')] The “"**"” pattern means “this directory and all subdirectories, recursively”. In other words, it enables recursive globbing: >>> sorted(Path('.').glob('**/*.py')) [PosixPath('build/lib/pathlib.py'), PosixPath('docs/conf.py'), PosixPath('pathlib.py'), PosixPath('setup.py'), PosixPath('test_pathlib.py')] Note: Using the “"**"” pattern in large directory trees may consume an inordinate amount of time. Raises an auditing event "pathlib.Path.glob" with arguments "self", "pattern". Path.group() Return the name of the group owning the file. "KeyError" is raised if the file’s gid isn’t found in the system database. Path.is_dir() Return "True" if the path points to a directory (or a symbolic link pointing to a directory), "False" if it points to another kind of file. "False" is also returned if the path doesn’t exist or is a broken symlink; other errors (such as permission errors) are propagated. Path.is_file() Return "True" if the path points to a regular file (or a symbolic link pointing to a regular file), "False" if it points to another kind of file. "False" is also returned if the path doesn’t exist or is a broken symlink; other errors (such as permission errors) are propagated. Path.is_mount() Return "True" if the path is a *mount point*: a point in a file system where a different file system has been mounted. On POSIX, the function checks whether *path*’s parent, "path/..", is on a different device than *path*, or whether "path/.." and *path* point to the same i-node on the same device — this should detect mount points for all Unix and POSIX variants. Not implemented on Windows. New in version 3.7. Path.is_symlink() Return "True" if the path points to a symbolic link, "False" otherwise. "False" is also returned if the path doesn’t exist; other errors (such as permission errors) are propagated. Path.is_socket() Return "True" if the path points to a Unix socket (or a symbolic link pointing to a Unix socket), "False" if it points to another kind of file. "False" is also returned if the path doesn’t exist or is a broken symlink; other errors (such as permission errors) are propagated. Path.is_fifo() Return "True" if the path points to a FIFO (or a symbolic link pointing to a FIFO), "False" if it points to another kind of file. "False" is also returned if the path doesn’t exist or is a broken symlink; other errors (such as permission errors) are propagated. Path.is_block_device() Return "True" if the path points to a block device (or a symbolic link pointing to a block device), "False" if it points to another kind of file. "False" is also returned if the path doesn’t exist or is a broken symlink; other errors (such as permission errors) are propagated. Path.is_char_device() Return "True" if the path points to a character device (or a symbolic link pointing to a character device), "False" if it points to another kind of file. "False" is also returned if the path doesn’t exist or is a broken symlink; other errors (such as permission errors) are propagated. Path.iterdir() When the path points to a directory, yield path objects of the directory contents: >>> p = Path('docs') >>> for child in p.iterdir(): child ... PosixPath('docs/conf.py') PosixPath('docs/_templates') PosixPath('docs/make.bat') PosixPath('docs/index.rst') PosixPath('docs/_build') PosixPath('docs/_static') PosixPath('docs/Makefile') The children are yielded in arbitrary order, and the special entries "'.'" and "'..'" are not included. If a file is removed from or added to the directory after creating the iterator, whether a path object for that file be included is unspecified. Path.lchmod(mode) Like "Path.chmod()" but, if the path points to a symbolic link, the symbolic link’s mode is changed rather than its target’s. Path.lstat() Like "Path.stat()" but, if the path points to a symbolic link, return the symbolic link’s information rather than its target’s. Path.mkdir(mode=0o777, parents=False, exist_ok=False) Create a new directory at this given path. If *mode* is given, it is combined with the process’ "umask" value to determine the file mode and access flags. If the path already exists, "FileExistsError" is raised. If *parents* is true, any missing parents of this path are created as needed; they are created with the default permissions without taking *mode* into account (mimicking the POSIX "mkdir -p" command). If *parents* is false (the default), a missing parent raises "FileNotFoundError". If *exist_ok* is false (the default), "FileExistsError" is raised if the target directory already exists. If *exist_ok* is true, "FileExistsError" exceptions will be ignored (same behavior as the POSIX "mkdir -p" command), but only if the last path component is not an existing non-directory file. Changed in version 3.5: The *exist_ok* parameter was added. Path.open(mode='r', buffering=-1, encoding=None, errors=None, newline=None) Open the file pointed to by the path, like the built-in "open()" function does: >>> p = Path('setup.py') >>> with p.open() as f: ... f.readline() ... '#!/usr/bin/env python3\n' Path.owner() Return the name of the user owning the file. "KeyError" is raised if the file’s uid isn’t found in the system database. Path.read_bytes() Return the binary contents of the pointed-to file as a bytes object: >>> p = Path('my_binary_file') >>> p.write_bytes(b'Binary file contents') 20 >>> p.read_bytes() b'Binary file contents' New in version 3.5. Path.read_text(encoding=None, errors=None) Return the decoded contents of the pointed-to file as a string: >>> p = Path('my_text_file') >>> p.write_text('Text file contents') 18 >>> p.read_text() 'Text file contents' The file is opened and then closed. The optional parameters have the same meaning as in "open()". New in version 3.5. Path.readlink() Return the path to which the symbolic link points (as returned by "os.readlink()"): >>> p = Path('mylink') >>> p.symlink_to('setup.py') >>> p.readlink() PosixPath('setup.py') New in version 3.9. Path.rename(target) Rename this file or directory to the given *target*, and return a new Path instance pointing to *target*. On Unix, if *target* exists and is a file, it will be replaced silently if the user has permission. *target* can be either a string or another path object: >>> p = Path('foo') >>> p.open('w').write('some text') 9 >>> target = Path('bar') >>> p.rename(target) PosixPath('bar') >>> target.open().read() 'some text' The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, *not* the directory of the Path object. Changed in version 3.8: Added return value, return the new Path instance. Path.replace(target) Rename this file or directory to the given *target*, and return a new Path instance pointing to *target*. If *target* points to an existing file or empty directory, it will be unconditionally replaced. The target path may be absolute or relative. Relative paths are interpreted relative to the current working directory, *not* the directory of the Path object. Changed in version 3.8: Added return value, return the new Path instance. Path.resolve(strict=False) Make the path absolute, resolving any symlinks. A new path object is returned: >>> p = Path() >>> p PosixPath('.') >>> p.resolve() PosixPath('/home/antoine/pathlib') “".."” components are also eliminated (this is the only method to do so): >>> p = Path('docs/../setup.py') >>> p.resolve() PosixPath('/home/antoine/pathlib/setup.py') If the path doesn’t exist and *strict* is "True", "FileNotFoundError" is raised. If *strict* is "False", the path is resolved as far as possible and any remainder is appended without checking whether it exists. If an infinite loop is encountered along the resolution path, "RuntimeError" is raised. New in version 3.6: The *strict* argument (pre-3.6 behavior is strict). Path.rglob(pattern) This is like calling "Path.glob()" with “"**/"” added in front of the given relative *pattern*: >>> sorted(Path().rglob("*.py")) [PosixPath('build/lib/pathlib.py'), PosixPath('docs/conf.py'), PosixPath('pathlib.py'), PosixPath('setup.py'), PosixPath('test_pathlib.py')] Raises an auditing event "pathlib.Path.rglob" with arguments "self", "pattern". Path.rmdir() Remove this directory. The directory must be empty. Path.samefile(other_path) Return whether this path points to the same file as *other_path*, which can be either a Path object, or a string. The semantics are similar to "os.path.samefile()" and "os.path.samestat()". An "OSError" can be raised if either file cannot be accessed for some reason. >>> p = Path('spam') >>> q = Path('eggs') >>> p.samefile(q) False >>> p.samefile('spam') True New in version 3.5. Path.symlink_to(target, target_is_directory=False) Make this path a symbolic link to *target*. Under Windows, *target_is_directory* must be true (default "False") if the link’s target is a directory. Under POSIX, *target_is_directory*’s value is ignored. >>> p = Path('mylink') >>> p.symlink_to('setup.py') >>> p.resolve() PosixPath('/home/antoine/pathlib/setup.py') >>> p.stat().st_size 956 >>> p.lstat().st_size 8 Note: The order of arguments (link, target) is the reverse of "os.symlink()"’s. Path.link_to(target) Make *target* a hard link to this path. Warning: This function does not make this path a hard link to *target*, despite the implication of the function and argument names. The argument order (target, link) is the reverse of "Path.symlink_to()", but matches that of "os.link()". New in version 3.8. Path.touch(mode=0o666, exist_ok=True) Create a file at this given path. If *mode* is given, it is combined with the process’ "umask" value to determine the file mode and access flags. If the file already exists, the function succeeds if *exist_ok* is true (and its modification time is updated to the current time), otherwise "FileExistsError" is raised. Path.unlink(missing_ok=False) Remove this file or symbolic link. If the path points to a directory, use "Path.rmdir()" instead. If *missing_ok* is false (the default), "FileNotFoundError" is raised if the path does not exist. If *missing_ok* is true, "FileNotFoundError" exceptions will be ignored (same behavior as the POSIX "rm -f" command). Changed in version 3.8: The *missing_ok* parameter was added. Path.write_bytes(data) Open the file pointed to in bytes mode, write *data* to it, and close the file: >>> p = Path('my_binary_file') >>> p.write_bytes(b'Binary file contents') 20 >>> p.read_bytes() b'Binary file contents' An existing file of the same name is overwritten. New in version 3.5. Path.write_text(data, encoding=None, errors=None) Open the file pointed to in text mode, write *data* to it, and close the file: >>> p = Path('my_text_file') >>> p.write_text('Text file contents') 18 >>> p.read_text() 'Text file contents' An existing file of the same name is overwritten. The optional parameters have the same meaning as in "open()". New in version 3.5. Correspondence to tools in the "os" module ========================================== Below is a table mapping various "os" functions to their corresponding "PurePath"/"Path" equivalent. Note: Although "os.path.relpath()" and "PurePath.relative_to()" have some overlapping use-cases, their semantics differ enough to warrant not considering them equivalent. +--------------------------------------+--------------------------------+ | os and os.path | pathlib | |======================================|================================| | "os.path.abspath()" | "Path.resolve()" | +--------------------------------------+--------------------------------+ | "os.chmod()" | "Path.chmod()" | +--------------------------------------+--------------------------------+ | "os.mkdir()" | "Path.mkdir()" | +--------------------------------------+--------------------------------+ | "os.makedirs()" | "Path.mkdir()" | +--------------------------------------+--------------------------------+ | "os.rename()" | "Path.rename()" | +--------------------------------------+--------------------------------+ | "os.replace()" | "Path.replace()" | +--------------------------------------+--------------------------------+ | "os.rmdir()" | "Path.rmdir()" | +--------------------------------------+--------------------------------+ | "os.remove()", "os.unlink()" | "Path.unlink()" | +--------------------------------------+--------------------------------+ | "os.getcwd()" | "Path.cwd()" | +--------------------------------------+--------------------------------+ | "os.path.exists()" | "Path.exists()" | +--------------------------------------+--------------------------------+ | "os.path.expanduser()" | "Path.expanduser()" and | | | "Path.home()" | +--------------------------------------+--------------------------------+ | "os.listdir()" | "Path.iterdir()" | +--------------------------------------+--------------------------------+ | "os.path.isdir()" | "Path.is_dir()" | +--------------------------------------+--------------------------------+ | "os.path.isfile()" | "Path.is_file()" | +--------------------------------------+--------------------------------+ | "os.path.islink()" | "Path.is_symlink()" | +--------------------------------------+--------------------------------+ | "os.link()" | "Path.link_to()" | +--------------------------------------+--------------------------------+ | "os.symlink()" | "Path.symlink_to()" | +--------------------------------------+--------------------------------+ | "os.readlink()" | "Path.readlink()" | +--------------------------------------+--------------------------------+ | "os.stat()" | "Path.stat()", "Path.owner()", | | | "Path.group()" | +--------------------------------------+--------------------------------+ | "os.path.isabs()" | "PurePath.is_absolute()" | +--------------------------------------+--------------------------------+ | "os.path.join()" | "PurePath.joinpath()" | +--------------------------------------+--------------------------------+ | "os.path.basename()" | "PurePath.name" | +--------------------------------------+--------------------------------+ | "os.path.dirname()" | "PurePath.parent" | +--------------------------------------+--------------------------------+ | "os.path.samefile()" | "Path.samefile()" | +--------------------------------------+--------------------------------+ | "os.path.splitext()" | "PurePath.suffix" | +--------------------------------------+--------------------------------+ "pdb" — The Python Debugger *************************** **Source code:** Lib/pdb.py ====================================================================== The module "pdb" defines an interactive source code debugger for Python programs. It supports setting (conditional) breakpoints and single stepping at the source line level, inspection of stack frames, source code listing, and evaluation of arbitrary Python code in the context of any stack frame. It also supports post-mortem debugging and can be called under program control. The debugger is extensible – it is actually defined as the class "Pdb". This is currently undocumented but easily understood by reading the source. The extension interface uses the modules "bdb" and "cmd". The debugger’s prompt is "(Pdb)". Typical usage to run a program under control of the debugger is: >>> import pdb >>> import mymodule >>> pdb.run('mymodule.test()') > (0)?() (Pdb) continue > (1)?() (Pdb) continue NameError: 'spam' > (1)?() (Pdb) Changed in version 3.3: Tab-completion via the "readline" module is available for commands and command arguments, e.g. the current global and local names are offered as arguments of the "p" command. "pdb.py" can also be invoked as a script to debug other scripts. For example: python3 -m pdb myscript.py When invoked as a script, pdb will automatically enter post-mortem debugging if the program being debugged exits abnormally. After post- mortem debugging (or after normal exit of the program), pdb will restart the program. Automatic restarting preserves pdb’s state (such as breakpoints) and in most cases is more useful than quitting the debugger upon program’s exit. New in version 3.2: "pdb.py" now accepts a "-c" option that executes commands as if given in a ".pdbrc" file, see Debugger Commands. New in version 3.7: "pdb.py" now accepts a "-m" option that execute modules similar to the way "python3 -m" does. As with a script, the debugger will pause execution just before the first line of the module. The typical usage to break into the debugger is to insert: import pdb; pdb.set_trace() at the location you want to break into the debugger, and then run the program. You can then step through the code following this statement, and continue running without the debugger using the "continue" command. New in version 3.7: The built-in "breakpoint()", when called with defaults, can be used instead of "import pdb; pdb.set_trace()". The typical usage to inspect a crashed program is: >>> import pdb >>> import mymodule >>> mymodule.test() Traceback (most recent call last): File "", line 1, in File "./mymodule.py", line 4, in test test2() File "./mymodule.py", line 3, in test2 print(spam) NameError: spam >>> pdb.pm() > ./mymodule.py(3)test2() -> print(spam) (Pdb) The module defines the following functions; each enters the debugger in a slightly different way: pdb.run(statement, globals=None, locals=None) Execute the *statement* (given as a string or a code object) under debugger control. The debugger prompt appears before any code is executed; you can set breakpoints and type "continue", or you can step through the statement using "step" or "next" (all these commands are explained below). The optional *globals* and *locals* arguments specify the environment in which the code is executed; by default the dictionary of the module "__main__" is used. (See the explanation of the built-in "exec()" or "eval()" functions.) pdb.runeval(expression, globals=None, locals=None) Evaluate the *expression* (given as a string or a code object) under debugger control. When "runeval()" returns, it returns the value of the expression. Otherwise this function is similar to "run()". pdb.runcall(function, *args, **kwds) Call the *function* (a function or method object, not a string) with the given arguments. When "runcall()" returns, it returns whatever the function call returned. The debugger prompt appears as soon as the function is entered. pdb.set_trace(*, header=None) Enter the debugger at the calling stack frame. This is useful to hard-code a breakpoint at a given point in a program, even if the code is not otherwise being debugged (e.g. when an assertion fails). If given, *header* is printed to the console just before debugging begins. Changed in version 3.7: The keyword-only argument *header*. pdb.post_mortem(traceback=None) Enter post-mortem debugging of the given *traceback* object. If no *traceback* is given, it uses the one of the exception that is currently being handled (an exception must be being handled if the default is to be used). pdb.pm() Enter post-mortem debugging of the traceback found in "sys.last_traceback". The "run*" functions and "set_trace()" are aliases for instantiating the "Pdb" class and calling the method of the same name. If you want to access further features, you have to do this yourself: class pdb.Pdb(completekey='tab', stdin=None, stdout=None, skip=None, nosigint=False, readrc=True) "Pdb" is the debugger class. The *completekey*, *stdin* and *stdout* arguments are passed to the underlying "cmd.Cmd" class; see the description there. The *skip* argument, if given, must be an iterable of glob-style module name patterns. The debugger will not step into frames that originate in a module that matches one of these patterns. [1] By default, Pdb sets a handler for the SIGINT signal (which is sent when the user presses "Ctrl-C" on the console) when you give a "continue" command. This allows you to break into the debugger again by pressing "Ctrl-C". If you want Pdb not to touch the SIGINT handler, set *nosigint* to true. The *readrc* argument defaults to true and controls whether Pdb will load .pdbrc files from the filesystem. Example call to enable tracing with *skip*: import pdb; pdb.Pdb(skip=['django.*']).set_trace() Raises an auditing event "pdb.Pdb" with no arguments. New in version 3.1: The *skip* argument. New in version 3.2: The *nosigint* argument. Previously, a SIGINT handler was never set by Pdb. Changed in version 3.6: The *readrc* argument. run(statement, globals=None, locals=None) runeval(expression, globals=None, locals=None) runcall(function, *args, **kwds) set_trace() See the documentation for the functions explained above. Debugger Commands ================= The commands recognized by the debugger are listed below. Most commands can be abbreviated to one or two letters as indicated; e.g. "h(elp)" means that either "h" or "help" can be used to enter the help command (but not "he" or "hel", nor "H" or "Help" or "HELP"). Arguments to commands must be separated by whitespace (spaces or tabs). Optional arguments are enclosed in square brackets ("[]") in the command syntax; the square brackets must not be typed. Alternatives in the command syntax are separated by a vertical bar ("|"). Entering a blank line repeats the last command entered. Exception: if the last command was a "list" command, the next 11 lines are listed. Commands that the debugger doesn’t recognize are assumed to be Python statements and are executed in the context of the program being debugged. Python statements can also be prefixed with an exclamation point ("!"). This is a powerful way to inspect the program being debugged; it is even possible to change a variable or call a function. When an exception occurs in such a statement, the exception name is printed but the debugger’s state is not changed. The debugger supports aliases. Aliases can have parameters which allows one a certain level of adaptability to the context under examination. Multiple commands may be entered on a single line, separated by ";;". (A single ";" is not used as it is the separator for multiple commands in a line that is passed to the Python parser.) No intelligence is applied to separating the commands; the input is split at the first ";;" pair, even if it is in the middle of a quoted string. A workaround for strings with double semicolons is to use implicit string concatenation "';'';'" or "";"";"". If a file ".pdbrc" exists in the user’s home directory or in the current directory, it is read in and executed as if it had been typed at the debugger prompt. This is particularly useful for aliases. If both files exist, the one in the home directory is read first and aliases defined there can be overridden by the local file. Changed in version 3.2: ".pdbrc" can now contain commands that continue debugging, such as "continue" or "next". Previously, these commands had no effect. h(elp) [command] Without argument, print the list of available commands. With a *command* as argument, print help about that command. "help pdb" displays the full documentation (the docstring of the "pdb" module). Since the *command* argument must be an identifier, "help exec" must be entered to get help on the "!" command. w(here) Print a stack trace, with the most recent frame at the bottom. An arrow indicates the current frame, which determines the context of most commands. d(own) [count] Move the current frame *count* (default one) levels down in the stack trace (to a newer frame). u(p) [count] Move the current frame *count* (default one) levels up in the stack trace (to an older frame). b(reak) [([filename:]lineno | function) [, condition]] With a *lineno* argument, set a break there in the current file. With a *function* argument, set a break at the first executable statement within that function. The line number may be prefixed with a filename and a colon, to specify a breakpoint in another file (probably one that hasn’t been loaded yet). The file is searched on "sys.path". Note that each breakpoint is assigned a number to which all the other breakpoint commands refer. If a second argument is present, it is an expression which must evaluate to true before the breakpoint is honored. Without argument, list all breaks, including for each breakpoint, the number of times that breakpoint has been hit, the current ignore count, and the associated condition if any. tbreak [([filename:]lineno | function) [, condition]] Temporary breakpoint, which is removed automatically when it is first hit. The arguments are the same as for "break". cl(ear) [filename:lineno | bpnumber ...] With a *filename:lineno* argument, clear all the breakpoints at this line. With a space separated list of breakpoint numbers, clear those breakpoints. Without argument, clear all breaks (but first ask confirmation). disable [bpnumber ...] Disable the breakpoints given as a space separated list of breakpoint numbers. Disabling a breakpoint means it cannot cause the program to stop execution, but unlike clearing a breakpoint, it remains in the list of breakpoints and can be (re-)enabled. enable [bpnumber ...] Enable the breakpoints specified. ignore bpnumber [count] Set the ignore count for the given breakpoint number. If count is omitted, the ignore count is set to 0. A breakpoint becomes active when the ignore count is zero. When non-zero, the count is decremented each time the breakpoint is reached and the breakpoint is not disabled and any associated condition evaluates to true. condition bpnumber [condition] Set a new *condition* for the breakpoint, an expression which must evaluate to true before the breakpoint is honored. If *condition* is absent, any existing condition is removed; i.e., the breakpoint is made unconditional. commands [bpnumber] Specify a list of commands for breakpoint number *bpnumber*. The commands themselves appear on the following lines. Type a line containing just "end" to terminate the commands. An example: (Pdb) commands 1 (com) p some_variable (com) end (Pdb) To remove all commands from a breakpoint, type "commands" and follow it immediately with "end"; that is, give no commands. With no *bpnumber* argument, "commands" refers to the last breakpoint set. You can use breakpoint commands to start your program up again. Simply use the "continue" command, or "step", or any other command that resumes execution. Specifying any command resuming execution (currently "continue", "step", "next", "return", "jump", "quit" and their abbreviations) terminates the command list (as if that command was immediately followed by end). This is because any time you resume execution (even with a simple next or step), you may encounter another breakpoint—which could have its own command list, leading to ambiguities about which list to execute. If you use the ‘silent’ command in the command list, the usual message about stopping at a breakpoint is not printed. This may be desirable for breakpoints that are to print a specific message and then continue. If none of the other commands print anything, you see no sign that the breakpoint was reached. s(tep) Execute the current line, stop at the first possible occasion (either in a function that is called or on the next line in the current function). n(ext) Continue execution until the next line in the current function is reached or it returns. (The difference between "next" and "step" is that "step" stops inside a called function, while "next" executes called functions at (nearly) full speed, only stopping at the next line in the current function.) unt(il) [lineno] Without argument, continue execution until the line with a number greater than the current one is reached. With a line number, continue execution until a line with a number greater or equal to that is reached. In both cases, also stop when the current frame returns. Changed in version 3.2: Allow giving an explicit line number. r(eturn) Continue execution until the current function returns. c(ont(inue)) Continue execution, only stop when a breakpoint is encountered. j(ump) lineno Set the next line that will be executed. Only available in the bottom-most frame. This lets you jump back and execute code again, or jump forward to skip code that you don’t want to run. It should be noted that not all jumps are allowed – for instance it is not possible to jump into the middle of a "for" loop or out of a "finally" clause. l(ist) [first[, last]] List source code for the current file. Without arguments, list 11 lines around the current line or continue the previous listing. With "." as argument, list 11 lines around the current line. With one argument, list 11 lines around at that line. With two arguments, list the given range; if the second argument is less than the first, it is interpreted as a count. The current line in the current frame is indicated by "->". If an exception is being debugged, the line where the exception was originally raised or propagated is indicated by ">>", if it differs from the current line. New in version 3.2: The ">>" marker. ll | longlist List all source code for the current function or frame. Interesting lines are marked as for "list". New in version 3.2. a(rgs) Print the argument list of the current function. p expression Evaluate the *expression* in the current context and print its value. Note: "print()" can also be used, but is not a debugger command — this executes the Python "print()" function. pp expression Like the "p" command, except the value of the expression is pretty- printed using the "pprint" module. whatis expression Print the type of the *expression*. source expression Try to get source code for the given object and display it. New in version 3.2. display [expression] Display the value of the expression if it changed, each time execution stops in the current frame. Without expression, list all display expressions for the current frame. New in version 3.2. undisplay [expression] Do not display the expression any more in the current frame. Without expression, clear all display expressions for the current frame. New in version 3.2. interact Start an interactive interpreter (using the "code" module) whose global namespace contains all the (global and local) names found in the current scope. New in version 3.2. alias [name [command]] Create an alias called *name* that executes *command*. The command must *not* be enclosed in quotes. Replaceable parameters can be indicated by "%1", "%2", and so on, while "%*" is replaced by all the parameters. If no command is given, the current alias for *name* is shown. If no arguments are given, all aliases are listed. Aliases may be nested and can contain anything that can be legally typed at the pdb prompt. Note that internal pdb commands *can* be overridden by aliases. Such a command is then hidden until the alias is removed. Aliasing is recursively applied to the first word of the command line; all other words in the line are left alone. As an example, here are two useful aliases (especially when placed in the ".pdbrc" file): # Print instance variables (usage "pi classInst") alias pi for k in %1.__dict__.keys(): print("%1.",k,"=",%1.__dict__[k]) # Print instance variables in self alias ps pi self unalias name Delete the specified alias. ! statement Execute the (one-line) *statement* in the context of the current stack frame. The exclamation point can be omitted unless the first word of the statement resembles a debugger command. To set a global variable, you can prefix the assignment command with a "global" statement on the same line, e.g.: (Pdb) global list_options; list_options = ['-l'] (Pdb) run [args ...] restart [args ...] Restart the debugged Python program. If an argument is supplied, it is split with "shlex" and the result is used as the new "sys.argv". History, breakpoints, actions and debugger options are preserved. "restart" is an alias for "run". q(uit) Quit from the debugger. The program being executed is aborted. debug code Enter a recursive debugger that steps through the code argument (which is an arbitrary expression or statement to be executed in the current environment). retval Print the return value for the last return of a function. -[ Footnotes ]- [1] Whether a frame is considered to originate in a certain module is determined by the "__name__" in the frame globals. Data Persistence **************** The modules described in this chapter support storing Python data in a persistent form on disk. The "pickle" and "marshal" modules can turn many Python data types into a stream of bytes and then recreate the objects from the bytes. The various DBM-related modules support a family of hash-based file formats that store a mapping of strings to other strings. The list of modules described in this chapter is: * "pickle" — Python object serialization * Relationship to other Python modules * Comparison with "marshal" * Comparison with "json" * Data stream format * Module Interface * What can be pickled and unpickled? * Pickling Class Instances * Persistence of External Objects * Dispatch Tables * Handling Stateful Objects * Custom Reduction for Types, Functions, and Other Objects * Out-of-band Buffers * Provider API * Consumer API * Example * Restricting Globals * Performance * Examples * "copyreg" — Register "pickle" support functions * Example * "shelve" — Python object persistence * Restrictions * Example * "marshal" — Internal Python object serialization * "dbm" — Interfaces to Unix “databases” * "dbm.gnu" — GNU’s reinterpretation of dbm * "dbm.ndbm" — Interface based on ndbm * "dbm.dumb" — Portable DBM implementation * "sqlite3" — DB-API 2.0 interface for SQLite databases * Module functions and constants * Connection Objects * Cursor Objects * Row Objects * Exceptions * SQLite and Python types * Introduction * Using adapters to store additional Python types in SQLite databases * Letting your object adapt itself * Registering an adapter callable * Converting SQLite values to custom Python types * Default adapters and converters * Controlling Transactions * Using "sqlite3" efficiently * Using shortcut methods * Accessing columns by name instead of by index * Using the connection as a context manager "pickle" — Python object serialization ************************************** **Source code:** Lib/pickle.py ====================================================================== The "pickle" module implements binary protocols for serializing and de-serializing a Python object structure. *“Pickling”* is the process whereby a Python object hierarchy is converted into a byte stream, and *“unpickling”* is the inverse operation, whereby a byte stream (from a *binary file* or *bytes-like object*) is converted back into an object hierarchy. Pickling (and unpickling) is alternatively known as “serialization”, “marshalling,” [1] or “flattening”; however, to avoid confusion, the terms used here are “pickling” and “unpickling”. Warning: The "pickle" module **is not secure**. Only unpickle data you trust.It is possible to construct malicious pickle data which will **execute arbitrary code during unpickling**. Never unpickle data that could have come from an untrusted source, or that could have been tampered with.Consider signing data with "hmac" if you need to ensure that it has not been tampered with.Safer serialization formats such as "json" may be more appropriate if you are processing untrusted data. See Comparison with json. Relationship to other Python modules ==================================== Comparison with "marshal" ------------------------- Python has a more primitive serialization module called "marshal", but in general "pickle" should always be the preferred way to serialize Python objects. "marshal" exists primarily to support Python’s ".pyc" files. The "pickle" module differs from "marshal" in several significant ways: * The "pickle" module keeps track of the objects it has already serialized, so that later references to the same object won’t be serialized again. "marshal" doesn’t do this. This has implications both for recursive objects and object sharing. Recursive objects are objects that contain references to themselves. These are not handled by marshal, and in fact, attempting to marshal recursive objects will crash your Python interpreter. Object sharing happens when there are multiple references to the same object in different places in the object hierarchy being serialized. "pickle" stores such objects only once, and ensures that all other references point to the master copy. Shared objects remain shared, which can be very important for mutable objects. * "marshal" cannot be used to serialize user-defined classes and their instances. "pickle" can save and restore class instances transparently, however the class definition must be importable and live in the same module as when the object was stored. * The "marshal" serialization format is not guaranteed to be portable across Python versions. Because its primary job in life is to support ".pyc" files, the Python implementers reserve the right to change the serialization format in non-backwards compatible ways should the need arise. The "pickle" serialization format is guaranteed to be backwards compatible across Python releases provided a compatible pickle protocol is chosen and pickling and unpickling code deals with Python 2 to Python 3 type differences if your data is crossing that unique breaking change language boundary. Comparison with "json" ---------------------- There are fundamental differences between the pickle protocols and JSON (JavaScript Object Notation): * JSON is a text serialization format (it outputs unicode text, although most of the time it is then encoded to "utf-8"), while pickle is a binary serialization format; * JSON is human-readable, while pickle is not; * JSON is interoperable and widely used outside of the Python ecosystem, while pickle is Python-specific; * JSON, by default, can only represent a subset of the Python built-in types, and no custom classes; pickle can represent an extremely large number of Python types (many of them automatically, by clever usage of Python’s introspection facilities; complex cases can be tackled by implementing specific object APIs); * Unlike pickle, deserializing untrusted JSON does not in itself create an arbitrary code execution vulnerability. See also: The "json" module: a standard library module allowing JSON serialization and deserialization. Data stream format ================== The data format used by "pickle" is Python-specific. This has the advantage that there are no restrictions imposed by external standards such as JSON or XDR (which can’t represent pointer sharing); however it means that non-Python programs may not be able to reconstruct pickled Python objects. By default, the "pickle" data format uses a relatively compact binary representation. If you need optimal size characteristics, you can efficiently compress pickled data. The module "pickletools" contains tools for analyzing data streams generated by "pickle". "pickletools" source code has extensive comments about opcodes used by pickle protocols. There are currently 6 different protocols which can be used for pickling. The higher the protocol used, the more recent the version of Python needed to read the pickle produced. * Protocol version 0 is the original “human-readable” protocol and is backwards compatible with earlier versions of Python. * Protocol version 1 is an old binary format which is also compatible with earlier versions of Python. * Protocol version 2 was introduced in Python 2.3. It provides much more efficient pickling of *new-style classes*. Refer to **PEP 307** for information about improvements brought by protocol 2. * Protocol version 3 was added in Python 3.0. It has explicit support for "bytes" objects and cannot be unpickled by Python 2.x. This was the default protocol in Python 3.0–3.7. * Protocol version 4 was added in Python 3.4. It adds support for very large objects, pickling more kinds of objects, and some data format optimizations. It is the default protocol starting with Python 3.8. Refer to **PEP 3154** for information about improvements brought by protocol 4. * Protocol version 5 was added in Python 3.8. It adds support for out-of-band data and speedup for in-band data. Refer to **PEP 574** for information about improvements brought by protocol 5. Note: Serialization is a more primitive notion than persistence; although "pickle" reads and writes file objects, it does not handle the issue of naming persistent objects, nor the (even more complicated) issue of concurrent access to persistent objects. The "pickle" module can transform a complex object into a byte stream and it can transform the byte stream into an object with the same internal structure. Perhaps the most obvious thing to do with these byte streams is to write them onto a file, but it is also conceivable to send them across a network or store them in a database. The "shelve" module provides a simple interface to pickle and unpickle objects on DBM- style database files. Module Interface ================ To serialize an object hierarchy, you simply call the "dumps()" function. Similarly, to de-serialize a data stream, you call the "loads()" function. However, if you want more control over serialization and de-serialization, you can create a "Pickler" or an "Unpickler" object, respectively. The "pickle" module provides the following constants: pickle.HIGHEST_PROTOCOL An integer, the highest protocol version available. This value can be passed as a *protocol* value to functions "dump()" and "dumps()" as well as the "Pickler" constructor. pickle.DEFAULT_PROTOCOL An integer, the default protocol version used for pickling. May be less than "HIGHEST_PROTOCOL". Currently the default protocol is 4, first introduced in Python 3.4 and incompatible with previous versions. Changed in version 3.0: The default protocol is 3. Changed in version 3.8: The default protocol is 4. The "pickle" module provides the following functions to make the pickling process more convenient: pickle.dump(obj, file, protocol=None, *, fix_imports=True, buffer_callback=None) Write the pickled representation of the object *obj* to the open *file object* *file*. This is equivalent to "Pickler(file, protocol).dump(obj)". Arguments *file*, *protocol*, *fix_imports* and *buffer_callback* have the same meaning as in the "Pickler" constructor. Changed in version 3.8: The *buffer_callback* argument was added. pickle.dumps(obj, protocol=None, *, fix_imports=True, buffer_callback=None) Return the pickled representation of the object *obj* as a "bytes" object, instead of writing it to a file. Arguments *protocol*, *fix_imports* and *buffer_callback* have the same meaning as in the "Pickler" constructor. Changed in version 3.8: The *buffer_callback* argument was added. pickle.load(file, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None) Read the pickled representation of an object from the open *file object* *file* and return the reconstituted object hierarchy specified therein. This is equivalent to "Unpickler(file).load()". The protocol version of the pickle is detected automatically, so no protocol argument is needed. Bytes past the pickled representation of the object are ignored. Arguments *file*, *fix_imports*, *encoding*, *errors*, *strict* and *buffers* have the same meaning as in the "Unpickler" constructor. Changed in version 3.8: The *buffers* argument was added. pickle.loads(data, /, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None) Return the reconstituted object hierarchy of the pickled representation *data* of an object. *data* must be a *bytes-like object*. The protocol version of the pickle is detected automatically, so no protocol argument is needed. Bytes past the pickled representation of the object are ignored. Arguments *fix_imports*, *encoding*, *errors*, *strict* and *buffers* have the same meaning as in the "Unpickler" constructor. Changed in version 3.8: The *buffers* argument was added. The "pickle" module defines three exceptions: exception pickle.PickleError Common base class for the other pickling exceptions. It inherits "Exception". exception pickle.PicklingError Error raised when an unpicklable object is encountered by "Pickler". It inherits "PickleError". Refer to What can be pickled and unpickled? to learn what kinds of objects can be pickled. exception pickle.UnpicklingError Error raised when there is a problem unpickling an object, such as a data corruption or a security violation. It inherits "PickleError". Note that other exceptions may also be raised during unpickling, including (but not necessarily limited to) AttributeError, EOFError, ImportError, and IndexError. The "pickle" module exports three classes, "Pickler", "Unpickler" and "PickleBuffer": class pickle.Pickler(file, protocol=None, *, fix_imports=True, buffer_callback=None) This takes a binary file for writing a pickle data stream. The optional *protocol* argument, an integer, tells the pickler to use the given protocol; supported protocols are 0 to "HIGHEST_PROTOCOL". If not specified, the default is "DEFAULT_PROTOCOL". If a negative number is specified, "HIGHEST_PROTOCOL" is selected. The *file* argument must have a write() method that accepts a single bytes argument. It can thus be an on-disk file opened for binary writing, an "io.BytesIO" instance, or any other custom object that meets this interface. If *fix_imports* is true and *protocol* is less than 3, pickle will try to map the new Python 3 names to the old module names used in Python 2, so that the pickle data stream is readable with Python 2. If *buffer_callback* is None (the default), buffer views are serialized into *file* as part of the pickle stream. If *buffer_callback* is not None, then it can be called any number of times with a buffer view. If the callback returns a false value (such as None), the given buffer is out-of-band; otherwise the buffer is serialized in-band, i.e. inside the pickle stream. It is an error if *buffer_callback* is not None and *protocol* is None or smaller than 5. Changed in version 3.8: The *buffer_callback* argument was added. dump(obj) Write the pickled representation of *obj* to the open file object given in the constructor. persistent_id(obj) Do nothing by default. This exists so a subclass can override it. If "persistent_id()" returns "None", *obj* is pickled as usual. Any other value causes "Pickler" to emit the returned value as a persistent ID for *obj*. The meaning of this persistent ID should be defined by "Unpickler.persistent_load()". Note that the value returned by "persistent_id()" cannot itself have a persistent ID. See Persistence of External Objects for details and examples of uses. dispatch_table A pickler object’s dispatch table is a registry of *reduction functions* of the kind which can be declared using "copyreg.pickle()". It is a mapping whose keys are classes and whose values are reduction functions. A reduction function takes a single argument of the associated class and should conform to the same interface as a "__reduce__()" method. By default, a pickler object will not have a "dispatch_table" attribute, and it will instead use the global dispatch table managed by the "copyreg" module. However, to customize the pickling for a specific pickler object one can set the "dispatch_table" attribute to a dict-like object. Alternatively, if a subclass of "Pickler" has a "dispatch_table" attribute then this will be used as the default dispatch table for instances of that class. See Dispatch Tables for usage examples. New in version 3.3. reducer_override(obj) Special reducer that can be defined in "Pickler" subclasses. This method has priority over any reducer in the "dispatch_table". It should conform to the same interface as a "__reduce__()" method, and can optionally return "NotImplemented" to fallback on "dispatch_table"-registered reducers to pickle "obj". For a detailed example, see Custom Reduction for Types, Functions, and Other Objects. New in version 3.8. fast Deprecated. Enable fast mode if set to a true value. The fast mode disables the usage of memo, therefore speeding the pickling process by not generating superfluous PUT opcodes. It should not be used with self-referential objects, doing otherwise will cause "Pickler" to recurse infinitely. Use "pickletools.optimize()" if you need more compact pickles. class pickle.Unpickler(file, *, fix_imports=True, encoding="ASCII", errors="strict", buffers=None) This takes a binary file for reading a pickle data stream. The protocol version of the pickle is detected automatically, so no protocol argument is needed. The argument *file* must have three methods, a read() method that takes an integer argument, a readinto() method that takes a buffer argument and a readline() method that requires no arguments, as in the "io.BufferedIOBase" interface. Thus *file* can be an on-disk file opened for binary reading, an "io.BytesIO" object, or any other custom object that meets this interface. The optional arguments *fix_imports*, *encoding* and *errors* are used to control compatibility support for pickle stream generated by Python 2. If *fix_imports* is true, pickle will try to map the old Python 2 names to the new names used in Python 3. The *encoding* and *errors* tell pickle how to decode 8-bit string instances pickled by Python 2; these default to ‘ASCII’ and ‘strict’, respectively. The *encoding* can be ‘bytes’ to read these 8-bit string instances as bytes objects. Using "encoding='latin1'" is required for unpickling NumPy arrays and instances of "datetime", "date" and "time" pickled by Python 2. If *buffers* is None (the default), then all data necessary for deserialization must be contained in the pickle stream. This means that the *buffer_callback* argument was None when a "Pickler" was instantiated (or when "dump()" or "dumps()" was called). If *buffers* is not None, it should be an iterable of buffer- enabled objects that is consumed each time the pickle stream references an out-of-band buffer view. Such buffers have been given in order to the *buffer_callback* of a Pickler object. Changed in version 3.8: The *buffers* argument was added. load() Read the pickled representation of an object from the open file object given in the constructor, and return the reconstituted object hierarchy specified therein. Bytes past the pickled representation of the object are ignored. persistent_load(pid) Raise an "UnpicklingError" by default. If defined, "persistent_load()" should return the object specified by the persistent ID *pid*. If an invalid persistent ID is encountered, an "UnpicklingError" should be raised. See Persistence of External Objects for details and examples of uses. find_class(module, name) Import *module* if necessary and return the object called *name* from it, where the *module* and *name* arguments are "str" objects. Note, unlike its name suggests, "find_class()" is also used for finding functions. Subclasses may override this to gain control over what type of objects and how they can be loaded, potentially reducing security risks. Refer to Restricting Globals for details. Raises an auditing event "pickle.find_class" with arguments "module", "name". class pickle.PickleBuffer(buffer) A wrapper for a buffer representing picklable data. *buffer* must be a buffer-providing object, such as a *bytes-like object* or a N-dimensional array. "PickleBuffer" is itself a buffer provider, therefore it is possible to pass it to other APIs expecting a buffer-providing object, such as "memoryview". "PickleBuffer" objects can only be serialized using pickle protocol 5 or higher. They are eligible for out-of-band serialization. New in version 3.8. raw() Return a "memoryview" of the memory area underlying this buffer. The returned object is a one-dimensional, C-contiguous memoryview with format "B" (unsigned bytes). "BufferError" is raised if the buffer is neither C- nor Fortran-contiguous. release() Release the underlying buffer exposed by the PickleBuffer object. What can be pickled and unpickled? ================================== The following types can be pickled: * "None", "True", and "False"; * integers, floating-point numbers, complex numbers; * strings, bytes, bytearrays; * tuples, lists, sets, and dictionaries containing only picklable objects; * functions (built-in and user-defined) defined at the top level of a module (using "def", not "lambda"); * classes defined at the top level of a module; * instances of such classes whose "__dict__" or the result of calling "__getstate__()" is picklable (see section Pickling Class Instances for details). Attempts to pickle unpicklable objects will raise the "PicklingError" exception; when this happens, an unspecified number of bytes may have already been written to the underlying file. Trying to pickle a highly recursive data structure may exceed the maximum recursion depth, a "RecursionError" will be raised in this case. You can carefully raise this limit with "sys.setrecursionlimit()". Note that functions (built-in and user-defined) are pickled by fully qualified name, not by value. [2] This means that only the function name is pickled, along with the name of the module the function is defined in. Neither the function’s code, nor any of its function attributes are pickled. Thus the defining module must be importable in the unpickling environment, and the module must contain the named object, otherwise an exception will be raised. [3] Similarly, classes are pickled by fully qualified name, so the same restrictions in the unpickling environment apply. Note that none of the class’s code or data is pickled, so in the following example the class attribute "attr" is not restored in the unpickling environment: class Foo: attr = 'A class attribute' picklestring = pickle.dumps(Foo) These restrictions are why picklable functions and classes must be defined at the top level of a module. Similarly, when class instances are pickled, their class’s code and data are not pickled along with them. Only the instance data are pickled. This is done on purpose, so you can fix bugs in a class or add methods to the class and still load objects that were created with an earlier version of the class. If you plan to have long-lived objects that will see many versions of a class, it may be worthwhile to put a version number in the objects so that suitable conversions can be made by the class’s "__setstate__()" method. Pickling Class Instances ======================== In this section, we describe the general mechanisms available to you to define, customize, and control how class instances are pickled and unpickled. In most cases, no additional code is needed to make instances picklable. By default, pickle will retrieve the class and the attributes of an instance via introspection. When a class instance is unpickled, its "__init__()" method is usually *not* invoked. The default behaviour first creates an uninitialized instance and then restores the saved attributes. The following code shows an implementation of this behaviour: def save(obj): return (obj.__class__, obj.__dict__) def restore(cls, attributes): obj = cls.__new__(cls) obj.__dict__.update(attributes) return obj Classes can alter the default behaviour by providing one or several special methods: object.__getnewargs_ex__() In protocols 2 and newer, classes that implements the "__getnewargs_ex__()" method can dictate the values passed to the "__new__()" method upon unpickling. The method must return a pair "(args, kwargs)" where *args* is a tuple of positional arguments and *kwargs* a dictionary of named arguments for constructing the object. Those will be passed to the "__new__()" method upon unpickling. You should implement this method if the "__new__()" method of your class requires keyword-only arguments. Otherwise, it is recommended for compatibility to implement "__getnewargs__()". Changed in version 3.6: "__getnewargs_ex__()" is now used in protocols 2 and 3. object.__getnewargs__() This method serves a similar purpose as "__getnewargs_ex__()", but supports only positional arguments. It must return a tuple of arguments "args" which will be passed to the "__new__()" method upon unpickling. "__getnewargs__()" will not be called if "__getnewargs_ex__()" is defined. Changed in version 3.6: Before Python 3.6, "__getnewargs__()" was called instead of "__getnewargs_ex__()" in protocols 2 and 3. object.__getstate__() Classes can further influence how their instances are pickled; if the class defines the method "__getstate__()", it is called and the returned object is pickled as the contents for the instance, instead of the contents of the instance’s dictionary. If the "__getstate__()" method is absent, the instance’s "__dict__" is pickled as usual. object.__setstate__(state) Upon unpickling, if the class defines "__setstate__()", it is called with the unpickled state. In that case, there is no requirement for the state object to be a dictionary. Otherwise, the pickled state must be a dictionary and its items are assigned to the new instance’s dictionary. Note: If "__getstate__()" returns a false value, the "__setstate__()" method will not be called upon unpickling. Refer to the section Handling Stateful Objects for more information about how to use the methods "__getstate__()" and "__setstate__()". Note: At unpickling time, some methods like "__getattr__()", "__getattribute__()", or "__setattr__()" may be called upon the instance. In case those methods rely on some internal invariant being true, the type should implement "__new__()" to establish such an invariant, as "__init__()" is not called when unpickling an instance. As we shall see, pickle does not use directly the methods described above. In fact, these methods are part of the copy protocol which implements the "__reduce__()" special method. The copy protocol provides a unified interface for retrieving the data necessary for pickling and copying objects. [4] Although powerful, implementing "__reduce__()" directly in your classes is error prone. For this reason, class designers should use the high-level interface (i.e., "__getnewargs_ex__()", "__getstate__()" and "__setstate__()") whenever possible. We will show, however, cases where using "__reduce__()" is the only option or leads to more efficient pickling or both. object.__reduce__() The interface is currently defined as follows. The "__reduce__()" method takes no argument and shall return either a string or preferably a tuple (the returned object is often referred to as the “reduce value”). If a string is returned, the string should be interpreted as the name of a global variable. It should be the object’s local name relative to its module; the pickle module searches the module namespace to determine the object’s module. This behaviour is typically useful for singletons. When a tuple is returned, it must be between two and six items long. Optional items can either be omitted, or "None" can be provided as their value. The semantics of each item are in order: * A callable object that will be called to create the initial version of the object. * A tuple of arguments for the callable object. An empty tuple must be given if the callable does not accept any argument. * Optionally, the object’s state, which will be passed to the object’s "__setstate__()" method as previously described. If the object has no such method then, the value must be a dictionary and it will be added to the object’s "__dict__" attribute. * Optionally, an iterator (and not a sequence) yielding successive items. These items will be appended to the object either using "obj.append(item)" or, in batch, using "obj.extend(list_of_items)". This is primarily used for list subclasses, but may be used by other classes as long as they have "append()" and "extend()" methods with the appropriate signature. (Whether "append()" or "extend()" is used depends on which pickle protocol version is used as well as the number of items to append, so both must be supported.) * Optionally, an iterator (not a sequence) yielding successive key- value pairs. These items will be stored to the object using "obj[key] = value". This is primarily used for dictionary subclasses, but may be used by other classes as long as they implement "__setitem__()". * Optionally, a callable with a "(obj, state)" signature. This callable allows the user to programmatically control the state- updating behavior of a specific object, instead of using "obj"’s static "__setstate__()" method. If not "None", this callable will have priority over "obj"’s "__setstate__()". New in version 3.8: The optional sixth tuple item, "(obj, state)", was added. object.__reduce_ex__(protocol) Alternatively, a "__reduce_ex__()" method may be defined. The only difference is this method should take a single integer argument, the protocol version. When defined, pickle will prefer it over the "__reduce__()" method. In addition, "__reduce__()" automatically becomes a synonym for the extended version. The main use for this method is to provide backwards-compatible reduce values for older Python releases. Persistence of External Objects ------------------------------- For the benefit of object persistence, the "pickle" module supports the notion of a reference to an object outside the pickled data stream. Such objects are referenced by a persistent ID, which should be either a string of alphanumeric characters (for protocol 0) [5] or just an arbitrary object (for any newer protocol). The resolution of such persistent IDs is not defined by the "pickle" module; it will delegate this resolution to the user-defined methods on the pickler and unpickler, "persistent_id()" and "persistent_load()" respectively. To pickle objects that have an external persistent ID, the pickler must have a custom "persistent_id()" method that takes an object as an argument and returns either "None" or the persistent ID for that object. When "None" is returned, the pickler simply pickles the object as normal. When a persistent ID string is returned, the pickler will pickle that object, along with a marker so that the unpickler will recognize it as a persistent ID. To unpickle external objects, the unpickler must have a custom "persistent_load()" method that takes a persistent ID object and returns the referenced object. Here is a comprehensive example presenting how persistent ID can be used to pickle external objects by reference. # Simple example presenting how persistent ID can be used to pickle # external objects by reference. import pickle import sqlite3 from collections import namedtuple # Simple class representing a record in our database. MemoRecord = namedtuple("MemoRecord", "key, task") class DBPickler(pickle.Pickler): def persistent_id(self, obj): # Instead of pickling MemoRecord as a regular class instance, we emit a # persistent ID. if isinstance(obj, MemoRecord): # Here, our persistent ID is simply a tuple, containing a tag and a # key, which refers to a specific record in the database. return ("MemoRecord", obj.key) else: # If obj does not have a persistent ID, return None. This means obj # needs to be pickled as usual. return None class DBUnpickler(pickle.Unpickler): def __init__(self, file, connection): super().__init__(file) self.connection = connection def persistent_load(self, pid): # This method is invoked whenever a persistent ID is encountered. # Here, pid is the tuple returned by DBPickler. cursor = self.connection.cursor() type_tag, key_id = pid if type_tag == "MemoRecord": # Fetch the referenced record from the database and return it. cursor.execute("SELECT * FROM memos WHERE key=?", (str(key_id),)) key, task = cursor.fetchone() return MemoRecord(key, task) else: # Always raises an error if you cannot return the correct object. # Otherwise, the unpickler will think None is the object referenced # by the persistent ID. raise pickle.UnpicklingError("unsupported persistent object") def main(): import io import pprint # Initialize and populate our database. conn = sqlite3.connect(":memory:") cursor = conn.cursor() cursor.execute("CREATE TABLE memos(key INTEGER PRIMARY KEY, task TEXT)") tasks = ( 'give food to fish', 'prepare group meeting', 'fight with a zebra', ) for task in tasks: cursor.execute("INSERT INTO memos VALUES(NULL, ?)", (task,)) # Fetch the records to be pickled. cursor.execute("SELECT * FROM memos") memos = [MemoRecord(key, task) for key, task in cursor] # Save the records using our custom DBPickler. file = io.BytesIO() DBPickler(file).dump(memos) print("Pickled records:") pprint.pprint(memos) # Update a record, just for good measure. cursor.execute("UPDATE memos SET task='learn italian' WHERE key=1") # Load the records from the pickle data stream. file.seek(0) memos = DBUnpickler(file, conn).load() print("Unpickled records:") pprint.pprint(memos) if __name__ == '__main__': main() Dispatch Tables --------------- If one wants to customize pickling of some classes without disturbing any other code which depends on pickling, then one can create a pickler with a private dispatch table. The global dispatch table managed by the "copyreg" module is available as "copyreg.dispatch_table". Therefore, one may choose to use a modified copy of "copyreg.dispatch_table" as a private dispatch table. For example f = io.BytesIO() p = pickle.Pickler(f) p.dispatch_table = copyreg.dispatch_table.copy() p.dispatch_table[SomeClass] = reduce_SomeClass creates an instance of "pickle.Pickler" with a private dispatch table which handles the "SomeClass" class specially. Alternatively, the code class MyPickler(pickle.Pickler): dispatch_table = copyreg.dispatch_table.copy() dispatch_table[SomeClass] = reduce_SomeClass f = io.BytesIO() p = MyPickler(f) does the same but all instances of "MyPickler" will by default share the private dispatch table. On the other hand, the code copyreg.pickle(SomeClass, reduce_SomeClass) f = io.BytesIO() p = pickle.Pickler(f) modifies the global dispatch table shared by all users of the "copyreg" module. Handling Stateful Objects ------------------------- Here’s an example that shows how to modify pickling behavior for a class. The "TextReader" class opens a text file, and returns the line number and line contents each time its "readline()" method is called. If a "TextReader" instance is pickled, all attributes *except* the file object member are saved. When the instance is unpickled, the file is reopened, and reading resumes from the last location. The "__setstate__()" and "__getstate__()" methods are used to implement this behavior. class TextReader: """Print and number lines in a text file.""" def __init__(self, filename): self.filename = filename self.file = open(filename) self.lineno = 0 def readline(self): self.lineno += 1 line = self.file.readline() if not line: return None if line.endswith('\n'): line = line[:-1] return "%i: %s" % (self.lineno, line) def __getstate__(self): # Copy the object's state from self.__dict__ which contains # all our instance attributes. Always use the dict.copy() # method to avoid modifying the original state. state = self.__dict__.copy() # Remove the unpicklable entries. del state['file'] return state def __setstate__(self, state): # Restore instance attributes (i.e., filename and lineno). self.__dict__.update(state) # Restore the previously opened file's state. To do so, we need to # reopen it and read from it until the line count is restored. file = open(self.filename) for _ in range(self.lineno): file.readline() # Finally, save the file. self.file = file A sample usage might be something like this: >>> reader = TextReader("hello.txt") >>> reader.readline() '1: Hello world!' >>> reader.readline() '2: I am line number two.' >>> new_reader = pickle.loads(pickle.dumps(reader)) >>> new_reader.readline() '3: Goodbye!' Custom Reduction for Types, Functions, and Other Objects ======================================================== New in version 3.8. Sometimes, "dispatch_table" may not be flexible enough. In particular we may want to customize pickling based on another criterion than the object’s type, or we may want to customize the pickling of functions and classes. For those cases, it is possible to subclass from the "Pickler" class and implement a "reducer_override()" method. This method can return an arbitrary reduction tuple (see "__reduce__()"). It can alternatively return "NotImplemented" to fallback to the traditional behavior. If both the "dispatch_table" and "reducer_override()" are defined, then "reducer_override()" method takes priority. Note: For performance reasons, "reducer_override()" may not be called for the following objects: "None", "True", "False", and exact instances of "int", "float", "bytes", "str", "dict", "set", "frozenset", "list" and "tuple". Here is a simple example where we allow pickling and reconstructing a given class: import io import pickle class MyClass: my_attribute = 1 class MyPickler(pickle.Pickler): def reducer_override(self, obj): """Custom reducer for MyClass.""" if getattr(obj, "__name__", None) == "MyClass": return type, (obj.__name__, obj.__bases__, {'my_attribute': obj.my_attribute}) else: # For any other object, fallback to usual reduction return NotImplemented f = io.BytesIO() p = MyPickler(f) p.dump(MyClass) del MyClass unpickled_class = pickle.loads(f.getvalue()) assert isinstance(unpickled_class, type) assert unpickled_class.__name__ == "MyClass" assert unpickled_class.my_attribute == 1 Out-of-band Buffers =================== New in version 3.8. In some contexts, the "pickle" module is used to transfer massive amounts of data. Therefore, it can be important to minimize the number of memory copies, to preserve performance and resource consumption. However, normal operation of the "pickle" module, as it transforms a graph-like structure of objects into a sequential stream of bytes, intrinsically involves copying data to and from the pickle stream. This constraint can be eschewed if both the *provider* (the implementation of the object types to be transferred) and the *consumer* (the implementation of the communications system) support the out-of-band transfer facilities provided by pickle protocol 5 and higher. Provider API ------------ The large data objects to be pickled must implement a "__reduce_ex__()" method specialized for protocol 5 and higher, which returns a "PickleBuffer" instance (instead of e.g. a "bytes" object) for any large data. A "PickleBuffer" object *signals* that the underlying buffer is eligible for out-of-band data transfer. Those objects remain compatible with normal usage of the "pickle" module. However, consumers can also opt-in to tell "pickle" that they will handle those buffers by themselves. Consumer API ------------ A communications system can enable custom handling of the "PickleBuffer" objects generated when serializing an object graph. On the sending side, it needs to pass a *buffer_callback* argument to "Pickler" (or to the "dump()" or "dumps()" function), which will be called with each "PickleBuffer" generated while pickling the object graph. Buffers accumulated by the *buffer_callback* will not see their data copied into the pickle stream, only a cheap marker will be inserted. On the receiving side, it needs to pass a *buffers* argument to "Unpickler" (or to the "load()" or "loads()" function), which is an iterable of the buffers which were passed to *buffer_callback*. That iterable should produce buffers in the same order as they were passed to *buffer_callback*. Those buffers will provide the data expected by the reconstructors of the objects whose pickling produced the original "PickleBuffer" objects. Between the sending side and the receiving side, the communications system is free to implement its own transfer mechanism for out-of-band buffers. Potential optimizations include the use of shared memory or datatype-dependent compression. Example ------- Here is a trivial example where we implement a "bytearray" subclass able to participate in out-of-band buffer pickling: class ZeroCopyByteArray(bytearray): def __reduce_ex__(self, protocol): if protocol >= 5: return type(self)._reconstruct, (PickleBuffer(self),), None else: # PickleBuffer is forbidden with pickle protocols <= 4. return type(self)._reconstruct, (bytearray(self),) @classmethod def _reconstruct(cls, obj): with memoryview(obj) as m: # Get a handle over the original buffer object obj = m.obj if type(obj) is cls: # Original buffer object is a ZeroCopyByteArray, return it # as-is. return obj else: return cls(obj) The reconstructor (the "_reconstruct" class method) returns the buffer’s providing object if it has the right type. This is an easy way to simulate zero-copy behaviour on this toy example. On the consumer side, we can pickle those objects the usual way, which when unserialized will give us a copy of the original object: b = ZeroCopyByteArray(b"abc") data = pickle.dumps(b, protocol=5) new_b = pickle.loads(data) print(b == new_b) # True print(b is new_b) # False: a copy was made But if we pass a *buffer_callback* and then give back the accumulated buffers when unserializing, we are able to get back the original object: b = ZeroCopyByteArray(b"abc") buffers = [] data = pickle.dumps(b, protocol=5, buffer_callback=buffers.append) new_b = pickle.loads(data, buffers=buffers) print(b == new_b) # True print(b is new_b) # True: no copy was made This example is limited by the fact that "bytearray" allocates its own memory: you cannot create a "bytearray" instance that is backed by another object’s memory. However, third-party datatypes such as NumPy arrays do not have this limitation, and allow use of zero-copy pickling (or making as few copies as possible) when transferring between distinct processes or systems. See also: **PEP 574** – Pickle protocol 5 with out-of-band data Restricting Globals =================== By default, unpickling will import any class or function that it finds in the pickle data. For many applications, this behaviour is unacceptable as it permits the unpickler to import and invoke arbitrary code. Just consider what this hand-crafted pickle data stream does when loaded: >>> import pickle >>> pickle.loads(b"cos\nsystem\n(S'echo hello world'\ntR.") hello world 0 In this example, the unpickler imports the "os.system()" function and then apply the string argument “echo hello world”. Although this example is inoffensive, it is not difficult to imagine one that could damage your system. For this reason, you may want to control what gets unpickled by customizing "Unpickler.find_class()". Unlike its name suggests, "Unpickler.find_class()" is called whenever a global (i.e., a class or a function) is requested. Thus it is possible to either completely forbid globals or restrict them to a safe subset. Here is an example of an unpickler allowing only few safe classes from the "builtins" module to be loaded: import builtins import io import pickle safe_builtins = { 'range', 'complex', 'set', 'frozenset', 'slice', } class RestrictedUnpickler(pickle.Unpickler): def find_class(self, module, name): # Only allow safe classes from builtins. if module == "builtins" and name in safe_builtins: return getattr(builtins, name) # Forbid everything else. raise pickle.UnpicklingError("global '%s.%s' is forbidden" % (module, name)) def restricted_loads(s): """Helper function analogous to pickle.loads().""" return RestrictedUnpickler(io.BytesIO(s)).load() A sample usage of our unpickler working as intended: >>> restricted_loads(pickle.dumps([1, 2, range(15)])) [1, 2, range(0, 15)] >>> restricted_loads(b"cos\nsystem\n(S'echo hello world'\ntR.") Traceback (most recent call last): ... pickle.UnpicklingError: global 'os.system' is forbidden >>> restricted_loads(b'cbuiltins\neval\n' ... b'(S\'getattr(__import__("os"), "system")' ... b'("echo hello world")\'\ntR.') Traceback (most recent call last): ... pickle.UnpicklingError: global 'builtins.eval' is forbidden As our examples shows, you have to be careful with what you allow to be unpickled. Therefore if security is a concern, you may want to consider alternatives such as the marshalling API in "xmlrpc.client" or third-party solutions. Performance =========== Recent versions of the pickle protocol (from protocol 2 and upwards) feature efficient binary encodings for several common features and built-in types. Also, the "pickle" module has a transparent optimizer written in C. Examples ======== For the simplest code, use the "dump()" and "load()" functions. import pickle # An arbitrary collection of objects supported by pickle. data = { 'a': [1, 2.0, 3+4j], 'b': ("character string", b"byte string"), 'c': {None, True, False} } with open('data.pickle', 'wb') as f: # Pickle the 'data' dictionary using the highest protocol available. pickle.dump(data, f, pickle.HIGHEST_PROTOCOL) The following example reads the resulting pickled data. import pickle with open('data.pickle', 'rb') as f: # The protocol version used is detected automatically, so we do not # have to specify it. data = pickle.load(f) See also: Module "copyreg" Pickle interface constructor registration for extension types. Module "pickletools" Tools for working with and analyzing pickled data. Module "shelve" Indexed databases of objects; uses "pickle". Module "copy" Shallow and deep object copying. Module "marshal" High-performance serialization of built-in types. -[ Footnotes ]- [1] Don’t confuse this with the "marshal" module [2] This is why "lambda" functions cannot be pickled: all "lambda" functions share the same name: "". [3] The exception raised will likely be an "ImportError" or an "AttributeError" but it could be something else. [4] The "copy" module uses this protocol for shallow and deep copying operations. [5] The limitation on alphanumeric characters is due to the fact that persistent IDs in protocol 0 are delimited by the newline character. Therefore if any kind of newline characters occurs in persistent IDs, the resulting pickled data will become unreadable. "pickletools" — Tools for pickle developers ******************************************* **Source code:** Lib/pickletools.py ====================================================================== This module contains various constants relating to the intimate details of the "pickle" module, some lengthy comments about the implementation, and a few useful functions for analyzing pickled data. The contents of this module are useful for Python core developers who are working on the "pickle"; ordinary users of the "pickle" module probably won’t find the "pickletools" module relevant. Command line usage ================== New in version 3.2. When invoked from the command line, "python -m pickletools" will disassemble the contents of one or more pickle files. Note that if you want to see the Python object stored in the pickle rather than the details of pickle format, you may want to use "-m pickle" instead. However, when the pickle file that you want to examine comes from an untrusted source, "-m pickletools" is a safer option because it does not execute pickle bytecode. For example, with a tuple "(1, 2)" pickled in file "x.pickle": $ python -m pickle x.pickle (1, 2) $ python -m pickletools x.pickle 0: \x80 PROTO 3 2: K BININT1 1 4: K BININT1 2 6: \x86 TUPLE2 7: q BINPUT 0 9: . STOP highest protocol among opcodes = 2 Command line options -------------------- -a, --annotate Annotate each line with a short opcode description. -o, --output= Name of a file where the output should be written. -l, --indentlevel= The number of blanks by which to indent a new MARK level. -m, --memo When multiple objects are disassembled, preserve memo between disassemblies. -p, --preamble= When more than one pickle file are specified, print given preamble before each disassembly. Programmatic Interface ====================== pickletools.dis(pickle, out=None, memo=None, indentlevel=4, annotate=0) Outputs a symbolic disassembly of the pickle to the file-like object *out*, defaulting to "sys.stdout". *pickle* can be a string or a file-like object. *memo* can be a Python dictionary that will be used as the pickle’s memo; it can be used to perform disassemblies across multiple pickles created by the same pickler. Successive levels, indicated by "MARK" opcodes in the stream, are indented by *indentlevel* spaces. If a nonzero value is given to *annotate*, each opcode in the output is annotated with a short description. The value of *annotate* is used as a hint for the column where annotation should start. New in version 3.2: The *annotate* argument. pickletools.genops(pickle) Provides an *iterator* over all of the opcodes in a pickle, returning a sequence of "(opcode, arg, pos)" triples. *opcode* is an instance of an "OpcodeInfo" class; *arg* is the decoded value, as a Python object, of the opcode’s argument; *pos* is the position at which this opcode is located. *pickle* can be a string or a file-like object. pickletools.optimize(picklestring) Returns a new equivalent pickle string after eliminating unused "PUT" opcodes. The optimized pickle is shorter, takes less transmission time, requires less storage space, and unpickles more efficiently. "pipes" — Interface to shell pipelines ************************************** **Source code:** Lib/pipes.py Deprecated since version 3.11: The "pipes" module is deprecated (see **PEP 594** for details). Please use the "subprocess" module instead. ====================================================================== The "pipes" module defines a class to abstract the concept of a *pipeline* — a sequence of converters from one file to another. Because the module uses **/bin/sh** command lines, a POSIX or compatible shell for "os.system()" and "os.popen()" is required. The "pipes" module defines the following class: class pipes.Template An abstraction of a pipeline. Example: >>> import pipes >>> t = pipes.Template() >>> t.append('tr a-z A-Z', '--') >>> f = t.open('pipefile', 'w') >>> f.write('hello world') >>> f.close() >>> open('pipefile').read() 'HELLO WORLD' Template Objects ================ Template objects following methods: Template.reset() Restore a pipeline template to its initial state. Template.clone() Return a new, equivalent, pipeline template. Template.debug(flag) If *flag* is true, turn debugging on. Otherwise, turn debugging off. When debugging is on, commands to be executed are printed, and the shell is given "set -x" command to be more verbose. Template.append(cmd, kind) Append a new action at the end. The *cmd* variable must be a valid bourne shell command. The *kind* variable consists of two letters. The first letter can be either of "'-'" (which means the command reads its standard input), "'f'" (which means the commands reads a given file on the command line) or "'.'" (which means the commands reads no input, and hence must be first.) Similarly, the second letter can be either of "'-'" (which means the command writes to standard output), "'f'" (which means the command writes a file on the command line) or "'.'" (which means the command does not write anything, and hence must be last.) Template.prepend(cmd, kind) Add a new action at the beginning. See "append()" for explanations of the arguments. Template.open(file, mode) Return a file-like object, open to *file*, but read from or written to by the pipeline. Note that only one of "'r'", "'w'" may be given. Template.copy(infile, outfile) Copy *infile* to *outfile* through the pipe. "pkgutil" — Package extension utility ************************************* **Source code:** Lib/pkgutil.py ====================================================================== This module provides utilities for the import system, in particular package support. class pkgutil.ModuleInfo(module_finder, name, ispkg) A namedtuple that holds a brief summary of a module’s info. New in version 3.6. pkgutil.extend_path(path, name) Extend the search path for the modules which comprise a package. Intended use is to place the following code in a package’s "__init__.py": from pkgutil import extend_path __path__ = extend_path(__path__, __name__) This will add to the package’s "__path__" all subdirectories of directories on "sys.path" named after the package. This is useful if one wants to distribute different parts of a single logical package as multiple directories. It also looks for "*.pkg" files beginning where "*" matches the *name* argument. This feature is similar to "*.pth" files (see the "site" module for more information), except that it doesn’t special-case lines starting with "import". A "*.pkg" file is trusted at face value: apart from checking for duplicates, all entries found in a "*.pkg" file are added to the path, regardless of whether they exist on the filesystem. (This is a feature.) If the input path is not a list (as is the case for frozen packages) it is returned unchanged. The input path is not modified; an extended copy is returned. Items are only appended to the copy at the end. It is assumed that "sys.path" is a sequence. Items of "sys.path" that are not strings referring to existing directories are ignored. Unicode items on "sys.path" that cause errors when used as filenames may cause this function to raise an exception (in line with "os.path.isdir()" behavior). class pkgutil.ImpImporter(dirname=None) **PEP 302** Finder that wraps Python’s “classic” import algorithm. If *dirname* is a string, a **PEP 302** finder is created that searches that directory. If *dirname* is "None", a **PEP 302** finder is created that searches the current "sys.path", plus any modules that are frozen or built-in. Note that "ImpImporter" does not currently support being used by placement on "sys.meta_path". Deprecated since version 3.3: This emulation is no longer needed, as the standard import mechanism is now fully **PEP 302** compliant and available in "importlib". class pkgutil.ImpLoader(fullname, file, filename, etc) *Loader* that wraps Python’s “classic” import algorithm. Deprecated since version 3.3: This emulation is no longer needed, as the standard import mechanism is now fully **PEP 302** compliant and available in "importlib". pkgutil.find_loader(fullname) Retrieve a module *loader* for the given *fullname*. This is a backwards compatibility wrapper around "importlib.util.find_spec()" that converts most failures to "ImportError" and only returns the loader rather than the full "ModuleSpec". Changed in version 3.3: Updated to be based directly on "importlib" rather than relying on the package internal **PEP 302** import emulation. Changed in version 3.4: Updated to be based on **PEP 451** pkgutil.get_importer(path_item) Retrieve a *finder* for the given *path_item*. The returned finder is cached in "sys.path_importer_cache" if it was newly created by a path hook. The cache (or part of it) can be cleared manually if a rescan of "sys.path_hooks" is necessary. Changed in version 3.3: Updated to be based directly on "importlib" rather than relying on the package internal **PEP 302** import emulation. pkgutil.get_loader(module_or_name) Get a *loader* object for *module_or_name*. If the module or package is accessible via the normal import mechanism, a wrapper around the relevant part of that machinery is returned. Returns "None" if the module cannot be found or imported. If the named module is not already imported, its containing package (if any) is imported, in order to establish the package "__path__". Changed in version 3.3: Updated to be based directly on "importlib" rather than relying on the package internal **PEP 302** import emulation. Changed in version 3.4: Updated to be based on **PEP 451** pkgutil.iter_importers(fullname='') Yield *finder* objects for the given module name. If fullname contains a "'.'", the finders will be for the package containing fullname, otherwise they will be all registered top level finders (i.e. those on both "sys.meta_path" and "sys.path_hooks"). If the named module is in a package, that package is imported as a side effect of invoking this function. If no module name is specified, all top level finders are produced. Changed in version 3.3: Updated to be based directly on "importlib" rather than relying on the package internal **PEP 302** import emulation. pkgutil.iter_modules(path=None, prefix='') Yields "ModuleInfo" for all submodules on *path*, or, if *path* is "None", all top-level modules on "sys.path". *path* should be either "None" or a list of paths to look for modules in. *prefix* is a string to output on the front of every module name on output. Note: Only works for a *finder* which defines an "iter_modules()" method. This interface is non-standard, so the module also provides implementations for "importlib.machinery.FileFinder" and "zipimport.zipimporter". Changed in version 3.3: Updated to be based directly on "importlib" rather than relying on the package internal **PEP 302** import emulation. pkgutil.walk_packages(path=None, prefix='', onerror=None) Yields "ModuleInfo" for all modules recursively on *path*, or, if *path* is "None", all accessible modules. *path* should be either "None" or a list of paths to look for modules in. *prefix* is a string to output on the front of every module name on output. Note that this function must import all *packages* (*not* all modules!) on the given *path*, in order to access the "__path__" attribute to find submodules. *onerror* is a function which gets called with one argument (the name of the package which was being imported) if any exception occurs while trying to import a package. If no *onerror* function is supplied, "ImportError"s are caught and ignored, while all other exceptions are propagated, terminating the search. Examples: # list all modules python can access walk_packages() # list all submodules of ctypes walk_packages(ctypes.__path__, ctypes.__name__ + '.') Note: Only works for a *finder* which defines an "iter_modules()" method. This interface is non-standard, so the module also provides implementations for "importlib.machinery.FileFinder" and "zipimport.zipimporter". Changed in version 3.3: Updated to be based directly on "importlib" rather than relying on the package internal **PEP 302** import emulation. pkgutil.get_data(package, resource) Get a resource from a package. This is a wrapper for the *loader* "get_data" API. The *package* argument should be the name of a package, in standard module format ("foo.bar"). The *resource* argument should be in the form of a relative filename, using "/" as the path separator. The parent directory name ".." is not allowed, and nor is a rooted name (starting with a "/"). The function returns a binary string that is the contents of the specified resource. For packages located in the filesystem, which have already been imported, this is the rough equivalent of: d = os.path.dirname(sys.modules[package].__file__) data = open(os.path.join(d, resource), 'rb').read() If the package cannot be located or loaded, or it uses a *loader* which does not support "get_data", then "None" is returned. In particular, the *loader* for *namespace packages* does not support "get_data". pkgutil.resolve_name(name) Resolve a name to an object. This functionality is used in numerous places in the standard library (see bpo-12915) - and equivalent functionality is also in widely used third-party packages such as setuptools, Django and Pyramid. It is expected that *name* will be a string in one of the following formats, where W is shorthand for a valid Python identifier and dot stands for a literal period in these pseudo-regexes: * "W(.W)*" * "W(.W)*:(W(.W)*)?" The first form is intended for backward compatibility only. It assumes that some part of the dotted name is a package, and the rest is an object somewhere within that package, possibly nested inside other objects. Because the place where the package stops and the object hierarchy starts can’t be inferred by inspection, repeated attempts to import must be done with this form. In the second form, the caller makes the division point clear through the provision of a single colon: the dotted name to the left of the colon is a package to be imported, and the dotted name to the right is the object hierarchy within that package. Only one import is needed in this form. If it ends with the colon, then a module object is returned. The function will return an object (which might be a module), or raise one of the following exceptions: "ValueError" – if *name* isn’t in a recognised format. "ImportError" – if an import failed when it shouldn’t have. "AttributeError" – If a failure occurred when traversing the object hierarchy within the imported package to get to the desired object. New in version 3.9. "platform" — Access to underlying platform’s identifying data ************************************************************** **Source code:** Lib/platform.py ====================================================================== Note: Specific platforms listed alphabetically, with Linux included in the Unix section. Cross Platform ============== platform.architecture(executable=sys.executable, bits='', linkage='') Queries the given executable (defaults to the Python interpreter binary) for various architecture information. Returns a tuple "(bits, linkage)" which contain information about the bit architecture and the linkage format used for the executable. Both values are returned as strings. Values that cannot be determined are returned as given by the parameter presets. If bits is given as "''", the "sizeof(pointer)" (or "sizeof(long)" on Python version < 1.5.2) is used as indicator for the supported pointer size. The function relies on the system’s "file" command to do the actual work. This is available on most if not all Unix platforms and some non-Unix platforms and then only if the executable points to the Python interpreter. Reasonable defaults are used when the above needs are not met. Note: On macOS (and perhaps other platforms), executable files may be universal files containing multiple architectures.To get at the “64-bitness” of the current interpreter, it is more reliable to query the "sys.maxsize" attribute: is_64bits = sys.maxsize > 2**32 platform.machine() Returns the machine type, e.g. "'i386'". An empty string is returned if the value cannot be determined. platform.node() Returns the computer’s network name (may not be fully qualified!). An empty string is returned if the value cannot be determined. platform.platform(aliased=0, terse=0) Returns a single string identifying the underlying platform with as much useful information as possible. The output is intended to be *human readable* rather than machine parseable. It may look different on different platforms and this is intended. If *aliased* is true, the function will use aliases for various platforms that report system names which differ from their common names, for example SunOS will be reported as Solaris. The "system_alias()" function is used to implement this. Setting *terse* to true causes the function to return only the absolute minimum information needed to identify the platform. Changed in version 3.8: On macOS, the function now uses "mac_ver()", if it returns a non-empty release string, to get the macOS version rather than the darwin version. platform.processor() Returns the (real) processor name, e.g. "'amdk6'". An empty string is returned if the value cannot be determined. Note that many platforms do not provide this information or simply return the same value as for "machine()". NetBSD does this. platform.python_build() Returns a tuple "(buildno, builddate)" stating the Python build number and date as strings. platform.python_compiler() Returns a string identifying the compiler used for compiling Python. platform.python_branch() Returns a string identifying the Python implementation SCM branch. platform.python_implementation() Returns a string identifying the Python implementation. Possible return values are: ‘CPython’, ‘IronPython’, ‘Jython’, ‘PyPy’. platform.python_revision() Returns a string identifying the Python implementation SCM revision. platform.python_version() Returns the Python version as string "'major.minor.patchlevel'". Note that unlike the Python "sys.version", the returned value will always include the patchlevel (it defaults to 0). platform.python_version_tuple() Returns the Python version as tuple "(major, minor, patchlevel)" of strings. Note that unlike the Python "sys.version", the returned value will always include the patchlevel (it defaults to "'0'"). platform.release() Returns the system’s release, e.g. "'2.2.0'" or "'NT'". An empty string is returned if the value cannot be determined. platform.system() Returns the system/OS name, such as "'Linux'", "'Darwin'", "'Java'", "'Windows'". An empty string is returned if the value cannot be determined. platform.system_alias(system, release, version) Returns "(system, release, version)" aliased to common marketing names used for some systems. It also does some reordering of the information in some cases where it would otherwise cause confusion. platform.version() Returns the system’s release version, e.g. "'#3 on degas'". An empty string is returned if the value cannot be determined. platform.uname() Fairly portable uname interface. Returns a "namedtuple()" containing six attributes: "system", "node", "release", "version", "machine", and "processor". Note that this adds a sixth attribute ("processor") not present in the "os.uname()" result. Also, the attribute names are different for the first two attributes; "os.uname()" names them "sysname" and "nodename". Entries which cannot be determined are set to "''". Changed in version 3.3: Result changed from a tuple to a "namedtuple()". Java Platform ============= platform.java_ver(release='', vendor='', vminfo=('', '', ''), osinfo=('', '', '')) Version interface for Jython. Returns a tuple "(release, vendor, vminfo, osinfo)" with *vminfo* being a tuple "(vm_name, vm_release, vm_vendor)" and *osinfo* being a tuple "(os_name, os_version, os_arch)". Values which cannot be determined are set to the defaults given as parameters (which all default to "''"). Windows Platform ================ platform.win32_ver(release='', version='', csd='', ptype='') Get additional version information from the Windows Registry and return a tuple "(release, version, csd, ptype)" referring to OS release, version number, CSD level (service pack) and OS type (multi/single processor). Values which cannot be determined are set to the defaults given as parameters (which all default to an empty string). As a hint: *ptype* is "'Uniprocessor Free'" on single processor NT machines and "'Multiprocessor Free'" on multi processor machines. The *‘Free’* refers to the OS version being free of debugging code. It could also state *‘Checked’* which means the OS version uses debugging code, i.e. code that checks arguments, ranges, etc. platform.win32_edition() Returns a string representing the current Windows edition, or "None" if the value cannot be determined. Possible values include but are not limited to "'Enterprise'", "'IoTUAP'", "'ServerStandard'", and "'nanoserver'". New in version 3.8. platform.win32_is_iot() Return "True" if the Windows edition returned by "win32_edition()" is recognized as an IoT edition. New in version 3.8. macOS Platform ============== platform.mac_ver(release='', versioninfo=('', '', ''), machine='') Get macOS version information and return it as tuple "(release, versioninfo, machine)" with *versioninfo* being a tuple "(version, dev_stage, non_release_version)". Entries which cannot be determined are set to "''". All tuple entries are strings. Unix Platforms ============== platform.libc_ver(executable=sys.executable, lib='', version='', chunksize=16384) Tries to determine the libc version against which the file executable (defaults to the Python interpreter) is linked. Returns a tuple of strings "(lib, version)" which default to the given parameters in case the lookup fails. Note that this function has intimate knowledge of how different libc versions add symbols to the executable is probably only usable for executables compiled using **gcc**. The file is read and scanned in chunks of *chunksize* bytes. "plistlib" — Generate and parse Apple ".plist" files **************************************************** **Source code:** Lib/plistlib.py ====================================================================== This module provides an interface for reading and writing the “property list” files used by Apple, primarily on macOS and iOS. This module supports both binary and XML plist files. The property list (".plist") file format is a simple serialization supporting basic object types, like dictionaries, lists, numbers and strings. Usually the top level object is a dictionary. To write out and to parse a plist file, use the "dump()" and "load()" functions. To work with plist data in bytes objects, use "dumps()" and "loads()". Values can be strings, integers, floats, booleans, tuples, lists, dictionaries (but only with string keys), "bytes", "bytearray" or "datetime.datetime" objects. Changed in version 3.4: New API, old API deprecated. Support for binary format plists added. Changed in version 3.8: Support added for reading and writing "UID" tokens in binary plists as used by NSKeyedArchiver and NSKeyedUnarchiver. Changed in version 3.9: Old API removed. See also: PList manual page Apple’s documentation of the file format. This module defines the following functions: plistlib.load(fp, *, fmt=None, dict_type=dict) Read a plist file. *fp* should be a readable and binary file object. Return the unpacked root object (which usually is a dictionary). The *fmt* is the format of the file and the following values are valid: * "None": Autodetect the file format * "FMT_XML": XML file format * "FMT_BINARY": Binary plist format The *dict_type* is the type used for dictionaries that are read from the plist file. XML data for the "FMT_XML" format is parsed using the Expat parser from "xml.parsers.expat" – see its documentation for possible exceptions on ill-formed XML. Unknown elements will simply be ignored by the plist parser. The parser for the binary format raises "InvalidFileException" when the file cannot be parsed. New in version 3.4. plistlib.loads(data, *, fmt=None, dict_type=dict) Load a plist from a bytes object. See "load()" for an explanation of the keyword arguments. New in version 3.4. plistlib.dump(value, fp, *, fmt=FMT_XML, sort_keys=True, skipkeys=False) Write *value* to a plist file. *Fp* should be a writable, binary file object. The *fmt* argument specifies the format of the plist file and can be one of the following values: * "FMT_XML": XML formatted plist file * "FMT_BINARY": Binary formatted plist file When *sort_keys* is true (the default) the keys for dictionaries will be written to the plist in sorted order, otherwise they will be written in the iteration order of the dictionary. When *skipkeys* is false (the default) the function raises "TypeError" when a key of a dictionary is not a string, otherwise such keys are skipped. A "TypeError" will be raised if the object is of an unsupported type or a container that contains objects of unsupported types. An "OverflowError" will be raised for integer values that cannot be represented in (binary) plist files. New in version 3.4. plistlib.dumps(value, *, fmt=FMT_XML, sort_keys=True, skipkeys=False) Return *value* as a plist-formatted bytes object. See the documentation for "dump()" for an explanation of the keyword arguments of this function. New in version 3.4. The following classes are available: class plistlib.UID(data) Wraps an "int". This is used when reading or writing NSKeyedArchiver encoded data, which contains UID (see PList manual). It has one attribute, "data", which can be used to retrieve the int value of the UID. "data" must be in the range "0 <= data < 2**64". New in version 3.8. The following constants are available: plistlib.FMT_XML The XML format for plist files. New in version 3.4. plistlib.FMT_BINARY The binary format for plist files New in version 3.4. Examples ======== Generating a plist: pl = dict( aString = "Doodah", aList = ["A", "B", 12, 32.1, [1, 2, 3]], aFloat = 0.1, anInt = 728, aDict = dict( anotherString = "", aThirdString = "M\xe4ssig, Ma\xdf", aTrueValue = True, aFalseValue = False, ), someData = b"", someMoreData = b"" * 10, aDate = datetime.datetime.fromtimestamp(time.mktime(time.gmtime())), ) with open(fileName, 'wb') as fp: dump(pl, fp) Parsing a plist: with open(fileName, 'rb') as fp: pl = load(fp) print(pl["aKey"]) "poplib" — POP3 protocol client ******************************* **Source code:** Lib/poplib.py ====================================================================== This module defines a class, "POP3", which encapsulates a connection to a POP3 server and implements the protocol as defined in **RFC 1939**. The "POP3" class supports both the minimal and optional command sets from **RFC 1939**. The "POP3" class also supports the "STLS" command introduced in **RFC 2595** to enable encrypted communication on an already established connection. Additionally, this module provides a class "POP3_SSL", which provides support for connecting to POP3 servers that use SSL as an underlying protocol layer. Note that POP3, though widely supported, is obsolescent. The implementation quality of POP3 servers varies widely, and too many are quite poor. If your mailserver supports IMAP, you would be better off using the "imaplib.IMAP4" class, as IMAP servers tend to be better implemented. The "poplib" module provides two classes: class poplib.POP3(host, port=POP3_PORT[, timeout]) This class implements the actual POP3 protocol. The connection is created when the instance is initialized. If *port* is omitted, the standard POP3 port (110) is used. The optional *timeout* parameter specifies a timeout in seconds for the connection attempt (if not specified, the global default timeout setting will be used). Raises an auditing event "poplib.connect" with arguments "self", "host", "port". All commands will raise an auditing event "poplib.putline" with arguments "self" and "line", where "line" is the bytes about to be sent to the remote host. Changed in version 3.9: If the *timeout* parameter is set to be zero, it will raise a "ValueError" to prevent the creation of a non-blocking socket. class poplib.POP3_SSL(host, port=POP3_SSL_PORT, keyfile=None, certfile=None, timeout=None, context=None) This is a subclass of "POP3" that connects to the server over an SSL encrypted socket. If *port* is not specified, 995, the standard POP3-over-SSL port is used. *timeout* works as in the "POP3" constructor. *context* is an optional "ssl.SSLContext" object which allows bundling SSL configuration options, certificates and private keys into a single (potentially long- lived) structure. Please read Security considerations for best practices. *keyfile* and *certfile* are a legacy alternative to *context* - they can point to PEM-formatted private key and certificate chain files, respectively, for the SSL connection. Raises an auditing event "poplib.connect" with arguments "self", "host", "port". All commands will raise an auditing event "poplib.putline" with arguments "self" and "line", where "line" is the bytes about to be sent to the remote host. Changed in version 3.2: *context* parameter added. Changed in version 3.4: The class now supports hostname check with "ssl.SSLContext.check_hostname" and *Server Name Indication* (see "ssl.HAS_SNI"). Deprecated since version 3.6: *keyfile* and *certfile* are deprecated in favor of *context*. Please use "ssl.SSLContext.load_cert_chain()" instead, or let "ssl.create_default_context()" select the system’s trusted CA certificates for you. Changed in version 3.9: If the *timeout* parameter is set to be zero, it will raise a "ValueError" to prevent the creation of a non-blocking socket. One exception is defined as an attribute of the "poplib" module: exception poplib.error_proto Exception raised on any errors from this module (errors from "socket" module are not caught). The reason for the exception is passed to the constructor as a string. See also: Module "imaplib" The standard Python IMAP module. Frequently Asked Questions About Fetchmail The FAQ for the **fetchmail** POP/IMAP client collects information on POP3 server variations and RFC noncompliance that may be useful if you need to write an application based on the POP protocol. POP3 Objects ============ All POP3 commands are represented by methods of the same name, in lower-case; most return the response text sent by the server. An "POP3" instance has the following methods: POP3.set_debuglevel(level) Set the instance’s debugging level. This controls the amount of debugging output printed. The default, "0", produces no debugging output. A value of "1" produces a moderate amount of debugging output, generally a single line per request. A value of "2" or higher produces the maximum amount of debugging output, logging each line sent and received on the control connection. POP3.getwelcome() Returns the greeting string sent by the POP3 server. POP3.capa() Query the server’s capabilities as specified in **RFC 2449**. Returns a dictionary in the form "{'name': ['param'...]}". New in version 3.4. POP3.user(username) Send user command, response should indicate that a password is required. POP3.pass_(password) Send password, response includes message count and mailbox size. Note: the mailbox on the server is locked until "quit()" is called. POP3.apop(user, secret) Use the more secure APOP authentication to log into the POP3 server. POP3.rpop(user) Use RPOP authentication (similar to UNIX r-commands) to log into POP3 server. POP3.stat() Get mailbox status. The result is a tuple of 2 integers: "(message count, mailbox size)". POP3.list([which]) Request message list, result is in the form "(response, ['mesg_num octets', ...], octets)". If *which* is set, it is the message to list. POP3.retr(which) Retrieve whole message number *which*, and set its seen flag. Result is in form "(response, ['line', ...], octets)". POP3.dele(which) Flag message number *which* for deletion. On most servers deletions are not actually performed until QUIT (the major exception is Eudora QPOP, which deliberately violates the RFCs by doing pending deletes on any disconnect). POP3.rset() Remove any deletion marks for the mailbox. POP3.noop() Do nothing. Might be used as a keep-alive. POP3.quit() Signoff: commit changes, unlock mailbox, drop connection. POP3.top(which, howmuch) Retrieves the message header plus *howmuch* lines of the message after the header of message number *which*. Result is in form "(response, ['line', ...], octets)". The POP3 TOP command this method uses, unlike the RETR command, doesn’t set the message’s seen flag; unfortunately, TOP is poorly specified in the RFCs and is frequently broken in off-brand servers. Test this method by hand against the POP3 servers you will use before trusting it. POP3.uidl(which=None) Return message digest (unique id) list. If *which* is specified, result contains the unique id for that message in the form "'response mesgnum uid", otherwise result is list "(response, ['mesgnum uid', ...], octets)". POP3.utf8() Try to switch to UTF-8 mode. Returns the server response if successful, raises "error_proto" if not. Specified in **RFC 6856**. New in version 3.5. POP3.stls(context=None) Start a TLS session on the active connection as specified in **RFC 2595**. This is only allowed before user authentication *context* parameter is a "ssl.SSLContext" object which allows bundling SSL configuration options, certificates and private keys into a single (potentially long-lived) structure. Please read Security considerations for best practices. This method supports hostname checking via "ssl.SSLContext.check_hostname" and *Server Name Indication* (see "ssl.HAS_SNI"). New in version 3.4. Instances of "POP3_SSL" have no additional methods. The interface of this subclass is identical to its parent. POP3 Example ============ Here is a minimal example (without error checking) that opens a mailbox and retrieves and prints all messages: import getpass, poplib M = poplib.POP3('localhost') M.user(getpass.getuser()) M.pass_(getpass.getpass()) numMessages = len(M.list()[1]) for i in range(numMessages): for j in M.retr(i+1)[1]: print(j) At the end of the module, there is a test section that contains a more extensive example of usage. "posix" — The most common POSIX system calls ******************************************** ====================================================================== This module provides access to operating system functionality that is standardized by the C Standard and the POSIX standard (a thinly disguised Unix interface). **Do not import this module directly.** Instead, import the module "os", which provides a *portable* version of this interface. On Unix, the "os" module provides a superset of the "posix" interface. On non- Unix operating systems the "posix" module is not available, but a subset is always available through the "os" interface. Once "os" is imported, there is *no* performance penalty in using it instead of "posix". In addition, "os" provides some additional functionality, such as automatically calling "putenv()" when an entry in "os.environ" is changed. Errors are reported as exceptions; the usual exceptions are given for type errors, while errors reported by the system calls raise "OSError". Large File Support ================== Several operating systems (including AIX, HP-UX, Irix and Solaris) provide support for files that are larger than 2 GiB from a C programming model where "int" and "long" are 32-bit values. This is typically accomplished by defining the relevant size and offset types as 64-bit values. Such files are sometimes referred to as *large files*. Large file support is enabled in Python when the size of an "off_t" is larger than a "long" and the "long long" is at least as large as an "off_t". It may be necessary to configure and compile Python with certain compiler flags to enable this mode. For example, it is enabled by default with recent versions of Irix, but with Solaris 2.6 and 2.7 you need to do something like: CFLAGS="`getconf LFS_CFLAGS`" OPT="-g -O2 $CFLAGS" \ ./configure On large-file-capable Linux systems, this might work: CFLAGS='-D_LARGEFILE64_SOURCE -D_FILE_OFFSET_BITS=64' OPT="-g -O2 $CFLAGS" \ ./configure Notable Module Contents ======================= In addition to many functions described in the "os" module documentation, "posix" defines the following data item: posix.environ A dictionary representing the string environment at the time the interpreter was started. Keys and values are bytes on Unix and str on Windows. For example, "environ[b'HOME']" ("environ['HOME']" on Windows) is the pathname of your home directory, equivalent to "getenv("HOME")" in C. Modifying this dictionary does not affect the string environment passed on by "execv()", "popen()" or "system()"; if you need to change the environment, pass "environ" to "execve()" or add variable assignments and export statements to the command string for "system()" or "popen()". Changed in version 3.2: On Unix, keys and values are bytes. Note: The "os" module provides an alternate implementation of "environ" which updates the environment on modification. Note also that updating "os.environ" will render this dictionary obsolete. Use of the "os" module version of this is recommended over direct access to the "posix" module. "pprint" — Data pretty printer ****************************** **Source code:** Lib/pprint.py ====================================================================== The "pprint" module provides a capability to “pretty-print” arbitrary Python data structures in a form which can be used as input to the interpreter. If the formatted structures include objects which are not fundamental Python types, the representation may not be loadable. This may be the case if objects such as files, sockets or classes are included, as well as many other objects which are not representable as Python literals. The formatted representation keeps objects on a single line if it can, and breaks them onto multiple lines if they don’t fit within the allowed width. Construct "PrettyPrinter" objects explicitly if you need to adjust the width constraint. Dictionaries are sorted by key before the display is computed. Changed in version 3.9: Added support for pretty-printing "types.SimpleNamespace". The "pprint" module defines one class: class pprint.PrettyPrinter(indent=1, width=80, depth=None, stream=None, *, compact=False, sort_dicts=True) Construct a "PrettyPrinter" instance. This constructor understands several keyword parameters. An output stream may be set using the *stream* keyword; the only method used on the stream object is the file protocol’s "write()" method. If not specified, the "PrettyPrinter" adopts "sys.stdout". The amount of indentation added for each recursive level is specified by *indent*; the default is one. Other values can cause output to look a little odd, but can make nesting easier to spot. The number of levels which may be printed is controlled by *depth*; if the data structure being printed is too deep, the next contained level is replaced by "...". By default, there is no constraint on the depth of the objects being formatted. The desired output width is constrained using the *width* parameter; the default is 80 characters. If a structure cannot be formatted within the constrained width, a best effort will be made. If *compact* is false (the default) each item of a long sequence will be formatted on a separate line. If *compact* is true, as many items as will fit within the *width* will be formatted on each output line. If *sort_dicts* is true (the default), dictionaries will be formatted with their keys sorted, otherwise they will display in insertion order. Changed in version 3.4: Added the *compact* parameter. Changed in version 3.8: Added the *sort_dicts* parameter. >>> import pprint >>> stuff = ['spam', 'eggs', 'lumberjack', 'knights', 'ni'] >>> stuff.insert(0, stuff[:]) >>> pp = pprint.PrettyPrinter(indent=4) >>> pp.pprint(stuff) [ ['spam', 'eggs', 'lumberjack', 'knights', 'ni'], 'spam', 'eggs', 'lumberjack', 'knights', 'ni'] >>> pp = pprint.PrettyPrinter(width=41, compact=True) >>> pp.pprint(stuff) [['spam', 'eggs', 'lumberjack', 'knights', 'ni'], 'spam', 'eggs', 'lumberjack', 'knights', 'ni'] >>> tup = ('spam', ('eggs', ('lumberjack', ('knights', ('ni', ('dead', ... ('parrot', ('fresh fruit',)))))))) >>> pp = pprint.PrettyPrinter(depth=6) >>> pp.pprint(tup) ('spam', ('eggs', ('lumberjack', ('knights', ('ni', ('dead', (...))))))) The "pprint" module also provides several shortcut functions: pprint.pformat(object, indent=1, width=80, depth=None, *, compact=False, sort_dicts=True) Return the formatted representation of *object* as a string. *indent*, *width*, *depth*, *compact* and *sort_dicts* will be passed to the "PrettyPrinter" constructor as formatting parameters. Changed in version 3.4: Added the *compact* parameter. Changed in version 3.8: Added the *sort_dicts* parameter. pprint.pp(object, *args, sort_dicts=False, **kwargs) Prints the formatted representation of *object* followed by a newline. If *sort_dicts* is false (the default), dictionaries will be displayed with their keys in insertion order, otherwise the dict keys will be sorted. *args* and *kwargs* will be passed to "pprint()" as formatting parameters. New in version 3.8. pprint.pprint(object, stream=None, indent=1, width=80, depth=None, *, compact=False, sort_dicts=True) Prints the formatted representation of *object* on *stream*, followed by a newline. If *stream* is "None", "sys.stdout" is used. This may be used in the interactive interpreter instead of the "print()" function for inspecting values (you can even reassign "print = pprint.pprint" for use within a scope). *indent*, *width*, *depth*, *compact* and *sort_dicts* will be passed to the "PrettyPrinter" constructor as formatting parameters. Changed in version 3.4: Added the *compact* parameter. Changed in version 3.8: Added the *sort_dicts* parameter. >>> import pprint >>> stuff = ['spam', 'eggs', 'lumberjack', 'knights', 'ni'] >>> stuff.insert(0, stuff) >>> pprint.pprint(stuff) [, 'spam', 'eggs', 'lumberjack', 'knights', 'ni'] pprint.isreadable(object) Determine if the formatted representation of *object* is “readable”, or can be used to reconstruct the value using "eval()". This always returns "False" for recursive objects. >>> pprint.isreadable(stuff) False pprint.isrecursive(object) Determine if *object* requires a recursive representation. One more support function is also defined: pprint.saferepr(object) Return a string representation of *object*, protected against recursive data structures. If the representation of *object* exposes a recursive entry, the recursive reference will be represented as "". The representation is not otherwise formatted. >>> pprint.saferepr(stuff) "[, 'spam', 'eggs', 'lumberjack', 'knights', 'ni']" PrettyPrinter Objects ===================== "PrettyPrinter" instances have the following methods: PrettyPrinter.pformat(object) Return the formatted representation of *object*. This takes into account the options passed to the "PrettyPrinter" constructor. PrettyPrinter.pprint(object) Print the formatted representation of *object* on the configured stream, followed by a newline. The following methods provide the implementations for the corresponding functions of the same names. Using these methods on an instance is slightly more efficient since new "PrettyPrinter" objects don’t need to be created. PrettyPrinter.isreadable(object) Determine if the formatted representation of the object is “readable,” or can be used to reconstruct the value using "eval()". Note that this returns "False" for recursive objects. If the *depth* parameter of the "PrettyPrinter" is set and the object is deeper than allowed, this returns "False". PrettyPrinter.isrecursive(object) Determine if the object requires a recursive representation. This method is provided as a hook to allow subclasses to modify the way objects are converted to strings. The default implementation uses the internals of the "saferepr()" implementation. PrettyPrinter.format(object, context, maxlevels, level) Returns three values: the formatted version of *object* as a string, a flag indicating whether the result is readable, and a flag indicating whether recursion was detected. The first argument is the object to be presented. The second is a dictionary which contains the "id()" of objects that are part of the current presentation context (direct and indirect containers for *object* that are affecting the presentation) as the keys; if an object needs to be presented which is already represented in *context*, the third return value should be "True". Recursive calls to the "format()" method should add additional entries for containers to this dictionary. The third argument, *maxlevels*, gives the requested limit to recursion; this will be "0" if there is no requested limit. This argument should be passed unmodified to recursive calls. The fourth argument, *level*, gives the current level; recursive calls should be passed a value less than that of the current call. Example ======= To demonstrate several uses of the "pprint()" function and its parameters, let’s fetch information about a project from PyPI: >>> import json >>> import pprint >>> from urllib.request import urlopen >>> with urlopen('https://pypi.org/pypi/sampleproject/json') as resp: ... project_info = json.load(resp)['info'] In its basic form, "pprint()" shows the whole object: >>> pprint.pprint(project_info) {'author': 'The Python Packaging Authority', 'author_email': 'pypa-dev@googlegroups.com', 'bugtrack_url': None, 'classifiers': ['Development Status :: 3 - Alpha', 'Intended Audience :: Developers', 'License :: OSI Approved :: MIT License', 'Programming Language :: Python :: 2', 'Programming Language :: Python :: 2.6', 'Programming Language :: Python :: 2.7', 'Programming Language :: Python :: 3', 'Programming Language :: Python :: 3.2', 'Programming Language :: Python :: 3.3', 'Programming Language :: Python :: 3.4', 'Topic :: Software Development :: Build Tools'], 'description': 'A sample Python project\n' '=======================\n' '\n' 'This is the description file for the project.\n' '\n' 'The file should use UTF-8 encoding and be written using ' 'ReStructured Text. It\n' 'will be used to generate the project webpage on PyPI, and ' 'should be written for\n' 'that purpose.\n' '\n' 'Typical contents for this file would include an overview of ' 'the project, basic\n' 'usage examples, etc. Generally, including the project ' 'changelog in here is not\n' 'a good idea, although a simple "What\'s New" section for the ' 'most recent version\n' 'may be appropriate.', 'description_content_type': None, 'docs_url': None, 'download_url': 'UNKNOWN', 'downloads': {'last_day': -1, 'last_month': -1, 'last_week': -1}, 'home_page': 'https://github.com/pypa/sampleproject', 'keywords': 'sample setuptools development', 'license': 'MIT', 'maintainer': None, 'maintainer_email': None, 'name': 'sampleproject', 'package_url': 'https://pypi.org/project/sampleproject/', 'platform': 'UNKNOWN', 'project_url': 'https://pypi.org/project/sampleproject/', 'project_urls': {'Download': 'UNKNOWN', 'Homepage': 'https://github.com/pypa/sampleproject'}, 'release_url': 'https://pypi.org/project/sampleproject/1.2.0/', 'requires_dist': None, 'requires_python': None, 'summary': 'A sample Python project', 'version': '1.2.0'} The result can be limited to a certain *depth* (ellipsis is used for deeper contents): >>> pprint.pprint(project_info, depth=1) {'author': 'The Python Packaging Authority', 'author_email': 'pypa-dev@googlegroups.com', 'bugtrack_url': None, 'classifiers': [...], 'description': 'A sample Python project\n' '=======================\n' '\n' 'This is the description file for the project.\n' '\n' 'The file should use UTF-8 encoding and be written using ' 'ReStructured Text. It\n' 'will be used to generate the project webpage on PyPI, and ' 'should be written for\n' 'that purpose.\n' '\n' 'Typical contents for this file would include an overview of ' 'the project, basic\n' 'usage examples, etc. Generally, including the project ' 'changelog in here is not\n' 'a good idea, although a simple "What\'s New" section for the ' 'most recent version\n' 'may be appropriate.', 'description_content_type': None, 'docs_url': None, 'download_url': 'UNKNOWN', 'downloads': {...}, 'home_page': 'https://github.com/pypa/sampleproject', 'keywords': 'sample setuptools development', 'license': 'MIT', 'maintainer': None, 'maintainer_email': None, 'name': 'sampleproject', 'package_url': 'https://pypi.org/project/sampleproject/', 'platform': 'UNKNOWN', 'project_url': 'https://pypi.org/project/sampleproject/', 'project_urls': {...}, 'release_url': 'https://pypi.org/project/sampleproject/1.2.0/', 'requires_dist': None, 'requires_python': None, 'summary': 'A sample Python project', 'version': '1.2.0'} Additionally, maximum character *width* can be suggested. If a long object cannot be split, the specified width will be exceeded: >>> pprint.pprint(project_info, depth=1, width=60) {'author': 'The Python Packaging Authority', 'author_email': 'pypa-dev@googlegroups.com', 'bugtrack_url': None, 'classifiers': [...], 'description': 'A sample Python project\n' '=======================\n' '\n' 'This is the description file for the ' 'project.\n' '\n' 'The file should use UTF-8 encoding and be ' 'written using ReStructured Text. It\n' 'will be used to generate the project ' 'webpage on PyPI, and should be written ' 'for\n' 'that purpose.\n' '\n' 'Typical contents for this file would ' 'include an overview of the project, ' 'basic\n' 'usage examples, etc. Generally, including ' 'the project changelog in here is not\n' 'a good idea, although a simple "What\'s ' 'New" section for the most recent version\n' 'may be appropriate.', 'description_content_type': None, 'docs_url': None, 'download_url': 'UNKNOWN', 'downloads': {...}, 'home_page': 'https://github.com/pypa/sampleproject', 'keywords': 'sample setuptools development', 'license': 'MIT', 'maintainer': None, 'maintainer_email': None, 'name': 'sampleproject', 'package_url': 'https://pypi.org/project/sampleproject/', 'platform': 'UNKNOWN', 'project_url': 'https://pypi.org/project/sampleproject/', 'project_urls': {...}, 'release_url': 'https://pypi.org/project/sampleproject/1.2.0/', 'requires_dist': None, 'requires_python': None, 'summary': 'A sample Python project', 'version': '1.2.0'} The Python Profilers ******************** **Source code:** Lib/profile.py and Lib/pstats.py ====================================================================== Introduction to the profilers ============================= "cProfile" and "profile" provide *deterministic profiling* of Python programs. A *profile* is a set of statistics that describes how often and for how long various parts of the program executed. These statistics can be formatted into reports via the "pstats" module. The Python standard library provides two different implementations of the same profiling interface: 1. "cProfile" is recommended for most users; it’s a C extension with reasonable overhead that makes it suitable for profiling long- running programs. Based on "lsprof", contributed by Brett Rosen and Ted Czotter. 2. "profile", a pure Python module whose interface is imitated by "cProfile", but which adds significant overhead to profiled programs. If you’re trying to extend the profiler in some way, the task might be easier with this module. Originally designed and written by Jim Roskind. Note: The profiler modules are designed to provide an execution profile for a given program, not for benchmarking purposes (for that, there is "timeit" for reasonably accurate results). This particularly applies to benchmarking Python code against C code: the profilers introduce overhead for Python code, but not for C-level functions, and so the C code would seem faster than any Python one. Instant User’s Manual ===================== This section is provided for users that “don’t want to read the manual.” It provides a very brief overview, and allows a user to rapidly perform profiling on an existing application. To profile a function that takes a single argument, you can do: import cProfile import re cProfile.run('re.compile("foo|bar")') (Use "profile" instead of "cProfile" if the latter is not available on your system.) The above action would run "re.compile()" and print profile results like the following: 197 function calls (192 primitive calls) in 0.002 seconds Ordered by: standard name ncalls tottime percall cumtime percall filename:lineno(function) 1 0.000 0.000 0.001 0.001 :1() 1 0.000 0.000 0.001 0.001 re.py:212(compile) 1 0.000 0.000 0.001 0.001 re.py:268(_compile) 1 0.000 0.000 0.000 0.000 sre_compile.py:172(_compile_charset) 1 0.000 0.000 0.000 0.000 sre_compile.py:201(_optimize_charset) 4 0.000 0.000 0.000 0.000 sre_compile.py:25(_identityfunction) 3/1 0.000 0.000 0.000 0.000 sre_compile.py:33(_compile) The first line indicates that 197 calls were monitored. Of those calls, 192 were *primitive*, meaning that the call was not induced via recursion. The next line: "Ordered by: standard name", indicates that the text string in the far right column was used to sort the output. The column headings include: ncalls for the number of calls. tottime for the total time spent in the given function (and excluding time made in calls to sub-functions) percall is the quotient of "tottime" divided by "ncalls" cumtime is the cumulative time spent in this and all subfunctions (from invocation till exit). This figure is accurate *even* for recursive functions. percall is the quotient of "cumtime" divided by primitive calls filename:lineno(function) provides the respective data of each function When there are two numbers in the first column (for example "3/1"), it means that the function recursed. The second value is the number of primitive calls and the former is the total number of calls. Note that when the function does not recurse, these two values are the same, and only the single figure is printed. Instead of printing the output at the end of the profile run, you can save the results to a file by specifying a filename to the "run()" function: import cProfile import re cProfile.run('re.compile("foo|bar")', 'restats') The "pstats.Stats" class reads profile results from a file and formats them in various ways. The files "cProfile" and "profile" can also be invoked as a script to profile another script. For example: python -m cProfile [-o output_file] [-s sort_order] (-m module | myscript.py) "-o" writes the profile results to a file instead of to stdout "-s" specifies one of the "sort_stats()" sort values to sort the output by. This only applies when "-o" is not supplied. "-m" specifies that a module is being profiled instead of a script. New in version 3.7: Added the "-m" option to "cProfile". New in version 3.8: Added the "-m" option to "profile". The "pstats" module’s "Stats" class has a variety of methods for manipulating and printing the data saved into a profile results file: import pstats from pstats import SortKey p = pstats.Stats('restats') p.strip_dirs().sort_stats(-1).print_stats() The "strip_dirs()" method removed the extraneous path from all the module names. The "sort_stats()" method sorted all the entries according to the standard module/line/name string that is printed. The "print_stats()" method printed out all the statistics. You might try the following sort calls: p.sort_stats(SortKey.NAME) p.print_stats() The first call will actually sort the list by function name, and the second call will print out the statistics. The following are some interesting calls to experiment with: p.sort_stats(SortKey.CUMULATIVE).print_stats(10) This sorts the profile by cumulative time in a function, and then only prints the ten most significant lines. If you want to understand what algorithms are taking time, the above line is what you would use. If you were looking to see what functions were looping a lot, and taking a lot of time, you would do: p.sort_stats(SortKey.TIME).print_stats(10) to sort according to time spent within each function, and then print the statistics for the top ten functions. You might also try: p.sort_stats(SortKey.FILENAME).print_stats('__init__') This will sort all the statistics by file name, and then print out statistics for only the class init methods (since they are spelled with "__init__" in them). As one final example, you could try: p.sort_stats(SortKey.TIME, SortKey.CUMULATIVE).print_stats(.5, 'init') This line sorts statistics with a primary key of time, and a secondary key of cumulative time, and then prints out some of the statistics. To be specific, the list is first culled down to 50% (re: ".5") of its original size, then only lines containing "init" are maintained, and that sub-sub-list is printed. If you wondered what functions called the above functions, you could now ("p" is still sorted according to the last criteria) do: p.print_callers(.5, 'init') and you would get a list of callers for each of the listed functions. If you want more functionality, you’re going to have to read the manual, or guess what the following functions do: p.print_callees() p.add('restats') Invoked as a script, the "pstats" module is a statistics browser for reading and examining profile dumps. It has a simple line-oriented interface (implemented using "cmd") and interactive help. "profile" and "cProfile" Module Reference ========================================= Both the "profile" and "cProfile" modules provide the following functions: profile.run(command, filename=None, sort=-1) This function takes a single argument that can be passed to the "exec()" function, and an optional file name. In all cases this routine executes: exec(command, __main__.__dict__, __main__.__dict__) and gathers profiling statistics from the execution. If no file name is present, then this function automatically creates a "Stats" instance and prints a simple profiling report. If the sort value is specified, it is passed to this "Stats" instance to control how the results are sorted. profile.runctx(command, globals, locals, filename=None, sort=-1) This function is similar to "run()", with added arguments to supply the globals and locals dictionaries for the *command* string. This routine executes: exec(command, globals, locals) and gathers profiling statistics as in the "run()" function above. class profile.Profile(timer=None, timeunit=0.0, subcalls=True, builtins=True) This class is normally only used if more precise control over profiling is needed than what the "cProfile.run()" function provides. A custom timer can be supplied for measuring how long code takes to run via the *timer* argument. This must be a function that returns a single number representing the current time. If the number is an integer, the *timeunit* specifies a multiplier that specifies the duration of each unit of time. For example, if the timer returns times measured in thousands of seconds, the time unit would be ".001". Directly using the "Profile" class allows formatting profile results without writing the profile data to a file: import cProfile, pstats, io from pstats import SortKey pr = cProfile.Profile() pr.enable() # ... do something ... pr.disable() s = io.StringIO() sortby = SortKey.CUMULATIVE ps = pstats.Stats(pr, stream=s).sort_stats(sortby) ps.print_stats() print(s.getvalue()) The "Profile" class can also be used as a context manager (supported only in "cProfile" module. see Context Manager Types): import cProfile with cProfile.Profile() as pr: # ... do something ... pr.print_stats() Changed in version 3.8: Added context manager support. enable() Start collecting profiling data. Only in "cProfile". disable() Stop collecting profiling data. Only in "cProfile". create_stats() Stop collecting profiling data and record the results internally as the current profile. print_stats(sort=-1) Create a "Stats" object based on the current profile and print the results to stdout. dump_stats(filename) Write the results of the current profile to *filename*. run(cmd) Profile the cmd via "exec()". runctx(cmd, globals, locals) Profile the cmd via "exec()" with the specified global and local environment. runcall(func, /, *args, **kwargs) Profile "func(*args, **kwargs)" Note that profiling will only work if the called command/function actually returns. If the interpreter is terminated (e.g. via a "sys.exit()" call during the called command/function execution) no profiling results will be printed. The "Stats" Class ================= Analysis of the profiler data is done using the "Stats" class. class pstats.Stats(*filenames or profile, stream=sys.stdout) This class constructor creates an instance of a “statistics object” from a *filename* (or list of filenames) or from a "Profile" instance. Output will be printed to the stream specified by *stream*. The file selected by the above constructor must have been created by the corresponding version of "profile" or "cProfile". To be specific, there is *no* file compatibility guaranteed with future versions of this profiler, and there is no compatibility with files produced by other profilers, or the same profiler run on a different operating system. If several files are provided, all the statistics for identical functions will be coalesced, so that an overall view of several processes can be considered in a single report. If additional files need to be combined with data in an existing "Stats" object, the "add()" method can be used. Instead of reading the profile data from a file, a "cProfile.Profile" or "profile.Profile" object can be used as the profile data source. "Stats" objects have the following methods: strip_dirs() This method for the "Stats" class removes all leading path information from file names. It is very useful in reducing the size of the printout to fit within (close to) 80 columns. This method modifies the object, and the stripped information is lost. After performing a strip operation, the object is considered to have its entries in a “random” order, as it was just after object initialization and loading. If "strip_dirs()" causes two function names to be indistinguishable (they are on the same line of the same filename, and have the same function name), then the statistics for these two entries are accumulated into a single entry. add(*filenames) This method of the "Stats" class accumulates additional profiling information into the current profiling object. Its arguments should refer to filenames created by the corresponding version of "profile.run()" or "cProfile.run()". Statistics for identically named (re: file, line, name) functions are automatically accumulated into single function statistics. dump_stats(filename) Save the data loaded into the "Stats" object to a file named *filename*. The file is created if it does not exist, and is overwritten if it already exists. This is equivalent to the method of the same name on the "profile.Profile" and "cProfile.Profile" classes. sort_stats(*keys) This method modifies the "Stats" object by sorting it according to the supplied criteria. The argument can be either a string or a SortKey enum identifying the basis of a sort (example: "'time'", "'name'", "SortKey.TIME" or "SortKey.NAME"). The SortKey enums argument have advantage over the string argument in that it is more robust and less error prone. When more than one key is provided, then additional keys are used as secondary criteria when there is equality in all keys selected before them. For example, "sort_stats(SortKey.NAME, SortKey.FILE)" will sort all the entries according to their function name, and resolve all ties (identical function names) by sorting by file name. For the string argument, abbreviations can be used for any key names, as long as the abbreviation is unambiguous. The following are the valid string and SortKey: +--------------------+-----------------------+------------------------+ | Valid String Arg | Valid enum Arg | Meaning | |====================|=======================|========================| | "'calls'" | SortKey.CALLS | call count | +--------------------+-----------------------+------------------------+ | "'cumulative'" | SortKey.CUMULATIVE | cumulative time | +--------------------+-----------------------+------------------------+ | "'cumtime'" | N/A | cumulative time | +--------------------+-----------------------+------------------------+ | "'file'" | N/A | file name | +--------------------+-----------------------+------------------------+ | "'filename'" | SortKey.FILENAME | file name | +--------------------+-----------------------+------------------------+ | "'module'" | N/A | file name | +--------------------+-----------------------+------------------------+ | "'ncalls'" | N/A | call count | +--------------------+-----------------------+------------------------+ | "'pcalls'" | SortKey.PCALLS | primitive call count | +--------------------+-----------------------+------------------------+ | "'line'" | SortKey.LINE | line number | +--------------------+-----------------------+------------------------+ | "'name'" | SortKey.NAME | function name | +--------------------+-----------------------+------------------------+ | "'nfl'" | SortKey.NFL | name/file/line | +--------------------+-----------------------+------------------------+ | "'stdname'" | SortKey.STDNAME | standard name | +--------------------+-----------------------+------------------------+ | "'time'" | SortKey.TIME | internal time | +--------------------+-----------------------+------------------------+ | "'tottime'" | N/A | internal time | +--------------------+-----------------------+------------------------+ Note that all sorts on statistics are in descending order (placing most time consuming items first), where as name, file, and line number searches are in ascending order (alphabetical). The subtle distinction between "SortKey.NFL" and "SortKey.STDNAME" is that the standard name is a sort of the name as printed, which means that the embedded line numbers get compared in an odd way. For example, lines 3, 20, and 40 would (if the file names were the same) appear in the string order 20, 3 and 40. In contrast, "SortKey.NFL" does a numeric compare of the line numbers. In fact, "sort_stats(SortKey.NFL)" is the same as "sort_stats(SortKey.NAME, SortKey.FILENAME, SortKey.LINE)". For backward-compatibility reasons, the numeric arguments "-1", "0", "1", and "2" are permitted. They are interpreted as "'stdname'", "'calls'", "'time'", and "'cumulative'" respectively. If this old style format (numeric) is used, only one sort key (the numeric key) will be used, and additional arguments will be silently ignored. New in version 3.7: Added the SortKey enum. reverse_order() This method for the "Stats" class reverses the ordering of the basic list within the object. Note that by default ascending vs descending order is properly selected based on the sort key of choice. print_stats(*restrictions) This method for the "Stats" class prints out a report as described in the "profile.run()" definition. The order of the printing is based on the last "sort_stats()" operation done on the object (subject to caveats in "add()" and "strip_dirs()"). The arguments provided (if any) can be used to limit the list down to the significant entries. Initially, the list is taken to be the complete set of profiled functions. Each restriction is either an integer (to select a count of lines), or a decimal fraction between 0.0 and 1.0 inclusive (to select a percentage of lines), or a string that will interpreted as a regular expression (to pattern match the standard name that is printed). If several restrictions are provided, then they are applied sequentially. For example: print_stats(.1, 'foo:') would first limit the printing to first 10% of list, and then only print functions that were part of filename ".*foo:". In contrast, the command: print_stats('foo:', .1) would limit the list to all functions having file names ".*foo:", and then proceed to only print the first 10% of them. print_callers(*restrictions) This method for the "Stats" class prints a list of all functions that called each function in the profiled database. The ordering is identical to that provided by "print_stats()", and the definition of the restricting argument is also identical. Each caller is reported on its own line. The format differs slightly depending on the profiler that produced the stats: * With "profile", a number is shown in parentheses after each caller to show how many times this specific call was made. For convenience, a second non-parenthesized number repeats the cumulative time spent in the function at the right. * With "cProfile", each caller is preceded by three numbers: the number of times this specific call was made, and the total and cumulative times spent in the current function while it was invoked by this specific caller. print_callees(*restrictions) This method for the "Stats" class prints a list of all function that were called by the indicated function. Aside from this reversal of direction of calls (re: called vs was called by), the arguments and ordering are identical to the "print_callers()" method. get_stats_profile() This method returns an instance of StatsProfile, which contains a mapping of function names to instances of FunctionProfile. Each FunctionProfile instance holds information related to the function’s profile such as how long the function took to run, how many times it was called, etc… New in version 3.9: Added the following dataclasses: StatsProfile, FunctionProfile. Added the following function: get_stats_profile. What Is Deterministic Profiling? ================================ *Deterministic profiling* is meant to reflect the fact that all *function call*, *function return*, and *exception* events are monitored, and precise timings are made for the intervals between these events (during which time the user’s code is executing). In contrast, *statistical profiling* (which is not done by this module) randomly samples the effective instruction pointer, and deduces where time is being spent. The latter technique traditionally involves less overhead (as the code does not need to be instrumented), but provides only relative indications of where time is being spent. In Python, since there is an interpreter active during execution, the presence of instrumented code is not required in order to do deterministic profiling. Python automatically provides a *hook* (optional callback) for each event. In addition, the interpreted nature of Python tends to add so much overhead to execution, that deterministic profiling tends to only add small processing overhead in typical applications. The result is that deterministic profiling is not that expensive, yet provides extensive run time statistics about the execution of a Python program. Call count statistics can be used to identify bugs in code (surprising counts), and to identify possible inline-expansion points (high call counts). Internal time statistics can be used to identify “hot loops” that should be carefully optimized. Cumulative time statistics should be used to identify high level errors in the selection of algorithms. Note that the unusual handling of cumulative times in this profiler allows statistics for recursive implementations of algorithms to be directly compared to iterative implementations. Limitations =========== One limitation has to do with accuracy of timing information. There is a fundamental problem with deterministic profilers involving accuracy. The most obvious restriction is that the underlying “clock” is only ticking at a rate (typically) of about .001 seconds. Hence no measurements will be more accurate than the underlying clock. If enough measurements are taken, then the “error” will tend to average out. Unfortunately, removing this first error induces a second source of error. The second problem is that it “takes a while” from when an event is dispatched until the profiler’s call to get the time actually *gets* the state of the clock. Similarly, there is a certain lag when exiting the profiler event handler from the time that the clock’s value was obtained (and then squirreled away), until the user’s code is once again executing. As a result, functions that are called many times, or call many functions, will typically accumulate this error. The error that accumulates in this fashion is typically less than the accuracy of the clock (less than one clock tick), but it *can* accumulate and become very significant. The problem is more important with "profile" than with the lower- overhead "cProfile". For this reason, "profile" provides a means of calibrating itself for a given platform so that this error can be probabilistically (on the average) removed. After the profiler is calibrated, it will be more accurate (in a least square sense), but it will sometimes produce negative numbers (when call counts are exceptionally low, and the gods of probability work against you :-). ) Do *not* be alarmed by negative numbers in the profile. They should *only* appear if you have calibrated your profiler, and the results are actually better than without calibration. Calibration =========== The profiler of the "profile" module subtracts a constant from each event handling time to compensate for the overhead of calling the time function, and socking away the results. By default, the constant is 0. The following procedure can be used to obtain a better constant for a given platform (see Limitations). import profile pr = profile.Profile() for i in range(5): print(pr.calibrate(10000)) The method executes the number of Python calls given by the argument, directly and again under the profiler, measuring the time for both. It then computes the hidden overhead per profiler event, and returns that as a float. For example, on a 1.8Ghz Intel Core i5 running macOS, and using Python’s time.process_time() as the timer, the magical number is about 4.04e-6. The object of this exercise is to get a fairly consistent result. If your computer is *very* fast, or your timer function has poor resolution, you might have to pass 100000, or even 1000000, to get consistent results. When you have a consistent answer, there are three ways you can use it: import profile # 1. Apply computed bias to all Profile instances created hereafter. profile.Profile.bias = your_computed_bias # 2. Apply computed bias to a specific Profile instance. pr = profile.Profile() pr.bias = your_computed_bias # 3. Specify computed bias in instance constructor. pr = profile.Profile(bias=your_computed_bias) If you have a choice, you are better off choosing a smaller constant, and then your results will “less often” show up as negative in profile statistics. Using a custom timer ==================== If you want to change how current time is determined (for example, to force use of wall-clock time or elapsed process time), pass the timing function you want to the "Profile" class constructor: pr = profile.Profile(your_time_func) The resulting profiler will then call "your_time_func". Depending on whether you are using "profile.Profile" or "cProfile.Profile", "your_time_func"’s return value will be interpreted differently: "profile.Profile" "your_time_func" should return a single number, or a list of numbers whose sum is the current time (like what "os.times()" returns). If the function returns a single time number, or the list of returned numbers has length 2, then you will get an especially fast version of the dispatch routine. Be warned that you should calibrate the profiler class for the timer function that you choose (see Calibration). For most machines, a timer that returns a lone integer value will provide the best results in terms of low overhead during profiling. ("os.times()" is *pretty* bad, as it returns a tuple of floating point values). If you want to substitute a better timer in the cleanest fashion, derive a class and hardwire a replacement dispatch method that best handles your timer call, along with the appropriate calibration constant. "cProfile.Profile" "your_time_func" should return a single number. If it returns integers, you can also invoke the class constructor with a second argument specifying the real duration of one unit of time. For example, if "your_integer_time_func" returns times measured in thousands of seconds, you would construct the "Profile" instance as follows: pr = cProfile.Profile(your_integer_time_func, 0.001) As the "cProfile.Profile" class cannot be calibrated, custom timer functions should be used with care and should be as fast as possible. For the best results with a custom timer, it might be necessary to hard-code it in the C source of the internal "_lsprof" module. Python 3.3 adds several new functions in "time" that can be used to make precise measurements of process or wall-clock time. For example, see "time.perf_counter()". "pty" — Pseudo-terminal utilities ********************************* **Source code:** Lib/pty.py ====================================================================== The "pty" module defines operations for handling the pseudo-terminal concept: starting another process and being able to write to and read from its controlling terminal programmatically. Because pseudo-terminal handling is highly platform dependent, there is code to do it only for Linux. (The Linux code is supposed to work on other platforms, but hasn’t been tested yet.) The "pty" module defines the following functions: pty.fork() Fork. Connect the child’s controlling terminal to a pseudo- terminal. Return value is "(pid, fd)". Note that the child gets *pid* 0, and the *fd* is *invalid*. The parent’s return value is the *pid* of the child, and *fd* is a file descriptor connected to the child’s controlling terminal (and also to the child’s standard input and output). pty.openpty() Open a new pseudo-terminal pair, using "os.openpty()" if possible, or emulation code for generic Unix systems. Return a pair of file descriptors "(master, slave)", for the master and the slave end, respectively. pty.spawn(argv[, master_read[, stdin_read]]) Spawn a process, and connect its controlling terminal with the current process’s standard io. This is often used to baffle programs which insist on reading from the controlling terminal. It is expected that the process spawned behind the pty will eventually terminate, and when it does *spawn* will return. The functions *master_read* and *stdin_read* are passed a file descriptor which they should read from, and they should always return a byte string. In order to force spawn to return before the child process exits an "OSError" should be thrown. The default implementation for both functions will read and return up to 1024 bytes each time the function is called. The *master_read* callback is passed the pseudoterminal’s master file descriptor to read output from the child process, and *stdin_read* is passed file descriptor 0, to read from the parent process’s standard input. Returning an empty byte string from either callback is interpreted as an end-of-file (EOF) condition, and that callback will not be called after that. If *stdin_read* signals EOF the controlling terminal can no longer communicate with the parent process OR the child process. Unless the child process will quit without any input, *spawn* will then loop forever. If *master_read* signals EOF the same behavior results (on linux at least). If both callbacks signal EOF then *spawn* will probably never return, unless *select* throws an error on your platform when passed three empty lists. This is a bug, documented in issue 26228. Return the exit status value from "os.waitpid()" on the child process. "waitstatus_to_exitcode()" can be used to convert the exit status into an exit code. Raises an auditing event "pty.spawn" with argument "argv". Changed in version 3.4: "spawn()" now returns the status value from "os.waitpid()" on the child process. Example ======= The following program acts like the Unix command *script(1)*, using a pseudo-terminal to record all input and output of a terminal session in a “typescript”. import argparse import os import pty import sys import time parser = argparse.ArgumentParser() parser.add_argument('-a', dest='append', action='store_true') parser.add_argument('-p', dest='use_python', action='store_true') parser.add_argument('filename', nargs='?', default='typescript') options = parser.parse_args() shell = sys.executable if options.use_python else os.environ.get('SHELL', 'sh') filename = options.filename mode = 'ab' if options.append else 'wb' with open(filename, mode) as script: def read(fd): data = os.read(fd, 1024) script.write(data) return data print('Script started, file is', filename) script.write(('Script started on %s\n' % time.asctime()).encode()) pty.spawn(shell, read) script.write(('Script done on %s\n' % time.asctime()).encode()) print('Script done, file is', filename) "pwd" — The password database ***************************** ====================================================================== This module provides access to the Unix user account and password database. It is available on all Unix versions. Password database entries are reported as a tuple-like object, whose attributes correspond to the members of the "passwd" structure (Attribute field below, see ""): +---------+-----------------+-------------------------------+ | Index | Attribute | Meaning | |=========|=================|===============================| | 0 | "pw_name" | Login name | +---------+-----------------+-------------------------------+ | 1 | "pw_passwd" | Optional encrypted password | +---------+-----------------+-------------------------------+ | 2 | "pw_uid" | Numerical user ID | +---------+-----------------+-------------------------------+ | 3 | "pw_gid" | Numerical group ID | +---------+-----------------+-------------------------------+ | 4 | "pw_gecos" | User name or comment field | +---------+-----------------+-------------------------------+ | 5 | "pw_dir" | User home directory | +---------+-----------------+-------------------------------+ | 6 | "pw_shell" | User command interpreter | +---------+-----------------+-------------------------------+ The uid and gid items are integers, all others are strings. "KeyError" is raised if the entry asked for cannot be found. Note: In traditional Unix the field "pw_passwd" usually contains a password encrypted with a DES derived algorithm (see module "crypt"). However most modern unices use a so-called *shadow password* system. On those unices the *pw_passwd* field only contains an asterisk ("'*'") or the letter "'x'" where the encrypted password is stored in a file "/etc/shadow" which is not world readable. Whether the *pw_passwd* field contains anything useful is system-dependent. If available, the "spwd" module should be used where access to the encrypted password is required. It defines the following items: pwd.getpwuid(uid) Return the password database entry for the given numeric user ID. pwd.getpwnam(name) Return the password database entry for the given user name. pwd.getpwall() Return a list of all available password database entries, in arbitrary order. See also: Module "grp" An interface to the group database, similar to this. Module "spwd" An interface to the shadow password database, similar to this. "py_compile" — Compile Python source files ****************************************** **Source code:** Lib/py_compile.py ====================================================================== The "py_compile" module provides a function to generate a byte-code file from a source file, and another function used when the module source file is invoked as a script. Though not often needed, this function can be useful when installing modules for shared use, especially if some of the users may not have permission to write the byte-code cache files in the directory containing the source code. exception py_compile.PyCompileError Exception raised when an error occurs while attempting to compile the file. py_compile.compile(file, cfile=None, dfile=None, doraise=False, optimize=-1, invalidation_mode=PycInvalidationMode.TIMESTAMP, quiet=0) Compile a source file to byte-code and write out the byte-code cache file. The source code is loaded from the file named *file*. The byte-code is written to *cfile*, which defaults to the **PEP 3147**/**PEP 488** path, ending in ".pyc". For example, if *file* is "/foo/bar/baz.py" *cfile* will default to "/foo/bar/__pycache__/baz.cpython-32.pyc" for Python 3.2. If *dfile* is specified, it is used as the name of the source file in error messages instead of *file*. If *doraise* is true, a "PyCompileError" is raised when an error is encountered while compiling *file*. If *doraise* is false (the default), an error string is written to "sys.stderr", but no exception is raised. This function returns the path to byte-compiled file, i.e. whatever *cfile* value was used. The *doraise* and *quiet* arguments determine how errors are handled while compiling file. If *quiet* is 0 or 1, and *doraise* is false, the default behaviour is enabled: an error string is written to "sys.stderr", and the function returns "None" instead of a path. If *doraise* is true, a "PyCompileError" is raised instead. However if *quiet* is 2, no message is written, and *doraise* has no effect. If the path that *cfile* becomes (either explicitly specified or computed) is a symlink or non-regular file, "FileExistsError" will be raised. This is to act as a warning that import will turn those paths into regular files if it is allowed to write byte-compiled files to those paths. This is a side-effect of import using file renaming to place the final byte-compiled file into place to prevent concurrent file writing issues. *optimize* controls the optimization level and is passed to the built-in "compile()" function. The default of "-1" selects the optimization level of the current interpreter. *invalidation_mode* should be a member of the "PycInvalidationMode" enum and controls how the generated bytecode cache is invalidated at runtime. The default is "PycInvalidationMode.CHECKED_HASH" if the "SOURCE_DATE_EPOCH" environment variable is set, otherwise the default is "PycInvalidationMode.TIMESTAMP". Changed in version 3.2: Changed default value of *cfile* to be **PEP 3147**-compliant. Previous default was *file* + "'c'" ("'o'" if optimization was enabled). Also added the *optimize* parameter. Changed in version 3.4: Changed code to use "importlib" for the byte-code cache file writing. This means file creation/writing semantics now match what "importlib" does, e.g. permissions, write- and-move semantics, etc. Also added the caveat that "FileExistsError" is raised if *cfile* is a symlink or non-regular file. Changed in version 3.7: The *invalidation_mode* parameter was added as specified in **PEP 552**. If the "SOURCE_DATE_EPOCH" environment variable is set, *invalidation_mode* will be forced to "PycInvalidationMode.CHECKED_HASH". Changed in version 3.7.2: The "SOURCE_DATE_EPOCH" environment variable no longer overrides the value of the *invalidation_mode* argument, and determines its default value instead. Changed in version 3.8: The *quiet* parameter was added. class py_compile.PycInvalidationMode A enumeration of possible methods the interpreter can use to determine whether a bytecode file is up to date with a source file. The ".pyc" file indicates the desired invalidation mode in its header. See Cached bytecode invalidation for more information on how Python invalidates ".pyc" files at runtime. New in version 3.7. TIMESTAMP The ".pyc" file includes the timestamp and size of the source file, which Python will compare against the metadata of the source file at runtime to determine if the ".pyc" file needs to be regenerated. CHECKED_HASH The ".pyc" file includes a hash of the source file content, which Python will compare against the source at runtime to determine if the ".pyc" file needs to be regenerated. UNCHECKED_HASH Like "CHECKED_HASH", the ".pyc" file includes a hash of the source file content. However, Python will at runtime assume the ".pyc" file is up to date and not validate the ".pyc" against the source file at all. This option is useful when the ".pycs" are kept up to date by some system external to Python like a build system. py_compile.main(args=None) Compile several source files. The files named in *args* (or on the command line, if *args* is "None") are compiled and the resulting byte-code is cached in the normal manner. This function does not search a directory structure to locate source files; it only compiles files named explicitly. If "'-'" is the only parameter in args, the list of files is taken from standard input. Changed in version 3.2: Added support for "'-'". When this module is run as a script, the "main()" is used to compile all the files named on the command line. The exit status is nonzero if one of the files could not be compiled. See also: Module "compileall" Utilities to compile all Python source files in a directory tree. "pyclbr" — Python module browser support **************************************** **Source code:** Lib/pyclbr.py ====================================================================== The "pyclbr" module provides limited information about the functions, classes, and methods defined in a Python-coded module. The information is sufficient to implement a module browser. The information is extracted from the Python source code rather than by importing the module, so this module is safe to use with untrusted code. This restriction makes it impossible to use this module with modules not implemented in Python, including all standard and optional extension modules. pyclbr.readmodule(module, path=None) Return a dictionary mapping module-level class names to class descriptors. If possible, descriptors for imported base classes are included. Parameter *module* is a string with the name of the module to read; it may be the name of a module within a package. If given, *path* is a sequence of directory paths prepended to "sys.path", which is used to locate the module source code. This function is the original interface and is only kept for back compatibility. It returns a filtered version of the following. pyclbr.readmodule_ex(module, path=None) Return a dictionary-based tree containing a function or class descriptors for each function and class defined in the module with a "def" or "class" statement. The returned dictionary maps module- level function and class names to their descriptors. Nested objects are entered into the children dictionary of their parent. As with readmodule, *module* names the module to be read and *path* is prepended to sys.path. If the module being read is a package, the returned dictionary has a key "'__path__'" whose value is a list containing the package search path. New in version 3.7: Descriptors for nested definitions. They are accessed through the new children attribute. Each has a new parent attribute. The descriptors returned by these functions are instances of Function and Class classes. Users are not expected to create instances of these classes. Function Objects ================ Class "Function" instances describe functions defined by def statements. They have the following attributes: Function.file Name of the file in which the function is defined. Function.module The name of the module defining the function described. Function.name The name of the function. Function.lineno The line number in the file where the definition starts. Function.parent For top-level functions, None. For nested functions, the parent. New in version 3.7. Function.children A dictionary mapping names to descriptors for nested functions and classes. New in version 3.7. Class Objects ============= Class "Class" instances describe classes defined by class statements. They have the same attributes as Functions and two more. Class.file Name of the file in which the class is defined. Class.module The name of the module defining the class described. Class.name The name of the class. Class.lineno The line number in the file where the definition starts. Class.parent For top-level classes, None. For nested classes, the parent. New in version 3.7. Class.children A dictionary mapping names to descriptors for nested functions and classes. New in version 3.7. Class.super A list of "Class" objects which describe the immediate base classes of the class being described. Classes which are named as superclasses but which are not discoverable by "readmodule_ex()" are listed as a string with the class name instead of as "Class" objects. Class.methods A dictionary mapping method names to line numbers. This can be derived from the newer children dictionary, but remains for back- compatibility. "pydoc" — Documentation generator and online help system ******************************************************** **Source code:** Lib/pydoc.py ====================================================================== The "pydoc" module automatically generates documentation from Python modules. The documentation can be presented as pages of text on the console, served to a Web browser, or saved to HTML files. For modules, classes, functions and methods, the displayed documentation is derived from the docstring (i.e. the "__doc__" attribute) of the object, and recursively of its documentable members. If there is no docstring, "pydoc" tries to obtain a description from the block of comment lines just above the definition of the class, function or method in the source file, or at the top of the module (see "inspect.getcomments()"). The built-in function "help()" invokes the online help system in the interactive interpreter, which uses "pydoc" to generate its documentation as text on the console. The same text documentation can also be viewed from outside the Python interpreter by running **pydoc** as a script at the operating system’s command prompt. For example, running pydoc sys at a shell prompt will display documentation on the "sys" module, in a style similar to the manual pages shown by the Unix **man** command. The argument to **pydoc** can be the name of a function, module, or package, or a dotted reference to a class, method, or function within a module or module in a package. If the argument to **pydoc** looks like a path (that is, it contains the path separator for your operating system, such as a slash in Unix), and refers to an existing Python source file, then documentation is produced for that file. Note: In order to find objects and their documentation, "pydoc" imports the module(s) to be documented. Therefore, any code on module level will be executed on that occasion. Use an "if __name__ == '__main__':" guard to only execute code when a file is invoked as a script and not just imported. When printing output to the console, **pydoc** attempts to paginate the output for easier reading. If the "PAGER" environment variable is set, **pydoc** will use its value as a pagination program. Specifying a "-w" flag before the argument will cause HTML documentation to be written out to a file in the current directory, instead of displaying text on the console. Specifying a "-k" flag before the argument will search the synopsis lines of all available modules for the keyword given as the argument, again in a manner similar to the Unix **man** command. The synopsis line of a module is the first line of its documentation string. You can also use **pydoc** to start an HTTP server on the local machine that will serve documentation to visiting Web browsers. **pydoc -p 1234** will start a HTTP server on port 1234, allowing you to browse the documentation at "http://localhost:1234/" in your preferred Web browser. Specifying "0" as the port number will select an arbitrary unused port. **pydoc -n ** will start the server listening at the given hostname. By default the hostname is ‘localhost’ but if you want the server to be reached from other machines, you may want to change the host name that the server responds to. During development this is especially useful if you want to run pydoc from within a container. **pydoc -b** will start the server and additionally open a web browser to a module index page. Each served page has a navigation bar at the top where you can *Get* help on an individual item, *Search* all modules with a keyword in their synopsis line, and go to the *Module index*, *Topics* and *Keywords* pages. When **pydoc** generates documentation, it uses the current environment and path to locate modules. Thus, invoking **pydoc spam** documents precisely the version of the module you would get if you started the Python interpreter and typed "import spam". Module docs for core modules are assumed to reside in "https://docs.python.org/X.Y/library/" where "X" and "Y" are the major and minor version numbers of the Python interpreter. This can be overridden by setting the "PYTHONDOCS" environment variable to a different URL or to a local directory containing the Library Reference Manual pages. Changed in version 3.2: Added the "-b" option. Changed in version 3.3: The "-g" command line option was removed. Changed in version 3.4: "pydoc" now uses "inspect.signature()" rather than "inspect.getfullargspec()" to extract signature information from callables. Changed in version 3.7: Added the "-n" option. "xml.parsers.expat" — Fast XML parsing using Expat ************************************************** ====================================================================== Warning: The "pyexpat" module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data see XML vulnerabilities. The "xml.parsers.expat" module is a Python interface to the Expat non- validating XML parser. The module provides a single extension type, "xmlparser", that represents the current state of an XML parser. After an "xmlparser" object has been created, various attributes of the object can be set to handler functions. When an XML document is then fed to the parser, the handler functions are called for the character data and markup in the XML document. This module uses the "pyexpat" module to provide access to the Expat parser. Direct use of the "pyexpat" module is deprecated. This module provides one exception and one type object: exception xml.parsers.expat.ExpatError The exception raised when Expat reports an error. See section ExpatError Exceptions for more information on interpreting Expat errors. exception xml.parsers.expat.error Alias for "ExpatError". xml.parsers.expat.XMLParserType The type of the return values from the "ParserCreate()" function. The "xml.parsers.expat" module contains two functions: xml.parsers.expat.ErrorString(errno) Returns an explanatory string for a given error number *errno*. xml.parsers.expat.ParserCreate(encoding=None, namespace_separator=None) Creates and returns a new "xmlparser" object. *encoding*, if specified, must be a string naming the encoding used by the XML data. Expat doesn’t support as many encodings as Python does, and its repertoire of encodings can’t be extended; it supports UTF-8, UTF-16, ISO-8859-1 (Latin1), and ASCII. If *encoding* [1] is given it will override the implicit or explicit encoding of the document. Expat can optionally do XML namespace processing for you, enabled by providing a value for *namespace_separator*. The value must be a one-character string; a "ValueError" will be raised if the string has an illegal length ("None" is considered the same as omission). When namespace processing is enabled, element type names and attribute names that belong to a namespace will be expanded. The element name passed to the element handlers "StartElementHandler" and "EndElementHandler" will be the concatenation of the namespace URI, the namespace separator character, and the local part of the name. If the namespace separator is a zero byte ("chr(0)") then the namespace URI and the local part will be concatenated without any separator. For example, if *namespace_separator* is set to a space character ("' '") and the following document is parsed: "StartElementHandler" will receive the following strings for each element: http://default-namespace.org/ root http://www.python.org/ns/ elem1 elem2 Due to limitations in the "Expat" library used by "pyexpat", the "xmlparser" instance returned can only be used to parse a single XML document. Call "ParserCreate" for each document to provide unique parser instances. See also: The Expat XML Parser Home page of the Expat project. XMLParser Objects ================= "xmlparser" objects have the following methods: xmlparser.Parse(data[, isfinal]) Parses the contents of the string *data*, calling the appropriate handler functions to process the parsed data. *isfinal* must be true on the final call to this method; it allows the parsing of a single file in fragments, not the submission of multiple files. *data* can be the empty string at any time. xmlparser.ParseFile(file) Parse XML data reading from the object *file*. *file* only needs to provide the "read(nbytes)" method, returning the empty string when there’s no more data. xmlparser.SetBase(base) Sets the base to be used for resolving relative URIs in system identifiers in declarations. Resolving relative identifiers is left to the application: this value will be passed through as the *base* argument to the "ExternalEntityRefHandler()", "NotationDeclHandler()", and "UnparsedEntityDeclHandler()" functions. xmlparser.GetBase() Returns a string containing the base set by a previous call to "SetBase()", or "None" if "SetBase()" hasn’t been called. xmlparser.GetInputContext() Returns the input data that generated the current event as a string. The data is in the encoding of the entity which contains the text. When called while an event handler is not active, the return value is "None". xmlparser.ExternalEntityParserCreate(context[, encoding]) Create a “child” parser which can be used to parse an external parsed entity referred to by content parsed by the parent parser. The *context* parameter should be the string passed to the "ExternalEntityRefHandler()" handler function, described below. The child parser is created with the "ordered_attributes" and "specified_attributes" set to the values of this parser. xmlparser.SetParamEntityParsing(flag) Control parsing of parameter entities (including the external DTD subset). Possible *flag* values are "XML_PARAM_ENTITY_PARSING_NEVER", "XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE" and "XML_PARAM_ENTITY_PARSING_ALWAYS". Return true if setting the flag was successful. xmlparser.UseForeignDTD([flag]) Calling this with a true value for *flag* (the default) will cause Expat to call the "ExternalEntityRefHandler" with "None" for all arguments to allow an alternate DTD to be loaded. If the document does not contain a document type declaration, the "ExternalEntityRefHandler" will still be called, but the "StartDoctypeDeclHandler" and "EndDoctypeDeclHandler" will not be called. Passing a false value for *flag* will cancel a previous call that passed a true value, but otherwise has no effect. This method can only be called before the "Parse()" or "ParseFile()" methods are called; calling it after either of those have been called causes "ExpatError" to be raised with the "code" attribute set to "errors.codes[errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING]". xmlparser.SetReparseDeferralEnabled(enabled) Warning: Calling "SetReparseDeferralEnabled(False)" has security implications, as detailed below; please make sure to understand these consequences prior to using the "SetReparseDeferralEnabled" method. Expat 2.6.0 introduced a security mechanism called “reparse deferral” where instead of causing denial of service through quadratic runtime from reparsing large tokens, reparsing of unfinished tokens is now delayed by default until a sufficient amount of input is reached. Due to this delay, registered handlers may — depending of the sizing of input chunks pushed to Expat — no longer be called right after pushing new input to the parser. Where immediate feedback and taking over responsiblity of protecting against denial of service from large tokens are both wanted, calling "SetReparseDeferralEnabled(False)" disables reparse deferral for the current Expat parser instance, temporarily or altogether. Calling "SetReparseDeferralEnabled(True)" allows re- enabling reparse deferral. Note that "SetReparseDeferralEnabled()" has been backported to some prior releases of CPython as a security fix. Check for availability of "SetReparseDeferralEnabled()" using "hasattr()" if used in code running across a variety of Python versions. New in version 3.9.19. xmlparser.GetReparseDeferralEnabled() Returns whether reparse deferral is currently enabled for the given Expat parser instance. New in version 3.9.19. "xmlparser" objects have the following attributes: xmlparser.buffer_size The size of the buffer used when "buffer_text" is true. A new buffer size can be set by assigning a new integer value to this attribute. When the size is changed, the buffer will be flushed. xmlparser.buffer_text Setting this to true causes the "xmlparser" object to buffer textual content returned by Expat to avoid multiple calls to the "CharacterDataHandler()" callback whenever possible. This can improve performance substantially since Expat normally breaks character data into chunks at every line ending. This attribute is false by default, and may be changed at any time. xmlparser.buffer_used If "buffer_text" is enabled, the number of bytes stored in the buffer. These bytes represent UTF-8 encoded text. This attribute has no meaningful interpretation when "buffer_text" is false. xmlparser.ordered_attributes Setting this attribute to a non-zero integer causes the attributes to be reported as a list rather than a dictionary. The attributes are presented in the order found in the document text. For each attribute, two list entries are presented: the attribute name and the attribute value. (Older versions of this module also used this format.) By default, this attribute is false; it may be changed at any time. xmlparser.specified_attributes If set to a non-zero integer, the parser will report only those attributes which were specified in the document instance and not those which were derived from attribute declarations. Applications which set this need to be especially careful to use what additional information is available from the declarations as needed to comply with the standards for the behavior of XML processors. By default, this attribute is false; it may be changed at any time. The following attributes contain values relating to the most recent error encountered by an "xmlparser" object, and will only have correct values once a call to "Parse()" or "ParseFile()" has raised an "xml.parsers.expat.ExpatError" exception. xmlparser.ErrorByteIndex Byte index at which an error occurred. xmlparser.ErrorCode Numeric code specifying the problem. This value can be passed to the "ErrorString()" function, or compared to one of the constants defined in the "errors" object. xmlparser.ErrorColumnNumber Column number at which an error occurred. xmlparser.ErrorLineNumber Line number at which an error occurred. The following attributes contain values relating to the current parse location in an "xmlparser" object. During a callback reporting a parse event they indicate the location of the first of the sequence of characters that generated the event. When called outside of a callback, the position indicated will be just past the last parse event (regardless of whether there was an associated callback). xmlparser.CurrentByteIndex Current byte index in the parser input. xmlparser.CurrentColumnNumber Current column number in the parser input. xmlparser.CurrentLineNumber Current line number in the parser input. Here is the list of handlers that can be set. To set a handler on an "xmlparser" object *o*, use "o.handlername = func". *handlername* must be taken from the following list, and *func* must be a callable object accepting the correct number of arguments. The arguments are all strings, unless otherwise stated. xmlparser.XmlDeclHandler(version, encoding, standalone) Called when the XML declaration is parsed. The XML declaration is the (optional) declaration of the applicable version of the XML recommendation, the encoding of the document text, and an optional “standalone” declaration. *version* and *encoding* will be strings, and *standalone* will be "1" if the document is declared standalone, "0" if it is declared not to be standalone, or "-1" if the standalone clause was omitted. This is only available with Expat version 1.95.0 or newer. xmlparser.StartDoctypeDeclHandler(doctypeName, systemId, publicId, has_internal_subset) Called when Expat begins parsing the document type declaration ("'". xmlparser.StartCdataSectionHandler() Called at the start of a CDATA section. This and "EndCdataSectionHandler" are needed to be able to identify the syntactical start and end for CDATA sections. xmlparser.EndCdataSectionHandler() Called at the end of a CDATA section. xmlparser.DefaultHandler(data) Called for any characters in the XML document for which no applicable handler has been specified. This means characters that are part of a construct which could be reported, but for which no handler has been supplied. xmlparser.DefaultHandlerExpand(data) This is the same as the "DefaultHandler()", but doesn’t inhibit expansion of internal entities. The entity reference will not be passed to the default handler. xmlparser.NotStandaloneHandler() Called if the XML document hasn’t been declared as being a standalone document. This happens when there is an external subset or a reference to a parameter entity, but the XML declaration does not set standalone to "yes" in an XML declaration. If this handler returns "0", then the parser will raise an "XML_ERROR_NOT_STANDALONE" error. If this handler is not set, no exception is raised by the parser for this condition. xmlparser.ExternalEntityRefHandler(context, base, systemId, publicId) Called for references to external entities. *base* is the current base, as set by a previous call to "SetBase()". The public and system identifiers, *systemId* and *publicId*, are strings if given; if the public identifier is not given, *publicId* will be "None". The *context* value is opaque and should only be used as described below. For external entities to be parsed, this handler must be implemented. It is responsible for creating the sub-parser using "ExternalEntityParserCreate(context)", initializing it with the appropriate callbacks, and parsing the entity. This handler should return an integer; if it returns "0", the parser will raise an "XML_ERROR_EXTERNAL_ENTITY_HANDLING" error, otherwise parsing will continue. If this handler is not provided, external entities are reported by the "DefaultHandler" callback, if provided. ExpatError Exceptions ===================== "ExpatError" exceptions have a number of interesting attributes: ExpatError.code Expat’s internal error number for the specific error. The "errors.messages" dictionary maps these error numbers to Expat’s error messages. For example: from xml.parsers.expat import ParserCreate, ExpatError, errors p = ParserCreate() try: p.Parse(some_xml_document) except ExpatError as err: print("Error:", errors.messages[err.code]) The "errors" module also provides error message constants and a dictionary "codes" mapping these messages back to the error codes, see below. ExpatError.lineno Line number on which the error was detected. The first line is numbered "1". ExpatError.offset Character offset into the line where the error occurred. The first column is numbered "0". Example ======= The following program defines three handlers that just print out their arguments. import xml.parsers.expat # 3 handler functions def start_element(name, attrs): print('Start element:', name, attrs) def end_element(name): print('End element:', name) def char_data(data): print('Character data:', repr(data)) p = xml.parsers.expat.ParserCreate() p.StartElementHandler = start_element p.EndElementHandler = end_element p.CharacterDataHandler = char_data p.Parse(""" Text goes here More text """, 1) The output from this program is: Start element: parent {'id': 'top'} Start element: child1 {'name': 'paul'} Character data: 'Text goes here' End element: child1 Character data: '\n' Start element: child2 {'name': 'fred'} Character data: 'More text' End element: child2 Character data: '\n' End element: parent Content Model Descriptions ========================== Content models are described using nested tuples. Each tuple contains four values: the type, the quantifier, the name, and a tuple of children. Children are simply additional content model descriptions. The values of the first two fields are constants defined in the "xml.parsers.expat.model" module. These constants can be collected in two groups: the model type group and the quantifier group. The constants in the model type group are: xml.parsers.expat.model.XML_CTYPE_ANY The element named by the model name was declared to have a content model of "ANY". xml.parsers.expat.model.XML_CTYPE_CHOICE The named element allows a choice from a number of options; this is used for content models such as "(A | B | C)". xml.parsers.expat.model.XML_CTYPE_EMPTY Elements which are declared to be "EMPTY" have this model type. xml.parsers.expat.model.XML_CTYPE_MIXED xml.parsers.expat.model.XML_CTYPE_NAME xml.parsers.expat.model.XML_CTYPE_SEQ Models which represent a series of models which follow one after the other are indicated with this model type. This is used for models such as "(A, B, C)". The constants in the quantifier group are: xml.parsers.expat.model.XML_CQUANT_NONE No modifier is given, so it can appear exactly once, as for "A". xml.parsers.expat.model.XML_CQUANT_OPT The model is optional: it can appear once or not at all, as for "A?". xml.parsers.expat.model.XML_CQUANT_PLUS The model must occur one or more times (like "A+"). xml.parsers.expat.model.XML_CQUANT_REP The model must occur zero or more times, as for "A*". Expat error constants ===================== The following constants are provided in the "xml.parsers.expat.errors" module. These constants are useful in interpreting some of the attributes of the "ExpatError" exception objects raised when an error has occurred. Since for backwards compatibility reasons, the constants’ value is the error *message* and not the numeric error *code*, you do this by comparing its "code" attribute with "errors.codes[errors.XML_ERROR_*CONSTANT_NAME*]". The "errors" module has the following attributes: xml.parsers.expat.errors.codes A dictionary mapping string descriptions to their error codes. New in version 3.2. xml.parsers.expat.errors.messages A dictionary mapping numeric error codes to their string descriptions. New in version 3.2. xml.parsers.expat.errors.XML_ERROR_ASYNC_ENTITY xml.parsers.expat.errors.XML_ERROR_ATTRIBUTE_EXTERNAL_ENTITY_REF An entity reference in an attribute value referred to an external entity instead of an internal entity. xml.parsers.expat.errors.XML_ERROR_BAD_CHAR_REF A character reference referred to a character which is illegal in XML (for example, character "0", or ‘"�"’). xml.parsers.expat.errors.XML_ERROR_BINARY_ENTITY_REF An entity reference referred to an entity which was declared with a notation, so cannot be parsed. xml.parsers.expat.errors.XML_ERROR_DUPLICATE_ATTRIBUTE An attribute was used more than once in a start tag. xml.parsers.expat.errors.XML_ERROR_INCORRECT_ENCODING xml.parsers.expat.errors.XML_ERROR_INVALID_TOKEN Raised when an input byte could not properly be assigned to a character; for example, a NUL byte (value "0") in a UTF-8 input stream. xml.parsers.expat.errors.XML_ERROR_JUNK_AFTER_DOC_ELEMENT Something other than whitespace occurred after the document element. xml.parsers.expat.errors.XML_ERROR_MISPLACED_XML_PI An XML declaration was found somewhere other than the start of the input data. xml.parsers.expat.errors.XML_ERROR_NO_ELEMENTS The document contains no elements (XML requires all documents to contain exactly one top-level element).. xml.parsers.expat.errors.XML_ERROR_NO_MEMORY Expat was not able to allocate memory internally. xml.parsers.expat.errors.XML_ERROR_PARAM_ENTITY_REF A parameter entity reference was found where it was not allowed. xml.parsers.expat.errors.XML_ERROR_PARTIAL_CHAR An incomplete character was found in the input. xml.parsers.expat.errors.XML_ERROR_RECURSIVE_ENTITY_REF An entity reference contained another reference to the same entity; possibly via a different name, and possibly indirectly. xml.parsers.expat.errors.XML_ERROR_SYNTAX Some unspecified syntax error was encountered. xml.parsers.expat.errors.XML_ERROR_TAG_MISMATCH An end tag did not match the innermost open start tag. xml.parsers.expat.errors.XML_ERROR_UNCLOSED_TOKEN Some token (such as a start tag) was not closed before the end of the stream or the next token was encountered. xml.parsers.expat.errors.XML_ERROR_UNDEFINED_ENTITY A reference was made to an entity which was not defined. xml.parsers.expat.errors.XML_ERROR_UNKNOWN_ENCODING The document encoding is not supported by Expat. xml.parsers.expat.errors.XML_ERROR_UNCLOSED_CDATA_SECTION A CDATA marked section was not closed. xml.parsers.expat.errors.XML_ERROR_EXTERNAL_ENTITY_HANDLING xml.parsers.expat.errors.XML_ERROR_NOT_STANDALONE The parser determined that the document was not “standalone” though it declared itself to be in the XML declaration, and the "NotStandaloneHandler" was set and returned "0". xml.parsers.expat.errors.XML_ERROR_UNEXPECTED_STATE xml.parsers.expat.errors.XML_ERROR_ENTITY_DECLARED_IN_PE xml.parsers.expat.errors.XML_ERROR_FEATURE_REQUIRES_XML_DTD An operation was requested that requires DTD support to be compiled in, but Expat was configured without DTD support. This should never be reported by a standard build of the "xml.parsers.expat" module. xml.parsers.expat.errors.XML_ERROR_CANT_CHANGE_FEATURE_ONCE_PARSING A behavioral change was requested after parsing started that can only be changed before parsing has started. This is (currently) only raised by "UseForeignDTD()". xml.parsers.expat.errors.XML_ERROR_UNBOUND_PREFIX An undeclared prefix was found when namespace processing was enabled. xml.parsers.expat.errors.XML_ERROR_UNDECLARING_PREFIX The document attempted to remove the namespace declaration associated with a prefix. xml.parsers.expat.errors.XML_ERROR_INCOMPLETE_PE A parameter entity contained incomplete markup. xml.parsers.expat.errors.XML_ERROR_XML_DECL The document contained no document element at all. xml.parsers.expat.errors.XML_ERROR_TEXT_DECL There was an error parsing a text declaration in an external entity. xml.parsers.expat.errors.XML_ERROR_PUBLICID Characters were found in the public id that are not allowed. xml.parsers.expat.errors.XML_ERROR_SUSPENDED The requested operation was made on a suspended parser, but isn’t allowed. This includes attempts to provide additional input or to stop the parser. xml.parsers.expat.errors.XML_ERROR_NOT_SUSPENDED An attempt to resume the parser was made when the parser had not been suspended. xml.parsers.expat.errors.XML_ERROR_ABORTED This should not be reported to Python applications. xml.parsers.expat.errors.XML_ERROR_FINISHED The requested operation was made on a parser which was finished parsing input, but isn’t allowed. This includes attempts to provide additional input or to stop the parser. xml.parsers.expat.errors.XML_ERROR_SUSPEND_PE -[ Footnotes ]- [1] The encoding string included in XML output should conform to the appropriate standards. For example, “UTF-8” is valid, but “UTF8” is not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT- EncodingDecl and https://www.iana.org/assignments/character-sets /character-sets.xhtml. Python Runtime Services *********************** The modules described in this chapter provide a wide range of services related to the Python interpreter and its interaction with its environment. Here’s an overview: * "sys" — System-specific parameters and functions * "sysconfig" — Provide access to Python’s configuration information * Configuration variables * Installation paths * Other functions * Using "sysconfig" as a script * "builtins" — Built-in objects * "__main__" — Top-level script environment * "warnings" — Warning control * Warning Categories * The Warnings Filter * Describing Warning Filters * Default Warning Filter * Overriding the default filter * Temporarily Suppressing Warnings * Testing Warnings * Updating Code For New Versions of Dependencies * Available Functions * Available Context Managers * "dataclasses" — Data Classes * Module-level decorators, classes, and functions * Post-init processing * Class variables * Init-only variables * Frozen instances * Inheritance * Default factory functions * Mutable default values * Exceptions * "contextlib" — Utilities for "with"-statement contexts * Utilities * Examples and Recipes * Supporting a variable number of context managers * Catching exceptions from "__enter__" methods * Cleaning up in an "__enter__" implementation * Replacing any use of "try-finally" and flag variables * Using a context manager as a function decorator * Single use, reusable and reentrant context managers * Reentrant context managers * Reusable context managers * "abc" — Abstract Base Classes * "atexit" — Exit handlers * "atexit" Example * "traceback" — Print or retrieve a stack traceback * "TracebackException" Objects * "StackSummary" Objects * "FrameSummary" Objects * Traceback Examples * "__future__" — Future statement definitions * "gc" — Garbage Collector interface * "inspect" — Inspect live objects * Types and members * Retrieving source code * Introspecting callables with the Signature object * Classes and functions * The interpreter stack * Fetching attributes statically * Current State of Generators and Coroutines * Code Objects Bit Flags * Command Line Interface * "site" — Site-specific configuration hook * Readline configuration * Module contents * Command Line Interface "queue" — A synchronized queue class ************************************ **Source code:** Lib/queue.py ====================================================================== The "queue" module implements multi-producer, multi-consumer queues. It is especially useful in threaded programming when information must be exchanged safely between multiple threads. The "Queue" class in this module implements all the required locking semantics. The module implements three types of queue, which differ only in the order in which the entries are retrieved. In a FIFO (first-in, first- out) queue, the first tasks added are the first retrieved. In a LIFO (last-in, first-out) queue, the most recently added entry is the first retrieved (operating like a stack). With a priority queue, the entries are kept sorted (using the "heapq" module) and the lowest valued entry is retrieved first. Internally, those three types of queues use locks to temporarily block competing threads; however, they are not designed to handle reentrancy within a thread. In addition, the module implements a “simple” FIFO (first-in, first- out) queue type, "SimpleQueue", whose specific implementation provides additional guarantees in exchange for the smaller functionality. The "queue" module defines the following classes and exceptions: class queue.Queue(maxsize=0) Constructor for a FIFO (first-in, first-out) queue. *maxsize* is an integer that sets the upperbound limit on the number of items that can be placed in the queue. Insertion will block once this size has been reached, until queue items are consumed. If *maxsize* is less than or equal to zero, the queue size is infinite. class queue.LifoQueue(maxsize=0) Constructor for a LIFO (last-in, first-out) queue. *maxsize* is an integer that sets the upperbound limit on the number of items that can be placed in the queue. Insertion will block once this size has been reached, until queue items are consumed. If *maxsize* is less than or equal to zero, the queue size is infinite. class queue.PriorityQueue(maxsize=0) Constructor for a priority queue. *maxsize* is an integer that sets the upperbound limit on the number of items that can be placed in the queue. Insertion will block once this size has been reached, until queue items are consumed. If *maxsize* is less than or equal to zero, the queue size is infinite. The lowest valued entries are retrieved first (the lowest valued entry is the one returned by "sorted(list(entries))[0]"). A typical pattern for entries is a tuple in the form: "(priority_number, data)". If the *data* elements are not comparable, the data can be wrapped in a class that ignores the data item and only compares the priority number: from dataclasses import dataclass, field from typing import Any @dataclass(order=True) class PrioritizedItem: priority: int item: Any=field(compare=False) class queue.SimpleQueue Constructor for an unbounded FIFO (first-in, first-out) queue. Simple queues lack advanced functionality such as task tracking. New in version 3.7. exception queue.Empty Exception raised when non-blocking "get()" (or "get_nowait()") is called on a "Queue" object which is empty. exception queue.Full Exception raised when non-blocking "put()" (or "put_nowait()") is called on a "Queue" object which is full. Queue Objects ============= Queue objects ("Queue", "LifoQueue", or "PriorityQueue") provide the public methods described below. Queue.qsize() Return the approximate size of the queue. Note, qsize() > 0 doesn’t guarantee that a subsequent get() will not block, nor will qsize() < maxsize guarantee that put() will not block. Queue.empty() Return "True" if the queue is empty, "False" otherwise. If empty() returns "True" it doesn’t guarantee that a subsequent call to put() will not block. Similarly, if empty() returns "False" it doesn’t guarantee that a subsequent call to get() will not block. Queue.full() Return "True" if the queue is full, "False" otherwise. If full() returns "True" it doesn’t guarantee that a subsequent call to get() will not block. Similarly, if full() returns "False" it doesn’t guarantee that a subsequent call to put() will not block. Queue.put(item, block=True, timeout=None) Put *item* into the queue. If optional args *block* is true and *timeout* is "None" (the default), block if necessary until a free slot is available. If *timeout* is a positive number, it blocks at most *timeout* seconds and raises the "Full" exception if no free slot was available within that time. Otherwise (*block* is false), put an item on the queue if a free slot is immediately available, else raise the "Full" exception (*timeout* is ignored in that case). Queue.put_nowait(item) Equivalent to "put(item, block=False)". Queue.get(block=True, timeout=None) Remove and return an item from the queue. If optional args *block* is true and *timeout* is "None" (the default), block if necessary until an item is available. If *timeout* is a positive number, it blocks at most *timeout* seconds and raises the "Empty" exception if no item was available within that time. Otherwise (*block* is false), return an item if one is immediately available, else raise the "Empty" exception (*timeout* is ignored in that case). Prior to 3.0 on POSIX systems, and for all versions on Windows, if *block* is true and *timeout* is "None", this operation goes into an uninterruptible wait on an underlying lock. This means that no exceptions can occur, and in particular a SIGINT will not trigger a "KeyboardInterrupt". Queue.get_nowait() Equivalent to "get(False)". Two methods are offered to support tracking whether enqueued tasks have been fully processed by daemon consumer threads. Queue.task_done() Indicate that a formerly enqueued task is complete. Used by queue consumer threads. For each "get()" used to fetch a task, a subsequent call to "task_done()" tells the queue that the processing on the task is complete. If a "join()" is currently blocking, it will resume when all items have been processed (meaning that a "task_done()" call was received for every item that had been "put()" into the queue). Raises a "ValueError" if called more times than there were items placed in the queue. Queue.join() Blocks until all items in the queue have been gotten and processed. The count of unfinished tasks goes up whenever an item is added to the queue. The count goes down whenever a consumer thread calls "task_done()" to indicate that the item was retrieved and all work on it is complete. When the count of unfinished tasks drops to zero, "join()" unblocks. Example of how to wait for enqueued tasks to be completed: import threading, queue q = queue.Queue() def worker(): while True: item = q.get() print(f'Working on {item}') print(f'Finished {item}') q.task_done() # turn-on the worker thread threading.Thread(target=worker, daemon=True).start() # send thirty task requests to the worker for item in range(30): q.put(item) print('All task requests sent\n', end='') # block until all tasks are done q.join() print('All work completed') SimpleQueue Objects =================== "SimpleQueue" objects provide the public methods described below. SimpleQueue.qsize() Return the approximate size of the queue. Note, qsize() > 0 doesn’t guarantee that a subsequent get() will not block. SimpleQueue.empty() Return "True" if the queue is empty, "False" otherwise. If empty() returns "False" it doesn’t guarantee that a subsequent call to get() will not block. SimpleQueue.put(item, block=True, timeout=None) Put *item* into the queue. The method never blocks and always succeeds (except for potential low-level errors such as failure to allocate memory). The optional args *block* and *timeout* are ignored and only provided for compatibility with "Queue.put()". **CPython implementation detail:** This method has a C implementation which is reentrant. That is, a "put()" or "get()" call can be interrupted by another "put()" call in the same thread without deadlocking or corrupting internal state inside the queue. This makes it appropriate for use in destructors such as "__del__" methods or "weakref" callbacks. SimpleQueue.put_nowait(item) Equivalent to "put(item, block=False)", provided for compatibility with "Queue.put_nowait()". SimpleQueue.get(block=True, timeout=None) Remove and return an item from the queue. If optional args *block* is true and *timeout* is "None" (the default), block if necessary until an item is available. If *timeout* is a positive number, it blocks at most *timeout* seconds and raises the "Empty" exception if no item was available within that time. Otherwise (*block* is false), return an item if one is immediately available, else raise the "Empty" exception (*timeout* is ignored in that case). SimpleQueue.get_nowait() Equivalent to "get(False)". See also: Class "multiprocessing.Queue" A queue class for use in a multi-processing (rather than multi- threading) context. "collections.deque" is an alternative implementation of unbounded queues with fast atomic "append()" and "popleft()" operations that do not require locking and also support indexing. "quopri" — Encode and decode MIME quoted-printable data ******************************************************* **Source code:** Lib/quopri.py ====================================================================== This module performs quoted-printable transport encoding and decoding, as defined in **RFC 1521**: “MIME (Multipurpose Internet Mail Extensions) Part One: Mechanisms for Specifying and Describing the Format of Internet Message Bodies”. The quoted-printable encoding is designed for data where there are relatively few nonprintable characters; the base64 encoding scheme available via the "base64" module is more compact if there are many such characters, as when sending a graphics file. quopri.decode(input, output, header=False) Decode the contents of the *input* file and write the resulting decoded binary data to the *output* file. *input* and *output* must be *binary file objects*. If the optional argument *header* is present and true, underscore will be decoded as space. This is used to decode “Q”-encoded headers as described in **RFC 1522**: “MIME (Multipurpose Internet Mail Extensions) Part Two: Message Header Extensions for Non-ASCII Text”. quopri.encode(input, output, quotetabs, header=False) Encode the contents of the *input* file and write the resulting quoted-printable data to the *output* file. *input* and *output* must be *binary file objects*. *quotetabs*, a non-optional flag which controls whether to encode embedded spaces and tabs; when true it encodes such embedded whitespace, and when false it leaves them unencoded. Note that spaces and tabs appearing at the end of lines are always encoded, as per **RFC 1521**. *header* is a flag which controls if spaces are encoded as underscores as per **RFC 1522**. quopri.decodestring(s, header=False) Like "decode()", except that it accepts a source "bytes" and returns the corresponding decoded "bytes". quopri.encodestring(s, quotetabs=False, header=False) Like "encode()", except that it accepts a source "bytes" and returns the corresponding encoded "bytes". By default, it sends a "False" value to *quotetabs* parameter of the "encode()" function. See also: Module "base64" Encode and decode MIME base64 data "random" — Generate pseudo-random numbers ***************************************** **Source code:** Lib/random.py ====================================================================== This module implements pseudo-random number generators for various distributions. For integers, there is uniform selection from a range. For sequences, there is uniform selection of a random element, a function to generate a random permutation of a list in-place, and a function for random sampling without replacement. On the real line, there are functions to compute uniform, normal (Gaussian), lognormal, negative exponential, gamma, and beta distributions. For generating distributions of angles, the von Mises distribution is available. Almost all module functions depend on the basic function "random()", which generates a random float uniformly in the semi-open range [0.0, 1.0). Python uses the Mersenne Twister as the core generator. It produces 53-bit precision floats and has a period of 2**19937-1. The underlying implementation in C is both fast and threadsafe. The Mersenne Twister is one of the most extensively tested random number generators in existence. However, being completely deterministic, it is not suitable for all purposes, and is completely unsuitable for cryptographic purposes. The functions supplied by this module are actually bound methods of a hidden instance of the "random.Random" class. You can instantiate your own instances of "Random" to get generators that don’t share state. Class "Random" can also be subclassed if you want to use a different basic generator of your own devising: in that case, override the "random()", "seed()", "getstate()", and "setstate()" methods. Optionally, a new generator can supply a "getrandbits()" method — this allows "randrange()" to produce selections over an arbitrarily large range. The "random" module also provides the "SystemRandom" class which uses the system function "os.urandom()" to generate random numbers from sources provided by the operating system. Warning: The pseudo-random generators of this module should not be used for security purposes. For security or cryptographic uses, see the "secrets" module. See also: M. Matsumoto and T. Nishimura, “Mersenne Twister: A 623-dimensionally equidistributed uniform pseudorandom number generator”, ACM Transactions on Modeling and Computer Simulation Vol. 8, No. 1, January pp.3–30 1998. Complementary-Multiply-with-Carry recipe for a compatible alternative random number generator with a long period and comparatively simple update operations. Bookkeeping functions ===================== random.seed(a=None, version=2) Initialize the random number generator. If *a* is omitted or "None", the current system time is used. If randomness sources are provided by the operating system, they are used instead of the system time (see the "os.urandom()" function for details on availability). If *a* is an int, it is used directly. With version 2 (the default), a "str", "bytes", or "bytearray" object gets converted to an "int" and all of its bits are used. With version 1 (provided for reproducing random sequences from older versions of Python), the algorithm for "str" and "bytes" generates a narrower range of seeds. Changed in version 3.2: Moved to the version 2 scheme which uses all of the bits in a string seed. Deprecated since version 3.9: In the future, the *seed* must be one of the following types: *NoneType*, "int", "float", "str", "bytes", or "bytearray". random.getstate() Return an object capturing the current internal state of the generator. This object can be passed to "setstate()" to restore the state. random.setstate(state) *state* should have been obtained from a previous call to "getstate()", and "setstate()" restores the internal state of the generator to what it was at the time "getstate()" was called. Functions for bytes =================== random.randbytes(n) Generate *n* random bytes. This method should not be used for generating security tokens. Use "secrets.token_bytes()" instead. New in version 3.9. Functions for integers ====================== random.randrange(stop) random.randrange(start, stop[, step]) Return a randomly selected element from "range(start, stop, step)". This is equivalent to "choice(range(start, stop, step))", but doesn’t actually build a range object. The positional argument pattern matches that of "range()". Keyword arguments should not be used because the function may use them in unexpected ways. Changed in version 3.2: "randrange()" is more sophisticated about producing equally distributed values. Formerly it used a style like "int(random()*n)" which could produce slightly uneven distributions. random.randint(a, b) Return a random integer *N* such that "a <= N <= b". Alias for "randrange(a, b+1)". random.getrandbits(k) Returns a non-negative Python integer with *k* random bits. This method is supplied with the MersenneTwister generator and some other generators may also provide it as an optional part of the API. When available, "getrandbits()" enables "randrange()" to handle arbitrarily large ranges. Changed in version 3.9: This method now accepts zero for *k*. Functions for sequences ======================= random.choice(seq) Return a random element from the non-empty sequence *seq*. If *seq* is empty, raises "IndexError". random.choices(population, weights=None, *, cum_weights=None, k=1) Return a *k* sized list of elements chosen from the *population* with replacement. If the *population* is empty, raises "IndexError". If a *weights* sequence is specified, selections are made according to the relative weights. Alternatively, if a *cum_weights* sequence is given, the selections are made according to the cumulative weights (perhaps computed using "itertools.accumulate()"). For example, the relative weights "[10, 5, 30, 5]" are equivalent to the cumulative weights "[10, 15, 45, 50]". Internally, the relative weights are converted to cumulative weights before making selections, so supplying the cumulative weights saves work. If neither *weights* nor *cum_weights* are specified, selections are made with equal probability. If a weights sequence is supplied, it must be the same length as the *population* sequence. It is a "TypeError" to specify both *weights* and *cum_weights*. The *weights* or *cum_weights* can use any numeric type that interoperates with the "float" values returned by "random()" (that includes integers, floats, and fractions but excludes decimals). Behavior is undefined if any weight is negative. A "ValueError" is raised if all weights are zero. For a given seed, the "choices()" function with equal weighting typically produces a different sequence than repeated calls to "choice()". The algorithm used by "choices()" uses floating point arithmetic for internal consistency and speed. The algorithm used by "choice()" defaults to integer arithmetic with repeated selections to avoid small biases from round-off error. New in version 3.6. Changed in version 3.9: Raises a "ValueError" if all weights are zero. random.shuffle(x[, random]) Shuffle the sequence *x* in place. The optional argument *random* is a 0-argument function returning a random float in [0.0, 1.0); by default, this is the function "random()". To shuffle an immutable sequence and return a new shuffled list, use "sample(x, k=len(x))" instead. Note that even for small "len(x)", the total number of permutations of *x* can quickly grow larger than the period of most random number generators. This implies that most permutations of a long sequence can never be generated. For example, a sequence of length 2080 is the largest that can fit within the period of the Mersenne Twister random number generator. Deprecated since version 3.9, will be removed in version 3.11: The optional parameter *random*. random.sample(population, k, *, counts=None) Return a *k* length list of unique elements chosen from the population sequence or set. Used for random sampling without replacement. Returns a new list containing elements from the population while leaving the original population unchanged. The resulting list is in selection order so that all sub-slices will also be valid random samples. This allows raffle winners (the sample) to be partitioned into grand prize and second place winners (the subslices). Members of the population need not be *hashable* or unique. If the population contains repeats, then each occurrence is a possible selection in the sample. Repeated elements can be specified one at a time or with the optional keyword-only *counts* parameter. For example, "sample(['red', 'blue'], counts=[4, 2], k=5)" is equivalent to "sample(['red', 'red', 'red', 'red', 'blue', 'blue'], k=5)". To choose a sample from a range of integers, use a "range()" object as an argument. This is especially fast and space efficient for sampling from a large population: "sample(range(10000000), k=60)". If the sample size is larger than the population size, a "ValueError" is raised. Changed in version 3.9: Added the *counts* parameter. Deprecated since version 3.9: In the future, the *population* must be a sequence. Instances of "set" are no longer supported. The set must first be converted to a "list" or "tuple", preferably in a deterministic order so that the sample is reproducible. Real-valued distributions ========================= The following functions generate specific real-valued distributions. Function parameters are named after the corresponding variables in the distribution’s equation, as used in common mathematical practice; most of these equations can be found in any statistics text. random.random() Return the next random floating point number in the range [0.0, 1.0). random.uniform(a, b) Return a random floating point number *N* such that "a <= N <= b" for "a <= b" and "b <= N <= a" for "b < a". The end-point value "b" may or may not be included in the range depending on floating-point rounding in the equation "a + (b-a) * random()". random.triangular(low, high, mode) Return a random floating point number *N* such that "low <= N <= high" and with the specified *mode* between those bounds. The *low* and *high* bounds default to zero and one. The *mode* argument defaults to the midpoint between the bounds, giving a symmetric distribution. random.betavariate(alpha, beta) Beta distribution. Conditions on the parameters are "alpha > 0" and "beta > 0". Returned values range between 0 and 1. random.expovariate(lambd) Exponential distribution. *lambd* is 1.0 divided by the desired mean. It should be nonzero. (The parameter would be called “lambda”, but that is a reserved word in Python.) Returned values range from 0 to positive infinity if *lambd* is positive, and from negative infinity to 0 if *lambd* is negative. random.gammavariate(alpha, beta) Gamma distribution. (*Not* the gamma function!) Conditions on the parameters are "alpha > 0" and "beta > 0". The probability distribution function is: x ** (alpha - 1) * math.exp(-x / beta) pdf(x) = -------------------------------------- math.gamma(alpha) * beta ** alpha random.gauss(mu, sigma) Gaussian distribution. *mu* is the mean, and *sigma* is the standard deviation. This is slightly faster than the "normalvariate()" function defined below. Multithreading note: When two threads call this function simultaneously, it is possible that they will receive the same return value. This can be avoided in three ways. 1) Have each thread use a different instance of the random number generator. 2) Put locks around all calls. 3) Use the slower, but thread-safe "normalvariate()" function instead. random.lognormvariate(mu, sigma) Log normal distribution. If you take the natural logarithm of this distribution, you’ll get a normal distribution with mean *mu* and standard deviation *sigma*. *mu* can have any value, and *sigma* must be greater than zero. random.normalvariate(mu, sigma) Normal distribution. *mu* is the mean, and *sigma* is the standard deviation. random.vonmisesvariate(mu, kappa) *mu* is the mean angle, expressed in radians between 0 and 2**pi*, and *kappa* is the concentration parameter, which must be greater than or equal to zero. If *kappa* is equal to zero, this distribution reduces to a uniform random angle over the range 0 to 2**pi*. random.paretovariate(alpha) Pareto distribution. *alpha* is the shape parameter. random.weibullvariate(alpha, beta) Weibull distribution. *alpha* is the scale parameter and *beta* is the shape parameter. Alternative Generator ===================== class random.Random([seed]) Class that implements the default pseudo-random number generator used by the "random" module. Deprecated since version 3.9: In the future, the *seed* must be one of the following types: "NoneType", "int", "float", "str", "bytes", or "bytearray". class random.SystemRandom([seed]) Class that uses the "os.urandom()" function for generating random numbers from sources provided by the operating system. Not available on all systems. Does not rely on software state, and sequences are not reproducible. Accordingly, the "seed()" method has no effect and is ignored. The "getstate()" and "setstate()" methods raise "NotImplementedError" if called. Notes on Reproducibility ======================== Sometimes it is useful to be able to reproduce the sequences given by a pseudo-random number generator. By re-using a seed value, the same sequence should be reproducible from run to run as long as multiple threads are not running. Most of the random module’s algorithms and seeding functions are subject to change across Python versions, but two aspects are guaranteed not to change: * If a new seeding method is added, then a backward compatible seeder will be offered. * The generator’s "random()" method will continue to produce the same sequence when the compatible seeder is given the same seed. Examples ======== Basic examples: >>> random() # Random float: 0.0 <= x < 1.0 0.37444887175646646 >>> uniform(2.5, 10.0) # Random float: 2.5 <= x <= 10.0 3.1800146073117523 >>> expovariate(1 / 5) # Interval between arrivals averaging 5 seconds 5.148957571865031 >>> randrange(10) # Integer from 0 to 9 inclusive 7 >>> randrange(0, 101, 2) # Even integer from 0 to 100 inclusive 26 >>> choice(['win', 'lose', 'draw']) # Single random element from a sequence 'draw' >>> deck = 'ace two three four'.split() >>> shuffle(deck) # Shuffle a list >>> deck ['four', 'two', 'ace', 'three'] >>> sample([10, 20, 30, 40, 50], k=4) # Four samples without replacement [40, 10, 50, 30] Simulations: >>> # Six roulette wheel spins (weighted sampling with replacement) >>> choices(['red', 'black', 'green'], [18, 18, 2], k=6) ['red', 'green', 'black', 'black', 'red', 'black'] >>> # Deal 20 cards without replacement from a deck >>> # of 52 playing cards, and determine the proportion of cards >>> # with a ten-value: ten, jack, queen, or king. >>> dealt = sample(['tens', 'low cards'], counts=[16, 36], k=20) >>> dealt.count('tens') / 20 0.15 >>> # Estimate the probability of getting 5 or more heads from 7 spins >>> # of a biased coin that settles on heads 60% of the time. >>> def trial(): ... return choices('HT', cum_weights=(0.60, 1.00), k=7).count('H') >= 5 ... >>> sum(trial() for i in range(10_000)) / 10_000 0.4169 >>> # Probability of the median of 5 samples being in middle two quartiles >>> def trial(): ... return 2_500 <= sorted(choices(range(10_000), k=5))[2] < 7_500 ... >>> sum(trial() for i in range(10_000)) / 10_000 0.7958 Example of statistical bootstrapping using resampling with replacement to estimate a confidence interval for the mean of a sample: # http://statistics.about.com/od/Applications/a/Example-Of-Bootstrapping.htm from statistics import fmean as mean from random import choices data = [41, 50, 29, 37, 81, 30, 73, 63, 20, 35, 68, 22, 60, 31, 95] means = sorted(mean(choices(data, k=len(data))) for i in range(100)) print(f'The sample mean of {mean(data):.1f} has a 90% confidence ' f'interval from {means[5]:.1f} to {means[94]:.1f}') Example of a resampling permutation test to determine the statistical significance or p-value of an observed difference between the effects of a drug versus a placebo: # Example from "Statistics is Easy" by Dennis Shasha and Manda Wilson from statistics import fmean as mean from random import shuffle drug = [54, 73, 53, 70, 73, 68, 52, 65, 65] placebo = [54, 51, 58, 44, 55, 52, 42, 47, 58, 46] observed_diff = mean(drug) - mean(placebo) n = 10_000 count = 0 combined = drug + placebo for i in range(n): shuffle(combined) new_diff = mean(combined[:len(drug)]) - mean(combined[len(drug):]) count += (new_diff >= observed_diff) print(f'{n} label reshufflings produced only {count} instances with a difference') print(f'at least as extreme as the observed difference of {observed_diff:.1f}.') print(f'The one-sided p-value of {count / n:.4f} leads us to reject the null') print(f'hypothesis that there is no difference between the drug and the placebo.') Simulation of arrival times and service deliveries for a multiserver queue: from heapq import heapify, heapreplace from random import expovariate, gauss from statistics import mean, median, stdev average_arrival_interval = 5.6 average_service_time = 15.0 stdev_service_time = 3.5 num_servers = 3 waits = [] arrival_time = 0.0 servers = [0.0] * num_servers # time when each server becomes available heapify(servers) for i in range(1_000_000): arrival_time += expovariate(1.0 / average_arrival_interval) next_server_available = servers[0] wait = max(0.0, next_server_available - arrival_time) waits.append(wait) service_duration = max(0.0, gauss(average_service_time, stdev_service_time)) service_completed = arrival_time + wait + service_duration heapreplace(servers, service_completed) print(f'Mean wait: {mean(waits):.1f}. Stdev wait: {stdev(waits):.1f}.') print(f'Median wait: {median(waits):.1f}. Max wait: {max(waits):.1f}.') See also: Statistics for Hackers a video tutorial by Jake Vanderplas on statistical analysis using just a few fundamental concepts including simulation, sampling, shuffling, and cross-validation. Economics Simulation a simulation of a marketplace by Peter Norvig that shows effective use of many of the tools and distributions provided by this module (gauss, uniform, sample, betavariate, choice, triangular, and randrange). A Concrete Introduction to Probability (using Python) a tutorial by Peter Norvig covering the basics of probability theory, how to write simulations, and how to perform data analysis using Python. Recipes ======= The default "random()" returns multiples of 2⁻⁵³ in the range *0.0 ≤ x < 1.0*. All such numbers are evenly spaced and are exactly representable as Python floats. However, many other representable floats in that interval are not possible selections. For example, "0.05954861408025609" isn’t an integer multiple of 2⁻⁵³. The following recipe takes a different approach. All floats in the interval are possible selections. The mantissa comes from a uniform distribution of integers in the range *2⁵² ≤ mantissa < 2⁵³*. The exponent comes from a geometric distribution where exponents smaller than *-53* occur half as often as the next larger exponent. from random import Random from math import ldexp class FullRandom(Random): def random(self): mantissa = 0x10_0000_0000_0000 | self.getrandbits(52) exponent = -53 x = 0 while not x: x = self.getrandbits(32) exponent += x.bit_length() - 32 return ldexp(mantissa, exponent) All real valued distributions in the class will use the new method: >>> fr = FullRandom() >>> fr.random() 0.05954861408025609 >>> fr.expovariate(0.25) 8.87925541791544 The recipe is conceptually equivalent to an algorithm that chooses from all the multiples of 2⁻¹⁰⁷⁴ in the range *0.0 ≤ x < 1.0*. All such numbers are evenly spaced, but most have to be rounded down to the nearest representable Python float. (The value 2⁻¹⁰⁷⁴ is the smallest positive unnormalized float and is equal to "math.ulp(0.0)".) See also: Generating Pseudo-random Floating-Point Values a paper by Allen B. Downey describing ways to generate more fine-grained floats than normally generated by "random()". "re" — Regular expression operations ************************************ **Source code:** Lib/re.py ====================================================================== This module provides regular expression matching operations similar to those found in Perl. Both patterns and strings to be searched can be Unicode strings ("str") as well as 8-bit strings ("bytes"). However, Unicode strings and 8-bit strings cannot be mixed: that is, you cannot match a Unicode string with a byte pattern or vice-versa; similarly, when asking for a substitution, the replacement string must be of the same type as both the pattern and the search string. Regular expressions use the backslash character ("'\'") to indicate special forms or to allow special characters to be used without invoking their special meaning. This collides with Python’s usage of the same character for the same purpose in string literals; for example, to match a literal backslash, one might have to write "'\\\\'" as the pattern string, because the regular expression must be "\\", and each backslash must be expressed as "\\" inside a regular Python string literal. Also, please note that any invalid escape sequences in Python’s usage of the backslash in string literals now generate a "DeprecationWarning" and in the future this will become a "SyntaxError". This behaviour will happen even if it is a valid escape sequence for a regular expression. The solution is to use Python’s raw string notation for regular expression patterns; backslashes are not handled in any special way in a string literal prefixed with "'r'". So "r"\n"" is a two-character string containing "'\'" and "'n'", while ""\n"" is a one-character string containing a newline. Usually patterns will be expressed in Python code using this raw string notation. It is important to note that most regular expression operations are available as module-level functions and methods on compiled regular expressions. The functions are shortcuts that don’t require you to compile a regex object first, but miss some fine-tuning parameters. See also: The third-party regex module, which has an API compatible with the standard library "re" module, but offers additional functionality and a more thorough Unicode support. Regular Expression Syntax ========================= A regular expression (or RE) specifies a set of strings that matches it; the functions in this module let you check if a particular string matches a given regular expression (or if a given regular expression matches a particular string, which comes down to the same thing). Regular expressions can be concatenated to form new regular expressions; if *A* and *B* are both regular expressions, then *AB* is also a regular expression. In general, if a string *p* matches *A* and another string *q* matches *B*, the string *pq* will match AB. This holds unless *A* or *B* contain low precedence operations; boundary conditions between *A* and *B*; or have numbered group references. Thus, complex expressions can easily be constructed from simpler primitive expressions like the ones described here. For details of the theory and implementation of regular expressions, consult the Friedl book [Frie09], or almost any textbook about compiler construction. A brief explanation of the format of regular expressions follows. For further information and a gentler presentation, consult the Regular Expression HOWTO. Regular expressions can contain both special and ordinary characters. Most ordinary characters, like "'A'", "'a'", or "'0'", are the simplest regular expressions; they simply match themselves. You can concatenate ordinary characters, so "last" matches the string "'last'". (In the rest of this section, we’ll write RE’s in "this special style", usually without quotes, and strings to be matched "'in single quotes'".) Some characters, like "'|'" or "'('", are special. Special characters either stand for classes of ordinary characters, or affect how the regular expressions around them are interpreted. Repetition qualifiers ("*", "+", "?", "{m,n}", etc) cannot be directly nested. This avoids ambiguity with the non-greedy modifier suffix "?", and with other modifiers in other implementations. To apply a second repetition to an inner repetition, parentheses may be used. For example, the expression "(?:a{6})*" matches any multiple of six "'a'" characters. The special characters are: "." (Dot.) In the default mode, this matches any character except a newline. If the "DOTALL" flag has been specified, this matches any character including a newline. "^" (Caret.) Matches the start of the string, and in "MULTILINE" mode also matches immediately after each newline. "$" Matches the end of the string or just before the newline at the end of the string, and in "MULTILINE" mode also matches before a newline. "foo" matches both ‘foo’ and ‘foobar’, while the regular expression "foo$" matches only ‘foo’. More interestingly, searching for "foo.$" in "'foo1\nfoo2\n'" matches ‘foo2’ normally, but ‘foo1’ in "MULTILINE" mode; searching for a single "$" in "'foo\n'" will find two (empty) matches: one just before the newline, and one at the end of the string. "*" Causes the resulting RE to match 0 or more repetitions of the preceding RE, as many repetitions as are possible. "ab*" will match ‘a’, ‘ab’, or ‘a’ followed by any number of ‘b’s. "+" Causes the resulting RE to match 1 or more repetitions of the preceding RE. "ab+" will match ‘a’ followed by any non-zero number of ‘b’s; it will not match just ‘a’. "?" Causes the resulting RE to match 0 or 1 repetitions of the preceding RE. "ab?" will match either ‘a’ or ‘ab’. "*?", "+?", "??" The "'*'", "'+'", and "'?'" qualifiers are all *greedy*; they match as much text as possible. Sometimes this behaviour isn’t desired; if the RE "<.*>" is matched against "' b '", it will match the entire string, and not just "''". Adding "?" after the qualifier makes it perform the match in *non-greedy* or *minimal* fashion; as *few* characters as possible will be matched. Using the RE "<.*?>" will match only "''". "{m}" Specifies that exactly *m* copies of the previous RE should be matched; fewer matches cause the entire RE not to match. For example, "a{6}" will match exactly six "'a'" characters, but not five. "{m,n}" Causes the resulting RE to match from *m* to *n* repetitions of the preceding RE, attempting to match as many repetitions as possible. For example, "a{3,5}" will match from 3 to 5 "'a'" characters. Omitting *m* specifies a lower bound of zero, and omitting *n* specifies an infinite upper bound. As an example, "a{4,}b" will match "'aaaab'" or a thousand "'a'" characters followed by a "'b'", but not "'aaab'". The comma may not be omitted or the modifier would be confused with the previously described form. "{m,n}?" Causes the resulting RE to match from *m* to *n* repetitions of the preceding RE, attempting to match as *few* repetitions as possible. This is the non-greedy version of the previous qualifier. For example, on the 6-character string "'aaaaaa'", "a{3,5}" will match 5 "'a'" characters, while "a{3,5}?" will only match 3 characters. "\" Either escapes special characters (permitting you to match characters like "'*'", "'?'", and so forth), or signals a special sequence; special sequences are discussed below. If you’re not using a raw string to express the pattern, remember that Python also uses the backslash as an escape sequence in string literals; if the escape sequence isn’t recognized by Python’s parser, the backslash and subsequent character are included in the resulting string. However, if Python would recognize the resulting sequence, the backslash should be repeated twice. This is complicated and hard to understand, so it’s highly recommended that you use raw strings for all but the simplest expressions. "[]" Used to indicate a set of characters. In a set: * Characters can be listed individually, e.g. "[amk]" will match "'a'", "'m'", or "'k'". * Ranges of characters can be indicated by giving two characters and separating them by a "'-'", for example "[a-z]" will match any lowercase ASCII letter, "[0-5][0-9]" will match all the two- digits numbers from "00" to "59", and "[0-9A-Fa-f]" will match any hexadecimal digit. If "-" is escaped (e.g. "[a\-z]") or if it’s placed as the first or last character (e.g. "[-a]" or "[a-]"), it will match a literal "'-'". * Special characters lose their special meaning inside sets. For example, "[(+*)]" will match any of the literal characters "'('", "'+'", "'*'", or "')'". * Character classes such as "\w" or "\S" (defined below) are also accepted inside a set, although the characters they match depends on whether "ASCII" or "LOCALE" mode is in force. * Characters that are not within a range can be matched by *complementing* the set. If the first character of the set is "'^'", all the characters that are *not* in the set will be matched. For example, "[^5]" will match any character except "'5'", and "[^^]" will match any character except "'^'". "^" has no special meaning if it’s not the first character in the set. * To match a literal "']'" inside a set, precede it with a backslash, or place it at the beginning of the set. For example, both "[()[\]{}]" and "[]()[{}]" will both match a parenthesis. * Support of nested sets and set operations as in Unicode Technical Standard #18 might be added in the future. This would change the syntax, so to facilitate this change a "FutureWarning" will be raised in ambiguous cases for the time being. That includes sets starting with a literal "'['" or containing literal character sequences "'--'", "'&&'", "'~~'", and "'||'". To avoid a warning escape them with a backslash. Changed in version 3.7: "FutureWarning" is raised if a character set contains constructs that will change semantically in the future. "|" "A|B", where *A* and *B* can be arbitrary REs, creates a regular expression that will match either *A* or *B*. An arbitrary number of REs can be separated by the "'|'" in this way. This can be used inside groups (see below) as well. As the target string is scanned, REs separated by "'|'" are tried from left to right. When one pattern completely matches, that branch is accepted. This means that once *A* matches, *B* will not be tested further, even if it would produce a longer overall match. In other words, the "'|'" operator is never greedy. To match a literal "'|'", use "\|", or enclose it inside a character class, as in "[|]". "(...)" Matches whatever regular expression is inside the parentheses, and indicates the start and end of a group; the contents of a group can be retrieved after a match has been performed, and can be matched later in the string with the "\number" special sequence, described below. To match the literals "'('" or "')'", use "\(" or "\)", or enclose them inside a character class: "[(]", "[)]". "(?...)" This is an extension notation (a "'?'" following a "'('" is not meaningful otherwise). The first character after the "'?'" determines what the meaning and further syntax of the construct is. Extensions usually do not create a new group; "(?P...)" is the only exception to this rule. Following are the currently supported extensions. "(?aiLmsux)" (One or more letters from the set "'a'", "'i'", "'L'", "'m'", "'s'", "'u'", "'x'".) The group matches the empty string; the letters set the corresponding flags: "re.A" (ASCII-only matching), "re.I" (ignore case), "re.L" (locale dependent), "re.M" (multi- line), "re.S" (dot matches all), "re.U" (Unicode matching), and "re.X" (verbose), for the entire regular expression. (The flags are described in Module Contents.) This is useful if you wish to include the flags as part of the regular expression, instead of passing a *flag* argument to the "re.compile()" function. Flags should be used first in the expression string. "(?:...)" A non-capturing version of regular parentheses. Matches whatever regular expression is inside the parentheses, but the substring matched by the group *cannot* be retrieved after performing a match or referenced later in the pattern. "(?aiLmsux-imsx:...)" (Zero or more letters from the set "'a'", "'i'", "'L'", "'m'", "'s'", "'u'", "'x'", optionally followed by "'-'" followed by one or more letters from the "'i'", "'m'", "'s'", "'x'".) The letters set or remove the corresponding flags: "re.A" (ASCII-only matching), "re.I" (ignore case), "re.L" (locale dependent), "re.M" (multi-line), "re.S" (dot matches all), "re.U" (Unicode matching), and "re.X" (verbose), for the part of the expression. (The flags are described in Module Contents.) The letters "'a'", "'L'" and "'u'" are mutually exclusive when used as inline flags, so they can’t be combined or follow "'-'". Instead, when one of them appears in an inline group, it overrides the matching mode in the enclosing group. In Unicode patterns "(?a:...)" switches to ASCII-only matching, and "(?u:...)" switches to Unicode matching (default). In byte pattern "(?L:...)" switches to locale depending matching, and "(?a:...)" switches to ASCII-only matching (default). This override is only in effect for the narrow inline group, and the original matching mode is restored outside of the group. New in version 3.6. Changed in version 3.7: The letters "'a'", "'L'" and "'u'" also can be used in a group. "(?P...)" Similar to regular parentheses, but the substring matched by the group is accessible via the symbolic group name *name*. Group names must be valid Python identifiers, and each group name must be defined only once within a regular expression. A symbolic group is also a numbered group, just as if the group were not named. Named groups can be referenced in three contexts. If the pattern is "(?P['"]).*?(?P=quote)" (i.e. matching a string quoted with either single or double quotes): +-----------------------------------------+------------------------------------+ | Context of reference to group “quote” | Ways to reference it | |=========================================|====================================| | in the same pattern itself | * "(?P=quote)" (as shown) * "\1" | +-----------------------------------------+------------------------------------+ | when processing match object *m* | * "m.group('quote')" * | | | "m.end('quote')" (etc.) | +-----------------------------------------+------------------------------------+ | in a string passed to the *repl* | * "\g" * "\g<1>" * "\1" | | argument of "re.sub()" | | +-----------------------------------------+------------------------------------+ "(?P=name)" A backreference to a named group; it matches whatever text was matched by the earlier group named *name*. "(?#...)" A comment; the contents of the parentheses are simply ignored. "(?=...)" Matches if "..." matches next, but doesn’t consume any of the string. This is called a *lookahead assertion*. For example, "Isaac (?=Asimov)" will match "'Isaac '" only if it’s followed by "'Asimov'". "(?!...)" Matches if "..." doesn’t match next. This is a *negative lookahead assertion*. For example, "Isaac (?!Asimov)" will match "'Isaac '" only if it’s *not* followed by "'Asimov'". "(?<=...)" Matches if the current position in the string is preceded by a match for "..." that ends at the current position. This is called a *positive lookbehind assertion*. "(?<=abc)def" will find a match in "'abcdef'", since the lookbehind will back up 3 characters and check if the contained pattern matches. The contained pattern must only match strings of some fixed length, meaning that "abc" or "a|b" are allowed, but "a*" and "a{3,4}" are not. Note that patterns which start with positive lookbehind assertions will not match at the beginning of the string being searched; you will most likely want to use the "search()" function rather than the "match()" function: >>> import re >>> m = re.search('(?<=abc)def', 'abcdef') >>> m.group(0) 'def' This example looks for a word following a hyphen: >>> m = re.search(r'(?<=-)\w+', 'spam-egg') >>> m.group(0) 'egg' Changed in version 3.5: Added support for group references of fixed length. "(?|$)" is a poor email matching pattern, which will match with "''" as well as "'user@host.com'", but not with "''". The special sequences consist of "'\'" and a character from the list below. If the ordinary character is not an ASCII digit or an ASCII letter, then the resulting RE will match the second character. For example, "\$" matches the character "'$'". "\number" Matches the contents of the group of the same number. Groups are numbered starting from 1. For example, "(.+) \1" matches "'the the'" or "'55 55'", but not "'thethe'" (note the space after the group). This special sequence can only be used to match one of the first 99 groups. If the first digit of *number* is 0, or *number* is 3 octal digits long, it will not be interpreted as a group match, but as the character with octal value *number*. Inside the "'['" and "']'" of a character class, all numeric escapes are treated as characters. "\A" Matches only at the start of the string. "\b" Matches the empty string, but only at the beginning or end of a word. A word is defined as a sequence of word characters. Note that formally, "\b" is defined as the boundary between a "\w" and a "\W" character (or vice versa), or between "\w" and the beginning/end of the string. This means that "r'\bfoo\b'" matches "'foo'", "'foo.'", "'(foo)'", "'bar foo baz'" but not "'foobar'" or "'foo3'". By default Unicode alphanumerics are the ones used in Unicode patterns, but this can be changed by using the "ASCII" flag. Word boundaries are determined by the current locale if the "LOCALE" flag is used. Inside a character range, "\b" represents the backspace character, for compatibility with Python’s string literals. "\B" Matches the empty string, but only when it is *not* at the beginning or end of a word. This means that "r'py\B'" matches "'python'", "'py3'", "'py2'", but not "'py'", "'py.'", or "'py!'". "\B" is just the opposite of "\b", so word characters in Unicode patterns are Unicode alphanumerics or the underscore, although this can be changed by using the "ASCII" flag. Word boundaries are determined by the current locale if the "LOCALE" flag is used. "\d" For Unicode (str) patterns: Matches any Unicode decimal digit (that is, any character in Unicode character category [Nd]). This includes "[0-9]", and also many other digit characters. If the "ASCII" flag is used only "[0-9]" is matched. For 8-bit (bytes) patterns: Matches any decimal digit; this is equivalent to "[0-9]". "\D" Matches any character which is not a decimal digit. This is the opposite of "\d". If the "ASCII" flag is used this becomes the equivalent of "[^0-9]". "\s" For Unicode (str) patterns: Matches Unicode whitespace characters (which includes "[ \t\n\r\f\v]", and also many other characters, for example the non-breaking spaces mandated by typography rules in many languages). If the "ASCII" flag is used, only "[ \t\n\r\f\v]" is matched. For 8-bit (bytes) patterns: Matches characters considered whitespace in the ASCII character set; this is equivalent to "[ \t\n\r\f\v]". "\S" Matches any character which is not a whitespace character. This is the opposite of "\s". If the "ASCII" flag is used this becomes the equivalent of "[^ \t\n\r\f\v]". "\w" For Unicode (str) patterns: Matches Unicode word characters; this includes most characters that can be part of a word in any language, as well as numbers and the underscore. If the "ASCII" flag is used, only "[a-zA-Z0-9_]" is matched. For 8-bit (bytes) patterns: Matches characters considered alphanumeric in the ASCII character set; this is equivalent to "[a-zA-Z0-9_]". If the "LOCALE" flag is used, matches characters considered alphanumeric in the current locale and the underscore. "\W" Matches any character which is not a word character. This is the opposite of "\w". If the "ASCII" flag is used this becomes the equivalent of "[^a-zA-Z0-9_]". If the "LOCALE" flag is used, matches characters which are neither alphanumeric in the current locale nor the underscore. "\Z" Matches only at the end of the string. Most of the standard escapes supported by Python string literals are also accepted by the regular expression parser: \a \b \f \n \N \r \t \u \U \v \x \\ (Note that "\b" is used to represent word boundaries, and means “backspace” only inside character classes.) "'\u'", "'\U'", and "'\N'" escape sequences are only recognized in Unicode patterns. In bytes patterns they are errors. Unknown escapes of ASCII letters are reserved for future use and treated as errors. Octal escapes are included in a limited form. If the first digit is a 0, or if there are three octal digits, it is considered an octal escape. Otherwise, it is a group reference. As for string literals, octal escapes are always at most three digits in length. Changed in version 3.3: The "'\u'" and "'\U'" escape sequences have been added. Changed in version 3.6: Unknown escapes consisting of "'\'" and an ASCII letter now are errors. Changed in version 3.8: The "'\N{name}'" escape sequence has been added. As in string literals, it expands to the named Unicode character (e.g. "'\N{EM DASH}'"). Module Contents =============== The module defines several functions, constants, and an exception. Some of the functions are simplified versions of the full featured methods for compiled regular expressions. Most non-trivial applications always use the compiled form. Changed in version 3.6: Flag constants are now instances of "RegexFlag", which is a subclass of "enum.IntFlag". re.compile(pattern, flags=0) Compile a regular expression pattern into a regular expression object, which can be used for matching using its "match()", "search()" and other methods, described below. The expression’s behaviour can be modified by specifying a *flags* value. Values can be any of the following variables, combined using bitwise OR (the "|" operator). The sequence prog = re.compile(pattern) result = prog.match(string) is equivalent to result = re.match(pattern, string) but using "re.compile()" and saving the resulting regular expression object for reuse is more efficient when the expression will be used several times in a single program. Note: The compiled versions of the most recent patterns passed to "re.compile()" and the module-level matching functions are cached, so programs that use only a few regular expressions at a time needn’t worry about compiling regular expressions. re.A re.ASCII Make "\w", "\W", "\b", "\B", "\d", "\D", "\s" and "\S" perform ASCII-only matching instead of full Unicode matching. This is only meaningful for Unicode patterns, and is ignored for byte patterns. Corresponds to the inline flag "(?a)". Note that for backward compatibility, the "re.U" flag still exists (as well as its synonym "re.UNICODE" and its embedded counterpart "(?u)"), but these are redundant in Python 3 since matches are Unicode by default for strings (and Unicode matching isn’t allowed for bytes). re.DEBUG Display debug information about compiled expression. No corresponding inline flag. re.I re.IGNORECASE Perform case-insensitive matching; expressions like "[A-Z]" will also match lowercase letters. Full Unicode matching (such as "Ü" matching "ü") also works unless the "re.ASCII" flag is used to disable non-ASCII matches. The current locale does not change the effect of this flag unless the "re.LOCALE" flag is also used. Corresponds to the inline flag "(?i)". Note that when the Unicode patterns "[a-z]" or "[A-Z]" are used in combination with the "IGNORECASE" flag, they will match the 52 ASCII letters and 4 additional non-ASCII letters: ‘İ’ (U+0130, Latin capital letter I with dot above), ‘ı’ (U+0131, Latin small letter dotless i), ‘ſ’ (U+017F, Latin small letter long s) and ‘K’ (U+212A, Kelvin sign). If the "ASCII" flag is used, only letters ‘a’ to ‘z’ and ‘A’ to ‘Z’ are matched. re.L re.LOCALE Make "\w", "\W", "\b", "\B" and case-insensitive matching dependent on the current locale. This flag can be used only with bytes patterns. The use of this flag is discouraged as the locale mechanism is very unreliable, it only handles one “culture” at a time, and it only works with 8-bit locales. Unicode matching is already enabled by default in Python 3 for Unicode (str) patterns, and it is able to handle different locales/languages. Corresponds to the inline flag "(?L)". Changed in version 3.6: "re.LOCALE" can be used only with bytes patterns and is not compatible with "re.ASCII". Changed in version 3.7: Compiled regular expression objects with the "re.LOCALE" flag no longer depend on the locale at compile time. Only the locale at matching time affects the result of matching. re.M re.MULTILINE When specified, the pattern character "'^'" matches at the beginning of the string and at the beginning of each line (immediately following each newline); and the pattern character "'$'" matches at the end of the string and at the end of each line (immediately preceding each newline). By default, "'^'" matches only at the beginning of the string, and "'$'" only at the end of the string and immediately before the newline (if any) at the end of the string. Corresponds to the inline flag "(?m)". re.S re.DOTALL Make the "'.'" special character match any character at all, including a newline; without this flag, "'.'" will match anything *except* a newline. Corresponds to the inline flag "(?s)". re.X re.VERBOSE This flag allows you to write regular expressions that look nicer and are more readable by allowing you to visually separate logical sections of the pattern and add comments. Whitespace within the pattern is ignored, except when in a character class, or when preceded by an unescaped backslash, or within tokens like "*?", "(?:" or "(?P<...>". When a line contains a "#" that is not in a character class and is not preceded by an unescaped backslash, all characters from the leftmost such "#" through the end of the line are ignored. This means that the two following regular expression objects that match a decimal number are functionally equal: a = re.compile(r"""\d + # the integral part \. # the decimal point \d * # some fractional digits""", re.X) b = re.compile(r"\d+\.\d*") Corresponds to the inline flag "(?x)". re.search(pattern, string, flags=0) Scan through *string* looking for the first location where the regular expression *pattern* produces a match, and return a corresponding match object. Return "None" if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string. re.match(pattern, string, flags=0) If zero or more characters at the beginning of *string* match the regular expression *pattern*, return a corresponding match object. Return "None" if the string does not match the pattern; note that this is different from a zero-length match. Note that even in "MULTILINE" mode, "re.match()" will only match at the beginning of the string and not at the beginning of each line. If you want to locate a match anywhere in *string*, use "search()" instead (see also search() vs. match()). re.fullmatch(pattern, string, flags=0) If the whole *string* matches the regular expression *pattern*, return a corresponding match object. Return "None" if the string does not match the pattern; note that this is different from a zero-length match. New in version 3.4. re.split(pattern, string, maxsplit=0, flags=0) Split *string* by the occurrences of *pattern*. If capturing parentheses are used in *pattern*, then the text of all groups in the pattern are also returned as part of the resulting list. If *maxsplit* is nonzero, at most *maxsplit* splits occur, and the remainder of the string is returned as the final element of the list. >>> re.split(r'\W+', 'Words, words, words.') ['Words', 'words', 'words', ''] >>> re.split(r'(\W+)', 'Words, words, words.') ['Words', ', ', 'words', ', ', 'words', '.', ''] >>> re.split(r'\W+', 'Words, words, words.', 1) ['Words', 'words, words.'] >>> re.split('[a-f]+', '0a3B9', flags=re.IGNORECASE) ['0', '3', '9'] If there are capturing groups in the separator and it matches at the start of the string, the result will start with an empty string. The same holds for the end of the string: >>> re.split(r'(\W+)', '...words, words...') ['', '...', 'words', ', ', 'words', '...', ''] That way, separator components are always found at the same relative indices within the result list. Empty matches for the pattern split the string only when not adjacent to a previous empty match. >>> re.split(r'\b', 'Words, words, words.') ['', 'Words', ', ', 'words', ', ', 'words', '.'] >>> re.split(r'\W*', '...words...') ['', '', 'w', 'o', 'r', 'd', 's', '', ''] >>> re.split(r'(\W*)', '...words...') ['', '...', '', '', 'w', '', 'o', '', 'r', '', 'd', '', 's', '...', '', '', ''] Changed in version 3.1: Added the optional flags argument. Changed in version 3.7: Added support of splitting on a pattern that could match an empty string. re.findall(pattern, string, flags=0) Return all non-overlapping matches of *pattern* in *string*, as a list of strings or tuples. The *string* is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result. The result depends on the number of capturing groups in the pattern. If there are no groups, return a list of strings matching the whole pattern. If there is exactly one group, return a list of strings matching that group. If multiple groups are present, return a list of tuples of strings matching the groups. Non- capturing groups do not affect the form of the result. >>> re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest') ['foot', 'fell', 'fastest'] >>> re.findall(r'(\w+)=(\d+)', 'set width=20 and height=10') [('width', '20'), ('height', '10')] Changed in version 3.7: Non-empty matches can now start just after a previous empty match. re.finditer(pattern, string, flags=0) Return an *iterator* yielding match objects over all non- overlapping matches for the RE *pattern* in *string*. The *string* is scanned left-to-right, and matches are returned in the order found. Empty matches are included in the result. Changed in version 3.7: Non-empty matches can now start just after a previous empty match. re.sub(pattern, repl, string, count=0, flags=0) Return the string obtained by replacing the leftmost non- overlapping occurrences of *pattern* in *string* by the replacement *repl*. If the pattern isn’t found, *string* is returned unchanged. *repl* can be a string or a function; if it is a string, any backslash escapes in it are processed. That is, "\n" is converted to a single newline character, "\r" is converted to a carriage return, and so forth. Unknown escapes of ASCII letters are reserved for future use and treated as errors. Other unknown escapes such as "\&" are left alone. Backreferences, such as "\6", are replaced with the substring matched by group 6 in the pattern. For example: >>> re.sub(r'def\s+([a-zA-Z_][a-zA-Z_0-9]*)\s*\(\s*\):', ... r'static PyObject*\npy_\1(void)\n{', ... 'def myfunc():') 'static PyObject*\npy_myfunc(void)\n{' If *repl* is a function, it is called for every non-overlapping occurrence of *pattern*. The function takes a single match object argument, and returns the replacement string. For example: >>> def dashrepl(matchobj): ... if matchobj.group(0) == '-': return ' ' ... else: return '-' >>> re.sub('-{1,2}', dashrepl, 'pro----gram-files') 'pro--gram files' >>> re.sub(r'\sAND\s', ' & ', 'Baked Beans And Spam', flags=re.IGNORECASE) 'Baked Beans & Spam' The pattern may be a string or a pattern object. The optional argument *count* is the maximum number of pattern occurrences to be replaced; *count* must be a non-negative integer. If omitted or zero, all occurrences will be replaced. Empty matches for the pattern are replaced only when not adjacent to a previous empty match, so "sub('x*', '-', 'abxd')" returns "'-a-b--d-'". In string-type *repl* arguments, in addition to the character escapes and backreferences described above, "\g" will use the substring matched by the group named "name", as defined by the "(?P...)" syntax. "\g" uses the corresponding group number; "\g<2>" is therefore equivalent to "\2", but isn’t ambiguous in a replacement such as "\g<2>0". "\20" would be interpreted as a reference to group 20, not a reference to group 2 followed by the literal character "'0'". The backreference "\g<0>" substitutes in the entire substring matched by the RE. Changed in version 3.1: Added the optional flags argument. Changed in version 3.5: Unmatched groups are replaced with an empty string. Changed in version 3.6: Unknown escapes in *pattern* consisting of "'\'" and an ASCII letter now are errors. Changed in version 3.7: Unknown escapes in *repl* consisting of "'\'" and an ASCII letter now are errors. Changed in version 3.7: Empty matches for the pattern are replaced when adjacent to a previous non-empty match. re.subn(pattern, repl, string, count=0, flags=0) Perform the same operation as "sub()", but return a tuple "(new_string, number_of_subs_made)". Changed in version 3.1: Added the optional flags argument. Changed in version 3.5: Unmatched groups are replaced with an empty string. re.escape(pattern) Escape special characters in *pattern*. This is useful if you want to match an arbitrary literal string that may have regular expression metacharacters in it. For example: >>> print(re.escape('https://www.python.org')) https://www\.python\.org >>> legal_chars = string.ascii_lowercase + string.digits + "!#$%&'*+-.^_`|~:" >>> print('[%s]+' % re.escape(legal_chars)) [abcdefghijklmnopqrstuvwxyz0123456789!\#\$%\&'\*\+\-\.\^_`\|\~:]+ >>> operators = ['+', '-', '*', '/', '**'] >>> print('|'.join(map(re.escape, sorted(operators, reverse=True)))) /|\-|\+|\*\*|\* This function must not be used for the replacement string in "sub()" and "subn()", only backslashes should be escaped. For example: >>> digits_re = r'\d+' >>> sample = '/usr/sbin/sendmail - 0 errors, 12 warnings' >>> print(re.sub(digits_re, digits_re.replace('\\', r'\\'), sample)) /usr/sbin/sendmail - \d+ errors, \d+ warnings Changed in version 3.3: The "'_'" character is no longer escaped. Changed in version 3.7: Only characters that can have special meaning in a regular expression are escaped. As a result, "'!'", "'"'", "'%'", ""'"", "','", "'/'", "':'", "';'", "'<'", "'='", "'>'", "'@'", and ""`"" are no longer escaped. re.purge() Clear the regular expression cache. exception re.error(msg, pattern=None, pos=None) Exception raised when a string passed to one of the functions here is not a valid regular expression (for example, it might contain unmatched parentheses) or when some other error occurs during compilation or matching. It is never an error if a string contains no match for a pattern. The error instance has the following additional attributes: msg The unformatted error message. pattern The regular expression pattern. pos The index in *pattern* where compilation failed (may be "None"). lineno The line corresponding to *pos* (may be "None"). colno The column corresponding to *pos* (may be "None"). Changed in version 3.5: Added additional attributes. Regular Expression Objects ========================== Compiled regular expression objects support the following methods and attributes: Pattern.search(string[, pos[, endpos]]) Scan through *string* looking for the first location where this regular expression produces a match, and return a corresponding match object. Return "None" if no position in the string matches the pattern; note that this is different from finding a zero-length match at some point in the string. The optional second parameter *pos* gives an index in the string where the search is to start; it defaults to "0". This is not completely equivalent to slicing the string; the "'^'" pattern character matches at the real beginning of the string and at positions just after a newline, but not necessarily at the index where the search is to start. The optional parameter *endpos* limits how far the string will be searched; it will be as if the string is *endpos* characters long, so only the characters from *pos* to "endpos - 1" will be searched for a match. If *endpos* is less than *pos*, no match will be found; otherwise, if *rx* is a compiled regular expression object, "rx.search(string, 0, 50)" is equivalent to "rx.search(string[:50], 0)". >>> pattern = re.compile("d") >>> pattern.search("dog") # Match at index 0 >>> pattern.search("dog", 1) # No match; search doesn't include the "d" Pattern.match(string[, pos[, endpos]]) If zero or more characters at the *beginning* of *string* match this regular expression, return a corresponding match object. Return "None" if the string does not match the pattern; note that this is different from a zero-length match. The optional *pos* and *endpos* parameters have the same meaning as for the "search()" method. >>> pattern = re.compile("o") >>> pattern.match("dog") # No match as "o" is not at the start of "dog". >>> pattern.match("dog", 1) # Match as "o" is the 2nd character of "dog". If you want to locate a match anywhere in *string*, use "search()" instead (see also search() vs. match()). Pattern.fullmatch(string[, pos[, endpos]]) If the whole *string* matches this regular expression, return a corresponding match object. Return "None" if the string does not match the pattern; note that this is different from a zero-length match. The optional *pos* and *endpos* parameters have the same meaning as for the "search()" method. >>> pattern = re.compile("o[gh]") >>> pattern.fullmatch("dog") # No match as "o" is not at the start of "dog". >>> pattern.fullmatch("ogre") # No match as not the full string matches. >>> pattern.fullmatch("doggie", 1, 3) # Matches within given limits. New in version 3.4. Pattern.split(string, maxsplit=0) Identical to the "split()" function, using the compiled pattern. Pattern.findall(string[, pos[, endpos]]) Similar to the "findall()" function, using the compiled pattern, but also accepts optional *pos* and *endpos* parameters that limit the search region like for "search()". Pattern.finditer(string[, pos[, endpos]]) Similar to the "finditer()" function, using the compiled pattern, but also accepts optional *pos* and *endpos* parameters that limit the search region like for "search()". Pattern.sub(repl, string, count=0) Identical to the "sub()" function, using the compiled pattern. Pattern.subn(repl, string, count=0) Identical to the "subn()" function, using the compiled pattern. Pattern.flags The regex matching flags. This is a combination of the flags given to "compile()", any "(?...)" inline flags in the pattern, and implicit flags such as "UNICODE" if the pattern is a Unicode string. Pattern.groups The number of capturing groups in the pattern. Pattern.groupindex A dictionary mapping any symbolic group names defined by "(?P)" to group numbers. The dictionary is empty if no symbolic groups were used in the pattern. Pattern.pattern The pattern string from which the pattern object was compiled. Changed in version 3.7: Added support of "copy.copy()" and "copy.deepcopy()". Compiled regular expression objects are considered atomic. Match Objects ============= Match objects always have a boolean value of "True". Since "match()" and "search()" return "None" when there is no match, you can test whether there was a match with a simple "if" statement: match = re.search(pattern, string) if match: process(match) Match objects support the following methods and attributes: Match.expand(template) Return the string obtained by doing backslash substitution on the template string *template*, as done by the "sub()" method. Escapes such as "\n" are converted to the appropriate characters, and numeric backreferences ("\1", "\2") and named backreferences ("\g<1>", "\g") are replaced by the contents of the corresponding group. Changed in version 3.5: Unmatched groups are replaced with an empty string. Match.group([group1, ...]) Returns one or more subgroups of the match. If there is a single argument, the result is a single string; if there are multiple arguments, the result is a tuple with one item per argument. Without arguments, *group1* defaults to zero (the whole match is returned). If a *groupN* argument is zero, the corresponding return value is the entire matching string; if it is in the inclusive range [1..99], it is the string matching the corresponding parenthesized group. If a group number is negative or larger than the number of groups defined in the pattern, an "IndexError" exception is raised. If a group is contained in a part of the pattern that did not match, the corresponding result is "None". If a group is contained in a part of the pattern that matched multiple times, the last match is returned. >>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist") >>> m.group(0) # The entire match 'Isaac Newton' >>> m.group(1) # The first parenthesized subgroup. 'Isaac' >>> m.group(2) # The second parenthesized subgroup. 'Newton' >>> m.group(1, 2) # Multiple arguments give us a tuple. ('Isaac', 'Newton') If the regular expression uses the "(?P...)" syntax, the *groupN* arguments may also be strings identifying groups by their group name. If a string argument is not used as a group name in the pattern, an "IndexError" exception is raised. A moderately complicated example: >>> m = re.match(r"(?P\w+) (?P\w+)", "Malcolm Reynolds") >>> m.group('first_name') 'Malcolm' >>> m.group('last_name') 'Reynolds' Named groups can also be referred to by their index: >>> m.group(1) 'Malcolm' >>> m.group(2) 'Reynolds' If a group matches multiple times, only the last match is accessible: >>> m = re.match(r"(..)+", "a1b2c3") # Matches 3 times. >>> m.group(1) # Returns only the last match. 'c3' Match.__getitem__(g) This is identical to "m.group(g)". This allows easier access to an individual group from a match: >>> m = re.match(r"(\w+) (\w+)", "Isaac Newton, physicist") >>> m[0] # The entire match 'Isaac Newton' >>> m[1] # The first parenthesized subgroup. 'Isaac' >>> m[2] # The second parenthesized subgroup. 'Newton' New in version 3.6. Match.groups(default=None) Return a tuple containing all the subgroups of the match, from 1 up to however many groups are in the pattern. The *default* argument is used for groups that did not participate in the match; it defaults to "None". For example: >>> m = re.match(r"(\d+)\.(\d+)", "24.1632") >>> m.groups() ('24', '1632') If we make the decimal place and everything after it optional, not all groups might participate in the match. These groups will default to "None" unless the *default* argument is given: >>> m = re.match(r"(\d+)\.?(\d+)?", "24") >>> m.groups() # Second group defaults to None. ('24', None) >>> m.groups('0') # Now, the second group defaults to '0'. ('24', '0') Match.groupdict(default=None) Return a dictionary containing all the *named* subgroups of the match, keyed by the subgroup name. The *default* argument is used for groups that did not participate in the match; it defaults to "None". For example: >>> m = re.match(r"(?P\w+) (?P\w+)", "Malcolm Reynolds") >>> m.groupdict() {'first_name': 'Malcolm', 'last_name': 'Reynolds'} Match.start([group]) Match.end([group]) Return the indices of the start and end of the substring matched by *group*; *group* defaults to zero (meaning the whole matched substring). Return "-1" if *group* exists but did not contribute to the match. For a match object *m*, and a group *g* that did contribute to the match, the substring matched by group *g* (equivalent to "m.group(g)") is m.string[m.start(g):m.end(g)] Note that "m.start(group)" will equal "m.end(group)" if *group* matched a null string. For example, after "m = re.search('b(c?)', 'cba')", "m.start(0)" is 1, "m.end(0)" is 2, "m.start(1)" and "m.end(1)" are both 2, and "m.start(2)" raises an "IndexError" exception. An example that will remove *remove_this* from email addresses: >>> email = "tony@tiremove_thisger.net" >>> m = re.search("remove_this", email) >>> email[:m.start()] + email[m.end():] 'tony@tiger.net' Match.span([group]) For a match *m*, return the 2-tuple "(m.start(group), m.end(group))". Note that if *group* did not contribute to the match, this is "(-1, -1)". *group* defaults to zero, the entire match. Match.pos The value of *pos* which was passed to the "search()" or "match()" method of a regex object. This is the index into the string at which the RE engine started looking for a match. Match.endpos The value of *endpos* which was passed to the "search()" or "match()" method of a regex object. This is the index into the string beyond which the RE engine will not go. Match.lastindex The integer index of the last matched capturing group, or "None" if no group was matched at all. For example, the expressions "(a)b", "((a)(b))", and "((ab))" will have "lastindex == 1" if applied to the string "'ab'", while the expression "(a)(b)" will have "lastindex == 2", if applied to the same string. Match.lastgroup The name of the last matched capturing group, or "None" if the group didn’t have a name, or if no group was matched at all. Match.re The regular expression object whose "match()" or "search()" method produced this match instance. Match.string The string passed to "match()" or "search()". Changed in version 3.7: Added support of "copy.copy()" and "copy.deepcopy()". Match objects are considered atomic. Regular Expression Examples =========================== Checking for a Pair ------------------- In this example, we’ll use the following helper function to display match objects a little more gracefully: def displaymatch(match): if match is None: return None return '' % (match.group(), match.groups()) Suppose you are writing a poker program where a player’s hand is represented as a 5-character string with each character representing a card, “a” for ace, “k” for king, “q” for queen, “j” for jack, “t” for 10, and “2” through “9” representing the card with that value. To see if a given string is a valid hand, one could do the following: >>> valid = re.compile(r"^[a2-9tjqk]{5}$") >>> displaymatch(valid.match("akt5q")) # Valid. "" >>> displaymatch(valid.match("akt5e")) # Invalid. >>> displaymatch(valid.match("akt")) # Invalid. >>> displaymatch(valid.match("727ak")) # Valid. "" That last hand, ""727ak"", contained a pair, or two of the same valued cards. To match this with a regular expression, one could use backreferences as such: >>> pair = re.compile(r".*(.).*\1") >>> displaymatch(pair.match("717ak")) # Pair of 7s. "" >>> displaymatch(pair.match("718ak")) # No pairs. >>> displaymatch(pair.match("354aa")) # Pair of aces. "" To find out what card the pair consists of, one could use the "group()" method of the match object in the following manner: >>> pair = re.compile(r".*(.).*\1") >>> pair.match("717ak").group(1) '7' # Error because re.match() returns None, which doesn't have a group() method: >>> pair.match("718ak").group(1) Traceback (most recent call last): File "", line 1, in re.match(r".*(.).*\1", "718ak").group(1) AttributeError: 'NoneType' object has no attribute 'group' >>> pair.match("354aa").group(1) 'a' Simulating scanf() ------------------ Python does not currently have an equivalent to "scanf()". Regular expressions are generally more powerful, though also more verbose, than "scanf()" format strings. The table below offers some more-or- less equivalent mappings between "scanf()" format tokens and regular expressions. +----------------------------------+-----------------------------------------------+ | "scanf()" Token | Regular Expression | |==================================|===============================================| | "%c" | "." | +----------------------------------+-----------------------------------------------+ | "%5c" | ".{5}" | +----------------------------------+-----------------------------------------------+ | "%d" | "[-+]?\d+" | +----------------------------------+-----------------------------------------------+ | "%e", "%E", "%f", "%g" | "[-+]?(\d+(\.\d*)?|\.\d+)([eE][-+]?\d+)?" | +----------------------------------+-----------------------------------------------+ | "%i" | "[-+]?(0[xX][\dA-Fa-f]+|0[0-7]*|\d+)" | +----------------------------------+-----------------------------------------------+ | "%o" | "[-+]?[0-7]+" | +----------------------------------+-----------------------------------------------+ | "%s" | "\S+" | +----------------------------------+-----------------------------------------------+ | "%u" | "\d+" | +----------------------------------+-----------------------------------------------+ | "%x", "%X" | "[-+]?(0[xX])?[\dA-Fa-f]+" | +----------------------------------+-----------------------------------------------+ To extract the filename and numbers from a string like /usr/sbin/sendmail - 0 errors, 4 warnings you would use a "scanf()" format like %s - %d errors, %d warnings The equivalent regular expression would be (\S+) - (\d+) errors, (\d+) warnings search() vs. match() -------------------- Python offers two different primitive operations based on regular expressions: "re.match()" checks for a match only at the beginning of the string, while "re.search()" checks for a match anywhere in the string (this is what Perl does by default). For example: >>> re.match("c", "abcdef") # No match >>> re.search("c", "abcdef") # Match Regular expressions beginning with "'^'" can be used with "search()" to restrict the match at the beginning of the string: >>> re.match("c", "abcdef") # No match >>> re.search("^c", "abcdef") # No match >>> re.search("^a", "abcdef") # Match Note however that in "MULTILINE" mode "match()" only matches at the beginning of the string, whereas using "search()" with a regular expression beginning with "'^'" will match at the beginning of each line. >>> re.match('X', 'A\nB\nX', re.MULTILINE) # No match >>> re.search('^X', 'A\nB\nX', re.MULTILINE) # Match Making a Phonebook ------------------ "split()" splits a string into a list delimited by the passed pattern. The method is invaluable for converting textual data into data structures that can be easily read and modified by Python as demonstrated in the following example that creates a phonebook. First, here is the input. Normally it may come from a file, here we are using triple-quoted string syntax >>> text = """Ross McFluff: 834.345.1254 155 Elm Street ... ... Ronald Heathmore: 892.345.3428 436 Finley Avenue ... Frank Burger: 925.541.7625 662 South Dogwood Way ... ... ... Heather Albrecht: 548.326.4584 919 Park Place""" The entries are separated by one or more newlines. Now we convert the string into a list with each nonempty line having its own entry: >>> entries = re.split("\n+", text) >>> entries ['Ross McFluff: 834.345.1254 155 Elm Street', 'Ronald Heathmore: 892.345.3428 436 Finley Avenue', 'Frank Burger: 925.541.7625 662 South Dogwood Way', 'Heather Albrecht: 548.326.4584 919 Park Place'] Finally, split each entry into a list with first name, last name, telephone number, and address. We use the "maxsplit" parameter of "split()" because the address has spaces, our splitting pattern, in it: >>> [re.split(":? ", entry, 3) for entry in entries] [['Ross', 'McFluff', '834.345.1254', '155 Elm Street'], ['Ronald', 'Heathmore', '892.345.3428', '436 Finley Avenue'], ['Frank', 'Burger', '925.541.7625', '662 South Dogwood Way'], ['Heather', 'Albrecht', '548.326.4584', '919 Park Place']] The ":?" pattern matches the colon after the last name, so that it does not occur in the result list. With a "maxsplit" of "4", we could separate the house number from the street name: >>> [re.split(":? ", entry, 4) for entry in entries] [['Ross', 'McFluff', '834.345.1254', '155', 'Elm Street'], ['Ronald', 'Heathmore', '892.345.3428', '436', 'Finley Avenue'], ['Frank', 'Burger', '925.541.7625', '662', 'South Dogwood Way'], ['Heather', 'Albrecht', '548.326.4584', '919', 'Park Place']] Text Munging ------------ "sub()" replaces every occurrence of a pattern with a string or the result of a function. This example demonstrates using "sub()" with a function to “munge” text, or randomize the order of all the characters in each word of a sentence except for the first and last characters: >>> def repl(m): ... inner_word = list(m.group(2)) ... random.shuffle(inner_word) ... return m.group(1) + "".join(inner_word) + m.group(3) >>> text = "Professor Abdolmalek, please report your absences promptly." >>> re.sub(r"(\w)(\w+)(\w)", repl, text) 'Poefsrosr Aealmlobdk, pslaee reorpt your abnseces plmrptoy.' >>> re.sub(r"(\w)(\w+)(\w)", repl, text) 'Pofsroser Aodlambelk, plasee reoprt yuor asnebces potlmrpy.' Finding all Adverbs ------------------- "findall()" matches *all* occurrences of a pattern, not just the first one as "search()" does. For example, if a writer wanted to find all of the adverbs in some text, they might use "findall()" in the following manner: >>> text = "He was carefully disguised but captured quickly by police." >>> re.findall(r"\w+ly\b", text) ['carefully', 'quickly'] Finding all Adverbs and their Positions --------------------------------------- If one wants more information about all matches of a pattern than the matched text, "finditer()" is useful as it provides match objects instead of strings. Continuing with the previous example, if a writer wanted to find all of the adverbs *and their positions* in some text, they would use "finditer()" in the following manner: >>> text = "He was carefully disguised but captured quickly by police." >>> for m in re.finditer(r"\w+ly\b", text): ... print('%02d-%02d: %s' % (m.start(), m.end(), m.group(0))) 07-16: carefully 40-47: quickly Raw String Notation ------------------- Raw string notation ("r"text"") keeps regular expressions sane. Without it, every backslash ("'\'") in a regular expression would have to be prefixed with another one to escape it. For example, the two following lines of code are functionally identical: >>> re.match(r"\W(.)\1\W", " ff ") >>> re.match("\\W(.)\\1\\W", " ff ") When one wants to match a literal backslash, it must be escaped in the regular expression. With raw string notation, this means "r"\\"". Without raw string notation, one must use ""\\\\"", making the following lines of code functionally identical: >>> re.match(r"\\", r"\\") >>> re.match("\\\\", r"\\") Writing a Tokenizer ------------------- A tokenizer or scanner analyzes a string to categorize groups of characters. This is a useful first step in writing a compiler or interpreter. The text categories are specified with regular expressions. The technique is to combine those into a single master regular expression and to loop over successive matches: from typing import NamedTuple import re class Token(NamedTuple): type: str value: str line: int column: int def tokenize(code): keywords = {'IF', 'THEN', 'ENDIF', 'FOR', 'NEXT', 'GOSUB', 'RETURN'} token_specification = [ ('NUMBER', r'\d+(\.\d*)?'), # Integer or decimal number ('ASSIGN', r':='), # Assignment operator ('END', r';'), # Statement terminator ('ID', r'[A-Za-z]+'), # Identifiers ('OP', r'[+\-*/]'), # Arithmetic operators ('NEWLINE', r'\n'), # Line endings ('SKIP', r'[ \t]+'), # Skip over spaces and tabs ('MISMATCH', r'.'), # Any other character ] tok_regex = '|'.join('(?P<%s>%s)' % pair for pair in token_specification) line_num = 1 line_start = 0 for mo in re.finditer(tok_regex, code): kind = mo.lastgroup value = mo.group() column = mo.start() - line_start if kind == 'NUMBER': value = float(value) if '.' in value else int(value) elif kind == 'ID' and value in keywords: kind = value elif kind == 'NEWLINE': line_start = mo.end() line_num += 1 continue elif kind == 'SKIP': continue elif kind == 'MISMATCH': raise RuntimeError(f'{value!r} unexpected on line {line_num}') yield Token(kind, value, line_num, column) statements = ''' IF quantity THEN total := total + price * quantity; tax := price * 0.05; ENDIF; ''' for token in tokenize(statements): print(token) The tokenizer produces the following output: Token(type='IF', value='IF', line=2, column=4) Token(type='ID', value='quantity', line=2, column=7) Token(type='THEN', value='THEN', line=2, column=16) Token(type='ID', value='total', line=3, column=8) Token(type='ASSIGN', value=':=', line=3, column=14) Token(type='ID', value='total', line=3, column=17) Token(type='OP', value='+', line=3, column=23) Token(type='ID', value='price', line=3, column=25) Token(type='OP', value='*', line=3, column=31) Token(type='ID', value='quantity', line=3, column=33) Token(type='END', value=';', line=3, column=41) Token(type='ID', value='tax', line=4, column=8) Token(type='ASSIGN', value=':=', line=4, column=12) Token(type='ID', value='price', line=4, column=15) Token(type='OP', value='*', line=4, column=21) Token(type='NUMBER', value=0.05, line=4, column=23) Token(type='END', value=';', line=4, column=27) Token(type='ENDIF', value='ENDIF', line=5, column=4) Token(type='END', value=';', line=5, column=9) [Frie09] Friedl, Jeffrey. Mastering Regular Expressions. 3rd ed., O’Reilly Media, 2009. The third edition of the book no longer covers Python at all, but the first edition covered writing good regular expression patterns in great detail. "readline" — GNU readline interface *********************************** ====================================================================== The "readline" module defines a number of functions to facilitate completion and reading/writing of history files from the Python interpreter. This module can be used directly, or via the "rlcompleter" module, which supports completion of Python identifiers at the interactive prompt. Settings made using this module affect the behaviour of both the interpreter’s interactive prompt and the prompts offered by the built-in "input()" function. Readline keybindings may be configured via an initialization file, typically ".inputrc" in your home directory. See Readline Init File in the GNU Readline manual for information about the format and allowable constructs of that file, and the capabilities of the Readline library in general. Note: The underlying Readline library API may be implemented by the "libedit" library instead of GNU readline. On macOS the "readline" module detects which library is being used at run time.The configuration file for "libedit" is different from that of GNU readline. If you programmatically load configuration strings you can check for the text “libedit” in "readline.__doc__" to differentiate between GNU readline and libedit.If you use *editline*/"libedit" readline emulation on macOS, the initialization file located in your home directory is named ".editrc". For example, the following content in "~/.editrc" will turn ON *vi* keybindings and TAB completion: python:bind -v python:bind ^I rl_complete Init file ========= The following functions relate to the init file and user configuration: readline.parse_and_bind(string) Execute the init line provided in the *string* argument. This calls "rl_parse_and_bind()" in the underlying library. readline.read_init_file([filename]) Execute a readline initialization file. The default filename is the last filename used. This calls "rl_read_init_file()" in the underlying library. Line buffer =========== The following functions operate on the line buffer: readline.get_line_buffer() Return the current contents of the line buffer ("rl_line_buffer" in the underlying library). readline.insert_text(string) Insert text into the line buffer at the cursor position. This calls "rl_insert_text()" in the underlying library, but ignores the return value. readline.redisplay() Change what’s displayed on the screen to reflect the current contents of the line buffer. This calls "rl_redisplay()" in the underlying library. History file ============ The following functions operate on a history file: readline.read_history_file([filename]) Load a readline history file, and append it to the history list. The default filename is "~/.history". This calls "read_history()" in the underlying library. readline.write_history_file([filename]) Save the history list to a readline history file, overwriting any existing file. The default filename is "~/.history". This calls "write_history()" in the underlying library. readline.append_history_file(nelements[, filename]) Append the last *nelements* items of history to a file. The default filename is "~/.history". The file must already exist. This calls "append_history()" in the underlying library. This function only exists if Python was compiled for a version of the library that supports it. New in version 3.5. readline.get_history_length() readline.set_history_length(length) Set or return the desired number of lines to save in the history file. The "write_history_file()" function uses this value to truncate the history file, by calling "history_truncate_file()" in the underlying library. Negative values imply unlimited history file size. History list ============ The following functions operate on a global history list: readline.clear_history() Clear the current history. This calls "clear_history()" in the underlying library. The Python function only exists if Python was compiled for a version of the library that supports it. readline.get_current_history_length() Return the number of items currently in the history. (This is different from "get_history_length()", which returns the maximum number of lines that will be written to a history file.) readline.get_history_item(index) Return the current contents of history item at *index*. The item index is one-based. This calls "history_get()" in the underlying library. readline.remove_history_item(pos) Remove history item specified by its position from the history. The position is zero-based. This calls "remove_history()" in the underlying library. readline.replace_history_item(pos, line) Replace history item specified by its position with *line*. The position is zero-based. This calls "replace_history_entry()" in the underlying library. readline.add_history(line) Append *line* to the history buffer, as if it was the last line typed. This calls "add_history()" in the underlying library. readline.set_auto_history(enabled) Enable or disable automatic calls to "add_history()" when reading input via readline. The *enabled* argument should be a Boolean value that when true, enables auto history, and that when false, disables auto history. New in version 3.6. **CPython implementation detail:** Auto history is enabled by default, and changes to this do not persist across multiple sessions. Startup hooks ============= readline.set_startup_hook([function]) Set or remove the function invoked by the "rl_startup_hook" callback of the underlying library. If *function* is specified, it will be used as the new hook function; if omitted or "None", any function already installed is removed. The hook is called with no arguments just before readline prints the first prompt. readline.set_pre_input_hook([function]) Set or remove the function invoked by the "rl_pre_input_hook" callback of the underlying library. If *function* is specified, it will be used as the new hook function; if omitted or "None", any function already installed is removed. The hook is called with no arguments after the first prompt has been printed and just before readline starts reading input characters. This function only exists if Python was compiled for a version of the library that supports it. Completion ========== The following functions relate to implementing a custom word completion function. This is typically operated by the Tab key, and can suggest and automatically complete a word being typed. By default, Readline is set up to be used by "rlcompleter" to complete Python identifiers for the interactive interpreter. If the "readline" module is to be used with a custom completer, a different set of word delimiters should be set. readline.set_completer([function]) Set or remove the completer function. If *function* is specified, it will be used as the new completer function; if omitted or "None", any completer function already installed is removed. The completer function is called as "function(text, state)", for *state* in "0", "1", "2", …, until it returns a non-string value. It should return the next possible completion starting with *text*. The installed completer function is invoked by the *entry_func* callback passed to "rl_completion_matches()" in the underlying library. The *text* string comes from the first parameter to the "rl_attempted_completion_function" callback of the underlying library. readline.get_completer() Get the completer function, or "None" if no completer function has been set. readline.get_completion_type() Get the type of completion being attempted. This returns the "rl_completion_type" variable in the underlying library as an integer. readline.get_begidx() readline.get_endidx() Get the beginning or ending index of the completion scope. These indexes are the *start* and *end* arguments passed to the "rl_attempted_completion_function" callback of the underlying library. readline.set_completer_delims(string) readline.get_completer_delims() Set or get the word delimiters for completion. These determine the start of the word to be considered for completion (the completion scope). These functions access the "rl_completer_word_break_characters" variable in the underlying library. readline.set_completion_display_matches_hook([function]) Set or remove the completion display function. If *function* is specified, it will be used as the new completion display function; if omitted or "None", any completion display function already installed is removed. This sets or clears the "rl_completion_display_matches_hook" callback in the underlying library. The completion display function is called as "function(substitution, [matches], longest_match_length)" once each time matches need to be displayed. Example ======= The following example demonstrates how to use the "readline" module’s history reading and writing functions to automatically load and save a history file named ".python_history" from the user’s home directory. The code below would normally be executed automatically during interactive sessions from the user’s "PYTHONSTARTUP" file. import atexit import os import readline histfile = os.path.join(os.path.expanduser("~"), ".python_history") try: readline.read_history_file(histfile) # default history len is -1 (infinite), which may grow unruly readline.set_history_length(1000) except FileNotFoundError: pass atexit.register(readline.write_history_file, histfile) This code is actually automatically run when Python is run in interactive mode (see Readline configuration). The following example achieves the same goal but supports concurrent interactive sessions, by only appending the new history. import atexit import os import readline histfile = os.path.join(os.path.expanduser("~"), ".python_history") try: readline.read_history_file(histfile) h_len = readline.get_current_history_length() except FileNotFoundError: open(histfile, 'wb').close() h_len = 0 def save(prev_h_len, histfile): new_h_len = readline.get_current_history_length() readline.set_history_length(1000) readline.append_history_file(new_h_len - prev_h_len, histfile) atexit.register(save, h_len, histfile) The following example extends the "code.InteractiveConsole" class to support history save/restore. import atexit import code import os import readline class HistoryConsole(code.InteractiveConsole): def __init__(self, locals=None, filename="", histfile=os.path.expanduser("~/.console-history")): code.InteractiveConsole.__init__(self, locals, filename) self.init_history(histfile) def init_history(self, histfile): readline.parse_and_bind("tab: complete") if hasattr(readline, "read_history_file"): try: readline.read_history_file(histfile) except FileNotFoundError: pass atexit.register(self.save_history, histfile) def save_history(self, histfile): readline.set_history_length(1000) readline.write_history_file(histfile) "reprlib" — Alternate "repr()" implementation ********************************************* **Source code:** Lib/reprlib.py ====================================================================== The "reprlib" module provides a means for producing object representations with limits on the size of the resulting strings. This is used in the Python debugger and may be useful in other contexts as well. This module provides a class, an instance, and a function: class reprlib.Repr Class which provides formatting services useful in implementing functions similar to the built-in "repr()"; size limits for different object types are added to avoid the generation of representations which are excessively long. reprlib.aRepr This is an instance of "Repr" which is used to provide the "repr()" function described below. Changing the attributes of this object will affect the size limits used by "repr()" and the Python debugger. reprlib.repr(obj) This is the "repr()" method of "aRepr". It returns a string similar to that returned by the built-in function of the same name, but with limits on most sizes. In addition to size-limiting tools, the module also provides a decorator for detecting recursive calls to "__repr__()" and substituting a placeholder string instead. @reprlib.recursive_repr(fillvalue="...") Decorator for "__repr__()" methods to detect recursive calls within the same thread. If a recursive call is made, the *fillvalue* is returned, otherwise, the usual "__repr__()" call is made. For example: >>> from reprlib import recursive_repr >>> class MyList(list): ... @recursive_repr() ... def __repr__(self): ... return '<' + '|'.join(map(repr, self)) + '>' ... >>> m = MyList('abc') >>> m.append(m) >>> m.append('x') >>> print(m) <'a'|'b'|'c'|...|'x'> New in version 3.2. Repr Objects ============ "Repr" instances provide several attributes which can be used to provide size limits for the representations of different object types, and methods which format specific object types. Repr.maxlevel Depth limit on the creation of recursive representations. The default is "6". Repr.maxdict Repr.maxlist Repr.maxtuple Repr.maxset Repr.maxfrozenset Repr.maxdeque Repr.maxarray Limits on the number of entries represented for the named object type. The default is "4" for "maxdict", "5" for "maxarray", and "6" for the others. Repr.maxlong Maximum number of characters in the representation for an integer. Digits are dropped from the middle. The default is "40". Repr.maxstring Limit on the number of characters in the representation of the string. Note that the “normal” representation of the string is used as the character source: if escape sequences are needed in the representation, these may be mangled when the representation is shortened. The default is "30". Repr.maxother This limit is used to control the size of object types for which no specific formatting method is available on the "Repr" object. It is applied in a similar manner as "maxstring". The default is "20". Repr.repr(obj) The equivalent to the built-in "repr()" that uses the formatting imposed by the instance. Repr.repr1(obj, level) Recursive implementation used by "repr()". This uses the type of *obj* to determine which formatting method to call, passing it *obj* and *level*. The type-specific methods should call "repr1()" to perform recursive formatting, with "level - 1" for the value of *level* in the recursive call. Repr.repr_TYPE(obj, level) Formatting methods for specific types are implemented as methods with a name based on the type name. In the method name, **TYPE** is replaced by "'_'.join(type(obj).__name__.split())". Dispatch to these methods is handled by "repr1()". Type-specific methods which need to recursively format a value should call "self.repr1(subobj, level - 1)". Subclassing Repr Objects ======================== The use of dynamic dispatching by "Repr.repr1()" allows subclasses of "Repr" to add support for additional built-in object types or to modify the handling of types already supported. This example shows how special support for file objects could be added: import reprlib import sys class MyRepr(reprlib.Repr): def repr_TextIOWrapper(self, obj, level): if obj.name in {'', '', ''}: return obj.name return repr(obj) aRepr = MyRepr() print(aRepr.repr(sys.stdin)) # prints '' "resource" — Resource usage information *************************************** ====================================================================== This module provides basic mechanisms for measuring and controlling system resources utilized by a program. Symbolic constants are used to specify particular system resources and to request usage information about either the current process or its children. An "OSError" is raised on syscall failure. exception resource.error A deprecated alias of "OSError". Changed in version 3.3: Following **PEP 3151**, this class was made an alias of "OSError". Resource Limits =============== Resources usage can be limited using the "setrlimit()" function described below. Each resource is controlled by a pair of limits: a soft limit and a hard limit. The soft limit is the current limit, and may be lowered or raised by a process over time. The soft limit can never exceed the hard limit. The hard limit can be lowered to any value greater than the soft limit, but not raised. (Only processes with the effective UID of the super-user can raise a hard limit.) The specific resources that can be limited are system dependent. They are described in the *getrlimit(2)* man page. The resources listed below are supported when the underlying operating system supports them; resources which cannot be checked or controlled by the operating system are not defined in this module for those platforms. resource.RLIM_INFINITY Constant used to represent the limit for an unlimited resource. resource.getrlimit(resource) Returns a tuple "(soft, hard)" with the current soft and hard limits of *resource*. Raises "ValueError" if an invalid resource is specified, or "error" if the underlying system call fails unexpectedly. resource.setrlimit(resource, limits) Sets new limits of consumption of *resource*. The *limits* argument must be a tuple "(soft, hard)" of two integers describing the new limits. A value of "RLIM_INFINITY" can be used to request a limit that is unlimited. Raises "ValueError" if an invalid resource is specified, if the new soft limit exceeds the hard limit, or if a process tries to raise its hard limit. Specifying a limit of "RLIM_INFINITY" when the hard or system limit for that resource is not unlimited will result in a "ValueError". A process with the effective UID of super-user can request any valid limit value, including unlimited, but "ValueError" will still be raised if the requested limit exceeds the system imposed limit. "setrlimit" may also raise "error" if the underlying system call fails. VxWorks only supports setting "RLIMIT_NOFILE". Raises an auditing event "resource.setrlimit" with arguments "resource", "limits". resource.prlimit(pid, resource[, limits]) Combines "setrlimit()" and "getrlimit()" in one function and supports to get and set the resources limits of an arbitrary process. If *pid* is 0, then the call applies to the current process. *resource* and *limits* have the same meaning as in "setrlimit()", except that *limits* is optional. When *limits* is not given the function returns the *resource* limit of the process *pid*. When *limits* is given the *resource* limit of the process is set and the former resource limit is returned. Raises "ProcessLookupError" when *pid* can’t be found and "PermissionError" when the user doesn’t have "CAP_SYS_RESOURCE" for the process. Raises an auditing event "resource.prlimit" with arguments "pid", "resource", "limits". Availability: Linux 2.6.36 or later with glibc 2.13 or later. New in version 3.4. These symbols define resources whose consumption can be controlled using the "setrlimit()" and "getrlimit()" functions described below. The values of these symbols are exactly the constants used by C programs. The Unix man page for *getrlimit(2)* lists the available resources. Note that not all systems use the same symbol or same value to denote the same resource. This module does not attempt to mask platform differences — symbols not defined for a platform will not be available from this module on that platform. resource.RLIMIT_CORE The maximum size (in bytes) of a core file that the current process can create. This may result in the creation of a partial core file if a larger core would be required to contain the entire process image. resource.RLIMIT_CPU The maximum amount of processor time (in seconds) that a process can use. If this limit is exceeded, a "SIGXCPU" signal is sent to the process. (See the "signal" module documentation for information about how to catch this signal and do something useful, e.g. flush open files to disk.) resource.RLIMIT_FSIZE The maximum size of a file which the process may create. resource.RLIMIT_DATA The maximum size (in bytes) of the process’s heap. resource.RLIMIT_STACK The maximum size (in bytes) of the call stack for the current process. This only affects the stack of the main thread in a multi-threaded process. resource.RLIMIT_RSS The maximum resident set size that should be made available to the process. resource.RLIMIT_NPROC The maximum number of processes the current process may create. resource.RLIMIT_NOFILE The maximum number of open file descriptors for the current process. resource.RLIMIT_OFILE The BSD name for "RLIMIT_NOFILE". resource.RLIMIT_MEMLOCK The maximum address space which may be locked in memory. resource.RLIMIT_VMEM The largest area of mapped memory which the process may occupy. resource.RLIMIT_AS The maximum area (in bytes) of address space which may be taken by the process. resource.RLIMIT_MSGQUEUE The number of bytes that can be allocated for POSIX message queues. Availability: Linux 2.6.8 or later. New in version 3.4. resource.RLIMIT_NICE The ceiling for the process’s nice level (calculated as 20 - rlim_cur). Availability: Linux 2.6.12 or later. New in version 3.4. resource.RLIMIT_RTPRIO The ceiling of the real-time priority. Availability: Linux 2.6.12 or later. New in version 3.4. resource.RLIMIT_RTTIME The time limit (in microseconds) on CPU time that a process can spend under real-time scheduling without making a blocking syscall. Availability: Linux 2.6.25 or later. New in version 3.4. resource.RLIMIT_SIGPENDING The number of signals which the process may queue. Availability: Linux 2.6.8 or later. New in version 3.4. resource.RLIMIT_SBSIZE The maximum size (in bytes) of socket buffer usage for this user. This limits the amount of network memory, and hence the amount of mbufs, that this user may hold at any time. Availability: FreeBSD 9 or later. New in version 3.4. resource.RLIMIT_SWAP The maximum size (in bytes) of the swap space that may be reserved or used by all of this user id’s processes. This limit is enforced only if bit 1 of the vm.overcommit sysctl is set. Please see tuning(7) for a complete description of this sysctl. Availability: FreeBSD 9 or later. New in version 3.4. resource.RLIMIT_NPTS The maximum number of pseudo-terminals created by this user id. Availability: FreeBSD 9 or later. New in version 3.4. Resource Usage ============== These functions are used to retrieve resource usage information: resource.getrusage(who) This function returns an object that describes the resources consumed by either the current process or its children, as specified by the *who* parameter. The *who* parameter should be specified using one of the "RUSAGE_*" constants described below. A simple example: from resource import * import time # a non CPU-bound task time.sleep(3) print(getrusage(RUSAGE_SELF)) # a CPU-bound task for i in range(10 ** 8): _ = 1 + 1 print(getrusage(RUSAGE_SELF)) The fields of the return value each describe how a particular system resource has been used, e.g. amount of time spent running is user mode or number of times the process was swapped out of main memory. Some values are dependent on the clock tick internal, e.g. the amount of memory the process is using. For backward compatibility, the return value is also accessible as a tuple of 16 elements. The fields "ru_utime" and "ru_stime" of the return value are floating point values representing the amount of time spent executing in user mode and the amount of time spent executing in system mode, respectively. The remaining values are integers. Consult the *getrusage(2)* man page for detailed information about these values. A brief summary is presented here: +----------+-----------------------+-----------------------------------------+ | Index | Field | Resource | |==========|=======================|=========================================| | "0" | "ru_utime" | time in user mode (float seconds) | +----------+-----------------------+-----------------------------------------+ | "1" | "ru_stime" | time in system mode (float seconds) | +----------+-----------------------+-----------------------------------------+ | "2" | "ru_maxrss" | maximum resident set size | +----------+-----------------------+-----------------------------------------+ | "3" | "ru_ixrss" | shared memory size | +----------+-----------------------+-----------------------------------------+ | "4" | "ru_idrss" | unshared memory size | +----------+-----------------------+-----------------------------------------+ | "5" | "ru_isrss" | unshared stack size | +----------+-----------------------+-----------------------------------------+ | "6" | "ru_minflt" | page faults not requiring I/O | +----------+-----------------------+-----------------------------------------+ | "7" | "ru_majflt" | page faults requiring I/O | +----------+-----------------------+-----------------------------------------+ | "8" | "ru_nswap" | number of swap outs | +----------+-----------------------+-----------------------------------------+ | "9" | "ru_inblock" | block input operations | +----------+-----------------------+-----------------------------------------+ | "10" | "ru_oublock" | block output operations | +----------+-----------------------+-----------------------------------------+ | "11" | "ru_msgsnd" | messages sent | +----------+-----------------------+-----------------------------------------+ | "12" | "ru_msgrcv" | messages received | +----------+-----------------------+-----------------------------------------+ | "13" | "ru_nsignals" | signals received | +----------+-----------------------+-----------------------------------------+ | "14" | "ru_nvcsw" | voluntary context switches | +----------+-----------------------+-----------------------------------------+ | "15" | "ru_nivcsw" | involuntary context switches | +----------+-----------------------+-----------------------------------------+ This function will raise a "ValueError" if an invalid *who* parameter is specified. It may also raise "error" exception in unusual circumstances. resource.getpagesize() Returns the number of bytes in a system page. (This need not be the same as the hardware page size.) The following "RUSAGE_*" symbols are passed to the "getrusage()" function to specify which processes information should be provided for. resource.RUSAGE_SELF Pass to "getrusage()" to request resources consumed by the calling process, which is the sum of resources used by all threads in the process. resource.RUSAGE_CHILDREN Pass to "getrusage()" to request resources consumed by child processes of the calling process which have been terminated and waited for. resource.RUSAGE_BOTH Pass to "getrusage()" to request resources consumed by both the current process and child processes. May not be available on all systems. resource.RUSAGE_THREAD Pass to "getrusage()" to request resources consumed by the current thread. May not be available on all systems. New in version 3.2. "rlcompleter" — Completion function for GNU readline **************************************************** **Source code:** Lib/rlcompleter.py ====================================================================== The "rlcompleter" module defines a completion function suitable for the "readline" module by completing valid Python identifiers and keywords. When this module is imported on a Unix platform with the "readline" module available, an instance of the "Completer" class is automatically created and its "complete()" method is set as the "readline" completer. Example: >>> import rlcompleter >>> import readline >>> readline.parse_and_bind("tab: complete") >>> readline. readline.__doc__ readline.get_line_buffer( readline.read_init_file( readline.__file__ readline.insert_text( readline.set_completer( readline.__name__ readline.parse_and_bind( >>> readline. The "rlcompleter" module is designed for use with Python’s interactive mode. Unless Python is run with the "-S" option, the module is automatically imported and configured (see Readline configuration). On platforms without "readline", the "Completer" class defined by this module can still be used for custom purposes. Completer Objects ================= Completer objects have the following method: Completer.complete(text, state) Return the *state*th completion for *text*. If called for *text* that doesn’t include a period character ("'.'"), it will complete from names currently defined in "__main__", "builtins" and keywords (as defined by the "keyword" module). If called for a dotted name, it will try to evaluate anything without obvious side-effects (functions will not be evaluated, but it can generate calls to "__getattr__()") up to the last part, and find matches for the rest via the "dir()" function. Any exception raised during the evaluation of the expression is caught, silenced and "None" is returned. "runpy" — Locating and executing Python modules *********************************************** **Source code:** Lib/runpy.py ====================================================================== The "runpy" module is used to locate and run Python modules without importing them first. Its main use is to implement the "-m" command line switch that allows scripts to be located using the Python module namespace rather than the filesystem. Note that this is *not* a sandbox module - all code is executed in the current process, and any side effects (such as cached imports of other modules) will remain in place after the functions have returned. Furthermore, any functions and classes defined by the executed code are not guaranteed to work correctly after a "runpy" function has returned. If that limitation is not acceptable for a given use case, "importlib" is likely to be a more suitable choice than this module. The "runpy" module provides two functions: runpy.run_module(mod_name, init_globals=None, run_name=None, alter_sys=False) Execute the code of the specified module and return the resulting module globals dictionary. The module’s code is first located using the standard import mechanism (refer to **PEP 302** for details) and then executed in a fresh module namespace. The *mod_name* argument should be an absolute module name. If the module name refers to a package rather than a normal module, then that package is imported and the "__main__" submodule within that package is then executed and the resulting module globals dictionary returned. The optional dictionary argument *init_globals* may be used to pre- populate the module’s globals dictionary before the code is executed. The supplied dictionary will not be modified. If any of the special global variables below are defined in the supplied dictionary, those definitions are overridden by "run_module()". The special global variables "__name__", "__spec__", "__file__", "__cached__", "__loader__" and "__package__" are set in the globals dictionary before the module code is executed (Note that this is a minimal set of variables - other variables may be set implicitly as an interpreter implementation detail). "__name__" is set to *run_name* if this optional argument is not "None", to "mod_name + '.__main__'" if the named module is a package and to the *mod_name* argument otherwise. "__spec__" will be set appropriately for the *actually* imported module (that is, "__spec__.name" will always be *mod_name* or "mod_name + '.__main__", never *run_name*). "__file__", "__cached__", "__loader__" and "__package__" are set as normal based on the module spec. If the argument *alter_sys* is supplied and evaluates to "True", then "sys.argv[0]" is updated with the value of "__file__" and "sys.modules[__name__]" is updated with a temporary module object for the module being executed. Both "sys.argv[0]" and "sys.modules[__name__]" are restored to their original values before the function returns. Note that this manipulation of "sys" is not thread-safe. Other threads may see the partially initialised module, as well as the altered list of arguments. It is recommended that the "sys" module be left alone when invoking this function from threaded code. See also: The "-m" option offering equivalent functionality from the command line. Changed in version 3.1: Added ability to execute packages by looking for a "__main__" submodule. Changed in version 3.2: Added "__cached__" global variable (see **PEP 3147**). Changed in version 3.4: Updated to take advantage of the module spec feature added by **PEP 451**. This allows "__cached__" to be set correctly for modules run this way, as well as ensuring the real module name is always accessible as "__spec__.name". runpy.run_path(path_name, init_globals=None, run_name=None) Execute the code at the named filesystem location and return the resulting module globals dictionary. As with a script name supplied to the CPython command line, the supplied path may refer to a Python source file, a compiled bytecode file or a valid sys.path entry containing a "__main__" module (e.g. a zipfile containing a top-level "__main__.py" file). For a simple script, the specified code is simply executed in a fresh module namespace. For a valid sys.path entry (typically a zipfile or directory), the entry is first added to the beginning of "sys.path". The function then looks for and executes a "__main__" module using the updated path. Note that there is no special protection against invoking an existing "__main__" entry located elsewhere on "sys.path" if there is no such module at the specified location. The optional dictionary argument *init_globals* may be used to pre- populate the module’s globals dictionary before the code is executed. The supplied dictionary will not be modified. If any of the special global variables below are defined in the supplied dictionary, those definitions are overridden by "run_path()". The special global variables "__name__", "__spec__", "__file__", "__cached__", "__loader__" and "__package__" are set in the globals dictionary before the module code is executed (Note that this is a minimal set of variables - other variables may be set implicitly as an interpreter implementation detail). "__name__" is set to *run_name* if this optional argument is not "None" and to "''" otherwise. If the supplied path directly references a script file (whether as source or as precompiled byte code), then "__file__" will be set to the supplied path, and "__spec__", "__cached__", "__loader__" and "__package__" will all be set to "None". If the supplied path is a reference to a valid sys.path entry, then "__spec__" will be set appropriately for the imported "__main__" module (that is, "__spec__.name" will always be "__main__"). "__file__", "__cached__", "__loader__" and "__package__" will be set as normal based on the module spec. A number of alterations are also made to the "sys" module. Firstly, "sys.path" may be altered as described above. "sys.argv[0]" is updated with the value of "path_name" and "sys.modules[__name__]" is updated with a temporary module object for the module being executed. All modifications to items in "sys" are reverted before the function returns. Note that, unlike "run_module()", the alterations made to "sys" are not optional in this function as these adjustments are essential to allowing the execution of sys.path entries. As the thread-safety limitations still apply, use of this function in threaded code should be either serialised with the import lock or delegated to a separate process. See also: Interface options for equivalent functionality on the command line ("python path/to/script"). New in version 3.2. Changed in version 3.4: Updated to take advantage of the module spec feature added by **PEP 451**. This allows "__cached__" to be set correctly in the case where "__main__" is imported from a valid sys.path entry rather than being executed directly. See also: **PEP 338** – Executing modules as scripts PEP written and implemented by Nick Coghlan. **PEP 366** – Main module explicit relative imports PEP written and implemented by Nick Coghlan. **PEP 451** – A ModuleSpec Type for the Import System PEP written and implemented by Eric Snow Command line and environment - CPython command line details The "importlib.import_module()" function "sched" — Event scheduler ************************* **Source code:** Lib/sched.py ====================================================================== The "sched" module defines a class which implements a general purpose event scheduler: class sched.scheduler(timefunc=time.monotonic, delayfunc=time.sleep) The "scheduler" class defines a generic interface to scheduling events. It needs two functions to actually deal with the “outside world” — *timefunc* should be callable without arguments, and return a number (the “time”, in any units whatsoever). The *delayfunc* function should be callable with one argument, compatible with the output of *timefunc*, and should delay that many time units. *delayfunc* will also be called with the argument "0" after each event is run to allow other threads an opportunity to run in multi-threaded applications. Changed in version 3.3: *timefunc* and *delayfunc* parameters are optional. Changed in version 3.3: "scheduler" class can be safely used in multi-threaded environments. Example: >>> import sched, time >>> s = sched.scheduler(time.time, time.sleep) >>> def print_time(a='default'): ... print("From print_time", time.time(), a) ... >>> def print_some_times(): ... print(time.time()) ... s.enter(10, 1, print_time) ... s.enter(5, 2, print_time, argument=('positional',)) ... s.enter(5, 1, print_time, kwargs={'a': 'keyword'}) ... s.run() ... print(time.time()) ... >>> print_some_times() 930343690.257 From print_time 930343695.274 positional From print_time 930343695.275 keyword From print_time 930343700.273 default 930343700.276 Scheduler Objects ================= "scheduler" instances have the following methods and attributes: scheduler.enterabs(time, priority, action, argument=(), kwargs={}) Schedule a new event. The *time* argument should be a numeric type compatible with the return value of the *timefunc* function passed to the constructor. Events scheduled for the same *time* will be executed in the order of their *priority*. A lower number represents a higher priority. Executing the event means executing "action(*argument, **kwargs)". *argument* is a sequence holding the positional arguments for *action*. *kwargs* is a dictionary holding the keyword arguments for *action*. Return value is an event which may be used for later cancellation of the event (see "cancel()"). Changed in version 3.3: *argument* parameter is optional. Changed in version 3.3: *kwargs* parameter was added. scheduler.enter(delay, priority, action, argument=(), kwargs={}) Schedule an event for *delay* more time units. Other than the relative time, the other arguments, the effect and the return value are the same as those for "enterabs()". Changed in version 3.3: *argument* parameter is optional. Changed in version 3.3: *kwargs* parameter was added. scheduler.cancel(event) Remove the event from the queue. If *event* is not an event currently in the queue, this method will raise a "ValueError". scheduler.empty() Return "True" if the event queue is empty. scheduler.run(blocking=True) Run all scheduled events. This method will wait (using the "delayfunc()" function passed to the constructor) for the next event, then execute it and so on until there are no more scheduled events. If *blocking* is false executes the scheduled events due to expire soonest (if any) and then return the deadline of the next scheduled call in the scheduler (if any). Either *action* or *delayfunc* can raise an exception. In either case, the scheduler will maintain a consistent state and propagate the exception. If an exception is raised by *action*, the event will not be attempted in future calls to "run()". If a sequence of events takes longer to run than the time available before the next event, the scheduler will simply fall behind. No events will be dropped; the calling code is responsible for canceling events which are no longer pertinent. Changed in version 3.3: *blocking* parameter was added. scheduler.queue Read-only attribute returning a list of upcoming events in the order they will be run. Each event is shown as a *named tuple* with the following fields: time, priority, action, argument, kwargs. "secrets" — Generate secure random numbers for managing secrets *************************************************************** New in version 3.6. **Source code:** Lib/secrets.py ====================================================================== The "secrets" module is used for generating cryptographically strong random numbers suitable for managing data such as passwords, account authentication, security tokens, and related secrets. In particular, "secrets" should be used in preference to the default pseudo-random number generator in the "random" module, which is designed for modelling and simulation, not security or cryptography. See also: **PEP 506** Random numbers ============== The "secrets" module provides access to the most secure source of randomness that your operating system provides. class secrets.SystemRandom A class for generating random numbers using the highest-quality sources provided by the operating system. See "random.SystemRandom" for additional details. secrets.choice(sequence) Return a randomly-chosen element from a non-empty sequence. secrets.randbelow(n) Return a random int in the range [0, *n*). secrets.randbits(k) Return an int with *k* random bits. Generating tokens ================= The "secrets" module provides functions for generating secure tokens, suitable for applications such as password resets, hard-to-guess URLs, and similar. secrets.token_bytes([nbytes=None]) Return a random byte string containing *nbytes* number of bytes. If *nbytes* is "None" or not supplied, a reasonable default is used. >>> token_bytes(16) b'\xebr\x17D*t\xae\xd4\xe3S\xb6\xe2\xebP1\x8b' secrets.token_hex([nbytes=None]) Return a random text string, in hexadecimal. The string has *nbytes* random bytes, each byte converted to two hex digits. If *nbytes* is "None" or not supplied, a reasonable default is used. >>> token_hex(16) 'f9bf78b9a18ce6d46a0cd2b0b86df9da' secrets.token_urlsafe([nbytes=None]) Return a random URL-safe text string, containing *nbytes* random bytes. The text is Base64 encoded, so on average each byte results in approximately 1.3 characters. If *nbytes* is "None" or not supplied, a reasonable default is used. >>> token_urlsafe(16) 'Drmhze6EPcv0fN_81Bj-nA' How many bytes should tokens use? --------------------------------- To be secure against brute-force attacks, tokens need to have sufficient randomness. Unfortunately, what is considered sufficient will necessarily increase as computers get more powerful and able to make more guesses in a shorter period. As of 2015, it is believed that 32 bytes (256 bits) of randomness is sufficient for the typical use-case expected for the "secrets" module. For those who want to manage their own token length, you can explicitly specify how much randomness is used for tokens by giving an "int" argument to the various "token_*" functions. That argument is taken as the number of bytes of randomness to use. Otherwise, if no argument is provided, or if the argument is "None", the "token_*" functions will use a reasonable default instead. Note: That default is subject to change at any time, including during maintenance releases. Other functions =============== secrets.compare_digest(a, b) Return "True" if strings *a* and *b* are equal, otherwise "False", in such a way as to reduce the risk of timing attacks. See "hmac.compare_digest()" for additional details. Recipes and best practices ========================== This section shows recipes and best practices for using "secrets" to manage a basic level of security. Generate an eight-character alphanumeric password: import string import secrets alphabet = string.ascii_letters + string.digits password = ''.join(secrets.choice(alphabet) for i in range(8)) Note: Applications should not store passwords in a recoverable format, whether plain text or encrypted. They should be salted and hashed using a cryptographically-strong one-way (irreversible) hash function. Generate a ten-character alphanumeric password with at least one lowercase character, at least one uppercase character, and at least three digits: import string import secrets alphabet = string.ascii_letters + string.digits while True: password = ''.join(secrets.choice(alphabet) for i in range(10)) if (any(c.islower() for c in password) and any(c.isupper() for c in password) and sum(c.isdigit() for c in password) >= 3): break Generate an XKCD-style passphrase: import secrets # On standard Linux systems, use a convenient dictionary file. # Other platforms may need to provide their own word-list. with open('/usr/share/dict/words') as f: words = [word.strip() for word in f] password = ' '.join(secrets.choice(words) for i in range(4)) Generate a hard-to-guess temporary URL containing a security token suitable for password recovery applications: import secrets url = 'https://example.com/reset=' + secrets.token_urlsafe() Security Considerations *********************** The following modules have specific security considerations: * "cgi": CGI security considerations * "hashlib": all constructors take a “usedforsecurity” keyword-only argument disabling known insecure and blocked algorithms * "http.server" is not suitable for production use, only implementing basic security checks. See the security considerations. * "logging": Logging configuration uses eval() * "multiprocessing": Connection.recv() uses pickle * "pickle": Restricting globals in pickle * "random" shouldn’t be used for security purposes, use "secrets" instead * "shelve": shelve is based on pickle and thus unsuitable for dealing with untrusted sources * "ssl": SSL/TLS security considerations * "subprocess": Subprocess security considerations * "tempfile": mktemp is deprecated due to vulnerability to race conditions * "xml": XML vulnerabilities * "zipfile": maliciously prepared .zip files can cause disk volume exhaustion "select" — Waiting for I/O completion ************************************* ====================================================================== This module provides access to the "select()" and "poll()" functions available in most operating systems, "devpoll()" available on Solaris and derivatives, "epoll()" available on Linux 2.5+ and "kqueue()" available on most BSD. Note that on Windows, it only works for sockets; on other operating systems, it also works for other file types (in particular, on Unix, it works on pipes). It cannot be used on regular files to determine whether a file has grown since it was last read. Note: The "selectors" module allows high-level and efficient I/O multiplexing, built upon the "select" module primitives. Users are encouraged to use the "selectors" module instead, unless they want precise control over the OS-level primitives used. The module defines the following: exception select.error A deprecated alias of "OSError". Changed in version 3.3: Following **PEP 3151**, this class was made an alias of "OSError". select.devpoll() (Only supported on Solaris and derivatives.) Returns a "/dev/poll" polling object; see section /dev/poll Polling Objects below for the methods supported by devpoll objects. "devpoll()" objects are linked to the number of file descriptors allowed at the time of instantiation. If your program reduces this value, "devpoll()" will fail. If your program increases this value, "devpoll()" may return an incomplete list of active file descriptors. The new file descriptor is non-inheritable. New in version 3.3. Changed in version 3.4: The new file descriptor is now non- inheritable. select.epoll(sizehint=-1, flags=0) (Only supported on Linux 2.5.44 and newer.) Return an edge polling object, which can be used as Edge or Level Triggered interface for I/O events. *sizehint* informs epoll about the expected number of events to be registered. It must be positive, or *-1* to use the default. It is only used on older systems where "epoll_create1()" is not available; otherwise it has no effect (though its value is still checked). *flags* is deprecated and completely ignored. However, when supplied, its value must be "0" or "select.EPOLL_CLOEXEC", otherwise "OSError" is raised. See the Edge and Level Trigger Polling (epoll) Objects section below for the methods supported by epolling objects. "epoll" objects support the context management protocol: when used in a "with" statement, the new file descriptor is automatically closed at the end of the block. The new file descriptor is non-inheritable. Changed in version 3.3: Added the *flags* parameter. Changed in version 3.4: Support for the "with" statement was added. The new file descriptor is now non-inheritable. Deprecated since version 3.4: The *flags* parameter. "select.EPOLL_CLOEXEC" is used by default now. Use "os.set_inheritable()" to make the file descriptor inheritable. select.poll() (Not supported by all operating systems.) Returns a polling object, which supports registering and unregistering file descriptors, and then polling them for I/O events; see section Polling Objects below for the methods supported by polling objects. select.kqueue() (Only supported on BSD.) Returns a kernel queue object; see section Kqueue Objects below for the methods supported by kqueue objects. The new file descriptor is non-inheritable. Changed in version 3.4: The new file descriptor is now non- inheritable. select.kevent(ident, filter=KQ_FILTER_READ, flags=KQ_EV_ADD, fflags=0, data=0, udata=0) (Only supported on BSD.) Returns a kernel event object; see section Kevent Objects below for the methods supported by kevent objects. select.select(rlist, wlist, xlist[, timeout]) This is a straightforward interface to the Unix "select()" system call. The first three arguments are iterables of ‘waitable objects’: either integers representing file descriptors or objects with a parameterless method named "fileno()" returning such an integer: * *rlist*: wait until ready for reading * *wlist*: wait until ready for writing * *xlist*: wait for an “exceptional condition” (see the manual page for what your system considers such a condition) Empty iterables are allowed, but acceptance of three empty iterables is platform-dependent. (It is known to work on Unix but not on Windows.) The optional *timeout* argument specifies a time- out as a floating point number in seconds. When the *timeout* argument is omitted the function blocks until at least one file descriptor is ready. A time-out value of zero specifies a poll and never blocks. The return value is a triple of lists of objects that are ready: subsets of the first three arguments. When the time-out is reached without a file descriptor becoming ready, three empty lists are returned. Among the acceptable object types in the iterables are Python *file objects* (e.g. "sys.stdin", or objects returned by "open()" or "os.popen()"), socket objects returned by "socket.socket()". You may also define a *wrapper* class yourself, as long as it has an appropriate "fileno()" method (that really returns a file descriptor, not just a random integer). Note: File objects on Windows are not acceptable, but sockets are. On Windows, the underlying "select()" function is provided by the WinSock library, and does not handle file descriptors that don’t originate from WinSock. Changed in version 3.5: The function is now retried with a recomputed timeout when interrupted by a signal, except if the signal handler raises an exception (see **PEP 475** for the rationale), instead of raising "InterruptedError". select.PIPE_BUF The minimum number of bytes which can be written without blocking to a pipe when the pipe has been reported as ready for writing by "select()", "poll()" or another interface in this module. This doesn’t apply to other kind of file-like objects such as sockets. This value is guaranteed by POSIX to be at least 512. Availability: Unix New in version 3.2. "/dev/poll" Polling Objects =========================== Solaris and derivatives have "/dev/poll". While "select()" is O(highest file descriptor) and "poll()" is O(number of file descriptors), "/dev/poll" is O(active file descriptors). "/dev/poll" behaviour is very close to the standard "poll()" object. devpoll.close() Close the file descriptor of the polling object. New in version 3.4. devpoll.closed "True" if the polling object is closed. New in version 3.4. devpoll.fileno() Return the file descriptor number of the polling object. New in version 3.4. devpoll.register(fd[, eventmask]) Register a file descriptor with the polling object. Future calls to the "poll()" method will then check whether the file descriptor has any pending I/O events. *fd* can be either an integer, or an object with a "fileno()" method that returns an integer. File objects implement "fileno()", so they can also be used as the argument. *eventmask* is an optional bitmask describing the type of events you want to check for. The constants are the same that with "poll()" object. The default value is a combination of the constants "POLLIN", "POLLPRI", and "POLLOUT". Warning: Registering a file descriptor that’s already registered is not an error, but the result is undefined. The appropriate action is to unregister or modify it first. This is an important difference compared with "poll()". devpoll.modify(fd[, eventmask]) This method does an "unregister()" followed by a "register()". It is (a bit) more efficient that doing the same explicitly. devpoll.unregister(fd) Remove a file descriptor being tracked by a polling object. Just like the "register()" method, *fd* can be an integer or an object with a "fileno()" method that returns an integer. Attempting to remove a file descriptor that was never registered is safely ignored. devpoll.poll([timeout]) Polls the set of registered file descriptors, and returns a possibly-empty list containing "(fd, event)" 2-tuples for the descriptors that have events or errors to report. *fd* is the file descriptor, and *event* is a bitmask with bits set for the reported events for that descriptor — "POLLIN" for waiting input, "POLLOUT" to indicate that the descriptor can be written to, and so forth. An empty list indicates that the call timed out and no file descriptors had any events to report. If *timeout* is given, it specifies the length of time in milliseconds which the system will wait for events before returning. If *timeout* is omitted, -1, or "None", the call will block until there is an event for this poll object. Changed in version 3.5: The function is now retried with a recomputed timeout when interrupted by a signal, except if the signal handler raises an exception (see **PEP 475** for the rationale), instead of raising "InterruptedError". Edge and Level Trigger Polling (epoll) Objects ============================================== https://linux.die.net/man/4/epoll *eventmask* +---------------------------+-------------------------------------------------+ | Constant | Meaning | |===========================|=================================================| | "EPOLLIN" | Available for read | +---------------------------+-------------------------------------------------+ | "EPOLLOUT" | Available for write | +---------------------------+-------------------------------------------------+ | "EPOLLPRI" | Urgent data for read | +---------------------------+-------------------------------------------------+ | "EPOLLERR" | Error condition happened on the assoc. fd | +---------------------------+-------------------------------------------------+ | "EPOLLHUP" | Hang up happened on the assoc. fd | +---------------------------+-------------------------------------------------+ | "EPOLLET" | Set Edge Trigger behavior, the default is Level | | | Trigger behavior | +---------------------------+-------------------------------------------------+ | "EPOLLONESHOT" | Set one-shot behavior. After one event is | | | pulled out, the fd is internally disabled | +---------------------------+-------------------------------------------------+ | "EPOLLEXCLUSIVE" | Wake only one epoll object when the associated | | | fd has an event. The default (if this flag is | | | not set) is to wake all epoll objects polling | | | on a fd. | +---------------------------+-------------------------------------------------+ | "EPOLLRDHUP" | Stream socket peer closed connection or shut | | | down writing half of connection. | +---------------------------+-------------------------------------------------+ | "EPOLLRDNORM" | Equivalent to "EPOLLIN" | +---------------------------+-------------------------------------------------+ | "EPOLLRDBAND" | Priority data band can be read. | +---------------------------+-------------------------------------------------+ | "EPOLLWRNORM" | Equivalent to "EPOLLOUT" | +---------------------------+-------------------------------------------------+ | "EPOLLWRBAND" | Priority data may be written. | +---------------------------+-------------------------------------------------+ | "EPOLLMSG" | Ignored. | +---------------------------+-------------------------------------------------+ New in version 3.6: "EPOLLEXCLUSIVE" was added. It’s only supported by Linux Kernel 4.5 or later. epoll.close() Close the control file descriptor of the epoll object. epoll.closed "True" if the epoll object is closed. epoll.fileno() Return the file descriptor number of the control fd. epoll.fromfd(fd) Create an epoll object from a given file descriptor. epoll.register(fd[, eventmask]) Register a fd descriptor with the epoll object. epoll.modify(fd, eventmask) Modify a registered file descriptor. epoll.unregister(fd) Remove a registered file descriptor from the epoll object. Changed in version 3.9: The method no longer ignores the "EBADF" error. epoll.poll(timeout=None, maxevents=-1) Wait for events. timeout in seconds (float) Changed in version 3.5: The function is now retried with a recomputed timeout when interrupted by a signal, except if the signal handler raises an exception (see **PEP 475** for the rationale), instead of raising "InterruptedError". Polling Objects =============== The "poll()" system call, supported on most Unix systems, provides better scalability for network servers that service many, many clients at the same time. "poll()" scales better because the system call only requires listing the file descriptors of interest, while "select()" builds a bitmap, turns on bits for the fds of interest, and then afterward the whole bitmap has to be linearly scanned again. "select()" is O(highest file descriptor), while "poll()" is O(number of file descriptors). poll.register(fd[, eventmask]) Register a file descriptor with the polling object. Future calls to the "poll()" method will then check whether the file descriptor has any pending I/O events. *fd* can be either an integer, or an object with a "fileno()" method that returns an integer. File objects implement "fileno()", so they can also be used as the argument. *eventmask* is an optional bitmask describing the type of events you want to check for, and can be a combination of the constants "POLLIN", "POLLPRI", and "POLLOUT", described in the table below. If not specified, the default value used will check for all 3 types of events. +---------------------+--------------------------------------------+ | Constant | Meaning | |=====================|============================================| | "POLLIN" | There is data to read | +---------------------+--------------------------------------------+ | "POLLPRI" | There is urgent data to read | +---------------------+--------------------------------------------+ | "POLLOUT" | Ready for output: writing will not block | +---------------------+--------------------------------------------+ | "POLLERR" | Error condition of some sort | +---------------------+--------------------------------------------+ | "POLLHUP" | Hung up | +---------------------+--------------------------------------------+ | "POLLRDHUP" | Stream socket peer closed connection, or | | | shut down writing half of connection | +---------------------+--------------------------------------------+ | "POLLNVAL" | Invalid request: descriptor not open | +---------------------+--------------------------------------------+ Registering a file descriptor that’s already registered is not an error, and has the same effect as registering the descriptor exactly once. poll.modify(fd, eventmask) Modifies an already registered fd. This has the same effect as "register(fd, eventmask)". Attempting to modify a file descriptor that was never registered causes an "OSError" exception with errno "ENOENT" to be raised. poll.unregister(fd) Remove a file descriptor being tracked by a polling object. Just like the "register()" method, *fd* can be an integer or an object with a "fileno()" method that returns an integer. Attempting to remove a file descriptor that was never registered causes a "KeyError" exception to be raised. poll.poll([timeout]) Polls the set of registered file descriptors, and returns a possibly-empty list containing "(fd, event)" 2-tuples for the descriptors that have events or errors to report. *fd* is the file descriptor, and *event* is a bitmask with bits set for the reported events for that descriptor — "POLLIN" for waiting input, "POLLOUT" to indicate that the descriptor can be written to, and so forth. An empty list indicates that the call timed out and no file descriptors had any events to report. If *timeout* is given, it specifies the length of time in milliseconds which the system will wait for events before returning. If *timeout* is omitted, negative, or "None", the call will block until there is an event for this poll object. Changed in version 3.5: The function is now retried with a recomputed timeout when interrupted by a signal, except if the signal handler raises an exception (see **PEP 475** for the rationale), instead of raising "InterruptedError". Kqueue Objects ============== kqueue.close() Close the control file descriptor of the kqueue object. kqueue.closed "True" if the kqueue object is closed. kqueue.fileno() Return the file descriptor number of the control fd. kqueue.fromfd(fd) Create a kqueue object from a given file descriptor. kqueue.control(changelist, max_events[, timeout]) -> eventlist Low level interface to kevent * changelist must be an iterable of kevent objects or "None" * max_events must be 0 or a positive integer * timeout in seconds (floats possible); the default is "None", to wait forever Changed in version 3.5: The function is now retried with a recomputed timeout when interrupted by a signal, except if the signal handler raises an exception (see **PEP 475** for the rationale), instead of raising "InterruptedError". Kevent Objects ============== https://www.freebsd.org/cgi/man.cgi?query=kqueue&sektion=2 kevent.ident Value used to identify the event. The interpretation depends on the filter but it’s usually the file descriptor. In the constructor ident can either be an int or an object with a "fileno()" method. kevent stores the integer internally. kevent.filter Name of the kernel filter. +-----------------------------+-----------------------------------------------+ | Constant | Meaning | |=============================|===============================================| | "KQ_FILTER_READ" | Takes a descriptor and returns whenever there | | | is data available to read | +-----------------------------+-----------------------------------------------+ | "KQ_FILTER_WRITE" | Takes a descriptor and returns whenever there | | | is data available to write | +-----------------------------+-----------------------------------------------+ | "KQ_FILTER_AIO" | AIO requests | +-----------------------------+-----------------------------------------------+ | "KQ_FILTER_VNODE" | Returns when one or more of the requested | | | events watched in *fflag* occurs | +-----------------------------+-----------------------------------------------+ | "KQ_FILTER_PROC" | Watch for events on a process id | +-----------------------------+-----------------------------------------------+ | "KQ_FILTER_NETDEV" | Watch for events on a network device [not | | | available on macOS] | +-----------------------------+-----------------------------------------------+ | "KQ_FILTER_SIGNAL" | Returns whenever the watched signal is | | | delivered to the process | +-----------------------------+-----------------------------------------------+ | "KQ_FILTER_TIMER" | Establishes an arbitrary timer | +-----------------------------+-----------------------------------------------+ kevent.flags Filter action. +-----------------------------+-----------------------------------------------+ | Constant | Meaning | |=============================|===============================================| | "KQ_EV_ADD" | Adds or modifies an event | +-----------------------------+-----------------------------------------------+ | "KQ_EV_DELETE" | Removes an event from the queue | +-----------------------------+-----------------------------------------------+ | "KQ_EV_ENABLE" | Permitscontrol() to returns the event | +-----------------------------+-----------------------------------------------+ | "KQ_EV_DISABLE" | Disablesevent | +-----------------------------+-----------------------------------------------+ | "KQ_EV_ONESHOT" | Removes event after first occurrence | +-----------------------------+-----------------------------------------------+ | "KQ_EV_CLEAR" | Reset the state after an event is retrieved | +-----------------------------+-----------------------------------------------+ | "KQ_EV_SYSFLAGS" | internal event | +-----------------------------+-----------------------------------------------+ | "KQ_EV_FLAG1" | internal event | +-----------------------------+-----------------------------------------------+ | "KQ_EV_EOF" | Filter specific EOF condition | +-----------------------------+-----------------------------------------------+ | "KQ_EV_ERROR" | See return values | +-----------------------------+-----------------------------------------------+ kevent.fflags Filter specific flags. "KQ_FILTER_READ" and "KQ_FILTER_WRITE" filter flags: +------------------------------+----------------------------------------------+ | Constant | Meaning | |==============================|==============================================| | "KQ_NOTE_LOWAT" | low water mark of a socket buffer | +------------------------------+----------------------------------------------+ "KQ_FILTER_VNODE" filter flags: +------------------------------+----------------------------------------------+ | Constant | Meaning | |==============================|==============================================| | "KQ_NOTE_DELETE" | *unlink()* was called | +------------------------------+----------------------------------------------+ | "KQ_NOTE_WRITE" | a write occurred | +------------------------------+----------------------------------------------+ | "KQ_NOTE_EXTEND" | the file was extended | +------------------------------+----------------------------------------------+ | "KQ_NOTE_ATTRIB" | an attribute was changed | +------------------------------+----------------------------------------------+ | "KQ_NOTE_LINK" | the link count has changed | +------------------------------+----------------------------------------------+ | "KQ_NOTE_RENAME" | the file was renamed | +------------------------------+----------------------------------------------+ | "KQ_NOTE_REVOKE" | access to the file was revoked | +------------------------------+----------------------------------------------+ "KQ_FILTER_PROC" filter flags: +------------------------------+----------------------------------------------+ | Constant | Meaning | |==============================|==============================================| | "KQ_NOTE_EXIT" | the process has exited | +------------------------------+----------------------------------------------+ | "KQ_NOTE_FORK" | the process has called *fork()* | +------------------------------+----------------------------------------------+ | "KQ_NOTE_EXEC" | the process has executed a new process | +------------------------------+----------------------------------------------+ | "KQ_NOTE_PCTRLMASK" | internal filter flag | +------------------------------+----------------------------------------------+ | "KQ_NOTE_PDATAMASK" | internal filter flag | +------------------------------+----------------------------------------------+ | "KQ_NOTE_TRACK" | follow a process across *fork()* | +------------------------------+----------------------------------------------+ | "KQ_NOTE_CHILD" | returned on the child process for | | | *NOTE_TRACK* | +------------------------------+----------------------------------------------+ | "KQ_NOTE_TRACKERR" | unable to attach to a child | +------------------------------+----------------------------------------------+ "KQ_FILTER_NETDEV" filter flags (not available on macOS): +------------------------------+----------------------------------------------+ | Constant | Meaning | |==============================|==============================================| | "KQ_NOTE_LINKUP" | link is up | +------------------------------+----------------------------------------------+ | "KQ_NOTE_LINKDOWN" | link is down | +------------------------------+----------------------------------------------+ | "KQ_NOTE_LINKINV" | link state is invalid | +------------------------------+----------------------------------------------+ kevent.data Filter specific data. kevent.udata User defined value. "selectors" — High-level I/O multiplexing ***************************************** New in version 3.4. **Source code:** Lib/selectors.py ====================================================================== Introduction ============ This module allows high-level and efficient I/O multiplexing, built upon the "select" module primitives. Users are encouraged to use this module instead, unless they want precise control over the OS-level primitives used. It defines a "BaseSelector" abstract base class, along with several concrete implementations ("KqueueSelector", "EpollSelector"…), that can be used to wait for I/O readiness notification on multiple file objects. In the following, “file object” refers to any object with a "fileno()" method, or a raw file descriptor. See *file object*. "DefaultSelector" is an alias to the most efficient implementation available on the current platform: this should be the default choice for most users. Note: The type of file objects supported depends on the platform: on Windows, sockets are supported, but not pipes, whereas on Unix, both are supported (some other types may be supported as well, such as fifos or special file devices). See also: "select" Low-level I/O multiplexing module. Classes ======= Classes hierarchy: BaseSelector +-- SelectSelector +-- PollSelector +-- EpollSelector +-- DevpollSelector +-- KqueueSelector In the following, *events* is a bitwise mask indicating which I/O events should be waited for on a given file object. It can be a combination of the modules constants below: +-------------------------+-------------------------------------------------+ | Constant | Meaning | |=========================|=================================================| | "EVENT_READ" | Available for read | +-------------------------+-------------------------------------------------+ | "EVENT_WRITE" | Available for write | +-------------------------+-------------------------------------------------+ class selectors.SelectorKey A "SelectorKey" is a "namedtuple" used to associate a file object to its underlying file descriptor, selected event mask and attached data. It is returned by several "BaseSelector" methods. fileobj File object registered. fd Underlying file descriptor. events Events that must be waited for on this file object. data Optional opaque data associated to this file object: for example, this could be used to store a per-client session ID. class selectors.BaseSelector A "BaseSelector" is used to wait for I/O event readiness on multiple file objects. It supports file stream registration, unregistration, and a method to wait for I/O events on those streams, with an optional timeout. It’s an abstract base class, so cannot be instantiated. Use "DefaultSelector" instead, or one of "SelectSelector", "KqueueSelector" etc. if you want to specifically use an implementation, and your platform supports it. "BaseSelector" and its concrete implementations support the *context manager* protocol. abstractmethod register(fileobj, events, data=None) Register a file object for selection, monitoring it for I/O events. *fileobj* is the file object to monitor. It may either be an integer file descriptor or an object with a "fileno()" method. *events* is a bitwise mask of events to monitor. *data* is an opaque object. This returns a new "SelectorKey" instance, or raises a "ValueError" in case of invalid event mask or file descriptor, or "KeyError" if the file object is already registered. abstractmethod unregister(fileobj) Unregister a file object from selection, removing it from monitoring. A file object shall be unregistered prior to being closed. *fileobj* must be a file object previously registered. This returns the associated "SelectorKey" instance, or raises a "KeyError" if *fileobj* is not registered. It will raise "ValueError" if *fileobj* is invalid (e.g. it has no "fileno()" method or its "fileno()" method has an invalid return value). modify(fileobj, events, data=None) Change a registered file object’s monitored events or attached data. This is equivalent to "BaseSelector.unregister(fileobj)()" followed by "BaseSelector.register(fileobj, events, data)()", except that it can be implemented more efficiently. This returns a new "SelectorKey" instance, or raises a "ValueError" in case of invalid event mask or file descriptor, or "KeyError" if the file object is not registered. abstractmethod select(timeout=None) Wait until some registered file objects become ready, or the timeout expires. If "timeout > 0", this specifies the maximum wait time, in seconds. If "timeout <= 0", the call won’t block, and will report the currently ready file objects. If *timeout* is "None", the call will block until a monitored file object becomes ready. This returns a list of "(key, events)" tuples, one for each ready file object. *key* is the "SelectorKey" instance corresponding to a ready file object. *events* is a bitmask of events ready on this file object. Note: This method can return before any file object becomes ready or the timeout has elapsed if the current process receives a signal: in this case, an empty list will be returned. Changed in version 3.5: The selector is now retried with a recomputed timeout when interrupted by a signal if the signal handler did not raise an exception (see **PEP 475** for the rationale), instead of returning an empty list of events before the timeout. close() Close the selector. This must be called to make sure that any underlying resource is freed. The selector shall not be used once it has been closed. get_key(fileobj) Return the key associated with a registered file object. This returns the "SelectorKey" instance associated to this file object, or raises "KeyError" if the file object is not registered. abstractmethod get_map() Return a mapping of file objects to selector keys. This returns a "Mapping" instance mapping registered file objects to their associated "SelectorKey" instance. class selectors.DefaultSelector The default selector class, using the most efficient implementation available on the current platform. This should be the default choice for most users. class selectors.SelectSelector "select.select()"-based selector. class selectors.PollSelector "select.poll()"-based selector. class selectors.EpollSelector "select.epoll()"-based selector. fileno() This returns the file descriptor used by the underlying "select.epoll()" object. class selectors.DevpollSelector "select.devpoll()"-based selector. fileno() This returns the file descriptor used by the underlying "select.devpoll()" object. New in version 3.5. class selectors.KqueueSelector "select.kqueue()"-based selector. fileno() This returns the file descriptor used by the underlying "select.kqueue()" object. Examples ======== Here is a simple echo server implementation: import selectors import socket sel = selectors.DefaultSelector() def accept(sock, mask): conn, addr = sock.accept() # Should be ready print('accepted', conn, 'from', addr) conn.setblocking(False) sel.register(conn, selectors.EVENT_READ, read) def read(conn, mask): data = conn.recv(1000) # Should be ready if data: print('echoing', repr(data), 'to', conn) conn.send(data) # Hope it won't block else: print('closing', conn) sel.unregister(conn) conn.close() sock = socket.socket() sock.bind(('localhost', 1234)) sock.listen(100) sock.setblocking(False) sel.register(sock, selectors.EVENT_READ, accept) while True: events = sel.select() for key, mask in events: callback = key.data callback(key.fileobj, mask) "shelve" — Python object persistence ************************************ **Source code:** Lib/shelve.py ====================================================================== A “shelf” is a persistent, dictionary-like object. The difference with “dbm” databases is that the values (not the keys!) in a shelf can be essentially arbitrary Python objects — anything that the "pickle" module can handle. This includes most class instances, recursive data types, and objects containing lots of shared sub-objects. The keys are ordinary strings. shelve.open(filename, flag='c', protocol=None, writeback=False) Open a persistent dictionary. The filename specified is the base filename for the underlying database. As a side-effect, an extension may be added to the filename and more than one file may be created. By default, the underlying database file is opened for reading and writing. The optional *flag* parameter has the same interpretation as the *flag* parameter of "dbm.open()". By default, version 3 pickles are used to serialize values. The version of the pickle protocol can be specified with the *protocol* parameter. Because of Python semantics, a shelf cannot know when a mutable persistent-dictionary entry is modified. By default modified objects are written *only* when assigned to the shelf (see Example). If the optional *writeback* parameter is set to "True", all entries accessed are also cached in memory, and written back on "sync()" and "close()"; this can make it handier to mutate mutable entries in the persistent dictionary, but, if many entries are accessed, it can consume vast amounts of memory for the cache, and it can make the close operation very slow since all accessed entries are written back (there is no way to determine which accessed entries are mutable, nor which ones were actually mutated). Note: Do not rely on the shelf being closed automatically; always call "close()" explicitly when you don’t need it any more, or use "shelve.open()" as a context manager: with shelve.open('spam') as db: db['eggs'] = 'eggs' Warning: Because the "shelve" module is backed by "pickle", it is insecure to load a shelf from an untrusted source. Like with pickle, loading a shelf can execute arbitrary code. Shelf objects support most of methods and operations supported by dictionaries (except copying, constructors and operators "|" and "|="). This eases the transition from dictionary based scripts to those requiring persistent storage. Two additional methods are supported: Shelf.sync() Write back all entries in the cache if the shelf was opened with *writeback* set to "True". Also empty the cache and synchronize the persistent dictionary on disk, if feasible. This is called automatically when the shelf is closed with "close()". Shelf.close() Synchronize and close the persistent *dict* object. Operations on a closed shelf will fail with a "ValueError". See also: Persistent dictionary recipe with widely supported storage formats and having the speed of native dictionaries. Restrictions ============ * The choice of which database package will be used (such as "dbm.ndbm" or "dbm.gnu") depends on which interface is available. Therefore it is not safe to open the database directly using "dbm". The database is also (unfortunately) subject to the limitations of "dbm", if it is used — this means that (the pickled representation of) the objects stored in the database should be fairly small, and in rare cases key collisions may cause the database to refuse updates. * The "shelve" module does not support *concurrent* read/write access to shelved objects. (Multiple simultaneous read accesses are safe.) When a program has a shelf open for writing, no other program should have it open for reading or writing. Unix file locking can be used to solve this, but this differs across Unix versions and requires knowledge about the database implementation used. class shelve.Shelf(dict, protocol=None, writeback=False, keyencoding='utf-8') A subclass of "collections.abc.MutableMapping" which stores pickled values in the *dict* object. By default, version 3 pickles are used to serialize values. The version of the pickle protocol can be specified with the *protocol* parameter. See the "pickle" documentation for a discussion of the pickle protocols. If the *writeback* parameter is "True", the object will hold a cache of all entries accessed and write them back to the *dict* at sync and close times. This allows natural operations on mutable entries, but can consume much more memory and make sync and close take a long time. The *keyencoding* parameter is the encoding used to encode keys before they are used with the underlying dict. A "Shelf" object can also be used as a context manager, in which case it will be automatically closed when the "with" block ends. Changed in version 3.2: Added the *keyencoding* parameter; previously, keys were always encoded in UTF-8. Changed in version 3.4: Added context manager support. class shelve.BsdDbShelf(dict, protocol=None, writeback=False, keyencoding='utf-8') A subclass of "Shelf" which exposes "first()", "next()", "previous()", "last()" and "set_location()" which are available in the third-party "bsddb" module from pybsddb but not in other database modules. The *dict* object passed to the constructor must support those methods. This is generally accomplished by calling one of "bsddb.hashopen()", "bsddb.btopen()" or "bsddb.rnopen()". The optional *protocol*, *writeback*, and *keyencoding* parameters have the same interpretation as for the "Shelf" class. class shelve.DbfilenameShelf(filename, flag='c', protocol=None, writeback=False) A subclass of "Shelf" which accepts a *filename* instead of a dict- like object. The underlying file will be opened using "dbm.open()". By default, the file will be created and opened for both read and write. The optional *flag* parameter has the same interpretation as for the "open()" function. The optional *protocol* and *writeback* parameters have the same interpretation as for the "Shelf" class. Example ======= To summarize the interface ("key" is a string, "data" is an arbitrary object): import shelve d = shelve.open(filename) # open -- file may get suffix added by low-level # library d[key] = data # store data at key (overwrites old data if # using an existing key) data = d[key] # retrieve a COPY of data at key (raise KeyError # if no such key) del d[key] # delete data stored at key (raises KeyError # if no such key) flag = key in d # true if the key exists klist = list(d.keys()) # a list of all existing keys (slow!) # as d was opened WITHOUT writeback=True, beware: d['xx'] = [0, 1, 2] # this works as expected, but... d['xx'].append(3) # *this doesn't!* -- d['xx'] is STILL [0, 1, 2]! # having opened d without writeback=True, you need to code carefully: temp = d['xx'] # extracts the copy temp.append(5) # mutates the copy d['xx'] = temp # stores the copy right back, to persist it # or, d=shelve.open(filename,writeback=True) would let you just code # d['xx'].append(5) and have it work as expected, BUT it would also # consume more memory and make the d.close() operation slower. d.close() # close it See also: Module "dbm" Generic interface to "dbm"-style databases. Module "pickle" Object serialization used by "shelve". "shlex" — Simple lexical analysis ********************************* **Source code:** Lib/shlex.py ====================================================================== The "shlex" class makes it easy to write lexical analyzers for simple syntaxes resembling that of the Unix shell. This will often be useful for writing minilanguages, (for example, in run control files for Python applications) or for parsing quoted strings. The "shlex" module defines the following functions: shlex.split(s, comments=False, posix=True) Split the string *s* using shell-like syntax. If *comments* is "False" (the default), the parsing of comments in the given string will be disabled (setting the "commenters" attribute of the "shlex" instance to the empty string). This function operates in POSIX mode by default, but uses non-POSIX mode if the *posix* argument is false. Note: Since the "split()" function instantiates a "shlex" instance, passing "None" for *s* will read the string to split from standard input. Deprecated since version 3.9: Passing "None" for *s* will raise an exception in future Python versions. shlex.join(split_command) Concatenate the tokens of the list *split_command* and return a string. This function is the inverse of "split()". >>> from shlex import join >>> print(join(['echo', '-n', 'Multiple words'])) echo -n 'Multiple words' The returned value is shell-escaped to protect against injection vulnerabilities (see "quote()"). New in version 3.8. shlex.quote(s) Return a shell-escaped version of the string *s*. The returned value is a string that can safely be used as one token in a shell command line, for cases where you cannot use a list. This idiom would be unsafe: >>> filename = 'somefile; rm -rf ~' >>> command = 'ls -l {}'.format(filename) >>> print(command) # executed by a shell: boom! ls -l somefile; rm -rf ~ "quote()" lets you plug the security hole: >>> from shlex import quote >>> command = 'ls -l {}'.format(quote(filename)) >>> print(command) ls -l 'somefile; rm -rf ~' >>> remote_command = 'ssh home {}'.format(quote(command)) >>> print(remote_command) ssh home 'ls -l '"'"'somefile; rm -rf ~'"'"'' The quoting is compatible with UNIX shells and with "split()": >>> from shlex import split >>> remote_command = split(remote_command) >>> remote_command ['ssh', 'home', "ls -l 'somefile; rm -rf ~'"] >>> command = split(remote_command[-1]) >>> command ['ls', '-l', 'somefile; rm -rf ~'] New in version 3.3. The "shlex" module defines the following class: class shlex.shlex(instream=None, infile=None, posix=False, punctuation_chars=False) A "shlex" instance or subclass instance is a lexical analyzer object. The initialization argument, if present, specifies where to read characters from. It must be a file-/stream-like object with "read()" and "readline()" methods, or a string. If no argument is given, input will be taken from "sys.stdin". The second optional argument is a filename string, which sets the initial value of the "infile" attribute. If the *instream* argument is omitted or equal to "sys.stdin", this second argument defaults to “stdin”. The *posix* argument defines the operational mode: when *posix* is not true (default), the "shlex" instance will operate in compatibility mode. When operating in POSIX mode, "shlex" will try to be as close as possible to the POSIX shell parsing rules. The *punctuation_chars* argument provides a way to make the behaviour even closer to how real shells parse. This can take a number of values: the default value, "False", preserves the behaviour seen under Python 3.5 and earlier. If set to "True", then parsing of the characters "();<>|&" is changed: any run of these characters (considered punctuation characters) is returned as a single token. If set to a non-empty string of characters, those characters will be used as the punctuation characters. Any characters in the "wordchars" attribute that appear in *punctuation_chars* will be removed from "wordchars". See Improved Compatibility with Shells for more information. *punctuation_chars* can be set only upon "shlex" instance creation and can’t be modified later. Changed in version 3.6: The *punctuation_chars* parameter was added. See also: Module "configparser" Parser for configuration files similar to the Windows ".ini" files. shlex Objects ============= A "shlex" instance has the following methods: shlex.get_token() Return a token. If tokens have been stacked using "push_token()", pop a token off the stack. Otherwise, read one from the input stream. If reading encounters an immediate end-of-file, "eof" is returned (the empty string ("''") in non-POSIX mode, and "None" in POSIX mode). shlex.push_token(str) Push the argument onto the token stack. shlex.read_token() Read a raw token. Ignore the pushback stack, and do not interpret source requests. (This is not ordinarily a useful entry point, and is documented here only for the sake of completeness.) shlex.sourcehook(filename) When "shlex" detects a source request (see "source" below) this method is given the following token as argument, and expected to return a tuple consisting of a filename and an open file-like object. Normally, this method first strips any quotes off the argument. If the result is an absolute pathname, or there was no previous source request in effect, or the previous source was a stream (such as "sys.stdin"), the result is left alone. Otherwise, if the result is a relative pathname, the directory part of the name of the file immediately before it on the source inclusion stack is prepended (this behavior is like the way the C preprocessor handles "#include "file.h""). The result of the manipulations is treated as a filename, and returned as the first component of the tuple, with "open()" called on it to yield the second component. (Note: this is the reverse of the order of arguments in instance initialization!) This hook is exposed so that you can use it to implement directory search paths, addition of file extensions, and other namespace hacks. There is no corresponding ‘close’ hook, but a shlex instance will call the "close()" method of the sourced input stream when it returns EOF. For more explicit control of source stacking, use the "push_source()" and "pop_source()" methods. shlex.push_source(newstream, newfile=None) Push an input source stream onto the input stack. If the filename argument is specified it will later be available for use in error messages. This is the same method used internally by the "sourcehook()" method. shlex.pop_source() Pop the last-pushed input source from the input stack. This is the same method used internally when the lexer reaches EOF on a stacked input stream. shlex.error_leader(infile=None, lineno=None) This method generates an error message leader in the format of a Unix C compiler error label; the format is "'"%s", line %d: '", where the "%s" is replaced with the name of the current source file and the "%d" with the current input line number (the optional arguments can be used to override these). This convenience is provided to encourage "shlex" users to generate error messages in the standard, parseable format understood by Emacs and other Unix tools. Instances of "shlex" subclasses have some public instance variables which either control lexical analysis or can be used for debugging: shlex.commenters The string of characters that are recognized as comment beginners. All characters from the comment beginner to end of line are ignored. Includes just "'#'" by default. shlex.wordchars The string of characters that will accumulate into multi-character tokens. By default, includes all ASCII alphanumerics and underscore. In POSIX mode, the accented characters in the Latin-1 set are also included. If "punctuation_chars" is not empty, the characters "~-./*?=", which can appear in filename specifications and command line parameters, will also be included in this attribute, and any characters which appear in "punctuation_chars" will be removed from "wordchars" if they are present there. If "whitespace_split" is set to "True", this will have no effect. shlex.whitespace Characters that will be considered whitespace and skipped. Whitespace bounds tokens. By default, includes space, tab, linefeed and carriage-return. shlex.escape Characters that will be considered as escape. This will be only used in POSIX mode, and includes just "'\'" by default. shlex.quotes Characters that will be considered string quotes. The token accumulates until the same quote is encountered again (thus, different quote types protect each other as in the shell.) By default, includes ASCII single and double quotes. shlex.escapedquotes Characters in "quotes" that will interpret escape characters defined in "escape". This is only used in POSIX mode, and includes just "'"'" by default. shlex.whitespace_split If "True", tokens will only be split in whitespaces. This is useful, for example, for parsing command lines with "shlex", getting tokens in a similar way to shell arguments. When used in combination with "punctuation_chars", tokens will be split on whitespace in addition to those characters. Changed in version 3.8: The "punctuation_chars" attribute was made compatible with the "whitespace_split" attribute. shlex.infile The name of the current input file, as initially set at class instantiation time or stacked by later source requests. It may be useful to examine this when constructing error messages. shlex.instream The input stream from which this "shlex" instance is reading characters. shlex.source This attribute is "None" by default. If you assign a string to it, that string will be recognized as a lexical-level inclusion request similar to the "source" keyword in various shells. That is, the immediately following token will be opened as a filename and input will be taken from that stream until EOF, at which point the "close()" method of that stream will be called and the input source will again become the original input stream. Source requests may be stacked any number of levels deep. shlex.debug If this attribute is numeric and "1" or more, a "shlex" instance will print verbose progress output on its behavior. If you need to use this, you can read the module source code to learn the details. shlex.lineno Source line number (count of newlines seen so far plus one). shlex.token The token buffer. It may be useful to examine this when catching exceptions. shlex.eof Token used to determine end of file. This will be set to the empty string ("''"), in non-POSIX mode, and to "None" in POSIX mode. shlex.punctuation_chars A read-only property. Characters that will be considered punctuation. Runs of punctuation characters will be returned as a single token. However, note that no semantic validity checking will be performed: for example, ‘>>>’ could be returned as a token, even though it may not be recognised as such by shells. New in version 3.6. Parsing Rules ============= When operating in non-POSIX mode, "shlex" will try to obey to the following rules. * Quote characters are not recognized within words ("Do"Not"Separate" is parsed as the single word "Do"Not"Separate"); * Escape characters are not recognized; * Enclosing characters in quotes preserve the literal value of all characters within the quotes; * Closing quotes separate words (""Do"Separate" is parsed as ""Do"" and "Separate"); * If "whitespace_split" is "False", any character not declared to be a word character, whitespace, or a quote will be returned as a single- character token. If it is "True", "shlex" will only split words in whitespaces; * EOF is signaled with an empty string ("''"); * It’s not possible to parse empty strings, even if quoted. When operating in POSIX mode, "shlex" will try to obey to the following parsing rules. * Quotes are stripped out, and do not separate words (""Do"Not"Separate"" is parsed as the single word "DoNotSeparate"); * Non-quoted escape characters (e.g. "'\'") preserve the literal value of the next character that follows; * Enclosing characters in quotes which are not part of "escapedquotes" (e.g. ""'"") preserve the literal value of all characters within the quotes; * Enclosing characters in quotes which are part of "escapedquotes" (e.g. "'"'") preserves the literal value of all characters within the quotes, with the exception of the characters mentioned in "escape". The escape characters retain its special meaning only when followed by the quote in use, or the escape character itself. Otherwise the escape character will be considered a normal character. * EOF is signaled with a "None" value; * Quoted empty strings ("''") are allowed. Improved Compatibility with Shells ================================== New in version 3.6. The "shlex" class provides compatibility with the parsing performed by common Unix shells like "bash", "dash", and "sh". To take advantage of this compatibility, specify the "punctuation_chars" argument in the constructor. This defaults to "False", which preserves pre-3.6 behaviour. However, if it is set to "True", then parsing of the characters "();<>|&" is changed: any run of these characters is returned as a single token. While this is short of a full parser for shells (which would be out of scope for the standard library, given the multiplicity of shells out there), it does allow you to perform processing of command lines more easily than you could otherwise. To illustrate, you can see the difference in the following snippet: >>> import shlex >>> text = "a && b; c && d || e; f >'abc'; (def \"ghi\")" >>> s = shlex.shlex(text, posix=True) >>> s.whitespace_split = True >>> list(s) ['a', '&&', 'b;', 'c', '&&', 'd', '||', 'e;', 'f', '>abc;', '(def', 'ghi)'] >>> s = shlex.shlex(text, posix=True, punctuation_chars=True) >>> s.whitespace_split = True >>> list(s) ['a', '&&', 'b', ';', 'c', '&&', 'd', '||', 'e', ';', 'f', '>', 'abc', ';', '(', 'def', 'ghi', ')'] Of course, tokens will be returned which are not valid for shells, and you’ll need to implement your own error checks on the returned tokens. Instead of passing "True" as the value for the punctuation_chars parameter, you can pass a string with specific characters, which will be used to determine which characters constitute punctuation. For example: >>> import shlex >>> s = shlex.shlex("a && b || c", punctuation_chars="|") >>> list(s) ['a', '&', '&', 'b', '||', 'c'] Note: When "punctuation_chars" is specified, the "wordchars" attribute is augmented with the characters "~-./*?=". That is because these characters can appear in file names (including wildcards) and command-line arguments (e.g. "--color=auto"). Hence: >>> import shlex >>> s = shlex.shlex('~/a && b-c --color=auto || d *.py?', ... punctuation_chars=True) >>> list(s) ['~/a', '&&', 'b-c', '--color=auto', '||', 'd', '*.py?'] However, to match the shell as closely as possible, it is recommended to always use "posix" and "whitespace_split" when using "punctuation_chars", which will negate "wordchars" entirely. For best effect, "punctuation_chars" should be set in conjunction with "posix=True". (Note that "posix=False" is the default for "shlex".) "shutil" — High-level file operations ************************************* **Source code:** Lib/shutil.py ====================================================================== The "shutil" module offers a number of high-level operations on files and collections of files. In particular, functions are provided which support file copying and removal. For operations on individual files, see also the "os" module. Warning: Even the higher-level file copying functions ("shutil.copy()", "shutil.copy2()") cannot copy all file metadata.On POSIX platforms, this means that file owner and group are lost as well as ACLs. On Mac OS, the resource fork and other metadata are not used. This means that resources will be lost and file type and creator codes will not be correct. On Windows, file owners, ACLs and alternate data streams are not copied. Directory and files operations ============================== shutil.copyfileobj(fsrc, fdst[, length]) Copy the contents of the file-like object *fsrc* to the file-like object *fdst*. The integer *length*, if given, is the buffer size. In particular, a negative *length* value means to copy the data without looping over the source data in chunks; by default the data is read in chunks to avoid uncontrolled memory consumption. Note that if the current file position of the *fsrc* object is not 0, only the contents from the current file position to the end of the file will be copied. shutil.copyfile(src, dst, *, follow_symlinks=True) Copy the contents (no metadata) of the file named *src* to a file named *dst* and return *dst* in the most efficient way possible. *src* and *dst* are path-like objects or path names given as strings. *dst* must be the complete target file name; look at "copy()" for a copy that accepts a target directory path. If *src* and *dst* specify the same file, "SameFileError" is raised. The destination location must be writable; otherwise, an "OSError" exception will be raised. If *dst* already exists, it will be replaced. Special files such as character or block devices and pipes cannot be copied with this function. If *follow_symlinks* is false and *src* is a symbolic link, a new symbolic link will be created instead of copying the file *src* points to. Raises an auditing event "shutil.copyfile" with arguments "src", "dst". Changed in version 3.3: "IOError" used to be raised instead of "OSError". Added *follow_symlinks* argument. Now returns *dst*. Changed in version 3.4: Raise "SameFileError" instead of "Error". Since the former is a subclass of the latter, this change is backward compatible. Changed in version 3.8: Platform-specific fast-copy syscalls may be used internally in order to copy the file more efficiently. See Platform-dependent efficient copy operations section. exception shutil.SameFileError This exception is raised if source and destination in "copyfile()" are the same file. New in version 3.4. shutil.copymode(src, dst, *, follow_symlinks=True) Copy the permission bits from *src* to *dst*. The file contents, owner, and group are unaffected. *src* and *dst* are path-like objects or path names given as strings. If *follow_symlinks* is false, and both *src* and *dst* are symbolic links, "copymode()" will attempt to modify the mode of *dst* itself (rather than the file it points to). This functionality is not available on every platform; please see "copystat()" for more information. If "copymode()" cannot modify symbolic links on the local platform, and it is asked to do so, it will do nothing and return. Raises an auditing event "shutil.copymode" with arguments "src", "dst". Changed in version 3.3: Added *follow_symlinks* argument. shutil.copystat(src, dst, *, follow_symlinks=True) Copy the permission bits, last access time, last modification time, and flags from *src* to *dst*. On Linux, "copystat()" also copies the “extended attributes” where possible. The file contents, owner, and group are unaffected. *src* and *dst* are path-like objects or path names given as strings. If *follow_symlinks* is false, and *src* and *dst* both refer to symbolic links, "copystat()" will operate on the symbolic links themselves rather than the files the symbolic links refer to—reading the information from the *src* symbolic link, and writing the information to the *dst* symbolic link. Note: Not all platforms provide the ability to examine and modify symbolic links. Python itself can tell you what functionality is locally available. * If "os.chmod in os.supports_follow_symlinks" is "True", "copystat()" can modify the permission bits of a symbolic link. * If "os.utime in os.supports_follow_symlinks" is "True", "copystat()" can modify the last access and modification times of a symbolic link. * If "os.chflags in os.supports_follow_symlinks" is "True", "copystat()" can modify the flags of a symbolic link. ("os.chflags" is not available on all platforms.) On platforms where some or all of this functionality is unavailable, when asked to modify a symbolic link, "copystat()" will copy everything it can. "copystat()" never returns failure.Please see "os.supports_follow_symlinks" for more information. Raises an auditing event "shutil.copystat" with arguments "src", "dst". Changed in version 3.3: Added *follow_symlinks* argument and support for Linux extended attributes. shutil.copy(src, dst, *, follow_symlinks=True) Copies the file *src* to the file or directory *dst*. *src* and *dst* should be *path-like objects* or strings. If *dst* specifies a directory, the file will be copied into *dst* using the base filename from *src*. If *dst* specifies a file that already exists, it will be replaced. Returns the path to the newly created file. If *follow_symlinks* is false, and *src* is a symbolic link, *dst* will be created as a symbolic link. If *follow_symlinks* is true and *src* is a symbolic link, *dst* will be a copy of the file *src* refers to. "copy()" copies the file data and the file’s permission mode (see "os.chmod()"). Other metadata, like the file’s creation and modification times, is not preserved. To preserve all file metadata from the original, use "copy2()" instead. Raises an auditing event "shutil.copyfile" with arguments "src", "dst". Raises an auditing event "shutil.copymode" with arguments "src", "dst". Changed in version 3.3: Added *follow_symlinks* argument. Now returns path to the newly created file. Changed in version 3.8: Platform-specific fast-copy syscalls may be used internally in order to copy the file more efficiently. See Platform-dependent efficient copy operations section. shutil.copy2(src, dst, *, follow_symlinks=True) Identical to "copy()" except that "copy2()" also attempts to preserve file metadata. When *follow_symlinks* is false, and *src* is a symbolic link, "copy2()" attempts to copy all metadata from the *src* symbolic link to the newly-created *dst* symbolic link. However, this functionality is not available on all platforms. On platforms where some or all of this functionality is unavailable, "copy2()" will preserve all the metadata it can; "copy2()" never raises an exception because it cannot preserve file metadata. "copy2()" uses "copystat()" to copy the file metadata. Please see "copystat()" for more information about platform support for modifying symbolic link metadata. Raises an auditing event "shutil.copyfile" with arguments "src", "dst". Raises an auditing event "shutil.copystat" with arguments "src", "dst". Changed in version 3.3: Added *follow_symlinks* argument, try to copy extended file system attributes too (currently Linux only). Now returns path to the newly created file. Changed in version 3.8: Platform-specific fast-copy syscalls may be used internally in order to copy the file more efficiently. See Platform-dependent efficient copy operations section. shutil.ignore_patterns(*patterns) This factory function creates a function that can be used as a callable for "copytree()"’s *ignore* argument, ignoring files and directories that match one of the glob-style *patterns* provided. See the example below. shutil.copytree(src, dst, symlinks=False, ignore=None, copy_function=copy2, ignore_dangling_symlinks=False, dirs_exist_ok=False) Recursively copy an entire directory tree rooted at *src* to a directory named *dst* and return the destination directory. All intermediate directories needed to contain *dst* will also be created by default. Permissions and times of directories are copied with "copystat()", individual files are copied using "copy2()". If *symlinks* is true, symbolic links in the source tree are represented as symbolic links in the new tree and the metadata of the original links will be copied as far as the platform allows; if false or omitted, the contents and metadata of the linked files are copied to the new tree. When *symlinks* is false, if the file pointed by the symlink doesn’t exist, an exception will be added in the list of errors raised in an "Error" exception at the end of the copy process. You can set the optional *ignore_dangling_symlinks* flag to true if you want to silence this exception. Notice that this option has no effect on platforms that don’t support "os.symlink()". If *ignore* is given, it must be a callable that will receive as its arguments the directory being visited by "copytree()", and a list of its contents, as returned by "os.listdir()". Since "copytree()" is called recursively, the *ignore* callable will be called once for each directory that is copied. The callable must return a sequence of directory and file names relative to the current directory (i.e. a subset of the items in its second argument); these names will then be ignored in the copy process. "ignore_patterns()" can be used to create such a callable that ignores names based on glob-style patterns. If exception(s) occur, an "Error" is raised with a list of reasons. If *copy_function* is given, it must be a callable that will be used to copy each file. It will be called with the source path and the destination path as arguments. By default, "copy2()" is used, but any function that supports the same signature (like "copy()") can be used. If *dirs_exist_ok* is false (the default) and *dst* already exists, a "FileExistsError" is raised. If *dirs_exist_ok* is true, the copying operation will continue if it encounters existing directories, and files within the *dst* tree will be overwritten by corresponding files from the *src* tree. Raises an auditing event "shutil.copytree" with arguments "src", "dst". Changed in version 3.3: Copy metadata when *symlinks* is false. Now returns *dst*. Changed in version 3.2: Added the *copy_function* argument to be able to provide a custom copy function. Added the *ignore_dangling_symlinks* argument to silence dangling symlinks errors when *symlinks* is false. Changed in version 3.8: Platform-specific fast-copy syscalls may be used internally in order to copy the file more efficiently. See Platform-dependent efficient copy operations section. New in version 3.8: The *dirs_exist_ok* parameter. shutil.rmtree(path, ignore_errors=False, onerror=None) Delete an entire directory tree; *path* must point to a directory (but not a symbolic link to a directory). If *ignore_errors* is true, errors resulting from failed removals will be ignored; if false or omitted, such errors are handled by calling a handler specified by *onerror* or, if that is omitted, they raise an exception. Note: On platforms that support the necessary fd-based functions a symlink attack resistant version of "rmtree()" is used by default. On other platforms, the "rmtree()" implementation is susceptible to a symlink attack: given proper timing and circumstances, attackers can manipulate symlinks on the filesystem to delete files they wouldn’t be able to access otherwise. Applications can use the "rmtree.avoids_symlink_attacks" function attribute to determine which case applies. If *onerror* is provided, it must be a callable that accepts three parameters: *function*, *path*, and *excinfo*. The first parameter, *function*, is the function which raised the exception; it depends on the platform and implementation. The second parameter, *path*, will be the path name passed to *function*. The third parameter, *excinfo*, will be the exception information returned by "sys.exc_info()". Exceptions raised by *onerror* will not be caught. Raises an auditing event "shutil.rmtree" with argument "path". Changed in version 3.3: Added a symlink attack resistant version that is used automatically if platform supports fd-based functions. Changed in version 3.8: On Windows, will no longer delete the contents of a directory junction before removing the junction. rmtree.avoids_symlink_attacks Indicates whether the current platform and implementation provides a symlink attack resistant version of "rmtree()". Currently this is only true for platforms supporting fd-based directory access functions. New in version 3.3. shutil.move(src, dst, copy_function=copy2) Recursively move a file or directory (*src*) to another location (*dst*) and return the destination. If the destination is an existing directory, then *src* is moved inside that directory. If the destination already exists but is not a directory, it may be overwritten depending on "os.rename()" semantics. If the destination is on the current filesystem, then "os.rename()" is used. Otherwise, *src* is copied to *dst* using *copy_function* and then removed. In case of symlinks, a new symlink pointing to the target of *src* will be created in or as *dst* and *src* will be removed. If *copy_function* is given, it must be a callable that takes two arguments *src* and *dst*, and will be used to copy *src* to *dst* if "os.rename()" cannot be used. If the source is a directory, "copytree()" is called, passing it the "copy_function()". The default *copy_function* is "copy2()". Using "copy()" as the *copy_function* allows the move to succeed when it is not possible to also copy the metadata, at the expense of not copying any of the metadata. Raises an auditing event "shutil.move" with arguments "src", "dst". Changed in version 3.3: Added explicit symlink handling for foreign filesystems, thus adapting it to the behavior of GNU’s **mv**. Now returns *dst*. Changed in version 3.5: Added the *copy_function* keyword argument. Changed in version 3.8: Platform-specific fast-copy syscalls may be used internally in order to copy the file more efficiently. See Platform-dependent efficient copy operations section. Changed in version 3.9: Accepts a *path-like object* for both *src* and *dst*. shutil.disk_usage(path) Return disk usage statistics about the given path as a *named tuple* with the attributes *total*, *used* and *free*, which are the amount of total, used and free space, in bytes. *path* may be a file or a directory. New in version 3.3. Changed in version 3.8: On Windows, *path* can now be a file or directory. Availability: Unix, Windows. shutil.chown(path, user=None, group=None) Change owner *user* and/or *group* of the given *path*. *user* can be a system user name or a uid; the same applies to *group*. At least one argument is required. See also "os.chown()", the underlying function. Raises an auditing event "shutil.chown" with arguments "path", "user", "group". Availability: Unix. New in version 3.3. shutil.which(cmd, mode=os.F_OK | os.X_OK, path=None) Return the path to an executable which would be run if the given *cmd* was called. If no *cmd* would be called, return "None". *mode* is a permission mask passed to "os.access()", by default determining if the file exists and executable. When no *path* is specified, the results of "os.environ()" are used, returning either the “PATH” value or a fallback of "os.defpath". On Windows, the current directory is always prepended to the *path* whether or not you use the default or provide your own, which is the behavior the command shell uses when finding executables. Additionally, when finding the *cmd* in the *path*, the "PATHEXT" environment variable is checked. For example, if you call "shutil.which("python")", "which()" will search "PATHEXT" to know that it should look for "python.exe" within the *path* directories. For example, on Windows: >>> shutil.which("python") 'C:\\Python33\\python.EXE' New in version 3.3. Changed in version 3.8: The "bytes" type is now accepted. If *cmd* type is "bytes", the result type is also "bytes". exception shutil.Error This exception collects exceptions that are raised during a multi- file operation. For "copytree()", the exception argument is a list of 3-tuples (*srcname*, *dstname*, *exception*). Platform-dependent efficient copy operations -------------------------------------------- Starting from Python 3.8, all functions involving a file copy ("copyfile()", "copy()", "copy2()", "copytree()", and "move()") may use platform-specific “fast-copy” syscalls in order to copy the file more efficiently (see bpo-33671). “fast-copy” means that the copying operation occurs within the kernel, avoiding the use of userspace buffers in Python as in “"outfd.write(infd.read())"”. On macOS fcopyfile is used to copy the file content (not metadata). On Linux "os.sendfile()" is used. On Windows "shutil.copyfile()" uses a bigger default buffer size (1 MiB instead of 64 KiB) and a "memoryview()"-based variant of "shutil.copyfileobj()" is used. If the fast-copy operation fails and no data was written in the destination file then shutil will silently fallback on using less efficient "copyfileobj()" function internally. Changed in version 3.8. copytree example ---------------- An example that uses the "ignore_patterns()" helper: from shutil import copytree, ignore_patterns copytree(source, destination, ignore=ignore_patterns('*.pyc', 'tmp*')) This will copy everything except ".pyc" files and files or directories whose name starts with "tmp". Another example that uses the *ignore* argument to add a logging call: from shutil import copytree import logging def _logpath(path, names): logging.info('Working in %s', path) return [] # nothing will be ignored copytree(source, destination, ignore=_logpath) rmtree example -------------- This example shows how to remove a directory tree on Windows where some of the files have their read-only bit set. It uses the onerror callback to clear the readonly bit and reattempt the remove. Any subsequent failure will propagate. import os, stat import shutil def remove_readonly(func, path, _): "Clear the readonly bit and reattempt the removal" os.chmod(path, stat.S_IWRITE) func(path) shutil.rmtree(directory, onerror=remove_readonly) Archiving operations ==================== New in version 3.2. Changed in version 3.5: Added support for the *xztar* format. High-level utilities to create and read compressed and archived files are also provided. They rely on the "zipfile" and "tarfile" modules. shutil.make_archive(base_name, format[, root_dir[, base_dir[, verbose[, dry_run[, owner[, group[, logger]]]]]]]) Create an archive file (such as zip or tar) and return its name. *base_name* is the name of the file to create, including the path, minus any format-specific extension. *format* is the archive format: one of “zip” (if the "zlib" module is available), “tar”, “gztar” (if the "zlib" module is available), “bztar” (if the "bz2" module is available), or “xztar” (if the "lzma" module is available). *root_dir* is a directory that will be the root directory of the archive, all paths in the archive will be relative to it; for example, we typically chdir into *root_dir* before creating the archive. *base_dir* is the directory where we start archiving from; i.e. *base_dir* will be the common prefix of all files and directories in the archive. *base_dir* must be given relative to *root_dir*. See Archiving example with base_dir for how to use *base_dir* and *root_dir* together. *root_dir* and *base_dir* both default to the current directory. If *dry_run* is true, no archive is created, but the operations that would be executed are logged to *logger*. *owner* and *group* are used when creating a tar archive. By default, uses the current owner and group. *logger* must be an object compatible with **PEP 282**, usually an instance of "logging.Logger". The *verbose* argument is unused and deprecated. Raises an auditing event "shutil.make_archive" with arguments "base_name", "format", "root_dir", "base_dir". Note: This function is not thread-safe. Changed in version 3.8: The modern pax (POSIX.1-2001) format is now used instead of the legacy GNU format for archives created with "format="tar"". shutil.get_archive_formats() Return a list of supported formats for archiving. Each element of the returned sequence is a tuple "(name, description)". By default "shutil" provides these formats: * *zip*: ZIP file (if the "zlib" module is available). * *tar*: Uncompressed tar file. Uses POSIX.1-2001 pax format for new archives. * *gztar*: gzip’ed tar-file (if the "zlib" module is available). * *bztar*: bzip2’ed tar-file (if the "bz2" module is available). * *xztar*: xz’ed tar-file (if the "lzma" module is available). You can register new formats or provide your own archiver for any existing formats, by using "register_archive_format()". shutil.register_archive_format(name, function[, extra_args[, description]]) Register an archiver for the format *name*. *function* is the callable that will be used to unpack archives. The callable will receive the *base_name* of the file to create, followed by the *base_dir* (which defaults to "os.curdir") to start archiving from. Further arguments are passed as keyword arguments: *owner*, *group*, *dry_run* and *logger* (as passed in "make_archive()"). If given, *extra_args* is a sequence of "(name, value)" pairs that will be used as extra keywords arguments when the archiver callable is used. *description* is used by "get_archive_formats()" which returns the list of archivers. Defaults to an empty string. shutil.unregister_archive_format(name) Remove the archive format *name* from the list of supported formats. shutil.unpack_archive(filename[, extract_dir[, format[, filter]]]) Unpack an archive. *filename* is the full path of the archive. *extract_dir* is the name of the target directory where the archive is unpacked. If not provided, the current working directory is used. *format* is the archive format: one of “zip”, “tar”, “gztar”, “bztar”, or “xztar”. Or any other format registered with "register_unpack_format()". If not provided, "unpack_archive()" will use the archive file name extension and see if an unpacker was registered for that extension. In case none is found, a "ValueError" is raised. The keyword-only *filter* argument, which was added in Python 3.9.17, is passed to the underlying unpacking function. For zip files, *filter* is not accepted. For tar files, it is recommended to set it to "'data'", unless using features specific to tar and UNIX-like filesystems. (See Extraction filters for details.) The "'data'" filter will become the default for tar files in Python 3.14. Raises an auditing event "shutil.unpack_archive" with arguments "filename", "extract_dir", "format". Warning: Never extract archives from untrusted sources without prior inspection. It is possible that files are created outside of the path specified in the *extract_dir* argument, e.g. members that have absolute filenames starting with “/” or filenames with two dots “..”. Changed in version 3.7: Accepts a *path-like object* for *filename* and *extract_dir*. Changed in version 3.9.17: Added the *filter* argument. shutil.register_unpack_format(name, extensions, function[, extra_args[, description]]) Registers an unpack format. *name* is the name of the format and *extensions* is a list of extensions corresponding to the format, like ".zip" for Zip files. *function* is the callable that will be used to unpack archives. The callable will receive: * the path of the archive, as a positional argument; * the directory the archive must be extracted to, as a positional argument; * possibly a *filter* keyword argument, if it was given to "unpack_archive()"; * additional keyword arguments, specified by *extra_args* as a sequence of "(name, value)" tuples. *description* can be provided to describe the format, and will be returned by the "get_unpack_formats()" function. shutil.unregister_unpack_format(name) Unregister an unpack format. *name* is the name of the format. shutil.get_unpack_formats() Return a list of all registered formats for unpacking. Each element of the returned sequence is a tuple "(name, extensions, description)". By default "shutil" provides these formats: * *zip*: ZIP file (unpacking compressed files works only if the corresponding module is available). * *tar*: uncompressed tar file. * *gztar*: gzip’ed tar-file (if the "zlib" module is available). * *bztar*: bzip2’ed tar-file (if the "bz2" module is available). * *xztar*: xz’ed tar-file (if the "lzma" module is available). You can register new formats or provide your own unpacker for any existing formats, by using "register_unpack_format()". Archiving example ----------------- In this example, we create a gzip’ed tar-file archive containing all files found in the ".ssh" directory of the user: >>> from shutil import make_archive >>> import os >>> archive_name = os.path.expanduser(os.path.join('~', 'myarchive')) >>> root_dir = os.path.expanduser(os.path.join('~', '.ssh')) >>> make_archive(archive_name, 'gztar', root_dir) '/Users/tarek/myarchive.tar.gz' The resulting archive contains: $ tar -tzvf /Users/tarek/myarchive.tar.gz drwx------ tarek/staff 0 2010-02-01 16:23:40 ./ -rw-r--r-- tarek/staff 609 2008-06-09 13:26:54 ./authorized_keys -rwxr-xr-x tarek/staff 65 2008-06-09 13:26:54 ./config -rwx------ tarek/staff 668 2008-06-09 13:26:54 ./id_dsa -rwxr-xr-x tarek/staff 609 2008-06-09 13:26:54 ./id_dsa.pub -rw------- tarek/staff 1675 2008-06-09 13:26:54 ./id_rsa -rw-r--r-- tarek/staff 397 2008-06-09 13:26:54 ./id_rsa.pub -rw-r--r-- tarek/staff 37192 2010-02-06 18:23:10 ./known_hosts Archiving example with *base_dir* --------------------------------- In this example, similar to the one above, we show how to use "make_archive()", but this time with the usage of *base_dir*. We now have the following directory structure: $ tree tmp tmp └── root └── structure ├── content └── please_add.txt └── do_not_add.txt In the final archive, "please_add.txt" should be included, but "do_not_add.txt" should not. Therefore we use the following: >>> from shutil import make_archive >>> import os >>> archive_name = os.path.expanduser(os.path.join('~', 'myarchive')) >>> make_archive( ... archive_name, ... 'tar', ... root_dir='tmp/root', ... base_dir='structure/content', ... ) '/Users/tarek/my_archive.tar' Listing the files in the resulting archive gives us: $ python -m tarfile -l /Users/tarek/myarchive.tar structure/content/ structure/content/please_add.txt Querying the size of the output terminal ======================================== shutil.get_terminal_size(fallback=(columns, lines)) Get the size of the terminal window. For each of the two dimensions, the environment variable, "COLUMNS" and "LINES" respectively, is checked. If the variable is defined and the value is a positive integer, it is used. When "COLUMNS" or "LINES" is not defined, which is the common case, the terminal connected to "sys.__stdout__" is queried by invoking "os.get_terminal_size()". If the terminal size cannot be successfully queried, either because the system doesn’t support querying, or because we are not connected to a terminal, the value given in "fallback" parameter is used. "fallback" defaults to "(80, 24)" which is the default size used by many terminal emulators. The value returned is a named tuple of type "os.terminal_size". See also: The Single UNIX Specification, Version 2, Other Environment Variables. New in version 3.3. "signal" — Set handlers for asynchronous events *********************************************** ====================================================================== This module provides mechanisms to use signal handlers in Python. General rules ============= The "signal.signal()" function allows defining custom handlers to be executed when a signal is received. A small number of default handlers are installed: "SIGPIPE" is ignored (so write errors on pipes and sockets can be reported as ordinary Python exceptions) and "SIGINT" is translated into a "KeyboardInterrupt" exception if the parent process has not changed it. A handler for a particular signal, once set, remains installed until it is explicitly reset (Python emulates the BSD style interface regardless of the underlying implementation), with the exception of the handler for "SIGCHLD", which follows the underlying implementation. Execution of Python signal handlers ----------------------------------- A Python signal handler does not get executed inside the low-level (C) signal handler. Instead, the low-level signal handler sets a flag which tells the *virtual machine* to execute the corresponding Python signal handler at a later point(for example at the next *bytecode* instruction). This has consequences: * It makes little sense to catch synchronous errors like "SIGFPE" or "SIGSEGV" that are caused by an invalid operation in C code. Python will return from the signal handler to the C code, which is likely to raise the same signal again, causing Python to apparently hang. From Python 3.3 onwards, you can use the "faulthandler" module to report on synchronous errors. * A long-running calculation implemented purely in C (such as regular expression matching on a large body of text) may run uninterrupted for an arbitrary amount of time, regardless of any signals received. The Python signal handlers will be called when the calculation finishes. * If the handler raises an exception, it will be raised “out of thin air” in the main thread. See the note below for a discussion. Signals and threads ------------------- Python signal handlers are always executed in the main Python thread of the main interpreter, even if the signal was received in another thread. This means that signals can’t be used as a means of inter- thread communication. You can use the synchronization primitives from the "threading" module instead. Besides, only the main thread of the main interpreter is allowed to set a new signal handler. Module contents =============== Changed in version 3.5: signal (SIG*), handler ("SIG_DFL", "SIG_IGN") and sigmask ("SIG_BLOCK", "SIG_UNBLOCK", "SIG_SETMASK") related constants listed below were turned into "enums". "getsignal()", "pthread_sigmask()", "sigpending()" and "sigwait()" functions return human-readable "enums". The variables defined in the "signal" module are: signal.SIG_DFL This is one of two standard signal handling options; it will simply perform the default function for the signal. For example, on most systems the default action for "SIGQUIT" is to dump core and exit, while the default action for "SIGCHLD" is to simply ignore it. signal.SIG_IGN This is another standard signal handler, which will simply ignore the given signal. signal.SIGABRT Abort signal from *abort(3)*. signal.SIGALRM Timer signal from *alarm(2)*. Availability: Unix. signal.SIGBREAK Interrupt from keyboard (CTRL + BREAK). Availability: Windows. signal.SIGBUS Bus error (bad memory access). Availability: Unix. signal.SIGCHLD Child process stopped or terminated. Availability: Unix. signal.SIGCLD Alias to "SIGCHLD". signal.SIGCONT Continue the process if it is currently stopped Availability: Unix. signal.SIGFPE Floating-point exception. For example, division by zero. See also: "ZeroDivisionError" is raised when the second argument of a division or modulo operation is zero. signal.SIGHUP Hangup detected on controlling terminal or death of controlling process. Availability: Unix. signal.SIGILL Illegal instruction. signal.SIGINT Interrupt from keyboard (CTRL + C). Default action is to raise "KeyboardInterrupt". signal.SIGKILL Kill signal. It cannot be caught, blocked, or ignored. Availability: Unix. signal.SIGPIPE Broken pipe: write to pipe with no readers. Default action is to ignore the signal. Availability: Unix. signal.SIGSEGV Segmentation fault: invalid memory reference. signal.SIGTERM Termination signal. signal.SIGUSR1 User-defined signal 1. Availability: Unix. signal.SIGUSR2 User-defined signal 2. Availability: Unix. signal.SIGWINCH Window resize signal. Availability: Unix. SIG* All the signal numbers are defined symbolically. For example, the hangup signal is defined as "signal.SIGHUP"; the variable names are identical to the names used in C programs, as found in "". The Unix man page for ‘"signal()"’ lists the existing signals (on some systems this is *signal(2)*, on others the list is in *signal(7)*). Note that not all systems define the same set of signal names; only those names defined by the system are defined by this module. signal.CTRL_C_EVENT The signal corresponding to the "Ctrl+C" keystroke event. This signal can only be used with "os.kill()". Availability: Windows. New in version 3.2. signal.CTRL_BREAK_EVENT The signal corresponding to the "Ctrl+Break" keystroke event. This signal can only be used with "os.kill()". Availability: Windows. New in version 3.2. signal.NSIG One more than the number of the highest signal number. signal.ITIMER_REAL Decrements interval timer in real time, and delivers "SIGALRM" upon expiration. signal.ITIMER_VIRTUAL Decrements interval timer only when the process is executing, and delivers SIGVTALRM upon expiration. signal.ITIMER_PROF Decrements interval timer both when the process executes and when the system is executing on behalf of the process. Coupled with ITIMER_VIRTUAL, this timer is usually used to profile the time spent by the application in user and kernel space. SIGPROF is delivered upon expiration. signal.SIG_BLOCK A possible value for the *how* parameter to "pthread_sigmask()" indicating that signals are to be blocked. New in version 3.3. signal.SIG_UNBLOCK A possible value for the *how* parameter to "pthread_sigmask()" indicating that signals are to be unblocked. New in version 3.3. signal.SIG_SETMASK A possible value for the *how* parameter to "pthread_sigmask()" indicating that the signal mask is to be replaced. New in version 3.3. The "signal" module defines one exception: exception signal.ItimerError Raised to signal an error from the underlying "setitimer()" or "getitimer()" implementation. Expect this error if an invalid interval timer or a negative time is passed to "setitimer()". This error is a subtype of "OSError". New in version 3.3: This error used to be a subtype of "IOError", which is now an alias of "OSError". The "signal" module defines the following functions: signal.alarm(time) If *time* is non-zero, this function requests that a "SIGALRM" signal be sent to the process in *time* seconds. Any previously scheduled alarm is canceled (only one alarm can be scheduled at any time). The returned value is then the number of seconds before any previously set alarm was to have been delivered. If *time* is zero, no alarm is scheduled, and any scheduled alarm is canceled. If the return value is zero, no alarm is currently scheduled. Availability: Unix. See the man page *alarm(2)* for further information. signal.getsignal(signalnum) Return the current signal handler for the signal *signalnum*. The returned value may be a callable Python object, or one of the special values "signal.SIG_IGN", "signal.SIG_DFL" or "None". Here, "signal.SIG_IGN" means that the signal was previously ignored, "signal.SIG_DFL" means that the default way of handling the signal was previously in use, and "None" means that the previous signal handler was not installed from Python. signal.strsignal(signalnum) Return the system description of the signal *signalnum*, such as “Interrupt”, “Segmentation fault”, etc. Returns "None" if the signal is not recognized. New in version 3.8. signal.valid_signals() Return the set of valid signal numbers on this platform. This can be less than "range(1, NSIG)" if some signals are reserved by the system for internal use. New in version 3.8. signal.pause() Cause the process to sleep until a signal is received; the appropriate handler will then be called. Returns nothing. Availability: Unix. See the man page *signal(2)* for further information. See also "sigwait()", "sigwaitinfo()", "sigtimedwait()" and "sigpending()". signal.raise_signal(signum) Sends a signal to the calling process. Returns nothing. New in version 3.8. signal.pidfd_send_signal(pidfd, sig, siginfo=None, flags=0) Send signal *sig* to the process referred to by file descriptor *pidfd*. Python does not currently support the *siginfo* parameter; it must be "None". The *flags* argument is provided for future extensions; no flag values are currently defined. See the *pidfd_send_signal(2)* man page for more information. Availability: Linux 5.1+ New in version 3.9. signal.pthread_kill(thread_id, signalnum) Send the signal *signalnum* to the thread *thread_id*, another thread in the same process as the caller. The target thread can be executing any code (Python or not). However, if the target thread is executing the Python interpreter, the Python signal handlers will be executed by the main thread of the main interpreter. Therefore, the only point of sending a signal to a particular Python thread would be to force a running system call to fail with "InterruptedError". Use "threading.get_ident()" or the "ident" attribute of "threading.Thread" objects to get a suitable value for *thread_id*. If *signalnum* is 0, then no signal is sent, but error checking is still performed; this can be used to check if the target thread is still running. Raises an auditing event "signal.pthread_kill" with arguments "thread_id", "signalnum". Availability: Unix. See the man page *pthread_kill(3)* for further information. See also "os.kill()". New in version 3.3. signal.pthread_sigmask(how, mask) Fetch and/or change the signal mask of the calling thread. The signal mask is the set of signals whose delivery is currently blocked for the caller. Return the old signal mask as a set of signals. The behavior of the call is dependent on the value of *how*, as follows. * "SIG_BLOCK": The set of blocked signals is the union of the current set and the *mask* argument. * "SIG_UNBLOCK": The signals in *mask* are removed from the current set of blocked signals. It is permissible to attempt to unblock a signal which is not blocked. * "SIG_SETMASK": The set of blocked signals is set to the *mask* argument. *mask* is a set of signal numbers (e.g. {"signal.SIGINT", "signal.SIGTERM"}). Use "valid_signals()" for a full mask including all signals. For example, "signal.pthread_sigmask(signal.SIG_BLOCK, [])" reads the signal mask of the calling thread. "SIGKILL" and "SIGSTOP" cannot be blocked. Availability: Unix. See the man page *sigprocmask(3)* and *pthread_sigmask(3)* for further information. See also "pause()", "sigpending()" and "sigwait()". New in version 3.3. signal.setitimer(which, seconds, interval=0.0) Sets given interval timer (one of "signal.ITIMER_REAL", "signal.ITIMER_VIRTUAL" or "signal.ITIMER_PROF") specified by *which* to fire after *seconds* (float is accepted, different from "alarm()") and after that every *interval* seconds (if *interval* is non-zero). The interval timer specified by *which* can be cleared by setting *seconds* to zero. When an interval timer fires, a signal is sent to the process. The signal sent is dependent on the timer being used; "signal.ITIMER_REAL" will deliver "SIGALRM", "signal.ITIMER_VIRTUAL" sends "SIGVTALRM", and "signal.ITIMER_PROF" will deliver "SIGPROF". The old values are returned as a tuple: (delay, interval). Attempting to pass an invalid interval timer will cause an "ItimerError". Availability: Unix. signal.getitimer(which) Returns current value of a given interval timer specified by *which*. Availability: Unix. signal.set_wakeup_fd(fd, *, warn_on_full_buffer=True) Set the wakeup file descriptor to *fd*. When a signal is received, the signal number is written as a single byte into the fd. This can be used by a library to wakeup a poll or select call, allowing the signal to be fully processed. The old wakeup fd is returned (or -1 if file descriptor wakeup was not enabled). If *fd* is -1, file descriptor wakeup is disabled. If not -1, *fd* must be non-blocking. It is up to the library to remove any bytes from *fd* before calling poll or select again. When threads are enabled, this function can only be called from the main thread of the main interpreter; attempting to call it from other threads will cause a "ValueError" exception to be raised. There are two common ways to use this function. In both approaches, you use the fd to wake up when a signal arrives, but then they differ in how they determine *which* signal or signals have arrived. In the first approach, we read the data out of the fd’s buffer, and the byte values give you the signal numbers. This is simple, but in rare cases it can run into a problem: generally the fd will have a limited amount of buffer space, and if too many signals arrive too quickly, then the buffer may become full, and some signals may be lost. If you use this approach, then you should set "warn_on_full_buffer=True", which will at least cause a warning to be printed to stderr when signals are lost. In the second approach, we use the wakeup fd *only* for wakeups, and ignore the actual byte values. In this case, all we care about is whether the fd’s buffer is empty or non-empty; a full buffer doesn’t indicate a problem at all. If you use this approach, then you should set "warn_on_full_buffer=False", so that your users are not confused by spurious warning messages. Changed in version 3.5: On Windows, the function now also supports socket handles. Changed in version 3.7: Added "warn_on_full_buffer" parameter. signal.siginterrupt(signalnum, flag) Change system call restart behaviour: if *flag* is "False", system calls will be restarted when interrupted by signal *signalnum*, otherwise system calls will be interrupted. Returns nothing. Availability: Unix. See the man page *siginterrupt(3)* for further information. Note that installing a signal handler with "signal()" will reset the restart behaviour to interruptible by implicitly calling "siginterrupt()" with a true *flag* value for the given signal. signal.signal(signalnum, handler) Set the handler for signal *signalnum* to the function *handler*. *handler* can be a callable Python object taking two arguments (see below), or one of the special values "signal.SIG_IGN" or "signal.SIG_DFL". The previous signal handler will be returned (see the description of "getsignal()" above). (See the Unix man page *signal(2)* for further information.) When threads are enabled, this function can only be called from the main thread of the main interpreter; attempting to call it from other threads will cause a "ValueError" exception to be raised. The *handler* is called with two arguments: the signal number and the current stack frame ("None" or a frame object; for a description of frame objects, see the description in the type hierarchy or see the attribute descriptions in the "inspect" module). On Windows, "signal()" can only be called with "SIGABRT", "SIGFPE", "SIGILL", "SIGINT", "SIGSEGV", "SIGTERM", or "SIGBREAK". A "ValueError" will be raised in any other case. Note that not all systems define the same set of signal names; an "AttributeError" will be raised if a signal name is not defined as "SIG*" module level constant. signal.sigpending() Examine the set of signals that are pending for delivery to the calling thread (i.e., the signals which have been raised while blocked). Return the set of the pending signals. Availability: Unix. See the man page *sigpending(2)* for further information. See also "pause()", "pthread_sigmask()" and "sigwait()". New in version 3.3. signal.sigwait(sigset) Suspend execution of the calling thread until the delivery of one of the signals specified in the signal set *sigset*. The function accepts the signal (removes it from the pending list of signals), and returns the signal number. Availability: Unix. See the man page *sigwait(3)* for further information. See also "pause()", "pthread_sigmask()", "sigpending()", "sigwaitinfo()" and "sigtimedwait()". New in version 3.3. signal.sigwaitinfo(sigset) Suspend execution of the calling thread until the delivery of one of the signals specified in the signal set *sigset*. The function accepts the signal and removes it from the pending list of signals. If one of the signals in *sigset* is already pending for the calling thread, the function will return immediately with information about that signal. The signal handler is not called for the delivered signal. The function raises an "InterruptedError" if it is interrupted by a signal that is not in *sigset*. The return value is an object representing the data contained in the "siginfo_t" structure, namely: "si_signo", "si_code", "si_errno", "si_pid", "si_uid", "si_status", "si_band". Availability: Unix. See the man page *sigwaitinfo(2)* for further information. See also "pause()", "sigwait()" and "sigtimedwait()". New in version 3.3. Changed in version 3.5: The function is now retried if interrupted by a signal not in *sigset* and the signal handler does not raise an exception (see **PEP 475** for the rationale). signal.sigtimedwait(sigset, timeout) Like "sigwaitinfo()", but takes an additional *timeout* argument specifying a timeout. If *timeout* is specified as "0", a poll is performed. Returns "None" if a timeout occurs. Availability: Unix. See the man page *sigtimedwait(2)* for further information. See also "pause()", "sigwait()" and "sigwaitinfo()". New in version 3.3. Changed in version 3.5: The function is now retried with the recomputed *timeout* if interrupted by a signal not in *sigset* and the signal handler does not raise an exception (see **PEP 475** for the rationale). Example ======= Here is a minimal example program. It uses the "alarm()" function to limit the time spent waiting to open a file; this is useful if the file is for a serial device that may not be turned on, which would normally cause the "os.open()" to hang indefinitely. The solution is to set a 5-second alarm before opening the file; if the operation takes too long, the alarm signal will be sent, and the handler raises an exception. import signal, os def handler(signum, frame): print('Signal handler called with signal', signum) raise OSError("Couldn't open device!") # Set the signal handler and a 5-second alarm signal.signal(signal.SIGALRM, handler) signal.alarm(5) # This open() may hang indefinitely fd = os.open('/dev/ttyS0', os.O_RDWR) signal.alarm(0) # Disable the alarm Note on SIGPIPE =============== Piping output of your program to tools like *head(1)* will cause a "SIGPIPE" signal to be sent to your process when the receiver of its standard output closes early. This results in an exception like "BrokenPipeError: [Errno 32] Broken pipe". To handle this case, wrap your entry point to catch this exception as follows: import os import sys def main(): try: # simulate large output (your code replaces this loop) for x in range(10000): print("y") # flush output here to force SIGPIPE to be triggered # while inside this try block. sys.stdout.flush() except BrokenPipeError: # Python flushes standard streams on exit; redirect remaining output # to devnull to avoid another BrokenPipeError at shutdown devnull = os.open(os.devnull, os.O_WRONLY) os.dup2(devnull, sys.stdout.fileno()) sys.exit(1) # Python exits with error code 1 on EPIPE if __name__ == '__main__': main() Do not set "SIGPIPE"’s disposition to "SIG_DFL" in order to avoid "BrokenPipeError". Doing that would cause your program to exit unexpectedly whenever any socket connection is interrupted while your program is still writing to it. Note on Signal Handlers and Exceptions ====================================== If a signal handler raises an exception, the exception will be propagated to the main thread and may be raised after any *bytecode* instruction. Most notably, a "KeyboardInterrupt" may appear at any point during execution. Most Python code, including the standard library, cannot be made robust against this, and so a "KeyboardInterrupt" (or any other exception resulting from a signal handler) may on rare occasions put the program in an unexpected state. To illustrate this issue, consider the following code: class SpamContext: def __init__(self): self.lock = threading.Lock() def __enter__(self): # If KeyboardInterrupt occurs here, everything is fine self.lock.acquire() # If KeyboardInterrupt occcurs here, __exit__ will not be called ... # KeyboardInterrupt could occur just before the function returns def __exit__(self, exc_type, exc_val, exc_tb): ... self.lock.release() For many programs, especially those that merely want to exit on "KeyboardInterrupt", this is not a problem, but applications that are complex or require high reliability should avoid raising exceptions from signal handlers. They should also avoid catching "KeyboardInterrupt" as a means of gracefully shutting down. Instead, they should install their own "SIGINT" handler. Below is an example of an HTTP server that avoids "KeyboardInterrupt": import signal import socket from selectors import DefaultSelector, EVENT_READ from http.server import HTTPServer, SimpleHTTPRequestHandler interrupt_read, interrupt_write = socket.socketpair() def handler(signum, frame): print('Signal handler called with signal', signum) interrupt_write.send(b'\0') signal.signal(signal.SIGINT, handler) def serve_forever(httpd): sel = DefaultSelector() sel.register(interrupt_read, EVENT_READ) sel.register(httpd, EVENT_READ) while True: for key, _ in sel.select(): if key.fileobj == interrupt_read: interrupt_read.recv(1) return if key.fileobj == httpd: httpd.handle_request() print("Serving on port 8000") httpd = HTTPServer(('', 8000), SimpleHTTPRequestHandler) serve_forever(httpd) print("Shutdown...") "site" — Site-specific configuration hook ***************************************** **Source code:** Lib/site.py ====================================================================== **This module is automatically imported during initialization.** The automatic import can be suppressed using the interpreter’s "-S" option. Importing this module will append site-specific paths to the module search path and add a few builtins, unless "-S" was used. In that case, this module can be safely imported with no automatic modifications to the module search path or additions to the builtins. To explicitly trigger the usual site-specific additions, call the "site.main()" function. Changed in version 3.3: Importing the module used to trigger paths manipulation even when using "-S". It starts by constructing up to four directories from a head and a tail part. For the head part, it uses "sys.prefix" and "sys.exec_prefix"; empty heads are skipped. For the tail part, it uses the empty string and then "lib/site-packages" (on Windows) or "lib/python*X.Y*/site-packages" (on Unix and macOS). For each of the distinct head-tail combinations, it sees if it refers to an existing directory, and if so, adds it to "sys.path" and also inspects the newly added path for configuration files. Changed in version 3.5: Support for the “site-python” directory has been removed. If a file named “pyvenv.cfg” exists one directory above sys.executable, sys.prefix and sys.exec_prefix are set to that directory and it is also checked for site-packages (sys.base_prefix and sys.base_exec_prefix will always be the “real” prefixes of the Python installation). If “pyvenv.cfg” (a bootstrap configuration file) contains the key “include-system-site-packages” set to anything other than “true” (case-insensitive), the system-level prefixes will not be searched for site-packages; otherwise they will. A path configuration file is a file whose name has the form "*name*.pth" and exists in one of the four directories mentioned above; its contents are additional items (one per line) to be added to "sys.path". Non-existing items are never added to "sys.path", and no check is made that the item refers to a directory rather than a file. No item is added to "sys.path" more than once. Blank lines and lines beginning with "#" are skipped. Lines starting with "import" (followed by space or tab) are executed. Note: An executable line in a ".pth" file is run at every Python startup, regardless of whether a particular module is actually going to be used. Its impact should thus be kept to a minimum. The primary intended purpose of executable lines is to make the corresponding module(s) importable (load 3rd-party import hooks, adjust "PATH" etc). Any other initialization is supposed to be done upon a module’s actual import, if and when it happens. Limiting a code chunk to a single line is a deliberate measure to discourage putting anything more complex here. For example, suppose "sys.prefix" and "sys.exec_prefix" are set to "/usr/local". The Python X.Y library is then installed in "/usr/local/lib/python*X.Y*". Suppose this has a subdirectory "/usr/local/lib/python*X.Y*/site-packages" with three subsubdirectories, "foo", "bar" and "spam", and two path configuration files, "foo.pth" and "bar.pth". Assume "foo.pth" contains the following: # foo package configuration foo bar bletch and "bar.pth" contains: # bar package configuration bar Then the following version-specific directories are added to "sys.path", in this order: /usr/local/lib/pythonX.Y/site-packages/bar /usr/local/lib/pythonX.Y/site-packages/foo Note that "bletch" is omitted because it doesn’t exist; the "bar" directory precedes the "foo" directory because "bar.pth" comes alphabetically before "foo.pth"; and "spam" is omitted because it is not mentioned in either path configuration file. After these path manipulations, an attempt is made to import a module named "sitecustomize", which can perform arbitrary site-specific customizations. It is typically created by a system administrator in the site-packages directory. If this import fails with an "ImportError" or its subclass exception, and the exception’s "name" attribute equals to "'sitecustomize'", it is silently ignored. If Python is started without output streams available, as with "pythonw.exe" on Windows (which is used by default to start IDLE), attempted output from "sitecustomize" is ignored. Any other exception causes a silent and perhaps mysterious failure of the process. After this, an attempt is made to import a module named "usercustomize", which can perform arbitrary user-specific customizations, if "ENABLE_USER_SITE" is true. This file is intended to be created in the user site-packages directory (see below), which is part of "sys.path" unless disabled by "-s". If this import fails with an "ImportError" or its subclass exception, and the exception’s "name" attribute equals to "'usercustomize'", it is silently ignored. Note that for some non-Unix systems, "sys.prefix" and "sys.exec_prefix" are empty, and the path manipulations are skipped; however the import of "sitecustomize" and "usercustomize" is still attempted. Readline configuration ====================== On systems that support "readline", this module will also import and configure the "rlcompleter" module, if Python is started in interactive mode and without the "-S" option. The default behavior is enable tab-completion and to use "~/.python_history" as the history save file. To disable it, delete (or override) the "sys.__interactivehook__" attribute in your "sitecustomize" or "usercustomize" module or your "PYTHONSTARTUP" file. Changed in version 3.4: Activation of rlcompleter and history was made automatic. Module contents =============== site.PREFIXES A list of prefixes for site-packages directories. site.ENABLE_USER_SITE Flag showing the status of the user site-packages directory. "True" means that it is enabled and was added to "sys.path". "False" means that it was disabled by user request (with "-s" or "PYTHONNOUSERSITE"). "None" means it was disabled for security reasons (mismatch between user or group id and effective id) or by an administrator. site.USER_SITE Path to the user site-packages for the running Python. Can be "None" if "getusersitepackages()" hasn’t been called yet. Default value is "~/.local/lib/python*X.Y*/site-packages" for UNIX and non- framework macOS builds, "~/Library/Python/*X.Y*/lib/python/site- packages" for macOS framework builds, and "*%APPDATA%*\Python\Python*XY*\site-packages" on Windows. This directory is a site directory, which means that ".pth" files in it will be processed. site.USER_BASE Path to the base directory for the user site-packages. Can be "None" if "getuserbase()" hasn’t been called yet. Default value is "~/.local" for UNIX and macOS non-framework builds, "~/Library/Python/*X.Y*" for macOS framework builds, and "*%APPDATA%*\Python" for Windows. This value is used by Distutils to compute the installation directories for scripts, data files, Python modules, etc. for the user installation scheme. See also "PYTHONUSERBASE". site.main() Adds all the standard site-specific directories to the module search path. This function is called automatically when this module is imported, unless the Python interpreter was started with the "-S" flag. Changed in version 3.3: This function used to be called unconditionally. site.addsitedir(sitedir, known_paths=None) Add a directory to sys.path and process its ".pth" files. Typically used in "sitecustomize" or "usercustomize" (see above). site.getsitepackages() Return a list containing all global site-packages directories. New in version 3.2. site.getuserbase() Return the path of the user base directory, "USER_BASE". If it is not initialized yet, this function will also set it, respecting "PYTHONUSERBASE". New in version 3.2. site.getusersitepackages() Return the path of the user-specific site-packages directory, "USER_SITE". If it is not initialized yet, this function will also set it, respecting "USER_BASE". To determine if the user-specific site-packages was added to "sys.path" "ENABLE_USER_SITE" should be used. New in version 3.2. Command Line Interface ====================== The "site" module also provides a way to get the user directories from the command line: $ python3 -m site --user-site /home/user/.local/lib/python3.3/site-packages If it is called without arguments, it will print the contents of "sys.path" on the standard output, followed by the value of "USER_BASE" and whether the directory exists, then the same thing for "USER_SITE", and finally the value of "ENABLE_USER_SITE". --user-base Print the path to the user base directory. --user-site Print the path to the user site-packages directory. If both options are given, user base and user site will be printed (always in this order), separated by "os.pathsep". If any option is given, the script will exit with one of these values: "0" if the user site-packages directory is enabled, "1" if it was disabled by the user, "2" if it is disabled for security reasons or by an administrator, and a value greater than 2 if there is an error. See also: **PEP 370** – Per user site-packages directory "smtpd" — SMTP Server ********************* **Source code:** Lib/smtpd.py ====================================================================== This module offers several classes to implement SMTP (email) servers. Deprecated since version 3.6: "smtpd" will be removed in Python 3.12 (see **PEP 594** for details). The aiosmtpd package is a recommended replacement for this module. It is based on "asyncio" and provides a more straightforward API. Several server implementations are present; one is a generic do- nothing implementation, which can be overridden, while the other two offer specific mail-sending strategies. Additionally the SMTPChannel may be extended to implement very specific interaction behaviour with SMTP clients. The code supports **RFC 5321**, plus the **RFC 1870** SIZE and **RFC 6531** SMTPUTF8 extensions. SMTPServer Objects ================== class smtpd.SMTPServer(localaddr, remoteaddr, data_size_limit=33554432, map=None, enable_SMTPUTF8=False, decode_data=False) Create a new "SMTPServer" object, which binds to local address *localaddr*. It will treat *remoteaddr* as an upstream SMTP relayer. Both *localaddr* and *remoteaddr* should be a (host, port) tuple. The object inherits from "asyncore.dispatcher", and so will insert itself into "asyncore"’s event loop on instantiation. *data_size_limit* specifies the maximum number of bytes that will be accepted in a "DATA" command. A value of "None" or "0" means no limit. *map* is the socket map to use for connections (an initially empty dictionary is a suitable value). If not specified the "asyncore" global socket map is used. *enable_SMTPUTF8* determines whether the "SMTPUTF8" extension (as defined in **RFC 6531**) should be enabled. The default is "False". When "True", "SMTPUTF8" is accepted as a parameter to the "MAIL" command and when present is passed to "process_message()" in the "kwargs['mail_options']" list. *decode_data* and *enable_SMTPUTF8* cannot be set to "True" at the same time. *decode_data* specifies whether the data portion of the SMTP transaction should be decoded using UTF-8. When *decode_data* is "False" (the default), the server advertises the "8BITMIME" extension (**RFC 6152**), accepts the "BODY=8BITMIME" parameter to the "MAIL" command, and when present passes it to "process_message()" in the "kwargs['mail_options']" list. *decode_data* and *enable_SMTPUTF8* cannot be set to "True" at the same time. process_message(peer, mailfrom, rcpttos, data, **kwargs) Raise a "NotImplementedError" exception. Override this in subclasses to do something useful with this message. Whatever was passed in the constructor as *remoteaddr* will be available as the "_remoteaddr" attribute. *peer* is the remote host’s address, *mailfrom* is the envelope originator, *rcpttos* are the envelope recipients and *data* is a string containing the contents of the e-mail (which should be in **RFC 5321** format). If the *decode_data* constructor keyword is set to "True", the *data* argument will be a unicode string. If it is set to "False", it will be a bytes object. *kwargs* is a dictionary containing additional information. It is empty if "decode_data=True" was given as an init argument, otherwise it contains the following keys: *mail_options*: a list of all received parameters to the "MAIL" command (the elements are uppercase strings; example: "['BODY=8BITMIME', 'SMTPUTF8']"). *rcpt_options*: same as *mail_options* but for the "RCPT" command. Currently no "RCPT TO" options are supported, so for now this will always be an empty list. Implementations of "process_message" should use the "**kwargs" signature to accept arbitrary keyword arguments, since future feature enhancements may add keys to the kwargs dictionary. Return "None" to request a normal "250 Ok" response; otherwise return the desired response string in **RFC 5321** format. channel_class Override this in subclasses to use a custom "SMTPChannel" for managing SMTP clients. New in version 3.4: The *map* constructor argument. Changed in version 3.5: *localaddr* and *remoteaddr* may now contain IPv6 addresses. New in version 3.5: The *decode_data* and *enable_SMTPUTF8* constructor parameters, and the *kwargs* parameter to "process_message()" when *decode_data* is "False". Changed in version 3.6: *decode_data* is now "False" by default. DebuggingServer Objects ======================= class smtpd.DebuggingServer(localaddr, remoteaddr) Create a new debugging server. Arguments are as per "SMTPServer". Messages will be discarded, and printed on stdout. PureProxy Objects ================= class smtpd.PureProxy(localaddr, remoteaddr) Create a new pure proxy server. Arguments are as per "SMTPServer". Everything will be relayed to *remoteaddr*. Note that running this has a good chance to make you into an open relay, so please be careful. MailmanProxy Objects ==================== class smtpd.MailmanProxy(localaddr, remoteaddr) Deprecated since version 3.9, will be removed in version 3.11: "MailmanProxy" is deprecated, it depends on a "Mailman" module which no longer exists and therefore is already broken. Create a new pure proxy server. Arguments are as per "SMTPServer". Everything will be relayed to *remoteaddr*, unless local mailman configurations knows about an address, in which case it will be handled via mailman. Note that running this has a good chance to make you into an open relay, so please be careful. SMTPChannel Objects =================== class smtpd.SMTPChannel(server, conn, addr, data_size_limit=33554432, map=None, enable_SMTPUTF8=False, decode_data=False) Create a new "SMTPChannel" object which manages the communication between the server and a single SMTP client. *conn* and *addr* are as per the instance variables described below. *data_size_limit* specifies the maximum number of bytes that will be accepted in a "DATA" command. A value of "None" or "0" means no limit. *enable_SMTPUTF8* determines whether the "SMTPUTF8" extension (as defined in **RFC 6531**) should be enabled. The default is "False". *decode_data* and *enable_SMTPUTF8* cannot be set to "True" at the same time. A dictionary can be specified in *map* to avoid using a global socket map. *decode_data* specifies whether the data portion of the SMTP transaction should be decoded using UTF-8. The default is "False". *decode_data* and *enable_SMTPUTF8* cannot be set to "True" at the same time. To use a custom SMTPChannel implementation you need to override the "SMTPServer.channel_class" of your "SMTPServer". Changed in version 3.5: The *decode_data* and *enable_SMTPUTF8* parameters were added. Changed in version 3.6: *decode_data* is now "False" by default. The "SMTPChannel" has the following instance variables: smtp_server Holds the "SMTPServer" that spawned this channel. conn Holds the socket object connecting to the client. addr Holds the address of the client, the second value returned by "socket.accept" received_lines Holds a list of the line strings (decoded using UTF-8) received from the client. The lines have their ""\r\n"" line ending translated to ""\n"". smtp_state Holds the current state of the channel. This will be either "COMMAND" initially and then "DATA" after the client sends a “DATA” line. seen_greeting Holds a string containing the greeting sent by the client in its “HELO”. mailfrom Holds a string containing the address identified in the “MAIL FROM:” line from the client. rcpttos Holds a list of strings containing the addresses identified in the “RCPT TO:” lines from the client. received_data Holds a string containing all of the data sent by the client during the DATA state, up to but not including the terminating ""\r\n.\r\n"". fqdn Holds the fully-qualified domain name of the server as returned by "socket.getfqdn()". peer Holds the name of the client peer as returned by "conn.getpeername()" where "conn" is "conn". The "SMTPChannel" operates by invoking methods named "smtp_" upon reception of a command line from the client. Built into the base "SMTPChannel" class are methods for handling the following commands (and responding to them appropriately): +----------+---------------------------------------------------------------------+ | Command | Action taken | |==========|=====================================================================| | HELO | Accepts the greeting from the client and stores it in | | | "seen_greeting". Sets server to base command mode. | +----------+---------------------------------------------------------------------+ | EHLO | Accepts the greeting from the client and stores it in | | | "seen_greeting". Sets server to extended command mode. | +----------+---------------------------------------------------------------------+ | NOOP | Takes no action. | +----------+---------------------------------------------------------------------+ | QUIT | Closes the connection cleanly. | +----------+---------------------------------------------------------------------+ | MAIL | Accepts the “MAIL FROM:” syntax and stores the supplied address as | | | "mailfrom". In extended command mode, accepts the **RFC 1870** | | | SIZE attribute and responds appropriately based on the value of | | | *data_size_limit*. | +----------+---------------------------------------------------------------------+ | RCPT | Accepts the “RCPT TO:” syntax and stores the supplied addresses in | | | the "rcpttos" list. | +----------+---------------------------------------------------------------------+ | RSET | Resets the "mailfrom", "rcpttos", and "received_data", but not the | | | greeting. | +----------+---------------------------------------------------------------------+ | DATA | Sets the internal state to "DATA" and stores remaining lines from | | | the client in "received_data" until the terminator ""\r\n.\r\n"" is | | | received. | +----------+---------------------------------------------------------------------+ | HELP | Returns minimal information on command syntax | +----------+---------------------------------------------------------------------+ | VRFY | Returns code 252 (the server doesn’t know if the address is valid) | +----------+---------------------------------------------------------------------+ | EXPN | Reports that the command is not implemented. | +----------+---------------------------------------------------------------------+ "smtplib" — SMTP protocol client ******************************** **Source code:** Lib/smtplib.py ====================================================================== The "smtplib" module defines an SMTP client session object that can be used to send mail to any Internet machine with an SMTP or ESMTP listener daemon. For details of SMTP and ESMTP operation, consult **RFC 821** (Simple Mail Transfer Protocol) and **RFC 1869** (SMTP Service Extensions). class smtplib.SMTP(host='', port=0, local_hostname=None[, timeout], source_address=None) An "SMTP" instance encapsulates an SMTP connection. It has methods that support a full repertoire of SMTP and ESMTP operations. If the optional host and port parameters are given, the SMTP "connect()" method is called with those parameters during initialization. If specified, *local_hostname* is used as the FQDN of the local host in the HELO/EHLO command. Otherwise, the local hostname is found using "socket.getfqdn()". If the "connect()" call returns anything other than a success code, an "SMTPConnectError" is raised. The optional *timeout* parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). If the timeout expires, "socket.timeout" is raised. The optional source_address parameter allows binding to some specific source address in a machine with multiple network interfaces, and/or to some specific source TCP port. It takes a 2-tuple (host, port), for the socket to bind to as its source address before connecting. If omitted (or if host or port are "''" and/or 0 respectively) the OS default behavior will be used. For normal use, you should only require the initialization/connect, "sendmail()", and "SMTP.quit()" methods. An example is included below. The "SMTP" class supports the "with" statement. When used like this, the SMTP "QUIT" command is issued automatically when the "with" statement exits. E.g.: >>> from smtplib import SMTP >>> with SMTP("domain.org") as smtp: ... smtp.noop() ... (250, b'Ok') >>> All commands will raise an auditing event "smtplib.SMTP.send" with arguments "self" and "data", where "data" is the bytes about to be sent to the remote host. Changed in version 3.3: Support for the "with" statement was added. Changed in version 3.3: source_address argument was added. New in version 3.5: The SMTPUTF8 extension (**RFC 6531**) is now supported. Changed in version 3.9: If the *timeout* parameter is set to be zero, it will raise a "ValueError" to prevent the creation of a non-blocking socket class smtplib.SMTP_SSL(host='', port=0, local_hostname=None, keyfile=None, certfile=None[, timeout], context=None, source_address=None) An "SMTP_SSL" instance behaves exactly the same as instances of "SMTP". "SMTP_SSL" should be used for situations where SSL is required from the beginning of the connection and using "starttls()" is not appropriate. If *host* is not specified, the local host is used. If *port* is zero, the standard SMTP-over-SSL port (465) is used. The optional arguments *local_hostname*, *timeout* and *source_address* have the same meaning as they do in the "SMTP" class. *context*, also optional, can contain a "SSLContext" and allows configuring various aspects of the secure connection. Please read Security considerations for best practices. *keyfile* and *certfile* are a legacy alternative to *context*, and can point to a PEM formatted private key and certificate chain file for the SSL connection. Changed in version 3.3: *context* was added. Changed in version 3.3: source_address argument was added. Changed in version 3.4: The class now supports hostname check with "ssl.SSLContext.check_hostname" and *Server Name Indication* (see "ssl.HAS_SNI"). Deprecated since version 3.6: *keyfile* and *certfile* are deprecated in favor of *context*. Please use "ssl.SSLContext.load_cert_chain()" instead, or let "ssl.create_default_context()" select the system’s trusted CA certificates for you. Changed in version 3.9: If the *timeout* parameter is set to be zero, it will raise a "ValueError" to prevent the creation of a non-blocking socket class smtplib.LMTP(host='', port=LMTP_PORT, local_hostname=None, source_address=None[, timeout]) The LMTP protocol, which is very similar to ESMTP, is heavily based on the standard SMTP client. It’s common to use Unix sockets for LMTP, so our "connect()" method must support that as well as a regular host:port server. The optional arguments local_hostname and source_address have the same meaning as they do in the "SMTP" class. To specify a Unix socket, you must use an absolute path for *host*, starting with a ‘/’. Authentication is supported, using the regular SMTP mechanism. When using a Unix socket, LMTP generally don’t support or require any authentication, but your mileage might vary. Changed in version 3.9: The optional *timeout* parameter was added. A nice selection of exceptions is defined as well: exception smtplib.SMTPException Subclass of "OSError" that is the base exception class for all the other exceptions provided by this module. Changed in version 3.4: SMTPException became subclass of "OSError" exception smtplib.SMTPServerDisconnected This exception is raised when the server unexpectedly disconnects, or when an attempt is made to use the "SMTP" instance before connecting it to a server. exception smtplib.SMTPResponseException Base class for all exceptions that include an SMTP error code. These exceptions are generated in some instances when the SMTP server returns an error code. The error code is stored in the "smtp_code" attribute of the error, and the "smtp_error" attribute is set to the error message. exception smtplib.SMTPSenderRefused Sender address refused. In addition to the attributes set by on all "SMTPResponseException" exceptions, this sets ‘sender’ to the string that the SMTP server refused. exception smtplib.SMTPRecipientsRefused All recipient addresses refused. The errors for each recipient are accessible through the attribute "recipients", which is a dictionary of exactly the same sort as "SMTP.sendmail()" returns. exception smtplib.SMTPDataError The SMTP server refused to accept the message data. exception smtplib.SMTPConnectError Error occurred during establishment of a connection with the server. exception smtplib.SMTPHeloError The server refused our "HELO" message. exception smtplib.SMTPNotSupportedError The command or option attempted is not supported by the server. New in version 3.5. exception smtplib.SMTPAuthenticationError SMTP authentication went wrong. Most probably the server didn’t accept the username/password combination provided. See also: **RFC 821** - Simple Mail Transfer Protocol Protocol definition for SMTP. This document covers the model, operating procedure, and protocol details for SMTP. **RFC 1869** - SMTP Service Extensions Definition of the ESMTP extensions for SMTP. This describes a framework for extending SMTP with new commands, supporting dynamic discovery of the commands provided by the server, and defines a few additional commands. SMTP Objects ============ An "SMTP" instance has the following methods: SMTP.set_debuglevel(level) Set the debug output level. A value of 1 or "True" for *level* results in debug messages for connection and for all messages sent to and received from the server. A value of 2 for *level* results in these messages being timestamped. Changed in version 3.5: Added debuglevel 2. SMTP.docmd(cmd, args='') Send a command *cmd* to the server. The optional argument *args* is simply concatenated to the command, separated by a space. This returns a 2-tuple composed of a numeric response code and the actual response line (multiline responses are joined into one long line.) In normal operation it should not be necessary to call this method explicitly. It is used to implement other methods and may be useful for testing private extensions. If the connection to the server is lost while waiting for the reply, "SMTPServerDisconnected" will be raised. SMTP.connect(host='localhost', port=0) Connect to a host on a given port. The defaults are to connect to the local host at the standard SMTP port (25). If the hostname ends with a colon ("':'") followed by a number, that suffix will be stripped off and the number interpreted as the port number to use. This method is automatically invoked by the constructor if a host is specified during instantiation. Returns a 2-tuple of the response code and message sent by the server in its connection response. Raises an auditing event "smtplib.connect" with arguments "self", "host", "port". SMTP.helo(name='') Identify yourself to the SMTP server using "HELO". The hostname argument defaults to the fully qualified domain name of the local host. The message returned by the server is stored as the "helo_resp" attribute of the object. In normal operation it should not be necessary to call this method explicitly. It will be implicitly called by the "sendmail()" when necessary. SMTP.ehlo(name='') Identify yourself to an ESMTP server using "EHLO". The hostname argument defaults to the fully qualified domain name of the local host. Examine the response for ESMTP option and store them for use by "has_extn()". Also sets several informational attributes: the message returned by the server is stored as the "ehlo_resp" attribute, "does_esmtp" is set to true or false depending on whether the server supports ESMTP, and "esmtp_features" will be a dictionary containing the names of the SMTP service extensions this server supports, and their parameters (if any). Unless you wish to use "has_extn()" before sending mail, it should not be necessary to call this method explicitly. It will be implicitly called by "sendmail()" when necessary. SMTP.ehlo_or_helo_if_needed() This method calls "ehlo()" and/or "helo()" if there has been no previous "EHLO" or "HELO" command this session. It tries ESMTP "EHLO" first. "SMTPHeloError" The server didn’t reply properly to the "HELO" greeting. SMTP.has_extn(name) Return "True" if *name* is in the set of SMTP service extensions returned by the server, "False" otherwise. Case is ignored. SMTP.verify(address) Check the validity of an address on this server using SMTP "VRFY". Returns a tuple consisting of code 250 and a full **RFC 822** address (including human name) if the user address is valid. Otherwise returns an SMTP error code of 400 or greater and an error string. Note: Many sites disable SMTP "VRFY" in order to foil spammers. SMTP.login(user, password, *, initial_response_ok=True) Log in on an SMTP server that requires authentication. The arguments are the username and the password to authenticate with. If there has been no previous "EHLO" or "HELO" command this session, this method tries ESMTP "EHLO" first. This method will return normally if the authentication was successful, or may raise the following exceptions: "SMTPHeloError" The server didn’t reply properly to the "HELO" greeting. "SMTPAuthenticationError" The server didn’t accept the username/password combination. "SMTPNotSupportedError" The "AUTH" command is not supported by the server. "SMTPException" No suitable authentication method was found. Each of the authentication methods supported by "smtplib" are tried in turn if they are advertised as supported by the server. See "auth()" for a list of supported authentication methods. *initial_response_ok* is passed through to "auth()". Optional keyword argument *initial_response_ok* specifies whether, for authentication methods that support it, an “initial response” as specified in **RFC 4954** can be sent along with the "AUTH" command, rather than requiring a challenge/response. Changed in version 3.5: "SMTPNotSupportedError" may be raised, and the *initial_response_ok* parameter was added. SMTP.auth(mechanism, authobject, *, initial_response_ok=True) Issue an "SMTP" "AUTH" command for the specified authentication *mechanism*, and handle the challenge response via *authobject*. *mechanism* specifies which authentication mechanism is to be used as argument to the "AUTH" command; the valid values are those listed in the "auth" element of "esmtp_features". *authobject* must be a callable object taking an optional single argument: data = authobject(challenge=None) If optional keyword argument *initial_response_ok* is true, "authobject()" will be called first with no argument. It can return the **RFC 4954** “initial response” ASCII "str" which will be encoded and sent with the "AUTH" command as below. If the "authobject()" does not support an initial response (e.g. because it requires a challenge), it should return "None" when called with "challenge=None". If *initial_response_ok* is false, then "authobject()" will not be called first with "None". If the initial response check returns "None", or if *initial_response_ok* is false, "authobject()" will be called to process the server’s challenge response; the *challenge* argument it is passed will be a "bytes". It should return ASCII "str" *data* that will be base64 encoded and sent to the server. The "SMTP" class provides "authobjects" for the "CRAM-MD5", "PLAIN", and "LOGIN" mechanisms; they are named "SMTP.auth_cram_md5", "SMTP.auth_plain", and "SMTP.auth_login" respectively. They all require that the "user" and "password" properties of the "SMTP" instance are set to appropriate values. User code does not normally need to call "auth" directly, but can instead call the "login()" method, which will try each of the above mechanisms in turn, in the order listed. "auth" is exposed to facilitate the implementation of authentication methods not (or not yet) supported directly by "smtplib". New in version 3.5. SMTP.starttls(keyfile=None, certfile=None, context=None) Put the SMTP connection in TLS (Transport Layer Security) mode. All SMTP commands that follow will be encrypted. You should then call "ehlo()" again. If *keyfile* and *certfile* are provided, they are used to create an "ssl.SSLContext". Optional *context* parameter is an "ssl.SSLContext" object; This is an alternative to using a keyfile and a certfile and if specified both *keyfile* and *certfile* should be "None". If there has been no previous "EHLO" or "HELO" command this session, this method tries ESMTP "EHLO" first. Deprecated since version 3.6: *keyfile* and *certfile* are deprecated in favor of *context*. Please use "ssl.SSLContext.load_cert_chain()" instead, or let "ssl.create_default_context()" select the system’s trusted CA certificates for you. "SMTPHeloError" The server didn’t reply properly to the "HELO" greeting. "SMTPNotSupportedError" The server does not support the STARTTLS extension. "RuntimeError" SSL/TLS support is not available to your Python interpreter. Changed in version 3.3: *context* was added. Changed in version 3.4: The method now supports hostname check with "SSLContext.check_hostname" and *Server Name Indicator* (see "HAS_SNI"). Changed in version 3.5: The error raised for lack of STARTTLS support is now the "SMTPNotSupportedError" subclass instead of the base "SMTPException". SMTP.sendmail(from_addr, to_addrs, msg, mail_options=(), rcpt_options=()) Send mail. The required arguments are an **RFC 822** from-address string, a list of **RFC 822** to-address strings (a bare string will be treated as a list with 1 address), and a message string. The caller may pass a list of ESMTP options (such as "8bitmime") to be used in "MAIL FROM" commands as *mail_options*. ESMTP options (such as "DSN" commands) that should be used with all "RCPT" commands can be passed as *rcpt_options*. (If you need to use different ESMTP options to different recipients you have to use the low-level methods such as "mail()", "rcpt()" and "data()" to send the message.) Note: The *from_addr* and *to_addrs* parameters are used to construct the message envelope used by the transport agents. "sendmail" does not modify the message headers in any way. *msg* may be a string containing characters in the ASCII range, or a byte string. A string is encoded to bytes using the ascii codec, and lone "\r" and "\n" characters are converted to "\r\n" characters. A byte string is not modified. If there has been no previous "EHLO" or "HELO" command this session, this method tries ESMTP "EHLO" first. If the server does ESMTP, message size and each of the specified options will be passed to it (if the option is in the feature set the server advertises). If "EHLO" fails, "HELO" will be tried and ESMTP options suppressed. This method will return normally if the mail is accepted for at least one recipient. Otherwise it will raise an exception. That is, if this method does not raise an exception, then someone should get your mail. If this method does not raise an exception, it returns a dictionary, with one entry for each recipient that was refused. Each entry contains a tuple of the SMTP error code and the accompanying error message sent by the server. If "SMTPUTF8" is included in *mail_options*, and the server supports it, *from_addr* and *to_addrs* may contain non-ASCII characters. This method may raise the following exceptions: "SMTPRecipientsRefused" All recipients were refused. Nobody got the mail. The "recipients" attribute of the exception object is a dictionary with information about the refused recipients (like the one returned when at least one recipient was accepted). "SMTPHeloError" The server didn’t reply properly to the "HELO" greeting. "SMTPSenderRefused" The server didn’t accept the *from_addr*. "SMTPDataError" The server replied with an unexpected error code (other than a refusal of a recipient). "SMTPNotSupportedError" "SMTPUTF8" was given in the *mail_options* but is not supported by the server. Unless otherwise noted, the connection will be open even after an exception is raised. Changed in version 3.2: *msg* may be a byte string. Changed in version 3.5: "SMTPUTF8" support added, and "SMTPNotSupportedError" may be raised if "SMTPUTF8" is specified but the server does not support it. SMTP.send_message(msg, from_addr=None, to_addrs=None, mail_options=(), rcpt_options=()) This is a convenience method for calling "sendmail()" with the message represented by an "email.message.Message" object. The arguments have the same meaning as for "sendmail()", except that *msg* is a "Message" object. If *from_addr* is "None" or *to_addrs* is "None", "send_message" fills those arguments with addresses extracted from the headers of *msg* as specified in **RFC 5322**: *from_addr* is set to the *Sender* field if it is present, and otherwise to the *From* field. *to_addrs* combines the values (if any) of the *To*, *Cc*, and *Bcc* fields from *msg*. If exactly one set of *Resent-** headers appear in the message, the regular headers are ignored and the *Resent-** headers are used instead. If the message contains more than one set of *Resent-** headers, a "ValueError" is raised, since there is no way to unambiguously detect the most recent set of *Resent-* headers. "send_message" serializes *msg* using "BytesGenerator" with "\r\n" as the *linesep*, and calls "sendmail()" to transmit the resulting message. Regardless of the values of *from_addr* and *to_addrs*, "send_message" does not transmit any *Bcc* or *Resent-Bcc* headers that may appear in *msg*. If any of the addresses in *from_addr* and *to_addrs* contain non-ASCII characters and the server does not advertise "SMTPUTF8" support, an "SMTPNotSupported" error is raised. Otherwise the "Message" is serialized with a clone of its "policy" with the "utf8" attribute set to "True", and "SMTPUTF8" and "BODY=8BITMIME" are added to *mail_options*. New in version 3.2. New in version 3.5: Support for internationalized addresses ("SMTPUTF8"). SMTP.quit() Terminate the SMTP session and close the connection. Return the result of the SMTP "QUIT" command. Low-level methods corresponding to the standard SMTP/ESMTP commands "HELP", "RSET", "NOOP", "MAIL", "RCPT", and "DATA" are also supported. Normally these do not need to be called directly, so they are not documented here. For details, consult the module code. SMTP Example ============ This example prompts the user for addresses needed in the message envelope (‘To’ and ‘From’ addresses), and the message to be delivered. Note that the headers to be included with the message must be included in the message as entered; this example doesn’t do any processing of the **RFC 822** headers. In particular, the ‘To’ and ‘From’ addresses must be included in the message headers explicitly. import smtplib def prompt(prompt): return input(prompt).strip() fromaddr = prompt("From: ") toaddrs = prompt("To: ").split() print("Enter message, end with ^D (Unix) or ^Z (Windows):") # Add the From: and To: headers at the start! msg = ("From: %s\r\nTo: %s\r\n\r\n" % (fromaddr, ", ".join(toaddrs))) while True: try: line = input() except EOFError: break if not line: break msg = msg + line print("Message length is", len(msg)) server = smtplib.SMTP('localhost') server.set_debuglevel(1) server.sendmail(fromaddr, toaddrs, msg) server.quit() Note: In general, you will want to use the "email" package’s features to construct an email message, which you can then send via "send_message()"; see email: Examples. "sndhdr" — Determine type of sound file *************************************** **Source code:** Lib/sndhdr.py Deprecated since version 3.11: The "sndhdr" module is deprecated (see **PEP 594** for details and alternatives). ====================================================================== The "sndhdr" provides utility functions which attempt to determine the type of sound data which is in a file. When these functions are able to determine what type of sound data is stored in a file, they return a "namedtuple()", containing five attributes: ("filetype", "framerate", "nchannels", "nframes", "sampwidth"). The value for *type* indicates the data type and will be one of the strings "'aifc'", "'aiff'", "'au'", "'hcom'", "'sndr'", "'sndt'", "'voc'", "'wav'", "'8svx'", "'sb'", "'ub'", or "'ul'". The *sampling_rate* will be either the actual value or "0" if unknown or difficult to decode. Similarly, *channels* will be either the number of channels or "0" if it cannot be determined or if the value is difficult to decode. The value for *frames* will be either the number of frames or "-1". The last item in the tuple, *bits_per_sample*, will either be the sample size in bits or "'A'" for A-LAW or "'U'" for u-LAW. sndhdr.what(filename) Determines the type of sound data stored in the file *filename* using "whathdr()". If it succeeds, returns a namedtuple as described above, otherwise "None" is returned. Changed in version 3.5: Result changed from a tuple to a namedtuple. sndhdr.whathdr(filename) Determines the type of sound data stored in a file based on the file header. The name of the file is given by *filename*. This function returns a namedtuple as described above on success, or "None". Changed in version 3.5: Result changed from a tuple to a namedtuple. "socket" — Low-level networking interface ***************************************** **Source code:** Lib/socket.py ====================================================================== This module provides access to the BSD *socket* interface. It is available on all modern Unix systems, Windows, MacOS, and probably additional platforms. Note: Some behavior may be platform dependent, since calls are made to the operating system socket APIs. The Python interface is a straightforward transliteration of the Unix system call and library interface for sockets to Python’s object- oriented style: the "socket()" function returns a *socket object* whose methods implement the various socket system calls. Parameter types are somewhat higher-level than in the C interface: as with "read()" and "write()" operations on Python files, buffer allocation on receive operations is automatic, and buffer length is implicit on send operations. See also: Module "socketserver" Classes that simplify writing network servers. Module "ssl" A TLS/SSL wrapper for socket objects. Socket families =============== Depending on the system and the build options, various socket families are supported by this module. The address format required by a particular socket object is automatically selected based on the address family specified when the socket object was created. Socket addresses are represented as follows: * The address of an "AF_UNIX" socket bound to a file system node is represented as a string, using the file system encoding and the "'surrogateescape'" error handler (see **PEP 383**). An address in Linux’s abstract namespace is returned as a *bytes-like object* with an initial null byte; note that sockets in this namespace can communicate with normal file system sockets, so programs intended to run on Linux may need to deal with both types of address. A string or bytes-like object can be used for either type of address when passing it as an argument. Changed in version 3.3: Previously, "AF_UNIX" socket paths were assumed to use UTF-8 encoding. Changed in version 3.5: Writable *bytes-like object* is now accepted. * A pair "(host, port)" is used for the "AF_INET" address family, where *host* is a string representing either a hostname in Internet domain notation like "'daring.cwi.nl'" or an IPv4 address like "'100.50.200.5'", and *port* is an integer. * For IPv4 addresses, two special forms are accepted instead of a host address: "''" represents "INADDR_ANY", which is used to bind to all interfaces, and the string "''" represents "INADDR_BROADCAST". This behavior is not compatible with IPv6, therefore, you may want to avoid these if you intend to support IPv6 with your Python programs. * For "AF_INET6" address family, a four-tuple "(host, port, flowinfo, scope_id)" is used, where *flowinfo* and *scope_id* represent the "sin6_flowinfo" and "sin6_scope_id" members in "struct sockaddr_in6" in C. For "socket" module methods, *flowinfo* and *scope_id* can be omitted just for backward compatibility. Note, however, omission of *scope_id* can cause problems in manipulating scoped IPv6 addresses. Changed in version 3.7: For multicast addresses (with *scope_id* meaningful) *address* may not contain "%scope_id" (or "zone id") part. This information is superfluous and may be safely omitted (recommended). * "AF_NETLINK" sockets are represented as pairs "(pid, groups)". * Linux-only support for TIPC is available using the "AF_TIPC" address family. TIPC is an open, non-IP based networked protocol designed for use in clustered computer environments. Addresses are represented by a tuple, and the fields depend on the address type. The general tuple form is "(addr_type, v1, v2, v3 [, scope])", where: * *addr_type* is one of "TIPC_ADDR_NAMESEQ", "TIPC_ADDR_NAME", or "TIPC_ADDR_ID". * *scope* is one of "TIPC_ZONE_SCOPE", "TIPC_CLUSTER_SCOPE", and "TIPC_NODE_SCOPE". * If *addr_type* is "TIPC_ADDR_NAME", then *v1* is the server type, *v2* is the port identifier, and *v3* should be 0. If *addr_type* is "TIPC_ADDR_NAMESEQ", then *v1* is the server type, *v2* is the lower port number, and *v3* is the upper port number. If *addr_type* is "TIPC_ADDR_ID", then *v1* is the node, *v2* is the reference, and *v3* should be set to 0. * A tuple "(interface, )" is used for the "AF_CAN" address family, where *interface* is a string representing a network interface name like "'can0'". The network interface name "''" can be used to receive packets from all network interfaces of this family. * "CAN_ISOTP" protocol require a tuple "(interface, rx_addr, tx_addr)" where both additional parameters are unsigned long integer that represent a CAN identifier (standard or extended). * "CAN_J1939" protocol require a tuple "(interface, name, pgn, addr)" where additional parameters are 64-bit unsigned integer representing the ECU name, a 32-bit unsigned integer representing the Parameter Group Number (PGN), and an 8-bit integer representing the address. * A string or a tuple "(id, unit)" is used for the "SYSPROTO_CONTROL" protocol of the "PF_SYSTEM" family. The string is the name of a kernel control using a dynamically-assigned ID. The tuple can be used if ID and unit number of the kernel control are known or if a registered ID is used. New in version 3.3. * "AF_BLUETOOTH" supports the following protocols and address formats: * "BTPROTO_L2CAP" accepts "(bdaddr, psm)" where "bdaddr" is the Bluetooth address as a string and "psm" is an integer. * "BTPROTO_RFCOMM" accepts "(bdaddr, channel)" where "bdaddr" is the Bluetooth address as a string and "channel" is an integer. * "BTPROTO_HCI" accepts "(device_id,)" where "device_id" is either an integer or a string with the Bluetooth address of the interface. (This depends on your OS; NetBSD and DragonFlyBSD expect a Bluetooth address while everything else expects an integer.) Changed in version 3.2: NetBSD and DragonFlyBSD support added. * "BTPROTO_SCO" accepts "bdaddr" where "bdaddr" is a "bytes" object containing the Bluetooth address in a string format. (ex. "b'12:23:34:45:56:67'") This protocol is not supported under FreeBSD. * "AF_ALG" is a Linux-only socket based interface to Kernel cryptography. An algorithm socket is configured with a tuple of two to four elements "(type, name [, feat [, mask]])", where: * *type* is the algorithm type as string, e.g. "aead", "hash", "skcipher" or "rng". * *name* is the algorithm name and operation mode as string, e.g. "sha256", "hmac(sha256)", "cbc(aes)" or "drbg_nopr_ctr_aes256". * *feat* and *mask* are unsigned 32bit integers. Availability: Linux 2.6.38, some algorithm types require more recent Kernels. New in version 3.6. * "AF_VSOCK" allows communication between virtual machines and their hosts. The sockets are represented as a "(CID, port)" tuple where the context ID or CID and port are integers. Availability: Linux >= 4.8 QEMU >= 2.8 ESX >= 4.0 ESX Workstation >= 6.5. New in version 3.7. * "AF_PACKET" is a low-level interface directly to network devices. The packets are represented by the tuple "(ifname, proto[, pkttype[, hatype[, addr]]])" where: * *ifname* - String specifying the device name. * *proto* - An in network-byte-order integer specifying the Ethernet protocol number. * *pkttype* - Optional integer specifying the packet type: * "PACKET_HOST" (the default) - Packet addressed to the local host. * "PACKET_BROADCAST" - Physical-layer broadcast packet. * "PACKET_MULTIHOST" - Packet sent to a physical-layer multicast address. * "PACKET_OTHERHOST" - Packet to some other host that has been caught by a device driver in promiscuous mode. * "PACKET_OUTGOING" - Packet originating from the local host that is looped back to a packet socket. * *hatype* - Optional integer specifying the ARP hardware address type. * *addr* - Optional bytes-like object specifying the hardware physical address, whose interpretation depends on the device. Availability: Linux >= 2.2. * "AF_QIPCRTR" is a Linux-only socket based interface for communicating with services running on co-processors in Qualcomm platforms. The address family is represented as a "(node, port)" tuple where the *node* and *port* are non-negative integers. Availability: Linux >= 4.7. New in version 3.8. * "IPPROTO_UDPLITE" is a variant of UDP which allows you to specify what portion of a packet is covered with the checksum. It adds two socket options that you can change. "self.setsockopt(IPPROTO_UDPLITE, UDPLITE_SEND_CSCOV, length)" will change what portion of outgoing packets are covered by the checksum and "self.setsockopt(IPPROTO_UDPLITE, UDPLITE_RECV_CSCOV, length)" will filter out packets which cover too little of their data. In both cases "length" should be in "range(8, 2**16, 8)". Such a socket should be constructed with "socket(AF_INET, SOCK_DGRAM, IPPROTO_UDPLITE)" for IPv4 or "socket(AF_INET6, SOCK_DGRAM, IPPROTO_UDPLITE)" for IPv6. Availability: Linux >= 2.6.20, FreeBSD >= 10.1-RELEASE New in version 3.9. If you use a hostname in the *host* portion of IPv4/v6 socket address, the program may show a nondeterministic behavior, as Python uses the first address returned from the DNS resolution. The socket address will be resolved differently into an actual IPv4/v6 address, depending on the results from DNS resolution and/or the host configuration. For deterministic behavior use a numeric address in *host* portion. All errors raise exceptions. The normal exceptions for invalid argument types and out-of-memory conditions can be raised; starting from Python 3.3, errors related to socket or address semantics raise "OSError" or one of its subclasses (they used to raise "socket.error"). Non-blocking mode is supported through "setblocking()". A generalization of this based on timeouts is supported through "settimeout()". Module contents =============== The module "socket" exports the following elements. Exceptions ---------- exception socket.error A deprecated alias of "OSError". Changed in version 3.3: Following **PEP 3151**, this class was made an alias of "OSError". exception socket.herror A subclass of "OSError", this exception is raised for address- related errors, i.e. for functions that use *h_errno* in the POSIX C API, including "gethostbyname_ex()" and "gethostbyaddr()". The accompanying value is a pair "(h_errno, string)" representing an error returned by a library call. *h_errno* is a numeric value, while *string* represents the description of *h_errno*, as returned by the "hstrerror()" C function. Changed in version 3.3: This class was made a subclass of "OSError". exception socket.gaierror A subclass of "OSError", this exception is raised for address- related errors by "getaddrinfo()" and "getnameinfo()". The accompanying value is a pair "(error, string)" representing an error returned by a library call. *string* represents the description of *error*, as returned by the "gai_strerror()" C function. The numeric *error* value will match one of the "EAI_*" constants defined in this module. Changed in version 3.3: This class was made a subclass of "OSError". exception socket.timeout A subclass of "OSError", this exception is raised when a timeout occurs on a socket which has had timeouts enabled via a prior call to "settimeout()" (or implicitly through "setdefaulttimeout()"). The accompanying value is a string whose value is currently always “timed out”. Changed in version 3.3: This class was made a subclass of "OSError". Constants --------- The AF_* and SOCK_* constants are now "AddressFamily" and "SocketKind" "IntEnum" collections. New in version 3.4. socket.AF_UNIX socket.AF_INET socket.AF_INET6 These constants represent the address (and protocol) families, used for the first argument to "socket()". If the "AF_UNIX" constant is not defined then this protocol is unsupported. More constants may be available depending on the system. socket.SOCK_STREAM socket.SOCK_DGRAM socket.SOCK_RAW socket.SOCK_RDM socket.SOCK_SEQPACKET These constants represent the socket types, used for the second argument to "socket()". More constants may be available depending on the system. (Only "SOCK_STREAM" and "SOCK_DGRAM" appear to be generally useful.) socket.SOCK_CLOEXEC socket.SOCK_NONBLOCK These two constants, if defined, can be combined with the socket types and allow you to set some flags atomically (thus avoiding possible race conditions and the need for separate calls). See also: Secure File Descriptor Handling for a more thorough explanation. Availability: Linux >= 2.6.27. New in version 3.2. SO_* socket.SOMAXCONN MSG_* SOL_* SCM_* IPPROTO_* IPPORT_* INADDR_* IP_* IPV6_* EAI_* AI_* NI_* TCP_* Many constants of these forms, documented in the Unix documentation on sockets and/or the IP protocol, are also defined in the socket module. They are generally used in arguments to the "setsockopt()" and "getsockopt()" methods of socket objects. In most cases, only those symbols that are defined in the Unix header files are defined; for a few symbols, default values are provided. Changed in version 3.6: "SO_DOMAIN", "SO_PROTOCOL", "SO_PEERSEC", "SO_PASSSEC", "TCP_USER_TIMEOUT", "TCP_CONGESTION" were added. Changed in version 3.6.5: On Windows, "TCP_FASTOPEN", "TCP_KEEPCNT" appear if run-time Windows supports. Changed in version 3.7: "TCP_NOTSENT_LOWAT" was added.On Windows, "TCP_KEEPIDLE", "TCP_KEEPINTVL" appear if run-time Windows supports. socket.AF_CAN socket.PF_CAN SOL_CAN_* CAN_* Many constants of these forms, documented in the Linux documentation, are also defined in the socket module. Availability: Linux >= 2.6.25. New in version 3.3. socket.CAN_BCM CAN_BCM_* CAN_BCM, in the CAN protocol family, is the broadcast manager (BCM) protocol. Broadcast manager constants, documented in the Linux documentation, are also defined in the socket module. Availability: Linux >= 2.6.25. Note: The "CAN_BCM_CAN_FD_FRAME" flag is only available on Linux >= 4.8. New in version 3.4. socket.CAN_RAW_FD_FRAMES Enables CAN FD support in a CAN_RAW socket. This is disabled by default. This allows your application to send both CAN and CAN FD frames; however, you must accept both CAN and CAN FD frames when reading from the socket. This constant is documented in the Linux documentation. Availability: Linux >= 3.6. New in version 3.5. socket.CAN_RAW_JOIN_FILTERS Joins the applied CAN filters such that only CAN frames that match all given CAN filters are passed to user space. This constant is documented in the Linux documentation. Availability: Linux >= 4.1. New in version 3.9. socket.CAN_ISOTP CAN_ISOTP, in the CAN protocol family, is the ISO-TP (ISO 15765-2) protocol. ISO-TP constants, documented in the Linux documentation. Availability: Linux >= 2.6.25. New in version 3.7. socket.CAN_J1939 CAN_J1939, in the CAN protocol family, is the SAE J1939 protocol. J1939 constants, documented in the Linux documentation. Availability: Linux >= 5.4. New in version 3.9. socket.AF_PACKET socket.PF_PACKET PACKET_* Many constants of these forms, documented in the Linux documentation, are also defined in the socket module. Availability: Linux >= 2.2. socket.AF_RDS socket.PF_RDS socket.SOL_RDS RDS_* Many constants of these forms, documented in the Linux documentation, are also defined in the socket module. Availability: Linux >= 2.6.30. New in version 3.3. socket.SIO_RCVALL socket.SIO_KEEPALIVE_VALS socket.SIO_LOOPBACK_FAST_PATH RCVALL_* Constants for Windows’ WSAIoctl(). The constants are used as arguments to the "ioctl()" method of socket objects. Changed in version 3.6: "SIO_LOOPBACK_FAST_PATH" was added. TIPC_* TIPC related constants, matching the ones exported by the C socket API. See the TIPC documentation for more information. socket.AF_ALG socket.SOL_ALG ALG_* Constants for Linux Kernel cryptography. Availability: Linux >= 2.6.38. New in version 3.6. socket.AF_VSOCK socket.IOCTL_VM_SOCKETS_GET_LOCAL_CID VMADDR* SO_VM* Constants for Linux host/guest communication. Availability: Linux >= 4.8. New in version 3.7. socket.AF_LINK Availability: BSD, macOS. New in version 3.4. socket.has_ipv6 This constant contains a boolean value which indicates if IPv6 is supported on this platform. socket.BDADDR_ANY socket.BDADDR_LOCAL These are string constants containing Bluetooth addresses with special meanings. For example, "BDADDR_ANY" can be used to indicate any address when specifying the binding socket with "BTPROTO_RFCOMM". socket.HCI_FILTER socket.HCI_TIME_STAMP socket.HCI_DATA_DIR For use with "BTPROTO_HCI". "HCI_FILTER" is not available for NetBSD or DragonFlyBSD. "HCI_TIME_STAMP" and "HCI_DATA_DIR" are not available for FreeBSD, NetBSD, or DragonFlyBSD. socket.AF_QIPCRTR Constant for Qualcomm’s IPC router protocol, used to communicate with service providing remote processors. Availability: Linux >= 4.7. Functions --------- Creating sockets ~~~~~~~~~~~~~~~~ The following functions all create socket objects. class socket.socket(family=AF_INET, type=SOCK_STREAM, proto=0, fileno=None) Create a new socket using the given address family, socket type and protocol number. The address family should be "AF_INET" (the default), "AF_INET6", "AF_UNIX", "AF_CAN", "AF_PACKET", or "AF_RDS". The socket type should be "SOCK_STREAM" (the default), "SOCK_DGRAM", "SOCK_RAW" or perhaps one of the other "SOCK_" constants. The protocol number is usually zero and may be omitted or in the case where the address family is "AF_CAN" the protocol should be one of "CAN_RAW", "CAN_BCM", "CAN_ISOTP" or "CAN_J1939". If *fileno* is specified, the values for *family*, *type*, and *proto* are auto-detected from the specified file descriptor. Auto-detection can be overruled by calling the function with explicit *family*, *type*, or *proto* arguments. This only affects how Python represents e.g. the return value of "socket.getpeername()" but not the actual OS resource. Unlike "socket.fromfd()", *fileno* will return the same socket and not a duplicate. This may help close a detached socket using "socket.close()". The newly created socket is non-inheritable. Raises an auditing event "socket.__new__" with arguments "self", "family", "type", "protocol". Changed in version 3.3: The AF_CAN family was added. The AF_RDS family was added. Changed in version 3.4: The CAN_BCM protocol was added. Changed in version 3.4: The returned socket is now non-inheritable. Changed in version 3.7: The CAN_ISOTP protocol was added. Changed in version 3.7: When "SOCK_NONBLOCK" or "SOCK_CLOEXEC" bit flags are applied to *type* they are cleared, and "socket.type" will not reflect them. They are still passed to the underlying system *socket()* call. Therefore, sock = socket.socket( socket.AF_INET, socket.SOCK_STREAM | socket.SOCK_NONBLOCK) will still create a non-blocking socket on OSes that support "SOCK_NONBLOCK", but "sock.type" will be set to "socket.SOCK_STREAM". Changed in version 3.9: The CAN_J1939 protocol was added. socket.socketpair([family[, type[, proto]]]) Build a pair of connected socket objects using the given address family, socket type, and protocol number. Address family, socket type, and protocol number are as for the "socket()" function above. The default family is "AF_UNIX" if defined on the platform; otherwise, the default is "AF_INET". The newly created sockets are non-inheritable. Changed in version 3.2: The returned socket objects now support the whole socket API, rather than a subset. Changed in version 3.4: The returned sockets are now non- inheritable. Changed in version 3.5: Windows support added. socket.create_connection(address[, timeout[, source_address]]) Connect to a TCP service listening on the Internet *address* (a 2-tuple "(host, port)"), and return the socket object. This is a higher-level function than "socket.connect()": if *host* is a non- numeric hostname, it will try to resolve it for both "AF_INET" and "AF_INET6", and then try to connect to all possible addresses in turn until a connection succeeds. This makes it easy to write clients that are compatible to both IPv4 and IPv6. Passing the optional *timeout* parameter will set the timeout on the socket instance before attempting to connect. If no *timeout* is supplied, the global default timeout setting returned by "getdefaulttimeout()" is used. If supplied, *source_address* must be a 2-tuple "(host, port)" for the socket to bind to as its source address before connecting. If host or port are ‘’ or 0 respectively the OS default behavior will be used. Changed in version 3.2: *source_address* was added. socket.create_server(address, *, family=AF_INET, backlog=None, reuse_port=False, dualstack_ipv6=False) Convenience function which creates a TCP socket bound to *address* (a 2-tuple "(host, port)") and return the socket object. *family* should be either "AF_INET" or "AF_INET6". *backlog* is the queue size passed to "socket.listen()"; when "0" a default reasonable value is chosen. *reuse_port* dictates whether to set the "SO_REUSEPORT" socket option. If *dualstack_ipv6* is true and the platform supports it the socket will be able to accept both IPv4 and IPv6 connections, else it will raise "ValueError". Most POSIX platforms and Windows are supposed to support this functionality. When this functionality is enabled the address returned by "socket.getpeername()" when an IPv4 connection occurs will be an IPv6 address represented as an IPv4-mapped IPv6 address. If *dualstack_ipv6* is false it will explicitly disable this functionality on platforms that enable it by default (e.g. Linux). This parameter can be used in conjunction with "has_dualstack_ipv6()": import socket addr = ("", 8080) # all interfaces, port 8080 if socket.has_dualstack_ipv6(): s = socket.create_server(addr, family=socket.AF_INET6, dualstack_ipv6=True) else: s = socket.create_server(addr) Note: On POSIX platforms the "SO_REUSEADDR" socket option is set in order to immediately reuse previous sockets which were bound on the same *address* and remained in TIME_WAIT state. New in version 3.8. socket.has_dualstack_ipv6() Return "True" if the platform supports creating a TCP socket which can handle both IPv4 and IPv6 connections. New in version 3.8. socket.fromfd(fd, family, type, proto=0) Duplicate the file descriptor *fd* (an integer as returned by a file object’s "fileno()" method) and build a socket object from the result. Address family, socket type and protocol number are as for the "socket()" function above. The file descriptor should refer to a socket, but this is not checked — subsequent operations on the object may fail if the file descriptor is invalid. This function is rarely needed, but can be used to get or set socket options on a socket passed to a program as standard input or output (such as a server started by the Unix inet daemon). The socket is assumed to be in blocking mode. The newly created socket is non-inheritable. Changed in version 3.4: The returned socket is now non-inheritable. socket.fromshare(data) Instantiate a socket from data obtained from the "socket.share()" method. The socket is assumed to be in blocking mode. Availability: Windows. New in version 3.3. socket.SocketType This is a Python type object that represents the socket object type. It is the same as "type(socket(...))". Other functions ~~~~~~~~~~~~~~~ The "socket" module also offers various network-related services: socket.close(fd) Close a socket file descriptor. This is like "os.close()", but for sockets. On some platforms (most noticeable Windows) "os.close()" does not work for socket file descriptors. New in version 3.7. socket.getaddrinfo(host, port, family=0, type=0, proto=0, flags=0) Translate the *host*/*port* argument into a sequence of 5-tuples that contain all the necessary arguments for creating a socket connected to that service. *host* is a domain name, a string representation of an IPv4/v6 address or "None". *port* is a string service name such as "'http'", a numeric port number or "None". By passing "None" as the value of *host* and *port*, you can pass "NULL" to the underlying C API. The *family*, *type* and *proto* arguments can be optionally specified in order to narrow the list of addresses returned. Passing zero as a value for each of these arguments selects the full range of results. The *flags* argument can be one or several of the "AI_*" constants, and will influence how results are computed and returned. For example, "AI_NUMERICHOST" will disable domain name resolution and will raise an error if *host* is a domain name. The function returns a list of 5-tuples with the following structure: "(family, type, proto, canonname, sockaddr)" In these tuples, *family*, *type*, *proto* are all integers and are meant to be passed to the "socket()" function. *canonname* will be a string representing the canonical name of the *host* if "AI_CANONNAME" is part of the *flags* argument; else *canonname* will be empty. *sockaddr* is a tuple describing a socket address, whose format depends on the returned *family* (a "(address, port)" 2-tuple for "AF_INET", a "(address, port, flowinfo, scope_id)" 4-tuple for "AF_INET6"), and is meant to be passed to the "socket.connect()" method. Raises an auditing event "socket.getaddrinfo" with arguments "host", "port", "family", "type", "protocol". The following example fetches address information for a hypothetical TCP connection to "example.org" on port 80 (results may differ on your system if IPv6 isn’t enabled): >>> socket.getaddrinfo("example.org", 80, proto=socket.IPPROTO_TCP) [(, , 6, '', ('2606:2800:220:1:248:1893:25c8:1946', 80, 0, 0)), (, , 6, '', ('93.184.216.34', 80))] Changed in version 3.2: parameters can now be passed using keyword arguments. Changed in version 3.7: for IPv6 multicast addresses, string representing an address will not contain "%scope_id" part. socket.getfqdn([name]) Return a fully qualified domain name for *name*. If *name* is omitted or empty, it is interpreted as the local host. To find the fully qualified name, the hostname returned by "gethostbyaddr()" is checked, followed by aliases for the host, if available. The first name which includes a period is selected. In case no fully qualified domain name is available and *name* was provided, it is returned unchanged. If *name* was empty or equal to "'0.0.0.0'", the hostname from "gethostname()" is returned. socket.gethostbyname(hostname) Translate a host name to IPv4 address format. The IPv4 address is returned as a string, such as "'100.50.200.5'". If the host name is an IPv4 address itself it is returned unchanged. See "gethostbyname_ex()" for a more complete interface. "gethostbyname()" does not support IPv6 name resolution, and "getaddrinfo()" should be used instead for IPv4/v6 dual stack support. Raises an auditing event "socket.gethostbyname" with argument "hostname". socket.gethostbyname_ex(hostname) Translate a host name to IPv4 address format, extended interface. Return a triple "(hostname, aliaslist, ipaddrlist)" where *hostname* is the host’s primary host name, *aliaslist* is a (possibly empty) list of alternative host names for the same address, and *ipaddrlist* is a list of IPv4 addresses for the same interface on the same host (often but not always a single address). "gethostbyname_ex()" does not support IPv6 name resolution, and "getaddrinfo()" should be used instead for IPv4/v6 dual stack support. Raises an auditing event "socket.gethostbyname" with argument "hostname". socket.gethostname() Return a string containing the hostname of the machine where the Python interpreter is currently executing. Raises an auditing event "socket.gethostname" with no arguments. Note: "gethostname()" doesn’t always return the fully qualified domain name; use "getfqdn()" for that. socket.gethostbyaddr(ip_address) Return a triple "(hostname, aliaslist, ipaddrlist)" where *hostname* is the primary host name responding to the given *ip_address*, *aliaslist* is a (possibly empty) list of alternative host names for the same address, and *ipaddrlist* is a list of IPv4/v6 addresses for the same interface on the same host (most likely containing only a single address). To find the fully qualified domain name, use the function "getfqdn()". "gethostbyaddr()" supports both IPv4 and IPv6. Raises an auditing event "socket.gethostbyaddr" with argument "ip_address". socket.getnameinfo(sockaddr, flags) Translate a socket address *sockaddr* into a 2-tuple "(host, port)". Depending on the settings of *flags*, the result can contain a fully-qualified domain name or numeric address representation in *host*. Similarly, *port* can contain a string port name or a numeric port number. For IPv6 addresses, "%scope_id" is appended to the host part if *sockaddr* contains meaningful *scope_id*. Usually this happens for multicast addresses. For more information about *flags* you can consult *getnameinfo(3)*. Raises an auditing event "socket.getnameinfo" with argument "sockaddr". socket.getprotobyname(protocolname) Translate an Internet protocol name (for example, "'icmp'") to a constant suitable for passing as the (optional) third argument to the "socket()" function. This is usually only needed for sockets opened in “raw” mode ("SOCK_RAW"); for the normal socket modes, the correct protocol is chosen automatically if the protocol is omitted or zero. socket.getservbyname(servicename[, protocolname]) Translate an Internet service name and protocol name to a port number for that service. The optional protocol name, if given, should be "'tcp'" or "'udp'", otherwise any protocol will match. Raises an auditing event "socket.getservbyname" with arguments "servicename", "protocolname". socket.getservbyport(port[, protocolname]) Translate an Internet port number and protocol name to a service name for that service. The optional protocol name, if given, should be "'tcp'" or "'udp'", otherwise any protocol will match. Raises an auditing event "socket.getservbyport" with arguments "port", "protocolname". socket.ntohl(x) Convert 32-bit positive integers from network to host byte order. On machines where the host byte order is the same as network byte order, this is a no-op; otherwise, it performs a 4-byte swap operation. socket.ntohs(x) Convert 16-bit positive integers from network to host byte order. On machines where the host byte order is the same as network byte order, this is a no-op; otherwise, it performs a 2-byte swap operation. Deprecated since version 3.7: In case *x* does not fit in 16-bit unsigned integer, but does fit in a positive C int, it is silently truncated to 16-bit unsigned integer. This silent truncation feature is deprecated, and will raise an exception in future versions of Python. socket.htonl(x) Convert 32-bit positive integers from host to network byte order. On machines where the host byte order is the same as network byte order, this is a no-op; otherwise, it performs a 4-byte swap operation. socket.htons(x) Convert 16-bit positive integers from host to network byte order. On machines where the host byte order is the same as network byte order, this is a no-op; otherwise, it performs a 2-byte swap operation. Deprecated since version 3.7: In case *x* does not fit in 16-bit unsigned integer, but does fit in a positive C int, it is silently truncated to 16-bit unsigned integer. This silent truncation feature is deprecated, and will raise an exception in future versions of Python. socket.inet_aton(ip_string) Convert an IPv4 address from dotted-quad string format (for example, ‘123.45.67.89’) to 32-bit packed binary format, as a bytes object four characters in length. This is useful when conversing with a program that uses the standard C library and needs objects of type "struct in_addr", which is the C type for the 32-bit packed binary this function returns. "inet_aton()" also accepts strings with less than three dots; see the Unix manual page *inet(3)* for details. If the IPv4 address string passed to this function is invalid, "OSError" will be raised. Note that exactly what is valid depends on the underlying C implementation of "inet_aton()". "inet_aton()" does not support IPv6, and "inet_pton()" should be used instead for IPv4/v6 dual stack support. socket.inet_ntoa(packed_ip) Convert a 32-bit packed IPv4 address (a *bytes-like object* four bytes in length) to its standard dotted-quad string representation (for example, ‘123.45.67.89’). This is useful when conversing with a program that uses the standard C library and needs objects of type "struct in_addr", which is the C type for the 32-bit packed binary data this function takes as an argument. If the byte sequence passed to this function is not exactly 4 bytes in length, "OSError" will be raised. "inet_ntoa()" does not support IPv6, and "inet_ntop()" should be used instead for IPv4/v6 dual stack support. Changed in version 3.5: Writable *bytes-like object* is now accepted. socket.inet_pton(address_family, ip_string) Convert an IP address from its family-specific string format to a packed, binary format. "inet_pton()" is useful when a library or network protocol calls for an object of type "struct in_addr" (similar to "inet_aton()") or "struct in6_addr". Supported values for *address_family* are currently "AF_INET" and "AF_INET6". If the IP address string *ip_string* is invalid, "OSError" will be raised. Note that exactly what is valid depends on both the value of *address_family* and the underlying implementation of "inet_pton()". Availability: Unix (maybe not all platforms), Windows. Changed in version 3.4: Windows support added socket.inet_ntop(address_family, packed_ip) Convert a packed IP address (a *bytes-like object* of some number of bytes) to its standard, family-specific string representation (for example, "'7.10.0.5'" or "'5aef:2b::8'"). "inet_ntop()" is useful when a library or network protocol returns an object of type "struct in_addr" (similar to "inet_ntoa()") or "struct in6_addr". Supported values for *address_family* are currently "AF_INET" and "AF_INET6". If the bytes object *packed_ip* is not the correct length for the specified address family, "ValueError" will be raised. "OSError" is raised for errors from the call to "inet_ntop()". Availability: Unix (maybe not all platforms), Windows. Changed in version 3.4: Windows support added Changed in version 3.5: Writable *bytes-like object* is now accepted. socket.CMSG_LEN(length) Return the total length, without trailing padding, of an ancillary data item with associated data of the given *length*. This value can often be used as the buffer size for "recvmsg()" to receive a single item of ancillary data, but **RFC 3542** requires portable applications to use "CMSG_SPACE()" and thus include space for padding, even when the item will be the last in the buffer. Raises "OverflowError" if *length* is outside the permissible range of values. Availability: most Unix platforms, possibly others. New in version 3.3. socket.CMSG_SPACE(length) Return the buffer size needed for "recvmsg()" to receive an ancillary data item with associated data of the given *length*, along with any trailing padding. The buffer space needed to receive multiple items is the sum of the "CMSG_SPACE()" values for their associated data lengths. Raises "OverflowError" if *length* is outside the permissible range of values. Note that some systems might support ancillary data without providing this function. Also note that setting the buffer size using the results of this function may not precisely limit the amount of ancillary data that can be received, since additional data may be able to fit into the padding area. Availability: most Unix platforms, possibly others. New in version 3.3. socket.getdefaulttimeout() Return the default timeout in seconds (float) for new socket objects. A value of "None" indicates that new socket objects have no timeout. When the socket module is first imported, the default is "None". socket.setdefaulttimeout(timeout) Set the default timeout in seconds (float) for new socket objects. When the socket module is first imported, the default is "None". See "settimeout()" for possible values and their respective meanings. socket.sethostname(name) Set the machine’s hostname to *name*. This will raise an "OSError" if you don’t have enough rights. Raises an auditing event "socket.sethostname" with argument "name". Availability: Unix. New in version 3.3. socket.if_nameindex() Return a list of network interface information (index int, name string) tuples. "OSError" if the system call fails. Availability: Unix, Windows. New in version 3.3. Changed in version 3.8: Windows support was added. Note: On Windows network interfaces have different names in different contexts (all names are examples): * UUID: "{FB605B73-AAC2-49A6-9A2F-25416AEA0573}" * name: "ethernet_32770" * friendly name: "vEthernet (nat)" * description: "Hyper-V Virtual Ethernet Adapter" This function returns names of the second form from the list, "ethernet_32770" in this example case. socket.if_nametoindex(if_name) Return a network interface index number corresponding to an interface name. "OSError" if no interface with the given name exists. Availability: Unix, Windows. New in version 3.3. Changed in version 3.8: Windows support was added. See also: “Interface name” is a name as documented in "if_nameindex()". socket.if_indextoname(if_index) Return a network interface name corresponding to an interface index number. "OSError" if no interface with the given index exists. Availability: Unix, Windows. New in version 3.3. Changed in version 3.8: Windows support was added. See also: “Interface name” is a name as documented in "if_nameindex()". socket.send_fds(sock, buffers, fds[, flags[, address]]) Send the list of file descriptors *fds* over an "AF_UNIX" socket *sock*. The *fds* parameter is a sequence of file descriptors. Consult "sendmsg()" for the documentation of these parameters. Availability: Unix supporting "sendmsg()" and "SCM_RIGHTS" mechanism. New in version 3.9. socket.recv_fds(sock, bufsize, maxfds[, flags]) Receive up to *maxfds* file descriptors from an "AF_UNIX" socket *sock*. Return "(msg, list(fds), flags, addr)". Consult "recvmsg()" for the documentation of these parameters. Availability: Unix supporting "recvmsg()" and "SCM_RIGHTS" mechanism. New in version 3.9. Note: Any truncated integers at the end of the list of file descriptors. Socket Objects ============== Socket objects have the following methods. Except for "makefile()", these correspond to Unix system calls applicable to sockets. Changed in version 3.2: Support for the *context manager* protocol was added. Exiting the context manager is equivalent to calling "close()". socket.accept() Accept a connection. The socket must be bound to an address and listening for connections. The return value is a pair "(conn, address)" where *conn* is a *new* socket object usable to send and receive data on the connection, and *address* is the address bound to the socket on the other end of the connection. The newly created socket is non-inheritable. Changed in version 3.4: The socket is now non-inheritable. Changed in version 3.5: If the system call is interrupted and the signal handler does not raise an exception, the method now retries the system call instead of raising an "InterruptedError" exception (see **PEP 475** for the rationale). socket.bind(address) Bind the socket to *address*. The socket must not already be bound. (The format of *address* depends on the address family — see above.) Raises an auditing event "socket.bind" with arguments "self", "address". socket.close() Mark the socket closed. The underlying system resource (e.g. a file descriptor) is also closed when all file objects from "makefile()" are closed. Once that happens, all future operations on the socket object will fail. The remote end will receive no more data (after queued data is flushed). Sockets are automatically closed when they are garbage-collected, but it is recommended to "close()" them explicitly, or to use a "with" statement around them. Changed in version 3.6: "OSError" is now raised if an error occurs when the underlying "close()" call is made. Note: "close()" releases the resource associated with a connection but does not necessarily close the connection immediately. If you want to close the connection in a timely fashion, call "shutdown()" before "close()". socket.connect(address) Connect to a remote socket at *address*. (The format of *address* depends on the address family — see above.) If the connection is interrupted by a signal, the method waits until the connection completes, or raise a "socket.timeout" on timeout, if the signal handler doesn’t raise an exception and the socket is blocking or has a timeout. For non-blocking sockets, the method raises an "InterruptedError" exception if the connection is interrupted by a signal (or the exception raised by the signal handler). Raises an auditing event "socket.connect" with arguments "self", "address". Changed in version 3.5: The method now waits until the connection completes instead of raising an "InterruptedError" exception if the connection is interrupted by a signal, the signal handler doesn’t raise an exception and the socket is blocking or has a timeout (see the **PEP 475** for the rationale). socket.connect_ex(address) Like "connect(address)", but return an error indicator instead of raising an exception for errors returned by the C-level "connect()" call (other problems, such as “host not found,” can still raise exceptions). The error indicator is "0" if the operation succeeded, otherwise the value of the "errno" variable. This is useful to support, for example, asynchronous connects. Raises an auditing event "socket.connect" with arguments "self", "address". socket.detach() Put the socket object into closed state without actually closing the underlying file descriptor. The file descriptor is returned, and can be reused for other purposes. New in version 3.2. socket.dup() Duplicate the socket. The newly created socket is non-inheritable. Changed in version 3.4: The socket is now non-inheritable. socket.fileno() Return the socket’s file descriptor (a small integer), or -1 on failure. This is useful with "select.select()". Under Windows the small integer returned by this method cannot be used where a file descriptor can be used (such as "os.fdopen()"). Unix does not have this limitation. socket.get_inheritable() Get the inheritable flag of the socket’s file descriptor or socket’s handle: "True" if the socket can be inherited in child processes, "False" if it cannot. New in version 3.4. socket.getpeername() Return the remote address to which the socket is connected. This is useful to find out the port number of a remote IPv4/v6 socket, for instance. (The format of the address returned depends on the address family — see above.) On some systems this function is not supported. socket.getsockname() Return the socket’s own address. This is useful to find out the port number of an IPv4/v6 socket, for instance. (The format of the address returned depends on the address family — see above.) socket.getsockopt(level, optname[, buflen]) Return the value of the given socket option (see the Unix man page *getsockopt(2)*). The needed symbolic constants ("SO_*" etc.) are defined in this module. If *buflen* is absent, an integer option is assumed and its integer value is returned by the function. If *buflen* is present, it specifies the maximum length of the buffer used to receive the option in, and this buffer is returned as a bytes object. It is up to the caller to decode the contents of the buffer (see the optional built-in module "struct" for a way to decode C structures encoded as byte strings). socket.getblocking() Return "True" if socket is in blocking mode, "False" if in non- blocking. This is equivalent to checking "socket.gettimeout() == 0". New in version 3.7. socket.gettimeout() Return the timeout in seconds (float) associated with socket operations, or "None" if no timeout is set. This reflects the last call to "setblocking()" or "settimeout()". socket.ioctl(control, option) Platform: Windows The "ioctl()" method is a limited interface to the WSAIoctl system interface. Please refer to the Win32 documentation for more information. On other platforms, the generic "fcntl.fcntl()" and "fcntl.ioctl()" functions may be used; they accept a socket object as their first argument. Currently only the following control codes are supported: "SIO_RCVALL", "SIO_KEEPALIVE_VALS", and "SIO_LOOPBACK_FAST_PATH". Changed in version 3.6: "SIO_LOOPBACK_FAST_PATH" was added. socket.listen([backlog]) Enable a server to accept connections. If *backlog* is specified, it must be at least 0 (if it is lower, it is set to 0); it specifies the number of unaccepted connections that the system will allow before refusing new connections. If not specified, a default reasonable value is chosen. Changed in version 3.5: The *backlog* parameter is now optional. socket.makefile(mode='r', buffering=None, *, encoding=None, errors=None, newline=None) Return a *file object* associated with the socket. The exact returned type depends on the arguments given to "makefile()". These arguments are interpreted the same way as by the built-in "open()" function, except the only supported *mode* values are "'r'" (default), "'w'" and "'b'". The socket must be in blocking mode; it can have a timeout, but the file object’s internal buffer may end up in an inconsistent state if a timeout occurs. Closing the file object returned by "makefile()" won’t close the original socket unless all other file objects have been closed and "socket.close()" has been called on the socket object. Note: On Windows, the file-like object created by "makefile()" cannot be used where a file object with a file descriptor is expected, such as the stream arguments of "subprocess.Popen()". socket.recv(bufsize[, flags]) Receive data from the socket. The return value is a bytes object representing the data received. The maximum amount of data to be received at once is specified by *bufsize*. See the Unix manual page *recv(2)* for the meaning of the optional argument *flags*; it defaults to zero. Note: For best match with hardware and network realities, the value of *bufsize* should be a relatively small power of 2, for example, 4096. Changed in version 3.5: If the system call is interrupted and the signal handler does not raise an exception, the method now retries the system call instead of raising an "InterruptedError" exception (see **PEP 475** for the rationale). socket.recvfrom(bufsize[, flags]) Receive data from the socket. The return value is a pair "(bytes, address)" where *bytes* is a bytes object representing the data received and *address* is the address of the socket sending the data. See the Unix manual page *recv(2)* for the meaning of the optional argument *flags*; it defaults to zero. (The format of *address* depends on the address family — see above.) Changed in version 3.5: If the system call is interrupted and the signal handler does not raise an exception, the method now retries the system call instead of raising an "InterruptedError" exception (see **PEP 475** for the rationale). Changed in version 3.7: For multicast IPv6 address, first item of *address* does not contain "%scope_id" part anymore. In order to get full IPv6 address use "getnameinfo()". socket.recvmsg(bufsize[, ancbufsize[, flags]]) Receive normal data (up to *bufsize* bytes) and ancillary data from the socket. The *ancbufsize* argument sets the size in bytes of the internal buffer used to receive the ancillary data; it defaults to 0, meaning that no ancillary data will be received. Appropriate buffer sizes for ancillary data can be calculated using "CMSG_SPACE()" or "CMSG_LEN()", and items which do not fit into the buffer might be truncated or discarded. The *flags* argument defaults to 0 and has the same meaning as for "recv()". The return value is a 4-tuple: "(data, ancdata, msg_flags, address)". The *data* item is a "bytes" object holding the non- ancillary data received. The *ancdata* item is a list of zero or more tuples "(cmsg_level, cmsg_type, cmsg_data)" representing the ancillary data (control messages) received: *cmsg_level* and *cmsg_type* are integers specifying the protocol level and protocol-specific type respectively, and *cmsg_data* is a "bytes" object holding the associated data. The *msg_flags* item is the bitwise OR of various flags indicating conditions on the received message; see your system documentation for details. If the receiving socket is unconnected, *address* is the address of the sending socket, if available; otherwise, its value is unspecified. On some systems, "sendmsg()" and "recvmsg()" can be used to pass file descriptors between processes over an "AF_UNIX" socket. When this facility is used (it is often restricted to "SOCK_STREAM" sockets), "recvmsg()" will return, in its ancillary data, items of the form "(socket.SOL_SOCKET, socket.SCM_RIGHTS, fds)", where *fds* is a "bytes" object representing the new file descriptors as a binary array of the native C "int" type. If "recvmsg()" raises an exception after the system call returns, it will first attempt to close any file descriptors received via this mechanism. Some systems do not indicate the truncated length of ancillary data items which have been only partially received. If an item appears to extend beyond the end of the buffer, "recvmsg()" will issue a "RuntimeWarning", and will return the part of it which is inside the buffer provided it has not been truncated before the start of its associated data. On systems which support the "SCM_RIGHTS" mechanism, the following function will receive up to *maxfds* file descriptors, returning the message data and a list containing the descriptors (while ignoring unexpected conditions such as unrelated control messages being received). See also "sendmsg()". import socket, array def recv_fds(sock, msglen, maxfds): fds = array.array("i") # Array of ints msg, ancdata, flags, addr = sock.recvmsg(msglen, socket.CMSG_LEN(maxfds * fds.itemsize)) for cmsg_level, cmsg_type, cmsg_data in ancdata: if cmsg_level == socket.SOL_SOCKET and cmsg_type == socket.SCM_RIGHTS: # Append data, ignoring any truncated integers at the end. fds.frombytes(cmsg_data[:len(cmsg_data) - (len(cmsg_data) % fds.itemsize)]) return msg, list(fds) Availability: most Unix platforms, possibly others. New in version 3.3. Changed in version 3.5: If the system call is interrupted and the signal handler does not raise an exception, the method now retries the system call instead of raising an "InterruptedError" exception (see **PEP 475** for the rationale). socket.recvmsg_into(buffers[, ancbufsize[, flags]]) Receive normal data and ancillary data from the socket, behaving as "recvmsg()" would, but scatter the non-ancillary data into a series of buffers instead of returning a new bytes object. The *buffers* argument must be an iterable of objects that export writable buffers (e.g. "bytearray" objects); these will be filled with successive chunks of the non-ancillary data until it has all been written or there are no more buffers. The operating system may set a limit ("sysconf()" value "SC_IOV_MAX") on the number of buffers that can be used. The *ancbufsize* and *flags* arguments have the same meaning as for "recvmsg()". The return value is a 4-tuple: "(nbytes, ancdata, msg_flags, address)", where *nbytes* is the total number of bytes of non- ancillary data written into the buffers, and *ancdata*, *msg_flags* and *address* are the same as for "recvmsg()". Example: >>> import socket >>> s1, s2 = socket.socketpair() >>> b1 = bytearray(b'----') >>> b2 = bytearray(b'0123456789') >>> b3 = bytearray(b'--------------') >>> s1.send(b'Mary had a little lamb') 22 >>> s2.recvmsg_into([b1, memoryview(b2)[2:9], b3]) (22, [], 0, None) >>> [b1, b2, b3] [bytearray(b'Mary'), bytearray(b'01 had a 9'), bytearray(b'little lamb---')] Availability: most Unix platforms, possibly others. New in version 3.3. socket.recvfrom_into(buffer[, nbytes[, flags]]) Receive data from the socket, writing it into *buffer* instead of creating a new bytestring. The return value is a pair "(nbytes, address)" where *nbytes* is the number of bytes received and *address* is the address of the socket sending the data. See the Unix manual page *recv(2)* for the meaning of the optional argument *flags*; it defaults to zero. (The format of *address* depends on the address family — see above.) socket.recv_into(buffer[, nbytes[, flags]]) Receive up to *nbytes* bytes from the socket, storing the data into a buffer rather than creating a new bytestring. If *nbytes* is not specified (or 0), receive up to the size available in the given buffer. Returns the number of bytes received. See the Unix manual page *recv(2)* for the meaning of the optional argument *flags*; it defaults to zero. socket.send(bytes[, flags]) Send data to the socket. The socket must be connected to a remote socket. The optional *flags* argument has the same meaning as for "recv()" above. Returns the number of bytes sent. Applications are responsible for checking that all data has been sent; if only some of the data was transmitted, the application needs to attempt delivery of the remaining data. For further information on this topic, consult the Socket Programming HOWTO. Changed in version 3.5: If the system call is interrupted and the signal handler does not raise an exception, the method now retries the system call instead of raising an "InterruptedError" exception (see **PEP 475** for the rationale). socket.sendall(bytes[, flags]) Send data to the socket. The socket must be connected to a remote socket. The optional *flags* argument has the same meaning as for "recv()" above. Unlike "send()", this method continues to send data from *bytes* until either all data has been sent or an error occurs. "None" is returned on success. On error, an exception is raised, and there is no way to determine how much data, if any, was successfully sent. Changed in version 3.5: The socket timeout is no more reset each time data is sent successfully. The socket timeout is now the maximum total duration to send all data. Changed in version 3.5: If the system call is interrupted and the signal handler does not raise an exception, the method now retries the system call instead of raising an "InterruptedError" exception (see **PEP 475** for the rationale). socket.sendto(bytes, address) socket.sendto(bytes, flags, address) Send data to the socket. The socket should not be connected to a remote socket, since the destination socket is specified by *address*. The optional *flags* argument has the same meaning as for "recv()" above. Return the number of bytes sent. (The format of *address* depends on the address family — see above.) Raises an auditing event "socket.sendto" with arguments "self", "address". Changed in version 3.5: If the system call is interrupted and the signal handler does not raise an exception, the method now retries the system call instead of raising an "InterruptedError" exception (see **PEP 475** for the rationale). socket.sendmsg(buffers[, ancdata[, flags[, address]]]) Send normal and ancillary data to the socket, gathering the non- ancillary data from a series of buffers and concatenating it into a single message. The *buffers* argument specifies the non-ancillary data as an iterable of *bytes-like objects* (e.g. "bytes" objects); the operating system may set a limit ("sysconf()" value "SC_IOV_MAX") on the number of buffers that can be used. The *ancdata* argument specifies the ancillary data (control messages) as an iterable of zero or more tuples "(cmsg_level, cmsg_type, cmsg_data)", where *cmsg_level* and *cmsg_type* are integers specifying the protocol level and protocol-specific type respectively, and *cmsg_data* is a bytes-like object holding the associated data. Note that some systems (in particular, systems without "CMSG_SPACE()") might support sending only one control message per call. The *flags* argument defaults to 0 and has the same meaning as for "send()". If *address* is supplied and not "None", it sets a destination address for the message. The return value is the number of bytes of non-ancillary data sent. The following function sends the list of file descriptors *fds* over an "AF_UNIX" socket, on systems which support the "SCM_RIGHTS" mechanism. See also "recvmsg()". import socket, array def send_fds(sock, msg, fds): return sock.sendmsg([msg], [(socket.SOL_SOCKET, socket.SCM_RIGHTS, array.array("i", fds))]) Availability: most Unix platforms, possibly others. Raises an auditing event "socket.sendmsg" with arguments "self", "address". New in version 3.3. Changed in version 3.5: If the system call is interrupted and the signal handler does not raise an exception, the method now retries the system call instead of raising an "InterruptedError" exception (see **PEP 475** for the rationale). socket.sendmsg_afalg([msg], *, op[, iv[, assoclen[, flags]]]) Specialized version of "sendmsg()" for "AF_ALG" socket. Set mode, IV, AEAD associated data length and flags for "AF_ALG" socket. Availability: Linux >= 2.6.38. New in version 3.6. socket.sendfile(file, offset=0, count=None) Send a file until EOF is reached by using high-performance "os.sendfile" and return the total number of bytes which were sent. *file* must be a regular file object opened in binary mode. If "os.sendfile" is not available (e.g. Windows) or *file* is not a regular file "send()" will be used instead. *offset* tells from where to start reading the file. If specified, *count* is the total number of bytes to transmit as opposed to sending the file until EOF is reached. File position is updated on return or also in case of error in which case "file.tell()" can be used to figure out the number of bytes which were sent. The socket must be of "SOCK_STREAM" type. Non-blocking sockets are not supported. New in version 3.5. socket.set_inheritable(inheritable) Set the inheritable flag of the socket’s file descriptor or socket’s handle. New in version 3.4. socket.setblocking(flag) Set blocking or non-blocking mode of the socket: if *flag* is false, the socket is set to non-blocking, else to blocking mode. This method is a shorthand for certain "settimeout()" calls: * "sock.setblocking(True)" is equivalent to "sock.settimeout(None)" * "sock.setblocking(False)" is equivalent to "sock.settimeout(0.0)" Changed in version 3.7: The method no longer applies "SOCK_NONBLOCK" flag on "socket.type". socket.settimeout(value) Set a timeout on blocking socket operations. The *value* argument can be a nonnegative floating point number expressing seconds, or "None". If a non-zero value is given, subsequent socket operations will raise a "timeout" exception if the timeout period *value* has elapsed before the operation has completed. If zero is given, the socket is put in non-blocking mode. If "None" is given, the socket is put in blocking mode. For further information, please consult the notes on socket timeouts. Changed in version 3.7: The method no longer toggles "SOCK_NONBLOCK" flag on "socket.type". socket.setsockopt(level, optname, value: int) socket.setsockopt(level, optname, value: buffer) socket.setsockopt(level, optname, None, optlen: int) Set the value of the given socket option (see the Unix manual page *setsockopt(2)*). The needed symbolic constants are defined in the "socket" module ("SO_*" etc.). The value can be an integer, "None" or a *bytes-like object* representing a buffer. In the later case it is up to the caller to ensure that the bytestring contains the proper bits (see the optional built-in module "struct" for a way to encode C structures as bytestrings). When *value* is set to "None", *optlen* argument is required. It’s equivalent to call "setsockopt()" C function with "optval=NULL" and "optlen=optlen". Changed in version 3.5: Writable *bytes-like object* is now accepted. Changed in version 3.6: setsockopt(level, optname, None, optlen: int) form added. socket.shutdown(how) Shut down one or both halves of the connection. If *how* is "SHUT_RD", further receives are disallowed. If *how* is "SHUT_WR", further sends are disallowed. If *how* is "SHUT_RDWR", further sends and receives are disallowed. socket.share(process_id) Duplicate a socket and prepare it for sharing with a target process. The target process must be provided with *process_id*. The resulting bytes object can then be passed to the target process using some form of interprocess communication and the socket can be recreated there using "fromshare()". Once this method has been called, it is safe to close the socket since the operating system has already duplicated it for the target process. Availability: Windows. New in version 3.3. Note that there are no methods "read()" or "write()"; use "recv()" and "send()" without *flags* argument instead. Socket objects also have these (read-only) attributes that correspond to the values given to the "socket" constructor. socket.family The socket family. socket.type The socket type. socket.proto The socket protocol. Notes on socket timeouts ======================== A socket object can be in one of three modes: blocking, non-blocking, or timeout. Sockets are by default always created in blocking mode, but this can be changed by calling "setdefaulttimeout()". * In *blocking mode*, operations block until complete or the system returns an error (such as connection timed out). * In *non-blocking mode*, operations fail (with an error that is unfortunately system-dependent) if they cannot be completed immediately: functions from the "select" can be used to know when and whether a socket is available for reading or writing. * In *timeout mode*, operations fail if they cannot be completed within the timeout specified for the socket (they raise a "timeout" exception) or if the system returns an error. Note: At the operating system level, sockets in *timeout mode* are internally set in non-blocking mode. Also, the blocking and timeout modes are shared between file descriptors and socket objects that refer to the same network endpoint. This implementation detail can have visible consequences if e.g. you decide to use the "fileno()" of a socket. Timeouts and the "connect" method --------------------------------- The "connect()" operation is also subject to the timeout setting, and in general it is recommended to call "settimeout()" before calling "connect()" or pass a timeout parameter to "create_connection()". However, the system network stack may also return a connection timeout error of its own regardless of any Python socket timeout setting. Timeouts and the "accept" method -------------------------------- If "getdefaulttimeout()" is not "None", sockets returned by the "accept()" method inherit that timeout. Otherwise, the behaviour depends on settings of the listening socket: * if the listening socket is in *blocking mode* or in *timeout mode*, the socket returned by "accept()" is in *blocking mode*; * if the listening socket is in *non-blocking mode*, whether the socket returned by "accept()" is in blocking or non-blocking mode is operating system-dependent. If you want to ensure cross-platform behaviour, it is recommended you manually override this setting. Example ======= Here are four minimal example programs using the TCP/IP protocol: a server that echoes all data that it receives back (servicing only one client), and a client using it. Note that a server must perform the sequence "socket()", "bind()", "listen()", "accept()" (possibly repeating the "accept()" to service more than one client), while a client only needs the sequence "socket()", "connect()". Also note that the server does not "sendall()"/"recv()" on the socket it is listening on but on the new socket returned by "accept()". The first two examples support IPv4 only. # Echo server program import socket HOST = '' # Symbolic name meaning all available interfaces PORT = 50007 # Arbitrary non-privileged port with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s: s.bind((HOST, PORT)) s.listen(1) conn, addr = s.accept() with conn: print('Connected by', addr) while True: data = conn.recv(1024) if not data: break conn.sendall(data) # Echo client program import socket HOST = 'daring.cwi.nl' # The remote host PORT = 50007 # The same port as used by the server with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s: s.connect((HOST, PORT)) s.sendall(b'Hello, world') data = s.recv(1024) print('Received', repr(data)) The next two examples are identical to the above two, but support both IPv4 and IPv6. The server side will listen to the first address family available (it should listen to both instead). On most of IPv6-ready systems, IPv6 will take precedence and the server may not accept IPv4 traffic. The client side will try to connect to the all addresses returned as a result of the name resolution, and sends traffic to the first one connected successfully. # Echo server program import socket import sys HOST = None # Symbolic name meaning all available interfaces PORT = 50007 # Arbitrary non-privileged port s = None for res in socket.getaddrinfo(HOST, PORT, socket.AF_UNSPEC, socket.SOCK_STREAM, 0, socket.AI_PASSIVE): af, socktype, proto, canonname, sa = res try: s = socket.socket(af, socktype, proto) except OSError as msg: s = None continue try: s.bind(sa) s.listen(1) except OSError as msg: s.close() s = None continue break if s is None: print('could not open socket') sys.exit(1) conn, addr = s.accept() with conn: print('Connected by', addr) while True: data = conn.recv(1024) if not data: break conn.send(data) # Echo client program import socket import sys HOST = 'daring.cwi.nl' # The remote host PORT = 50007 # The same port as used by the server s = None for res in socket.getaddrinfo(HOST, PORT, socket.AF_UNSPEC, socket.SOCK_STREAM): af, socktype, proto, canonname, sa = res try: s = socket.socket(af, socktype, proto) except OSError as msg: s = None continue try: s.connect(sa) except OSError as msg: s.close() s = None continue break if s is None: print('could not open socket') sys.exit(1) with s: s.sendall(b'Hello, world') data = s.recv(1024) print('Received', repr(data)) The next example shows how to write a very simple network sniffer with raw sockets on Windows. The example requires administrator privileges to modify the interface: import socket # the public network interface HOST = socket.gethostbyname(socket.gethostname()) # create a raw socket and bind it to the public interface s = socket.socket(socket.AF_INET, socket.SOCK_RAW, socket.IPPROTO_IP) s.bind((HOST, 0)) # Include IP headers s.setsockopt(socket.IPPROTO_IP, socket.IP_HDRINCL, 1) # receive all packages s.ioctl(socket.SIO_RCVALL, socket.RCVALL_ON) # receive a package print(s.recvfrom(65565)) # disabled promiscuous mode s.ioctl(socket.SIO_RCVALL, socket.RCVALL_OFF) The next example shows how to use the socket interface to communicate to a CAN network using the raw socket protocol. To use CAN with the broadcast manager protocol instead, open a socket with: socket.socket(socket.AF_CAN, socket.SOCK_DGRAM, socket.CAN_BCM) After binding ("CAN_RAW") or connecting ("CAN_BCM") the socket, you can use the "socket.send()", and the "socket.recv()" operations (and their counterparts) on the socket object as usual. This last example might require special privileges: import socket import struct # CAN frame packing/unpacking (see 'struct can_frame' in ) can_frame_fmt = "=IB3x8s" can_frame_size = struct.calcsize(can_frame_fmt) def build_can_frame(can_id, data): can_dlc = len(data) data = data.ljust(8, b'\x00') return struct.pack(can_frame_fmt, can_id, can_dlc, data) def dissect_can_frame(frame): can_id, can_dlc, data = struct.unpack(can_frame_fmt, frame) return (can_id, can_dlc, data[:can_dlc]) # create a raw socket and bind it to the 'vcan0' interface s = socket.socket(socket.AF_CAN, socket.SOCK_RAW, socket.CAN_RAW) s.bind(('vcan0',)) while True: cf, addr = s.recvfrom(can_frame_size) print('Received: can_id=%x, can_dlc=%x, data=%s' % dissect_can_frame(cf)) try: s.send(cf) except OSError: print('Error sending CAN frame') try: s.send(build_can_frame(0x01, b'\x01\x02\x03')) except OSError: print('Error sending CAN frame') Running an example several times with too small delay between executions, could lead to this error: OSError: [Errno 98] Address already in use This is because the previous execution has left the socket in a "TIME_WAIT" state, and can’t be immediately reused. There is a "socket" flag to set, in order to prevent this, "socket.SO_REUSEADDR": s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1) s.bind((HOST, PORT)) the "SO_REUSEADDR" flag tells the kernel to reuse a local socket in "TIME_WAIT" state, without waiting for its natural timeout to expire. See also: For an introduction to socket programming (in C), see the following papers: * *An Introductory 4.3BSD Interprocess Communication Tutorial*, by Stuart Sechrest * *An Advanced 4.3BSD Interprocess Communication Tutorial*, by Samuel J. Leffler et al, both in the UNIX Programmer’s Manual, Supplementary Documents 1 (sections PS1:7 and PS1:8). The platform-specific reference material for the various socket-related system calls are also a valuable source of information on the details of socket semantics. For Unix, refer to the manual pages; for Windows, see the WinSock (or Winsock 2) specification. For IPv6-ready APIs, readers may want to refer to **RFC 3493** titled Basic Socket Interface Extensions for IPv6. "socketserver" — A framework for network servers ************************************************ **Source code:** Lib/socketserver.py ====================================================================== The "socketserver" module simplifies the task of writing network servers. There are four basic concrete server classes: class socketserver.TCPServer(server_address, RequestHandlerClass, bind_and_activate=True) This uses the Internet TCP protocol, which provides for continuous streams of data between the client and server. If *bind_and_activate* is true, the constructor automatically attempts to invoke "server_bind()" and "server_activate()". The other parameters are passed to the "BaseServer" base class. class socketserver.UDPServer(server_address, RequestHandlerClass, bind_and_activate=True) This uses datagrams, which are discrete packets of information that may arrive out of order or be lost while in transit. The parameters are the same as for "TCPServer". class socketserver.UnixStreamServer(server_address, RequestHandlerClass, bind_and_activate=True) class socketserver.UnixDatagramServer(server_address, RequestHandlerClass, bind_and_activate=True) These more infrequently used classes are similar to the TCP and UDP classes, but use Unix domain sockets; they’re not available on non- Unix platforms. The parameters are the same as for "TCPServer". These four classes process requests *synchronously*; each request must be completed before the next request can be started. This isn’t suitable if each request takes a long time to complete, because it requires a lot of computation, or because it returns a lot of data which the client is slow to process. The solution is to create a separate process or thread to handle each request; the "ForkingMixIn" and "ThreadingMixIn" mix-in classes can be used to support asynchronous behaviour. Creating a server requires several steps. First, you must create a request handler class by subclassing the "BaseRequestHandler" class and overriding its "handle()" method; this method will process incoming requests. Second, you must instantiate one of the server classes, passing it the server’s address and the request handler class. It is recommended to use the server in a "with" statement. Then call the "handle_request()" or "serve_forever()" method of the server object to process one or many requests. Finally, call "server_close()" to close the socket (unless you used a "with" statement). When inheriting from "ThreadingMixIn" for threaded connection behavior, you should explicitly declare how you want your threads to behave on an abrupt shutdown. The "ThreadingMixIn" class defines an attribute *daemon_threads*, which indicates whether or not the server should wait for thread termination. You should set the flag explicitly if you would like threads to behave autonomously; the default is "False", meaning that Python will not exit until all threads created by "ThreadingMixIn" have exited. Server classes have the same external methods and attributes, no matter what network protocol they use. Server Creation Notes ===================== There are five classes in an inheritance diagram, four of which represent synchronous servers of four types: +------------+ | BaseServer | +------------+ | v +-----------+ +------------------+ | TCPServer |------->| UnixStreamServer | +-----------+ +------------------+ | v +-----------+ +--------------------+ | UDPServer |------->| UnixDatagramServer | +-----------+ +--------------------+ Note that "UnixDatagramServer" derives from "UDPServer", not from "UnixStreamServer" — the only difference between an IP and a Unix stream server is the address family, which is simply repeated in both Unix server classes. class socketserver.ForkingMixIn class socketserver.ThreadingMixIn Forking and threading versions of each type of server can be created using these mix-in classes. For instance, "ThreadingUDPServer" is created as follows: class ThreadingUDPServer(ThreadingMixIn, UDPServer): pass The mix-in class comes first, since it overrides a method defined in "UDPServer". Setting the various attributes also changes the behavior of the underlying server mechanism. "ForkingMixIn" and the Forking classes mentioned below are only available on POSIX platforms that support "fork()". "socketserver.ForkingMixIn.server_close()" waits until all child processes complete, except if "socketserver.ForkingMixIn.block_on_close" attribute is false. "socketserver.ThreadingMixIn.server_close()" waits until all non- daemon threads complete, except if "socketserver.ThreadingMixIn.block_on_close" attribute is false. Use daemonic threads by setting "ThreadingMixIn.daemon_threads" to "True" to not wait until threads complete. Changed in version 3.7: "socketserver.ForkingMixIn.server_close()" and "socketserver.ThreadingMixIn.server_close()" now waits until all child processes and non-daemonic threads complete. Add a new "socketserver.ForkingMixIn.block_on_close" class attribute to opt- in for the pre-3.7 behaviour. class socketserver.ForkingTCPServer class socketserver.ForkingUDPServer class socketserver.ThreadingTCPServer class socketserver.ThreadingUDPServer These classes are pre-defined using the mix-in classes. To implement a service, you must derive a class from "BaseRequestHandler" and redefine its "handle()" method. You can then run various versions of the service by combining one of the server classes with your request handler class. The request handler class must be different for datagram or stream services. This can be hidden by using the handler subclasses "StreamRequestHandler" or "DatagramRequestHandler". Of course, you still have to use your head! For instance, it makes no sense to use a forking server if the service contains state in memory that can be modified by different requests, since the modifications in the child process would never reach the initial state kept in the parent process and passed to each child. In this case, you can use a threading server, but you will probably have to use locks to protect the integrity of the shared data. On the other hand, if you are building an HTTP server where all data is stored externally (for instance, in the file system), a synchronous class will essentially render the service “deaf” while one request is being handled – which may be for a very long time if a client is slow to receive all the data it has requested. Here a threading or forking server is appropriate. In some cases, it may be appropriate to process part of a request synchronously, but to finish processing in a forked child depending on the request data. This can be implemented by using a synchronous server and doing an explicit fork in the request handler class "handle()" method. Another approach to handling multiple simultaneous requests in an environment that supports neither threads nor "fork()" (or where these are too expensive or inappropriate for the service) is to maintain an explicit table of partially finished requests and to use "selectors" to decide which request to work on next (or whether to handle a new incoming request). This is particularly important for stream services where each client can potentially be connected for a long time (if threads or subprocesses cannot be used). See "asyncore" for another way to manage this. Server Objects ============== class socketserver.BaseServer(server_address, RequestHandlerClass) This is the superclass of all Server objects in the module. It defines the interface, given below, but does not implement most of the methods, which is done in subclasses. The two parameters are stored in the respective "server_address" and "RequestHandlerClass" attributes. fileno() Return an integer file descriptor for the socket on which the server is listening. This function is most commonly passed to "selectors", to allow monitoring multiple servers in the same process. handle_request() Process a single request. This function calls the following methods in order: "get_request()", "verify_request()", and "process_request()". If the user-provided "handle()" method of the handler class raises an exception, the server’s "handle_error()" method will be called. If no request is received within "timeout" seconds, "handle_timeout()" will be called and "handle_request()" will return. serve_forever(poll_interval=0.5) Handle requests until an explicit "shutdown()" request. Poll for shutdown every *poll_interval* seconds. Ignores the "timeout" attribute. It also calls "service_actions()", which may be used by a subclass or mixin to provide actions specific to a given service. For example, the "ForkingMixIn" class uses "service_actions()" to clean up zombie child processes. Changed in version 3.3: Added "service_actions" call to the "serve_forever" method. service_actions() This is called in the "serve_forever()" loop. This method can be overridden by subclasses or mixin classes to perform actions specific to a given service, such as cleanup actions. New in version 3.3. shutdown() Tell the "serve_forever()" loop to stop and wait until it does. "shutdown()" must be called while "serve_forever()" is running in a different thread otherwise it will deadlock. server_close() Clean up the server. May be overridden. address_family The family of protocols to which the server’s socket belongs. Common examples are "socket.AF_INET" and "socket.AF_UNIX". RequestHandlerClass The user-provided request handler class; an instance of this class is created for each request. server_address The address on which the server is listening. The format of addresses varies depending on the protocol family; see the documentation for the "socket" module for details. For Internet protocols, this is a tuple containing a string giving the address, and an integer port number: "('127.0.0.1', 80)", for example. socket The socket object on which the server will listen for incoming requests. The server classes support the following class variables: allow_reuse_address Whether the server will allow the reuse of an address. This defaults to "False", and can be set in subclasses to change the policy. request_queue_size The size of the request queue. If it takes a long time to process a single request, any requests that arrive while the server is busy are placed into a queue, up to "request_queue_size" requests. Once the queue is full, further requests from clients will get a “Connection denied” error. The default value is usually 5, but this can be overridden by subclasses. socket_type The type of socket used by the server; "socket.SOCK_STREAM" and "socket.SOCK_DGRAM" are two common values. timeout Timeout duration, measured in seconds, or "None" if no timeout is desired. If "handle_request()" receives no incoming requests within the timeout period, the "handle_timeout()" method is called. There are various server methods that can be overridden by subclasses of base server classes like "TCPServer"; these methods aren’t useful to external users of the server object. finish_request(request, client_address) Actually processes the request by instantiating "RequestHandlerClass" and calling its "handle()" method. get_request() Must accept a request from the socket, and return a 2-tuple containing the *new* socket object to be used to communicate with the client, and the client’s address. handle_error(request, client_address) This function is called if the "handle()" method of a "RequestHandlerClass" instance raises an exception. The default action is to print the traceback to standard error and continue handling further requests. Changed in version 3.6: Now only called for exceptions derived from the "Exception" class. handle_timeout() This function is called when the "timeout" attribute has been set to a value other than "None" and the timeout period has passed with no requests being received. The default action for forking servers is to collect the status of any child processes that have exited, while in threading servers this method does nothing. process_request(request, client_address) Calls "finish_request()" to create an instance of the "RequestHandlerClass". If desired, this function can create a new process or thread to handle the request; the "ForkingMixIn" and "ThreadingMixIn" classes do this. server_activate() Called by the server’s constructor to activate the server. The default behavior for a TCP server just invokes "listen()" on the server’s socket. May be overridden. server_bind() Called by the server’s constructor to bind the socket to the desired address. May be overridden. verify_request(request, client_address) Must return a Boolean value; if the value is "True", the request will be processed, and if it’s "False", the request will be denied. This function can be overridden to implement access controls for a server. The default implementation always returns "True". Changed in version 3.6: Support for the *context manager* protocol was added. Exiting the context manager is equivalent to calling "server_close()". Request Handler Objects ======================= class socketserver.BaseRequestHandler This is the superclass of all request handler objects. It defines the interface, given below. A concrete request handler subclass must define a new "handle()" method, and can override any of the other methods. A new instance of the subclass is created for each request. setup() Called before the "handle()" method to perform any initialization actions required. The default implementation does nothing. handle() This function must do all the work required to service a request. The default implementation does nothing. Several instance attributes are available to it; the request is available as "self.request"; the client address as "self.client_address"; and the server instance as "self.server", in case it needs access to per-server information. The type of "self.request" is different for datagram or stream services. For stream services, "self.request" is a socket object; for datagram services, "self.request" is a pair of string and socket. finish() Called after the "handle()" method to perform any clean-up actions required. The default implementation does nothing. If "setup()" raises an exception, this function will not be called. class socketserver.StreamRequestHandler class socketserver.DatagramRequestHandler These "BaseRequestHandler" subclasses override the "setup()" and "finish()" methods, and provide "self.rfile" and "self.wfile" attributes. The "self.rfile" and "self.wfile" attributes can be read or written, respectively, to get the request data or return data to the client. The "rfile" attributes of both classes support the "io.BufferedIOBase" readable interface, and "DatagramRequestHandler.wfile" supports the "io.BufferedIOBase" writable interface. Changed in version 3.6: "StreamRequestHandler.wfile" also supports the "io.BufferedIOBase" writable interface. Examples ======== "socketserver.TCPServer" Example -------------------------------- This is the server side: import socketserver class MyTCPHandler(socketserver.BaseRequestHandler): """ The request handler class for our server. It is instantiated once per connection to the server, and must override the handle() method to implement communication to the client. """ def handle(self): # self.request is the TCP socket connected to the client self.data = self.request.recv(1024).strip() print("{} wrote:".format(self.client_address[0])) print(self.data) # just send back the same data, but upper-cased self.request.sendall(self.data.upper()) if __name__ == "__main__": HOST, PORT = "localhost", 9999 # Create the server, binding to localhost on port 9999 with socketserver.TCPServer((HOST, PORT), MyTCPHandler) as server: # Activate the server; this will keep running until you # interrupt the program with Ctrl-C server.serve_forever() An alternative request handler class that makes use of streams (file- like objects that simplify communication by providing the standard file interface): class MyTCPHandler(socketserver.StreamRequestHandler): def handle(self): # self.rfile is a file-like object created by the handler; # we can now use e.g. readline() instead of raw recv() calls self.data = self.rfile.readline().strip() print("{} wrote:".format(self.client_address[0])) print(self.data) # Likewise, self.wfile is a file-like object used to write back # to the client self.wfile.write(self.data.upper()) The difference is that the "readline()" call in the second handler will call "recv()" multiple times until it encounters a newline character, while the single "recv()" call in the first handler will just return what has been sent from the client in one "sendall()" call. This is the client side: import socket import sys HOST, PORT = "localhost", 9999 data = " ".join(sys.argv[1:]) # Create a socket (SOCK_STREAM means a TCP socket) with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: # Connect to server and send data sock.connect((HOST, PORT)) sock.sendall(bytes(data + "\n", "utf-8")) # Receive data from the server and shut down received = str(sock.recv(1024), "utf-8") print("Sent: {}".format(data)) print("Received: {}".format(received)) The output of the example should look something like this: Server: $ python TCPServer.py 127.0.0.1 wrote: b'hello world with TCP' 127.0.0.1 wrote: b'python is nice' Client: $ python TCPClient.py hello world with TCP Sent: hello world with TCP Received: HELLO WORLD WITH TCP $ python TCPClient.py python is nice Sent: python is nice Received: PYTHON IS NICE "socketserver.UDPServer" Example -------------------------------- This is the server side: import socketserver class MyUDPHandler(socketserver.BaseRequestHandler): """ This class works similar to the TCP handler class, except that self.request consists of a pair of data and client socket, and since there is no connection the client address must be given explicitly when sending data back via sendto(). """ def handle(self): data = self.request[0].strip() socket = self.request[1] print("{} wrote:".format(self.client_address[0])) print(data) socket.sendto(data.upper(), self.client_address) if __name__ == "__main__": HOST, PORT = "localhost", 9999 with socketserver.UDPServer((HOST, PORT), MyUDPHandler) as server: server.serve_forever() This is the client side: import socket import sys HOST, PORT = "localhost", 9999 data = " ".join(sys.argv[1:]) # SOCK_DGRAM is the socket type to use for UDP sockets sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM) # As you can see, there is no connect() call; UDP has no connections. # Instead, data is directly sent to the recipient via sendto(). sock.sendto(bytes(data + "\n", "utf-8"), (HOST, PORT)) received = str(sock.recv(1024), "utf-8") print("Sent: {}".format(data)) print("Received: {}".format(received)) The output of the example should look exactly like for the TCP server example. Asynchronous Mixins ------------------- To build asynchronous handlers, use the "ThreadingMixIn" and "ForkingMixIn" classes. An example for the "ThreadingMixIn" class: import socket import threading import socketserver class ThreadedTCPRequestHandler(socketserver.BaseRequestHandler): def handle(self): data = str(self.request.recv(1024), 'ascii') cur_thread = threading.current_thread() response = bytes("{}: {}".format(cur_thread.name, data), 'ascii') self.request.sendall(response) class ThreadedTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer): pass def client(ip, port, message): with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock: sock.connect((ip, port)) sock.sendall(bytes(message, 'ascii')) response = str(sock.recv(1024), 'ascii') print("Received: {}".format(response)) if __name__ == "__main__": # Port 0 means to select an arbitrary unused port HOST, PORT = "localhost", 0 server = ThreadedTCPServer((HOST, PORT), ThreadedTCPRequestHandler) with server: ip, port = server.server_address # Start a thread with the server -- that thread will then start one # more thread for each request server_thread = threading.Thread(target=server.serve_forever) # Exit the server thread when the main thread terminates server_thread.daemon = True server_thread.start() print("Server loop running in thread:", server_thread.name) client(ip, port, "Hello World 1") client(ip, port, "Hello World 2") client(ip, port, "Hello World 3") server.shutdown() The output of the example should look something like this: $ python ThreadedTCPServer.py Server loop running in thread: Thread-1 Received: Thread-2: Hello World 1 Received: Thread-3: Hello World 2 Received: Thread-4: Hello World 3 The "ForkingMixIn" class is used in the same way, except that the server will spawn a new process for each request. Available only on POSIX platforms that support "fork()". "spwd" — The shadow password database ************************************* Deprecated since version 3.11: The "spwd" module is deprecated (see **PEP 594** for details and alternatives). ====================================================================== This module provides access to the Unix shadow password database. It is available on various Unix versions. You must have enough privileges to access the shadow password database (this usually means you have to be root). Shadow password database entries are reported as a tuple-like object, whose attributes correspond to the members of the "spwd" structure (Attribute field below, see ""): +---------+-----------------+-----------------------------------+ | Index | Attribute | Meaning | |=========|=================|===================================| | 0 | "sp_namp" | Login name | +---------+-----------------+-----------------------------------+ | 1 | "sp_pwdp" | Encrypted password | +---------+-----------------+-----------------------------------+ | 2 | "sp_lstchg" | Date of last change | +---------+-----------------+-----------------------------------+ | 3 | "sp_min" | Minimal number of days between | | | | changes | +---------+-----------------+-----------------------------------+ | 4 | "sp_max" | Maximum number of days between | | | | changes | +---------+-----------------+-----------------------------------+ | 5 | "sp_warn" | Number of days before password | | | | expires to warn user about it | +---------+-----------------+-----------------------------------+ | 6 | "sp_inact" | Number of days after password | | | | expires until account is disabled | +---------+-----------------+-----------------------------------+ | 7 | "sp_expire" | Number of days since 1970-01-01 | | | | when account expires | +---------+-----------------+-----------------------------------+ | 8 | "sp_flag" | Reserved | +---------+-----------------+-----------------------------------+ The sp_namp and sp_pwdp items are strings, all others are integers. "KeyError" is raised if the entry asked for cannot be found. The following functions are defined: spwd.getspnam(name) Return the shadow password database entry for the given user name. Changed in version 3.6: Raises a "PermissionError" instead of "KeyError" if the user doesn’t have privileges. spwd.getspall() Return a list of all available shadow password database entries, in arbitrary order. See also: Module "grp" An interface to the group database, similar to this. Module "pwd" An interface to the normal password database, similar to this. "sqlite3" — DB-API 2.0 interface for SQLite databases ***************************************************** **Source code:** Lib/sqlite3/ ====================================================================== SQLite is a C library that provides a lightweight disk-based database that doesn’t require a separate server process and allows accessing the database using a nonstandard variant of the SQL query language. Some applications can use SQLite for internal data storage. It’s also possible to prototype an application using SQLite and then port the code to a larger database such as PostgreSQL or Oracle. The sqlite3 module was written by Gerhard Häring. It provides an SQL interface compliant with the DB-API 2.0 specification described by **PEP 249**. To use the module, start by creating a "Connection" object that represents the database. Here the data will be stored in the "example.db" file: import sqlite3 con = sqlite3.connect('example.db') The special path name ":memory:" can be provided to create a temporary database in RAM. Once a "Connection" has been established, create a "Cursor" object and call its "execute()" method to perform SQL commands: cur = con.cursor() # Create table cur.execute('''CREATE TABLE stocks (date text, trans text, symbol text, qty real, price real)''') # Insert a row of data cur.execute("INSERT INTO stocks VALUES ('2006-01-05','BUY','RHAT',100,35.14)") # Save (commit) the changes con.commit() # We can also close the connection if we are done with it. # Just be sure any changes have been committed or they will be lost. con.close() The saved data is persistent: it can be reloaded in a subsequent session even after restarting the Python interpreter: import sqlite3 con = sqlite3.connect('example.db') cur = con.cursor() To retrieve data after executing a SELECT statement, either treat the cursor as an *iterator*, call the cursor’s "fetchone()" method to retrieve a single matching row, or call "fetchall()" to get a list of the matching rows. This example uses the iterator form: >>> for row in cur.execute('SELECT * FROM stocks ORDER BY price'): print(row) ('2006-01-05', 'BUY', 'RHAT', 100, 35.14) ('2006-03-28', 'BUY', 'IBM', 1000, 45.0) ('2006-04-06', 'SELL', 'IBM', 500, 53.0) ('2006-04-05', 'BUY', 'MSFT', 1000, 72.0) SQL operations usually need to use values from Python variables. However, beware of using Python’s string operations to assemble queries, as they are vulnerable to SQL injection attacks (see the xkcd webcomic for a humorous example of what can go wrong): # Never do this -- insecure! symbol = 'RHAT' cur.execute("SELECT * FROM stocks WHERE symbol = '%s'" % symbol) Instead, use the DB-API’s parameter substitution. To insert a variable into a query string, use a placeholder in the string, and substitute the actual values into the query by providing them as a "tuple" of values to the second argument of the cursor’s "execute()" method. An SQL statement may use one of two kinds of placeholders: question marks (qmark style) or named placeholders (named style). For the qmark style, "parameters" must be a *sequence*. For the named style, it can be either a *sequence* or "dict" instance. The length of the *sequence* must match the number of placeholders, or a "ProgrammingError" is raised. If a "dict" is given, it must contain keys for all named parameters. Any extra items are ignored. Here’s an example of both styles: import sqlite3 con = sqlite3.connect(":memory:") cur = con.cursor() cur.execute("create table lang (name, first_appeared)") # This is the qmark style: cur.execute("insert into lang values (?, ?)", ("C", 1972)) # The qmark style used with executemany(): lang_list = [ ("Fortran", 1957), ("Python", 1991), ("Go", 2009), ] cur.executemany("insert into lang values (?, ?)", lang_list) # And this is the named style: cur.execute("select * from lang where first_appeared=:year", {"year": 1972}) print(cur.fetchall()) con.close() See also: https://www.sqlite.org The SQLite web page; the documentation describes the syntax and the available data types for the supported SQL dialect. https://www.w3schools.com/sql/ Tutorial, reference and examples for learning SQL syntax. **PEP 249** - Database API Specification 2.0 PEP written by Marc-André Lemburg. Module functions and constants ============================== sqlite3.apilevel String constant stating the supported DB-API level. Required by the DB-API. Hard-coded to ""2.0"". sqlite3.paramstyle String constant stating the type of parameter marker formatting expected by the "sqlite3" module. Required by the DB-API. Hard- coded to ""qmark"". Note: The "sqlite3" module supports both "qmark" and "numeric" DB-API parameter styles, because that is what the underlying SQLite library supports. However, the DB-API does not allow multiple values for the "paramstyle" attribute. sqlite3.version The version number of this module, as a string. This is not the version of the SQLite library. sqlite3.version_info The version number of this module, as a tuple of integers. This is not the version of the SQLite library. sqlite3.sqlite_version The version number of the run-time SQLite library, as a string. sqlite3.sqlite_version_info The version number of the run-time SQLite library, as a tuple of integers. sqlite3.threadsafety Integer constant required by the DB-API, stating the level of thread safety the "sqlite3" module supports. Currently hard-coded to "1", meaning *“Threads may share the module, but not connections.”* However, this may not always be true. You can check the underlying SQLite library’s compile-time threaded mode using the following query: import sqlite3 con = sqlite3.connect(":memory:") con.execute(""" select * from pragma_compile_options where compile_options like 'THREADSAFE=%' """).fetchall() Note that the SQLITE_THREADSAFE levels do not match the DB-API 2.0 "threadsafety" levels. sqlite3.PARSE_DECLTYPES This constant is meant to be used with the *detect_types* parameter of the "connect()" function. Setting it makes the "sqlite3" module parse the declared type for each column it returns. It will parse out the first word of the declared type, i. e. for “integer primary key”, it will parse out “integer”, or for “number(10)” it will parse out “number”. Then for that column, it will look into the converters dictionary and use the converter function registered for that type there. sqlite3.PARSE_COLNAMES This constant is meant to be used with the *detect_types* parameter of the "connect()" function. Setting this makes the SQLite interface parse the column name for each column it returns. It will look for a string formed [mytype] in there, and then decide that ‘mytype’ is the type of the column. It will try to find an entry of ‘mytype’ in the converters dictionary and then use the converter function found there to return the value. The column name found in "Cursor.description" does not include the type, i. e. if you use something like "'as "Expiration date [datetime]"'" in your SQL, then we will parse out everything until the first "'['" for the column name and strip the preceding space: the column name would simply be “Expiration date”. sqlite3.connect(database[, timeout, detect_types, isolation_level, check_same_thread, factory, cached_statements, uri]) Opens a connection to the SQLite database file *database*. By default returns a "Connection" object, unless a custom *factory* is given. *database* is a *path-like object* giving the pathname (absolute or relative to the current working directory) of the database file to be opened. You can use "":memory:"" to open a database connection to a database that resides in RAM instead of on disk. When a database is accessed by multiple connections, and one of the processes modifies the database, the SQLite database is locked until that transaction is committed. The *timeout* parameter specifies how long the connection should wait for the lock to go away until raising an exception. The default for the timeout parameter is 5.0 (five seconds). For the *isolation_level* parameter, please see the "isolation_level" property of "Connection" objects. SQLite natively supports only the types TEXT, INTEGER, REAL, BLOB and NULL. If you want to use other types you must add support for them yourself. The *detect_types* parameter and the using custom **converters** registered with the module-level "register_converter()" function allow you to easily do that. *detect_types* defaults to 0 (i. e. off, no type detection), you can set it to any combination of "PARSE_DECLTYPES" and "PARSE_COLNAMES" to turn type detection on. Due to SQLite behaviour, types can’t be detected for generated fields (for example "max(data)"), even when *detect_types* parameter is set. In such case, the returned type is "str". By default, *check_same_thread* is "True" and only the creating thread may use the connection. If set "False", the returned connection may be shared across multiple threads. When using multiple threads with the same connection writing operations should be serialized by the user to avoid data corruption. By default, the "sqlite3" module uses its "Connection" class for the connect call. You can, however, subclass the "Connection" class and make "connect()" use your class instead by providing your class for the *factory* parameter. Consult the section SQLite and Python types of this manual for details. The "sqlite3" module internally uses a statement cache to avoid SQL parsing overhead. If you want to explicitly set the number of statements that are cached for the connection, you can set the *cached_statements* parameter. The currently implemented default is to cache 100 statements. If *uri* is "True", *database* is interpreted as a URI (Uniform Resource Identifier) with a file path and an optional query string. The scheme part *must* be ""file:"". The path can be a relative or absolute file path. The query string allows us to pass parameters to SQLite. Some useful URI tricks include: # Open a database in read-only mode. con = sqlite3.connect("file:template.db?mode=ro", uri=True) # Don't implicitly create a new database file if it does not already exist. # Will raise sqlite3.OperationalError if unable to open a database file. con = sqlite3.connect("file:nosuchdb.db?mode=rw", uri=True) # Create a shared named in-memory database. con1 = sqlite3.connect("file:mem1?mode=memory&cache=shared", uri=True) con2 = sqlite3.connect("file:mem1?mode=memory&cache=shared", uri=True) con1.executescript("create table t(t); insert into t values(28);") rows = con2.execute("select * from t").fetchall() More information about this feature, including a list of recognized parameters, can be found in the SQLite URI documentation. Raises an auditing event "sqlite3.connect" with argument "database". Changed in version 3.4: Added the *uri* parameter. Changed in version 3.7: *database* can now also be a *path-like object*, not only a string. sqlite3.register_converter(typename, callable) Registers a callable to convert a bytestring from the database into a custom Python type. The callable will be invoked for all database values that are of the type *typename*. Confer the parameter *detect_types* of the "connect()" function for how the type detection works. Note that *typename* and the name of the type in your query are matched in case-insensitive manner. sqlite3.register_adapter(type, callable) Registers a callable to convert the custom Python type *type* into one of SQLite’s supported types. The callable *callable* accepts as single parameter the Python value, and must return a value of the following types: int, float, str or bytes. sqlite3.complete_statement(sql) Returns "True" if the string *sql* contains one or more complete SQL statements terminated by semicolons. It does not verify that the SQL is syntactically correct, only that there are no unclosed string literals and the statement is terminated by a semicolon. This can be used to build a shell for SQLite, as in the following example: # A minimal SQLite shell for experiments import sqlite3 con = sqlite3.connect(":memory:") con.isolation_level = None cur = con.cursor() buffer = "" print("Enter your SQL commands to execute in sqlite3.") print("Enter a blank line to exit.") while True: line = input() if line == "": break buffer += line if sqlite3.complete_statement(buffer): try: buffer = buffer.strip() cur.execute(buffer) if buffer.lstrip().upper().startswith("SELECT"): print(cur.fetchall()) except sqlite3.Error as e: print("An error occurred:", e.args[0]) buffer = "" con.close() sqlite3.enable_callback_tracebacks(flag) By default you will not get any tracebacks in user-defined functions, aggregates, converters, authorizer callbacks etc. If you want to debug them, you can call this function with *flag* set to "True". Afterwards, you will get tracebacks from callbacks on "sys.stderr". Use "False" to disable the feature again. Connection Objects ================== class sqlite3.Connection An SQLite database connection has the following attributes and methods: isolation_level Get or set the current default isolation level. "None" for autocommit mode or one of “DEFERRED”, “IMMEDIATE” or “EXCLUSIVE”. See section Controlling Transactions for a more detailed explanation. in_transaction "True" if a transaction is active (there are uncommitted changes), "False" otherwise. Read-only attribute. New in version 3.2. cursor(factory=Cursor) The cursor method accepts a single optional parameter *factory*. If supplied, this must be a callable returning an instance of "Cursor" or its subclasses. commit() This method commits the current transaction. If you don’t call this method, anything you did since the last call to "commit()" is not visible from other database connections. If you wonder why you don’t see the data you’ve written to the database, please check you didn’t forget to call this method. rollback() This method rolls back any changes to the database since the last call to "commit()". close() This closes the database connection. Note that this does not automatically call "commit()". If you just close your database connection without calling "commit()" first, your changes will be lost! execute(sql[, parameters]) Create a new "Cursor" object and call "execute()" on it with the given *sql* and *parameters*. Return the new cursor object. executemany(sql[, parameters]) Create a new "Cursor" object and call "executemany()" on it with the given *sql* and *parameters*. Return the new cursor object. executescript(sql_script) Create a new "Cursor" object and call "executescript()" on it with the given *sql_script*. Return the new cursor object. create_function(name, num_params, func, *, deterministic=False) Creates a user-defined function that you can later use from within SQL statements under the function name *name*. *num_params* is the number of parameters the function accepts (if *num_params* is -1, the function may take any number of arguments), and *func* is a Python callable that is called as the SQL function. If *deterministic* is true, the created function is marked as deterministic, which allows SQLite to perform additional optimizations. This flag is supported by SQLite 3.8.3 or higher, "NotSupportedError" will be raised if used with older versions. The function can return any of the types supported by SQLite: bytes, str, int, float and "None". Changed in version 3.8: The *deterministic* parameter was added. Example: import sqlite3 import hashlib def md5sum(t): return hashlib.md5(t).hexdigest() con = sqlite3.connect(":memory:") con.create_function("md5", 1, md5sum) cur = con.cursor() cur.execute("select md5(?)", (b"foo",)) print(cur.fetchone()[0]) con.close() create_aggregate(name, num_params, aggregate_class) Creates a user-defined aggregate function. The aggregate class must implement a "step" method, which accepts the number of parameters *num_params* (if *num_params* is -1, the function may take any number of arguments), and a "finalize" method which will return the final result of the aggregate. The "finalize" method can return any of the types supported by SQLite: bytes, str, int, float and "None". Example: import sqlite3 class MySum: def __init__(self): self.count = 0 def step(self, value): self.count += value def finalize(self): return self.count con = sqlite3.connect(":memory:") con.create_aggregate("mysum", 1, MySum) cur = con.cursor() cur.execute("create table test(i)") cur.execute("insert into test(i) values (1)") cur.execute("insert into test(i) values (2)") cur.execute("select mysum(i) from test") print(cur.fetchone()[0]) con.close() create_collation(name, callable) Creates a collation with the specified *name* and *callable*. The callable will be passed two string arguments. It should return -1 if the first is ordered lower than the second, 0 if they are ordered equal and 1 if the first is ordered higher than the second. Note that this controls sorting (ORDER BY in SQL) so your comparisons don’t affect other SQL operations. Note that the callable will get its parameters as Python bytestrings, which will normally be encoded in UTF-8. The following example shows a custom collation that sorts “the wrong way”: import sqlite3 def collate_reverse(string1, string2): if string1 == string2: return 0 elif string1 < string2: return 1 else: return -1 con = sqlite3.connect(":memory:") con.create_collation("reverse", collate_reverse) cur = con.cursor() cur.execute("create table test(x)") cur.executemany("insert into test(x) values (?)", [("a",), ("b",)]) cur.execute("select x from test order by x collate reverse") for row in cur: print(row) con.close() To remove a collation, call "create_collation" with "None" as callable: con.create_collation("reverse", None) interrupt() You can call this method from a different thread to abort any queries that might be executing on the connection. The query will then abort and the caller will get an exception. set_authorizer(authorizer_callback) This routine registers a callback. The callback is invoked for each attempt to access a column of a table in the database. The callback should return "SQLITE_OK" if access is allowed, "SQLITE_DENY" if the entire SQL statement should be aborted with an error and "SQLITE_IGNORE" if the column should be treated as a NULL value. These constants are available in the "sqlite3" module. The first argument to the callback signifies what kind of operation is to be authorized. The second and third argument will be arguments or "None" depending on the first argument. The 4th argument is the name of the database (“main”, “temp”, etc.) if applicable. The 5th argument is the name of the inner-most trigger or view that is responsible for the access attempt or "None" if this access attempt is directly from input SQL code. Please consult the SQLite documentation about the possible values for the first argument and the meaning of the second and third argument depending on the first one. All necessary constants are available in the "sqlite3" module. set_progress_handler(handler, n) This routine registers a callback. The callback is invoked for every *n* instructions of the SQLite virtual machine. This is useful if you want to get called from SQLite during long-running operations, for example to update a GUI. If you want to clear any previously installed progress handler, call the method with "None" for *handler*. Returning a non-zero value from the handler function will terminate the currently executing query and cause it to raise an "OperationalError" exception. set_trace_callback(trace_callback) Registers *trace_callback* to be called for each SQL statement that is actually executed by the SQLite backend. The only argument passed to the callback is the statement (as "str") that is being executed. The return value of the callback is ignored. Note that the backend does not only run statements passed to the "Cursor.execute()" methods. Other sources include the transaction management of the sqlite3 module and the execution of triggers defined in the current database. Passing "None" as *trace_callback* will disable the trace callback. Note: Exceptions raised in the trace callback are not propagated. As a development and debugging aid, use "enable_callback_tracebacks()" to enable printing tracebacks from exceptions raised in the trace callback. New in version 3.3. enable_load_extension(enabled) This routine allows/disallows the SQLite engine to load SQLite extensions from shared libraries. SQLite extensions can define new functions, aggregates or whole new virtual table implementations. One well-known extension is the fulltext- search extension distributed with SQLite. Loadable extensions are disabled by default. See [1]. New in version 3.2. import sqlite3 con = sqlite3.connect(":memory:") # enable extension loading con.enable_load_extension(True) # Load the fulltext search extension con.execute("select load_extension('./fts3.so')") # alternatively you can load the extension using an API call: # con.load_extension("./fts3.so") # disable extension loading again con.enable_load_extension(False) # example from SQLite wiki con.execute("create virtual table recipe using fts3(name, ingredients)") con.executescript(""" insert into recipe (name, ingredients) values ('broccoli stew', 'broccoli peppers cheese tomatoes'); insert into recipe (name, ingredients) values ('pumpkin stew', 'pumpkin onions garlic celery'); insert into recipe (name, ingredients) values ('broccoli pie', 'broccoli cheese onions flour'); insert into recipe (name, ingredients) values ('pumpkin pie', 'pumpkin sugar flour butter'); """) for row in con.execute("select rowid, name, ingredients from recipe where name match 'pie'"): print(row) con.close() load_extension(path) This routine loads an SQLite extension from a shared library. You have to enable extension loading with "enable_load_extension()" before you can use this routine. Loadable extensions are disabled by default. See [1]. New in version 3.2. row_factory You can change this attribute to a callable that accepts the cursor and the original row as a tuple and will return the real result row. This way, you can implement more advanced ways of returning results, such as returning an object that can also access columns by name. Example: import sqlite3 def dict_factory(cursor, row): d = {} for idx, col in enumerate(cursor.description): d[col[0]] = row[idx] return d con = sqlite3.connect(":memory:") con.row_factory = dict_factory cur = con.cursor() cur.execute("select 1 as a") print(cur.fetchone()["a"]) con.close() If returning a tuple doesn’t suffice and you want name-based access to columns, you should consider setting "row_factory" to the highly-optimized "sqlite3.Row" type. "Row" provides both index-based and case-insensitive name-based access to columns with almost no memory overhead. It will probably be better than your own custom dictionary-based approach or even a db_row based solution. text_factory Using this attribute you can control what objects are returned for the "TEXT" data type. By default, this attribute is set to "str" and the "sqlite3" module will return "str" objects for "TEXT". If you want to return "bytes" instead, you can set it to "bytes". You can also set it to any other callable that accepts a single bytestring parameter and returns the resulting object. See the following example code for illustration: import sqlite3 con = sqlite3.connect(":memory:") cur = con.cursor() AUSTRIA = "Österreich" # by default, rows are returned as str cur.execute("select ?", (AUSTRIA,)) row = cur.fetchone() assert row[0] == AUSTRIA # but we can make sqlite3 always return bytestrings ... con.text_factory = bytes cur.execute("select ?", (AUSTRIA,)) row = cur.fetchone() assert type(row[0]) is bytes # the bytestrings will be encoded in UTF-8, unless you stored garbage in the # database ... assert row[0] == AUSTRIA.encode("utf-8") # we can also implement a custom text_factory ... # here we implement one that appends "foo" to all strings con.text_factory = lambda x: x.decode("utf-8") + "foo" cur.execute("select ?", ("bar",)) row = cur.fetchone() assert row[0] == "barfoo" con.close() total_changes Returns the total number of database rows that have been modified, inserted, or deleted since the database connection was opened. iterdump() Returns an iterator to dump the database in an SQL text format. Useful when saving an in-memory database for later restoration. This function provides the same capabilities as the ".dump" command in the **sqlite3** shell. Example: # Convert file existing_db.db to SQL dump file dump.sql import sqlite3 con = sqlite3.connect('existing_db.db') with open('dump.sql', 'w') as f: for line in con.iterdump(): f.write('%s\n' % line) con.close() backup(target, *, pages=-1, progress=None, name="main", sleep=0.250) This method makes a backup of an SQLite database even while it’s being accessed by other clients, or concurrently by the same connection. The copy will be written into the mandatory argument *target*, that must be another "Connection" instance. By default, or when *pages* is either "0" or a negative integer, the entire database is copied in a single step; otherwise the method performs a loop copying up to *pages* pages at a time. If *progress* is specified, it must either be "None" or a callable object that will be executed at each iteration with three integer arguments, respectively the *status* of the last iteration, the *remaining* number of pages still to be copied and the *total* number of pages. The *name* argument specifies the database name that will be copied: it must be a string containing either ""main"", the default, to indicate the main database, ""temp"" to indicate the temporary database or the name specified after the "AS" keyword in an "ATTACH DATABASE" statement for an attached database. The *sleep* argument specifies the number of seconds to sleep by between successive attempts to backup remaining pages, can be specified either as an integer or a floating point value. Example 1, copy an existing database into another: import sqlite3 def progress(status, remaining, total): print(f'Copied {total-remaining} of {total} pages...') con = sqlite3.connect('existing_db.db') bck = sqlite3.connect('backup.db') with bck: con.backup(bck, pages=1, progress=progress) bck.close() con.close() Example 2, copy an existing database into a transient copy: import sqlite3 source = sqlite3.connect('existing_db.db') dest = sqlite3.connect(':memory:') source.backup(dest) Availability: SQLite 3.6.11 or higher New in version 3.7. Cursor Objects ============== class sqlite3.Cursor A "Cursor" instance has the following attributes and methods. execute(sql[, parameters]) Executes an SQL statement. Values may be bound to the statement using placeholders. "execute()" will only execute a single SQL statement. If you try to execute more than one statement with it, it will raise a "Warning". Use "executescript()" if you want to execute multiple SQL statements with one call. executemany(sql, seq_of_parameters) Executes a parameterized SQL command against all parameter sequences or mappings found in the sequence *seq_of_parameters*. The "sqlite3" module also allows using an *iterator* yielding parameters instead of a sequence. import sqlite3 class IterChars: def __init__(self): self.count = ord('a') def __iter__(self): return self def __next__(self): if self.count > ord('z'): raise StopIteration self.count += 1 return (chr(self.count - 1),) # this is a 1-tuple con = sqlite3.connect(":memory:") cur = con.cursor() cur.execute("create table characters(c)") theIter = IterChars() cur.executemany("insert into characters(c) values (?)", theIter) cur.execute("select c from characters") print(cur.fetchall()) con.close() Here’s a shorter example using a *generator*: import sqlite3 import string def char_generator(): for c in string.ascii_lowercase: yield (c,) con = sqlite3.connect(":memory:") cur = con.cursor() cur.execute("create table characters(c)") cur.executemany("insert into characters(c) values (?)", char_generator()) cur.execute("select c from characters") print(cur.fetchall()) con.close() executescript(sql_script) This is a nonstandard convenience method for executing multiple SQL statements at once. It issues a "COMMIT" statement first, then executes the SQL script it gets as a parameter. This method disregards "isolation_level"; any transaction control must be added to *sql_script*. *sql_script* can be an instance of "str". Example: import sqlite3 con = sqlite3.connect(":memory:") cur = con.cursor() cur.executescript(""" create table person( firstname, lastname, age ); create table book( title, author, published ); insert into book(title, author, published) values ( 'Dirk Gently''s Holistic Detective Agency', 'Douglas Adams', 1987 ); """) con.close() fetchone() Fetches the next row of a query result set, returning a single sequence, or "None" when no more data is available. fetchmany(size=cursor.arraysize) Fetches the next set of rows of a query result, returning a list. An empty list is returned when no more rows are available. The number of rows to fetch per call is specified by the *size* parameter. If it is not given, the cursor’s arraysize determines the number of rows to be fetched. The method should try to fetch as many rows as indicated by the size parameter. If this is not possible due to the specified number of rows not being available, fewer rows may be returned. Note there are performance considerations involved with the *size* parameter. For optimal performance, it is usually best to use the arraysize attribute. If the *size* parameter is used, then it is best for it to retain the same value from one "fetchmany()" call to the next. fetchall() Fetches all (remaining) rows of a query result, returning a list. Note that the cursor’s arraysize attribute can affect the performance of this operation. An empty list is returned when no rows are available. close() Close the cursor now (rather than whenever "__del__" is called). The cursor will be unusable from this point forward; a "ProgrammingError" exception will be raised if any operation is attempted with the cursor. setinputsizes(sizes) Required by the DB-API. Does nothing in "sqlite3". setoutputsize(size[, column]) Required by the DB-API. Does nothing in "sqlite3". rowcount Although the "Cursor" class of the "sqlite3" module implements this attribute, the database engine’s own support for the determination of “rows affected”/”rows selected” is quirky. For "executemany()" statements, the number of modifications are summed up into "rowcount". As required by the Python DB API Spec, the "rowcount" attribute “is -1 in case no "executeXX()" has been performed on the cursor or the rowcount of the last operation is not determinable by the interface”. This includes "SELECT" statements because we cannot determine the number of rows a query produced until all rows were fetched. With SQLite versions before 3.6.5, "rowcount" is set to 0 if you make a "DELETE FROM table" without any condition. lastrowid This read-only attribute provides the row id of the last inserted row. It is only updated after successful "INSERT" or "REPLACE" statements using the "execute()" method. For other statements, after "executemany()" or "executescript()", or if the insertion failed, the value of "lastrowid" is left unchanged. The initial value of "lastrowid" is "None". Note: Inserts into "WITHOUT ROWID" tables are not recorded. Changed in version 3.6: Added support for the "REPLACE" statement. arraysize Read/write attribute that controls the number of rows returned by "fetchmany()". The default value is 1 which means a single row would be fetched per call. description This read-only attribute provides the column names of the last query. To remain compatible with the Python DB API, it returns a 7-tuple for each column where the last six items of each tuple are "None". It is set for "SELECT" statements without any matching rows as well. connection This read-only attribute provides the SQLite database "Connection" used by the "Cursor" object. A "Cursor" object created by calling "con.cursor()" will have a "connection" attribute that refers to *con*: >>> con = sqlite3.connect(":memory:") >>> cur = con.cursor() >>> cur.connection == con True Row Objects =========== class sqlite3.Row A "Row" instance serves as a highly optimized "row_factory" for "Connection" objects. It tries to mimic a tuple in most of its features. It supports mapping access by column name and index, iteration, representation, equality testing and "len()". If two "Row" objects have exactly the same columns and their members are equal, they compare equal. keys() This method returns a list of column names. Immediately after a query, it is the first member of each tuple in "Cursor.description". Changed in version 3.5: Added support of slicing. Let’s assume we initialize a table as in the example given above: con = sqlite3.connect(":memory:") cur = con.cursor() cur.execute('''create table stocks (date text, trans text, symbol text, qty real, price real)''') cur.execute("""insert into stocks values ('2006-01-05','BUY','RHAT',100,35.14)""") con.commit() cur.close() Now we plug "Row" in: >>> con.row_factory = sqlite3.Row >>> cur = con.cursor() >>> cur.execute('select * from stocks') >>> r = cur.fetchone() >>> type(r) >>> tuple(r) ('2006-01-05', 'BUY', 'RHAT', 100.0, 35.14) >>> len(r) 5 >>> r[2] 'RHAT' >>> r.keys() ['date', 'trans', 'symbol', 'qty', 'price'] >>> r['qty'] 100.0 >>> for member in r: ... print(member) ... 2006-01-05 BUY RHAT 100.0 35.14 Exceptions ========== exception sqlite3.Warning A subclass of "Exception". exception sqlite3.Error The base class of the other exceptions in this module. It is a subclass of "Exception". exception sqlite3.DatabaseError Exception raised for errors that are related to the database. exception sqlite3.IntegrityError Exception raised when the relational integrity of the database is affected, e.g. a foreign key check fails. It is a subclass of "DatabaseError". exception sqlite3.ProgrammingError Exception raised for programming errors, e.g. table not found or already exists, syntax error in the SQL statement, wrong number of parameters specified, etc. It is a subclass of "DatabaseError". exception sqlite3.OperationalError Exception raised for errors that are related to the database’s operation and not necessarily under the control of the programmer, e.g. an unexpected disconnect occurs, the data source name is not found, a transaction could not be processed, etc. It is a subclass of "DatabaseError". exception sqlite3.NotSupportedError Exception raised in case a method or database API was used which is not supported by the database, e.g. calling the "rollback()" method on a connection that does not support transaction or has transactions turned off. It is a subclass of "DatabaseError". SQLite and Python types ======================= Introduction ------------ SQLite natively supports the following types: "NULL", "INTEGER", "REAL", "TEXT", "BLOB". The following Python types can thus be sent to SQLite without any problem: +---------------------------------+---------------+ | Python type | SQLite type | |=================================|===============| | "None" | "NULL" | +---------------------------------+---------------+ | "int" | "INTEGER" | +---------------------------------+---------------+ | "float" | "REAL" | +---------------------------------+---------------+ | "str" | "TEXT" | +---------------------------------+---------------+ | "bytes" | "BLOB" | +---------------------------------+---------------+ This is how SQLite types are converted to Python types by default: +---------------+------------------------------------------------+ | SQLite type | Python type | |===============|================================================| | "NULL" | "None" | +---------------+------------------------------------------------+ | "INTEGER" | "int" | +---------------+------------------------------------------------+ | "REAL" | "float" | +---------------+------------------------------------------------+ | "TEXT" | depends on "text_factory", "str" by default | +---------------+------------------------------------------------+ | "BLOB" | "bytes" | +---------------+------------------------------------------------+ The type system of the "sqlite3" module is extensible in two ways: you can store additional Python types in an SQLite database via object adaptation, and you can let the "sqlite3" module convert SQLite types to different Python types via converters. Using adapters to store additional Python types in SQLite databases ------------------------------------------------------------------- As described before, SQLite supports only a limited set of types natively. To use other Python types with SQLite, you must **adapt** them to one of the sqlite3 module’s supported types for SQLite: one of NoneType, int, float, str, bytes. There are two ways to enable the "sqlite3" module to adapt a custom Python type to one of the supported ones. Letting your object adapt itself ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This is a good approach if you write the class yourself. Let’s suppose you have a class like this: class Point: def __init__(self, x, y): self.x, self.y = x, y Now you want to store the point in a single SQLite column. First you’ll have to choose one of the supported types to be used for representing the point. Let’s just use str and separate the coordinates using a semicolon. Then you need to give your class a method "__conform__(self, protocol)" which must return the converted value. The parameter *protocol* will be "PrepareProtocol". import sqlite3 class Point: def __init__(self, x, y): self.x, self.y = x, y def __conform__(self, protocol): if protocol is sqlite3.PrepareProtocol: return "%f;%f" % (self.x, self.y) con = sqlite3.connect(":memory:") cur = con.cursor() p = Point(4.0, -3.2) cur.execute("select ?", (p,)) print(cur.fetchone()[0]) con.close() Registering an adapter callable ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The other possibility is to create a function that converts the type to the string representation and register the function with "register_adapter()". import sqlite3 class Point: def __init__(self, x, y): self.x, self.y = x, y def adapt_point(point): return "%f;%f" % (point.x, point.y) sqlite3.register_adapter(Point, adapt_point) con = sqlite3.connect(":memory:") cur = con.cursor() p = Point(4.0, -3.2) cur.execute("select ?", (p,)) print(cur.fetchone()[0]) con.close() The "sqlite3" module has two default adapters for Python’s built-in "datetime.date" and "datetime.datetime" types. Now let’s suppose we want to store "datetime.datetime" objects not in ISO representation, but as a Unix timestamp. import sqlite3 import datetime import time def adapt_datetime(ts): return time.mktime(ts.timetuple()) sqlite3.register_adapter(datetime.datetime, adapt_datetime) con = sqlite3.connect(":memory:") cur = con.cursor() now = datetime.datetime.now() cur.execute("select ?", (now,)) print(cur.fetchone()[0]) con.close() Converting SQLite values to custom Python types ----------------------------------------------- Writing an adapter lets you send custom Python types to SQLite. But to make it really useful we need to make the Python to SQLite to Python roundtrip work. Enter converters. Let’s go back to the "Point" class. We stored the x and y coordinates separated via semicolons as strings in SQLite. First, we’ll define a converter function that accepts the string as a parameter and constructs a "Point" object from it. Note: Converter functions **always** get called with a "bytes" object, no matter under which data type you sent the value to SQLite. def convert_point(s): x, y = map(float, s.split(b";")) return Point(x, y) Now you need to make the "sqlite3" module know that what you select from the database is actually a point. There are two ways of doing this: * Implicitly via the declared type * Explicitly via the column name Both ways are described in section Module functions and constants, in the entries for the constants "PARSE_DECLTYPES" and "PARSE_COLNAMES". The following example illustrates both approaches. import sqlite3 class Point: def __init__(self, x, y): self.x, self.y = x, y def __repr__(self): return "(%f;%f)" % (self.x, self.y) def adapt_point(point): return ("%f;%f" % (point.x, point.y)).encode('ascii') def convert_point(s): x, y = list(map(float, s.split(b";"))) return Point(x, y) # Register the adapter sqlite3.register_adapter(Point, adapt_point) # Register the converter sqlite3.register_converter("point", convert_point) p = Point(4.0, -3.2) ######################### # 1) Using declared types con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES) cur = con.cursor() cur.execute("create table test(p point)") cur.execute("insert into test(p) values (?)", (p,)) cur.execute("select p from test") print("with declared types:", cur.fetchone()[0]) cur.close() con.close() ####################### # 1) Using column names con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_COLNAMES) cur = con.cursor() cur.execute("create table test(p)") cur.execute("insert into test(p) values (?)", (p,)) cur.execute('select p as "p [point]" from test') print("with column names:", cur.fetchone()[0]) cur.close() con.close() Default adapters and converters ------------------------------- There are default adapters for the date and datetime types in the datetime module. They will be sent as ISO dates/ISO timestamps to SQLite. The default converters are registered under the name “date” for "datetime.date" and under the name “timestamp” for "datetime.datetime". This way, you can use date/timestamps from Python without any additional fiddling in most cases. The format of the adapters is also compatible with the experimental SQLite date/time functions. The following example demonstrates this. import sqlite3 import datetime con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES|sqlite3.PARSE_COLNAMES) cur = con.cursor() cur.execute("create table test(d date, ts timestamp)") today = datetime.date.today() now = datetime.datetime.now() cur.execute("insert into test(d, ts) values (?, ?)", (today, now)) cur.execute("select d, ts from test") row = cur.fetchone() print(today, "=>", row[0], type(row[0])) print(now, "=>", row[1], type(row[1])) cur.execute('select current_date as "d [date]", current_timestamp as "ts [timestamp]"') row = cur.fetchone() print("current_date", row[0], type(row[0])) print("current_timestamp", row[1], type(row[1])) con.close() If a timestamp stored in SQLite has a fractional part longer than 6 numbers, its value will be truncated to microsecond precision by the timestamp converter. Note: The default “timestamp” converter ignores UTC offsets in the database and always returns a naive "datetime.datetime" object. To preserve UTC offsets in timestamps, either leave converters disabled, or register an offset-aware converter with "register_converter()". Controlling Transactions ======================== The underlying "sqlite3" library operates in "autocommit" mode by default, but the Python "sqlite3" module by default does not. "autocommit" mode means that statements that modify the database take effect immediately. A "BEGIN" or "SAVEPOINT" statement disables "autocommit" mode, and a "COMMIT", a "ROLLBACK", or a "RELEASE" that ends the outermost transaction, turns "autocommit" mode back on. The Python "sqlite3" module by default issues a "BEGIN" statement implicitly before a Data Modification Language (DML) statement (i.e. "INSERT"/"UPDATE"/"DELETE"/"REPLACE"). You can control which kind of "BEGIN" statements "sqlite3" implicitly executes via the *isolation_level* parameter to the "connect()" call, or via the "isolation_level" property of connections. If you specify no *isolation_level*, a plain "BEGIN" is used, which is equivalent to specifying "DEFERRED". Other possible values are "IMMEDIATE" and "EXCLUSIVE". You can disable the "sqlite3" module’s implicit transaction management by setting "isolation_level" to "None". This will leave the underlying "sqlite3" library operating in "autocommit" mode. You can then completely control the transaction state by explicitly issuing "BEGIN", "ROLLBACK", "SAVEPOINT", and "RELEASE" statements in your code. Note that "executescript()" disregards "isolation_level"; any transaction control must be added explicitly. Changed in version 3.6: "sqlite3" used to implicitly commit an open transaction before DDL statements. This is no longer the case. Using "sqlite3" efficiently =========================== Using shortcut methods ---------------------- Using the nonstandard "execute()", "executemany()" and "executescript()" methods of the "Connection" object, your code can be written more concisely because you don’t have to create the (often superfluous) "Cursor" objects explicitly. Instead, the "Cursor" objects are created implicitly and these shortcut methods return the cursor objects. This way, you can execute a "SELECT" statement and iterate over it directly using only a single call on the "Connection" object. import sqlite3 langs = [ ("C++", 1985), ("Objective-C", 1984), ] con = sqlite3.connect(":memory:") # Create the table con.execute("create table lang(name, first_appeared)") # Fill the table con.executemany("insert into lang(name, first_appeared) values (?, ?)", langs) # Print the table contents for row in con.execute("select name, first_appeared from lang"): print(row) print("I just deleted", con.execute("delete from lang").rowcount, "rows") # close is not a shortcut method and it's not called automatically, # so the connection object should be closed manually con.close() Accessing columns by name instead of by index --------------------------------------------- One useful feature of the "sqlite3" module is the built-in "sqlite3.Row" class designed to be used as a row factory. Rows wrapped with this class can be accessed both by index (like tuples) and case-insensitively by name: import sqlite3 con = sqlite3.connect(":memory:") con.row_factory = sqlite3.Row cur = con.cursor() cur.execute("select 'John' as name, 42 as age") for row in cur: assert row[0] == row["name"] assert row["name"] == row["nAmE"] assert row[1] == row["age"] assert row[1] == row["AgE"] con.close() Using the connection as a context manager ----------------------------------------- Connection objects can be used as context managers that automatically commit or rollback transactions. In the event of an exception, the transaction is rolled back; otherwise, the transaction is committed: import sqlite3 con = sqlite3.connect(":memory:") con.execute("create table lang (id integer primary key, name varchar unique)") # Successful, con.commit() is called automatically afterwards with con: con.execute("insert into lang(name) values (?)", ("Python",)) # con.rollback() is called after the with block finishes with an exception, the # exception is still raised and must be caught try: with con: con.execute("insert into lang(name) values (?)", ("Python",)) except sqlite3.IntegrityError: print("couldn't add Python twice") # Connection object used as context manager only commits or rollbacks transactions, # so the connection object should be closed manually con.close() -[ Footnotes ]- [1] The sqlite3 module is not built with loadable extension support by default, because some platforms (notably macOS) have SQLite libraries which are compiled without this feature. To get loadable extension support, you must pass "--enable-loadable-sqlite- extensions" to configure. "ssl" — TLS/SSL wrapper for socket objects ****************************************** **Source code:** Lib/ssl.py ====================================================================== This module provides access to Transport Layer Security (often known as “Secure Sockets Layer”) encryption and peer authentication facilities for network sockets, both client-side and server-side. This module uses the OpenSSL library. It is available on all modern Unix systems, Windows, macOS, and probably additional platforms, as long as OpenSSL is installed on that platform. Note: Some behavior may be platform dependent, since calls are made to the operating system socket APIs. The installed version of OpenSSL may also cause variations in behavior. For example, TLSv1.1 and TLSv1.2 come with openssl version 1.0.1. Warning: Don’t use this module without reading the Security considerations. Doing so may lead to a false sense of security, as the default settings of the ssl module are not necessarily appropriate for your application. This section documents the objects and functions in the "ssl" module; for more general information about TLS, SSL, and certificates, the reader is referred to the documents in the “See Also” section at the bottom. This module provides a class, "ssl.SSLSocket", which is derived from the "socket.socket" type, and provides a socket-like wrapper that also encrypts and decrypts the data going over the socket with SSL. It supports additional methods such as "getpeercert()", which retrieves the certificate of the other side of the connection, and "cipher()", which retrieves the cipher being used for the secure connection. For more sophisticated applications, the "ssl.SSLContext" class helps manage settings and certificates, which can then be inherited by SSL sockets created through the "SSLContext.wrap_socket()" method. Changed in version 3.5.3: Updated to support linking with OpenSSL 1.1.0 Changed in version 3.6: OpenSSL 0.9.8, 1.0.0 and 1.0.1 are deprecated and no longer supported. In the future the ssl module will require at least OpenSSL 1.0.2 or 1.1.0. Functions, Constants, and Exceptions ==================================== Socket creation --------------- Since Python 3.2 and 2.7.9, it is recommended to use the "SSLContext.wrap_socket()" of an "SSLContext" instance to wrap sockets as "SSLSocket" objects. The helper functions "create_default_context()" returns a new context with secure default settings. The old "wrap_socket()" function is deprecated since it is both inefficient and has no support for server name indication (SNI) and hostname matching. Client socket example with default context and IPv4/IPv6 dual stack: import socket import ssl hostname = 'www.python.org' context = ssl.create_default_context() with socket.create_connection((hostname, 443)) as sock: with context.wrap_socket(sock, server_hostname=hostname) as ssock: print(ssock.version()) Client socket example with custom context and IPv4: hostname = 'www.python.org' # PROTOCOL_TLS_CLIENT requires valid cert chain and hostname context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT) context.load_verify_locations('path/to/cabundle.pem') with socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0) as sock: with context.wrap_socket(sock, server_hostname=hostname) as ssock: print(ssock.version()) Server socket example listening on localhost IPv4: context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER) context.load_cert_chain('/path/to/certchain.pem', '/path/to/private.key') with socket.socket(socket.AF_INET, socket.SOCK_STREAM, 0) as sock: sock.bind(('127.0.0.1', 8443)) sock.listen(5) with context.wrap_socket(sock, server_side=True) as ssock: conn, addr = ssock.accept() ... Context creation ---------------- A convenience function helps create "SSLContext" objects for common purposes. ssl.create_default_context(purpose=Purpose.SERVER_AUTH, cafile=None, capath=None, cadata=None) Return a new "SSLContext" object with default settings for the given *purpose*. The settings are chosen by the "ssl" module, and usually represent a higher security level than when calling the "SSLContext" constructor directly. *cafile*, *capath*, *cadata* represent optional CA certificates to trust for certificate verification, as in "SSLContext.load_verify_locations()". If all three are "None", this function can choose to trust the system’s default CA certificates instead. The settings are: "PROTOCOL_TLS", "OP_NO_SSLv2", and "OP_NO_SSLv3" with high encryption cipher suites without RC4 and without unauthenticated cipher suites. Passing "SERVER_AUTH" as *purpose* sets "verify_mode" to "CERT_REQUIRED" and either loads CA certificates (when at least one of *cafile*, *capath* or *cadata* is given) or uses "SSLContext.load_default_certs()" to load default CA certificates. When "keylog_filename" is supported and the environment variable "SSLKEYLOGFILE" is set, "create_default_context()" enables key logging. Note: The protocol, options, cipher and other settings may change to more restrictive values anytime without prior deprecation. The values represent a fair balance between compatibility and security.If your application needs specific settings, you should create a "SSLContext" and apply the settings yourself. Note: If you find that when certain older clients or servers attempt to connect with a "SSLContext" created by this function that they get an error stating “Protocol or cipher suite mismatch”, it may be that they only support SSL3.0 which this function excludes using the "OP_NO_SSLv3". SSL3.0 is widely considered to be completely broken. If you still wish to continue to use this function but still allow SSL 3.0 connections you can re-enable them using: ctx = ssl.create_default_context(Purpose.CLIENT_AUTH) ctx.options &= ~ssl.OP_NO_SSLv3 New in version 3.4. Changed in version 3.4.4: RC4 was dropped from the default cipher string. Changed in version 3.6: ChaCha20/Poly1305 was added to the default cipher string.3DES was dropped from the default cipher string. Changed in version 3.8: Support for key logging to "SSLKEYLOGFILE" was added. Exceptions ---------- exception ssl.SSLError Raised to signal an error from the underlying SSL implementation (currently provided by the OpenSSL library). This signifies some problem in the higher-level encryption and authentication layer that’s superimposed on the underlying network connection. This error is a subtype of "OSError". The error code and message of "SSLError" instances are provided by the OpenSSL library. Changed in version 3.3: "SSLError" used to be a subtype of "socket.error". library A string mnemonic designating the OpenSSL submodule in which the error occurred, such as "SSL", "PEM" or "X509". The range of possible values depends on the OpenSSL version. New in version 3.3. reason A string mnemonic designating the reason this error occurred, for example "CERTIFICATE_VERIFY_FAILED". The range of possible values depends on the OpenSSL version. New in version 3.3. exception ssl.SSLZeroReturnError A subclass of "SSLError" raised when trying to read or write and the SSL connection has been closed cleanly. Note that this doesn’t mean that the underlying transport (read TCP) has been closed. New in version 3.3. exception ssl.SSLWantReadError A subclass of "SSLError" raised by a non-blocking SSL socket when trying to read or write data, but more data needs to be received on the underlying TCP transport before the request can be fulfilled. New in version 3.3. exception ssl.SSLWantWriteError A subclass of "SSLError" raised by a non-blocking SSL socket when trying to read or write data, but more data needs to be sent on the underlying TCP transport before the request can be fulfilled. New in version 3.3. exception ssl.SSLSyscallError A subclass of "SSLError" raised when a system error was encountered while trying to fulfill an operation on a SSL socket. Unfortunately, there is no easy way to inspect the original errno number. New in version 3.3. exception ssl.SSLEOFError A subclass of "SSLError" raised when the SSL connection has been terminated abruptly. Generally, you shouldn’t try to reuse the underlying transport when this error is encountered. New in version 3.3. exception ssl.SSLCertVerificationError A subclass of "SSLError" raised when certificate validation has failed. New in version 3.7. verify_code A numeric error number that denotes the verification error. verify_message A human readable string of the verification error. exception ssl.CertificateError An alias for "SSLCertVerificationError". Changed in version 3.7: The exception is now an alias for "SSLCertVerificationError". Random generation ----------------- ssl.RAND_bytes(num) Return *num* cryptographically strong pseudo-random bytes. Raises an "SSLError" if the PRNG has not been seeded with enough data or if the operation is not supported by the current RAND method. "RAND_status()" can be used to check the status of the PRNG and "RAND_add()" can be used to seed the PRNG. For almost all applications "os.urandom()" is preferable. Read the Wikipedia article, Cryptographically secure pseudorandom number generator (CSPRNG), to get the requirements of a cryptographically strong generator. New in version 3.3. ssl.RAND_pseudo_bytes(num) Return (bytes, is_cryptographic): bytes are *num* pseudo-random bytes, is_cryptographic is "True" if the bytes generated are cryptographically strong. Raises an "SSLError" if the operation is not supported by the current RAND method. Generated pseudo-random byte sequences will be unique if they are of sufficient length, but are not necessarily unpredictable. They can be used for non-cryptographic purposes and for certain purposes in cryptographic protocols, but usually not for key generation etc. For almost all applications "os.urandom()" is preferable. New in version 3.3. Deprecated since version 3.6: OpenSSL has deprecated "ssl.RAND_pseudo_bytes()", use "ssl.RAND_bytes()" instead. ssl.RAND_status() Return "True" if the SSL pseudo-random number generator has been seeded with ‘enough’ randomness, and "False" otherwise. You can use "ssl.RAND_egd()" and "ssl.RAND_add()" to increase the randomness of the pseudo-random number generator. ssl.RAND_egd(path) If you are running an entropy-gathering daemon (EGD) somewhere, and *path* is the pathname of a socket connection open to it, this will read 256 bytes of randomness from the socket, and add it to the SSL pseudo-random number generator to increase the security of generated secret keys. This is typically only necessary on systems without better sources of randomness. See http://egd.sourceforge.net/ or http://prngd.sourceforge.net/ for sources of entropy-gathering daemons. Availability: not available with LibreSSL and OpenSSL > 1.1.0. ssl.RAND_add(bytes, entropy) Mix the given *bytes* into the SSL pseudo-random number generator. The parameter *entropy* (a float) is a lower bound on the entropy contained in string (so you can always use "0.0"). See **RFC 1750** for more information on sources of entropy. Changed in version 3.5: Writable *bytes-like object* is now accepted. Certificate handling -------------------- ssl.match_hostname(cert, hostname) Verify that *cert* (in decoded format as returned by "SSLSocket.getpeercert()") matches the given *hostname*. The rules applied are those for checking the identity of HTTPS servers as outlined in **RFC 2818**, **RFC 5280** and **RFC 6125**. In addition to HTTPS, this function should be suitable for checking the identity of servers in various SSL-based protocols such as FTPS, IMAPS, POPS and others. "CertificateError" is raised on failure. On success, the function returns nothing: >>> cert = {'subject': ((('commonName', 'example.com'),),)} >>> ssl.match_hostname(cert, "example.com") >>> ssl.match_hostname(cert, "example.org") Traceback (most recent call last): File "", line 1, in File "/home/py3k/Lib/ssl.py", line 130, in match_hostname ssl.CertificateError: hostname 'example.org' doesn't match 'example.com' New in version 3.2. Changed in version 3.3.3: The function now follows **RFC 6125**, section 6.4.3 and does neither match multiple wildcards (e.g. "*.*.com" or "*a*.example.org") nor a wildcard inside an internationalized domain names (IDN) fragment. IDN A-labels such as "www*.xn--pthon-kva.org" are still supported, but "x*.python.org" no longer matches "xn--tda.python.org". Changed in version 3.5: Matching of IP addresses, when present in the subjectAltName field of the certificate, is now supported. Changed in version 3.7: The function is no longer used to TLS connections. Hostname matching is now performed by OpenSSL.Allow wildcard when it is the leftmost and the only character in that segment. Partial wildcards like "www*.example.com" are no longer supported. Deprecated since version 3.7. ssl.cert_time_to_seconds(cert_time) Return the time in seconds since the Epoch, given the "cert_time" string representing the “notBefore” or “notAfter” date from a certificate in ""%b %d %H:%M:%S %Y %Z"" strptime format (C locale). Here’s an example: >>> import ssl >>> timestamp = ssl.cert_time_to_seconds("Jan 5 09:34:43 2018 GMT") >>> timestamp 1515144883 >>> from datetime import datetime >>> print(datetime.utcfromtimestamp(timestamp)) 2018-01-05 09:34:43 “notBefore” or “notAfter” dates must use GMT (**RFC 5280**). Changed in version 3.5: Interpret the input time as a time in UTC as specified by ‘GMT’ timezone in the input string. Local timezone was used previously. Return an integer (no fractions of a second in the input format) ssl.get_server_certificate(addr, ssl_version=PROTOCOL_TLS, ca_certs=None) Given the address "addr" of an SSL-protected server, as a (*hostname*, *port-number*) pair, fetches the server’s certificate, and returns it as a PEM-encoded string. If "ssl_version" is specified, uses that version of the SSL protocol to attempt to connect to the server. If "ca_certs" is specified, it should be a file containing a list of root certificates, the same format as used for the same parameter in "SSLContext.wrap_socket()". The call will attempt to validate the server certificate against that set of root certificates, and will fail if the validation attempt fails. Changed in version 3.3: This function is now IPv6-compatible. Changed in version 3.5: The default *ssl_version* is changed from "PROTOCOL_SSLv3" to "PROTOCOL_TLS" for maximum compatibility with modern servers. ssl.DER_cert_to_PEM_cert(DER_cert_bytes) Given a certificate as a DER-encoded blob of bytes, returns a PEM- encoded string version of the same certificate. ssl.PEM_cert_to_DER_cert(PEM_cert_string) Given a certificate as an ASCII PEM string, returns a DER-encoded sequence of bytes for that same certificate. ssl.get_default_verify_paths() Returns a named tuple with paths to OpenSSL’s default cafile and capath. The paths are the same as used by "SSLContext.set_default_verify_paths()". The return value is a *named tuple* "DefaultVerifyPaths": * "cafile" - resolved path to cafile or "None" if the file doesn’t exist, * "capath" - resolved path to capath or "None" if the directory doesn’t exist, * "openssl_cafile_env" - OpenSSL’s environment key that points to a cafile, * "openssl_cafile" - hard coded path to a cafile, * "openssl_capath_env" - OpenSSL’s environment key that points to a capath, * "openssl_capath" - hard coded path to a capath directory Availability: LibreSSL ignores the environment vars "openssl_cafile_env" and "openssl_capath_env". New in version 3.4. ssl.enum_certificates(store_name) Retrieve certificates from Windows’ system cert store. *store_name* may be one of "CA", "ROOT" or "MY". Windows may provide additional cert stores, too. The function returns a list of (cert_bytes, encoding_type, trust) tuples. The encoding_type specifies the encoding of cert_bytes. It is either "x509_asn" for X.509 ASN.1 data or "pkcs_7_asn" for PKCS#7 ASN.1 data. Trust specifies the purpose of the certificate as a set of OIDS or exactly "True" if the certificate is trustworthy for all purposes. Example: >>> ssl.enum_certificates("CA") [(b'data...', 'x509_asn', {'1.3.6.1.5.5.7.3.1', '1.3.6.1.5.5.7.3.2'}), (b'data...', 'x509_asn', True)] Availability: Windows. New in version 3.4. ssl.enum_crls(store_name) Retrieve CRLs from Windows’ system cert store. *store_name* may be one of "CA", "ROOT" or "MY". Windows may provide additional cert stores, too. The function returns a list of (cert_bytes, encoding_type, trust) tuples. The encoding_type specifies the encoding of cert_bytes. It is either "x509_asn" for X.509 ASN.1 data or "pkcs_7_asn" for PKCS#7 ASN.1 data. Availability: Windows. New in version 3.4. ssl.wrap_socket(sock, keyfile=None, certfile=None, server_side=False, cert_reqs=CERT_NONE, ssl_version=PROTOCOL_TLS, ca_certs=None, do_handshake_on_connect=True, suppress_ragged_eofs=True, ciphers=None) Takes an instance "sock" of "socket.socket", and returns an instance of "ssl.SSLSocket", a subtype of "socket.socket", which wraps the underlying socket in an SSL context. "sock" must be a "SOCK_STREAM" socket; other socket types are unsupported. Internally, function creates a "SSLContext" with protocol *ssl_version* and "SSLContext.options" set to *cert_reqs*. If parameters *keyfile*, *certfile*, *ca_certs* or *ciphers* are set, then the values are passed to "SSLContext.load_cert_chain()", "SSLContext.load_verify_locations()", and "SSLContext.set_ciphers()". The arguments *server_side*, *do_handshake_on_connect*, and *suppress_ragged_eofs* have the same meaning as "SSLContext.wrap_socket()". Deprecated since version 3.7: Since Python 3.2 and 2.7.9, it is recommended to use the "SSLContext.wrap_socket()" instead of "wrap_socket()". The top-level function is limited and creates an insecure client socket without server name indication or hostname matching. Constants --------- All constants are now "enum.IntEnum" or "enum.IntFlag" collections. New in version 3.6. ssl.CERT_NONE Possible value for "SSLContext.verify_mode", or the "cert_reqs" parameter to "wrap_socket()". Except for "PROTOCOL_TLS_CLIENT", it is the default mode. With client-side sockets, just about any cert is accepted. Validation errors, such as untrusted or expired cert, are ignored and do not abort the TLS/SSL handshake. In server mode, no certificate is requested from the client, so the client does not send any for client cert authentication. See the discussion of Security considerations below. ssl.CERT_OPTIONAL Possible value for "SSLContext.verify_mode", or the "cert_reqs" parameter to "wrap_socket()". In client mode, "CERT_OPTIONAL" has the same meaning as "CERT_REQUIRED". It is recommended to use "CERT_REQUIRED" for client-side sockets instead. In server mode, a client certificate request is sent to the client. The client may either ignore the request or send a certificate in order perform TLS client cert authentication. If the client chooses to send a certificate, it is verified. Any verification error immediately aborts the TLS handshake. Use of this setting requires a valid set of CA certificates to be passed, either to "SSLContext.load_verify_locations()" or as a value of the "ca_certs" parameter to "wrap_socket()". ssl.CERT_REQUIRED Possible value for "SSLContext.verify_mode", or the "cert_reqs" parameter to "wrap_socket()". In this mode, certificates are required from the other side of the socket connection; an "SSLError" will be raised if no certificate is provided, or if its validation fails. This mode is **not** sufficient to verify a certificate in client mode as it does not match hostnames. "check_hostname" must be enabled as well to verify the authenticity of a cert. "PROTOCOL_TLS_CLIENT" uses "CERT_REQUIRED" and enables "check_hostname" by default. With server socket, this mode provides mandatory TLS client cert authentication. A client certificate request is sent to the client and the client must provide a valid and trusted certificate. Use of this setting requires a valid set of CA certificates to be passed, either to "SSLContext.load_verify_locations()" or as a value of the "ca_certs" parameter to "wrap_socket()". class ssl.VerifyMode "enum.IntEnum" collection of CERT_* constants. New in version 3.6. ssl.VERIFY_DEFAULT Possible value for "SSLContext.verify_flags". In this mode, certificate revocation lists (CRLs) are not checked. By default OpenSSL does neither require nor verify CRLs. New in version 3.4. ssl.VERIFY_CRL_CHECK_LEAF Possible value for "SSLContext.verify_flags". In this mode, only the peer cert is checked but none of the intermediate CA certificates. The mode requires a valid CRL that is signed by the peer cert’s issuer (its direct ancestor CA). If no proper CRL has been loaded with "SSLContext.load_verify_locations", validation will fail. New in version 3.4. ssl.VERIFY_CRL_CHECK_CHAIN Possible value for "SSLContext.verify_flags". In this mode, CRLs of all certificates in the peer cert chain are checked. New in version 3.4. ssl.VERIFY_X509_STRICT Possible value for "SSLContext.verify_flags" to disable workarounds for broken X.509 certificates. New in version 3.4. ssl.VERIFY_X509_TRUSTED_FIRST Possible value for "SSLContext.verify_flags". It instructs OpenSSL to prefer trusted certificates when building the trust chain to validate a certificate. This flag is enabled by default. New in version 3.4.4. class ssl.VerifyFlags "enum.IntFlag" collection of VERIFY_* constants. New in version 3.6. ssl.PROTOCOL_TLS Selects the highest protocol version that both the client and server support. Despite the name, this option can select both “SSL” and “TLS” protocols. New in version 3.6. ssl.PROTOCOL_TLS_CLIENT Auto-negotiate the highest protocol version like "PROTOCOL_TLS", but only support client-side "SSLSocket" connections. The protocol enables "CERT_REQUIRED" and "check_hostname" by default. New in version 3.6. ssl.PROTOCOL_TLS_SERVER Auto-negotiate the highest protocol version like "PROTOCOL_TLS", but only support server-side "SSLSocket" connections. New in version 3.6. ssl.PROTOCOL_SSLv23 Alias for "PROTOCOL_TLS". Deprecated since version 3.6: Use "PROTOCOL_TLS" instead. ssl.PROTOCOL_SSLv2 Selects SSL version 2 as the channel encryption protocol. This protocol is not available if OpenSSL is compiled with the "OPENSSL_NO_SSL2" flag. Warning: SSL version 2 is insecure. Its use is highly discouraged. Deprecated since version 3.6: OpenSSL has removed support for SSLv2. ssl.PROTOCOL_SSLv3 Selects SSL version 3 as the channel encryption protocol. This protocol is not be available if OpenSSL is compiled with the "OPENSSL_NO_SSLv3" flag. Warning: SSL version 3 is insecure. Its use is highly discouraged. Deprecated since version 3.6: OpenSSL has deprecated all version specific protocols. Use the default protocol "PROTOCOL_TLS" with flags like "OP_NO_SSLv3" instead. ssl.PROTOCOL_TLSv1 Selects TLS version 1.0 as the channel encryption protocol. Deprecated since version 3.6: OpenSSL has deprecated all version specific protocols. Use the default protocol "PROTOCOL_TLS" with flags like "OP_NO_SSLv3" instead. ssl.PROTOCOL_TLSv1_1 Selects TLS version 1.1 as the channel encryption protocol. Available only with openssl version 1.0.1+. New in version 3.4. Deprecated since version 3.6: OpenSSL has deprecated all version specific protocols. Use the default protocol "PROTOCOL_TLS" with flags like "OP_NO_SSLv3" instead. ssl.PROTOCOL_TLSv1_2 Selects TLS version 1.2 as the channel encryption protocol. This is the most modern version, and probably the best choice for maximum protection, if both sides can speak it. Available only with openssl version 1.0.1+. New in version 3.4. Deprecated since version 3.6: OpenSSL has deprecated all version specific protocols. Use the default protocol "PROTOCOL_TLS" with flags like "OP_NO_SSLv3" instead. ssl.OP_ALL Enables workarounds for various bugs present in other SSL implementations. This option is set by default. It does not necessarily set the same flags as OpenSSL’s "SSL_OP_ALL" constant. New in version 3.2. ssl.OP_NO_SSLv2 Prevents an SSLv2 connection. This option is only applicable in conjunction with "PROTOCOL_TLS". It prevents the peers from choosing SSLv2 as the protocol version. New in version 3.2. Deprecated since version 3.6: SSLv2 is deprecated ssl.OP_NO_SSLv3 Prevents an SSLv3 connection. This option is only applicable in conjunction with "PROTOCOL_TLS". It prevents the peers from choosing SSLv3 as the protocol version. New in version 3.2. Deprecated since version 3.6: SSLv3 is deprecated ssl.OP_NO_TLSv1 Prevents a TLSv1 connection. This option is only applicable in conjunction with "PROTOCOL_TLS". It prevents the peers from choosing TLSv1 as the protocol version. New in version 3.2. Deprecated since version 3.7: The option is deprecated since OpenSSL 1.1.0, use the new "SSLContext.minimum_version" and "SSLContext.maximum_version" instead. ssl.OP_NO_TLSv1_1 Prevents a TLSv1.1 connection. This option is only applicable in conjunction with "PROTOCOL_TLS". It prevents the peers from choosing TLSv1.1 as the protocol version. Available only with openssl version 1.0.1+. New in version 3.4. Deprecated since version 3.7: The option is deprecated since OpenSSL 1.1.0. ssl.OP_NO_TLSv1_2 Prevents a TLSv1.2 connection. This option is only applicable in conjunction with "PROTOCOL_TLS". It prevents the peers from choosing TLSv1.2 as the protocol version. Available only with openssl version 1.0.1+. New in version 3.4. Deprecated since version 3.7: The option is deprecated since OpenSSL 1.1.0. ssl.OP_NO_TLSv1_3 Prevents a TLSv1.3 connection. This option is only applicable in conjunction with "PROTOCOL_TLS". It prevents the peers from choosing TLSv1.3 as the protocol version. TLS 1.3 is available with OpenSSL 1.1.1 or later. When Python has been compiled against an older version of OpenSSL, the flag defaults to *0*. New in version 3.7. Deprecated since version 3.7: The option is deprecated since OpenSSL 1.1.0. It was added to 2.7.15, 3.6.3 and 3.7.0 for backwards compatibility with OpenSSL 1.0.2. ssl.OP_NO_RENEGOTIATION Disable all renegotiation in TLSv1.2 and earlier. Do not send HelloRequest messages, and ignore renegotiation requests via ClientHello. This option is only available with OpenSSL 1.1.0h and later. New in version 3.7. ssl.OP_CIPHER_SERVER_PREFERENCE Use the server’s cipher ordering preference, rather than the client’s. This option has no effect on client sockets and SSLv2 server sockets. New in version 3.3. ssl.OP_SINGLE_DH_USE Prevents re-use of the same DH key for distinct SSL sessions. This improves forward secrecy but requires more computational resources. This option only applies to server sockets. New in version 3.3. ssl.OP_SINGLE_ECDH_USE Prevents re-use of the same ECDH key for distinct SSL sessions. This improves forward secrecy but requires more computational resources. This option only applies to server sockets. New in version 3.3. ssl.OP_ENABLE_MIDDLEBOX_COMPAT Send dummy Change Cipher Spec (CCS) messages in TLS 1.3 handshake to make a TLS 1.3 connection look more like a TLS 1.2 connection. This option is only available with OpenSSL 1.1.1 and later. New in version 3.8. ssl.OP_NO_COMPRESSION Disable compression on the SSL channel. This is useful if the application protocol supports its own compression scheme. This option is only available with OpenSSL 1.0.0 and later. New in version 3.3. class ssl.Options "enum.IntFlag" collection of OP_* constants. ssl.OP_NO_TICKET Prevent client side from requesting a session ticket. New in version 3.6. ssl.OP_IGNORE_UNEXPECTED_EOF Ignore unexpected shutdown of TLS connections. This option is only available with OpenSSL 3.0.0 and later. New in version 3.10. ssl.HAS_ALPN Whether the OpenSSL library has built-in support for the *Application-Layer Protocol Negotiation* TLS extension as described in **RFC 7301**. New in version 3.5. ssl.HAS_NEVER_CHECK_COMMON_NAME Whether the OpenSSL library has built-in support not checking subject common name and "SSLContext.hostname_checks_common_name" is writeable. New in version 3.7. ssl.HAS_ECDH Whether the OpenSSL library has built-in support for the Elliptic Curve-based Diffie-Hellman key exchange. This should be true unless the feature was explicitly disabled by the distributor. New in version 3.3. ssl.HAS_SNI Whether the OpenSSL library has built-in support for the *Server Name Indication* extension (as defined in **RFC 6066**). New in version 3.2. ssl.HAS_NPN Whether the OpenSSL library has built-in support for the *Next Protocol Negotiation* as described in the Application Layer Protocol Negotiation. When true, you can use the "SSLContext.set_npn_protocols()" method to advertise which protocols you want to support. New in version 3.3. ssl.HAS_SSLv2 Whether the OpenSSL library has built-in support for the SSL 2.0 protocol. New in version 3.7. ssl.HAS_SSLv3 Whether the OpenSSL library has built-in support for the SSL 3.0 protocol. New in version 3.7. ssl.HAS_TLSv1 Whether the OpenSSL library has built-in support for the TLS 1.0 protocol. New in version 3.7. ssl.HAS_TLSv1_1 Whether the OpenSSL library has built-in support for the TLS 1.1 protocol. New in version 3.7. ssl.HAS_TLSv1_2 Whether the OpenSSL library has built-in support for the TLS 1.2 protocol. New in version 3.7. ssl.HAS_TLSv1_3 Whether the OpenSSL library has built-in support for the TLS 1.3 protocol. New in version 3.7. ssl.CHANNEL_BINDING_TYPES List of supported TLS channel binding types. Strings in this list can be used as arguments to "SSLSocket.get_channel_binding()". New in version 3.3. ssl.OPENSSL_VERSION The version string of the OpenSSL library loaded by the interpreter: >>> ssl.OPENSSL_VERSION 'OpenSSL 1.0.2k 26 Jan 2017' New in version 3.2. ssl.OPENSSL_VERSION_INFO A tuple of five integers representing version information about the OpenSSL library: >>> ssl.OPENSSL_VERSION_INFO (1, 0, 2, 11, 15) New in version 3.2. ssl.OPENSSL_VERSION_NUMBER The raw version number of the OpenSSL library, as a single integer: >>> ssl.OPENSSL_VERSION_NUMBER 268443839 >>> hex(ssl.OPENSSL_VERSION_NUMBER) '0x100020bf' New in version 3.2. ssl.ALERT_DESCRIPTION_HANDSHAKE_FAILURE ssl.ALERT_DESCRIPTION_INTERNAL_ERROR ALERT_DESCRIPTION_* Alert Descriptions from **RFC 5246** and others. The IANA TLS Alert Registry contains this list and references to the RFCs where their meaning is defined. Used as the return value of the callback function in "SSLContext.set_servername_callback()". New in version 3.4. class ssl.AlertDescription "enum.IntEnum" collection of ALERT_DESCRIPTION_* constants. New in version 3.6. Purpose.SERVER_AUTH Option for "create_default_context()" and "SSLContext.load_default_certs()". This value indicates that the context may be used to authenticate Web servers (therefore, it will be used to create client-side sockets). New in version 3.4. Purpose.CLIENT_AUTH Option for "create_default_context()" and "SSLContext.load_default_certs()". This value indicates that the context may be used to authenticate Web clients (therefore, it will be used to create server-side sockets). New in version 3.4. class ssl.SSLErrorNumber "enum.IntEnum" collection of SSL_ERROR_* constants. New in version 3.6. class ssl.TLSVersion "enum.IntEnum" collection of SSL and TLS versions for "SSLContext.maximum_version" and "SSLContext.minimum_version". New in version 3.7. TLSVersion.MINIMUM_SUPPORTED TLSVersion.MAXIMUM_SUPPORTED The minimum or maximum supported SSL or TLS version. These are magic constants. Their values don’t reflect the lowest and highest available TLS/SSL versions. TLSVersion.SSLv3 TLSVersion.TLSv1 TLSVersion.TLSv1_1 TLSVersion.TLSv1_2 TLSVersion.TLSv1_3 SSL 3.0 to TLS 1.3. SSL Sockets =========== class ssl.SSLSocket(socket.socket) SSL sockets provide the following methods of Socket Objects: * "accept()" * "bind()" * "close()" * "connect()" * "detach()" * "fileno()" * "getpeername()", "getsockname()" * "getsockopt()", "setsockopt()" * "gettimeout()", "settimeout()", "setblocking()" * "listen()" * "makefile()" * "recv()", "recv_into()" (but passing a non-zero "flags" argument is not allowed) * "send()", "sendall()" (with the same limitation) * "sendfile()" (but "os.sendfile" will be used for plain-text sockets only, else "send()" will be used) * "shutdown()" However, since the SSL (and TLS) protocol has its own framing atop of TCP, the SSL sockets abstraction can, in certain respects, diverge from the specification of normal, OS-level sockets. See especially the notes on non-blocking sockets. Instances of "SSLSocket" must be created using the "SSLContext.wrap_socket()" method. Changed in version 3.5: The "sendfile()" method was added. Changed in version 3.5: The "shutdown()" does not reset the socket timeout each time bytes are received or sent. The socket timeout is now to maximum total duration of the shutdown. Deprecated since version 3.6: It is deprecated to create a "SSLSocket" instance directly, use "SSLContext.wrap_socket()" to wrap a socket. Changed in version 3.7: "SSLSocket" instances must to created with "wrap_socket()". In earlier versions, it was possible to create instances directly. This was never documented or officially supported. SSL sockets also have the following additional methods and attributes: SSLSocket.read(len=1024, buffer=None) Read up to *len* bytes of data from the SSL socket and return the result as a "bytes" instance. If *buffer* is specified, then read into the buffer instead, and return the number of bytes read. Raise "SSLWantReadError" or "SSLWantWriteError" if the socket is non-blocking and the read would block. As at any time a re-negotiation is possible, a call to "read()" can also cause write operations. Changed in version 3.5: The socket timeout is no more reset each time bytes are received or sent. The socket timeout is now to maximum total duration to read up to *len* bytes. Deprecated since version 3.6: Use "recv()" instead of "read()". SSLSocket.write(buf) Write *buf* to the SSL socket and return the number of bytes written. The *buf* argument must be an object supporting the buffer interface. Raise "SSLWantReadError" or "SSLWantWriteError" if the socket is non-blocking and the write would block. As at any time a re-negotiation is possible, a call to "write()" can also cause read operations. Changed in version 3.5: The socket timeout is no more reset each time bytes are received or sent. The socket timeout is now to maximum total duration to write *buf*. Deprecated since version 3.6: Use "send()" instead of "write()". Note: The "read()" and "write()" methods are the low-level methods that read and write unencrypted, application-level data and decrypt/encrypt it to encrypted, wire-level data. These methods require an active SSL connection, i.e. the handshake was completed and "SSLSocket.unwrap()" was not called.Normally you should use the socket API methods like "recv()" and "send()" instead of these methods. SSLSocket.do_handshake() Perform the SSL setup handshake. Changed in version 3.4: The handshake method also performs "match_hostname()" when the "check_hostname" attribute of the socket’s "context" is true. Changed in version 3.5: The socket timeout is no more reset each time bytes are received or sent. The socket timeout is now to maximum total duration of the handshake. Changed in version 3.7: Hostname or IP address is matched by OpenSSL during handshake. The function "match_hostname()" is no longer used. In case OpenSSL refuses a hostname or IP address, the handshake is aborted early and a TLS alert message is send to the peer. SSLSocket.getpeercert(binary_form=False) If there is no certificate for the peer on the other end of the connection, return "None". If the SSL handshake hasn’t been done yet, raise "ValueError". If the "binary_form" parameter is "False", and a certificate was received from the peer, this method returns a "dict" instance. If the certificate was not validated, the dict is empty. If the certificate was validated, it returns a dict with several keys, amongst them "subject" (the principal for which the certificate was issued) and "issuer" (the principal issuing the certificate). If a certificate contains an instance of the *Subject Alternative Name* extension (see **RFC 3280**), there will also be a "subjectAltName" key in the dictionary. The "subject" and "issuer" fields are tuples containing the sequence of relative distinguished names (RDNs) given in the certificate’s data structure for the respective fields, and each RDN is a sequence of name-value pairs. Here is a real-world example: {'issuer': ((('countryName', 'IL'),), (('organizationName', 'StartCom Ltd.'),), (('organizationalUnitName', 'Secure Digital Certificate Signing'),), (('commonName', 'StartCom Class 2 Primary Intermediate Server CA'),)), 'notAfter': 'Nov 22 08:15:19 2013 GMT', 'notBefore': 'Nov 21 03:09:52 2011 GMT', 'serialNumber': '95F0', 'subject': ((('description', '571208-SLe257oHY9fVQ07Z'),), (('countryName', 'US'),), (('stateOrProvinceName', 'California'),), (('localityName', 'San Francisco'),), (('organizationName', 'Electronic Frontier Foundation, Inc.'),), (('commonName', '*.eff.org'),), (('emailAddress', 'hostmaster@eff.org'),)), 'subjectAltName': (('DNS', '*.eff.org'), ('DNS', 'eff.org')), 'version': 3} Note: To validate a certificate for a particular service, you can use the "match_hostname()" function. If the "binary_form" parameter is "True", and a certificate was provided, this method returns the DER-encoded form of the entire certificate as a sequence of bytes, or "None" if the peer did not provide a certificate. Whether the peer provides a certificate depends on the SSL socket’s role: * for a client SSL socket, the server will always provide a certificate, regardless of whether validation was required; * for a server SSL socket, the client will only provide a certificate when requested by the server; therefore "getpeercert()" will return "None" if you used "CERT_NONE" (rather than "CERT_OPTIONAL" or "CERT_REQUIRED"). Changed in version 3.2: The returned dictionary includes additional items such as "issuer" and "notBefore". Changed in version 3.4: "ValueError" is raised when the handshake isn’t done. The returned dictionary includes additional X509v3 extension items such as "crlDistributionPoints", "caIssuers" and "OCSP" URIs. Changed in version 3.9: IPv6 address strings no longer have a trailing new line. SSLSocket.cipher() Returns a three-value tuple containing the name of the cipher being used, the version of the SSL protocol that defines its use, and the number of secret bits being used. If no connection has been established, returns "None". SSLSocket.shared_ciphers() Return the list of ciphers shared by the client during the handshake. Each entry of the returned list is a three-value tuple containing the name of the cipher, the version of the SSL protocol that defines its use, and the number of secret bits the cipher uses. "shared_ciphers()" returns "None" if no connection has been established or the socket is a client socket. New in version 3.5. SSLSocket.compression() Return the compression algorithm being used as a string, or "None" if the connection isn’t compressed. If the higher-level protocol supports its own compression mechanism, you can use "OP_NO_COMPRESSION" to disable SSL-level compression. New in version 3.3. SSLSocket.get_channel_binding(cb_type="tls-unique") Get channel binding data for current connection, as a bytes object. Returns "None" if not connected or the handshake has not been completed. The *cb_type* parameter allow selection of the desired channel binding type. Valid channel binding types are listed in the "CHANNEL_BINDING_TYPES" list. Currently only the ‘tls-unique’ channel binding, defined by **RFC 5929**, is supported. "ValueError" will be raised if an unsupported channel binding type is requested. New in version 3.3. SSLSocket.selected_alpn_protocol() Return the protocol that was selected during the TLS handshake. If "SSLContext.set_alpn_protocols()" was not called, if the other party does not support ALPN, if this socket does not support any of the client’s proposed protocols, or if the handshake has not happened yet, "None" is returned. New in version 3.5. SSLSocket.selected_npn_protocol() Return the higher-level protocol that was selected during the TLS/SSL handshake. If "SSLContext.set_npn_protocols()" was not called, or if the other party does not support NPN, or if the handshake has not yet happened, this will return "None". New in version 3.3. SSLSocket.unwrap() Performs the SSL shutdown handshake, which removes the TLS layer from the underlying socket, and returns the underlying socket object. This can be used to go from encrypted operation over a connection to unencrypted. The returned socket should always be used for further communication with the other side of the connection, rather than the original socket. SSLSocket.verify_client_post_handshake() Requests post-handshake authentication (PHA) from a TLS 1.3 client. PHA can only be initiated for a TLS 1.3 connection from a server- side socket, after the initial TLS handshake and with PHA enabled on both sides, see "SSLContext.post_handshake_auth". The method does not perform a cert exchange immediately. The server-side sends a CertificateRequest during the next write event and expects the client to respond with a certificate on the next read event. If any precondition isn’t met (e.g. not TLS 1.3, PHA not enabled), an "SSLError" is raised. Note: Only available with OpenSSL 1.1.1 and TLS 1.3 enabled. Without TLS 1.3 support, the method raises "NotImplementedError". New in version 3.8. SSLSocket.version() Return the actual SSL protocol version negotiated by the connection as a string, or "None" if no secure connection is established. As of this writing, possible return values include ""SSLv2"", ""SSLv3"", ""TLSv1"", ""TLSv1.1"" and ""TLSv1.2"". Recent OpenSSL versions may define more return values. New in version 3.5. SSLSocket.pending() Returns the number of already decrypted bytes available for read, pending on the connection. SSLSocket.context The "SSLContext" object this SSL socket is tied to. If the SSL socket was created using the deprecated "wrap_socket()" function (rather than "SSLContext.wrap_socket()"), this is a custom context object created for this SSL socket. New in version 3.2. SSLSocket.server_side A boolean which is "True" for server-side sockets and "False" for client-side sockets. New in version 3.2. SSLSocket.server_hostname Hostname of the server: "str" type, or "None" for server-side socket or if the hostname was not specified in the constructor. New in version 3.2. Changed in version 3.7: The attribute is now always ASCII text. When "server_hostname" is an internationalized domain name (IDN), this attribute now stores the A-label form (""xn--pythn-mua.org""), rather than the U-label form (""pythön.org""). SSLSocket.session The "SSLSession" for this SSL connection. The session is available for client and server side sockets after the TLS handshake has been performed. For client sockets the session can be set before "do_handshake()" has been called to reuse a session. New in version 3.6. SSLSocket.session_reused New in version 3.6. SSL Contexts ============ New in version 3.2. An SSL context holds various data longer-lived than single SSL connections, such as SSL configuration options, certificate(s) and private key(s). It also manages a cache of SSL sessions for server- side sockets, in order to speed up repeated connections from the same clients. class ssl.SSLContext(protocol=PROTOCOL_TLS) Create a new SSL context. You may pass *protocol* which must be one of the "PROTOCOL_*" constants defined in this module. The parameter specifies which version of the SSL protocol to use. Typically, the server chooses a particular protocol version, and the client must adapt to the server’s choice. Most of the versions are not interoperable with the other versions. If not specified, the default is "PROTOCOL_TLS"; it provides the most compatibility with other versions. Here’s a table showing which versions in a client (down the side) can connect to which versions in a server (along the top): +--------------------------+--------------+--------------+---------------+-----------+-------------+-------------+ | *client* / **server** | **SSLv2** | **SSLv3** | **TLS** [3] | **TLSv1** | **TLSv1.1** | **TLSv1.2** | +--------------------------+--------------+--------------+---------------+-----------+-------------+-------------+ | *SSLv2* | yes | no | no [1] | no | no | no | +--------------------------+--------------+--------------+---------------+-----------+-------------+-------------+ | *SSLv3* | no | yes | no [2] | no | no | no | +--------------------------+--------------+--------------+---------------+-----------+-------------+-------------+ | *TLS* (*SSLv23*) [3] | no [1] | no [2] | yes | yes | yes | yes | +--------------------------+--------------+--------------+---------------+-----------+-------------+-------------+ | *TLSv1* | no | no | yes | yes | no | no | +--------------------------+--------------+--------------+---------------+-----------+-------------+-------------+ | *TLSv1.1* | no | no | yes | no | yes | no | +--------------------------+--------------+--------------+---------------+-----------+-------------+-------------+ | *TLSv1.2* | no | no | yes | no | no | yes | +--------------------------+--------------+--------------+---------------+-----------+-------------+-------------+ -[ Footnotes ]- [1] "SSLContext" disables SSLv2 with "OP_NO_SSLv2" by default. [2] "SSLContext" disables SSLv3 with "OP_NO_SSLv3" by default. [3] TLS 1.3 protocol will be available with "PROTOCOL_TLS" in OpenSSL >= 1.1.1. There is no dedicated PROTOCOL constant for just TLS 1.3. See also: "create_default_context()" lets the "ssl" module choose security settings for a given purpose. Changed in version 3.6: The context is created with secure default values. The options "OP_NO_COMPRESSION", "OP_CIPHER_SERVER_PREFERENCE", "OP_SINGLE_DH_USE", "OP_SINGLE_ECDH_USE", "OP_NO_SSLv2" (except for "PROTOCOL_SSLv2"), and "OP_NO_SSLv3" (except for "PROTOCOL_SSLv3") are set by default. The initial cipher suite list contains only "HIGH" ciphers, no "NULL" ciphers and no "MD5" ciphers (except for "PROTOCOL_SSLv2"). "SSLContext" objects have the following methods and attributes: SSLContext.cert_store_stats() Get statistics about quantities of loaded X.509 certificates, count of X.509 certificates flagged as CA certificates and certificate revocation lists as dictionary. Example for a context with one CA cert and one other cert: >>> context.cert_store_stats() {'crl': 0, 'x509_ca': 1, 'x509': 2} New in version 3.4. SSLContext.load_cert_chain(certfile, keyfile=None, password=None) Load a private key and the corresponding certificate. The *certfile* string must be the path to a single file in PEM format containing the certificate as well as any number of CA certificates needed to establish the certificate’s authenticity. The *keyfile* string, if present, must point to a file containing the private key. Otherwise the private key will be taken from *certfile* as well. See the discussion of Certificates for more information on how the certificate is stored in the *certfile*. The *password* argument may be a function to call to get the password for decrypting the private key. It will only be called if the private key is encrypted and a password is necessary. It will be called with no arguments, and it should return a string, bytes, or bytearray. If the return value is a string it will be encoded as UTF-8 before using it to decrypt the key. Alternatively a string, bytes, or bytearray value may be supplied directly as the *password* argument. It will be ignored if the private key is not encrypted and no password is needed. If the *password* argument is not specified and a password is required, OpenSSL’s built-in password prompting mechanism will be used to interactively prompt the user for a password. An "SSLError" is raised if the private key doesn’t match with the certificate. Changed in version 3.3: New optional argument *password*. SSLContext.load_default_certs(purpose=Purpose.SERVER_AUTH) Load a set of default “certification authority” (CA) certificates from default locations. On Windows it loads CA certs from the "CA" and "ROOT" system stores. On all systems it calls "SSLContext.set_default_verify_paths()". In the future the method may load CA certificates from other locations, too. The *purpose* flag specifies what kind of CA certificates are loaded. The default settings "Purpose.SERVER_AUTH" loads certificates, that are flagged and trusted for TLS web server authentication (client side sockets). "Purpose.CLIENT_AUTH" loads CA certificates for client certificate verification on the server side. New in version 3.4. SSLContext.load_verify_locations(cafile=None, capath=None, cadata=None) Load a set of “certification authority” (CA) certificates used to validate other peers’ certificates when "verify_mode" is other than "CERT_NONE". At least one of *cafile* or *capath* must be specified. This method can also load certification revocation lists (CRLs) in PEM or DER format. In order to make use of CRLs, "SSLContext.verify_flags" must be configured properly. The *cafile* string, if present, is the path to a file of concatenated CA certificates in PEM format. See the discussion of Certificates for more information about how to arrange the certificates in this file. The *capath* string, if present, is the path to a directory containing several CA certificates in PEM format, following an OpenSSL specific layout. The *cadata* object, if present, is either an ASCII string of one or more PEM-encoded certificates or a *bytes-like object* of DER- encoded certificates. Like with *capath* extra lines around PEM- encoded certificates are ignored but at least one certificate must be present. Changed in version 3.4: New optional argument *cadata* SSLContext.get_ca_certs(binary_form=False) Get a list of loaded “certification authority” (CA) certificates. If the "binary_form" parameter is "False" each list entry is a dict like the output of "SSLSocket.getpeercert()". Otherwise the method returns a list of DER-encoded certificates. The returned list does not contain certificates from *capath* unless a certificate was requested and loaded by a SSL connection. Note: Certificates in a capath directory aren’t loaded unless they have been used at least once. New in version 3.4. SSLContext.get_ciphers() Get a list of enabled ciphers. The list is in order of cipher priority. See "SSLContext.set_ciphers()". Example: >>> ctx = ssl.SSLContext(ssl.PROTOCOL_SSLv23) >>> ctx.set_ciphers('ECDHE+AESGCM:!ECDSA') >>> ctx.get_ciphers() # OpenSSL 1.0.x [{'alg_bits': 256, 'description': 'ECDHE-RSA-AES256-GCM-SHA384 TLSv1.2 Kx=ECDH Au=RSA ' 'Enc=AESGCM(256) Mac=AEAD', 'id': 50380848, 'name': 'ECDHE-RSA-AES256-GCM-SHA384', 'protocol': 'TLSv1/SSLv3', 'strength_bits': 256}, {'alg_bits': 128, 'description': 'ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 Kx=ECDH Au=RSA ' 'Enc=AESGCM(128) Mac=AEAD', 'id': 50380847, 'name': 'ECDHE-RSA-AES128-GCM-SHA256', 'protocol': 'TLSv1/SSLv3', 'strength_bits': 128}] On OpenSSL 1.1 and newer the cipher dict contains additional fields: >>> ctx.get_ciphers() # OpenSSL 1.1+ [{'aead': True, 'alg_bits': 256, 'auth': 'auth-rsa', 'description': 'ECDHE-RSA-AES256-GCM-SHA384 TLSv1.2 Kx=ECDH Au=RSA ' 'Enc=AESGCM(256) Mac=AEAD', 'digest': None, 'id': 50380848, 'kea': 'kx-ecdhe', 'name': 'ECDHE-RSA-AES256-GCM-SHA384', 'protocol': 'TLSv1.2', 'strength_bits': 256, 'symmetric': 'aes-256-gcm'}, {'aead': True, 'alg_bits': 128, 'auth': 'auth-rsa', 'description': 'ECDHE-RSA-AES128-GCM-SHA256 TLSv1.2 Kx=ECDH Au=RSA ' 'Enc=AESGCM(128) Mac=AEAD', 'digest': None, 'id': 50380847, 'kea': 'kx-ecdhe', 'name': 'ECDHE-RSA-AES128-GCM-SHA256', 'protocol': 'TLSv1.2', 'strength_bits': 128, 'symmetric': 'aes-128-gcm'}] Availability: OpenSSL 1.0.2+. New in version 3.6. SSLContext.set_default_verify_paths() Load a set of default “certification authority” (CA) certificates from a filesystem path defined when building the OpenSSL library. Unfortunately, there’s no easy way to know whether this method succeeds: no error is returned if no certificates are to be found. When the OpenSSL library is provided as part of the operating system, though, it is likely to be configured properly. SSLContext.set_ciphers(ciphers) Set the available ciphers for sockets created with this context. It should be a string in the OpenSSL cipher list format. If no cipher can be selected (because compile-time options or other configuration forbids use of all the specified ciphers), an "SSLError" will be raised. Note: when connected, the "SSLSocket.cipher()" method of SSL sockets will give the currently selected cipher.OpenSSL 1.1.1 has TLS 1.3 cipher suites enabled by default. The suites cannot be disabled with "set_ciphers()". SSLContext.set_alpn_protocols(protocols) Specify which protocols the socket should advertise during the SSL/TLS handshake. It should be a list of ASCII strings, like "['http/1.1', 'spdy/2']", ordered by preference. The selection of a protocol will happen during the handshake, and will play out according to **RFC 7301**. After a successful handshake, the "SSLSocket.selected_alpn_protocol()" method will return the agreed- upon protocol. This method will raise "NotImplementedError" if "HAS_ALPN" is "False". OpenSSL 1.1.0 to 1.1.0e will abort the handshake and raise "SSLError" when both sides support ALPN but cannot agree on a protocol. 1.1.0f+ behaves like 1.0.2, "SSLSocket.selected_alpn_protocol()" returns None. New in version 3.5. SSLContext.set_npn_protocols(protocols) Specify which protocols the socket should advertise during the SSL/TLS handshake. It should be a list of strings, like "['http/1.1', 'spdy/2']", ordered by preference. The selection of a protocol will happen during the handshake, and will play out according to the Application Layer Protocol Negotiation. After a successful handshake, the "SSLSocket.selected_npn_protocol()" method will return the agreed-upon protocol. This method will raise "NotImplementedError" if "HAS_NPN" is "False". New in version 3.3. SSLContext.sni_callback Register a callback function that will be called after the TLS Client Hello handshake message has been received by the SSL/TLS server when the TLS client specifies a server name indication. The server name indication mechanism is specified in **RFC 6066** section 3 - Server Name Indication. Only one callback can be set per "SSLContext". If *sni_callback* is set to "None" then the callback is disabled. Calling this function a subsequent time will disable the previously registered callback. The callback function will be called with three arguments; the first being the "ssl.SSLSocket", the second is a string that represents the server name that the client is intending to communicate (or "None" if the TLS Client Hello does not contain a server name) and the third argument is the original "SSLContext". The server name argument is text. For internationalized domain name, the server name is an IDN A-label (""xn--pythn-mua.org""). A typical use of this callback is to change the "ssl.SSLSocket"’s "SSLSocket.context" attribute to a new object of type "SSLContext" representing a certificate chain that matches the server name. Due to the early negotiation phase of the TLS connection, only limited methods and attributes are usable like "SSLSocket.selected_alpn_protocol()" and "SSLSocket.context". The "SSLSocket.getpeercert()", "SSLSocket.cipher()" and "SSLSocket.compression()" methods require that the TLS connection has progressed beyond the TLS Client Hello and therefore will not return meaningful values nor can they be called safely. The *sni_callback* function must return "None" to allow the TLS negotiation to continue. If a TLS failure is required, a constant "ALERT_DESCRIPTION_*" can be returned. Other return values will result in a TLS fatal error with "ALERT_DESCRIPTION_INTERNAL_ERROR". If an exception is raised from the *sni_callback* function the TLS connection will terminate with a fatal TLS alert message "ALERT_DESCRIPTION_HANDSHAKE_FAILURE". This method will raise "NotImplementedError" if the OpenSSL library had OPENSSL_NO_TLSEXT defined when it was built. New in version 3.7. SSLContext.set_servername_callback(server_name_callback) This is a legacy API retained for backwards compatibility. When possible, you should use "sni_callback" instead. The given *server_name_callback* is similar to *sni_callback*, except that when the server hostname is an IDN-encoded internationalized domain name, the *server_name_callback* receives a decoded U-label (""pythön.org""). If there is an decoding error on the server name, the TLS connection will terminate with an "ALERT_DESCRIPTION_INTERNAL_ERROR" fatal TLS alert message to the client. New in version 3.4. SSLContext.load_dh_params(dhfile) Load the key generation parameters for Diffie-Hellman (DH) key exchange. Using DH key exchange improves forward secrecy at the expense of computational resources (both on the server and on the client). The *dhfile* parameter should be the path to a file containing DH parameters in PEM format. This setting doesn’t apply to client sockets. You can also use the "OP_SINGLE_DH_USE" option to further improve security. New in version 3.3. SSLContext.set_ecdh_curve(curve_name) Set the curve name for Elliptic Curve-based Diffie-Hellman (ECDH) key exchange. ECDH is significantly faster than regular DH while arguably as secure. The *curve_name* parameter should be a string describing a well-known elliptic curve, for example "prime256v1" for a widely supported curve. This setting doesn’t apply to client sockets. You can also use the "OP_SINGLE_ECDH_USE" option to further improve security. This method is not available if "HAS_ECDH" is "False". New in version 3.3. See also: SSL/TLS & Perfect Forward Secrecy Vincent Bernat. SSLContext.wrap_socket(sock, server_side=False, do_handshake_on_connect=True, suppress_ragged_eofs=True, server_hostname=None, session=None) Wrap an existing Python socket *sock* and return an instance of "SSLContext.sslsocket_class" (default "SSLSocket"). The returned SSL socket is tied to the context, its settings and certificates. *sock* must be a "SOCK_STREAM" socket; other socket types are unsupported. The parameter "server_side" is a boolean which identifies whether server-side or client-side behavior is desired from this socket. For client-side sockets, the context construction is lazy; if the underlying socket isn’t connected yet, the context construction will be performed after "connect()" is called on the socket. For server-side sockets, if the socket has no remote peer, it is assumed to be a listening socket, and the server-side SSL wrapping is automatically performed on client connections accepted via the "accept()" method. The method may raise "SSLError". On client connections, the optional parameter *server_hostname* specifies the hostname of the service which we are connecting to. This allows a single server to host multiple SSL-based services with distinct certificates, quite similarly to HTTP virtual hosts. Specifying *server_hostname* will raise a "ValueError" if *server_side* is true. The parameter "do_handshake_on_connect" specifies whether to do the SSL handshake automatically after doing a "socket.connect()", or whether the application program will call it explicitly, by invoking the "SSLSocket.do_handshake()" method. Calling "SSLSocket.do_handshake()" explicitly gives the program control over the blocking behavior of the socket I/O involved in the handshake. The parameter "suppress_ragged_eofs" specifies how the "SSLSocket.recv()" method should signal unexpected EOF from the other end of the connection. If specified as "True" (the default), it returns a normal EOF (an empty bytes object) in response to unexpected EOF errors raised from the underlying socket; if "False", it will raise the exceptions back to the caller. *session*, see "session". Changed in version 3.5: Always allow a server_hostname to be passed, even if OpenSSL does not have SNI. Changed in version 3.6: *session* argument was added. Changed in version 3.7: The method returns on instance of "SSLContext.sslsocket_class" instead of hard-coded "SSLSocket". SSLContext.sslsocket_class The return type of "SSLContext.wrap_socket()", defaults to "SSLSocket". The attribute can be overridden on instance of class in order to return a custom subclass of "SSLSocket". New in version 3.7. SSLContext.wrap_bio(incoming, outgoing, server_side=False, server_hostname=None, session=None) Wrap the BIO objects *incoming* and *outgoing* and return an instance of "SSLContext.sslobject_class" (default "SSLObject"). The SSL routines will read input data from the incoming BIO and write data to the outgoing BIO. The *server_side*, *server_hostname* and *session* parameters have the same meaning as in "SSLContext.wrap_socket()". Changed in version 3.6: *session* argument was added. Changed in version 3.7: The method returns on instance of "SSLContext.sslobject_class" instead of hard-coded "SSLObject". SSLContext.sslobject_class The return type of "SSLContext.wrap_bio()", defaults to "SSLObject". The attribute can be overridden on instance of class in order to return a custom subclass of "SSLObject". New in version 3.7. SSLContext.session_stats() Get statistics about the SSL sessions created or managed by this context. A dictionary is returned which maps the names of each piece of information to their numeric values. For example, here is the total number of hits and misses in the session cache since the context was created: >>> stats = context.session_stats() >>> stats['hits'], stats['misses'] (0, 0) SSLContext.check_hostname Whether to match the peer cert’s hostname in "SSLSocket.do_handshake()". The context’s "verify_mode" must be set to "CERT_OPTIONAL" or "CERT_REQUIRED", and you must pass *server_hostname* to "wrap_socket()" in order to match the hostname. Enabling hostname checking automatically sets "verify_mode" from "CERT_NONE" to "CERT_REQUIRED". It cannot be set back to "CERT_NONE" as long as hostname checking is enabled. The "PROTOCOL_TLS_CLIENT" protocol enables hostname checking by default. With other protocols, hostname checking must be enabled explicitly. Example: import socket, ssl context = ssl.SSLContext(ssl.PROTOCOL_TLSv1_2) context.verify_mode = ssl.CERT_REQUIRED context.check_hostname = True context.load_default_certs() s = socket.socket(socket.AF_INET, socket.SOCK_STREAM) ssl_sock = context.wrap_socket(s, server_hostname='www.verisign.com') ssl_sock.connect(('www.verisign.com', 443)) New in version 3.4. Changed in version 3.7: "verify_mode" is now automatically changed to "CERT_REQUIRED" when hostname checking is enabled and "verify_mode" is "CERT_NONE". Previously the same operation would have failed with a "ValueError". Note: This features requires OpenSSL 0.9.8f or newer. SSLContext.keylog_filename Write TLS keys to a keylog file, whenever key material is generated or received. The keylog file is designed for debugging purposes only. The file format is specified by NSS and used by many traffic analyzers such as Wireshark. The log file is opened in append-only mode. Writes are synchronized between threads, but not between processes. New in version 3.8. Note: This features requires OpenSSL 1.1.1 or newer. SSLContext.maximum_version A "TLSVersion" enum member representing the highest supported TLS version. The value defaults to "TLSVersion.MAXIMUM_SUPPORTED". The attribute is read-only for protocols other than "PROTOCOL_TLS", "PROTOCOL_TLS_CLIENT", and "PROTOCOL_TLS_SERVER". The attributes "maximum_version", "minimum_version" and "SSLContext.options" all affect the supported SSL and TLS versions of the context. The implementation does not prevent invalid combination. For example a context with "OP_NO_TLSv1_2" in "options" and "maximum_version" set to "TLSVersion.TLSv1_2" will not be able to establish a TLS 1.2 connection. Note: This attribute is not available unless the ssl module is compiled with OpenSSL 1.1.0g or newer. New in version 3.7. SSLContext.minimum_version Like "SSLContext.maximum_version" except it is the lowest supported version or "TLSVersion.MINIMUM_SUPPORTED". Note: This attribute is not available unless the ssl module is compiled with OpenSSL 1.1.0g or newer. New in version 3.7. SSLContext.num_tickets Control the number of TLS 1.3 session tickets of a "TLS_PROTOCOL_SERVER" context. The setting has no impact on TLS 1.0 to 1.2 connections. Note: This attribute is not available unless the ssl module is compiled with OpenSSL 1.1.1 or newer. New in version 3.8. SSLContext.options An integer representing the set of SSL options enabled on this context. The default value is "OP_ALL", but you can specify other options such as "OP_NO_SSLv2" by ORing them together. Note: With versions of OpenSSL older than 0.9.8m, it is only possible to set options, not to clear them. Attempting to clear an option (by resetting the corresponding bits) will raise a "ValueError". Changed in version 3.6: "SSLContext.options" returns "Options" flags: >>> ssl.create_default_context().options SSLContext.post_handshake_auth Enable TLS 1.3 post-handshake client authentication. Post-handshake auth is disabled by default and a server can only request a TLS client certificate during the initial handshake. When enabled, a server may request a TLS client certificate at any time after the handshake. When enabled on client-side sockets, the client signals the server that it supports post-handshake authentication. When enabled on server-side sockets, "SSLContext.verify_mode" must be set to "CERT_OPTIONAL" or "CERT_REQUIRED", too. The actual client cert exchange is delayed until "SSLSocket.verify_client_post_handshake()" is called and some I/O is performed. Note: Only available with OpenSSL 1.1.1 and TLS 1.3 enabled. Without TLS 1.3 support, the property value is None and can’t be modified New in version 3.8. SSLContext.protocol The protocol version chosen when constructing the context. This attribute is read-only. SSLContext.hostname_checks_common_name Whether "check_hostname" falls back to verify the cert’s subject common name in the absence of a subject alternative name extension (default: true). Note: Only writeable with OpenSSL 1.1.0 or higher. New in version 3.7. Changed in version 3.9.3: The flag had no effect with OpenSSL before version 1.1.1k. Python 3.8.9, 3.9.3, and 3.10 include workarounds for previous versions. SSLContext.verify_flags The flags for certificate verification operations. You can set flags like "VERIFY_CRL_CHECK_LEAF" by ORing them together. By default OpenSSL does neither require nor verify certificate revocation lists (CRLs). Available only with openssl version 0.9.8+. New in version 3.4. Changed in version 3.6: "SSLContext.verify_flags" returns "VerifyFlags" flags: >>> ssl.create_default_context().verify_flags SSLContext.verify_mode Whether to try to verify other peers’ certificates and how to behave if verification fails. This attribute must be one of "CERT_NONE", "CERT_OPTIONAL" or "CERT_REQUIRED". Changed in version 3.6: "SSLContext.verify_mode" returns "VerifyMode" enum: >>> ssl.create_default_context().verify_mode Certificates ============ Certificates in general are part of a public-key / private-key system. In this system, each *principal*, (which may be a machine, or a person, or an organization) is assigned a unique two-part encryption key. One part of the key is public, and is called the *public key*; the other part is kept secret, and is called the *private key*. The two parts are related, in that if you encrypt a message with one of the parts, you can decrypt it with the other part, and **only** with the other part. A certificate contains information about two principals. It contains the name of a *subject*, and the subject’s public key. It also contains a statement by a second principal, the *issuer*, that the subject is who they claim to be, and that this is indeed the subject’s public key. The issuer’s statement is signed with the issuer’s private key, which only the issuer knows. However, anyone can verify the issuer’s statement by finding the issuer’s public key, decrypting the statement with it, and comparing it to the other information in the certificate. The certificate also contains information about the time period over which it is valid. This is expressed as two fields, called “notBefore” and “notAfter”. In the Python use of certificates, a client or server can use a certificate to prove who they are. The other side of a network connection can also be required to produce a certificate, and that certificate can be validated to the satisfaction of the client or server that requires such validation. The connection attempt can be set to raise an exception if the validation fails. Validation is done automatically, by the underlying OpenSSL framework; the application need not concern itself with its mechanics. But the application does usually need to provide sets of certificates to allow this process to take place. Python uses files to contain certificates. They should be formatted as “PEM” (see **RFC 1422**), which is a base-64 encoded form wrapped with a header line and a footer line: -----BEGIN CERTIFICATE----- ... (certificate in base64 PEM encoding) ... -----END CERTIFICATE----- Certificate chains ------------------ The Python files which contain certificates can contain a sequence of certificates, sometimes called a *certificate chain*. This chain should start with the specific certificate for the principal who “is” the client or server, and then the certificate for the issuer of that certificate, and then the certificate for the issuer of *that* certificate, and so on up the chain till you get to a certificate which is *self-signed*, that is, a certificate which has the same subject and issuer, sometimes called a *root certificate*. The certificates should just be concatenated together in the certificate file. For example, suppose we had a three certificate chain, from our server certificate to the certificate of the certification authority that signed our server certificate, to the root certificate of the agency which issued the certification authority’s certificate: -----BEGIN CERTIFICATE----- ... (certificate for your server)... -----END CERTIFICATE----- -----BEGIN CERTIFICATE----- ... (the certificate for the CA)... -----END CERTIFICATE----- -----BEGIN CERTIFICATE----- ... (the root certificate for the CA's issuer)... -----END CERTIFICATE----- CA certificates --------------- If you are going to require validation of the other side of the connection’s certificate, you need to provide a “CA certs” file, filled with the certificate chains for each issuer you are willing to trust. Again, this file just contains these chains concatenated together. For validation, Python will use the first chain it finds in the file which matches. The platform’s certificates file can be used by calling "SSLContext.load_default_certs()", this is done automatically with "create_default_context()". Combined key and certificate ---------------------------- Often the private key is stored in the same file as the certificate; in this case, only the "certfile" parameter to "SSLContext.load_cert_chain()" and "wrap_socket()" needs to be passed. If the private key is stored with the certificate, it should come before the first certificate in the certificate chain: -----BEGIN RSA PRIVATE KEY----- ... (private key in base64 encoding) ... -----END RSA PRIVATE KEY----- -----BEGIN CERTIFICATE----- ... (certificate in base64 PEM encoding) ... -----END CERTIFICATE----- Self-signed certificates ------------------------ If you are going to create a server that provides SSL-encrypted connection services, you will need to acquire a certificate for that service. There are many ways of acquiring appropriate certificates, such as buying one from a certification authority. Another common practice is to generate a self-signed certificate. The simplest way to do this is with the OpenSSL package, using something like the following: % openssl req -new -x509 -days 365 -nodes -out cert.pem -keyout cert.pem Generating a 1024 bit RSA private key .......++++++ .............................++++++ writing new private key to 'cert.pem' ----- You are about to be asked to enter information that will be incorporated into your certificate request. What you are about to enter is what is called a Distinguished Name or a DN. There are quite a few fields but you can leave some blank For some fields there will be a default value, If you enter '.', the field will be left blank. ----- Country Name (2 letter code) [AU]:US State or Province Name (full name) [Some-State]:MyState Locality Name (eg, city) []:Some City Organization Name (eg, company) [Internet Widgits Pty Ltd]:My Organization, Inc. Organizational Unit Name (eg, section) []:My Group Common Name (eg, YOUR name) []:myserver.mygroup.myorganization.com Email Address []:ops@myserver.mygroup.myorganization.com % The disadvantage of a self-signed certificate is that it is its own root certificate, and no one else will have it in their cache of known (and trusted) root certificates. Examples ======== Testing for SSL support ----------------------- To test for the presence of SSL support in a Python installation, user code should use the following idiom: try: import ssl except ImportError: pass else: ... # do something that requires SSL support Client-side operation --------------------- This example creates a SSL context with the recommended security settings for client sockets, including automatic certificate verification: >>> context = ssl.create_default_context() If you prefer to tune security settings yourself, you might create a context from scratch (but beware that you might not get the settings right): >>> context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT) >>> context.load_verify_locations("/etc/ssl/certs/ca-bundle.crt") (this snippet assumes your operating system places a bundle of all CA certificates in "/etc/ssl/certs/ca-bundle.crt"; if not, you’ll get an error and have to adjust the location) The "PROTOCOL_TLS_CLIENT" protocol configures the context for cert validation and hostname verification. "verify_mode" is set to "CERT_REQUIRED" and "check_hostname" is set to "True". All other protocols create SSL contexts with insecure defaults. When you use the context to connect to a server, "CERT_REQUIRED" and "check_hostname" validate the server certificate: it ensures that the server certificate was signed with one of the CA certificates, checks the signature for correctness, and verifies other properties like validity and identity of the hostname: >>> conn = context.wrap_socket(socket.socket(socket.AF_INET), ... server_hostname="www.python.org") >>> conn.connect(("www.python.org", 443)) You may then fetch the certificate: >>> cert = conn.getpeercert() Visual inspection shows that the certificate does identify the desired service (that is, the HTTPS host "www.python.org"): >>> pprint.pprint(cert) {'OCSP': ('http://ocsp.digicert.com',), 'caIssuers': ('http://cacerts.digicert.com/DigiCertSHA2ExtendedValidationServerCA.crt',), 'crlDistributionPoints': ('http://crl3.digicert.com/sha2-ev-server-g1.crl', 'http://crl4.digicert.com/sha2-ev-server-g1.crl'), 'issuer': ((('countryName', 'US'),), (('organizationName', 'DigiCert Inc'),), (('organizationalUnitName', 'www.digicert.com'),), (('commonName', 'DigiCert SHA2 Extended Validation Server CA'),)), 'notAfter': 'Sep 9 12:00:00 2016 GMT', 'notBefore': 'Sep 5 00:00:00 2014 GMT', 'serialNumber': '01BB6F00122B177F36CAB49CEA8B6B26', 'subject': ((('businessCategory', 'Private Organization'),), (('1.3.6.1.4.1.311.60.2.1.3', 'US'),), (('1.3.6.1.4.1.311.60.2.1.2', 'Delaware'),), (('serialNumber', '3359300'),), (('streetAddress', '16 Allen Rd'),), (('postalCode', '03894-4801'),), (('countryName', 'US'),), (('stateOrProvinceName', 'NH'),), (('localityName', 'Wolfeboro'),), (('organizationName', 'Python Software Foundation'),), (('commonName', 'www.python.org'),)), 'subjectAltName': (('DNS', 'www.python.org'), ('DNS', 'python.org'), ('DNS', 'pypi.org'), ('DNS', 'docs.python.org'), ('DNS', 'testpypi.org'), ('DNS', 'bugs.python.org'), ('DNS', 'wiki.python.org'), ('DNS', 'hg.python.org'), ('DNS', 'mail.python.org'), ('DNS', 'packaging.python.org'), ('DNS', 'pythonhosted.org'), ('DNS', 'www.pythonhosted.org'), ('DNS', 'test.pythonhosted.org'), ('DNS', 'us.pycon.org'), ('DNS', 'id.python.org')), 'version': 3} Now the SSL channel is established and the certificate verified, you can proceed to talk with the server: >>> conn.sendall(b"HEAD / HTTP/1.0\r\nHost: linuxfr.org\r\n\r\n") >>> pprint.pprint(conn.recv(1024).split(b"\r\n")) [b'HTTP/1.1 200 OK', b'Date: Sat, 18 Oct 2014 18:27:20 GMT', b'Server: nginx', b'Content-Type: text/html; charset=utf-8', b'X-Frame-Options: SAMEORIGIN', b'Content-Length: 45679', b'Accept-Ranges: bytes', b'Via: 1.1 varnish', b'Age: 2188', b'X-Served-By: cache-lcy1134-LCY', b'X-Cache: HIT', b'X-Cache-Hits: 11', b'Vary: Cookie', b'Strict-Transport-Security: max-age=63072000; includeSubDomains', b'Connection: close', b'', b''] See the discussion of Security considerations below. Server-side operation --------------------- For server operation, typically you’ll need to have a server certificate, and private key, each in a file. You’ll first create a context holding the key and the certificate, so that clients can check your authenticity. Then you’ll open a socket, bind it to a port, call "listen()" on it, and start waiting for clients to connect: import socket, ssl context = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH) context.load_cert_chain(certfile="mycertfile", keyfile="mykeyfile") bindsocket = socket.socket() bindsocket.bind(('myaddr.example.com', 10023)) bindsocket.listen(5) When a client connects, you’ll call "accept()" on the socket to get the new socket from the other end, and use the context’s "SSLContext.wrap_socket()" method to create a server-side SSL socket for the connection: while True: newsocket, fromaddr = bindsocket.accept() connstream = context.wrap_socket(newsocket, server_side=True) try: deal_with_client(connstream) finally: connstream.shutdown(socket.SHUT_RDWR) connstream.close() Then you’ll read data from the "connstream" and do something with it till you are finished with the client (or the client is finished with you): def deal_with_client(connstream): data = connstream.recv(1024) # empty data means the client is finished with us while data: if not do_something(connstream, data): # we'll assume do_something returns False # when we're finished with client break data = connstream.recv(1024) # finished with client And go back to listening for new client connections (of course, a real server would probably handle each client connection in a separate thread, or put the sockets in non-blocking mode and use an event loop). Notes on non-blocking sockets ============================= SSL sockets behave slightly different than regular sockets in non- blocking mode. When working with non-blocking sockets, there are thus several things you need to be aware of: * Most "SSLSocket" methods will raise either "SSLWantWriteError" or "SSLWantReadError" instead of "BlockingIOError" if an I/O operation would block. "SSLWantReadError" will be raised if a read operation on the underlying socket is necessary, and "SSLWantWriteError" for a write operation on the underlying socket. Note that attempts to *write* to an SSL socket may require *reading* from the underlying socket first, and attempts to *read* from the SSL socket may require a prior *write* to the underlying socket. Changed in version 3.5: In earlier Python versions, the "SSLSocket.send()" method returned zero instead of raising "SSLWantWriteError" or "SSLWantReadError". * Calling "select()" tells you that the OS-level socket can be read from (or written to), but it does not imply that there is sufficient data at the upper SSL layer. For example, only part of an SSL frame might have arrived. Therefore, you must be ready to handle "SSLSocket.recv()" and "SSLSocket.send()" failures, and retry after another call to "select()". * Conversely, since the SSL layer has its own framing, a SSL socket may still have data available for reading without "select()" being aware of it. Therefore, you should first call "SSLSocket.recv()" to drain any potentially available data, and then only block on a "select()" call if still necessary. (of course, similar provisions apply when using other primitives such as "poll()", or those in the "selectors" module) * The SSL handshake itself will be non-blocking: the "SSLSocket.do_handshake()" method has to be retried until it returns successfully. Here is a synopsis using "select()" to wait for the socket’s readiness: while True: try: sock.do_handshake() break except ssl.SSLWantReadError: select.select([sock], [], []) except ssl.SSLWantWriteError: select.select([], [sock], []) See also: The "asyncio" module supports non-blocking SSL sockets and provides a higher level API. It polls for events using the "selectors" module and handles "SSLWantWriteError", "SSLWantReadError" and "BlockingIOError" exceptions. It runs the SSL handshake asynchronously as well. Memory BIO Support ================== New in version 3.5. Ever since the SSL module was introduced in Python 2.6, the "SSLSocket" class has provided two related but distinct areas of functionality: * SSL protocol handling * Network IO The network IO API is identical to that provided by "socket.socket", from which "SSLSocket" also inherits. This allows an SSL socket to be used as a drop-in replacement for a regular socket, making it very easy to add SSL support to an existing application. Combining SSL protocol handling and network IO usually works well, but there are some cases where it doesn’t. An example is async IO frameworks that want to use a different IO multiplexing model than the “select/poll on a file descriptor” (readiness based) model that is assumed by "socket.socket" and by the internal OpenSSL socket IO routines. This is mostly relevant for platforms like Windows where this model is not efficient. For this purpose, a reduced scope variant of "SSLSocket" called "SSLObject" is provided. class ssl.SSLObject A reduced-scope variant of "SSLSocket" representing an SSL protocol instance that does not contain any network IO methods. This class is typically used by framework authors that want to implement asynchronous IO for SSL through memory buffers. This class implements an interface on top of a low-level SSL object as implemented by OpenSSL. This object captures the state of an SSL connection but does not provide any network IO itself. IO needs to be performed through separate “BIO” objects which are OpenSSL’s IO abstraction layer. This class has no public constructor. An "SSLObject" instance must be created using the "wrap_bio()" method. This method will create the "SSLObject" instance and bind it to a pair of BIOs. The *incoming* BIO is used to pass data from Python to the SSL protocol instance, while the *outgoing* BIO is used to pass data the other way around. The following methods are available: * "context" * "server_side" * "server_hostname" * "session" * "session_reused" * "read()" * "write()" * "getpeercert()" * "selected_alpn_protocol()" * "selected_npn_protocol()" * "cipher()" * "shared_ciphers()" * "compression()" * "pending()" * "do_handshake()" * "verify_client_post_handshake()" * "unwrap()" * "get_channel_binding()" * "version()" When compared to "SSLSocket", this object lacks the following features: * Any form of network IO; "recv()" and "send()" read and write only to the underlying "MemoryBIO" buffers. * There is no *do_handshake_on_connect* machinery. You must always manually call "do_handshake()" to start the handshake. * There is no handling of *suppress_ragged_eofs*. All end-of-file conditions that are in violation of the protocol are reported via the "SSLEOFError" exception. * The method "unwrap()" call does not return anything, unlike for an SSL socket where it returns the underlying socket. * The *server_name_callback* callback passed to "SSLContext.set_servername_callback()" will get an "SSLObject" instance instead of a "SSLSocket" instance as its first parameter. Some notes related to the use of "SSLObject": * All IO on an "SSLObject" is non-blocking. This means that for example "read()" will raise an "SSLWantReadError" if it needs more data than the incoming BIO has available. * There is no module-level "wrap_bio()" call like there is for "wrap_socket()". An "SSLObject" is always created via an "SSLContext". Changed in version 3.7: "SSLObject" instances must to created with "wrap_bio()". In earlier versions, it was possible to create instances directly. This was never documented or officially supported. An SSLObject communicates with the outside world using memory buffers. The class "MemoryBIO" provides a memory buffer that can be used for this purpose. It wraps an OpenSSL memory BIO (Basic IO) object: class ssl.MemoryBIO A memory buffer that can be used to pass data between Python and an SSL protocol instance. pending Return the number of bytes currently in the memory buffer. eof A boolean indicating whether the memory BIO is current at the end-of-file position. read(n=-1) Read up to *n* bytes from the memory buffer. If *n* is not specified or negative, all bytes are returned. write(buf) Write the bytes from *buf* to the memory BIO. The *buf* argument must be an object supporting the buffer protocol. The return value is the number of bytes written, which is always equal to the length of *buf*. write_eof() Write an EOF marker to the memory BIO. After this method has been called, it is illegal to call "write()". The attribute "eof" will become true after all data currently in the buffer has been read. SSL session =========== New in version 3.6. class ssl.SSLSession Session object used by "session". id time timeout ticket_lifetime_hint has_ticket Security considerations ======================= Best defaults ------------- For **client use**, if you don’t have any special requirements for your security policy, it is highly recommended that you use the "create_default_context()" function to create your SSL context. It will load the system’s trusted CA certificates, enable certificate validation and hostname checking, and try to choose reasonably secure protocol and cipher settings. For example, here is how you would use the "smtplib.SMTP" class to create a trusted, secure connection to a SMTP server: >>> import ssl, smtplib >>> smtp = smtplib.SMTP("mail.python.org", port=587) >>> context = ssl.create_default_context() >>> smtp.starttls(context=context) (220, b'2.0.0 Ready to start TLS') If a client certificate is needed for the connection, it can be added with "SSLContext.load_cert_chain()". By contrast, if you create the SSL context by calling the "SSLContext" constructor yourself, it will not have certificate validation nor hostname checking enabled by default. If you do so, please read the paragraphs below to achieve a good security level. Manual settings --------------- Verifying certificates ~~~~~~~~~~~~~~~~~~~~~~ When calling the "SSLContext" constructor directly, "CERT_NONE" is the default. Since it does not authenticate the other peer, it can be insecure, especially in client mode where most of time you would like to ensure the authenticity of the server you’re talking to. Therefore, when in client mode, it is highly recommended to use "CERT_REQUIRED". However, it is in itself not sufficient; you also have to check that the server certificate, which can be obtained by calling "SSLSocket.getpeercert()", matches the desired service. For many protocols and applications, the service can be identified by the hostname; in this case, the "match_hostname()" function can be used. This common check is automatically performed when "SSLContext.check_hostname" is enabled. Changed in version 3.7: Hostname matchings is now performed by OpenSSL. Python no longer uses "match_hostname()". In server mode, if you want to authenticate your clients using the SSL layer (rather than using a higher-level authentication mechanism), you’ll also have to specify "CERT_REQUIRED" and similarly check the client certificate. Protocol versions ~~~~~~~~~~~~~~~~~ SSL versions 2 and 3 are considered insecure and are therefore dangerous to use. If you want maximum compatibility between clients and servers, it is recommended to use "PROTOCOL_TLS_CLIENT" or "PROTOCOL_TLS_SERVER" as the protocol version. SSLv2 and SSLv3 are disabled by default. >>> client_context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT) >>> client_context.options |= ssl.OP_NO_TLSv1 >>> client_context.options |= ssl.OP_NO_TLSv1_1 The SSL context created above will only allow TLSv1.2 and later (if supported by your system) connections to a server. "PROTOCOL_TLS_CLIENT" implies certificate validation and hostname checks by default. You have to load certificates into the context. Cipher selection ~~~~~~~~~~~~~~~~ If you have advanced security requirements, fine-tuning of the ciphers enabled when negotiating a SSL session is possible through the "SSLContext.set_ciphers()" method. Starting from Python 3.2.3, the ssl module disables certain weak ciphers by default, but you may want to further restrict the cipher choice. Be sure to read OpenSSL’s documentation about the cipher list format. If you want to check which ciphers are enabled by a given cipher list, use "SSLContext.get_ciphers()" or the "openssl ciphers" command on your system. Multi-processing ---------------- If using this module as part of a multi-processed application (using, for example the "multiprocessing" or "concurrent.futures" modules), be aware that OpenSSL’s internal random number generator does not properly handle forked processes. Applications must change the PRNG state of the parent process if they use any SSL feature with "os.fork()". Any successful call of "RAND_add()", "RAND_bytes()" or "RAND_pseudo_bytes()" is sufficient. TLS 1.3 ======= New in version 3.7. Python has provisional and experimental support for TLS 1.3 with OpenSSL 1.1.1. The new protocol behaves slightly differently than previous version of TLS/SSL. Some new TLS 1.3 features are not yet available. * TLS 1.3 uses a disjunct set of cipher suites. All AES-GCM and ChaCha20 cipher suites are enabled by default. The method "SSLContext.set_ciphers()" cannot enable or disable any TLS 1.3 ciphers yet, but "SSLContext.get_ciphers()" returns them. * Session tickets are no longer sent as part of the initial handshake and are handled differently. "SSLSocket.session" and "SSLSession" are not compatible with TLS 1.3. * Client-side certificates are also no longer verified during the initial handshake. A server can request a certificate at any time. Clients process certificate requests while they send or receive application data from the server. * TLS 1.3 features like early data, deferred TLS client cert request, signature algorithm configuration, and rekeying are not supported yet. LibreSSL support ================ LibreSSL is a fork of OpenSSL 1.0.1. The ssl module has limited support for LibreSSL. Some features are not available when the ssl module is compiled with LibreSSL. * LibreSSL >= 2.6.1 no longer supports NPN. The methods "SSLContext.set_npn_protocols()" and "SSLSocket.selected_npn_protocol()" are not available. * "SSLContext.set_default_verify_paths()" ignores the env vars "SSL_CERT_FILE" and "SSL_CERT_PATH" although "get_default_verify_paths()" still reports them. See also: Class "socket.socket" Documentation of underlying "socket" class SSL/TLS Strong Encryption: An Introduction Intro from the Apache HTTP Server documentation **RFC 1422: Privacy Enhancement for Internet Electronic Mail: Part II: Certificate-Based Key Management** Steve Kent **RFC 4086: Randomness Requirements for Security** Donald E., Jeffrey I. Schiller **RFC 5280: Internet X.509 Public Key Infrastructure Certificate and Certificate Revocation List (CRL) Profile** D. Cooper **RFC 5246: The Transport Layer Security (TLS) Protocol Version 1.2** T. Dierks et. al. **RFC 6066: Transport Layer Security (TLS) Extensions** D. Eastlake IANA TLS: Transport Layer Security (TLS) Parameters IANA **RFC 7525: Recommendations for Secure Use of Transport Layer Security (TLS) and Datagram Transport Layer Security (DTLS)** IETF Mozilla’s Server Side TLS recommendations Mozilla "stat" — Interpreting "stat()" results ************************************** **Source code:** Lib/stat.py ====================================================================== The "stat" module defines constants and functions for interpreting the results of "os.stat()", "os.fstat()" and "os.lstat()" (if they exist). For complete details about the "stat()", "fstat()" and "lstat()" calls, consult the documentation for your system. Changed in version 3.4: The stat module is backed by a C implementation. The "stat" module defines the following functions to test for specific file types: stat.S_ISDIR(mode) Return non-zero if the mode is from a directory. stat.S_ISCHR(mode) Return non-zero if the mode is from a character special device file. stat.S_ISBLK(mode) Return non-zero if the mode is from a block special device file. stat.S_ISREG(mode) Return non-zero if the mode is from a regular file. stat.S_ISFIFO(mode) Return non-zero if the mode is from a FIFO (named pipe). stat.S_ISLNK(mode) Return non-zero if the mode is from a symbolic link. stat.S_ISSOCK(mode) Return non-zero if the mode is from a socket. stat.S_ISDOOR(mode) Return non-zero if the mode is from a door. New in version 3.4. stat.S_ISPORT(mode) Return non-zero if the mode is from an event port. New in version 3.4. stat.S_ISWHT(mode) Return non-zero if the mode is from a whiteout. New in version 3.4. Two additional functions are defined for more general manipulation of the file’s mode: stat.S_IMODE(mode) Return the portion of the file’s mode that can be set by "os.chmod()"—that is, the file’s permission bits, plus the sticky bit, set-group-id, and set-user-id bits (on systems that support them). stat.S_IFMT(mode) Return the portion of the file’s mode that describes the file type (used by the "S_IS*()" functions above). Normally, you would use the "os.path.is*()" functions for testing the type of a file; the functions here are useful when you are doing multiple tests of the same file and wish to avoid the overhead of the "stat()" system call for each test. These are also useful when checking for information about a file that isn’t handled by "os.path", like the tests for block and character devices. Example: import os, sys from stat import * def walktree(top, callback): '''recursively descend the directory tree rooted at top, calling the callback function for each regular file''' for f in os.listdir(top): pathname = os.path.join(top, f) mode = os.lstat(pathname).st_mode if S_ISDIR(mode): # It's a directory, recurse into it walktree(pathname, callback) elif S_ISREG(mode): # It's a file, call the callback function callback(pathname) else: # Unknown file type, print a message print('Skipping %s' % pathname) def visitfile(file): print('visiting', file) if __name__ == '__main__': walktree(sys.argv[1], visitfile) An additional utility function is provided to convert a file’s mode in a human readable string: stat.filemode(mode) Convert a file’s mode to a string of the form ‘-rwxrwxrwx’. New in version 3.3. Changed in version 3.4: The function supports "S_IFDOOR", "S_IFPORT" and "S_IFWHT". All the variables below are simply symbolic indexes into the 10-tuple returned by "os.stat()", "os.fstat()" or "os.lstat()". stat.ST_MODE Inode protection mode. stat.ST_INO Inode number. stat.ST_DEV Device inode resides on. stat.ST_NLINK Number of links to the inode. stat.ST_UID User id of the owner. stat.ST_GID Group id of the owner. stat.ST_SIZE Size in bytes of a plain file; amount of data waiting on some special files. stat.ST_ATIME Time of last access. stat.ST_MTIME Time of last modification. stat.ST_CTIME The “ctime” as reported by the operating system. On some systems (like Unix) is the time of the last metadata change, and, on others (like Windows), is the creation time (see platform documentation for details). The interpretation of “file size” changes according to the file type. For plain files this is the size of the file in bytes. For FIFOs and sockets under most flavors of Unix (including Linux in particular), the “size” is the number of bytes waiting to be read at the time of the call to "os.stat()", "os.fstat()", or "os.lstat()"; this can sometimes be useful, especially for polling one of these special files after a non-blocking open. The meaning of the size field for other character and block devices varies more, depending on the implementation of the underlying system call. The variables below define the flags used in the "ST_MODE" field. Use of the functions above is more portable than use of the first set of flags: stat.S_IFSOCK Socket. stat.S_IFLNK Symbolic link. stat.S_IFREG Regular file. stat.S_IFBLK Block device. stat.S_IFDIR Directory. stat.S_IFCHR Character device. stat.S_IFIFO FIFO. stat.S_IFDOOR Door. New in version 3.4. stat.S_IFPORT Event port. New in version 3.4. stat.S_IFWHT Whiteout. New in version 3.4. Note: "S_IFDOOR", "S_IFPORT" or "S_IFWHT" are defined as 0 when the platform does not have support for the file types. The following flags can also be used in the *mode* argument of "os.chmod()": stat.S_ISUID Set UID bit. stat.S_ISGID Set-group-ID bit. This bit has several special uses. For a directory it indicates that BSD semantics is to be used for that directory: files created there inherit their group ID from the directory, not from the effective group ID of the creating process, and directories created there will also get the "S_ISGID" bit set. For a file that does not have the group execution bit ("S_IXGRP") set, the set-group-ID bit indicates mandatory file/record locking (see also "S_ENFMT"). stat.S_ISVTX Sticky bit. When this bit is set on a directory it means that a file in that directory can be renamed or deleted only by the owner of the file, by the owner of the directory, or by a privileged process. stat.S_IRWXU Mask for file owner permissions. stat.S_IRUSR Owner has read permission. stat.S_IWUSR Owner has write permission. stat.S_IXUSR Owner has execute permission. stat.S_IRWXG Mask for group permissions. stat.S_IRGRP Group has read permission. stat.S_IWGRP Group has write permission. stat.S_IXGRP Group has execute permission. stat.S_IRWXO Mask for permissions for others (not in group). stat.S_IROTH Others have read permission. stat.S_IWOTH Others have write permission. stat.S_IXOTH Others have execute permission. stat.S_ENFMT System V file locking enforcement. This flag is shared with "S_ISGID": file/record locking is enforced on files that do not have the group execution bit ("S_IXGRP") set. stat.S_IREAD Unix V7 synonym for "S_IRUSR". stat.S_IWRITE Unix V7 synonym for "S_IWUSR". stat.S_IEXEC Unix V7 synonym for "S_IXUSR". The following flags can be used in the *flags* argument of "os.chflags()": stat.UF_NODUMP Do not dump the file. stat.UF_IMMUTABLE The file may not be changed. stat.UF_APPEND The file may only be appended to. stat.UF_OPAQUE The directory is opaque when viewed through a union stack. stat.UF_NOUNLINK The file may not be renamed or deleted. stat.UF_COMPRESSED The file is stored compressed (macOS 10.6+). stat.UF_HIDDEN The file should not be displayed in a GUI (macOS 10.5+). stat.SF_ARCHIVED The file may be archived. stat.SF_IMMUTABLE The file may not be changed. stat.SF_APPEND The file may only be appended to. stat.SF_NOUNLINK The file may not be renamed or deleted. stat.SF_SNAPSHOT The file is a snapshot file. See the *BSD or macOS systems man page *chflags(2)* for more information. On Windows, the following file attribute constants are available for use when testing bits in the "st_file_attributes" member returned by "os.stat()". See the Windows API documentation for more detail on the meaning of these constants. stat.FILE_ATTRIBUTE_ARCHIVE stat.FILE_ATTRIBUTE_COMPRESSED stat.FILE_ATTRIBUTE_DEVICE stat.FILE_ATTRIBUTE_DIRECTORY stat.FILE_ATTRIBUTE_ENCRYPTED stat.FILE_ATTRIBUTE_HIDDEN stat.FILE_ATTRIBUTE_INTEGRITY_STREAM stat.FILE_ATTRIBUTE_NORMAL stat.FILE_ATTRIBUTE_NOT_CONTENT_INDEXED stat.FILE_ATTRIBUTE_NO_SCRUB_DATA stat.FILE_ATTRIBUTE_OFFLINE stat.FILE_ATTRIBUTE_READONLY stat.FILE_ATTRIBUTE_REPARSE_POINT stat.FILE_ATTRIBUTE_SPARSE_FILE stat.FILE_ATTRIBUTE_SYSTEM stat.FILE_ATTRIBUTE_TEMPORARY stat.FILE_ATTRIBUTE_VIRTUAL New in version 3.5. On Windows, the following constants are available for comparing against the "st_reparse_tag" member returned by "os.lstat()". These are well-known constants, but are not an exhaustive list. stat.IO_REPARSE_TAG_SYMLINK stat.IO_REPARSE_TAG_MOUNT_POINT stat.IO_REPARSE_TAG_APPEXECLINK New in version 3.8. "statistics" — Mathematical statistics functions ************************************************ New in version 3.4. **Source code:** Lib/statistics.py ====================================================================== This module provides functions for calculating mathematical statistics of numeric ("Real"-valued) data. The module is not intended to be a competitor to third-party libraries such as NumPy, SciPy, or proprietary full-featured statistics packages aimed at professional statisticians such as Minitab, SAS and Matlab. It is aimed at the level of graphing and scientific calculators. Unless explicitly noted, these functions support "int", "float", "Decimal" and "Fraction". Behaviour with other types (whether in the numeric tower or not) is currently unsupported. Collections with a mix of types are also undefined and implementation-dependent. If your input data consists of mixed types, you may be able to use "map()" to ensure a consistent result, for example: "map(float, input_data)". Averages and measures of central location ========================================= These functions calculate an average or typical value from a population or sample. +-------------------------+-----------------------------------------------------------------+ | "mean()" | Arithmetic mean (“average”) of data. | +-------------------------+-----------------------------------------------------------------+ | "fmean()" | Fast, floating point arithmetic mean. | +-------------------------+-----------------------------------------------------------------+ | "geometric_mean()" | Geometric mean of data. | +-------------------------+-----------------------------------------------------------------+ | "harmonic_mean()" | Harmonic mean of data. | +-------------------------+-----------------------------------------------------------------+ | "median()" | Median (middle value) of data. | +-------------------------+-----------------------------------------------------------------+ | "median_low()" | Low median of data. | +-------------------------+-----------------------------------------------------------------+ | "median_high()" | High median of data. | +-------------------------+-----------------------------------------------------------------+ | "median_grouped()" | Median, or 50th percentile, of grouped data. | +-------------------------+-----------------------------------------------------------------+ | "mode()" | Single mode (most common value) of discrete or nominal data. | +-------------------------+-----------------------------------------------------------------+ | "multimode()" | List of modes (most common values) of discrete or nomimal data. | +-------------------------+-----------------------------------------------------------------+ | "quantiles()" | Divide data into intervals with equal probability. | +-------------------------+-----------------------------------------------------------------+ Measures of spread ================== These functions calculate a measure of how much the population or sample tends to deviate from the typical or average values. +-------------------------+-----------------------------------------------+ | "pstdev()" | Population standard deviation of data. | +-------------------------+-----------------------------------------------+ | "pvariance()" | Population variance of data. | +-------------------------+-----------------------------------------------+ | "stdev()" | Sample standard deviation of data. | +-------------------------+-----------------------------------------------+ | "variance()" | Sample variance of data. | +-------------------------+-----------------------------------------------+ Function details ================ Note: The functions do not require the data given to them to be sorted. However, for reading convenience, most of the examples show sorted sequences. statistics.mean(data) Return the sample arithmetic mean of *data* which can be a sequence or iterable. The arithmetic mean is the sum of the data divided by the number of data points. It is commonly called “the average”, although it is only one of many different mathematical averages. It is a measure of the central location of the data. If *data* is empty, "StatisticsError" will be raised. Some examples of use: >>> mean([1, 2, 3, 4, 4]) 2.8 >>> mean([-1.0, 2.5, 3.25, 5.75]) 2.625 >>> from fractions import Fraction as F >>> mean([F(3, 7), F(1, 21), F(5, 3), F(1, 3)]) Fraction(13, 21) >>> from decimal import Decimal as D >>> mean([D("0.5"), D("0.75"), D("0.625"), D("0.375")]) Decimal('0.5625') Note: The mean is strongly affected by outliers and is not necessarily a typical example of the data points. For a more robust, although less efficient, measure of central tendency, see "median()".The sample mean gives an unbiased estimate of the true population mean, so that when taken on average over all the possible samples, "mean(sample)" converges on the true mean of the entire population. If *data* represents the entire population rather than a sample, then "mean(data)" is equivalent to calculating the true population mean μ. statistics.fmean(data) Convert *data* to floats and compute the arithmetic mean. This runs faster than the "mean()" function and it always returns a "float". The *data* may be a sequence or iterable. If the input dataset is empty, raises a "StatisticsError". >>> fmean([3.5, 4.0, 5.25]) 4.25 New in version 3.8. statistics.geometric_mean(data) Convert *data* to floats and compute the geometric mean. The geometric mean indicates the central tendency or typical value of the *data* using the product of the values (as opposed to the arithmetic mean which uses their sum). Raises a "StatisticsError" if the input dataset is empty, if it contains a zero, or if it contains a negative value. The *data* may be a sequence or iterable. No special efforts are made to achieve exact results. (However, this may change in the future.) >>> round(geometric_mean([54, 24, 36]), 1) 36.0 New in version 3.8. statistics.harmonic_mean(data) Return the harmonic mean of *data*, a sequence or iterable of real- valued numbers. The harmonic mean, sometimes called the subcontrary mean, is the reciprocal of the arithmetic "mean()" of the reciprocals of the data. For example, the harmonic mean of three values *a*, *b* and *c* will be equivalent to "3/(1/a + 1/b + 1/c)". If one of the values is zero, the result will be zero. The harmonic mean is a type of average, a measure of the central location of the data. It is often appropriate when averaging rates or ratios, for example speeds. Suppose a car travels 10 km at 40 km/hr, then another 10 km at 60 km/hr. What is the average speed? >>> harmonic_mean([40, 60]) 48.0 Suppose an investor purchases an equal value of shares in each of three companies, with P/E (price/earning) ratios of 2.5, 3 and 10. What is the average P/E ratio for the investor’s portfolio? >>> harmonic_mean([2.5, 3, 10]) # For an equal investment portfolio. 3.6 "StatisticsError" is raised if *data* is empty, or any element is less than zero. The current algorithm has an early-out when it encounters a zero in the input. This means that the subsequent inputs are not tested for validity. (This behavior may change in the future.) New in version 3.6. statistics.median(data) Return the median (middle value) of numeric data, using the common “mean of middle two” method. If *data* is empty, "StatisticsError" is raised. *data* can be a sequence or iterable. The median is a robust measure of central location and is less affected by the presence of outliers. When the number of data points is odd, the middle data point is returned: >>> median([1, 3, 5]) 3 When the number of data points is even, the median is interpolated by taking the average of the two middle values: >>> median([1, 3, 5, 7]) 4.0 This is suited for when your data is discrete, and you don’t mind that the median may not be an actual data point. If the data is ordinal (supports order operations) but not numeric (doesn’t support addition), consider using "median_low()" or "median_high()" instead. statistics.median_low(data) Return the low median of numeric data. If *data* is empty, "StatisticsError" is raised. *data* can be a sequence or iterable. The low median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the smaller of the two middle values is returned. >>> median_low([1, 3, 5]) 3 >>> median_low([1, 3, 5, 7]) 3 Use the low median when your data are discrete and you prefer the median to be an actual data point rather than interpolated. statistics.median_high(data) Return the high median of data. If *data* is empty, "StatisticsError" is raised. *data* can be a sequence or iterable. The high median is always a member of the data set. When the number of data points is odd, the middle value is returned. When it is even, the larger of the two middle values is returned. >>> median_high([1, 3, 5]) 3 >>> median_high([1, 3, 5, 7]) 5 Use the high median when your data are discrete and you prefer the median to be an actual data point rather than interpolated. statistics.median_grouped(data, interval=1) Return the median of grouped continuous data, calculated as the 50th percentile, using interpolation. If *data* is empty, "StatisticsError" is raised. *data* can be a sequence or iterable. >>> median_grouped([52, 52, 53, 54]) 52.5 In the following example, the data are rounded, so that each value represents the midpoint of data classes, e.g. 1 is the midpoint of the class 0.5–1.5, 2 is the midpoint of 1.5–2.5, 3 is the midpoint of 2.5–3.5, etc. With the data given, the middle value falls somewhere in the class 3.5–4.5, and interpolation is used to estimate it: >>> median_grouped([1, 2, 2, 3, 4, 4, 4, 4, 4, 5]) 3.7 Optional argument *interval* represents the class interval, and defaults to 1. Changing the class interval naturally will change the interpolation: >>> median_grouped([1, 3, 3, 5, 7], interval=1) 3.25 >>> median_grouped([1, 3, 3, 5, 7], interval=2) 3.5 This function does not check whether the data points are at least *interval* apart. **CPython implementation detail:** Under some circumstances, "median_grouped()" may coerce data points to floats. This behaviour is likely to change in the future. See also: * “Statistics for the Behavioral Sciences”, Frederick J Gravetter and Larry B Wallnau (8th Edition). * The SSMEDIAN function in the Gnome Gnumeric spreadsheet, including this discussion. statistics.mode(data) Return the single most common data point from discrete or nominal *data*. The mode (when it exists) is the most typical value and serves as a measure of central location. If there are multiple modes with the same frequency, returns the first one encountered in the *data*. If the smallest or largest of those is desired instead, use "min(multimode(data))" or "max(multimode(data))". If the input *data* is empty, "StatisticsError" is raised. "mode" assumes discrete data and returns a single value. This is the standard treatment of the mode as commonly taught in schools: >>> mode([1, 1, 2, 3, 3, 3, 3, 4]) 3 The mode is unique in that it is the only statistic in this package that also applies to nominal (non-numeric) data: >>> mode(["red", "blue", "blue", "red", "green", "red", "red"]) 'red' Changed in version 3.8: Now handles multimodal datasets by returning the first mode encountered. Formerly, it raised "StatisticsError" when more than one mode was found. statistics.multimode(data) Return a list of the most frequently occurring values in the order they were first encountered in the *data*. Will return more than one result if there are multiple modes or an empty list if the *data* is empty: >>> multimode('aabbbbccddddeeffffgg') ['b', 'd', 'f'] >>> multimode('') [] New in version 3.8. statistics.pstdev(data, mu=None) Return the population standard deviation (the square root of the population variance). See "pvariance()" for arguments and other details. >>> pstdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]) 0.986893273527251 statistics.pvariance(data, mu=None) Return the population variance of *data*, a non-empty sequence or iterable of real-valued numbers. Variance, or second moment about the mean, is a measure of the variability (spread or dispersion) of data. A large variance indicates that the data is spread out; a small variance indicates it is clustered closely around the mean. If the optional second argument *mu* is given, it is typically the mean of the *data*. It can also be used to compute the second moment around a point that is not the mean. If it is missing or "None" (the default), the arithmetic mean is automatically calculated. Use this function to calculate the variance from the entire population. To estimate the variance from a sample, the "variance()" function is usually a better choice. Raises "StatisticsError" if *data* is empty. Examples: >>> data = [0.0, 0.25, 0.25, 1.25, 1.5, 1.75, 2.75, 3.25] >>> pvariance(data) 1.25 If you have already calculated the mean of your data, you can pass it as the optional second argument *mu* to avoid recalculation: >>> mu = mean(data) >>> pvariance(data, mu) 1.25 Decimals and Fractions are supported: >>> from decimal import Decimal as D >>> pvariance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")]) Decimal('24.815') >>> from fractions import Fraction as F >>> pvariance([F(1, 4), F(5, 4), F(1, 2)]) Fraction(13, 72) Note: When called with the entire population, this gives the population variance σ². When called on a sample instead, this is the biased sample variance s², also known as variance with N degrees of freedom.If you somehow know the true population mean μ, you may use this function to calculate the variance of a sample, giving the known population mean as the second argument. Provided the data points are a random sample of the population, the result will be an unbiased estimate of the population variance. statistics.stdev(data, xbar=None) Return the sample standard deviation (the square root of the sample variance). See "variance()" for arguments and other details. >>> stdev([1.5, 2.5, 2.5, 2.75, 3.25, 4.75]) 1.0810874155219827 statistics.variance(data, xbar=None) Return the sample variance of *data*, an iterable of at least two real-valued numbers. Variance, or second moment about the mean, is a measure of the variability (spread or dispersion) of data. A large variance indicates that the data is spread out; a small variance indicates it is clustered closely around the mean. If the optional second argument *xbar* is given, it should be the mean of *data*. If it is missing or "None" (the default), the mean is automatically calculated. Use this function when your data is a sample from a population. To calculate the variance from the entire population, see "pvariance()". Raises "StatisticsError" if *data* has fewer than two values. Examples: >>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5] >>> variance(data) 1.3720238095238095 If you have already calculated the mean of your data, you can pass it as the optional second argument *xbar* to avoid recalculation: >>> m = mean(data) >>> variance(data, m) 1.3720238095238095 This function does not attempt to verify that you have passed the actual mean as *xbar*. Using arbitrary values for *xbar* can lead to invalid or impossible results. Decimal and Fraction values are supported: >>> from decimal import Decimal as D >>> variance([D("27.5"), D("30.25"), D("30.25"), D("34.5"), D("41.75")]) Decimal('31.01875') >>> from fractions import Fraction as F >>> variance([F(1, 6), F(1, 2), F(5, 3)]) Fraction(67, 108) Note: This is the sample variance s² with Bessel’s correction, also known as variance with N-1 degrees of freedom. Provided that the data points are representative (e.g. independent and identically distributed), the result should be an unbiased estimate of the true population variance.If you somehow know the actual population mean μ you should pass it to the "pvariance()" function as the *mu* parameter to get the variance of a sample. statistics.quantiles(data, *, n=4, method='exclusive') Divide *data* into *n* continuous intervals with equal probability. Returns a list of "n - 1" cut points separating the intervals. Set *n* to 4 for quartiles (the default). Set *n* to 10 for deciles. Set *n* to 100 for percentiles which gives the 99 cuts points that separate *data* into 100 equal sized groups. Raises "StatisticsError" if *n* is not least 1. The *data* can be any iterable containing sample data. For meaningful results, the number of data points in *data* should be larger than *n*. Raises "StatisticsError" if there are not at least two data points. The cut points are linearly interpolated from the two nearest data points. For example, if a cut point falls one-third of the distance between two sample values, "100" and "112", the cut-point will evaluate to "104". The *method* for computing quantiles can be varied depending on whether the *data* includes or excludes the lowest and highest possible values from the population. The default *method* is “exclusive” and is used for data sampled from a population that can have more extreme values than found in the samples. The portion of the population falling below the *i-th* of *m* sorted data points is computed as "i / (m + 1)". Given nine sample values, the method sorts them and assigns the following percentiles: 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%. Setting the *method* to “inclusive” is used for describing population data or for samples that are known to include the most extreme values from the population. The minimum value in *data* is treated as the 0th percentile and the maximum value is treated as the 100th percentile. The portion of the population falling below the *i-th* of *m* sorted data points is computed as "(i - 1) / (m - 1)". Given 11 sample values, the method sorts them and assigns the following percentiles: 0%, 10%, 20%, 30%, 40%, 50%, 60%, 70%, 80%, 90%, 100%. # Decile cut points for empirically sampled data >>> data = [105, 129, 87, 86, 111, 111, 89, 81, 108, 92, 110, ... 100, 75, 105, 103, 109, 76, 119, 99, 91, 103, 129, ... 106, 101, 84, 111, 74, 87, 86, 103, 103, 106, 86, ... 111, 75, 87, 102, 121, 111, 88, 89, 101, 106, 95, ... 103, 107, 101, 81, 109, 104] >>> [round(q, 1) for q in quantiles(data, n=10)] [81.0, 86.2, 89.0, 99.4, 102.5, 103.6, 106.0, 109.8, 111.0] New in version 3.8. Exceptions ========== A single exception is defined: exception statistics.StatisticsError Subclass of "ValueError" for statistics-related exceptions. "NormalDist" objects ==================== "NormalDist" is a tool for creating and manipulating normal distributions of a random variable. It is a class that treats the mean and standard deviation of data measurements as a single entity. Normal distributions arise from the Central Limit Theorem and have a wide range of applications in statistics. class statistics.NormalDist(mu=0.0, sigma=1.0) Returns a new *NormalDist* object where *mu* represents the arithmetic mean and *sigma* represents the standard deviation. If *sigma* is negative, raises "StatisticsError". mean A read-only property for the arithmetic mean of a normal distribution. median A read-only property for the median of a normal distribution. mode A read-only property for the mode of a normal distribution. stdev A read-only property for the standard deviation of a normal distribution. variance A read-only property for the variance of a normal distribution. Equal to the square of the standard deviation. classmethod from_samples(data) Makes a normal distribution instance with *mu* and *sigma* parameters estimated from the *data* using "fmean()" and "stdev()". The *data* can be any *iterable* and should consist of values that can be converted to type "float". If *data* does not contain at least two elements, raises "StatisticsError" because it takes at least one point to estimate a central value and at least two points to estimate dispersion. samples(n, *, seed=None) Generates *n* random samples for a given mean and standard deviation. Returns a "list" of "float" values. If *seed* is given, creates a new instance of the underlying random number generator. This is useful for creating reproducible results, even in a multi-threading context. pdf(x) Using a probability density function (pdf), compute the relative likelihood that a random variable *X* will be near the given value *x*. Mathematically, it is the limit of the ratio "P(x <= X < x+dx) / dx" as *dx* approaches zero. The relative likelihood is computed as the probability of a sample occurring in a narrow range divided by the width of the range (hence the word “density”). Since the likelihood is relative to other points, its value can be greater than *1.0*. cdf(x) Using a cumulative distribution function (cdf), compute the probability that a random variable *X* will be less than or equal to *x*. Mathematically, it is written "P(X <= x)". inv_cdf(p) Compute the inverse cumulative distribution function, also known as the quantile function or the percent-point function. Mathematically, it is written "x : P(X <= x) = p". Finds the value *x* of the random variable *X* such that the probability of the variable being less than or equal to that value equals the given probability *p*. overlap(other) Measures the agreement between two normal probability distributions. Returns a value between 0.0 and 1.0 giving the overlapping area for the two probability density functions. quantiles(n=4) Divide the normal distribution into *n* continuous intervals with equal probability. Returns a list of (n - 1) cut points separating the intervals. Set *n* to 4 for quartiles (the default). Set *n* to 10 for deciles. Set *n* to 100 for percentiles which gives the 99 cuts points that separate the normal distribution into 100 equal sized groups. zscore(x) Compute the Standard Score describing *x* in terms of the number of standard deviations above or below the mean of the normal distribution: "(x - mean) / stdev". New in version 3.9. Instances of "NormalDist" support addition, subtraction, multiplication and division by a constant. These operations are used for translation and scaling. For example: >>> temperature_february = NormalDist(5, 2.5) # Celsius >>> temperature_february * (9/5) + 32 # Fahrenheit NormalDist(mu=41.0, sigma=4.5) Dividing a constant by an instance of "NormalDist" is not supported because the result wouldn’t be normally distributed. Since normal distributions arise from additive effects of independent variables, it is possible to add and subtract two independent normally distributed random variables represented as instances of "NormalDist". For example: >>> birth_weights = NormalDist.from_samples([2.5, 3.1, 2.1, 2.4, 2.7, 3.5]) >>> drug_effects = NormalDist(0.4, 0.15) >>> combined = birth_weights + drug_effects >>> round(combined.mean, 1) 3.1 >>> round(combined.stdev, 1) 0.5 New in version 3.8. "NormalDist" Examples and Recipes --------------------------------- "NormalDist" readily solves classic probability problems. For example, given historical data for SAT exams showing that scores are normally distributed with a mean of 1060 and a standard deviation of 195, determine the percentage of students with test scores between 1100 and 1200, after rounding to the nearest whole number: >>> sat = NormalDist(1060, 195) >>> fraction = sat.cdf(1200 + 0.5) - sat.cdf(1100 - 0.5) >>> round(fraction * 100.0, 1) 18.4 Find the quartiles and deciles for the SAT scores: >>> list(map(round, sat.quantiles())) [928, 1060, 1192] >>> list(map(round, sat.quantiles(n=10))) [810, 896, 958, 1011, 1060, 1109, 1162, 1224, 1310] To estimate the distribution for a model than isn’t easy to solve analytically, "NormalDist" can generate input samples for a Monte Carlo simulation: >>> def model(x, y, z): ... return (3*x + 7*x*y - 5*y) / (11 * z) ... >>> n = 100_000 >>> X = NormalDist(10, 2.5).samples(n, seed=3652260728) >>> Y = NormalDist(15, 1.75).samples(n, seed=4582495471) >>> Z = NormalDist(50, 1.25).samples(n, seed=6582483453) >>> quantiles(map(model, X, Y, Z)) [1.4591308524824727, 1.8035946855390597, 2.175091447274739] Normal distributions can be used to approximate Binomial distributions when the sample size is large and when the probability of a successful trial is near 50%. For example, an open source conference has 750 attendees and two rooms with a 500 person capacity. There is a talk about Python and another about Ruby. In previous conferences, 65% of the attendees preferred to listen to Python talks. Assuming the population preferences haven’t changed, what is the probability that the Python room will stay within its capacity limits? >>> n = 750 # Sample size >>> p = 0.65 # Preference for Python >>> q = 1.0 - p # Preference for Ruby >>> k = 500 # Room capacity >>> # Approximation using the cumulative normal distribution >>> from math import sqrt >>> round(NormalDist(mu=n*p, sigma=sqrt(n*p*q)).cdf(k + 0.5), 4) 0.8402 >>> # Solution using the cumulative binomial distribution >>> from math import comb, fsum >>> round(fsum(comb(n, r) * p**r * q**(n-r) for r in range(k+1)), 4) 0.8402 >>> # Approximation using a simulation >>> from random import seed, choices >>> seed(8675309) >>> def trial(): ... return choices(('Python', 'Ruby'), (p, q), k=n).count('Python') >>> mean(trial() <= k for i in range(10_000)) 0.8398 Normal distributions commonly arise in machine learning problems. Wikipedia has a nice example of a Naive Bayesian Classifier. The challenge is to predict a person’s gender from measurements of normally distributed features including height, weight, and foot size. We’re given a training dataset with measurements for eight people. The measurements are assumed to be normally distributed, so we summarize the data with "NormalDist": >>> height_male = NormalDist.from_samples([6, 5.92, 5.58, 5.92]) >>> height_female = NormalDist.from_samples([5, 5.5, 5.42, 5.75]) >>> weight_male = NormalDist.from_samples([180, 190, 170, 165]) >>> weight_female = NormalDist.from_samples([100, 150, 130, 150]) >>> foot_size_male = NormalDist.from_samples([12, 11, 12, 10]) >>> foot_size_female = NormalDist.from_samples([6, 8, 7, 9]) Next, we encounter a new person whose feature measurements are known but whose gender is unknown: >>> ht = 6.0 # height >>> wt = 130 # weight >>> fs = 8 # foot size Starting with a 50% prior probability of being male or female, we compute the posterior as the prior times the product of likelihoods for the feature measurements given the gender: >>> prior_male = 0.5 >>> prior_female = 0.5 >>> posterior_male = (prior_male * height_male.pdf(ht) * ... weight_male.pdf(wt) * foot_size_male.pdf(fs)) >>> posterior_female = (prior_female * height_female.pdf(ht) * ... weight_female.pdf(wt) * foot_size_female.pdf(fs)) The final prediction goes to the largest posterior. This is known as the maximum a posteriori or MAP: >>> 'male' if posterior_male > posterior_female else 'female' 'female' Built-in Types ************** The following sections describe the standard types that are built into the interpreter. The principal built-in types are numerics, sequences, mappings, classes, instances and exceptions. Some collection classes are mutable. The methods that add, subtract, or rearrange their members in place, and don’t return a specific item, never return the collection instance itself but "None". Some operations are supported by several object types; in particular, practically all objects can be compared for equality, tested for truth value, and converted to a string (with the "repr()" function or the slightly different "str()" function). The latter function is implicitly used when an object is written by the "print()" function. Truth Value Testing =================== Any object can be tested for truth value, for use in an "if" or "while" condition or as operand of the Boolean operations below. By default, an object is considered true unless its class defines either a "__bool__()" method that returns "False" or a "__len__()" method that returns zero, when called with the object. [1] Here are most of the built-in objects considered false: * constants defined to be false: "None" and "False". * zero of any numeric type: "0", "0.0", "0j", "Decimal(0)", "Fraction(0, 1)" * empty sequences and collections: "''", "()", "[]", "{}", "set()", "range(0)" Operations and built-in functions that have a Boolean result always return "0" or "False" for false and "1" or "True" for true, unless otherwise stated. (Important exception: the Boolean operations "or" and "and" always return one of their operands.) Boolean Operations — "and", "or", "not" ======================================= These are the Boolean operations, ordered by ascending priority: +---------------+-----------------------------------+---------+ | Operation | Result | Notes | |===============|===================================|=========| | "x or y" | if *x* is false, then *y*, else | (1) | | | *x* | | +---------------+-----------------------------------+---------+ | "x and y" | if *x* is false, then *x*, else | (2) | | | *y* | | +---------------+-----------------------------------+---------+ | "not x" | if *x* is false, then "True", | (3) | | | else "False" | | +---------------+-----------------------------------+---------+ Notes: 1. This is a short-circuit operator, so it only evaluates the second argument if the first one is false. 2. This is a short-circuit operator, so it only evaluates the second argument if the first one is true. 3. "not" has a lower priority than non-Boolean operators, so "not a == b" is interpreted as "not (a == b)", and "a == not b" is a syntax error. Comparisons =========== There are eight comparison operations in Python. They all have the same priority (which is higher than that of the Boolean operations). Comparisons can be chained arbitrarily; for example, "x < y <= z" is equivalent to "x < y and y <= z", except that *y* is evaluated only once (but in both cases *z* is not evaluated at all when "x < y" is found to be false). This table summarizes the comparison operations: +--------------+---------------------------+ | Operation | Meaning | |==============|===========================| | "<" | strictly less than | +--------------+---------------------------+ | "<=" | less than or equal | +--------------+---------------------------+ | ">" | strictly greater than | +--------------+---------------------------+ | ">=" | greater than or equal | +--------------+---------------------------+ | "==" | equal | +--------------+---------------------------+ | "!=" | not equal | +--------------+---------------------------+ | "is" | object identity | +--------------+---------------------------+ | "is not" | negated object identity | +--------------+---------------------------+ Objects of different types, except different numeric types, never compare equal. The "==" operator is always defined but for some object types (for example, class objects) is equivalent to "is". The "<", "<=", ">" and ">=" operators are only defined where they make sense; for example, they raise a "TypeError" exception when one of the arguments is a complex number. Non-identical instances of a class normally compare as non-equal unless the class defines the "__eq__()" method. Instances of a class cannot be ordered with respect to other instances of the same class, or other types of object, unless the class defines enough of the methods "__lt__()", "__le__()", "__gt__()", and "__ge__()" (in general, "__lt__()" and "__eq__()" are sufficient, if you want the conventional meanings of the comparison operators). The behavior of the "is" and "is not" operators cannot be customized; also they can be applied to any two objects and never raise an exception. Two more operations with the same syntactic priority, "in" and "not in", are supported by types that are *iterable* or implement the "__contains__()" method. Numeric Types — "int", "float", "complex" ========================================= There are three distinct numeric types: *integers*, *floating point numbers*, and *complex numbers*. In addition, Booleans are a subtype of integers. Integers have unlimited precision. Floating point numbers are usually implemented using "double" in C; information about the precision and internal representation of floating point numbers for the machine on which your program is running is available in "sys.float_info". Complex numbers have a real and imaginary part, which are each a floating point number. To extract these parts from a complex number *z*, use "z.real" and "z.imag". (The standard library includes the additional numeric types "fractions.Fraction", for rationals, and "decimal.Decimal", for floating-point numbers with user-definable precision.) Numbers are created by numeric literals or as the result of built-in functions and operators. Unadorned integer literals (including hex, octal and binary numbers) yield integers. Numeric literals containing a decimal point or an exponent sign yield floating point numbers. Appending "'j'" or "'J'" to a numeric literal yields an imaginary number (a complex number with a zero real part) which you can add to an integer or float to get a complex number with real and imaginary parts. Python fully supports mixed arithmetic: when a binary arithmetic operator has operands of different numeric types, the operand with the “narrower” type is widened to that of the other, where integer is narrower than floating point, which is narrower than complex. A comparison between numbers of different types behaves as though the exact values of those numbers were being compared. [2] The constructors "int()", "float()", and "complex()" can be used to produce numbers of a specific type. All numeric types (except complex) support the following operations (for priorities of the operations, see Operator precedence): +-----------------------+-----------------------------------+-----------+----------------------+ | Operation | Result | Notes | Full documentation | |=======================|===================================|===========|======================| | "x + y" | sum of *x* and *y* | | | +-----------------------+-----------------------------------+-----------+----------------------+ | "x - y" | difference of *x* and *y* | | | +-----------------------+-----------------------------------+-----------+----------------------+ | "x * y" | product of *x* and *y* | | | +-----------------------+-----------------------------------+-----------+----------------------+ | "x / y" | quotient of *x* and *y* | | | +-----------------------+-----------------------------------+-----------+----------------------+ | "x // y" | floored quotient of *x* and *y* | (1) | | +-----------------------+-----------------------------------+-----------+----------------------+ | "x % y" | remainder of "x / y" | (2) | | +-----------------------+-----------------------------------+-----------+----------------------+ | "-x" | *x* negated | | | +-----------------------+-----------------------------------+-----------+----------------------+ | "+x" | *x* unchanged | | | +-----------------------+-----------------------------------+-----------+----------------------+ | "abs(x)" | absolute value or magnitude of | | "abs()" | | | *x* | | | +-----------------------+-----------------------------------+-----------+----------------------+ | "int(x)" | *x* converted to integer | (3)(6) | "int()" | +-----------------------+-----------------------------------+-----------+----------------------+ | "float(x)" | *x* converted to floating point | (4)(6) | "float()" | +-----------------------+-----------------------------------+-----------+----------------------+ | "complex(re, im)" | a complex number with real part | (6) | "complex()" | | | *re*, imaginary part *im*. *im* | | | | | defaults to zero. | | | +-----------------------+-----------------------------------+-----------+----------------------+ | "c.conjugate()" | conjugate of the complex number | | | | | *c* | | | +-----------------------+-----------------------------------+-----------+----------------------+ | "divmod(x, y)" | the pair "(x // y, x % y)" | (2) | "divmod()" | +-----------------------+-----------------------------------+-----------+----------------------+ | "pow(x, y)" | *x* to the power *y* | (5) | "pow()" | +-----------------------+-----------------------------------+-----------+----------------------+ | "x ** y" | *x* to the power *y* | (5) | | +-----------------------+-----------------------------------+-----------+----------------------+ Notes: 1. Also referred to as integer division. The resultant value is a whole integer, though the result’s type is not necessarily int. The result is always rounded towards minus infinity: "1//2" is "0", "(-1)//2" is "-1", "1//(-2)" is "-1", and "(-1)//(-2)" is "0". 2. Not for complex numbers. Instead convert to floats using "abs()" if appropriate. 3. Conversion from floating point to integer may round or truncate as in C; see functions "math.floor()" and "math.ceil()" for well- defined conversions. 4. float also accepts the strings “nan” and “inf” with an optional prefix “+” or “-” for Not a Number (NaN) and positive or negative infinity. 5. Python defines "pow(0, 0)" and "0 ** 0" to be "1", as is common for programming languages. 6. The numeric literals accepted include the digits "0" to "9" or any Unicode equivalent (code points with the "Nd" property). See https://www.unicode.org/Public/13.0.0/ucd/extracted/DerivedNum ericType.txt for a complete list of code points with the "Nd" property. All "numbers.Real" types ("int" and "float") also include the following operations: +----------------------+-----------------------------------------------+ | Operation | Result | |======================|===============================================| | "math.trunc(x)" | *x* truncated to "Integral" | +----------------------+-----------------------------------------------+ | "round(x[, n])" | *x* rounded to *n* digits, rounding half to | | | even. If *n* is omitted, it defaults to 0. | +----------------------+-----------------------------------------------+ | "math.floor(x)" | the greatest "Integral" <= *x* | +----------------------+-----------------------------------------------+ | "math.ceil(x)" | the least "Integral" >= *x* | +----------------------+-----------------------------------------------+ For additional numeric operations see the "math" and "cmath" modules. Bitwise Operations on Integer Types ----------------------------------- Bitwise operations only make sense for integers. The result of bitwise operations is calculated as though carried out in two’s complement with an infinite number of sign bits. The priorities of the binary bitwise operations are all lower than the numeric operations and higher than the comparisons; the unary operation "~" has the same priority as the other unary numeric operations ("+" and "-"). This table lists the bitwise operations sorted in ascending priority: +--------------+----------------------------------+------------+ | Operation | Result | Notes | |==============|==================================|============| | "x | y" | bitwise *or* of *x* and *y* | (4) | +--------------+----------------------------------+------------+ | "x ^ y" | bitwise *exclusive or* of *x* | (4) | | | and *y* | | +--------------+----------------------------------+------------+ | "x & y" | bitwise *and* of *x* and *y* | (4) | +--------------+----------------------------------+------------+ | "x << n" | *x* shifted left by *n* bits | (1)(2) | +--------------+----------------------------------+------------+ | "x >> n" | *x* shifted right by *n* bits | (1)(3) | +--------------+----------------------------------+------------+ | "~x" | the bits of *x* inverted | | +--------------+----------------------------------+------------+ Notes: 1. Negative shift counts are illegal and cause a "ValueError" to be raised. 2. A left shift by *n* bits is equivalent to multiplication by "pow(2, n)". 3. A right shift by *n* bits is equivalent to floor division by "pow(2, n)". 4. Performing these calculations with at least one extra sign extension bit in a finite two’s complement representation (a working bit-width of "1 + max(x.bit_length(), y.bit_length())" or more) is sufficient to get the same result as if there were an infinite number of sign bits. Additional Methods on Integer Types ----------------------------------- The int type implements the "numbers.Integral" *abstract base class*. In addition, it provides a few more methods: int.bit_length() Return the number of bits necessary to represent an integer in binary, excluding the sign and leading zeros: >>> n = -37 >>> bin(n) '-0b100101' >>> n.bit_length() 6 More precisely, if "x" is nonzero, then "x.bit_length()" is the unique positive integer "k" such that "2**(k-1) <= abs(x) < 2**k". Equivalently, when "abs(x)" is small enough to have a correctly rounded logarithm, then "k = 1 + int(log(abs(x), 2))". If "x" is zero, then "x.bit_length()" returns "0". Equivalent to: def bit_length(self): s = bin(self) # binary representation: bin(-37) --> '-0b100101' s = s.lstrip('-0b') # remove leading zeros and minus sign return len(s) # len('100101') --> 6 New in version 3.1. int.to_bytes(length, byteorder, *, signed=False) Return an array of bytes representing an integer. >>> (1024).to_bytes(2, byteorder='big') b'\x04\x00' >>> (1024).to_bytes(10, byteorder='big') b'\x00\x00\x00\x00\x00\x00\x00\x00\x04\x00' >>> (-1024).to_bytes(10, byteorder='big', signed=True) b'\xff\xff\xff\xff\xff\xff\xff\xff\xfc\x00' >>> x = 1000 >>> x.to_bytes((x.bit_length() + 7) // 8, byteorder='little') b'\xe8\x03' The integer is represented using *length* bytes. An "OverflowError" is raised if the integer is not representable with the given number of bytes. The *byteorder* argument determines the byte order used to represent the integer. If *byteorder* is ""big"", the most significant byte is at the beginning of the byte array. If *byteorder* is ""little"", the most significant byte is at the end of the byte array. To request the native byte order of the host system, use "sys.byteorder" as the byte order value. The *signed* argument determines whether two’s complement is used to represent the integer. If *signed* is "False" and a negative integer is given, an "OverflowError" is raised. The default value for *signed* is "False". New in version 3.2. classmethod int.from_bytes(bytes, byteorder, *, signed=False) Return the integer represented by the given array of bytes. >>> int.from_bytes(b'\x00\x10', byteorder='big') 16 >>> int.from_bytes(b'\x00\x10', byteorder='little') 4096 >>> int.from_bytes(b'\xfc\x00', byteorder='big', signed=True) -1024 >>> int.from_bytes(b'\xfc\x00', byteorder='big', signed=False) 64512 >>> int.from_bytes([255, 0, 0], byteorder='big') 16711680 The argument *bytes* must either be a *bytes-like object* or an iterable producing bytes. The *byteorder* argument determines the byte order used to represent the integer. If *byteorder* is ""big"", the most significant byte is at the beginning of the byte array. If *byteorder* is ""little"", the most significant byte is at the end of the byte array. To request the native byte order of the host system, use "sys.byteorder" as the byte order value. The *signed* argument indicates whether two’s complement is used to represent the integer. New in version 3.2. int.as_integer_ratio() Return a pair of integers whose ratio is exactly equal to the original integer and with a positive denominator. The integer ratio of integers (whole numbers) is always the integer as the numerator and "1" as the denominator. New in version 3.8. Additional Methods on Float --------------------------- The float type implements the "numbers.Real" *abstract base class*. float also has the following additional methods. float.as_integer_ratio() Return a pair of integers whose ratio is exactly equal to the original float and with a positive denominator. Raises "OverflowError" on infinities and a "ValueError" on NaNs. float.is_integer() Return "True" if the float instance is finite with integral value, and "False" otherwise: >>> (-2.0).is_integer() True >>> (3.2).is_integer() False Two methods support conversion to and from hexadecimal strings. Since Python’s floats are stored internally as binary numbers, converting a float to or from a *decimal* string usually involves a small rounding error. In contrast, hexadecimal strings allow exact representation and specification of floating-point numbers. This can be useful when debugging, and in numerical work. float.hex() Return a representation of a floating-point number as a hexadecimal string. For finite floating-point numbers, this representation will always include a leading "0x" and a trailing "p" and exponent. classmethod float.fromhex(s) Class method to return the float represented by a hexadecimal string *s*. The string *s* may have leading and trailing whitespace. Note that "float.hex()" is an instance method, while "float.fromhex()" is a class method. A hexadecimal string takes the form: [sign] ['0x'] integer ['.' fraction] ['p' exponent] where the optional "sign" may by either "+" or "-", "integer" and "fraction" are strings of hexadecimal digits, and "exponent" is a decimal integer with an optional leading sign. Case is not significant, and there must be at least one hexadecimal digit in either the integer or the fraction. This syntax is similar to the syntax specified in section 6.4.4.2 of the C99 standard, and also to the syntax used in Java 1.5 onwards. In particular, the output of "float.hex()" is usable as a hexadecimal floating-point literal in C or Java code, and hexadecimal strings produced by C’s "%a" format character or Java’s "Double.toHexString" are accepted by "float.fromhex()". Note that the exponent is written in decimal rather than hexadecimal, and that it gives the power of 2 by which to multiply the coefficient. For example, the hexadecimal string "0x3.a7p10" represents the floating-point number "(3 + 10./16 + 7./16**2) * 2.0**10", or "3740.0": >>> float.fromhex('0x3.a7p10') 3740.0 Applying the reverse conversion to "3740.0" gives a different hexadecimal string representing the same number: >>> float.hex(3740.0) '0x1.d380000000000p+11' Hashing of numeric types ------------------------ For numbers "x" and "y", possibly of different types, it’s a requirement that "hash(x) == hash(y)" whenever "x == y" (see the "__hash__()" method documentation for more details). For ease of implementation and efficiency across a variety of numeric types (including "int", "float", "decimal.Decimal" and "fractions.Fraction") Python’s hash for numeric types is based on a single mathematical function that’s defined for any rational number, and hence applies to all instances of "int" and "fractions.Fraction", and all finite instances of "float" and "decimal.Decimal". Essentially, this function is given by reduction modulo "P" for a fixed prime "P". The value of "P" is made available to Python as the "modulus" attribute of "sys.hash_info". **CPython implementation detail:** Currently, the prime used is "P = 2**31 - 1" on machines with 32-bit C longs and "P = 2**61 - 1" on machines with 64-bit C longs. Here are the rules in detail: * If "x = m / n" is a nonnegative rational number and "n" is not divisible by "P", define "hash(x)" as "m * invmod(n, P) % P", where "invmod(n, P)" gives the inverse of "n" modulo "P". * If "x = m / n" is a nonnegative rational number and "n" is divisible by "P" (but "m" is not) then "n" has no inverse modulo "P" and the rule above doesn’t apply; in this case define "hash(x)" to be the constant value "sys.hash_info.inf". * If "x = m / n" is a negative rational number define "hash(x)" as "-hash(-x)". If the resulting hash is "-1", replace it with "-2". * The particular values "sys.hash_info.inf", "-sys.hash_info.inf" and "sys.hash_info.nan" are used as hash values for positive infinity, negative infinity, or nans (respectively). (All hashable nans have the same hash value.) * For a "complex" number "z", the hash values of the real and imaginary parts are combined by computing "hash(z.real) + sys.hash_info.imag * hash(z.imag)", reduced modulo "2**sys.hash_info.width" so that it lies in "range(-2**(sys.hash_info.width - 1), 2**(sys.hash_info.width - 1))". Again, if the result is "-1", it’s replaced with "-2". To clarify the above rules, here’s some example Python code, equivalent to the built-in hash, for computing the hash of a rational number, "float", or "complex": import sys, math def hash_fraction(m, n): """Compute the hash of a rational number m / n. Assumes m and n are integers, with n positive. Equivalent to hash(fractions.Fraction(m, n)). """ P = sys.hash_info.modulus # Remove common factors of P. (Unnecessary if m and n already coprime.) while m % P == n % P == 0: m, n = m // P, n // P if n % P == 0: hash_value = sys.hash_info.inf else: # Fermat's Little Theorem: pow(n, P-1, P) is 1, so # pow(n, P-2, P) gives the inverse of n modulo P. hash_value = (abs(m) % P) * pow(n, P - 2, P) % P if m < 0: hash_value = -hash_value if hash_value == -1: hash_value = -2 return hash_value def hash_float(x): """Compute the hash of a float x.""" if math.isnan(x): return sys.hash_info.nan elif math.isinf(x): return sys.hash_info.inf if x > 0 else -sys.hash_info.inf else: return hash_fraction(*x.as_integer_ratio()) def hash_complex(z): """Compute the hash of a complex number z.""" hash_value = hash_float(z.real) + sys.hash_info.imag * hash_float(z.imag) # do a signed reduction modulo 2**sys.hash_info.width M = 2**(sys.hash_info.width - 1) hash_value = (hash_value & (M - 1)) - (hash_value & M) if hash_value == -1: hash_value = -2 return hash_value Iterator Types ============== Python supports a concept of iteration over containers. This is implemented using two distinct methods; these are used to allow user- defined classes to support iteration. Sequences, described below in more detail, always support the iteration methods. One method needs to be defined for container objects to provide iteration support: container.__iter__() Return an iterator object. The object is required to support the iterator protocol described below. If a container supports different types of iteration, additional methods can be provided to specifically request iterators for those iteration types. (An example of an object supporting multiple forms of iteration would be a tree structure which supports both breadth-first and depth- first traversal.) This method corresponds to the "tp_iter" slot of the type structure for Python objects in the Python/C API. The iterator objects themselves are required to support the following two methods, which together form the *iterator protocol*: iterator.__iter__() Return the iterator object itself. This is required to allow both containers and iterators to be used with the "for" and "in" statements. This method corresponds to the "tp_iter" slot of the type structure for Python objects in the Python/C API. iterator.__next__() Return the next item from the container. If there are no further items, raise the "StopIteration" exception. This method corresponds to the "tp_iternext" slot of the type structure for Python objects in the Python/C API. Python defines several iterator objects to support iteration over general and specific sequence types, dictionaries, and other more specialized forms. The specific types are not important beyond their implementation of the iterator protocol. Once an iterator’s "__next__()" method raises "StopIteration", it must continue to do so on subsequent calls. Implementations that do not obey this property are deemed broken. Generator Types --------------- Python’s *generator*s provide a convenient way to implement the iterator protocol. If a container object’s "__iter__()" method is implemented as a generator, it will automatically return an iterator object (technically, a generator object) supplying the "__iter__()" and "__next__()" methods. More information about generators can be found in the documentation for the yield expression. Sequence Types — "list", "tuple", "range" ========================================= There are three basic sequence types: lists, tuples, and range objects. Additional sequence types tailored for processing of binary data and text strings are described in dedicated sections. Common Sequence Operations -------------------------- The operations in the following table are supported by most sequence types, both mutable and immutable. The "collections.abc.Sequence" ABC is provided to make it easier to correctly implement these operations on custom sequence types. This table lists the sequence operations sorted in ascending priority. In the table, *s* and *t* are sequences of the same type, *n*, *i*, *j* and *k* are integers and *x* is an arbitrary object that meets any type and value restrictions imposed by *s*. The "in" and "not in" operations have the same priorities as the comparison operations. The "+" (concatenation) and "*" (repetition) operations have the same priority as the corresponding numeric operations. [3] +----------------------------+----------------------------------+------------+ | Operation | Result | Notes | |============================|==================================|============| | "x in s" | "True" if an item of *s* is | (1) | | | equal to *x*, else "False" | | +----------------------------+----------------------------------+------------+ | "x not in s" | "False" if an item of *s* is | (1) | | | equal to *x*, else "True" | | +----------------------------+----------------------------------+------------+ | "s + t" | the concatenation of *s* and *t* | (6)(7) | +----------------------------+----------------------------------+------------+ | "s * n" or "n * s" | equivalent to adding *s* to | (2)(7) | | | itself *n* times | | +----------------------------+----------------------------------+------------+ | "s[i]" | *i*th item of *s*, origin 0 | (3) | +----------------------------+----------------------------------+------------+ | "s[i:j]" | slice of *s* from *i* to *j* | (3)(4) | +----------------------------+----------------------------------+------------+ | "s[i:j:k]" | slice of *s* from *i* to *j* | (3)(5) | | | with step *k* | | +----------------------------+----------------------------------+------------+ | "len(s)" | length of *s* | | +----------------------------+----------------------------------+------------+ | "min(s)" | smallest item of *s* | | +----------------------------+----------------------------------+------------+ | "max(s)" | largest item of *s* | | +----------------------------+----------------------------------+------------+ | "s.index(x[, i[, j]])" | index of the first occurrence of | (8) | | | *x* in *s* (at or after index | | | | *i* and before index *j*) | | +----------------------------+----------------------------------+------------+ | "s.count(x)" | total number of occurrences of | | | | *x* in *s* | | +----------------------------+----------------------------------+------------+ Sequences of the same type also support comparisons. In particular, tuples and lists are compared lexicographically by comparing corresponding elements. This means that to compare equal, every element must compare equal and the two sequences must be of the same type and have the same length. (For full details see Comparisons in the language reference.) Notes: 1. While the "in" and "not in" operations are used only for simple containment testing in the general case, some specialised sequences (such as "str", "bytes" and "bytearray") also use them for subsequence testing: >>> "gg" in "eggs" True 2. Values of *n* less than "0" are treated as "0" (which yields an empty sequence of the same type as *s*). Note that items in the sequence *s* are not copied; they are referenced multiple times. This often haunts new Python programmers; consider: >>> lists = [[]] * 3 >>> lists [[], [], []] >>> lists[0].append(3) >>> lists [[3], [3], [3]] What has happened is that "[[]]" is a one-element list containing an empty list, so all three elements of "[[]] * 3" are references to this single empty list. Modifying any of the elements of "lists" modifies this single list. You can create a list of different lists this way: >>> lists = [[] for i in range(3)] >>> lists[0].append(3) >>> lists[1].append(5) >>> lists[2].append(7) >>> lists [[3], [5], [7]] Further explanation is available in the FAQ entry How do I create a multidimensional list?. 3. If *i* or *j* is negative, the index is relative to the end of sequence *s*: "len(s) + i" or "len(s) + j" is substituted. But note that "-0" is still "0". 4. The slice of *s* from *i* to *j* is defined as the sequence of items with index *k* such that "i <= k < j". If *i* or *j* is greater than "len(s)", use "len(s)". If *i* is omitted or "None", use "0". If *j* is omitted or "None", use "len(s)". If *i* is greater than or equal to *j*, the slice is empty. 5. The slice of *s* from *i* to *j* with step *k* is defined as the sequence of items with index "x = i + n*k" such that "0 <= n < (j-i)/k". In other words, the indices are "i", "i+k", "i+2*k", "i+3*k" and so on, stopping when *j* is reached (but never including *j*). When *k* is positive, *i* and *j* are reduced to "len(s)" if they are greater. When *k* is negative, *i* and *j* are reduced to "len(s) - 1" if they are greater. If *i* or *j* are omitted or "None", they become “end” values (which end depends on the sign of *k*). Note, *k* cannot be zero. If *k* is "None", it is treated like "1". 6. Concatenating immutable sequences always results in a new object. This means that building up a sequence by repeated concatenation will have a quadratic runtime cost in the total sequence length. To get a linear runtime cost, you must switch to one of the alternatives below: * if concatenating "str" objects, you can build a list and use "str.join()" at the end or else write to an "io.StringIO" instance and retrieve its value when complete * if concatenating "bytes" objects, you can similarly use "bytes.join()" or "io.BytesIO", or you can do in-place concatenation with a "bytearray" object. "bytearray" objects are mutable and have an efficient overallocation mechanism * if concatenating "tuple" objects, extend a "list" instead * for other types, investigate the relevant class documentation 7. Some sequence types (such as "range") only support item sequences that follow specific patterns, and hence don’t support sequence concatenation or repetition. 8. "index" raises "ValueError" when *x* is not found in *s*. Not all implementations support passing the additional arguments *i* and *j*. These arguments allow efficient searching of subsections of the sequence. Passing the extra arguments is roughly equivalent to using "s[i:j].index(x)", only without copying any data and with the returned index being relative to the start of the sequence rather than the start of the slice. Immutable Sequence Types ------------------------ The only operation that immutable sequence types generally implement that is not also implemented by mutable sequence types is support for the "hash()" built-in. This support allows immutable sequences, such as "tuple" instances, to be used as "dict" keys and stored in "set" and "frozenset" instances. Attempting to hash an immutable sequence that contains unhashable values will result in "TypeError". Mutable Sequence Types ---------------------- The operations in the following table are defined on mutable sequence types. The "collections.abc.MutableSequence" ABC is provided to make it easier to correctly implement these operations on custom sequence types. In the table *s* is an instance of a mutable sequence type, *t* is any iterable object and *x* is an arbitrary object that meets any type and value restrictions imposed by *s* (for example, "bytearray" only accepts integers that meet the value restriction "0 <= x <= 255"). +--------------------------------+----------------------------------+-----------------------+ | Operation | Result | Notes | |================================|==================================|=======================| | "s[i] = x" | item *i* of *s* is replaced by | | | | *x* | | +--------------------------------+----------------------------------+-----------------------+ | "s[i:j] = t" | slice of *s* from *i* to *j* is | | | | replaced by the contents of the | | | | iterable *t* | | +--------------------------------+----------------------------------+-----------------------+ | "del s[i:j]" | same as "s[i:j] = []" | | +--------------------------------+----------------------------------+-----------------------+ | "s[i:j:k] = t" | the elements of "s[i:j:k]" are | (1) | | | replaced by those of *t* | | +--------------------------------+----------------------------------+-----------------------+ | "del s[i:j:k]" | removes the elements of | | | | "s[i:j:k]" from the list | | +--------------------------------+----------------------------------+-----------------------+ | "s.append(x)" | appends *x* to the end of the | | | | sequence (same as | | | | "s[len(s):len(s)] = [x]") | | +--------------------------------+----------------------------------+-----------------------+ | "s.clear()" | removes all items from *s* (same | (5) | | | as "del s[:]") | | +--------------------------------+----------------------------------+-----------------------+ | "s.copy()" | creates a shallow copy of *s* | (5) | | | (same as "s[:]") | | +--------------------------------+----------------------------------+-----------------------+ | "s.extend(t)" or "s += t" | extends *s* with the contents of | | | | *t* (for the most part the same | | | | as "s[len(s):len(s)] = t") | | +--------------------------------+----------------------------------+-----------------------+ | "s *= n" | updates *s* with its contents | (6) | | | repeated *n* times | | +--------------------------------+----------------------------------+-----------------------+ | "s.insert(i, x)" | inserts *x* into *s* at the | | | | index given by *i* (same as | | | | "s[i:i] = [x]") | | +--------------------------------+----------------------------------+-----------------------+ | "s.pop()" or "s.pop(i)" | retrieves the item at *i* and | (2) | | | also removes it from *s* | | +--------------------------------+----------------------------------+-----------------------+ | "s.remove(x)" | remove the first item from *s* | (3) | | | where "s[i]" is equal to *x* | | +--------------------------------+----------------------------------+-----------------------+ | "s.reverse()" | reverses the items of *s* in | (4) | | | place | | +--------------------------------+----------------------------------+-----------------------+ Notes: 1. *t* must have the same length as the slice it is replacing. 2. The optional argument *i* defaults to "-1", so that by default the last item is removed and returned. 3. "remove()" raises "ValueError" when *x* is not found in *s*. 4. The "reverse()" method modifies the sequence in place for economy of space when reversing a large sequence. To remind users that it operates by side effect, it does not return the reversed sequence. 5. "clear()" and "copy()" are included for consistency with the interfaces of mutable containers that don’t support slicing operations (such as "dict" and "set"). "copy()" is not part of the "collections.abc.MutableSequence" ABC, but most concrete mutable sequence classes provide it. New in version 3.3: "clear()" and "copy()" methods. 6. The value *n* is an integer, or an object implementing "__index__()". Zero and negative values of *n* clear the sequence. Items in the sequence are not copied; they are referenced multiple times, as explained for "s * n" under Common Sequence Operations. Lists ----- Lists are mutable sequences, typically used to store collections of homogeneous items (where the precise degree of similarity will vary by application). class list([iterable]) Lists may be constructed in several ways: * Using a pair of square brackets to denote the empty list: "[]" * Using square brackets, separating items with commas: "[a]", "[a, b, c]" * Using a list comprehension: "[x for x in iterable]" * Using the type constructor: "list()" or "list(iterable)" The constructor builds a list whose items are the same and in the same order as *iterable*’s items. *iterable* may be either a sequence, a container that supports iteration, or an iterator object. If *iterable* is already a list, a copy is made and returned, similar to "iterable[:]". For example, "list('abc')" returns "['a', 'b', 'c']" and "list( (1, 2, 3) )" returns "[1, 2, 3]". If no argument is given, the constructor creates a new empty list, "[]". Many other operations also produce lists, including the "sorted()" built-in. Lists implement all of the common and mutable sequence operations. Lists also provide the following additional method: sort(*, key=None, reverse=False) This method sorts the list in place, using only "<" comparisons between items. Exceptions are not suppressed - if any comparison operations fail, the entire sort operation will fail (and the list will likely be left in a partially modified state). "sort()" accepts two arguments that can only be passed by keyword (keyword-only arguments): *key* specifies a function of one argument that is used to extract a comparison key from each list element (for example, "key=str.lower"). The key corresponding to each item in the list is calculated once and then used for the entire sorting process. The default value of "None" means that list items are sorted directly without calculating a separate key value. The "functools.cmp_to_key()" utility is available to convert a 2.x style *cmp* function to a *key* function. *reverse* is a boolean value. If set to "True", then the list elements are sorted as if each comparison were reversed. This method modifies the sequence in place for economy of space when sorting a large sequence. To remind users that it operates by side effect, it does not return the sorted sequence (use "sorted()" to explicitly request a new sorted list instance). The "sort()" method is guaranteed to be stable. A sort is stable if it guarantees not to change the relative order of elements that compare equal — this is helpful for sorting in multiple passes (for example, sort by department, then by salary grade). For sorting examples and a brief sorting tutorial, see Sorting HOW TO. **CPython implementation detail:** While a list is being sorted, the effect of attempting to mutate, or even inspect, the list is undefined. The C implementation of Python makes the list appear empty for the duration, and raises "ValueError" if it can detect that the list has been mutated during a sort. Tuples ------ Tuples are immutable sequences, typically used to store collections of heterogeneous data (such as the 2-tuples produced by the "enumerate()" built-in). Tuples are also used for cases where an immutable sequence of homogeneous data is needed (such as allowing storage in a "set" or "dict" instance). class tuple([iterable]) Tuples may be constructed in a number of ways: * Using a pair of parentheses to denote the empty tuple: "()" * Using a trailing comma for a singleton tuple: "a," or "(a,)" * Separating items with commas: "a, b, c" or "(a, b, c)" * Using the "tuple()" built-in: "tuple()" or "tuple(iterable)" The constructor builds a tuple whose items are the same and in the same order as *iterable*’s items. *iterable* may be either a sequence, a container that supports iteration, or an iterator object. If *iterable* is already a tuple, it is returned unchanged. For example, "tuple('abc')" returns "('a', 'b', 'c')" and "tuple( [1, 2, 3] )" returns "(1, 2, 3)". If no argument is given, the constructor creates a new empty tuple, "()". Note that it is actually the comma which makes a tuple, not the parentheses. The parentheses are optional, except in the empty tuple case, or when they are needed to avoid syntactic ambiguity. For example, "f(a, b, c)" is a function call with three arguments, while "f((a, b, c))" is a function call with a 3-tuple as the sole argument. Tuples implement all of the common sequence operations. For heterogeneous collections of data where access by name is clearer than access by index, "collections.namedtuple()" may be a more appropriate choice than a simple tuple object. Ranges ------ The "range" type represents an immutable sequence of numbers and is commonly used for looping a specific number of times in "for" loops. class range(stop) class range(start, stop[, step]) The arguments to the range constructor must be integers (either built-in "int" or any object that implements the "__index__()" special method). If the *step* argument is omitted, it defaults to "1". If the *start* argument is omitted, it defaults to "0". If *step* is zero, "ValueError" is raised. For a positive *step*, the contents of a range "r" are determined by the formula "r[i] = start + step*i" where "i >= 0" and "r[i] < stop". For a negative *step*, the contents of the range are still determined by the formula "r[i] = start + step*i", but the constraints are "i >= 0" and "r[i] > stop". A range object will be empty if "r[0]" does not meet the value constraint. Ranges do support negative indices, but these are interpreted as indexing from the end of the sequence determined by the positive indices. Ranges containing absolute values larger than "sys.maxsize" are permitted but some features (such as "len()") may raise "OverflowError". Range examples: >>> list(range(10)) [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] >>> list(range(1, 11)) [1, 2, 3, 4, 5, 6, 7, 8, 9, 10] >>> list(range(0, 30, 5)) [0, 5, 10, 15, 20, 25] >>> list(range(0, 10, 3)) [0, 3, 6, 9] >>> list(range(0, -10, -1)) [0, -1, -2, -3, -4, -5, -6, -7, -8, -9] >>> list(range(0)) [] >>> list(range(1, 0)) [] Ranges implement all of the common sequence operations except concatenation and repetition (due to the fact that range objects can only represent sequences that follow a strict pattern and repetition and concatenation will usually violate that pattern). start The value of the *start* parameter (or "0" if the parameter was not supplied) stop The value of the *stop* parameter step The value of the *step* parameter (or "1" if the parameter was not supplied) The advantage of the "range" type over a regular "list" or "tuple" is that a "range" object will always take the same (small) amount of memory, no matter the size of the range it represents (as it only stores the "start", "stop" and "step" values, calculating individual items and subranges as needed). Range objects implement the "collections.abc.Sequence" ABC, and provide features such as containment tests, element index lookup, slicing and support for negative indices (see Sequence Types — list, tuple, range): >>> r = range(0, 20, 2) >>> r range(0, 20, 2) >>> 11 in r False >>> 10 in r True >>> r.index(10) 5 >>> r[5] 10 >>> r[:5] range(0, 10, 2) >>> r[-1] 18 Testing range objects for equality with "==" and "!=" compares them as sequences. That is, two range objects are considered equal if they represent the same sequence of values. (Note that two range objects that compare equal might have different "start", "stop" and "step" attributes, for example "range(0) == range(2, 1, 3)" or "range(0, 3, 2) == range(0, 4, 2)".) Changed in version 3.2: Implement the Sequence ABC. Support slicing and negative indices. Test "int" objects for membership in constant time instead of iterating through all items. Changed in version 3.3: Define ‘==’ and ‘!=’ to compare range objects based on the sequence of values they define (instead of comparing based on object identity). New in version 3.3: The "start", "stop" and "step" attributes. See also: * The linspace recipe shows how to implement a lazy version of range suitable for floating point applications. Text Sequence Type — "str" ========================== Textual data in Python is handled with "str" objects, or *strings*. Strings are immutable sequences of Unicode code points. String literals are written in a variety of ways: * Single quotes: "'allows embedded "double" quotes'" * Double quotes: ""allows embedded 'single' quotes"" * Triple quoted: "'''Three single quotes'''", """"Three double quotes"""" Triple quoted strings may span multiple lines - all associated whitespace will be included in the string literal. String literals that are part of a single expression and have only whitespace between them will be implicitly converted to a single string literal. That is, "("spam " "eggs") == "spam eggs"". See String and Bytes literals for more about the various forms of string literal, including supported escape sequences, and the "r" (“raw”) prefix that disables most escape sequence processing. Strings may also be created from other objects using the "str" constructor. Since there is no separate “character” type, indexing a string produces strings of length 1. That is, for a non-empty string *s*, "s[0] == s[0:1]". There is also no mutable string type, but "str.join()" or "io.StringIO" can be used to efficiently construct strings from multiple fragments. Changed in version 3.3: For backwards compatibility with the Python 2 series, the "u" prefix is once again permitted on string literals. It has no effect on the meaning of string literals and cannot be combined with the "r" prefix. class str(object='') class str(object=b'', encoding='utf-8', errors='strict') Return a string version of *object*. If *object* is not provided, returns the empty string. Otherwise, the behavior of "str()" depends on whether *encoding* or *errors* is given, as follows. If neither *encoding* nor *errors* is given, "str(object)" returns "type(object).__str__(object)", which is the “informal” or nicely printable string representation of *object*. For string objects, this is the string itself. If *object* does not have a "__str__()" method, then "str()" falls back to returning "repr(object)". If at least one of *encoding* or *errors* is given, *object* should be a *bytes-like object* (e.g. "bytes" or "bytearray"). In this case, if *object* is a "bytes" (or "bytearray") object, then "str(bytes, encoding, errors)" is equivalent to "bytes.decode(encoding, errors)". Otherwise, the bytes object underlying the buffer object is obtained before calling "bytes.decode()". See Binary Sequence Types — bytes, bytearray, memoryview and Buffer Protocol for information on buffer objects. Passing a "bytes" object to "str()" without the *encoding* or *errors* arguments falls under the first case of returning the informal string representation (see also the "-b" command-line option to Python). For example: >>> str(b'Zoot!') "b'Zoot!'" For more information on the "str" class and its methods, see Text Sequence Type — str and the String Methods section below. To output formatted strings, see the Formatted string literals and Format String Syntax sections. In addition, see the Text Processing Services section. String Methods -------------- Strings implement all of the common sequence operations, along with the additional methods described below. Strings also support two styles of string formatting, one providing a large degree of flexibility and customization (see "str.format()", Format String Syntax and Custom String Formatting) and the other based on C "printf" style formatting that handles a narrower range of types and is slightly harder to use correctly, but is often faster for the cases it can handle (printf-style String Formatting). The Text Processing Services section of the standard library covers a number of other modules that provide various text related utilities (including regular expression support in the "re" module). str.capitalize() Return a copy of the string with its first character capitalized and the rest lowercased. Changed in version 3.8: The first character is now put into titlecase rather than uppercase. This means that characters like digraphs will only have their first letter capitalized, instead of the full character. str.casefold() Return a casefolded copy of the string. Casefolded strings may be used for caseless matching. Casefolding is similar to lowercasing but more aggressive because it is intended to remove all case distinctions in a string. For example, the German lowercase letter "'ß'" is equivalent to ""ss"". Since it is already lowercase, "lower()" would do nothing to "'ß'"; "casefold()" converts it to ""ss"". The casefolding algorithm is described in section 3.13 of the Unicode Standard. New in version 3.3. str.center(width[, fillchar]) Return centered in a string of length *width*. Padding is done using the specified *fillchar* (default is an ASCII space). The original string is returned if *width* is less than or equal to "len(s)". str.count(sub[, start[, end]]) Return the number of non-overlapping occurrences of substring *sub* in the range [*start*, *end*]. Optional arguments *start* and *end* are interpreted as in slice notation. str.encode(encoding="utf-8", errors="strict") Return an encoded version of the string as a bytes object. Default encoding is "'utf-8'". *errors* may be given to set a different error handling scheme. The default for *errors* is "'strict'", meaning that encoding errors raise a "UnicodeError". Other possible values are "'ignore'", "'replace'", "'xmlcharrefreplace'", "'backslashreplace'" and any other name registered via "codecs.register_error()", see section Error Handlers. For a list of possible encodings, see section Standard Encodings. By default, the *errors* argument is not checked for best performances, but only used at the first encoding error. Enable the Python Development Mode, or use a debug build to check *errors*. Changed in version 3.1: Support for keyword arguments added. Changed in version 3.9: The *errors* is now checked in development mode and in debug mode. str.endswith(suffix[, start[, end]]) Return "True" if the string ends with the specified *suffix*, otherwise return "False". *suffix* can also be a tuple of suffixes to look for. With optional *start*, test beginning at that position. With optional *end*, stop comparing at that position. str.expandtabs(tabsize=8) Return a copy of the string where all tab characters are replaced by one or more spaces, depending on the current column and the given tab size. Tab positions occur every *tabsize* characters (default is 8, giving tab positions at columns 0, 8, 16 and so on). To expand the string, the current column is set to zero and the string is examined character by character. If the character is a tab ("\t"), one or more space characters are inserted in the result until the current column is equal to the next tab position. (The tab character itself is not copied.) If the character is a newline ("\n") or return ("\r"), it is copied and the current column is reset to zero. Any other character is copied unchanged and the current column is incremented by one regardless of how the character is represented when printed. >>> '01\t012\t0123\t01234'.expandtabs() '01 012 0123 01234' >>> '01\t012\t0123\t01234'.expandtabs(4) '01 012 0123 01234' str.find(sub[, start[, end]]) Return the lowest index in the string where substring *sub* is found within the slice "s[start:end]". Optional arguments *start* and *end* are interpreted as in slice notation. Return "-1" if *sub* is not found. Note: The "find()" method should be used only if you need to know the position of *sub*. To check if *sub* is a substring or not, use the "in" operator: >>> 'Py' in 'Python' True str.format(*args, **kwargs) Perform a string formatting operation. The string on which this method is called can contain literal text or replacement fields delimited by braces "{}". Each replacement field contains either the numeric index of a positional argument, or the name of a keyword argument. Returns a copy of the string where each replacement field is replaced with the string value of the corresponding argument. >>> "The sum of 1 + 2 is {0}".format(1+2) 'The sum of 1 + 2 is 3' See Format String Syntax for a description of the various formatting options that can be specified in format strings. Note: When formatting a number ("int", "float", "complex", "decimal.Decimal" and subclasses) with the "n" type (ex: "'{:n}'.format(1234)"), the function temporarily sets the "LC_CTYPE" locale to the "LC_NUMERIC" locale to decode "decimal_point" and "thousands_sep" fields of "localeconv()" if they are non-ASCII or longer than 1 byte, and the "LC_NUMERIC" locale is different than the "LC_CTYPE" locale. This temporary change affects other threads. Changed in version 3.7: When formatting a number with the "n" type, the function sets temporarily the "LC_CTYPE" locale to the "LC_NUMERIC" locale in some cases. str.format_map(mapping) Similar to "str.format(**mapping)", except that "mapping" is used directly and not copied to a "dict". This is useful if for example "mapping" is a dict subclass: >>> class Default(dict): ... def __missing__(self, key): ... return key ... >>> '{name} was born in {country}'.format_map(Default(name='Guido')) 'Guido was born in country' New in version 3.2. str.index(sub[, start[, end]]) Like "find()", but raise "ValueError" when the substring is not found. str.isalnum() Return "True" if all characters in the string are alphanumeric and there is at least one character, "False" otherwise. A character "c" is alphanumeric if one of the following returns "True": "c.isalpha()", "c.isdecimal()", "c.isdigit()", or "c.isnumeric()". str.isalpha() Return "True" if all characters in the string are alphabetic and there is at least one character, "False" otherwise. Alphabetic characters are those characters defined in the Unicode character database as “Letter”, i.e., those with general category property being one of “Lm”, “Lt”, “Lu”, “Ll”, or “Lo”. Note that this is different from the “Alphabetic” property defined in the Unicode Standard. str.isascii() Return "True" if the string is empty or all characters in the string are ASCII, "False" otherwise. ASCII characters have code points in the range U+0000-U+007F. New in version 3.7. str.isdecimal() Return "True" if all characters in the string are decimal characters and there is at least one character, "False" otherwise. Decimal characters are those that can be used to form numbers in base 10, e.g. U+0660, ARABIC-INDIC DIGIT ZERO. Formally a decimal character is a character in the Unicode General Category “Nd”. str.isdigit() Return "True" if all characters in the string are digits and there is at least one character, "False" otherwise. Digits include decimal characters and digits that need special handling, such as the compatibility superscript digits. This covers digits which cannot be used to form numbers in base 10, like the Kharosthi numbers. Formally, a digit is a character that has the property value Numeric_Type=Digit or Numeric_Type=Decimal. str.isidentifier() Return "True" if the string is a valid identifier according to the language definition, section Identifiers and keywords. Call "keyword.iskeyword()" to test whether string "s" is a reserved identifier, such as "def" and "class". Example: >>> from keyword import iskeyword >>> 'hello'.isidentifier(), iskeyword('hello') (True, False) >>> 'def'.isidentifier(), iskeyword('def') (True, True) str.islower() Return "True" if all cased characters [4] in the string are lowercase and there is at least one cased character, "False" otherwise. str.isnumeric() Return "True" if all characters in the string are numeric characters, and there is at least one character, "False" otherwise. Numeric characters include digit characters, and all characters that have the Unicode numeric value property, e.g. U+2155, VULGAR FRACTION ONE FIFTH. Formally, numeric characters are those with the property value Numeric_Type=Digit, Numeric_Type=Decimal or Numeric_Type=Numeric. str.isprintable() Return "True" if all characters in the string are printable or the string is empty, "False" otherwise. Nonprintable characters are those characters defined in the Unicode character database as “Other” or “Separator”, excepting the ASCII space (0x20) which is considered printable. (Note that printable characters in this context are those which should not be escaped when "repr()" is invoked on a string. It has no bearing on the handling of strings written to "sys.stdout" or "sys.stderr".) str.isspace() Return "True" if there are only whitespace characters in the string and there is at least one character, "False" otherwise. A character is *whitespace* if in the Unicode character database (see "unicodedata"), either its general category is "Zs" (“Separator, space”), or its bidirectional class is one of "WS", "B", or "S". str.istitle() Return "True" if the string is a titlecased string and there is at least one character, for example uppercase characters may only follow uncased characters and lowercase characters only cased ones. Return "False" otherwise. str.isupper() Return "True" if all cased characters [4] in the string are uppercase and there is at least one cased character, "False" otherwise. >>> 'BANANA'.isupper() True >>> 'banana'.isupper() False >>> 'baNana'.isupper() False >>> ' '.isupper() False str.join(iterable) Return a string which is the concatenation of the strings in *iterable*. A "TypeError" will be raised if there are any non- string values in *iterable*, including "bytes" objects. The separator between elements is the string providing this method. str.ljust(width[, fillchar]) Return the string left justified in a string of length *width*. Padding is done using the specified *fillchar* (default is an ASCII space). The original string is returned if *width* is less than or equal to "len(s)". str.lower() Return a copy of the string with all the cased characters [4] converted to lowercase. The lowercasing algorithm used is described in section 3.13 of the Unicode Standard. str.lstrip([chars]) Return a copy of the string with leading characters removed. The *chars* argument is a string specifying the set of characters to be removed. If omitted or "None", the *chars* argument defaults to removing whitespace. The *chars* argument is not a prefix; rather, all combinations of its values are stripped: >>> ' spacious '.lstrip() 'spacious ' >>> 'www.example.com'.lstrip('cmowz.') 'example.com' See "str.removeprefix()" for a method that will remove a single prefix string rather than all of a set of characters. For example: >>> 'Arthur: three!'.lstrip('Arthur: ') 'ee!' >>> 'Arthur: three!'.removeprefix('Arthur: ') 'three!' static str.maketrans(x[, y[, z]]) This static method returns a translation table usable for "str.translate()". If there is only one argument, it must be a dictionary mapping Unicode ordinals (integers) or characters (strings of length 1) to Unicode ordinals, strings (of arbitrary lengths) or "None". Character keys will then be converted to ordinals. If there are two arguments, they must be strings of equal length, and in the resulting dictionary, each character in x will be mapped to the character at the same position in y. If there is a third argument, it must be a string, whose characters will be mapped to "None" in the result. str.partition(sep) Split the string at the first occurrence of *sep*, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing the string itself, followed by two empty strings. str.removeprefix(prefix, /) If the string starts with the *prefix* string, return "string[len(prefix):]". Otherwise, return a copy of the original string: >>> 'TestHook'.removeprefix('Test') 'Hook' >>> 'BaseTestCase'.removeprefix('Test') 'BaseTestCase' New in version 3.9. str.removesuffix(suffix, /) If the string ends with the *suffix* string and that *suffix* is not empty, return "string[:-len(suffix)]". Otherwise, return a copy of the original string: >>> 'MiscTests'.removesuffix('Tests') 'Misc' >>> 'TmpDirMixin'.removesuffix('Tests') 'TmpDirMixin' New in version 3.9. str.replace(old, new[, count]) Return a copy of the string with all occurrences of substring *old* replaced by *new*. If the optional argument *count* is given, only the first *count* occurrences are replaced. str.rfind(sub[, start[, end]]) Return the highest index in the string where substring *sub* is found, such that *sub* is contained within "s[start:end]". Optional arguments *start* and *end* are interpreted as in slice notation. Return "-1" on failure. str.rindex(sub[, start[, end]]) Like "rfind()" but raises "ValueError" when the substring *sub* is not found. str.rjust(width[, fillchar]) Return the string right justified in a string of length *width*. Padding is done using the specified *fillchar* (default is an ASCII space). The original string is returned if *width* is less than or equal to "len(s)". str.rpartition(sep) Split the string at the last occurrence of *sep*, and return a 3-tuple containing the part before the separator, the separator itself, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty strings, followed by the string itself. str.rsplit(sep=None, maxsplit=-1) Return a list of the words in the string, using *sep* as the delimiter string. If *maxsplit* is given, at most *maxsplit* splits are done, the *rightmost* ones. If *sep* is not specified or "None", any whitespace string is a separator. Except for splitting from the right, "rsplit()" behaves like "split()" which is described in detail below. str.rstrip([chars]) Return a copy of the string with trailing characters removed. The *chars* argument is a string specifying the set of characters to be removed. If omitted or "None", the *chars* argument defaults to removing whitespace. The *chars* argument is not a suffix; rather, all combinations of its values are stripped: >>> ' spacious '.rstrip() ' spacious' >>> 'mississippi'.rstrip('ipz') 'mississ' See "str.removesuffix()" for a method that will remove a single suffix string rather than all of a set of characters. For example: >>> 'Monty Python'.rstrip(' Python') 'M' >>> 'Monty Python'.removesuffix(' Python') 'Monty' str.split(sep=None, maxsplit=-1) Return a list of the words in the string, using *sep* as the delimiter string. If *maxsplit* is given, at most *maxsplit* splits are done (thus, the list will have at most "maxsplit+1" elements). If *maxsplit* is not specified or "-1", then there is no limit on the number of splits (all possible splits are made). If *sep* is given, consecutive delimiters are not grouped together and are deemed to delimit empty strings (for example, "'1,,2'.split(',')" returns "['1', '', '2']"). The *sep* argument may consist of multiple characters (for example, "'1<>2<>3'.split('<>')" returns "['1', '2', '3']"). Splitting an empty string with a specified separator returns "['']". For example: >>> '1,2,3'.split(',') ['1', '2', '3'] >>> '1,2,3'.split(',', maxsplit=1) ['1', '2,3'] >>> '1,2,,3,'.split(',') ['1', '2', '', '3', ''] If *sep* is not specified or is "None", a different splitting algorithm is applied: runs of consecutive whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the string has leading or trailing whitespace. Consequently, splitting an empty string or a string consisting of just whitespace with a "None" separator returns "[]". For example: >>> '1 2 3'.split() ['1', '2', '3'] >>> '1 2 3'.split(maxsplit=1) ['1', '2 3'] >>> ' 1 2 3 '.split() ['1', '2', '3'] str.splitlines(keepends=False) Return a list of the lines in the string, breaking at line boundaries. Line breaks are not included in the resulting list unless *keepends* is given and true. This method splits on the following line boundaries. In particular, the boundaries are a superset of *universal newlines*. +-------------------------+-------------------------------+ | Representation | Description | |=========================|===============================| | "\n" | Line Feed | +-------------------------+-------------------------------+ | "\r" | Carriage Return | +-------------------------+-------------------------------+ | "\r\n" | Carriage Return + Line Feed | +-------------------------+-------------------------------+ | "\v" or "\x0b" | Line Tabulation | +-------------------------+-------------------------------+ | "\f" or "\x0c" | Form Feed | +-------------------------+-------------------------------+ | "\x1c" | File Separator | +-------------------------+-------------------------------+ | "\x1d" | Group Separator | +-------------------------+-------------------------------+ | "\x1e" | Record Separator | +-------------------------+-------------------------------+ | "\x85" | Next Line (C1 Control Code) | +-------------------------+-------------------------------+ | "\u2028" | Line Separator | +-------------------------+-------------------------------+ | "\u2029" | Paragraph Separator | +-------------------------+-------------------------------+ Changed in version 3.2: "\v" and "\f" added to list of line boundaries. For example: >>> 'ab c\n\nde fg\rkl\r\n'.splitlines() ['ab c', '', 'de fg', 'kl'] >>> 'ab c\n\nde fg\rkl\r\n'.splitlines(keepends=True) ['ab c\n', '\n', 'de fg\r', 'kl\r\n'] Unlike "split()" when a delimiter string *sep* is given, this method returns an empty list for the empty string, and a terminal line break does not result in an extra line: >>> "".splitlines() [] >>> "One line\n".splitlines() ['One line'] For comparison, "split('\n')" gives: >>> ''.split('\n') [''] >>> 'Two lines\n'.split('\n') ['Two lines', ''] str.startswith(prefix[, start[, end]]) Return "True" if string starts with the *prefix*, otherwise return "False". *prefix* can also be a tuple of prefixes to look for. With optional *start*, test string beginning at that position. With optional *end*, stop comparing string at that position. str.strip([chars]) Return a copy of the string with the leading and trailing characters removed. The *chars* argument is a string specifying the set of characters to be removed. If omitted or "None", the *chars* argument defaults to removing whitespace. The *chars* argument is not a prefix or suffix; rather, all combinations of its values are stripped: >>> ' spacious '.strip() 'spacious' >>> 'www.example.com'.strip('cmowz.') 'example' The outermost leading and trailing *chars* argument values are stripped from the string. Characters are removed from the leading end until reaching a string character that is not contained in the set of characters in *chars*. A similar action takes place on the trailing end. For example: >>> comment_string = '#....... Section 3.2.1 Issue #32 .......' >>> comment_string.strip('.#! ') 'Section 3.2.1 Issue #32' str.swapcase() Return a copy of the string with uppercase characters converted to lowercase and vice versa. Note that it is not necessarily true that "s.swapcase().swapcase() == s". str.title() Return a titlecased version of the string where words start with an uppercase character and the remaining characters are lowercase. For example: >>> 'Hello world'.title() 'Hello World' The algorithm uses a simple language-independent definition of a word as groups of consecutive letters. The definition works in many contexts but it means that apostrophes in contractions and possessives form word boundaries, which may not be the desired result: >>> "they're bill's friends from the UK".title() "They'Re Bill'S Friends From The Uk" The "string.capwords()" function does not have this problem, as it splits words on spaces only. Alternatively, a workaround for apostrophes can be constructed using regular expressions: >>> import re >>> def titlecase(s): ... return re.sub(r"[A-Za-z]+('[A-Za-z]+)?", ... lambda mo: mo.group(0).capitalize(), ... s) ... >>> titlecase("they're bill's friends.") "They're Bill's Friends." str.translate(table) Return a copy of the string in which each character has been mapped through the given translation table. The table must be an object that implements indexing via "__getitem__()", typically a *mapping* or *sequence*. When indexed by a Unicode ordinal (an integer), the table object can do any of the following: return a Unicode ordinal or a string, to map the character to one or more other characters; return "None", to delete the character from the return string; or raise a "LookupError" exception, to map the character to itself. You can use "str.maketrans()" to create a translation map from character-to-character mappings in different formats. See also the "codecs" module for a more flexible approach to custom character mappings. str.upper() Return a copy of the string with all the cased characters [4] converted to uppercase. Note that "s.upper().isupper()" might be "False" if "s" contains uncased characters or if the Unicode category of the resulting character(s) is not “Lu” (Letter, uppercase), but e.g. “Lt” (Letter, titlecase). The uppercasing algorithm used is described in section 3.13 of the Unicode Standard. str.zfill(width) Return a copy of the string left filled with ASCII "'0'" digits to make a string of length *width*. A leading sign prefix ("'+'"/"'-'") is handled by inserting the padding *after* the sign character rather than before. The original string is returned if *width* is less than or equal to "len(s)". For example: >>> "42".zfill(5) '00042' >>> "-42".zfill(5) '-0042' "printf"-style String Formatting -------------------------------- Note: The formatting operations described here exhibit a variety of quirks that lead to a number of common errors (such as failing to display tuples and dictionaries correctly). Using the newer formatted string literals, the "str.format()" interface, or template strings may help avoid these errors. Each of these alternatives provides their own trade-offs and benefits of simplicity, flexibility, and/or extensibility. String objects have one unique built-in operation: the "%" operator (modulo). This is also known as the string *formatting* or *interpolation* operator. Given "format % values" (where *format* is a string), "%" conversion specifications in *format* are replaced with zero or more elements of *values*. The effect is similar to using the "sprintf()" in the C language. If *format* requires a single argument, *values* may be a single non- tuple object. [5] Otherwise, *values* must be a tuple with exactly the number of items specified by the format string, or a single mapping object (for example, a dictionary). A conversion specifier contains two or more characters and has the following components, which must occur in this order: 1. The "'%'" character, which marks the start of the specifier. 2. Mapping key (optional), consisting of a parenthesised sequence of characters (for example, "(somename)"). 3. Conversion flags (optional), which affect the result of some conversion types. 4. Minimum field width (optional). If specified as an "'*'" (asterisk), the actual width is read from the next element of the tuple in *values*, and the object to convert comes after the minimum field width and optional precision. 5. Precision (optional), given as a "'.'" (dot) followed by the precision. If specified as "'*'" (an asterisk), the actual precision is read from the next element of the tuple in *values*, and the value to convert comes after the precision. 6. Length modifier (optional). 7. Conversion type. When the right argument is a dictionary (or other mapping type), then the formats in the string *must* include a parenthesised mapping key into that dictionary inserted immediately after the "'%'" character. The mapping key selects the value to be formatted from the mapping. For example: >>> print('%(language)s has %(number)03d quote types.' % ... {'language': "Python", "number": 2}) Python has 002 quote types. In this case no "*" specifiers may occur in a format (since they require a sequential parameter list). The conversion flag characters are: +-----------+-----------------------------------------------------------------------+ | Flag | Meaning | |===========|=======================================================================| | "'#'" | The value conversion will use the “alternate form” (where defined | | | below). | +-----------+-----------------------------------------------------------------------+ | "'0'" | The conversion will be zero padded for numeric values. | +-----------+-----------------------------------------------------------------------+ | "'-'" | The converted value is left adjusted (overrides the "'0'" conversion | | | if both are given). | +-----------+-----------------------------------------------------------------------+ | "' '" | (a space) A blank should be left before a positive number (or empty | | | string) produced by a signed conversion. | +-----------+-----------------------------------------------------------------------+ | "'+'" | A sign character ("'+'" or "'-'") will precede the conversion | | | (overrides a “space” flag). | +-----------+-----------------------------------------------------------------------+ A length modifier ("h", "l", or "L") may be present, but is ignored as it is not necessary for Python – so e.g. "%ld" is identical to "%d". The conversion types are: +--------------+-------------------------------------------------------+---------+ | Conversion | Meaning | Notes | |==============|=======================================================|=========| | "'d'" | Signed integer decimal. | | +--------------+-------------------------------------------------------+---------+ | "'i'" | Signed integer decimal. | | +--------------+-------------------------------------------------------+---------+ | "'o'" | Signed octal value. | (1) | +--------------+-------------------------------------------------------+---------+ | "'u'" | Obsolete type – it is identical to "'d'". | (6) | +--------------+-------------------------------------------------------+---------+ | "'x'" | Signed hexadecimal (lowercase). | (2) | +--------------+-------------------------------------------------------+---------+ | "'X'" | Signed hexadecimal (uppercase). | (2) | +--------------+-------------------------------------------------------+---------+ | "'e'" | Floating point exponential format (lowercase). | (3) | +--------------+-------------------------------------------------------+---------+ | "'E'" | Floating point exponential format (uppercase). | (3) | +--------------+-------------------------------------------------------+---------+ | "'f'" | Floating point decimal format. | (3) | +--------------+-------------------------------------------------------+---------+ | "'F'" | Floating point decimal format. | (3) | +--------------+-------------------------------------------------------+---------+ | "'g'" | Floating point format. Uses lowercase exponential | (4) | | | format if exponent is less than -4 or not less than | | | | precision, decimal format otherwise. | | +--------------+-------------------------------------------------------+---------+ | "'G'" | Floating point format. Uses uppercase exponential | (4) | | | format if exponent is less than -4 or not less than | | | | precision, decimal format otherwise. | | +--------------+-------------------------------------------------------+---------+ | "'c'" | Single character (accepts integer or single character | | | | string). | | +--------------+-------------------------------------------------------+---------+ | "'r'" | String (converts any Python object using "repr()"). | (5) | +--------------+-------------------------------------------------------+---------+ | "'s'" | String (converts any Python object using "str()"). | (5) | +--------------+-------------------------------------------------------+---------+ | "'a'" | String (converts any Python object using "ascii()"). | (5) | +--------------+-------------------------------------------------------+---------+ | "'%'" | No argument is converted, results in a "'%'" | | | | character in the result. | | +--------------+-------------------------------------------------------+---------+ Notes: 1. The alternate form causes a leading octal specifier ("'0o'") to be inserted before the first digit. 2. The alternate form causes a leading "'0x'" or "'0X'" (depending on whether the "'x'" or "'X'" format was used) to be inserted before the first digit. 3. The alternate form causes the result to always contain a decimal point, even if no digits follow it. The precision determines the number of digits after the decimal point and defaults to 6. 4. The alternate form causes the result to always contain a decimal point, and trailing zeroes are not removed as they would otherwise be. The precision determines the number of significant digits before and after the decimal point and defaults to 6. 5. If precision is "N", the output is truncated to "N" characters. 6. See **PEP 237**. Since Python strings have an explicit length, "%s" conversions do not assume that "'\0'" is the end of the string. Changed in version 3.1: "%f" conversions for numbers whose absolute value is over 1e50 are no longer replaced by "%g" conversions. Binary Sequence Types — "bytes", "bytearray", "memoryview" ========================================================== The core built-in types for manipulating binary data are "bytes" and "bytearray". They are supported by "memoryview" which uses the buffer protocol to access the memory of other binary objects without needing to make a copy. The "array" module supports efficient storage of basic data types like 32-bit integers and IEEE754 double-precision floating values. Bytes Objects ------------- Bytes objects are immutable sequences of single bytes. Since many major binary protocols are based on the ASCII text encoding, bytes objects offer several methods that are only valid when working with ASCII compatible data and are closely related to string objects in a variety of other ways. class bytes([source[, encoding[, errors]]]) Firstly, the syntax for bytes literals is largely the same as that for string literals, except that a "b" prefix is added: * Single quotes: "b'still allows embedded "double" quotes'" * Double quotes: "b"still allows embedded 'single' quotes"" * Triple quoted: "b'''3 single quotes'''", "b"""3 double quotes"""" Only ASCII characters are permitted in bytes literals (regardless of the declared source code encoding). Any binary values over 127 must be entered into bytes literals using the appropriate escape sequence. As with string literals, bytes literals may also use a "r" prefix to disable processing of escape sequences. See String and Bytes literals for more about the various forms of bytes literal, including supported escape sequences. While bytes literals and representations are based on ASCII text, bytes objects actually behave like immutable sequences of integers, with each value in the sequence restricted such that "0 <= x < 256" (attempts to violate this restriction will trigger "ValueError"). This is done deliberately to emphasise that while many binary formats include ASCII based elements and can be usefully manipulated with some text-oriented algorithms, this is not generally the case for arbitrary binary data (blindly applying text processing algorithms to binary data formats that are not ASCII compatible will usually lead to data corruption). In addition to the literal forms, bytes objects can be created in a number of other ways: * A zero-filled bytes object of a specified length: "bytes(10)" * From an iterable of integers: "bytes(range(20))" * Copying existing binary data via the buffer protocol: "bytes(obj)" Also see the bytes built-in. Since 2 hexadecimal digits correspond precisely to a single byte, hexadecimal numbers are a commonly used format for describing binary data. Accordingly, the bytes type has an additional class method to read data in that format: classmethod fromhex(string) This "bytes" class method returns a bytes object, decoding the given string object. The string must contain two hexadecimal digits per byte, with ASCII whitespace being ignored. >>> bytes.fromhex('2Ef0 F1f2 ') b'.\xf0\xf1\xf2' Changed in version 3.7: "bytes.fromhex()" now skips all ASCII whitespace in the string, not just spaces. A reverse conversion function exists to transform a bytes object into its hexadecimal representation. hex([sep[, bytes_per_sep]]) Return a string object containing two hexadecimal digits for each byte in the instance. >>> b'\xf0\xf1\xf2'.hex() 'f0f1f2' If you want to make the hex string easier to read, you can specify a single character separator *sep* parameter to include in the output. By default between each byte. A second optional *bytes_per_sep* parameter controls the spacing. Positive values calculate the separator position from the right, negative values from the left. >>> value = b'\xf0\xf1\xf2' >>> value.hex('-') 'f0-f1-f2' >>> value.hex('_', 2) 'f0_f1f2' >>> b'UUDDLRLRAB'.hex(' ', -4) '55554444 4c524c52 4142' New in version 3.5. Changed in version 3.8: "bytes.hex()" now supports optional *sep* and *bytes_per_sep* parameters to insert separators between bytes in the hex output. Since bytes objects are sequences of integers (akin to a tuple), for a bytes object *b*, "b[0]" will be an integer, while "b[0:1]" will be a bytes object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1) The representation of bytes objects uses the literal format ("b'...'") since it is often more useful than e.g. "bytes([46, 46, 46])". You can always convert a bytes object into a list of integers using "list(b)". Bytearray Objects ----------------- "bytearray" objects are a mutable counterpart to "bytes" objects. class bytearray([source[, encoding[, errors]]]) There is no dedicated literal syntax for bytearray objects, instead they are always created by calling the constructor: * Creating an empty instance: "bytearray()" * Creating a zero-filled instance with a given length: "bytearray(10)" * From an iterable of integers: "bytearray(range(20))" * Copying existing binary data via the buffer protocol: "bytearray(b'Hi!')" As bytearray objects are mutable, they support the mutable sequence operations in addition to the common bytes and bytearray operations described in Bytes and Bytearray Operations. Also see the bytearray built-in. Since 2 hexadecimal digits correspond precisely to a single byte, hexadecimal numbers are a commonly used format for describing binary data. Accordingly, the bytearray type has an additional class method to read data in that format: classmethod fromhex(string) This "bytearray" class method returns bytearray object, decoding the given string object. The string must contain two hexadecimal digits per byte, with ASCII whitespace being ignored. >>> bytearray.fromhex('2Ef0 F1f2 ') bytearray(b'.\xf0\xf1\xf2') Changed in version 3.7: "bytearray.fromhex()" now skips all ASCII whitespace in the string, not just spaces. A reverse conversion function exists to transform a bytearray object into its hexadecimal representation. hex([sep[, bytes_per_sep]]) Return a string object containing two hexadecimal digits for each byte in the instance. >>> bytearray(b'\xf0\xf1\xf2').hex() 'f0f1f2' New in version 3.5. Changed in version 3.8: Similar to "bytes.hex()", "bytearray.hex()" now supports optional *sep* and *bytes_per_sep* parameters to insert separators between bytes in the hex output. Since bytearray objects are sequences of integers (akin to a list), for a bytearray object *b*, "b[0]" will be an integer, while "b[0:1]" will be a bytearray object of length 1. (This contrasts with text strings, where both indexing and slicing will produce a string of length 1) The representation of bytearray objects uses the bytes literal format ("bytearray(b'...')") since it is often more useful than e.g. "bytearray([46, 46, 46])". You can always convert a bytearray object into a list of integers using "list(b)". Bytes and Bytearray Operations ------------------------------ Both bytes and bytearray objects support the common sequence operations. They interoperate not just with operands of the same type, but with any *bytes-like object*. Due to this flexibility, they can be freely mixed in operations without causing errors. However, the return type of the result may depend on the order of operands. Note: The methods on bytes and bytearray objects don’t accept strings as their arguments, just as the methods on strings don’t accept bytes as their arguments. For example, you have to write: a = "abc" b = a.replace("a", "f") and: a = b"abc" b = a.replace(b"a", b"f") Some bytes and bytearray operations assume the use of ASCII compatible binary formats, and hence should be avoided when working with arbitrary binary data. These restrictions are covered below. Note: Using these ASCII based operations to manipulate binary data that is not stored in an ASCII based format may lead to data corruption. The following methods on bytes and bytearray objects can be used with arbitrary binary data. bytes.count(sub[, start[, end]]) bytearray.count(sub[, start[, end]]) Return the number of non-overlapping occurrences of subsequence *sub* in the range [*start*, *end*]. Optional arguments *start* and *end* are interpreted as in slice notation. The subsequence to search for may be any *bytes-like object* or an integer in the range 0 to 255. Changed in version 3.3: Also accept an integer in the range 0 to 255 as the subsequence. bytes.removeprefix(prefix, /) bytearray.removeprefix(prefix, /) If the binary data starts with the *prefix* string, return "bytes[len(prefix):]". Otherwise, return a copy of the original binary data: >>> b'TestHook'.removeprefix(b'Test') b'Hook' >>> b'BaseTestCase'.removeprefix(b'Test') b'BaseTestCase' The *prefix* may be any *bytes-like object*. Note: The bytearray version of this method does *not* operate in place - it always produces a new object, even if no changes were made. New in version 3.9. bytes.removesuffix(suffix, /) bytearray.removesuffix(suffix, /) If the binary data ends with the *suffix* string and that *suffix* is not empty, return "bytes[:-len(suffix)]". Otherwise, return a copy of the original binary data: >>> b'MiscTests'.removesuffix(b'Tests') b'Misc' >>> b'TmpDirMixin'.removesuffix(b'Tests') b'TmpDirMixin' The *suffix* may be any *bytes-like object*. Note: The bytearray version of this method does *not* operate in place - it always produces a new object, even if no changes were made. New in version 3.9. bytes.decode(encoding="utf-8", errors="strict") bytearray.decode(encoding="utf-8", errors="strict") Return a string decoded from the given bytes. Default encoding is "'utf-8'". *errors* may be given to set a different error handling scheme. The default for *errors* is "'strict'", meaning that encoding errors raise a "UnicodeError". Other possible values are "'ignore'", "'replace'" and any other name registered via "codecs.register_error()", see section Error Handlers. For a list of possible encodings, see section Standard Encodings. By default, the *errors* argument is not checked for best performances, but only used at the first decoding error. Enable the Python Development Mode, or use a debug build to check *errors*. Note: Passing the *encoding* argument to "str" allows decoding any *bytes-like object* directly, without needing to make a temporary bytes or bytearray object. Changed in version 3.1: Added support for keyword arguments. Changed in version 3.9: The *errors* is now checked in development mode and in debug mode. bytes.endswith(suffix[, start[, end]]) bytearray.endswith(suffix[, start[, end]]) Return "True" if the binary data ends with the specified *suffix*, otherwise return "False". *suffix* can also be a tuple of suffixes to look for. With optional *start*, test beginning at that position. With optional *end*, stop comparing at that position. The suffix(es) to search for may be any *bytes-like object*. bytes.find(sub[, start[, end]]) bytearray.find(sub[, start[, end]]) Return the lowest index in the data where the subsequence *sub* is found, such that *sub* is contained in the slice "s[start:end]". Optional arguments *start* and *end* are interpreted as in slice notation. Return "-1" if *sub* is not found. The subsequence to search for may be any *bytes-like object* or an integer in the range 0 to 255. Note: The "find()" method should be used only if you need to know the position of *sub*. To check if *sub* is a substring or not, use the "in" operator: >>> b'Py' in b'Python' True Changed in version 3.3: Also accept an integer in the range 0 to 255 as the subsequence. bytes.index(sub[, start[, end]]) bytearray.index(sub[, start[, end]]) Like "find()", but raise "ValueError" when the subsequence is not found. The subsequence to search for may be any *bytes-like object* or an integer in the range 0 to 255. Changed in version 3.3: Also accept an integer in the range 0 to 255 as the subsequence. bytes.join(iterable) bytearray.join(iterable) Return a bytes or bytearray object which is the concatenation of the binary data sequences in *iterable*. A "TypeError" will be raised if there are any values in *iterable* that are not *bytes- like objects*, including "str" objects. The separator between elements is the contents of the bytes or bytearray object providing this method. static bytes.maketrans(from, to) static bytearray.maketrans(from, to) This static method returns a translation table usable for "bytes.translate()" that will map each character in *from* into the character at the same position in *to*; *from* and *to* must both be *bytes-like objects* and have the same length. New in version 3.1. bytes.partition(sep) bytearray.partition(sep) Split the sequence at the first occurrence of *sep*, and return a 3-tuple containing the part before the separator, the separator itself or its bytearray copy, and the part after the separator. If the separator is not found, return a 3-tuple containing a copy of the original sequence, followed by two empty bytes or bytearray objects. The separator to search for may be any *bytes-like object*. bytes.replace(old, new[, count]) bytearray.replace(old, new[, count]) Return a copy of the sequence with all occurrences of subsequence *old* replaced by *new*. If the optional argument *count* is given, only the first *count* occurrences are replaced. The subsequence to search for and its replacement may be any *bytes-like object*. Note: The bytearray version of this method does *not* operate in place - it always produces a new object, even if no changes were made. bytes.rfind(sub[, start[, end]]) bytearray.rfind(sub[, start[, end]]) Return the highest index in the sequence where the subsequence *sub* is found, such that *sub* is contained within "s[start:end]". Optional arguments *start* and *end* are interpreted as in slice notation. Return "-1" on failure. The subsequence to search for may be any *bytes-like object* or an integer in the range 0 to 255. Changed in version 3.3: Also accept an integer in the range 0 to 255 as the subsequence. bytes.rindex(sub[, start[, end]]) bytearray.rindex(sub[, start[, end]]) Like "rfind()" but raises "ValueError" when the subsequence *sub* is not found. The subsequence to search for may be any *bytes-like object* or an integer in the range 0 to 255. Changed in version 3.3: Also accept an integer in the range 0 to 255 as the subsequence. bytes.rpartition(sep) bytearray.rpartition(sep) Split the sequence at the last occurrence of *sep*, and return a 3-tuple containing the part before the separator, the separator itself or its bytearray copy, and the part after the separator. If the separator is not found, return a 3-tuple containing two empty bytes or bytearray objects, followed by a copy of the original sequence. The separator to search for may be any *bytes-like object*. bytes.startswith(prefix[, start[, end]]) bytearray.startswith(prefix[, start[, end]]) Return "True" if the binary data starts with the specified *prefix*, otherwise return "False". *prefix* can also be a tuple of prefixes to look for. With optional *start*, test beginning at that position. With optional *end*, stop comparing at that position. The prefix(es) to search for may be any *bytes-like object*. bytes.translate(table, /, delete=b'') bytearray.translate(table, /, delete=b'') Return a copy of the bytes or bytearray object where all bytes occurring in the optional argument *delete* are removed, and the remaining bytes have been mapped through the given translation table, which must be a bytes object of length 256. You can use the "bytes.maketrans()" method to create a translation table. Set the *table* argument to "None" for translations that only delete characters: >>> b'read this short text'.translate(None, b'aeiou') b'rd ths shrt txt' Changed in version 3.6: *delete* is now supported as a keyword argument. The following methods on bytes and bytearray objects have default behaviours that assume the use of ASCII compatible binary formats, but can still be used with arbitrary binary data by passing appropriate arguments. Note that all of the bytearray methods in this section do *not* operate in place, and instead produce new objects. bytes.center(width[, fillbyte]) bytearray.center(width[, fillbyte]) Return a copy of the object centered in a sequence of length *width*. Padding is done using the specified *fillbyte* (default is an ASCII space). For "bytes" objects, the original sequence is returned if *width* is less than or equal to "len(s)". Note: The bytearray version of this method does *not* operate in place - it always produces a new object, even if no changes were made. bytes.ljust(width[, fillbyte]) bytearray.ljust(width[, fillbyte]) Return a copy of the object left justified in a sequence of length *width*. Padding is done using the specified *fillbyte* (default is an ASCII space). For "bytes" objects, the original sequence is returned if *width* is less than or equal to "len(s)". Note: The bytearray version of this method does *not* operate in place - it always produces a new object, even if no changes were made. bytes.lstrip([chars]) bytearray.lstrip([chars]) Return a copy of the sequence with specified leading bytes removed. The *chars* argument is a binary sequence specifying the set of byte values to be removed - the name refers to the fact this method is usually used with ASCII characters. If omitted or "None", the *chars* argument defaults to removing ASCII whitespace. The *chars* argument is not a prefix; rather, all combinations of its values are stripped: >>> b' spacious '.lstrip() b'spacious ' >>> b'www.example.com'.lstrip(b'cmowz.') b'example.com' The binary sequence of byte values to remove may be any *bytes-like object*. See "removeprefix()" for a method that will remove a single prefix string rather than all of a set of characters. For example: >>> b'Arthur: three!'.lstrip(b'Arthur: ') b'ee!' >>> b'Arthur: three!'.removeprefix(b'Arthur: ') b'three!' Note: The bytearray version of this method does *not* operate in place - it always produces a new object, even if no changes were made. bytes.rjust(width[, fillbyte]) bytearray.rjust(width[, fillbyte]) Return a copy of the object right justified in a sequence of length *width*. Padding is done using the specified *fillbyte* (default is an ASCII space). For "bytes" objects, the original sequence is returned if *width* is less than or equal to "len(s)". Note: The bytearray version of this method does *not* operate in place - it always produces a new object, even if no changes were made. bytes.rsplit(sep=None, maxsplit=-1) bytearray.rsplit(sep=None, maxsplit=-1) Split the binary sequence into subsequences of the same type, using *sep* as the delimiter string. If *maxsplit* is given, at most *maxsplit* splits are done, the *rightmost* ones. If *sep* is not specified or "None", any subsequence consisting solely of ASCII whitespace is a separator. Except for splitting from the right, "rsplit()" behaves like "split()" which is described in detail below. bytes.rstrip([chars]) bytearray.rstrip([chars]) Return a copy of the sequence with specified trailing bytes removed. The *chars* argument is a binary sequence specifying the set of byte values to be removed - the name refers to the fact this method is usually used with ASCII characters. If omitted or "None", the *chars* argument defaults to removing ASCII whitespace. The *chars* argument is not a suffix; rather, all combinations of its values are stripped: >>> b' spacious '.rstrip() b' spacious' >>> b'mississippi'.rstrip(b'ipz') b'mississ' The binary sequence of byte values to remove may be any *bytes-like object*. See "removesuffix()" for a method that will remove a single suffix string rather than all of a set of characters. For example: >>> b'Monty Python'.rstrip(b' Python') b'M' >>> b'Monty Python'.removesuffix(b' Python') b'Monty' Note: The bytearray version of this method does *not* operate in place - it always produces a new object, even if no changes were made. bytes.split(sep=None, maxsplit=-1) bytearray.split(sep=None, maxsplit=-1) Split the binary sequence into subsequences of the same type, using *sep* as the delimiter string. If *maxsplit* is given and non- negative, at most *maxsplit* splits are done (thus, the list will have at most "maxsplit+1" elements). If *maxsplit* is not specified or is "-1", then there is no limit on the number of splits (all possible splits are made). If *sep* is given, consecutive delimiters are not grouped together and are deemed to delimit empty subsequences (for example, "b'1,,2'.split(b',')" returns "[b'1', b'', b'2']"). The *sep* argument may consist of a multibyte sequence (for example, "b'1<>2<>3'.split(b'<>')" returns "[b'1', b'2', b'3']"). Splitting an empty sequence with a specified separator returns "[b'']" or "[bytearray(b'')]" depending on the type of object being split. The *sep* argument may be any *bytes-like object*. For example: >>> b'1,2,3'.split(b',') [b'1', b'2', b'3'] >>> b'1,2,3'.split(b',', maxsplit=1) [b'1', b'2,3'] >>> b'1,2,,3,'.split(b',') [b'1', b'2', b'', b'3', b''] If *sep* is not specified or is "None", a different splitting algorithm is applied: runs of consecutive ASCII whitespace are regarded as a single separator, and the result will contain no empty strings at the start or end if the sequence has leading or trailing whitespace. Consequently, splitting an empty sequence or a sequence consisting solely of ASCII whitespace without a specified separator returns "[]". For example: >>> b'1 2 3'.split() [b'1', b'2', b'3'] >>> b'1 2 3'.split(maxsplit=1) [b'1', b'2 3'] >>> b' 1 2 3 '.split() [b'1', b'2', b'3'] bytes.strip([chars]) bytearray.strip([chars]) Return a copy of the sequence with specified leading and trailing bytes removed. The *chars* argument is a binary sequence specifying the set of byte values to be removed - the name refers to the fact this method is usually used with ASCII characters. If omitted or "None", the *chars* argument defaults to removing ASCII whitespace. The *chars* argument is not a prefix or suffix; rather, all combinations of its values are stripped: >>> b' spacious '.strip() b'spacious' >>> b'www.example.com'.strip(b'cmowz.') b'example' The binary sequence of byte values to remove may be any *bytes-like object*. Note: The bytearray version of this method does *not* operate in place - it always produces a new object, even if no changes were made. The following methods on bytes and bytearray objects assume the use of ASCII compatible binary formats and should not be applied to arbitrary binary data. Note that all of the bytearray methods in this section do *not* operate in place, and instead produce new objects. bytes.capitalize() bytearray.capitalize() Return a copy of the sequence with each byte interpreted as an ASCII character, and the first byte capitalized and the rest lowercased. Non-ASCII byte values are passed through unchanged. Note: The bytearray version of this method does *not* operate in place - it always produces a new object, even if no changes were made. bytes.expandtabs(tabsize=8) bytearray.expandtabs(tabsize=8) Return a copy of the sequence where all ASCII tab characters are replaced by one or more ASCII spaces, depending on the current column and the given tab size. Tab positions occur every *tabsize* bytes (default is 8, giving tab positions at columns 0, 8, 16 and so on). To expand the sequence, the current column is set to zero and the sequence is examined byte by byte. If the byte is an ASCII tab character ("b'\t'"), one or more space characters are inserted in the result until the current column is equal to the next tab position. (The tab character itself is not copied.) If the current byte is an ASCII newline ("b'\n'") or carriage return ("b'\r'"), it is copied and the current column is reset to zero. Any other byte value is copied unchanged and the current column is incremented by one regardless of how the byte value is represented when printed: >>> b'01\t012\t0123\t01234'.expandtabs() b'01 012 0123 01234' >>> b'01\t012\t0123\t01234'.expandtabs(4) b'01 012 0123 01234' Note: The bytearray version of this method does *not* operate in place - it always produces a new object, even if no changes were made. bytes.isalnum() bytearray.isalnum() Return "True" if all bytes in the sequence are alphabetical ASCII characters or ASCII decimal digits and the sequence is not empty, "False" otherwise. Alphabetic ASCII characters are those byte values in the sequence "b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'". ASCII decimal digits are those byte values in the sequence "b'0123456789'". For example: >>> b'ABCabc1'.isalnum() True >>> b'ABC abc1'.isalnum() False bytes.isalpha() bytearray.isalpha() Return "True" if all bytes in the sequence are alphabetic ASCII characters and the sequence is not empty, "False" otherwise. Alphabetic ASCII characters are those byte values in the sequence "b'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ'". For example: >>> b'ABCabc'.isalpha() True >>> b'ABCabc1'.isalpha() False bytes.isascii() bytearray.isascii() Return "True" if the sequence is empty or all bytes in the sequence are ASCII, "False" otherwise. ASCII bytes are in the range 0-0x7F. New in version 3.7. bytes.isdigit() bytearray.isdigit() Return "True" if all bytes in the sequence are ASCII decimal digits and the sequence is not empty, "False" otherwise. ASCII decimal digits are those byte values in the sequence "b'0123456789'". For example: >>> b'1234'.isdigit() True >>> b'1.23'.isdigit() False bytes.islower() bytearray.islower() Return "True" if there is at least one lowercase ASCII character in the sequence and no uppercase ASCII characters, "False" otherwise. For example: >>> b'hello world'.islower() True >>> b'Hello world'.islower() False Lowercase ASCII characters are those byte values in the sequence "b'abcdefghijklmnopqrstuvwxyz'". Uppercase ASCII characters are those byte values in the sequence "b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'". bytes.isspace() bytearray.isspace() Return "True" if all bytes in the sequence are ASCII whitespace and the sequence is not empty, "False" otherwise. ASCII whitespace characters are those byte values in the sequence "b' \t\n\r\x0b\f'" (space, tab, newline, carriage return, vertical tab, form feed). bytes.istitle() bytearray.istitle() Return "True" if the sequence is ASCII titlecase and the sequence is not empty, "False" otherwise. See "bytes.title()" for more details on the definition of “titlecase”. For example: >>> b'Hello World'.istitle() True >>> b'Hello world'.istitle() False bytes.isupper() bytearray.isupper() Return "True" if there is at least one uppercase alphabetic ASCII character in the sequence and no lowercase ASCII characters, "False" otherwise. For example: >>> b'HELLO WORLD'.isupper() True >>> b'Hello world'.isupper() False Lowercase ASCII characters are those byte values in the sequence "b'abcdefghijklmnopqrstuvwxyz'". Uppercase ASCII characters are those byte values in the sequence "b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'". bytes.lower() bytearray.lower() Return a copy of the sequence with all the uppercase ASCII characters converted to their corresponding lowercase counterpart. For example: >>> b'Hello World'.lower() b'hello world' Lowercase ASCII characters are those byte values in the sequence "b'abcdefghijklmnopqrstuvwxyz'". Uppercase ASCII characters are those byte values in the sequence "b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'". Note: The bytearray version of this method does *not* operate in place - it always produces a new object, even if no changes were made. bytes.splitlines(keepends=False) bytearray.splitlines(keepends=False) Return a list of the lines in the binary sequence, breaking at ASCII line boundaries. This method uses the *universal newlines* approach to splitting lines. Line breaks are not included in the resulting list unless *keepends* is given and true. For example: >>> b'ab c\n\nde fg\rkl\r\n'.splitlines() [b'ab c', b'', b'de fg', b'kl'] >>> b'ab c\n\nde fg\rkl\r\n'.splitlines(keepends=True) [b'ab c\n', b'\n', b'de fg\r', b'kl\r\n'] Unlike "split()" when a delimiter string *sep* is given, this method returns an empty list for the empty string, and a terminal line break does not result in an extra line: >>> b"".split(b'\n'), b"Two lines\n".split(b'\n') ([b''], [b'Two lines', b'']) >>> b"".splitlines(), b"One line\n".splitlines() ([], [b'One line']) bytes.swapcase() bytearray.swapcase() Return a copy of the sequence with all the lowercase ASCII characters converted to their corresponding uppercase counterpart and vice-versa. For example: >>> b'Hello World'.swapcase() b'hELLO wORLD' Lowercase ASCII characters are those byte values in the sequence "b'abcdefghijklmnopqrstuvwxyz'". Uppercase ASCII characters are those byte values in the sequence "b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'". Unlike "str.swapcase()", it is always the case that "bin.swapcase().swapcase() == bin" for the binary versions. Case conversions are symmetrical in ASCII, even though that is not generally true for arbitrary Unicode code points. Note: The bytearray version of this method does *not* operate in place - it always produces a new object, even if no changes were made. bytes.title() bytearray.title() Return a titlecased version of the binary sequence where words start with an uppercase ASCII character and the remaining characters are lowercase. Uncased byte values are left unmodified. For example: >>> b'Hello world'.title() b'Hello World' Lowercase ASCII characters are those byte values in the sequence "b'abcdefghijklmnopqrstuvwxyz'". Uppercase ASCII characters are those byte values in the sequence "b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'". All other byte values are uncased. The algorithm uses a simple language-independent definition of a word as groups of consecutive letters. The definition works in many contexts but it means that apostrophes in contractions and possessives form word boundaries, which may not be the desired result: >>> b"they're bill's friends from the UK".title() b"They'Re Bill'S Friends From The Uk" A workaround for apostrophes can be constructed using regular expressions: >>> import re >>> def titlecase(s): ... return re.sub(rb"[A-Za-z]+('[A-Za-z]+)?", ... lambda mo: mo.group(0)[0:1].upper() + ... mo.group(0)[1:].lower(), ... s) ... >>> titlecase(b"they're bill's friends.") b"They're Bill's Friends." Note: The bytearray version of this method does *not* operate in place - it always produces a new object, even if no changes were made. bytes.upper() bytearray.upper() Return a copy of the sequence with all the lowercase ASCII characters converted to their corresponding uppercase counterpart. For example: >>> b'Hello World'.upper() b'HELLO WORLD' Lowercase ASCII characters are those byte values in the sequence "b'abcdefghijklmnopqrstuvwxyz'". Uppercase ASCII characters are those byte values in the sequence "b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'". Note: The bytearray version of this method does *not* operate in place - it always produces a new object, even if no changes were made. bytes.zfill(width) bytearray.zfill(width) Return a copy of the sequence left filled with ASCII "b'0'" digits to make a sequence of length *width*. A leading sign prefix ("b'+'"/ "b'-'") is handled by inserting the padding *after* the sign character rather than before. For "bytes" objects, the original sequence is returned if *width* is less than or equal to "len(seq)". For example: >>> b"42".zfill(5) b'00042' >>> b"-42".zfill(5) b'-0042' Note: The bytearray version of this method does *not* operate in place - it always produces a new object, even if no changes were made. "printf"-style Bytes Formatting ------------------------------- Note: The formatting operations described here exhibit a variety of quirks that lead to a number of common errors (such as failing to display tuples and dictionaries correctly). If the value being printed may be a tuple or dictionary, wrap it in a tuple. Bytes objects ("bytes"/"bytearray") have one unique built-in operation: the "%" operator (modulo). This is also known as the bytes *formatting* or *interpolation* operator. Given "format % values" (where *format* is a bytes object), "%" conversion specifications in *format* are replaced with zero or more elements of *values*. The effect is similar to using the "sprintf()" in the C language. If *format* requires a single argument, *values* may be a single non- tuple object. [5] Otherwise, *values* must be a tuple with exactly the number of items specified by the format bytes object, or a single mapping object (for example, a dictionary). A conversion specifier contains two or more characters and has the following components, which must occur in this order: 1. The "'%'" character, which marks the start of the specifier. 2. Mapping key (optional), consisting of a parenthesised sequence of characters (for example, "(somename)"). 3. Conversion flags (optional), which affect the result of some conversion types. 4. Minimum field width (optional). If specified as an "'*'" (asterisk), the actual width is read from the next element of the tuple in *values*, and the object to convert comes after the minimum field width and optional precision. 5. Precision (optional), given as a "'.'" (dot) followed by the precision. If specified as "'*'" (an asterisk), the actual precision is read from the next element of the tuple in *values*, and the value to convert comes after the precision. 6. Length modifier (optional). 7. Conversion type. When the right argument is a dictionary (or other mapping type), then the formats in the bytes object *must* include a parenthesised mapping key into that dictionary inserted immediately after the "'%'" character. The mapping key selects the value to be formatted from the mapping. For example: >>> print(b'%(language)s has %(number)03d quote types.' % ... {b'language': b"Python", b"number": 2}) b'Python has 002 quote types.' In this case no "*" specifiers may occur in a format (since they require a sequential parameter list). The conversion flag characters are: +-----------+-----------------------------------------------------------------------+ | Flag | Meaning | |===========|=======================================================================| | "'#'" | The value conversion will use the “alternate form” (where defined | | | below). | +-----------+-----------------------------------------------------------------------+ | "'0'" | The conversion will be zero padded for numeric values. | +-----------+-----------------------------------------------------------------------+ | "'-'" | The converted value is left adjusted (overrides the "'0'" conversion | | | if both are given). | +-----------+-----------------------------------------------------------------------+ | "' '" | (a space) A blank should be left before a positive number (or empty | | | string) produced by a signed conversion. | +-----------+-----------------------------------------------------------------------+ | "'+'" | A sign character ("'+'" or "'-'") will precede the conversion | | | (overrides a “space” flag). | +-----------+-----------------------------------------------------------------------+ A length modifier ("h", "l", or "L") may be present, but is ignored as it is not necessary for Python – so e.g. "%ld" is identical to "%d". The conversion types are: +--------------+-------------------------------------------------------+---------+ | Conversion | Meaning | Notes | |==============|=======================================================|=========| | "'d'" | Signed integer decimal. | | +--------------+-------------------------------------------------------+---------+ | "'i'" | Signed integer decimal. | | +--------------+-------------------------------------------------------+---------+ | "'o'" | Signed octal value. | (1) | +--------------+-------------------------------------------------------+---------+ | "'u'" | Obsolete type – it is identical to "'d'". | (8) | +--------------+-------------------------------------------------------+---------+ | "'x'" | Signed hexadecimal (lowercase). | (2) | +--------------+-------------------------------------------------------+---------+ | "'X'" | Signed hexadecimal (uppercase). | (2) | +--------------+-------------------------------------------------------+---------+ | "'e'" | Floating point exponential format (lowercase). | (3) | +--------------+-------------------------------------------------------+---------+ | "'E'" | Floating point exponential format (uppercase). | (3) | +--------------+-------------------------------------------------------+---------+ | "'f'" | Floating point decimal format. | (3) | +--------------+-------------------------------------------------------+---------+ | "'F'" | Floating point decimal format. | (3) | +--------------+-------------------------------------------------------+---------+ | "'g'" | Floating point format. Uses lowercase exponential | (4) | | | format if exponent is less than -4 or not less than | | | | precision, decimal format otherwise. | | +--------------+-------------------------------------------------------+---------+ | "'G'" | Floating point format. Uses uppercase exponential | (4) | | | format if exponent is less than -4 or not less than | | | | precision, decimal format otherwise. | | +--------------+-------------------------------------------------------+---------+ | "'c'" | Single byte (accepts integer or single byte objects). | | +--------------+-------------------------------------------------------+---------+ | "'b'" | Bytes (any object that follows the buffer protocol or | (5) | | | has "__bytes__()"). | | +--------------+-------------------------------------------------------+---------+ | "'s'" | "'s'" is an alias for "'b'" and should only be used | (6) | | | for Python2/3 code bases. | | +--------------+-------------------------------------------------------+---------+ | "'a'" | Bytes (converts any Python object using | (5) | | | "repr(obj).encode('ascii', 'backslashreplace')"). | | +--------------+-------------------------------------------------------+---------+ | "'r'" | "'r'" is an alias for "'a'" and should only be used | (7) | | | for Python2/3 code bases. | | +--------------+-------------------------------------------------------+---------+ | "'%'" | No argument is converted, results in a "'%'" | | | | character in the result. | | +--------------+-------------------------------------------------------+---------+ Notes: 1. The alternate form causes a leading octal specifier ("'0o'") to be inserted before the first digit. 2. The alternate form causes a leading "'0x'" or "'0X'" (depending on whether the "'x'" or "'X'" format was used) to be inserted before the first digit. 3. The alternate form causes the result to always contain a decimal point, even if no digits follow it. The precision determines the number of digits after the decimal point and defaults to 6. 4. The alternate form causes the result to always contain a decimal point, and trailing zeroes are not removed as they would otherwise be. The precision determines the number of significant digits before and after the decimal point and defaults to 6. 5. If precision is "N", the output is truncated to "N" characters. 6. "b'%s'" is deprecated, but will not be removed during the 3.x series. 7. "b'%r'" is deprecated, but will not be removed during the 3.x series. 8. See **PEP 237**. Note: The bytearray version of this method does *not* operate in place - it always produces a new object, even if no changes were made. See also: **PEP 461** - Adding % formatting to bytes and bytearray New in version 3.5. Memory Views ------------ "memoryview" objects allow Python code to access the internal data of an object that supports the buffer protocol without copying. class memoryview(object) Create a "memoryview" that references *object*. *object* must support the buffer protocol. Built-in objects that support the buffer protocol include "bytes" and "bytearray". A "memoryview" has the notion of an *element*, which is the atomic memory unit handled by the originating *object*. For many simple types such as "bytes" and "bytearray", an element is a single byte, but other types such as "array.array" may have bigger elements. "len(view)" is equal to the length of "tolist". If "view.ndim = 0", the length is 1. If "view.ndim = 1", the length is equal to the number of elements in the view. For higher dimensions, the length is equal to the length of the nested list representation of the view. The "itemsize" attribute will give you the number of bytes in a single element. A "memoryview" supports slicing and indexing to expose its data. One-dimensional slicing will result in a subview: >>> v = memoryview(b'abcefg') >>> v[1] 98 >>> v[-1] 103 >>> v[1:4] >>> bytes(v[1:4]) b'bce' If "format" is one of the native format specifiers from the "struct" module, indexing with an integer or a tuple of integers is also supported and returns a single *element* with the correct type. One-dimensional memoryviews can be indexed with an integer or a one-integer tuple. Multi-dimensional memoryviews can be indexed with tuples of exactly *ndim* integers where *ndim* is the number of dimensions. Zero-dimensional memoryviews can be indexed with the empty tuple. Here is an example with a non-byte format: >>> import array >>> a = array.array('l', [-11111111, 22222222, -33333333, 44444444]) >>> m = memoryview(a) >>> m[0] -11111111 >>> m[-1] 44444444 >>> m[::2].tolist() [-11111111, -33333333] If the underlying object is writable, the memoryview supports one- dimensional slice assignment. Resizing is not allowed: >>> data = bytearray(b'abcefg') >>> v = memoryview(data) >>> v.readonly False >>> v[0] = ord(b'z') >>> data bytearray(b'zbcefg') >>> v[1:4] = b'123' >>> data bytearray(b'z123fg') >>> v[2:3] = b'spam' Traceback (most recent call last): File "", line 1, in ValueError: memoryview assignment: lvalue and rvalue have different structures >>> v[2:6] = b'spam' >>> data bytearray(b'z1spam') One-dimensional memoryviews of hashable (read-only) types with formats ‘B’, ‘b’ or ‘c’ are also hashable. The hash is defined as "hash(m) == hash(m.tobytes())": >>> v = memoryview(b'abcefg') >>> hash(v) == hash(b'abcefg') True >>> hash(v[2:4]) == hash(b'ce') True >>> hash(v[::-2]) == hash(b'abcefg'[::-2]) True Changed in version 3.3: One-dimensional memoryviews can now be sliced. One-dimensional memoryviews with formats ‘B’, ‘b’ or ‘c’ are now hashable. Changed in version 3.4: memoryview is now registered automatically with "collections.abc.Sequence" Changed in version 3.5: memoryviews can now be indexed with tuple of integers. "memoryview" has several methods: __eq__(exporter) A memoryview and a **PEP 3118** exporter are equal if their shapes are equivalent and if all corresponding values are equal when the operands’ respective format codes are interpreted using "struct" syntax. For the subset of "struct" format strings currently supported by "tolist()", "v" and "w" are equal if "v.tolist() == w.tolist()": >>> import array >>> a = array.array('I', [1, 2, 3, 4, 5]) >>> b = array.array('d', [1.0, 2.0, 3.0, 4.0, 5.0]) >>> c = array.array('b', [5, 3, 1]) >>> x = memoryview(a) >>> y = memoryview(b) >>> x == a == y == b True >>> x.tolist() == a.tolist() == y.tolist() == b.tolist() True >>> z = y[::-2] >>> z == c True >>> z.tolist() == c.tolist() True If either format string is not supported by the "struct" module, then the objects will always compare as unequal (even if the format strings and buffer contents are identical): >>> from ctypes import BigEndianStructure, c_long >>> class BEPoint(BigEndianStructure): ... _fields_ = [("x", c_long), ("y", c_long)] ... >>> point = BEPoint(100, 200) >>> a = memoryview(point) >>> b = memoryview(point) >>> a == point False >>> a == b False Note that, as with floating point numbers, "v is w" does *not* imply "v == w" for memoryview objects. Changed in version 3.3: Previous versions compared the raw memory disregarding the item format and the logical array structure. tobytes(order=None) Return the data in the buffer as a bytestring. This is equivalent to calling the "bytes" constructor on the memoryview. >>> m = memoryview(b"abc") >>> m.tobytes() b'abc' >>> bytes(m) b'abc' For non-contiguous arrays the result is equal to the flattened list representation with all elements converted to bytes. "tobytes()" supports all format strings, including those that are not in "struct" module syntax. New in version 3.8: *order* can be {‘C’, ‘F’, ‘A’}. When *order* is ‘C’ or ‘F’, the data of the original array is converted to C or Fortran order. For contiguous views, ‘A’ returns an exact copy of the physical memory. In particular, in- memory Fortran order is preserved. For non-contiguous views, the data is converted to C first. *order=None* is the same as *order=’C’*. hex([sep[, bytes_per_sep]]) Return a string object containing two hexadecimal digits for each byte in the buffer. >>> m = memoryview(b"abc") >>> m.hex() '616263' New in version 3.5. Changed in version 3.8: Similar to "bytes.hex()", "memoryview.hex()" now supports optional *sep* and *bytes_per_sep* parameters to insert separators between bytes in the hex output. tolist() Return the data in the buffer as a list of elements. >>> memoryview(b'abc').tolist() [97, 98, 99] >>> import array >>> a = array.array('d', [1.1, 2.2, 3.3]) >>> m = memoryview(a) >>> m.tolist() [1.1, 2.2, 3.3] Changed in version 3.3: "tolist()" now supports all single character native formats in "struct" module syntax as well as multi-dimensional representations. toreadonly() Return a readonly version of the memoryview object. The original memoryview object is unchanged. >>> m = memoryview(bytearray(b'abc')) >>> mm = m.toreadonly() >>> mm.tolist() [89, 98, 99] >>> mm[0] = 42 Traceback (most recent call last): File "", line 1, in TypeError: cannot modify read-only memory >>> m[0] = 43 >>> mm.tolist() [43, 98, 99] New in version 3.8. release() Release the underlying buffer exposed by the memoryview object. Many objects take special actions when a view is held on them (for example, a "bytearray" would temporarily forbid resizing); therefore, calling release() is handy to remove these restrictions (and free any dangling resources) as soon as possible. After this method has been called, any further operation on the view raises a "ValueError" (except "release()" itself which can be called multiple times): >>> m = memoryview(b'abc') >>> m.release() >>> m[0] Traceback (most recent call last): File "", line 1, in ValueError: operation forbidden on released memoryview object The context management protocol can be used for a similar effect, using the "with" statement: >>> with memoryview(b'abc') as m: ... m[0] ... 97 >>> m[0] Traceback (most recent call last): File "", line 1, in ValueError: operation forbidden on released memoryview object New in version 3.2. cast(format[, shape]) Cast a memoryview to a new format or shape. *shape* defaults to "[byte_length//new_itemsize]", which means that the result view will be one-dimensional. The return value is a new memoryview, but the buffer itself is not copied. Supported casts are 1D -> C-*contiguous* and C-contiguous -> 1D. The destination format is restricted to a single element native format in "struct" syntax. One of the formats must be a byte format (‘B’, ‘b’ or ‘c’). The byte length of the result must be the same as the original length. Cast 1D/long to 1D/unsigned bytes: >>> import array >>> a = array.array('l', [1,2,3]) >>> x = memoryview(a) >>> x.format 'l' >>> x.itemsize 8 >>> len(x) 3 >>> x.nbytes 24 >>> y = x.cast('B') >>> y.format 'B' >>> y.itemsize 1 >>> len(y) 24 >>> y.nbytes 24 Cast 1D/unsigned bytes to 1D/char: >>> b = bytearray(b'zyz') >>> x = memoryview(b) >>> x[0] = b'a' Traceback (most recent call last): File "", line 1, in ValueError: memoryview: invalid value for format "B" >>> y = x.cast('c') >>> y[0] = b'a' >>> b bytearray(b'ayz') Cast 1D/bytes to 3D/ints to 1D/signed char: >>> import struct >>> buf = struct.pack("i"*12, *list(range(12))) >>> x = memoryview(buf) >>> y = x.cast('i', shape=[2,2,3]) >>> y.tolist() [[[0, 1, 2], [3, 4, 5]], [[6, 7, 8], [9, 10, 11]]] >>> y.format 'i' >>> y.itemsize 4 >>> len(y) 2 >>> y.nbytes 48 >>> z = y.cast('b') >>> z.format 'b' >>> z.itemsize 1 >>> len(z) 48 >>> z.nbytes 48 Cast 1D/unsigned long to 2D/unsigned long: >>> buf = struct.pack("L"*6, *list(range(6))) >>> x = memoryview(buf) >>> y = x.cast('L', shape=[2,3]) >>> len(y) 2 >>> y.nbytes 48 >>> y.tolist() [[0, 1, 2], [3, 4, 5]] New in version 3.3. Changed in version 3.5: The source format is no longer restricted when casting to a byte view. There are also several readonly attributes available: obj The underlying object of the memoryview: >>> b = bytearray(b'xyz') >>> m = memoryview(b) >>> m.obj is b True New in version 3.3. nbytes "nbytes == product(shape) * itemsize == len(m.tobytes())". This is the amount of space in bytes that the array would use in a contiguous representation. It is not necessarily equal to "len(m)": >>> import array >>> a = array.array('i', [1,2,3,4,5]) >>> m = memoryview(a) >>> len(m) 5 >>> m.nbytes 20 >>> y = m[::2] >>> len(y) 3 >>> y.nbytes 12 >>> len(y.tobytes()) 12 Multi-dimensional arrays: >>> import struct >>> buf = struct.pack("d"*12, *[1.5*x for x in range(12)]) >>> x = memoryview(buf) >>> y = x.cast('d', shape=[3,4]) >>> y.tolist() [[0.0, 1.5, 3.0, 4.5], [6.0, 7.5, 9.0, 10.5], [12.0, 13.5, 15.0, 16.5]] >>> len(y) 3 >>> y.nbytes 96 New in version 3.3. readonly A bool indicating whether the memory is read only. format A string containing the format (in "struct" module style) for each element in the view. A memoryview can be created from exporters with arbitrary format strings, but some methods (e.g. "tolist()") are restricted to native single element formats. Changed in version 3.3: format "'B'" is now handled according to the struct module syntax. This means that "memoryview(b'abc')[0] == b'abc'[0] == 97". itemsize The size in bytes of each element of the memoryview: >>> import array, struct >>> m = memoryview(array.array('H', [32000, 32001, 32002])) >>> m.itemsize 2 >>> m[0] 32000 >>> struct.calcsize('H') == m.itemsize True ndim An integer indicating how many dimensions of a multi-dimensional array the memory represents. shape A tuple of integers the length of "ndim" giving the shape of the memory as an N-dimensional array. Changed in version 3.3: An empty tuple instead of "None" when ndim = 0. strides A tuple of integers the length of "ndim" giving the size in bytes to access each element for each dimension of the array. Changed in version 3.3: An empty tuple instead of "None" when ndim = 0. suboffsets Used internally for PIL-style arrays. The value is informational only. c_contiguous A bool indicating whether the memory is C-*contiguous*. New in version 3.3. f_contiguous A bool indicating whether the memory is Fortran *contiguous*. New in version 3.3. contiguous A bool indicating whether the memory is *contiguous*. New in version 3.3. Set Types — "set", "frozenset" ============================== A *set* object is an unordered collection of distinct *hashable* objects. Common uses include membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. (For other containers see the built-in "dict", "list", and "tuple" classes, and the "collections" module.) Like other collections, sets support "x in set", "len(set)", and "for x in set". Being an unordered collection, sets do not record element position or order of insertion. Accordingly, sets do not support indexing, slicing, or other sequence-like behavior. There are currently two built-in set types, "set" and "frozenset". The "set" type is mutable — the contents can be changed using methods like "add()" and "remove()". Since it is mutable, it has no hash value and cannot be used as either a dictionary key or as an element of another set. The "frozenset" type is immutable and *hashable* — its contents cannot be altered after it is created; it can therefore be used as a dictionary key or as an element of another set. Non-empty sets (not frozensets) can be created by placing a comma- separated list of elements within braces, for example: "{'jack', 'sjoerd'}", in addition to the "set" constructor. The constructors for both classes work the same: class set([iterable]) class frozenset([iterable]) Return a new set or frozenset object whose elements are taken from *iterable*. The elements of a set must be *hashable*. To represent sets of sets, the inner sets must be "frozenset" objects. If *iterable* is not specified, a new empty set is returned. Sets can be created by several means: * Use a comma-separated list of elements within braces: "{'jack', 'sjoerd'}" * Use a set comprehension: "{c for c in 'abracadabra' if c not in 'abc'}" * Use the type constructor: "set()", "set('foobar')", "set(['a', 'b', 'foo'])" Instances of "set" and "frozenset" provide the following operations: len(s) Return the number of elements in set *s* (cardinality of *s*). x in s Test *x* for membership in *s*. x not in s Test *x* for non-membership in *s*. isdisjoint(other) Return "True" if the set has no elements in common with *other*. Sets are disjoint if and only if their intersection is the empty set. issubset(other) set <= other Test whether every element in the set is in *other*. set < other Test whether the set is a proper subset of *other*, that is, "set <= other and set != other". issuperset(other) set >= other Test whether every element in *other* is in the set. set > other Test whether the set is a proper superset of *other*, that is, "set >= other and set != other". union(*others) set | other | ... Return a new set with elements from the set and all others. intersection(*others) set & other & ... Return a new set with elements common to the set and all others. difference(*others) set - other - ... Return a new set with elements in the set that are not in the others. symmetric_difference(other) set ^ other Return a new set with elements in either the set or *other* but not both. copy() Return a shallow copy of the set. Note, the non-operator versions of "union()", "intersection()", "difference()", and "symmetric_difference()", "issubset()", and "issuperset()" methods will accept any iterable as an argument. In contrast, their operator based counterparts require their arguments to be sets. This precludes error-prone constructions like "set('abc') & 'cbs'" in favor of the more readable "set('abc').intersection('cbs')". Both "set" and "frozenset" support set to set comparisons. Two sets are equal if and only if every element of each set is contained in the other (each is a subset of the other). A set is less than another set if and only if the first set is a proper subset of the second set (is a subset, but is not equal). A set is greater than another set if and only if the first set is a proper superset of the second set (is a superset, but is not equal). Instances of "set" are compared to instances of "frozenset" based on their members. For example, "set('abc') == frozenset('abc')" returns "True" and so does "set('abc') in set([frozenset('abc')])". The subset and equality comparisons do not generalize to a total ordering function. For example, any two nonempty disjoint sets are not equal and are not subsets of each other, so *all* of the following return "False": "ab". Since sets only define partial ordering (subset relationships), the output of the "list.sort()" method is undefined for lists of sets. Set elements, like dictionary keys, must be *hashable*. Binary operations that mix "set" instances with "frozenset" return the type of the first operand. For example: "frozenset('ab') | set('bc')" returns an instance of "frozenset". The following table lists operations available for "set" that do not apply to immutable instances of "frozenset": update(*others) set |= other | ... Update the set, adding elements from all others. intersection_update(*others) set &= other & ... Update the set, keeping only elements found in it and all others. difference_update(*others) set -= other | ... Update the set, removing elements found in others. symmetric_difference_update(other) set ^= other Update the set, keeping only elements found in either set, but not in both. add(elem) Add element *elem* to the set. remove(elem) Remove element *elem* from the set. Raises "KeyError" if *elem* is not contained in the set. discard(elem) Remove element *elem* from the set if it is present. pop() Remove and return an arbitrary element from the set. Raises "KeyError" if the set is empty. clear() Remove all elements from the set. Note, the non-operator versions of the "update()", "intersection_update()", "difference_update()", and "symmetric_difference_update()" methods will accept any iterable as an argument. Note, the *elem* argument to the "__contains__()", "remove()", and "discard()" methods may be a set. To support searching for an equivalent frozenset, a temporary one is created from *elem*. Mapping Types — "dict" ====================== A *mapping* object maps *hashable* values to arbitrary objects. Mappings are mutable objects. There is currently only one standard mapping type, the *dictionary*. (For other containers see the built- in "list", "set", and "tuple" classes, and the "collections" module.) A dictionary’s keys are *almost* arbitrary values. Values that are not *hashable*, that is, values containing lists, dictionaries or other mutable types (that are compared by value rather than by object identity) may not be used as keys. Numeric types used for keys obey the normal rules for numeric comparison: if two numbers compare equal (such as "1" and "1.0") then they can be used interchangeably to index the same dictionary entry. (Note however, that since computers store floating-point numbers as approximations it is usually unwise to use them as dictionary keys.) class dict(**kwargs) class dict(mapping, **kwargs) class dict(iterable, **kwargs) Return a new dictionary initialized from an optional positional argument and a possibly empty set of keyword arguments. Dictionaries can be created by several means: * Use a comma-separated list of "key: value" pairs within braces: "{'jack': 4098, 'sjoerd': 4127}" or "{4098: 'jack', 4127: 'sjoerd'}" * Use a dict comprehension: "{}", "{x: x ** 2 for x in range(10)}" * Use the type constructor: "dict()", "dict([('foo', 100), ('bar', 200)])", "dict(foo=100, bar=200)" If no positional argument is given, an empty dictionary is created. If a positional argument is given and it is a mapping object, a dictionary is created with the same key-value pairs as the mapping object. Otherwise, the positional argument must be an *iterable* object. Each item in the iterable must itself be an iterable with exactly two objects. The first object of each item becomes a key in the new dictionary, and the second object the corresponding value. If a key occurs more than once, the last value for that key becomes the corresponding value in the new dictionary. If keyword arguments are given, the keyword arguments and their values are added to the dictionary created from the positional argument. If a key being added is already present, the value from the keyword argument replaces the value from the positional argument. To illustrate, the following examples all return a dictionary equal to "{"one": 1, "two": 2, "three": 3}": >>> a = dict(one=1, two=2, three=3) >>> b = {'one': 1, 'two': 2, 'three': 3} >>> c = dict(zip(['one', 'two', 'three'], [1, 2, 3])) >>> d = dict([('two', 2), ('one', 1), ('three', 3)]) >>> e = dict({'three': 3, 'one': 1, 'two': 2}) >>> f = dict({'one': 1, 'three': 3}, two=2) >>> a == b == c == d == e == f True Providing keyword arguments as in the first example only works for keys that are valid Python identifiers. Otherwise, any valid keys can be used. These are the operations that dictionaries support (and therefore, custom mapping types should support too): list(d) Return a list of all the keys used in the dictionary *d*. len(d) Return the number of items in the dictionary *d*. d[key] Return the item of *d* with key *key*. Raises a "KeyError" if *key* is not in the map. If a subclass of dict defines a method "__missing__()" and *key* is not present, the "d[key]" operation calls that method with the key *key* as argument. The "d[key]" operation then returns or raises whatever is returned or raised by the "__missing__(key)" call. No other operations or methods invoke "__missing__()". If "__missing__()" is not defined, "KeyError" is raised. "__missing__()" must be a method; it cannot be an instance variable: >>> class Counter(dict): ... def __missing__(self, key): ... return 0 >>> c = Counter() >>> c['red'] 0 >>> c['red'] += 1 >>> c['red'] 1 The example above shows part of the implementation of "collections.Counter". A different "__missing__" method is used by "collections.defaultdict". d[key] = value Set "d[key]" to *value*. del d[key] Remove "d[key]" from *d*. Raises a "KeyError" if *key* is not in the map. key in d Return "True" if *d* has a key *key*, else "False". key not in d Equivalent to "not key in d". iter(d) Return an iterator over the keys of the dictionary. This is a shortcut for "iter(d.keys())". clear() Remove all items from the dictionary. copy() Return a shallow copy of the dictionary. classmethod fromkeys(iterable[, value]) Create a new dictionary with keys from *iterable* and values set to *value*. "fromkeys()" is a class method that returns a new dictionary. *value* defaults to "None". All of the values refer to just a single instance, so it generally doesn’t make sense for *value* to be a mutable object such as an empty list. To get distinct values, use a dict comprehension instead. get(key[, default]) Return the value for *key* if *key* is in the dictionary, else *default*. If *default* is not given, it defaults to "None", so that this method never raises a "KeyError". items() Return a new view of the dictionary’s items ("(key, value)" pairs). See the documentation of view objects. keys() Return a new view of the dictionary’s keys. See the documentation of view objects. pop(key[, default]) If *key* is in the dictionary, remove it and return its value, else return *default*. If *default* is not given and *key* is not in the dictionary, a "KeyError" is raised. popitem() Remove and return a "(key, value)" pair from the dictionary. Pairs are returned in LIFO (last-in, first-out) order. "popitem()" is useful to destructively iterate over a dictionary, as often used in set algorithms. If the dictionary is empty, calling "popitem()" raises a "KeyError". Changed in version 3.7: LIFO order is now guaranteed. In prior versions, "popitem()" would return an arbitrary key/value pair. reversed(d) Return a reverse iterator over the keys of the dictionary. This is a shortcut for "reversed(d.keys())". New in version 3.8. setdefault(key[, default]) If *key* is in the dictionary, return its value. If not, insert *key* with a value of *default* and return *default*. *default* defaults to "None". update([other]) Update the dictionary with the key/value pairs from *other*, overwriting existing keys. Return "None". "update()" accepts either another dictionary object or an iterable of key/value pairs (as tuples or other iterables of length two). If keyword arguments are specified, the dictionary is then updated with those key/value pairs: "d.update(red=1, blue=2)". values() Return a new view of the dictionary’s values. See the documentation of view objects. An equality comparison between one "dict.values()" view and another will always return "False". This also applies when comparing "dict.values()" to itself: >>> d = {'a': 1} >>> d.values() == d.values() False d | other Create a new dictionary with the merged keys and values of *d* and *other*, which must both be dictionaries. The values of *other* take priority when *d* and *other* share keys. New in version 3.9. d |= other Update the dictionary *d* with keys and values from *other*, which may be either a *mapping* or an *iterable* of key/value pairs. The values of *other* take priority when *d* and *other* share keys. New in version 3.9. Dictionaries compare equal if and only if they have the same "(key, value)" pairs (regardless of ordering). Order comparisons (‘<’, ‘<=’, ‘>=’, ‘>’) raise "TypeError". Dictionaries preserve insertion order. Note that updating a key does not affect the order. Keys added after deletion are inserted at the end. >>> d = {"one": 1, "two": 2, "three": 3, "four": 4} >>> d {'one': 1, 'two': 2, 'three': 3, 'four': 4} >>> list(d) ['one', 'two', 'three', 'four'] >>> list(d.values()) [1, 2, 3, 4] >>> d["one"] = 42 >>> d {'one': 42, 'two': 2, 'three': 3, 'four': 4} >>> del d["two"] >>> d["two"] = None >>> d {'one': 42, 'three': 3, 'four': 4, 'two': None} Changed in version 3.7: Dictionary order is guaranteed to be insertion order. This behavior was an implementation detail of CPython from 3.6. Dictionaries and dictionary views are reversible. >>> d = {"one": 1, "two": 2, "three": 3, "four": 4} >>> d {'one': 1, 'two': 2, 'three': 3, 'four': 4} >>> list(reversed(d)) ['four', 'three', 'two', 'one'] >>> list(reversed(d.values())) [4, 3, 2, 1] >>> list(reversed(d.items())) [('four', 4), ('three', 3), ('two', 2), ('one', 1)] Changed in version 3.8: Dictionaries are now reversible. See also: "types.MappingProxyType" can be used to create a read-only view of a "dict". Dictionary view objects ----------------------- The objects returned by "dict.keys()", "dict.values()" and "dict.items()" are *view objects*. They provide a dynamic view on the dictionary’s entries, which means that when the dictionary changes, the view reflects these changes. Dictionary views can be iterated over to yield their respective data, and support membership tests: len(dictview) Return the number of entries in the dictionary. iter(dictview) Return an iterator over the keys, values or items (represented as tuples of "(key, value)") in the dictionary. Keys and values are iterated over in insertion order. This allows the creation of "(value, key)" pairs using "zip()": "pairs = zip(d.values(), d.keys())". Another way to create the same list is "pairs = [(v, k) for (k, v) in d.items()]". Iterating views while adding or deleting entries in the dictionary may raise a "RuntimeError" or fail to iterate over all entries. Changed in version 3.7: Dictionary order is guaranteed to be insertion order. x in dictview Return "True" if *x* is in the underlying dictionary’s keys, values or items (in the latter case, *x* should be a "(key, value)" tuple). reversed(dictview) Return a reverse iterator over the keys, values or items of the dictionary. The view will be iterated in reverse order of the insertion. Changed in version 3.8: Dictionary views are now reversible. Keys views are set-like since their entries are unique and hashable. If all values are hashable, so that "(key, value)" pairs are unique and hashable, then the items view is also set-like. (Values views are not treated as set-like since the entries are generally not unique.) For set-like views, all of the operations defined for the abstract base class "collections.abc.Set" are available (for example, "==", "<", or "^"). An example of dictionary view usage: >>> dishes = {'eggs': 2, 'sausage': 1, 'bacon': 1, 'spam': 500} >>> keys = dishes.keys() >>> values = dishes.values() >>> # iteration >>> n = 0 >>> for val in values: ... n += val >>> print(n) 504 >>> # keys and values are iterated over in the same order (insertion order) >>> list(keys) ['eggs', 'sausage', 'bacon', 'spam'] >>> list(values) [2, 1, 1, 500] >>> # view objects are dynamic and reflect dict changes >>> del dishes['eggs'] >>> del dishes['sausage'] >>> list(keys) ['bacon', 'spam'] >>> # set operations >>> keys & {'eggs', 'bacon', 'salad'} {'bacon'} >>> keys ^ {'sausage', 'juice'} {'juice', 'sausage', 'bacon', 'spam'} Context Manager Types ===================== Python’s "with" statement supports the concept of a runtime context defined by a context manager. This is implemented using a pair of methods that allow user-defined classes to define a runtime context that is entered before the statement body is executed and exited when the statement ends: contextmanager.__enter__() Enter the runtime context and return either this object or another object related to the runtime context. The value returned by this method is bound to the identifier in the "as" clause of "with" statements using this context manager. An example of a context manager that returns itself is a *file object*. File objects return themselves from __enter__() to allow "open()" to be used as the context expression in a "with" statement. An example of a context manager that returns a related object is the one returned by "decimal.localcontext()". These managers set the active decimal context to a copy of the original decimal context and then return the copy. This allows changes to be made to the current decimal context in the body of the "with" statement without affecting code outside the "with" statement. contextmanager.__exit__(exc_type, exc_val, exc_tb) Exit the runtime context and return a Boolean flag indicating if any exception that occurred should be suppressed. If an exception occurred while executing the body of the "with" statement, the arguments contain the exception type, value and traceback information. Otherwise, all three arguments are "None". Returning a true value from this method will cause the "with" statement to suppress the exception and continue execution with the statement immediately following the "with" statement. Otherwise the exception continues propagating after this method has finished executing. Exceptions that occur during execution of this method will replace any exception that occurred in the body of the "with" statement. The exception passed in should never be reraised explicitly - instead, this method should return a false value to indicate that the method completed successfully and does not want to suppress the raised exception. This allows context management code to easily detect whether or not an "__exit__()" method has actually failed. Python defines several context managers to support easy thread synchronisation, prompt closure of files or other objects, and simpler manipulation of the active decimal arithmetic context. The specific types are not treated specially beyond their implementation of the context management protocol. See the "contextlib" module for some examples. Python’s *generator*s and the "contextlib.contextmanager" decorator provide a convenient way to implement these protocols. If a generator function is decorated with the "contextlib.contextmanager" decorator, it will return a context manager implementing the necessary "__enter__()" and "__exit__()" methods, rather than the iterator produced by an undecorated generator function. Note that there is no specific slot for any of these methods in the type structure for Python objects in the Python/C API. Extension types wanting to define these methods must provide them as a normal Python accessible method. Compared to the overhead of setting up the runtime context, the overhead of a single class dictionary lookup is negligible. Generic Alias Type ================== "GenericAlias" objects are generally created by subscripting a class. They are most often used with container classes, such as "list" or "dict". For example, "list[int]" is a "GenericAlias" object created by subscripting the "list" class with the argument "int". "GenericAlias" objects are intended primarily for use with *type annotations*. Note: It is generally only possible to subscript a class if the class implements the special method "__class_getitem__()". A "GenericAlias" object acts as a proxy for a *generic type*, implementing *parameterized generics*. For a container class, the argument(s) supplied to a subscription of the class may indicate the type(s) of the elements an object contains. For example, "set[bytes]" can be used in type annotations to signify a "set" in which all the elements are of type "bytes". For a class which defines "__class_getitem__()" but is not a container, the argument(s) supplied to a subscription of the class will often indicate the return type(s) of one or more methods defined on an object. For example, "regular expressions" can be used on both the "str" data type and the "bytes" data type: * If "x = re.search('foo', 'foo')", "x" will be a re.Match object where the return values of "x.group(0)" and "x[0]" will both be of type "str". We can represent this kind of object in type annotations with the "GenericAlias" "re.Match[str]". * If "y = re.search(b'bar', b'bar')", (note the "b" for "bytes"), "y" will also be an instance of "re.Match", but the return values of "y.group(0)" and "y[0]" will both be of type "bytes". In type annotations, we would represent this variety of re.Match objects with "re.Match[bytes]". "GenericAlias" objects are instances of the class "types.GenericAlias", which can also be used to create "GenericAlias" objects directly. T[X, Y, ...] Creates a "GenericAlias" representing a type "T" parameterized by types *X*, *Y*, and more depending on the "T" used. For example, a function expecting a "list" containing "float" elements: def average(values: list[float]) -> float: return sum(values) / len(values) Another example for *mapping* objects, using a "dict", which is a generic type expecting two type parameters representing the key type and the value type. In this example, the function expects a "dict" with keys of type "str" and values of type "int": def send_post_request(url: str, body: dict[str, int]) -> None: ... The builtin functions "isinstance()" and "issubclass()" do not accept "GenericAlias" types for their second argument: >>> isinstance([1, 2], list[str]) Traceback (most recent call last): File "", line 1, in TypeError: isinstance() argument 2 cannot be a parameterized generic The Python runtime does not enforce *type annotations*. This extends to generic types and their type parameters. When creating a container object from a "GenericAlias", the elements in the container are not checked against their type. For example, the following code is discouraged, but will run without errors: >>> t = list[str] >>> t([1, 2, 3]) [1, 2, 3] Furthermore, parameterized generics erase type parameters during object creation: >>> t = list[str] >>> type(t) >>> l = t() >>> type(l) Calling "repr()" or "str()" on a generic shows the parameterized type: >>> repr(list[int]) 'list[int]' >>> str(list[int]) 'list[int]' The "__getitem__()" method of generic containers will raise an exception to disallow mistakes like "dict[str][str]": >>> dict[str][str] Traceback (most recent call last): File "", line 1, in TypeError: There are no type variables left in dict[str] However, such expressions are valid when type variables are used. The index must have as many elements as there are type variable items in the "GenericAlias" object’s "__args__". >>> from typing import TypeVar >>> Y = TypeVar('Y') >>> dict[str, Y][int] dict[str, int] Standard Generic Classes ------------------------ The following standard library classes support parameterized generics. This list is non-exhaustive. * "tuple" * "list" * "dict" * "set" * "frozenset" * "type" * "collections.deque" * "collections.defaultdict" * "collections.OrderedDict" * "collections.Counter" * "collections.ChainMap" * "collections.abc.Awaitable" * "collections.abc.Coroutine" * "collections.abc.AsyncIterable" * "collections.abc.AsyncIterator" * "collections.abc.AsyncGenerator" * "collections.abc.Iterable" * "collections.abc.Iterator" * "collections.abc.Generator" * "collections.abc.Reversible" * "collections.abc.Container" * "collections.abc.Collection" * "collections.abc.Callable" * "collections.abc.Set" * "collections.abc.MutableSet" * "collections.abc.Mapping" * "collections.abc.MutableMapping" * "collections.abc.Sequence" * "collections.abc.MutableSequence" * "collections.abc.ByteString" * "collections.abc.MappingView" * "collections.abc.KeysView" * "collections.abc.ItemsView" * "collections.abc.ValuesView" * "contextlib.AbstractContextManager" * "contextlib.AbstractAsyncContextManager" * "dataclasses.Field" * "functools.cached_property" * "functools.partialmethod" * "os.PathLike" * "queue.LifoQueue" * "queue.Queue" * "queue.PriorityQueue" * "queue.SimpleQueue" * re.Pattern * re.Match * "shelve.BsdDbShelf" * "shelve.DbfilenameShelf" * "shelve.Shelf" * "types.MappingProxyType" * "weakref.WeakKeyDictionary" * "weakref.WeakMethod" * "weakref.WeakSet" * "weakref.WeakValueDictionary" Special Attributes of "GenericAlias" objects -------------------------------------------- All parameterized generics implement special read-only attributes. genericalias.__origin__ This attribute points at the non-parameterized generic class: >>> list[int].__origin__ genericalias.__args__ This attribute is a "tuple" (possibly of length 1) of generic types passed to the original "__class_getitem__()" of the generic class: >>> dict[str, list[int]].__args__ (, list[int]) genericalias.__parameters__ This attribute is a lazily computed tuple (possibly empty) of unique type variables found in "__args__": >>> from typing import TypeVar >>> T = TypeVar('T') >>> list[T].__parameters__ (~T,) See also: **PEP 484** - Type Hints Introducing Python’s framework for type annotations. **PEP 585** - Type Hinting Generics In Standard Collections Introducing the ability to natively parameterize standard-library classes, provided they implement the special class method "__class_getitem__()". Generics, user-defined generics and "typing.Generic" Documentation on how to implement generic classes that can be parameterized at runtime and understood by static type-checkers. New in version 3.9. Other Built-in Types ==================== The interpreter supports several other kinds of objects. Most of these support only one or two operations. Modules ------- The only special operation on a module is attribute access: "m.name", where *m* is a module and *name* accesses a name defined in *m*’s symbol table. Module attributes can be assigned to. (Note that the "import" statement is not, strictly speaking, an operation on a module object; "import foo" does not require a module object named *foo* to exist, rather it requires an (external) *definition* for a module named *foo* somewhere.) A special attribute of every module is "__dict__". This is the dictionary containing the module’s symbol table. Modifying this dictionary will actually change the module’s symbol table, but direct assignment to the "__dict__" attribute is not possible (you can write "m.__dict__['a'] = 1", which defines "m.a" to be "1", but you can’t write "m.__dict__ = {}"). Modifying "__dict__" directly is not recommended. Modules built into the interpreter are written like this: "". If loaded from a file, they are written as "". Classes and Class Instances --------------------------- See Objects, values and types and Class definitions for these. Functions --------- Function objects are created by function definitions. The only operation on a function object is to call it: "func(argument-list)". There are really two flavors of function objects: built-in functions and user-defined functions. Both support the same operation (to call the function), but the implementation is different, hence the different object types. See Function definitions for more information. Methods ------- Methods are functions that are called using the attribute notation. There are two flavors: built-in methods (such as "append()" on lists) and class instance methods. Built-in methods are described with the types that support them. If you access a method (a function defined in a class namespace) through an instance, you get a special object: a *bound method* (also called *instance method*) object. When called, it will add the "self" argument to the argument list. Bound methods have two special read- only attributes: "m.__self__" is the object on which the method operates, and "m.__func__" is the function implementing the method. Calling "m(arg-1, arg-2, ..., arg-n)" is completely equivalent to calling "m.__func__(m.__self__, arg-1, arg-2, ..., arg-n)". Like function objects, bound method objects support getting arbitrary attributes. However, since method attributes are actually stored on the underlying function object ("meth.__func__"), setting method attributes on bound methods is disallowed. Attempting to set an attribute on a method results in an "AttributeError" being raised. In order to set a method attribute, you need to explicitly set it on the underlying function object: >>> class C: ... def method(self): ... pass ... >>> c = C() >>> c.method.whoami = 'my name is method' # can't set on the method Traceback (most recent call last): File "", line 1, in AttributeError: 'method' object has no attribute 'whoami' >>> c.method.__func__.whoami = 'my name is method' >>> c.method.whoami 'my name is method' See The standard type hierarchy for more information. Code Objects ------------ Code objects are used by the implementation to represent “pseudo- compiled” executable Python code such as a function body. They differ from function objects because they don’t contain a reference to their global execution environment. Code objects are returned by the built- in "compile()" function and can be extracted from function objects through their "__code__" attribute. See also the "code" module. Accessing "__code__" raises an auditing event "object.__getattr__" with arguments "obj" and ""__code__"". A code object can be executed or evaluated by passing it (instead of a source string) to the "exec()" or "eval()" built-in functions. See The standard type hierarchy for more information. Type Objects ------------ Type objects represent the various object types. An object’s type is accessed by the built-in function "type()". There are no special operations on types. The standard module "types" defines names for all standard built-in types. Types are written like this: "". The Null Object --------------- This object is returned by functions that don’t explicitly return a value. It supports no special operations. There is exactly one null object, named "None" (a built-in name). "type(None)()" produces the same singleton. It is written as "None". The Ellipsis Object ------------------- This object is commonly used by slicing (see Slicings). It supports no special operations. There is exactly one ellipsis object, named "Ellipsis" (a built-in name). "type(Ellipsis)()" produces the "Ellipsis" singleton. It is written as "Ellipsis" or "...". The NotImplemented Object ------------------------- This object is returned from comparisons and binary operations when they are asked to operate on types they don’t support. See Comparisons for more information. There is exactly one "NotImplemented" object. "type(NotImplemented)()" produces the singleton instance. It is written as "NotImplemented". Boolean Values -------------- Boolean values are the two constant objects "False" and "True". They are used to represent truth values (although other values can also be considered false or true). In numeric contexts (for example when used as the argument to an arithmetic operator), they behave like the integers 0 and 1, respectively. The built-in function "bool()" can be used to convert any value to a Boolean, if the value can be interpreted as a truth value (see section Truth Value Testing above). They are written as "False" and "True", respectively. Internal Objects ---------------- See The standard type hierarchy for this information. It describes stack frame objects, traceback objects, and slice objects. Special Attributes ================== The implementation adds a few special read-only attributes to several object types, where they are relevant. Some of these are not reported by the "dir()" built-in function. object.__dict__ A dictionary or other mapping object used to store an object’s (writable) attributes. instance.__class__ The class to which a class instance belongs. class.__bases__ The tuple of base classes of a class object. definition.__name__ The name of the class, function, method, descriptor, or generator instance. definition.__qualname__ The *qualified name* of the class, function, method, descriptor, or generator instance. New in version 3.3. class.__mro__ This attribute is a tuple of classes that are considered when looking for base classes during method resolution. class.mro() This method can be overridden by a metaclass to customize the method resolution order for its instances. It is called at class instantiation, and its result is stored in "__mro__". class.__subclasses__() Each class keeps a list of weak references to its immediate subclasses. This method returns a list of all those references still alive. The list is in definition order. Example: >>> int.__subclasses__() [] Integer string conversion length limitation =========================================== CPython has a global limit for converting between "int" and "str" to mitigate denial of service attacks. This limit *only* applies to decimal or other non-power-of-two number bases. Hexadecimal, octal, and binary conversions are unlimited. The limit can be configured. The "int" type in CPython is an arbitrary length number stored in binary form (commonly known as a “bignum”). There exists no algorithm that can convert a string to a binary integer or a binary integer to a string in linear time, *unless* the base is a power of 2. Even the best known algorithms for base 10 have sub-quadratic complexity. Converting a large value such as "int('1' * 500_000)" can take over a second on a fast CPU. Limiting conversion size offers a practical way to avoid CVE-2020-10735. The limit is applied to the number of digit characters in the input or output string when a non-linear conversion algorithm would be involved. Underscores and the sign are not counted towards the limit. When an operation would exceed the limit, a "ValueError" is raised: >>> import sys >>> sys.set_int_max_str_digits(4300) # Illustrative, this is the default. >>> _ = int('2' * 5432) Traceback (most recent call last): ... ValueError: Exceeds the limit (4300) for integer string conversion: value has 5432 digits; use sys.set_int_max_str_digits() to increase the limit. >>> i = int('2' * 4300) >>> len(str(i)) 4300 >>> i_squared = i*i >>> len(str(i_squared)) Traceback (most recent call last): ... ValueError: Exceeds the limit (4300) for integer string conversion: value has 8599 digits; use sys.set_int_max_str_digits() to increase the limit. >>> len(hex(i_squared)) 7144 >>> assert int(hex(i_squared), base=16) == i*i # Hexadecimal is unlimited. The default limit is 4300 digits as provided in "sys.int_info.default_max_str_digits". The lowest limit that can be configured is 640 digits as provided in "sys.int_info.str_digits_check_threshold". Verification: >>> import sys >>> assert sys.int_info.default_max_str_digits == 4300, sys.int_info >>> assert sys.int_info.str_digits_check_threshold == 640, sys.int_info >>> msg = int('578966293710682886880994035146873798396722250538762761564' ... '9252925514383915483333812743580549779436104706260696366600' ... '571186405732').to_bytes(53, 'big') ... New in version 3.9.14. Affected APIs ------------- The limitation only applies to potentially slow conversions between "int" and "str" or "bytes": * "int(string)" with default base 10. * "int(string, base)" for all bases that are not a power of 2. * "str(integer)". * "repr(integer)". * any other string conversion to base 10, for example "f"{integer}"", ""{}".format(integer)", or "b"%d" % integer". The limitations do not apply to functions with a linear algorithm: * "int(string, base)" with base 2, 4, 8, 16, or 32. * "int.from_bytes()" and "int.to_bytes()". * "hex()", "oct()", "bin()". * Format Specification Mini-Language for hex, octal, and binary numbers. * "str" to "float". * "str" to "decimal.Decimal". Configuring the limit --------------------- Before Python starts up you can use an environment variable or an interpreter command line flag to configure the limit: * "PYTHONINTMAXSTRDIGITS", e.g. "PYTHONINTMAXSTRDIGITS=640 python3" to set the limit to 640 or "PYTHONINTMAXSTRDIGITS=0 python3" to disable the limitation. * "-X int_max_str_digits", e.g. "python3 -X int_max_str_digits=640" * "sys.flags.int_max_str_digits" contains the value of "PYTHONINTMAXSTRDIGITS" or "-X int_max_str_digits". If both the env var and the "-X" option are set, the "-X" option takes precedence. A value of *-1* indicates that both were unset, thus a value of "sys.int_info.default_max_str_digits" was used during initialization. From code, you can inspect the current limit and set a new one using these "sys" APIs: * "sys.get_int_max_str_digits()" and "sys.set_int_max_str_digits()" are a getter and setter for the interpreter-wide limit. Subinterpreters have their own limit. Information about the default and minimum can be found in "sys.int_info": * "sys.int_info.default_max_str_digits" is the compiled-in default limit. * "sys.int_info.str_digits_check_threshold" is the lowest accepted value for the limit (other than 0 which disables it). New in version 3.9.14. Caution: Setting a low limit *can* lead to problems. While rare, code exists that contains integer constants in decimal in their source that exceed the minimum threshold. A consequence of setting the limit is that Python source code containing decimal integer literals longer than the limit will encounter an error during parsing, usually at startup time or import time or even at installation time - anytime an up to date ".pyc" does not already exist for the code. A workaround for source that contains such large constants is to convert them to "0x" hexadecimal form as it has no limit.Test your application thoroughly if you use a low limit. Ensure your tests run with the limit set early via the environment or flag so that it applies during startup and even during any installation step that may invoke Python to precompile ".py" sources to ".pyc" files. Recommended configuration ------------------------- The default "sys.int_info.default_max_str_digits" is expected to be reasonable for most applications. If your application requires a different limit, set it from your main entry point using Python version agnostic code as these APIs were added in security patch releases in versions before 3.11. Example: >>> import sys >>> if hasattr(sys, "set_int_max_str_digits"): ... upper_bound = 68000 ... lower_bound = 4004 ... current_limit = sys.get_int_max_str_digits() ... if current_limit == 0 or current_limit > upper_bound: ... sys.set_int_max_str_digits(upper_bound) ... elif current_limit < lower_bound: ... sys.set_int_max_str_digits(lower_bound) If you need to disable it entirely, set it to "0". -[ Footnotes ]- [1] Additional information on these special methods may be found in the Python Reference Manual (Basic customization). [2] As a consequence, the list "[1, 2]" is considered equal to "[1.0, 2.0]", and similarly for tuples. [3] They must have since the parser can’t tell the type of the operands. [4] Cased characters are those with general category property being one of “Lu” (Letter, uppercase), “Ll” (Letter, lowercase), or “Lt” (Letter, titlecase). [5] To format only a tuple you should therefore provide a singleton tuple whose only element is the tuple to be formatted. "string" — Common string operations *********************************** **Source code:** Lib/string.py ====================================================================== See also: Text Sequence Type — str String Methods String constants ================ The constants defined in this module are: string.ascii_letters The concatenation of the "ascii_lowercase" and "ascii_uppercase" constants described below. This value is not locale-dependent. string.ascii_lowercase The lowercase letters "'abcdefghijklmnopqrstuvwxyz'". This value is not locale-dependent and will not change. string.ascii_uppercase The uppercase letters "'ABCDEFGHIJKLMNOPQRSTUVWXYZ'". This value is not locale-dependent and will not change. string.digits The string "'0123456789'". string.hexdigits The string "'0123456789abcdefABCDEF'". string.octdigits The string "'01234567'". string.punctuation String of ASCII characters which are considered punctuation characters in the "C" locale: "!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~". string.printable String of ASCII characters which are considered printable. This is a combination of "digits", "ascii_letters", "punctuation", and "whitespace". string.whitespace A string containing all ASCII characters that are considered whitespace. This includes the characters space, tab, linefeed, return, formfeed, and vertical tab. Custom String Formatting ======================== The built-in string class provides the ability to do complex variable substitutions and value formatting via the "format()" method described in **PEP 3101**. The "Formatter" class in the "string" module allows you to create and customize your own string formatting behaviors using the same implementation as the built-in "format()" method. class string.Formatter The "Formatter" class has the following public methods: format(format_string, /, *args, **kwargs) The primary API method. It takes a format string and an arbitrary set of positional and keyword arguments. It is just a wrapper that calls "vformat()". Changed in version 3.7: A format string argument is now positional-only. vformat(format_string, args, kwargs) This function does the actual work of formatting. It is exposed as a separate function for cases where you want to pass in a predefined dictionary of arguments, rather than unpacking and repacking the dictionary as individual arguments using the "*args" and "**kwargs" syntax. "vformat()" does the work of breaking up the format string into character data and replacement fields. It calls the various methods described below. In addition, the "Formatter" defines a number of methods that are intended to be replaced by subclasses: parse(format_string) Loop over the format_string and return an iterable of tuples (*literal_text*, *field_name*, *format_spec*, *conversion*). This is used by "vformat()" to break the string into either literal text, or replacement fields. The values in the tuple conceptually represent a span of literal text followed by a single replacement field. If there is no literal text (which can happen if two replacement fields occur consecutively), then *literal_text* will be a zero-length string. If there is no replacement field, then the values of *field_name*, *format_spec* and *conversion* will be "None". get_field(field_name, args, kwargs) Given *field_name* as returned by "parse()" (see above), convert it to an object to be formatted. Returns a tuple (obj, used_key). The default version takes strings of the form defined in **PEP 3101**, such as “0[name]” or “label.title”. *args* and *kwargs* are as passed in to "vformat()". The return value *used_key* has the same meaning as the *key* parameter to "get_value()". get_value(key, args, kwargs) Retrieve a given field value. The *key* argument will be either an integer or a string. If it is an integer, it represents the index of the positional argument in *args*; if it is a string, then it represents a named argument in *kwargs*. The *args* parameter is set to the list of positional arguments to "vformat()", and the *kwargs* parameter is set to the dictionary of keyword arguments. For compound field names, these functions are only called for the first component of the field name; subsequent components are handled through normal attribute and indexing operations. So for example, the field expression ‘0.name’ would cause "get_value()" to be called with a *key* argument of 0. The "name" attribute will be looked up after "get_value()" returns by calling the built-in "getattr()" function. If the index or keyword refers to an item that does not exist, then an "IndexError" or "KeyError" should be raised. check_unused_args(used_args, args, kwargs) Implement checking for unused arguments if desired. The arguments to this function is the set of all argument keys that were actually referred to in the format string (integers for positional arguments, and strings for named arguments), and a reference to the *args* and *kwargs* that was passed to vformat. The set of unused args can be calculated from these parameters. "check_unused_args()" is assumed to raise an exception if the check fails. format_field(value, format_spec) "format_field()" simply calls the global "format()" built-in. The method is provided so that subclasses can override it. convert_field(value, conversion) Converts the value (returned by "get_field()") given a conversion type (as in the tuple returned by the "parse()" method). The default version understands ‘s’ (str), ‘r’ (repr) and ‘a’ (ascii) conversion types. Format String Syntax ==================== The "str.format()" method and the "Formatter" class share the same syntax for format strings (although in the case of "Formatter", subclasses can define their own format string syntax). The syntax is related to that of formatted string literals, but it is less sophisticated and, in particular, does not support arbitrary expressions. Format strings contain “replacement fields” surrounded by curly braces "{}". Anything that is not contained in braces is considered literal text, which is copied unchanged to the output. If you need to include a brace character in the literal text, it can be escaped by doubling: "{{" and "}}". The grammar for a replacement field is as follows: replacement_field ::= "{" [field_name] ["!" conversion] [":" format_spec] "}" field_name ::= arg_name ("." attribute_name | "[" element_index "]")* arg_name ::= [identifier | digit+] attribute_name ::= identifier element_index ::= digit+ | index_string index_string ::= + conversion ::= "r" | "s" | "a" format_spec ::= In less formal terms, the replacement field can start with a *field_name* that specifies the object whose value is to be formatted and inserted into the output instead of the replacement field. The *field_name* is optionally followed by a *conversion* field, which is preceded by an exclamation point "'!'", and a *format_spec*, which is preceded by a colon "':'". These specify a non-default format for the replacement value. See also the Format Specification Mini-Language section. The *field_name* itself begins with an *arg_name* that is either a number or a keyword. If it’s a number, it refers to a positional argument, and if it’s a keyword, it refers to a named keyword argument. If the numerical arg_names in a format string are 0, 1, 2, … in sequence, they can all be omitted (not just some) and the numbers 0, 1, 2, … will be automatically inserted in that order. Because *arg_name* is not quote-delimited, it is not possible to specify arbitrary dictionary keys (e.g., the strings "'10'" or "':-]'") within a format string. The *arg_name* can be followed by any number of index or attribute expressions. An expression of the form "'.name'" selects the named attribute using "getattr()", while an expression of the form "'[index]'" does an index lookup using "__getitem__()". Changed in version 3.1: The positional argument specifiers can be omitted for "str.format()", so "'{} {}'.format(a, b)" is equivalent to "'{0} {1}'.format(a, b)". Changed in version 3.4: The positional argument specifiers can be omitted for "Formatter". Some simple format string examples: "First, thou shalt count to {0}" # References first positional argument "Bring me a {}" # Implicitly references the first positional argument "From {} to {}" # Same as "From {0} to {1}" "My quest is {name}" # References keyword argument 'name' "Weight in tons {0.weight}" # 'weight' attribute of first positional arg "Units destroyed: {players[0]}" # First element of keyword argument 'players'. The *conversion* field causes a type coercion before formatting. Normally, the job of formatting a value is done by the "__format__()" method of the value itself. However, in some cases it is desirable to force a type to be formatted as a string, overriding its own definition of formatting. By converting the value to a string before calling "__format__()", the normal formatting logic is bypassed. Three conversion flags are currently supported: "'!s'" which calls "str()" on the value, "'!r'" which calls "repr()" and "'!a'" which calls "ascii()". Some examples: "Harold's a clever {0!s}" # Calls str() on the argument first "Bring out the holy {name!r}" # Calls repr() on the argument first "More {!a}" # Calls ascii() on the argument first The *format_spec* field contains a specification of how the value should be presented, including such details as field width, alignment, padding, decimal precision and so on. Each value type can define its own “formatting mini-language” or interpretation of the *format_spec*. Most built-in types support a common formatting mini-language, which is described in the next section. A *format_spec* field can also include nested replacement fields within it. These nested replacement fields may contain a field name, conversion flag and format specification, but deeper nesting is not allowed. The replacement fields within the format_spec are substituted before the *format_spec* string is interpreted. This allows the formatting of a value to be dynamically specified. See the Format examples section for some examples. Format Specification Mini-Language ---------------------------------- “Format specifications” are used within replacement fields contained within a format string to define how individual values are presented (see Format String Syntax and Formatted string literals). They can also be passed directly to the built-in "format()" function. Each formattable type may define how the format specification is to be interpreted. Most built-in types implement the following options for format specifications, although some of the formatting options are only supported by the numeric types. A general convention is that an empty format specification produces the same result as if you had called "str()" on the value. A non-empty format specification typically modifies the result. The general form of a *standard format specifier* is: format_spec ::= [[fill]align][sign][#][0][width][grouping_option][.precision][type] fill ::= align ::= "<" | ">" | "=" | "^" sign ::= "+" | "-" | " " width ::= digit+ grouping_option ::= "_" | "," precision ::= digit+ type ::= "b" | "c" | "d" | "e" | "E" | "f" | "F" | "g" | "G" | "n" | "o" | "s" | "x" | "X" | "%" If a valid *align* value is specified, it can be preceded by a *fill* character that can be any character and defaults to a space if omitted. It is not possible to use a literal curly brace (”"{"” or “"}"”) as the *fill* character in a formatted string literal or when using the "str.format()" method. However, it is possible to insert a curly brace with a nested replacement field. This limitation doesn’t affect the "format()" function. The meaning of the various alignment options is as follows: +-----------+------------------------------------------------------------+ | Option | Meaning | |===========|============================================================| | "'<'" | Forces the field to be left-aligned within the available | | | space (this is the default for most objects). | +-----------+------------------------------------------------------------+ | "'>'" | Forces the field to be right-aligned within the available | | | space (this is the default for numbers). | +-----------+------------------------------------------------------------+ | "'='" | Forces the padding to be placed after the sign (if any) | | | but before the digits. This is used for printing fields | | | in the form ‘+000000120’. This alignment option is only | | | valid for numeric types. It becomes the default when ‘0’ | | | immediately precedes the field width. | +-----------+------------------------------------------------------------+ | "'^'" | Forces the field to be centered within the available | | | space. | +-----------+------------------------------------------------------------+ Note that unless a minimum field width is defined, the field width will always be the same size as the data to fill it, so that the alignment option has no meaning in this case. The *sign* option is only valid for number types, and can be one of the following: +-----------+------------------------------------------------------------+ | Option | Meaning | |===========|============================================================| | "'+'" | indicates that a sign should be used for both positive as | | | well as negative numbers. | +-----------+------------------------------------------------------------+ | "'-'" | indicates that a sign should be used only for negative | | | numbers (this is the default behavior). | +-----------+------------------------------------------------------------+ | space | indicates that a leading space should be used on positive | | | numbers, and a minus sign on negative numbers. | +-----------+------------------------------------------------------------+ The "'#'" option causes the “alternate form” to be used for the conversion. The alternate form is defined differently for different types. This option is only valid for integer, float and complex types. For integers, when binary, octal, or hexadecimal output is used, this option adds the respective prefix "'0b'", "'0o'", "'0x'", or "'0X'" to the output value. For float and complex the alternate form causes the result of the conversion to always contain a decimal- point character, even if no digits follow it. Normally, a decimal- point character appears in the result of these conversions only if a digit follows it. In addition, for "'g'" and "'G'" conversions, trailing zeros are not removed from the result. The "','" option signals the use of a comma for a thousands separator. For a locale aware separator, use the "'n'" integer presentation type instead. Changed in version 3.1: Added the "','" option (see also **PEP 378**). The "'_'" option signals the use of an underscore for a thousands separator for floating point presentation types and for integer presentation type "'d'". For integer presentation types "'b'", "'o'", "'x'", and "'X'", underscores will be inserted every 4 digits. For other presentation types, specifying this option is an error. Changed in version 3.6: Added the "'_'" option (see also **PEP 515**). *width* is a decimal integer defining the minimum total field width, including any prefixes, separators, and other formatting characters. If not specified, then the field width will be determined by the content. When no explicit alignment is given, preceding the *width* field by a zero ("'0'") character enables sign-aware zero-padding for numeric types. This is equivalent to a *fill* character of "'0'" with an *alignment* type of "'='". The *precision* is a decimal integer indicating how many digits should be displayed after the decimal point for presentation types "'f'" and "'F'", or before and after the decimal point for presentation types "'g'" or "'G'". For string presentation types the field indicates the maximum field size - in other words, how many characters will be used from the field content. The *precision* is not allowed for integer presentation types. Finally, the *type* determines how the data should be presented. The available string presentation types are: +-----------+------------------------------------------------------------+ | Type | Meaning | |===========|============================================================| | "'s'" | String format. This is the default type for strings and | | | may be omitted. | +-----------+------------------------------------------------------------+ | None | The same as "'s'". | +-----------+------------------------------------------------------------+ The available integer presentation types are: +-----------+------------------------------------------------------------+ | Type | Meaning | |===========|============================================================| | "'b'" | Binary format. Outputs the number in base 2. | +-----------+------------------------------------------------------------+ | "'c'" | Character. Converts the integer to the corresponding | | | unicode character before printing. | +-----------+------------------------------------------------------------+ | "'d'" | Decimal Integer. Outputs the number in base 10. | +-----------+------------------------------------------------------------+ | "'o'" | Octal format. Outputs the number in base 8. | +-----------+------------------------------------------------------------+ | "'x'" | Hex format. Outputs the number in base 16, using lower- | | | case letters for the digits above 9. | +-----------+------------------------------------------------------------+ | "'X'" | Hex format. Outputs the number in base 16, using upper- | | | case letters for the digits above 9. In case "'#'" is | | | specified, the prefix "'0x'" will be upper-cased to "'0X'" | | | as well. | +-----------+------------------------------------------------------------+ | "'n'" | Number. This is the same as "'d'", except that it uses the | | | current locale setting to insert the appropriate number | | | separator characters. | +-----------+------------------------------------------------------------+ | None | The same as "'d'". | +-----------+------------------------------------------------------------+ In addition to the above presentation types, integers can be formatted with the floating point presentation types listed below (except "'n'" and "None"). When doing so, "float()" is used to convert the integer to a floating point number before formatting. The available presentation types for "float" and "Decimal" values are: +-----------+------------------------------------------------------------+ | Type | Meaning | |===========|============================================================| | "'e'" | Scientific notation. For a given precision "p", formats | | | the number in scientific notation with the letter ‘e’ | | | separating the coefficient from the exponent. The | | | coefficient has one digit before and "p" digits after the | | | decimal point, for a total of "p + 1" significant digits. | | | With no precision given, uses a precision of "6" digits | | | after the decimal point for "float", and shows all | | | coefficient digits for "Decimal". If no digits follow the | | | decimal point, the decimal point is also removed unless | | | the "#" option is used. | +-----------+------------------------------------------------------------+ | "'E'" | Scientific notation. Same as "'e'" except it uses an upper | | | case ‘E’ as the separator character. | +-----------+------------------------------------------------------------+ | "'f'" | Fixed-point notation. For a given precision "p", formats | | | the number as a decimal number with exactly "p" digits | | | following the decimal point. With no precision given, uses | | | a precision of "6" digits after the decimal point for | | | "float", and uses a precision large enough to show all | | | coefficient digits for "Decimal". If no digits follow the | | | decimal point, the decimal point is also removed unless | | | the "#" option is used. | +-----------+------------------------------------------------------------+ | "'F'" | Fixed-point notation. Same as "'f'", but converts "nan" to | | | "NAN" and "inf" to "INF". | +-----------+------------------------------------------------------------+ | "'g'" | General format. For a given precision "p >= 1", this | | | rounds the number to "p" significant digits and then | | | formats the result in either fixed-point format or in | | | scientific notation, depending on its magnitude. A | | | precision of "0" is treated as equivalent to a precision | | | of "1". The precise rules are as follows: suppose that | | | the result formatted with presentation type "'e'" and | | | precision "p-1" would have exponent "exp". Then, if "m <= | | | exp < p", where "m" is -4 for floats and -6 for | | | "Decimals", the number is formatted with presentation type | | | "'f'" and precision "p-1-exp". Otherwise, the number is | | | formatted with presentation type "'e'" and precision | | | "p-1". In both cases insignificant trailing zeros are | | | removed from the significand, and the decimal point is | | | also removed if there are no remaining digits following | | | it, unless the "'#'" option is used. With no precision | | | given, uses a precision of "6" significant digits for | | | "float". For "Decimal", the coefficient of the result is | | | formed from the coefficient digits of the value; | | | scientific notation is used for values smaller than "1e-6" | | | in absolute value and values where the place value of the | | | least significant digit is larger than 1, and fixed-point | | | notation is used otherwise. Positive and negative | | | infinity, positive and negative zero, and nans, are | | | formatted as "inf", "-inf", "0", "-0" and "nan" | | | respectively, regardless of the precision. | +-----------+------------------------------------------------------------+ | "'G'" | General format. Same as "'g'" except switches to "'E'" if | | | the number gets too large. The representations of infinity | | | and NaN are uppercased, too. | +-----------+------------------------------------------------------------+ | "'n'" | Number. This is the same as "'g'", except that it uses the | | | current locale setting to insert the appropriate number | | | separator characters. | +-----------+------------------------------------------------------------+ | "'%'" | Percentage. Multiplies the number by 100 and displays in | | | fixed ("'f'") format, followed by a percent sign. | +-----------+------------------------------------------------------------+ | None | For "float" this is the same as "'g'", except that when | | | fixed-point notation is used to format the result, it | | | always includes at least one digit past the decimal point. | | | The precision used is as large as needed to represent the | | | given value faithfully. For "Decimal", this is the same | | | as either "'g'" or "'G'" depending on the value of | | | "context.capitals" for the current decimal context. The | | | overall effect is to match the output of "str()" as | | | altered by the other format modifiers. | +-----------+------------------------------------------------------------+ Format examples --------------- This section contains examples of the "str.format()" syntax and comparison with the old "%"-formatting. In most of the cases the syntax is similar to the old "%"-formatting, with the addition of the "{}" and with ":" used instead of "%". For example, "'%03.2f'" can be translated to "'{:03.2f}'". The new format syntax also supports new and different options, shown in the following examples. Accessing arguments by position: >>> '{0}, {1}, {2}'.format('a', 'b', 'c') 'a, b, c' >>> '{}, {}, {}'.format('a', 'b', 'c') # 3.1+ only 'a, b, c' >>> '{2}, {1}, {0}'.format('a', 'b', 'c') 'c, b, a' >>> '{2}, {1}, {0}'.format(*'abc') # unpacking argument sequence 'c, b, a' >>> '{0}{1}{0}'.format('abra', 'cad') # arguments' indices can be repeated 'abracadabra' Accessing arguments by name: >>> 'Coordinates: {latitude}, {longitude}'.format(latitude='37.24N', longitude='-115.81W') 'Coordinates: 37.24N, -115.81W' >>> coord = {'latitude': '37.24N', 'longitude': '-115.81W'} >>> 'Coordinates: {latitude}, {longitude}'.format(**coord) 'Coordinates: 37.24N, -115.81W' Accessing arguments’ attributes: >>> c = 3-5j >>> ('The complex number {0} is formed from the real part {0.real} ' ... 'and the imaginary part {0.imag}.').format(c) 'The complex number (3-5j) is formed from the real part 3.0 and the imaginary part -5.0.' >>> class Point: ... def __init__(self, x, y): ... self.x, self.y = x, y ... def __str__(self): ... return 'Point({self.x}, {self.y})'.format(self=self) ... >>> str(Point(4, 2)) 'Point(4, 2)' Accessing arguments’ items: >>> coord = (3, 5) >>> 'X: {0[0]}; Y: {0[1]}'.format(coord) 'X: 3; Y: 5' Replacing "%s" and "%r": >>> "repr() shows quotes: {!r}; str() doesn't: {!s}".format('test1', 'test2') "repr() shows quotes: 'test1'; str() doesn't: test2" Aligning the text and specifying a width: >>> '{:<30}'.format('left aligned') 'left aligned ' >>> '{:>30}'.format('right aligned') ' right aligned' >>> '{:^30}'.format('centered') ' centered ' >>> '{:*^30}'.format('centered') # use '*' as a fill char '***********centered***********' Replacing "%+f", "%-f", and "% f" and specifying a sign: >>> '{:+f}; {:+f}'.format(3.14, -3.14) # show it always '+3.140000; -3.140000' >>> '{: f}; {: f}'.format(3.14, -3.14) # show a space for positive numbers ' 3.140000; -3.140000' >>> '{:-f}; {:-f}'.format(3.14, -3.14) # show only the minus -- same as '{:f}; {:f}' '3.140000; -3.140000' Replacing "%x" and "%o" and converting the value to different bases: >>> # format also supports binary numbers >>> "int: {0:d}; hex: {0:x}; oct: {0:o}; bin: {0:b}".format(42) 'int: 42; hex: 2a; oct: 52; bin: 101010' >>> # with 0x, 0o, or 0b as prefix: >>> "int: {0:d}; hex: {0:#x}; oct: {0:#o}; bin: {0:#b}".format(42) 'int: 42; hex: 0x2a; oct: 0o52; bin: 0b101010' Using the comma as a thousands separator: >>> '{:,}'.format(1234567890) '1,234,567,890' Expressing a percentage: >>> points = 19 >>> total = 22 >>> 'Correct answers: {:.2%}'.format(points/total) 'Correct answers: 86.36%' Using type-specific formatting: >>> import datetime >>> d = datetime.datetime(2010, 7, 4, 12, 15, 58) >>> '{:%Y-%m-%d %H:%M:%S}'.format(d) '2010-07-04 12:15:58' Nesting arguments and more complex examples: >>> for align, text in zip('<^>', ['left', 'center', 'right']): ... '{0:{fill}{align}16}'.format(text, fill=align, align=align) ... 'left<<<<<<<<<<<<' '^^^^^center^^^^^' '>>>>>>>>>>>right' >>> >>> octets = [192, 168, 0, 1] >>> '{:02X}{:02X}{:02X}{:02X}'.format(*octets) 'C0A80001' >>> int(_, 16) 3232235521 >>> >>> width = 5 >>> for num in range(5,12): ... for base in 'dXob': ... print('{0:{width}{base}}'.format(num, base=base, width=width), end=' ') ... print() ... 5 5 5 101 6 6 6 110 7 7 7 111 8 8 10 1000 9 9 11 1001 10 A 12 1010 11 B 13 1011 Template strings ================ Template strings provide simpler string substitutions as described in **PEP 292**. A primary use case for template strings is for internationalization (i18n) since in that context, the simpler syntax and functionality makes it easier to translate than other built-in string formatting facilities in Python. As an example of a library built on template strings for i18n, see the flufl.i18n package. Template strings support "$"-based substitutions, using the following rules: * "$$" is an escape; it is replaced with a single "$". * "$identifier" names a substitution placeholder matching a mapping key of ""identifier"". By default, ""identifier"" is restricted to any case-insensitive ASCII alphanumeric string (including underscores) that starts with an underscore or ASCII letter. The first non-identifier character after the "$" character terminates this placeholder specification. * "${identifier}" is equivalent to "$identifier". It is required when valid identifier characters follow the placeholder but are not part of the placeholder, such as ""${noun}ification"". Any other appearance of "$" in the string will result in a "ValueError" being raised. The "string" module provides a "Template" class that implements these rules. The methods of "Template" are: class string.Template(template) The constructor takes a single argument which is the template string. substitute(mapping={}, /, **kwds) Performs the template substitution, returning a new string. *mapping* is any dictionary-like object with keys that match the placeholders in the template. Alternatively, you can provide keyword arguments, where the keywords are the placeholders. When both *mapping* and *kwds* are given and there are duplicates, the placeholders from *kwds* take precedence. safe_substitute(mapping={}, /, **kwds) Like "substitute()", except that if placeholders are missing from *mapping* and *kwds*, instead of raising a "KeyError" exception, the original placeholder will appear in the resulting string intact. Also, unlike with "substitute()", any other appearances of the "$" will simply return "$" instead of raising "ValueError". While other exceptions may still occur, this method is called “safe” because it always tries to return a usable string instead of raising an exception. In another sense, "safe_substitute()" may be anything other than safe, since it will silently ignore malformed templates containing dangling delimiters, unmatched braces, or placeholders that are not valid Python identifiers. "Template" instances also provide one public data attribute: template This is the object passed to the constructor’s *template* argument. In general, you shouldn’t change it, but read-only access is not enforced. Here is an example of how to use a Template: >>> from string import Template >>> s = Template('$who likes $what') >>> s.substitute(who='tim', what='kung pao') 'tim likes kung pao' >>> d = dict(who='tim') >>> Template('Give $who $100').substitute(d) Traceback (most recent call last): ... ValueError: Invalid placeholder in string: line 1, col 11 >>> Template('$who likes $what').substitute(d) Traceback (most recent call last): ... KeyError: 'what' >>> Template('$who likes $what').safe_substitute(d) 'tim likes $what' Advanced usage: you can derive subclasses of "Template" to customize the placeholder syntax, delimiter character, or the entire regular expression used to parse template strings. To do this, you can override these class attributes: * *delimiter* – This is the literal string describing a placeholder introducing delimiter. The default value is "$". Note that this should *not* be a regular expression, as the implementation will call "re.escape()" on this string as needed. Note further that you cannot change the delimiter after class creation (i.e. a different delimiter must be set in the subclass’s class namespace). * *idpattern* – This is the regular expression describing the pattern for non-braced placeholders. The default value is the regular expression "(?a:[_a-z][_a-z0-9]*)". If this is given and *braceidpattern* is "None" this pattern will also apply to braced placeholders. Note: Since default *flags* is "re.IGNORECASE", pattern "[a-z]" can match with some non-ASCII characters. That’s why we use the local "a" flag here. Changed in version 3.7: *braceidpattern* can be used to define separate patterns used inside and outside the braces. * *braceidpattern* – This is like *idpattern* but describes the pattern for braced placeholders. Defaults to "None" which means to fall back to *idpattern* (i.e. the same pattern is used both inside and outside braces). If given, this allows you to define different patterns for braced and unbraced placeholders. New in version 3.7. * *flags* – The regular expression flags that will be applied when compiling the regular expression used for recognizing substitutions. The default value is "re.IGNORECASE". Note that "re.VERBOSE" will always be added to the flags, so custom *idpattern*s must follow conventions for verbose regular expressions. New in version 3.2. Alternatively, you can provide the entire regular expression pattern by overriding the class attribute *pattern*. If you do this, the value must be a regular expression object with four named capturing groups. The capturing groups correspond to the rules given above, along with the invalid placeholder rule: * *escaped* – This group matches the escape sequence, e.g. "$$", in the default pattern. * *named* – This group matches the unbraced placeholder name; it should not include the delimiter in capturing group. * *braced* – This group matches the brace enclosed placeholder name; it should not include either the delimiter or braces in the capturing group. * *invalid* – This group matches any other delimiter pattern (usually a single delimiter), and it should appear last in the regular expression. Helper functions ================ string.capwords(s, sep=None) Split the argument into words using "str.split()", capitalize each word using "str.capitalize()", and join the capitalized words using "str.join()". If the optional second argument *sep* is absent or "None", runs of whitespace characters are replaced by a single space and leading and trailing whitespace are removed, otherwise *sep* is used to split and join the words. "stringprep" — Internet String Preparation ****************************************** **Source code:** Lib/stringprep.py ====================================================================== When identifying things (such as host names) in the internet, it is often necessary to compare such identifications for “equality”. Exactly how this comparison is executed may depend on the application domain, e.g. whether it should be case-insensitive or not. It may be also necessary to restrict the possible identifications, to allow only identifications consisting of “printable” characters. **RFC 3454** defines a procedure for “preparing” Unicode strings in internet protocols. Before passing strings onto the wire, they are processed with the preparation procedure, after which they have a certain normalized form. The RFC defines a set of tables, which can be combined into profiles. Each profile must define which tables it uses, and what other optional parts of the "stringprep" procedure are part of the profile. One example of a "stringprep" profile is "nameprep", which is used for internationalized domain names. The module "stringprep" only exposes the tables from **RFC 3454**. As these tables would be very large to represent them as dictionaries or lists, the module uses the Unicode character database internally. The module source code itself was generated using the "mkstringprep.py" utility. As a result, these tables are exposed as functions, not as data structures. There are two kinds of tables in the RFC: sets and mappings. For a set, "stringprep" provides the “characteristic function”, i.e. a function that returns "True" if the parameter is part of the set. For mappings, it provides the mapping function: given the key, it returns the associated value. Below is a list of all functions available in the module. stringprep.in_table_a1(code) Determine whether *code* is in tableA.1 (Unassigned code points in Unicode 3.2). stringprep.in_table_b1(code) Determine whether *code* is in tableB.1 (Commonly mapped to nothing). stringprep.map_table_b2(code) Return the mapped value for *code* according to tableB.2 (Mapping for case-folding used with NFKC). stringprep.map_table_b3(code) Return the mapped value for *code* according to tableB.3 (Mapping for case-folding used with no normalization). stringprep.in_table_c11(code) Determine whether *code* is in tableC.1.1 (ASCII space characters). stringprep.in_table_c12(code) Determine whether *code* is in tableC.1.2 (Non-ASCII space characters). stringprep.in_table_c11_c12(code) Determine whether *code* is in tableC.1 (Space characters, union of C.1.1 and C.1.2). stringprep.in_table_c21(code) Determine whether *code* is in tableC.2.1 (ASCII control characters). stringprep.in_table_c22(code) Determine whether *code* is in tableC.2.2 (Non-ASCII control characters). stringprep.in_table_c21_c22(code) Determine whether *code* is in tableC.2 (Control characters, union of C.2.1 and C.2.2). stringprep.in_table_c3(code) Determine whether *code* is in tableC.3 (Private use). stringprep.in_table_c4(code) Determine whether *code* is in tableC.4 (Non-character code points). stringprep.in_table_c5(code) Determine whether *code* is in tableC.5 (Surrogate codes). stringprep.in_table_c6(code) Determine whether *code* is in tableC.6 (Inappropriate for plain text). stringprep.in_table_c7(code) Determine whether *code* is in tableC.7 (Inappropriate for canonical representation). stringprep.in_table_c8(code) Determine whether *code* is in tableC.8 (Change display properties or are deprecated). stringprep.in_table_c9(code) Determine whether *code* is in tableC.9 (Tagging characters). stringprep.in_table_d1(code) Determine whether *code* is in tableD.1 (Characters with bidirectional property “R” or “AL”). stringprep.in_table_d2(code) Determine whether *code* is in tableD.2 (Characters with bidirectional property “L”). "struct" — Interpret bytes as packed binary data ************************************************ **Source code:** Lib/struct.py ====================================================================== This module performs conversions between Python values and C structs represented as Python "bytes" objects. This can be used in handling binary data stored in files or from network connections, among other sources. It uses Format Strings as compact descriptions of the layout of the C structs and the intended conversion to/from Python values. Note: By default, the result of packing a given C struct includes pad bytes in order to maintain proper alignment for the C types involved; similarly, alignment is taken into account when unpacking. This behavior is chosen so that the bytes of a packed struct correspond exactly to the layout in memory of the corresponding C struct. To handle platform-independent data formats or omit implicit pad bytes, use "standard" size and alignment instead of "native" size and alignment: see Byte Order, Size, and Alignment for details. Several "struct" functions (and methods of "Struct") take a *buffer* argument. This refers to objects that implement the Buffer Protocol and provide either a readable or read-writable buffer. The most common types used for that purpose are "bytes" and "bytearray", but many other types that can be viewed as an array of bytes implement the buffer protocol, so that they can be read/filled without additional copying from a "bytes" object. Functions and Exceptions ======================== The module defines the following exception and functions: exception struct.error Exception raised on various occasions; argument is a string describing what is wrong. struct.pack(format, v1, v2, ...) Return a bytes object containing the values *v1*, *v2*, … packed according to the format string *format*. The arguments must match the values required by the format exactly. struct.pack_into(format, buffer, offset, v1, v2, ...) Pack the values *v1*, *v2*, … according to the format string *format* and write the packed bytes into the writable buffer *buffer* starting at position *offset*. Note that *offset* is a required argument. struct.unpack(format, buffer) Unpack from the buffer *buffer* (presumably packed by "pack(format, ...)") according to the format string *format*. The result is a tuple even if it contains exactly one item. The buffer’s size in bytes must match the size required by the format, as reflected by "calcsize()". struct.unpack_from(format, /, buffer, offset=0) Unpack from *buffer* starting at position *offset*, according to the format string *format*. The result is a tuple even if it contains exactly one item. The buffer’s size in bytes, starting at position *offset*, must be at least the size required by the format, as reflected by "calcsize()". struct.iter_unpack(format, buffer) Iteratively unpack from the buffer *buffer* according to the format string *format*. This function returns an iterator which will read equally-sized chunks from the buffer until all its contents have been consumed. The buffer’s size in bytes must be a multiple of the size required by the format, as reflected by "calcsize()". Each iteration yields a tuple as specified by the format string. New in version 3.4. struct.calcsize(format) Return the size of the struct (and hence of the bytes object produced by "pack(format, ...)") corresponding to the format string *format*. Format Strings ============== Format strings are the mechanism used to specify the expected layout when packing and unpacking data. They are built up from Format Characters, which specify the type of data being packed/unpacked. In addition, there are special characters for controlling the Byte Order, Size, and Alignment. Byte Order, Size, and Alignment ------------------------------- By default, C types are represented in the machine’s native format and byte order, and properly aligned by skipping pad bytes if necessary (according to the rules used by the C compiler). Alternatively, the first character of the format string can be used to indicate the byte order, size and alignment of the packed data, according to the following table: +-------------+--------------------------+------------+-------------+ | Character | Byte order | Size | Alignment | |=============|==========================|============|=============| | "@" | native | native | native | +-------------+--------------------------+------------+-------------+ | "=" | native | standard | none | +-------------+--------------------------+------------+-------------+ | "<" | little-endian | standard | none | +-------------+--------------------------+------------+-------------+ | ">" | big-endian | standard | none | +-------------+--------------------------+------------+-------------+ | "!" | network (= big-endian) | standard | none | +-------------+--------------------------+------------+-------------+ If the first character is not one of these, "'@'" is assumed. Native byte order is big-endian or little-endian, depending on the host system. For example, Intel x86 and AMD64 (x86-64) are little- endian; Motorola 68000 and PowerPC G5 are big-endian; ARM and Intel Itanium feature switchable endianness (bi-endian). Use "sys.byteorder" to check the endianness of your system. Native size and alignment are determined using the C compiler’s "sizeof" expression. This is always combined with native byte order. Standard size depends only on the format character; see the table in the Format Characters section. Note the difference between "'@'" and "'='": both use native byte order, but the size and alignment of the latter is standardized. The form "'!'" represents the network byte order which is always big- endian as defined in IETF RFC 1700. There is no way to indicate non-native byte order (force byte- swapping); use the appropriate choice of "'<'" or "'>'". Notes: 1. Padding is only automatically added between successive structure members. No padding is added at the beginning or the end of the encoded struct. 2. No padding is added when using non-native size and alignment, e.g. with ‘<’, ‘>’, ‘=’, and ‘!’. 3. To align the end of a structure to the alignment requirement of a particular type, end the format with the code for that type with a repeat count of zero. See Examples. Format Characters ----------------- Format characters have the following meaning; the conversion between C and Python values should be obvious given their types. The ‘Standard size’ column refers to the size of the packed value in bytes when using standard size; that is, when the format string starts with one of "'<'", "'>'", "'!'" or "'='". When using native size, the size of the packed value is platform-dependent. +----------+----------------------------+----------------------+------------------+--------------+ | Format | C Type | Python type | Standard size | Notes | |==========|============================|======================|==================|==============| | "x" | pad byte | no value | | | +----------+----------------------------+----------------------+------------------+--------------+ | "c" | "char" | bytes of length 1 | 1 | | +----------+----------------------------+----------------------+------------------+--------------+ | "b" | "signed char" | integer | 1 | (1), (2) | +----------+----------------------------+----------------------+------------------+--------------+ | "B" | "unsigned char" | integer | 1 | (2) | +----------+----------------------------+----------------------+------------------+--------------+ | "?" | "_Bool" | bool | 1 | (1) | +----------+----------------------------+----------------------+------------------+--------------+ | "h" | "short" | integer | 2 | (2) | +----------+----------------------------+----------------------+------------------+--------------+ | "H" | "unsigned short" | integer | 2 | (2) | +----------+----------------------------+----------------------+------------------+--------------+ | "i" | "int" | integer | 4 | (2) | +----------+----------------------------+----------------------+------------------+--------------+ | "I" | "unsigned int" | integer | 4 | (2) | +----------+----------------------------+----------------------+------------------+--------------+ | "l" | "long" | integer | 4 | (2) | +----------+----------------------------+----------------------+------------------+--------------+ | "L" | "unsigned long" | integer | 4 | (2) | +----------+----------------------------+----------------------+------------------+--------------+ | "q" | "long long" | integer | 8 | (2) | +----------+----------------------------+----------------------+------------------+--------------+ | "Q" | "unsigned long long" | integer | 8 | (2) | +----------+----------------------------+----------------------+------------------+--------------+ | "n" | "ssize_t" | integer | | (3) | +----------+----------------------------+----------------------+------------------+--------------+ | "N" | "size_t" | integer | | (3) | +----------+----------------------------+----------------------+------------------+--------------+ | "e" | (6) | float | 2 | (4) | +----------+----------------------------+----------------------+------------------+--------------+ | "f" | "float" | float | 4 | (4) | +----------+----------------------------+----------------------+------------------+--------------+ | "d" | "double" | float | 8 | (4) | +----------+----------------------------+----------------------+------------------+--------------+ | "s" | "char[]" | bytes | | | +----------+----------------------------+----------------------+------------------+--------------+ | "p" | "char[]" | bytes | | | +----------+----------------------------+----------------------+------------------+--------------+ | "P" | "void *" | integer | | (5) | +----------+----------------------------+----------------------+------------------+--------------+ Changed in version 3.3: Added support for the "'n'" and "'N'" formats. Changed in version 3.6: Added support for the "'e'" format. Notes: 1. The "'?'" conversion code corresponds to the "_Bool" type defined by C99. If this type is not available, it is simulated using a "char". In standard mode, it is always represented by one byte. 2. When attempting to pack a non-integer using any of the integer conversion codes, if the non-integer has a "__index__()" method then that method is called to convert the argument to an integer before packing. Changed in version 3.2: Added use of the "__index__()" method for non-integers. 3. The "'n'" and "'N'" conversion codes are only available for the native size (selected as the default or with the "'@'" byte order character). For the standard size, you can use whichever of the other integer formats fits your application. 4. For the "'f'", "'d'" and "'e'" conversion codes, the packed representation uses the IEEE 754 binary32, binary64 or binary16 format (for "'f'", "'d'" or "'e'" respectively), regardless of the floating-point format used by the platform. 5. The "'P'" format character is only available for the native byte ordering (selected as the default or with the "'@'" byte order character). The byte order character "'='" chooses to use little- or big-endian ordering based on the host system. The struct module does not interpret this as native ordering, so the "'P'" format is not available. 6. The IEEE 754 binary16 “half precision” type was introduced in the 2008 revision of the IEEE 754 standard. It has a sign bit, a 5-bit exponent and 11-bit precision (with 10 bits explicitly stored), and can represent numbers between approximately "6.1e-05" and "6.5e+04" at full precision. This type is not widely supported by C compilers: on a typical machine, an unsigned short can be used for storage, but not for math operations. See the Wikipedia page on the half-precision floating-point format for more information. A format character may be preceded by an integral repeat count. For example, the format string "'4h'" means exactly the same as "'hhhh'". Whitespace characters between formats are ignored; a count and its format must not contain whitespace though. For the "'s'" format character, the count is interpreted as the length of the bytes, not a repeat count like for the other format characters; for example, "'10s'" means a single 10-byte string, while "'10c'" means 10 characters. If a count is not given, it defaults to 1. For packing, the string is truncated or padded with null bytes as appropriate to make it fit. For unpacking, the resulting bytes object always has exactly the specified number of bytes. As a special case, "'0s'" means a single, empty string (while "'0c'" means 0 characters). When packing a value "x" using one of the integer formats ("'b'", "'B'", "'h'", "'H'", "'i'", "'I'", "'l'", "'L'", "'q'", "'Q'"), if "x" is outside the valid range for that format then "struct.error" is raised. Changed in version 3.1: Previously, some of the integer formats wrapped out-of-range values and raised "DeprecationWarning" instead of "struct.error". The "'p'" format character encodes a “Pascal string”, meaning a short variable-length string stored in a *fixed number of bytes*, given by the count. The first byte stored is the length of the string, or 255, whichever is smaller. The bytes of the string follow. If the string passed in to "pack()" is too long (longer than the count minus 1), only the leading "count-1" bytes of the string are stored. If the string is shorter than "count-1", it is padded with null bytes so that exactly count bytes in all are used. Note that for "unpack()", the "'p'" format character consumes "count" bytes, but that the string returned can never contain more than 255 bytes. For the "'?'" format character, the return value is either "True" or "False". When packing, the truth value of the argument object is used. Either 0 or 1 in the native or standard bool representation will be packed, and any non-zero value will be "True" when unpacking. Examples -------- Note: All examples assume a native byte order, size, and alignment with a big-endian machine. A basic example of packing/unpacking three integers: >>> from struct import * >>> pack('hhl', 1, 2, 3) b'\x00\x01\x00\x02\x00\x00\x00\x03' >>> unpack('hhl', b'\x00\x01\x00\x02\x00\x00\x00\x03') (1, 2, 3) >>> calcsize('hhl') 8 Unpacked fields can be named by assigning them to variables or by wrapping the result in a named tuple: >>> record = b'raymond \x32\x12\x08\x01\x08' >>> name, serialnum, school, gradelevel = unpack('<10sHHb', record) >>> from collections import namedtuple >>> Student = namedtuple('Student', 'name serialnum school gradelevel') >>> Student._make(unpack('<10sHHb', record)) Student(name=b'raymond ', serialnum=4658, school=264, gradelevel=8) The ordering of format characters may have an impact on size since the padding needed to satisfy alignment requirements is different: >>> pack('ci', b'*', 0x12131415) b'*\x00\x00\x00\x12\x13\x14\x15' >>> pack('ic', 0x12131415, b'*') b'\x12\x13\x14\x15*' >>> calcsize('ci') 8 >>> calcsize('ic') 5 The following format "'llh0l'" specifies two pad bytes at the end, assuming longs are aligned on 4-byte boundaries: >>> pack('llh0l', 1, 2, 3) b'\x00\x00\x00\x01\x00\x00\x00\x02\x00\x03\x00\x00' This only works when native size and alignment are in effect; standard size and alignment does not enforce any alignment. See also: Module "array" Packed binary storage of homogeneous data. Module "xdrlib" Packing and unpacking of XDR data. Classes ======= The "struct" module also defines the following type: class struct.Struct(format) Return a new Struct object which writes and reads binary data according to the format string *format*. Creating a Struct object once and calling its methods is more efficient than calling the "struct" functions with the same format since the format string only needs to be compiled once. Note: The compiled versions of the most recent format strings passed to "Struct" and the module-level functions are cached, so programs that use only a few format strings needn’t worry about reusing a single "Struct" instance. Compiled Struct objects support the following methods and attributes: pack(v1, v2, ...) Identical to the "pack()" function, using the compiled format. ("len(result)" will equal "size".) pack_into(buffer, offset, v1, v2, ...) Identical to the "pack_into()" function, using the compiled format. unpack(buffer) Identical to the "unpack()" function, using the compiled format. The buffer’s size in bytes must equal "size". unpack_from(buffer, offset=0) Identical to the "unpack_from()" function, using the compiled format. The buffer’s size in bytes, starting at position *offset*, must be at least "size". iter_unpack(buffer) Identical to the "iter_unpack()" function, using the compiled format. The buffer’s size in bytes must be a multiple of "size". New in version 3.4. format The format string used to construct this Struct object. Changed in version 3.7: The format string type is now "str" instead of "bytes". size The calculated size of the struct (and hence of the bytes object produced by the "pack()" method) corresponding to "format". "subprocess" — Subprocess management ************************************ **Source code:** Lib/subprocess.py ====================================================================== The "subprocess" module allows you to spawn new processes, connect to their input/output/error pipes, and obtain their return codes. This module intends to replace several older modules and functions: os.system os.spawn* Information about how the "subprocess" module can be used to replace these modules and functions can be found in the following sections. See also: **PEP 324** – PEP proposing the subprocess module Using the "subprocess" Module ============================= The recommended approach to invoking subprocesses is to use the "run()" function for all use cases it can handle. For more advanced use cases, the underlying "Popen" interface can be used directly. The "run()" function was added in Python 3.5; if you need to retain compatibility with older versions, see the Older high-level API section. subprocess.run(args, *, stdin=None, input=None, stdout=None, stderr=None, capture_output=False, shell=False, cwd=None, timeout=None, check=False, encoding=None, errors=None, text=None, env=None, universal_newlines=None, **other_popen_kwargs) Run the command described by *args*. Wait for command to complete, then return a "CompletedProcess" instance. The arguments shown above are merely the most common ones, described below in Frequently Used Arguments (hence the use of keyword-only notation in the abbreviated signature). The full function signature is largely the same as that of the "Popen" constructor - most of the arguments to this function are passed through to that interface. (*timeout*, *input*, *check*, and *capture_output* are not.) If *capture_output* is true, stdout and stderr will be captured. When used, the internal "Popen" object is automatically created with "stdout=PIPE" and "stderr=PIPE". The *stdout* and *stderr* arguments may not be supplied at the same time as *capture_output*. If you wish to capture and combine both streams into one, use "stdout=PIPE" and "stderr=STDOUT" instead of *capture_output*. The *timeout* argument is passed to "Popen.communicate()". If the timeout expires, the child process will be killed and waited for. The "TimeoutExpired" exception will be re-raised after the child process has terminated. The *input* argument is passed to "Popen.communicate()" and thus to the subprocess’s stdin. If used it must be a byte sequence, or a string if *encoding* or *errors* is specified or *text* is true. When used, the internal "Popen" object is automatically created with "stdin=PIPE", and the *stdin* argument may not be used as well. If *check* is true, and the process exits with a non-zero exit code, a "CalledProcessError" exception will be raised. Attributes of that exception hold the arguments, the exit code, and stdout and stderr if they were captured. If *encoding* or *errors* are specified, or *text* is true, file objects for stdin, stdout and stderr are opened in text mode using the specified *encoding* and *errors* or the "io.TextIOWrapper" default. The *universal_newlines* argument is equivalent to *text* and is provided for backwards compatibility. By default, file objects are opened in binary mode. If *env* is not "None", it must be a mapping that defines the environment variables for the new process; these are used instead of the default behavior of inheriting the current process’ environment. It is passed directly to "Popen". Examples: >>> subprocess.run(["ls", "-l"]) # doesn't capture output CompletedProcess(args=['ls', '-l'], returncode=0) >>> subprocess.run("exit 1", shell=True, check=True) Traceback (most recent call last): ... subprocess.CalledProcessError: Command 'exit 1' returned non-zero exit status 1 >>> subprocess.run(["ls", "-l", "/dev/null"], capture_output=True) CompletedProcess(args=['ls', '-l', '/dev/null'], returncode=0, stdout=b'crw-rw-rw- 1 root root 1, 3 Jan 23 16:23 /dev/null\n', stderr=b'') New in version 3.5. Changed in version 3.6: Added *encoding* and *errors* parameters Changed in version 3.7: Added the *text* parameter, as a more understandable alias of *universal_newlines*. Added the *capture_output* parameter. Changed in version 3.9.17: Changed Windows shell search order for "shell=True". The current directory and "%PATH%" are replaced with "%COMSPEC%" and "%SystemRoot%\System32\cmd.exe". As a result, dropping a malicious program named "cmd.exe" into a current directory no longer works. class subprocess.CompletedProcess The return value from "run()", representing a process that has finished. args The arguments used to launch the process. This may be a list or a string. returncode Exit status of the child process. Typically, an exit status of 0 indicates that it ran successfully. A negative value "-N" indicates that the child was terminated by signal "N" (POSIX only). stdout Captured stdout from the child process. A bytes sequence, or a string if "run()" was called with an encoding, errors, or text=True. "None" if stdout was not captured. If you ran the process with "stderr=subprocess.STDOUT", stdout and stderr will be combined in this attribute, and "stderr" will be "None". stderr Captured stderr from the child process. A bytes sequence, or a string if "run()" was called with an encoding, errors, or text=True. "None" if stderr was not captured. check_returncode() If "returncode" is non-zero, raise a "CalledProcessError". New in version 3.5. subprocess.DEVNULL Special value that can be used as the *stdin*, *stdout* or *stderr* argument to "Popen" and indicates that the special file "os.devnull" will be used. New in version 3.3. subprocess.PIPE Special value that can be used as the *stdin*, *stdout* or *stderr* argument to "Popen" and indicates that a pipe to the standard stream should be opened. Most useful with "Popen.communicate()". subprocess.STDOUT Special value that can be used as the *stderr* argument to "Popen" and indicates that standard error should go into the same handle as standard output. exception subprocess.SubprocessError Base class for all other exceptions from this module. New in version 3.3. exception subprocess.TimeoutExpired Subclass of "SubprocessError", raised when a timeout expires while waiting for a child process. cmd Command that was used to spawn the child process. timeout Timeout in seconds. output Output of the child process if it was captured by "run()" or "check_output()". Otherwise, "None". This is always "bytes" when any output was captured regardless of the "text=True" setting. It may remain "None" instead of "b''" when no output was observed. stdout Alias for output, for symmetry with "stderr". stderr Stderr output of the child process if it was captured by "run()". Otherwise, "None". This is always "bytes" when stderr output was captured regardless of the "text=True" setting. It may remain "None" instead of "b''" when no stderr output was observed. New in version 3.3. Changed in version 3.5: *stdout* and *stderr* attributes added exception subprocess.CalledProcessError Subclass of "SubprocessError", raised when a process run by "check_call()", "check_output()", or "run()" (with "check=True") returns a non-zero exit status. returncode Exit status of the child process. If the process exited due to a signal, this will be the negative signal number. cmd Command that was used to spawn the child process. output Output of the child process if it was captured by "run()" or "check_output()". Otherwise, "None". stdout Alias for output, for symmetry with "stderr". stderr Stderr output of the child process if it was captured by "run()". Otherwise, "None". Changed in version 3.5: *stdout* and *stderr* attributes added Frequently Used Arguments ------------------------- To support a wide variety of use cases, the "Popen" constructor (and the convenience functions) accept a large number of optional arguments. For most typical use cases, many of these arguments can be safely left at their default values. The arguments that are most commonly needed are: *args* is required for all calls and should be a string, or a sequence of program arguments. Providing a sequence of arguments is generally preferred, as it allows the module to take care of any required escaping and quoting of arguments (e.g. to permit spaces in file names). If passing a single string, either *shell* must be "True" (see below) or else the string must simply name the program to be executed without specifying any arguments. *stdin*, *stdout* and *stderr* specify the executed program’s standard input, standard output and standard error file handles, respectively. Valid values are "PIPE", "DEVNULL", an existing file descriptor (a positive integer), an existing file object with a valid file descriptor, and "None". "PIPE" indicates that a new pipe to the child should be created. "DEVNULL" indicates that the special file "os.devnull" will be used. With the default settings of "None", no redirection will occur; the child’s file handles will be inherited from the parent. Additionally, *stderr* can be "STDOUT", which indicates that the stderr data from the child process should be captured into the same file handle as for *stdout*. If *encoding* or *errors* are specified, or *text* (also known as *universal_newlines*) is true, the file objects *stdin*, *stdout* and *stderr* will be opened in text mode using the *encoding* and *errors* specified in the call or the defaults for "io.TextIOWrapper". For *stdin*, line ending characters "'\n'" in the input will be converted to the default line separator "os.linesep". For *stdout* and *stderr*, all line endings in the output will be converted to "'\n'". For more information see the documentation of the "io.TextIOWrapper" class when the *newline* argument to its constructor is "None". If text mode is not used, *stdin*, *stdout* and *stderr* will be opened as binary streams. No encoding or line ending conversion is performed. New in version 3.6: Added *encoding* and *errors* parameters. New in version 3.7: Added the *text* parameter as an alias for *universal_newlines*. Note: The newlines attribute of the file objects "Popen.stdin", "Popen.stdout" and "Popen.stderr" are not updated by the "Popen.communicate()" method. If *shell* is "True", the specified command will be executed through the shell. This can be useful if you are using Python primarily for the enhanced control flow it offers over most system shells and still want convenient access to other shell features such as shell pipes, filename wildcards, environment variable expansion, and expansion of "~" to a user’s home directory. However, note that Python itself offers implementations of many shell-like features (in particular, "glob", "fnmatch", "os.walk()", "os.path.expandvars()", "os.path.expanduser()", and "shutil"). Changed in version 3.3: When *universal_newlines* is "True", the class uses the encoding "locale.getpreferredencoding(False)" instead of "locale.getpreferredencoding()". See the "io.TextIOWrapper" class for more information on this change. Note: Read the Security Considerations section before using "shell=True". These options, along with all of the other options, are described in more detail in the "Popen" constructor documentation. Popen Constructor ----------------- The underlying process creation and management in this module is handled by the "Popen" class. It offers a lot of flexibility so that developers are able to handle the less common cases not covered by the convenience functions. class subprocess.Popen(args, bufsize=-1, executable=None, stdin=None, stdout=None, stderr=None, preexec_fn=None, close_fds=True, shell=False, cwd=None, env=None, universal_newlines=None, startupinfo=None, creationflags=0, restore_signals=True, start_new_session=False, pass_fds=(), *, group=None, extra_groups=None, user=None, umask=-1, encoding=None, errors=None, text=None) Execute a child program in a new process. On POSIX, the class uses "os.execvp()"-like behavior to execute the child program. On Windows, the class uses the Windows "CreateProcess()" function. The arguments to "Popen" are as follows. *args* should be a sequence of program arguments or else a single string or *path-like object*. By default, the program to execute is the first item in *args* if *args* is a sequence. If *args* is a string, the interpretation is platform-dependent and described below. See the *shell* and *executable* arguments for additional differences from the default behavior. Unless otherwise stated, it is recommended to pass *args* as a sequence. An example of passing some arguments to an external program as a sequence is: Popen(["/usr/bin/git", "commit", "-m", "Fixes a bug."]) On POSIX, if *args* is a string, the string is interpreted as the name or path of the program to execute. However, this can only be done if not passing arguments to the program. Note: It may not be obvious how to break a shell command into a sequence of arguments, especially in complex cases. "shlex.split()" can illustrate how to determine the correct tokenization for *args*: >>> import shlex, subprocess >>> command_line = input() /bin/vikings -input eggs.txt -output "spam spam.txt" -cmd "echo '$MONEY'" >>> args = shlex.split(command_line) >>> print(args) ['/bin/vikings', '-input', 'eggs.txt', '-output', 'spam spam.txt', '-cmd', "echo '$MONEY'"] >>> p = subprocess.Popen(args) # Success! Note in particular that options (such as *-input*) and arguments (such as *eggs.txt*) that are separated by whitespace in the shell go in separate list elements, while arguments that need quoting or backslash escaping when used in the shell (such as filenames containing spaces or the *echo* command shown above) are single list elements. On Windows, if *args* is a sequence, it will be converted to a string in a manner described in Converting an argument sequence to a string on Windows. This is because the underlying "CreateProcess()" operates on strings. Changed in version 3.6: *args* parameter accepts a *path-like object* if *shell* is "False" and a sequence containing path-like objects on POSIX. Changed in version 3.8: *args* parameter accepts a *path-like object* if *shell* is "False" and a sequence containing bytes and path-like objects on Windows. The *shell* argument (which defaults to "False") specifies whether to use the shell as the program to execute. If *shell* is "True", it is recommended to pass *args* as a string rather than as a sequence. On POSIX with "shell=True", the shell defaults to "/bin/sh". If *args* is a string, the string specifies the command to execute through the shell. This means that the string must be formatted exactly as it would be when typed at the shell prompt. This includes, for example, quoting or backslash escaping filenames with spaces in them. If *args* is a sequence, the first item specifies the command string, and any additional items will be treated as additional arguments to the shell itself. That is to say, "Popen" does the equivalent of: Popen(['/bin/sh', '-c', args[0], args[1], ...]) On Windows with "shell=True", the "COMSPEC" environment variable specifies the default shell. The only time you need to specify "shell=True" on Windows is when the command you wish to execute is built into the shell (e.g. **dir** or **copy**). You do not need "shell=True" to run a batch file or console-based executable. Note: Read the Security Considerations section before using "shell=True". *bufsize* will be supplied as the corresponding argument to the "open()" function when creating the stdin/stdout/stderr pipe file objects: * "0" means unbuffered (read and write are one system call and can return short) * "1" means line buffered (only usable if "universal_newlines=True" i.e., in a text mode) * any other positive value means use a buffer of approximately that size * negative bufsize (the default) means the system default of io.DEFAULT_BUFFER_SIZE will be used. Changed in version 3.3.1: *bufsize* now defaults to -1 to enable buffering by default to match the behavior that most code expects. In versions prior to Python 3.2.4 and 3.3.1 it incorrectly defaulted to "0" which was unbuffered and allowed short reads. This was unintentional and did not match the behavior of Python 2 as most code expected. The *executable* argument specifies a replacement program to execute. It is very seldom needed. When "shell=False", *executable* replaces the program to execute specified by *args*. However, the original *args* is still passed to the program. Most programs treat the program specified by *args* as the command name, which can then be different from the program actually executed. On POSIX, the *args* name becomes the display name for the executable in utilities such as **ps**. If "shell=True", on POSIX the *executable* argument specifies a replacement shell for the default "/bin/sh". Changed in version 3.6: *executable* parameter accepts a *path-like object* on POSIX. Changed in version 3.8: *executable* parameter accepts a bytes and *path-like object* on Windows. Changed in version 3.9.17: Changed Windows shell search order for "shell=True". The current directory and "%PATH%" are replaced with "%COMSPEC%" and "%SystemRoot%\System32\cmd.exe". As a result, dropping a malicious program named "cmd.exe" into a current directory no longer works. *stdin*, *stdout* and *stderr* specify the executed program’s standard input, standard output and standard error file handles, respectively. Valid values are "PIPE", "DEVNULL", an existing file descriptor (a positive integer), an existing *file object* with a valid file descriptor, and "None". "PIPE" indicates that a new pipe to the child should be created. "DEVNULL" indicates that the special file "os.devnull" will be used. With the default settings of "None", no redirection will occur; the child’s file handles will be inherited from the parent. Additionally, *stderr* can be "STDOUT", which indicates that the stderr data from the applications should be captured into the same file handle as for stdout. If *preexec_fn* is set to a callable object, this object will be called in the child process just before the child is executed. (POSIX only) Warning: The *preexec_fn* parameter is not safe to use in the presence of threads in your application. The child process could deadlock before exec is called. If you must use it, keep it trivial! Minimize the number of libraries you call into. Note: If you need to modify the environment for the child use the *env* parameter rather than doing it in a *preexec_fn*. The *start_new_session* parameter can take the place of a previously common use of *preexec_fn* to call os.setsid() in the child. Changed in version 3.8: The *preexec_fn* parameter is no longer supported in subinterpreters. The use of the parameter in a subinterpreter raises "RuntimeError". The new restriction may affect applications that are deployed in mod_wsgi, uWSGI, and other embedded environments. If *close_fds* is true, all file descriptors except "0", "1" and "2" will be closed before the child process is executed. Otherwise when *close_fds* is false, file descriptors obey their inheritable flag as described in Inheritance of File Descriptors. On Windows, if *close_fds* is true then no handles will be inherited by the child process unless explicitly passed in the "handle_list" element of "STARTUPINFO.lpAttributeList", or by standard handle redirection. Changed in version 3.2: The default for *close_fds* was changed from "False" to what is described above. Changed in version 3.7: On Windows the default for *close_fds* was changed from "False" to "True" when redirecting the standard handles. It’s now possible to set *close_fds* to "True" when redirecting the standard handles. *pass_fds* is an optional sequence of file descriptors to keep open between the parent and child. Providing any *pass_fds* forces *close_fds* to be "True". (POSIX only) Changed in version 3.2: The *pass_fds* parameter was added. If *cwd* is not "None", the function changes the working directory to *cwd* before executing the child. *cwd* can be a string, bytes or *path-like* object. In particular, the function looks for *executable* (or for the first item in *args*) relative to *cwd* if the executable path is a relative path. Changed in version 3.6: *cwd* parameter accepts a *path-like object* on POSIX. Changed in version 3.7: *cwd* parameter accepts a *path-like object* on Windows. Changed in version 3.8: *cwd* parameter accepts a bytes object on Windows. If *restore_signals* is true (the default) all signals that Python has set to SIG_IGN are restored to SIG_DFL in the child process before the exec. Currently this includes the SIGPIPE, SIGXFZ and SIGXFSZ signals. (POSIX only) Changed in version 3.2: *restore_signals* was added. If *start_new_session* is true the setsid() system call will be made in the child process prior to the execution of the subprocess. (POSIX only) Changed in version 3.2: *start_new_session* was added. If *group* is not "None", the setregid() system call will be made in the child process prior to the execution of the subprocess. If the provided value is a string, it will be looked up via "grp.getgrnam()" and the value in "gr_gid" will be used. If the value is an integer, it will be passed verbatim. (POSIX only) Availability: POSIX New in version 3.9. If *extra_groups* is not "None", the setgroups() system call will be made in the child process prior to the execution of the subprocess. Strings provided in *extra_groups* will be looked up via "grp.getgrnam()" and the values in "gr_gid" will be used. Integer values will be passed verbatim. (POSIX only) Availability: POSIX New in version 3.9. If *user* is not "None", the setreuid() system call will be made in the child process prior to the execution of the subprocess. If the provided value is a string, it will be looked up via "pwd.getpwnam()" and the value in "pw_uid" will be used. If the value is an integer, it will be passed verbatim. (POSIX only) Availability: POSIX New in version 3.9. If *umask* is not negative, the umask() system call will be made in the child process prior to the execution of the subprocess. Availability: POSIX New in version 3.9. If *env* is not "None", it must be a mapping that defines the environment variables for the new process; these are used instead of the default behavior of inheriting the current process’ environment. Note: If specified, *env* must provide any variables required for the program to execute. On Windows, in order to run a side-by-side assembly the specified *env* **must** include a valid "SystemRoot". If *encoding* or *errors* are specified, or *text* is true, the file objects *stdin*, *stdout* and *stderr* are opened in text mode with the specified encoding and *errors*, as described above in Frequently Used Arguments. The *universal_newlines* argument is equivalent to *text* and is provided for backwards compatibility. By default, file objects are opened in binary mode. New in version 3.6: *encoding* and *errors* were added. New in version 3.7: *text* was added as a more readable alias for *universal_newlines*. If given, *startupinfo* will be a "STARTUPINFO" object, which is passed to the underlying "CreateProcess" function. *creationflags*, if given, can be one or more of the following flags: * "CREATE_NEW_CONSOLE" * "CREATE_NEW_PROCESS_GROUP" * "ABOVE_NORMAL_PRIORITY_CLASS" * "BELOW_NORMAL_PRIORITY_CLASS" * "HIGH_PRIORITY_CLASS" * "IDLE_PRIORITY_CLASS" * "NORMAL_PRIORITY_CLASS" * "REALTIME_PRIORITY_CLASS" * "CREATE_NO_WINDOW" * "DETACHED_PROCESS" * "CREATE_DEFAULT_ERROR_MODE" * "CREATE_BREAKAWAY_FROM_JOB" Popen objects are supported as context managers via the "with" statement: on exit, standard file descriptors are closed, and the process is waited for. with Popen(["ifconfig"], stdout=PIPE) as proc: log.write(proc.stdout.read()) Popen and the other functions in this module that use it raise an auditing event "subprocess.Popen" with arguments "executable", "args", "cwd", and "env". The value for "args" may be a single string or a list of strings, depending on platform. Changed in version 3.2: Added context manager support. Changed in version 3.6: Popen destructor now emits a "ResourceWarning" warning if the child process is still running. Changed in version 3.8: Popen can use "os.posix_spawn()" in some cases for better performance. On Windows Subsystem for Linux and QEMU User Emulation, Popen constructor using "os.posix_spawn()" no longer raise an exception on errors like missing program, but the child process fails with a non-zero "returncode". Exceptions ---------- Exceptions raised in the child process, before the new program has started to execute, will be re-raised in the parent. The most common exception raised is "OSError". This occurs, for example, when trying to execute a non-existent file. Applications should prepare for "OSError" exceptions. Note that, when "shell=True", "OSError" will be raised by the child only if the selected shell itself was not found. To determine if the shell failed to find the requested application, it is necessary to check the return code or output from the subprocess. A "ValueError" will be raised if "Popen" is called with invalid arguments. "check_call()" and "check_output()" will raise "CalledProcessError" if the called process returns a non-zero return code. All of the functions and methods that accept a *timeout* parameter, such as "call()" and "Popen.communicate()" will raise "TimeoutExpired" if the timeout expires before the process exits. Exceptions defined in this module all inherit from "SubprocessError". New in version 3.3: The "SubprocessError" base class was added. Security Considerations ======================= Unlike some other popen functions, this library will not implicitly choose to call a system shell. This means that all characters, including shell metacharacters, can safely be passed to child processes. If the shell is invoked explicitly, via "shell=True", it is the application’s responsibility to ensure that all whitespace and metacharacters are quoted appropriately to avoid shell injection vulnerabilities. When using "shell=True", the "shlex.quote()" function can be used to properly escape whitespace and shell metacharacters in strings that are going to be used to construct shell commands. On Windows, batch files ("*.bat" or "*.cmd") may be launched by the operating system in a system shell regardless of the arguments passed to this library. This could result in arguments being parsed according to shell rules, but without any escaping added by Python. If you are intentionally launching a batch file with arguments from untrusted sources, consider passing "shell=True" to allow Python to escape special characters. See gh-114539 for additional discussion. Popen Objects ============= Instances of the "Popen" class have the following methods: Popen.poll() Check if child process has terminated. Set and return "returncode" attribute. Otherwise, returns "None". Popen.wait(timeout=None) Wait for child process to terminate. Set and return "returncode" attribute. If the process does not terminate after *timeout* seconds, raise a "TimeoutExpired" exception. It is safe to catch this exception and retry the wait. Note: This will deadlock when using "stdout=PIPE" or "stderr=PIPE" and the child process generates enough output to a pipe such that it blocks waiting for the OS pipe buffer to accept more data. Use "Popen.communicate()" when using pipes to avoid that. Note: The function is implemented using a busy loop (non-blocking call and short sleeps). Use the "asyncio" module for an asynchronous wait: see "asyncio.create_subprocess_exec". Changed in version 3.3: *timeout* was added. Popen.communicate(input=None, timeout=None) Interact with process: Send data to stdin. Read data from stdout and stderr, until end-of-file is reached. Wait for process to terminate and set the "returncode" attribute. The optional *input* argument should be data to be sent to the child process, or "None", if no data should be sent to the child. If streams were opened in text mode, *input* must be a string. Otherwise, it must be bytes. "communicate()" returns a tuple "(stdout_data, stderr_data)". The data will be strings if streams were opened in text mode; otherwise, bytes. Note that if you want to send data to the process’s stdin, you need to create the Popen object with "stdin=PIPE". Similarly, to get anything other than "None" in the result tuple, you need to give "stdout=PIPE" and/or "stderr=PIPE" too. If the process does not terminate after *timeout* seconds, a "TimeoutExpired" exception will be raised. Catching this exception and retrying communication will not lose any output. The child process is not killed if the timeout expires, so in order to cleanup properly a well-behaved application should kill the child process and finish communication: proc = subprocess.Popen(...) try: outs, errs = proc.communicate(timeout=15) except TimeoutExpired: proc.kill() outs, errs = proc.communicate() Note: The data read is buffered in memory, so do not use this method if the data size is large or unlimited. Changed in version 3.3: *timeout* was added. Popen.send_signal(signal) Sends the signal *signal* to the child. Do nothing if the process completed. Note: On Windows, SIGTERM is an alias for "terminate()". CTRL_C_EVENT and CTRL_BREAK_EVENT can be sent to processes started with a *creationflags* parameter which includes *CREATE_NEW_PROCESS_GROUP*. Popen.terminate() Stop the child. On POSIX OSs the method sends SIGTERM to the child. On Windows the Win32 API function "TerminateProcess()" is called to stop the child. Popen.kill() Kills the child. On POSIX OSs the function sends SIGKILL to the child. On Windows "kill()" is an alias for "terminate()". The following attributes are also available: Popen.args The *args* argument as it was passed to "Popen" – a sequence of program arguments or else a single string. New in version 3.3. Popen.stdin If the *stdin* argument was "PIPE", this attribute is a writeable stream object as returned by "open()". If the *encoding* or *errors* arguments were specified or the *universal_newlines* argument was "True", the stream is a text stream, otherwise it is a byte stream. If the *stdin* argument was not "PIPE", this attribute is "None". Popen.stdout If the *stdout* argument was "PIPE", this attribute is a readable stream object as returned by "open()". Reading from the stream provides output from the child process. If the *encoding* or *errors* arguments were specified or the *universal_newlines* argument was "True", the stream is a text stream, otherwise it is a byte stream. If the *stdout* argument was not "PIPE", this attribute is "None". Popen.stderr If the *stderr* argument was "PIPE", this attribute is a readable stream object as returned by "open()". Reading from the stream provides error output from the child process. If the *encoding* or *errors* arguments were specified or the *universal_newlines* argument was "True", the stream is a text stream, otherwise it is a byte stream. If the *stderr* argument was not "PIPE", this attribute is "None". Warning: Use "communicate()" rather than ".stdin.write", ".stdout.read" or ".stderr.read" to avoid deadlocks due to any of the other OS pipe buffers filling up and blocking the child process. Popen.pid The process ID of the child process. Note that if you set the *shell* argument to "True", this is the process ID of the spawned shell. Popen.returncode The child return code, set by "poll()" and "wait()" (and indirectly by "communicate()"). A "None" value indicates that the process hasn’t terminated yet. A negative value "-N" indicates that the child was terminated by signal "N" (POSIX only). Windows Popen Helpers ===================== The "STARTUPINFO" class and following constants are only available on Windows. class subprocess.STARTUPINFO(*, dwFlags=0, hStdInput=None, hStdOutput=None, hStdError=None, wShowWindow=0, lpAttributeList=None) Partial support of the Windows STARTUPINFO structure is used for "Popen" creation. The following attributes can be set by passing them as keyword-only arguments. Changed in version 3.7: Keyword-only argument support was added. dwFlags A bit field that determines whether certain "STARTUPINFO" attributes are used when the process creates a window. si = subprocess.STARTUPINFO() si.dwFlags = subprocess.STARTF_USESTDHANDLES | subprocess.STARTF_USESHOWWINDOW hStdInput If "dwFlags" specifies "STARTF_USESTDHANDLES", this attribute is the standard input handle for the process. If "STARTF_USESTDHANDLES" is not specified, the default for standard input is the keyboard buffer. hStdOutput If "dwFlags" specifies "STARTF_USESTDHANDLES", this attribute is the standard output handle for the process. Otherwise, this attribute is ignored and the default for standard output is the console window’s buffer. hStdError If "dwFlags" specifies "STARTF_USESTDHANDLES", this attribute is the standard error handle for the process. Otherwise, this attribute is ignored and the default for standard error is the console window’s buffer. wShowWindow If "dwFlags" specifies "STARTF_USESHOWWINDOW", this attribute can be any of the values that can be specified in the "nCmdShow" parameter for the ShowWindow function, except for "SW_SHOWDEFAULT". Otherwise, this attribute is ignored. "SW_HIDE" is provided for this attribute. It is used when "Popen" is called with "shell=True". lpAttributeList A dictionary of additional attributes for process creation as given in "STARTUPINFOEX", see UpdateProcThreadAttribute. Supported attributes: **handle_list** Sequence of handles that will be inherited. *close_fds* must be true if non-empty. The handles must be temporarily made inheritable by "os.set_handle_inheritable()" when passed to the "Popen" constructor, else "OSError" will be raised with Windows error "ERROR_INVALID_PARAMETER" (87). Warning: In a multithreaded process, use caution to avoid leaking handles that are marked inheritable when combining this feature with concurrent calls to other process creation functions that inherit all handles such as "os.system()". This also applies to standard handle redirection, which temporarily creates inheritable handles. New in version 3.7. Windows Constants ----------------- The "subprocess" module exposes the following constants. subprocess.STD_INPUT_HANDLE The standard input device. Initially, this is the console input buffer, "CONIN$". subprocess.STD_OUTPUT_HANDLE The standard output device. Initially, this is the active console screen buffer, "CONOUT$". subprocess.STD_ERROR_HANDLE The standard error device. Initially, this is the active console screen buffer, "CONOUT$". subprocess.SW_HIDE Hides the window. Another window will be activated. subprocess.STARTF_USESTDHANDLES Specifies that the "STARTUPINFO.hStdInput", "STARTUPINFO.hStdOutput", and "STARTUPINFO.hStdError" attributes contain additional information. subprocess.STARTF_USESHOWWINDOW Specifies that the "STARTUPINFO.wShowWindow" attribute contains additional information. subprocess.CREATE_NEW_CONSOLE The new process has a new console, instead of inheriting its parent’s console (the default). subprocess.CREATE_NEW_PROCESS_GROUP A "Popen" "creationflags" parameter to specify that a new process group will be created. This flag is necessary for using "os.kill()" on the subprocess. This flag is ignored if "CREATE_NEW_CONSOLE" is specified. subprocess.ABOVE_NORMAL_PRIORITY_CLASS A "Popen" "creationflags" parameter to specify that a new process will have an above average priority. New in version 3.7. subprocess.BELOW_NORMAL_PRIORITY_CLASS A "Popen" "creationflags" parameter to specify that a new process will have a below average priority. New in version 3.7. subprocess.HIGH_PRIORITY_CLASS A "Popen" "creationflags" parameter to specify that a new process will have a high priority. New in version 3.7. subprocess.IDLE_PRIORITY_CLASS A "Popen" "creationflags" parameter to specify that a new process will have an idle (lowest) priority. New in version 3.7. subprocess.NORMAL_PRIORITY_CLASS A "Popen" "creationflags" parameter to specify that a new process will have an normal priority. (default) New in version 3.7. subprocess.REALTIME_PRIORITY_CLASS A "Popen" "creationflags" parameter to specify that a new process will have realtime priority. You should almost never use REALTIME_PRIORITY_CLASS, because this interrupts system threads that manage mouse input, keyboard input, and background disk flushing. This class can be appropriate for applications that “talk” directly to hardware or that perform brief tasks that should have limited interruptions. New in version 3.7. subprocess.CREATE_NO_WINDOW A "Popen" "creationflags" parameter to specify that a new process will not create a window. New in version 3.7. subprocess.DETACHED_PROCESS A "Popen" "creationflags" parameter to specify that a new process will not inherit its parent’s console. This value cannot be used with CREATE_NEW_CONSOLE. New in version 3.7. subprocess.CREATE_DEFAULT_ERROR_MODE A "Popen" "creationflags" parameter to specify that a new process does not inherit the error mode of the calling process. Instead, the new process gets the default error mode. This feature is particularly useful for multithreaded shell applications that run with hard errors disabled. New in version 3.7. subprocess.CREATE_BREAKAWAY_FROM_JOB A "Popen" "creationflags" parameter to specify that a new process is not associated with the job. New in version 3.7. Older high-level API ==================== Prior to Python 3.5, these three functions comprised the high level API to subprocess. You can now use "run()" in many cases, but lots of existing code calls these functions. subprocess.call(args, *, stdin=None, stdout=None, stderr=None, shell=False, cwd=None, timeout=None, **other_popen_kwargs) Run the command described by *args*. Wait for command to complete, then return the "returncode" attribute. Code needing to capture stdout or stderr should use "run()" instead: run(...).returncode To suppress stdout or stderr, supply a value of "DEVNULL". The arguments shown above are merely some common ones. The full function signature is the same as that of the "Popen" constructor - this function passes all supplied arguments other than *timeout* directly through to that interface. Note: Do not use "stdout=PIPE" or "stderr=PIPE" with this function. The child process will block if it generates enough output to a pipe to fill up the OS pipe buffer as the pipes are not being read from. Changed in version 3.3: *timeout* was added. Changed in version 3.9.17: Changed Windows shell search order for "shell=True". The current directory and "%PATH%" are replaced with "%COMSPEC%" and "%SystemRoot%\System32\cmd.exe". As a result, dropping a malicious program named "cmd.exe" into a current directory no longer works. subprocess.check_call(args, *, stdin=None, stdout=None, stderr=None, shell=False, cwd=None, timeout=None, **other_popen_kwargs) Run command with arguments. Wait for command to complete. If the return code was zero then return, otherwise raise "CalledProcessError". The "CalledProcessError" object will have the return code in the "returncode" attribute. If "check_call()" was unable to start the process it will propagate the exception that was raised. Code needing to capture stdout or stderr should use "run()" instead: run(..., check=True) To suppress stdout or stderr, supply a value of "DEVNULL". The arguments shown above are merely some common ones. The full function signature is the same as that of the "Popen" constructor - this function passes all supplied arguments other than *timeout* directly through to that interface. Note: Do not use "stdout=PIPE" or "stderr=PIPE" with this function. The child process will block if it generates enough output to a pipe to fill up the OS pipe buffer as the pipes are not being read from. Changed in version 3.3: *timeout* was added. Changed in version 3.9.17: Changed Windows shell search order for "shell=True". The current directory and "%PATH%" are replaced with "%COMSPEC%" and "%SystemRoot%\System32\cmd.exe". As a result, dropping a malicious program named "cmd.exe" into a current directory no longer works. subprocess.check_output(args, *, stdin=None, stderr=None, shell=False, cwd=None, encoding=None, errors=None, universal_newlines=None, timeout=None, text=None, **other_popen_kwargs) Run command with arguments and return its output. If the return code was non-zero it raises a "CalledProcessError". The "CalledProcessError" object will have the return code in the "returncode" attribute and any output in the "output" attribute. This is equivalent to: run(..., check=True, stdout=PIPE).stdout The arguments shown above are merely some common ones. The full function signature is largely the same as that of "run()" - most arguments are passed directly through to that interface. One API deviation from "run()" behavior exists: passing "input=None" will behave the same as "input=b''" (or "input=''", depending on other arguments) rather than using the parent’s standard input file handle. By default, this function will return the data as encoded bytes. The actual encoding of the output data may depend on the command being invoked, so the decoding to text will often need to be handled at the application level. This behaviour may be overridden by setting *text*, *encoding*, *errors*, or *universal_newlines* to "True" as described in Frequently Used Arguments and "run()". To also capture standard error in the result, use "stderr=subprocess.STDOUT": >>> subprocess.check_output( ... "ls non_existent_file; exit 0", ... stderr=subprocess.STDOUT, ... shell=True) 'ls: non_existent_file: No such file or directory\n' New in version 3.1. Changed in version 3.3: *timeout* was added. Changed in version 3.4: Support for the *input* keyword argument was added. Changed in version 3.6: *encoding* and *errors* were added. See "run()" for details. New in version 3.7: *text* was added as a more readable alias for *universal_newlines*. Changed in version 3.9.17: Changed Windows shell search order for "shell=True". The current directory and "%PATH%" are replaced with "%COMSPEC%" and "%SystemRoot%\System32\cmd.exe". As a result, dropping a malicious program named "cmd.exe" into a current directory no longer works. Replacing Older Functions with the "subprocess" Module ====================================================== In this section, “a becomes b” means that b can be used as a replacement for a. Note: All “a” functions in this section fail (more or less) silently if the executed program cannot be found; the “b” replacements raise "OSError" instead.In addition, the replacements using "check_output()" will fail with a "CalledProcessError" if the requested operation produces a non-zero return code. The output is still available as the "output" attribute of the raised exception. In the following examples, we assume that the relevant functions have already been imported from the "subprocess" module. Replacing **/bin/sh** shell command substitution ------------------------------------------------ output=$(mycmd myarg) becomes: output = check_output(["mycmd", "myarg"]) Replacing shell pipeline ------------------------ output=$(dmesg | grep hda) becomes: p1 = Popen(["dmesg"], stdout=PIPE) p2 = Popen(["grep", "hda"], stdin=p1.stdout, stdout=PIPE) p1.stdout.close() # Allow p1 to receive a SIGPIPE if p2 exits. output = p2.communicate()[0] The "p1.stdout.close()" call after starting the p2 is important in order for p1 to receive a SIGPIPE if p2 exits before p1. Alternatively, for trusted input, the shell’s own pipeline support may still be used directly: output=$(dmesg | grep hda) becomes: output=check_output("dmesg | grep hda", shell=True) Replacing "os.system()" ----------------------- sts = os.system("mycmd" + " myarg") # becomes retcode = call("mycmd" + " myarg", shell=True) Notes: * Calling the program through the shell is usually not required. * The "call()" return value is encoded differently to that of "os.system()". * The "os.system()" function ignores SIGINT and SIGQUIT signals while the command is running, but the caller must do this separately when using the "subprocess" module. A more realistic example would look like this: try: retcode = call("mycmd" + " myarg", shell=True) if retcode < 0: print("Child was terminated by signal", -retcode, file=sys.stderr) else: print("Child returned", retcode, file=sys.stderr) except OSError as e: print("Execution failed:", e, file=sys.stderr) Replacing the "os.spawn" family ------------------------------- P_NOWAIT example: pid = os.spawnlp(os.P_NOWAIT, "/bin/mycmd", "mycmd", "myarg") ==> pid = Popen(["/bin/mycmd", "myarg"]).pid P_WAIT example: retcode = os.spawnlp(os.P_WAIT, "/bin/mycmd", "mycmd", "myarg") ==> retcode = call(["/bin/mycmd", "myarg"]) Vector example: os.spawnvp(os.P_NOWAIT, path, args) ==> Popen([path] + args[1:]) Environment example: os.spawnlpe(os.P_NOWAIT, "/bin/mycmd", "mycmd", "myarg", env) ==> Popen(["/bin/mycmd", "myarg"], env={"PATH": "/usr/bin"}) Replacing "os.popen()", "os.popen2()", "os.popen3()" ---------------------------------------------------- (child_stdin, child_stdout) = os.popen2(cmd, mode, bufsize) ==> p = Popen(cmd, shell=True, bufsize=bufsize, stdin=PIPE, stdout=PIPE, close_fds=True) (child_stdin, child_stdout) = (p.stdin, p.stdout) (child_stdin, child_stdout, child_stderr) = os.popen3(cmd, mode, bufsize) ==> p = Popen(cmd, shell=True, bufsize=bufsize, stdin=PIPE, stdout=PIPE, stderr=PIPE, close_fds=True) (child_stdin, child_stdout, child_stderr) = (p.stdin, p.stdout, p.stderr) (child_stdin, child_stdout_and_stderr) = os.popen4(cmd, mode, bufsize) ==> p = Popen(cmd, shell=True, bufsize=bufsize, stdin=PIPE, stdout=PIPE, stderr=STDOUT, close_fds=True) (child_stdin, child_stdout_and_stderr) = (p.stdin, p.stdout) Return code handling translates as follows: pipe = os.popen(cmd, 'w') ... rc = pipe.close() if rc is not None and rc >> 8: print("There were some errors") ==> process = Popen(cmd, stdin=PIPE) ... process.stdin.close() if process.wait() != 0: print("There were some errors") Replacing functions from the "popen2" module -------------------------------------------- Note: If the cmd argument to popen2 functions is a string, the command is executed through /bin/sh. If it is a list, the command is directly executed. (child_stdout, child_stdin) = popen2.popen2("somestring", bufsize, mode) ==> p = Popen("somestring", shell=True, bufsize=bufsize, stdin=PIPE, stdout=PIPE, close_fds=True) (child_stdout, child_stdin) = (p.stdout, p.stdin) (child_stdout, child_stdin) = popen2.popen2(["mycmd", "myarg"], bufsize, mode) ==> p = Popen(["mycmd", "myarg"], bufsize=bufsize, stdin=PIPE, stdout=PIPE, close_fds=True) (child_stdout, child_stdin) = (p.stdout, p.stdin) "popen2.Popen3" and "popen2.Popen4" basically work as "subprocess.Popen", except that: * "Popen" raises an exception if the execution fails. * The *capturestderr* argument is replaced with the *stderr* argument. * "stdin=PIPE" and "stdout=PIPE" must be specified. * popen2 closes all file descriptors by default, but you have to specify "close_fds=True" with "Popen" to guarantee this behavior on all platforms or past Python versions. Legacy Shell Invocation Functions ================================= This module also provides the following legacy functions from the 2.x "commands" module. These operations implicitly invoke the system shell and none of the guarantees described above regarding security and exception handling consistency are valid for these functions. subprocess.getstatusoutput(cmd) Return "(exitcode, output)" of executing *cmd* in a shell. Execute the string *cmd* in a shell with "Popen.check_output()" and return a 2-tuple "(exitcode, output)". The locale encoding is used; see the notes on Frequently Used Arguments for more details. A trailing newline is stripped from the output. The exit code for the command can be interpreted as the return code of subprocess. Example: >>> subprocess.getstatusoutput('ls /bin/ls') (0, '/bin/ls') >>> subprocess.getstatusoutput('cat /bin/junk') (1, 'cat: /bin/junk: No such file or directory') >>> subprocess.getstatusoutput('/bin/junk') (127, 'sh: /bin/junk: not found') >>> subprocess.getstatusoutput('/bin/kill $$') (-15, '') Availability: POSIX & Windows. Changed in version 3.3.4: Windows support was added.The function now returns (exitcode, output) instead of (status, output) as it did in Python 3.3.3 and earlier. exitcode has the same value as "returncode". subprocess.getoutput(cmd) Return output (stdout and stderr) of executing *cmd* in a shell. Like "getstatusoutput()", except the exit code is ignored and the return value is a string containing the command’s output. Example: >>> subprocess.getoutput('ls /bin/ls') '/bin/ls' Availability: POSIX & Windows. Changed in version 3.3.4: Windows support added Notes ===== Converting an argument sequence to a string on Windows ------------------------------------------------------ On Windows, an *args* sequence is converted to a string that can be parsed using the following rules (which correspond to the rules used by the MS C runtime): 1. Arguments are delimited by white space, which is either a space or a tab. 2. A string surrounded by double quotation marks is interpreted as a single argument, regardless of white space contained within. A quoted string can be embedded in an argument. 3. A double quotation mark preceded by a backslash is interpreted as a literal double quotation mark. 4. Backslashes are interpreted literally, unless they immediately precede a double quotation mark. 5. If backslashes immediately precede a double quotation mark, every pair of backslashes is interpreted as a literal backslash. If the number of backslashes is odd, the last backslash escapes the next double quotation mark as described in rule 3. See also: "shlex" Module which provides function to parse and escape command lines. "sunau" — Read and write Sun AU files ************************************* **Source code:** Lib/sunau.py Deprecated since version 3.11: The "sunau" module is deprecated (see **PEP 594** for details). ====================================================================== The "sunau" module provides a convenient interface to the Sun AU sound format. Note that this module is interface-compatible with the modules "aifc" and "wave". An audio file consists of a header followed by the data. The fields of the header are: +-----------------+-------------------------------------------------+ | Field | Contents | |=================|=================================================| | magic word | The four bytes ".snd". | +-----------------+-------------------------------------------------+ | header size | Size of the header, including info, in bytes. | +-----------------+-------------------------------------------------+ | data size | Physical size of the data, in bytes. | +-----------------+-------------------------------------------------+ | encoding | Indicates how the audio samples are encoded. | +-----------------+-------------------------------------------------+ | sample rate | The sampling rate. | +-----------------+-------------------------------------------------+ | # of channels | The number of channels in the samples. | +-----------------+-------------------------------------------------+ | info | ASCII string giving a description of the audio | | | file (padded with null bytes). | +-----------------+-------------------------------------------------+ Apart from the info field, all header fields are 4 bytes in size. They are all 32-bit unsigned integers encoded in big-endian byte order. The "sunau" module defines the following functions: sunau.open(file, mode) If *file* is a string, open the file by that name, otherwise treat it as a seekable file-like object. *mode* can be any of "'r'" Read only mode. "'w'" Write only mode. Note that it does not allow read/write files. A *mode* of "'r'" returns an "AU_read" object, while a *mode* of "'w'" or "'wb'" returns an "AU_write" object. The "sunau" module defines the following exception: exception sunau.Error An error raised when something is impossible because of Sun AU specs or implementation deficiency. The "sunau" module defines the following data items: sunau.AUDIO_FILE_MAGIC An integer every valid Sun AU file begins with, stored in big- endian form. This is the string ".snd" interpreted as an integer. sunau.AUDIO_FILE_ENCODING_MULAW_8 sunau.AUDIO_FILE_ENCODING_LINEAR_8 sunau.AUDIO_FILE_ENCODING_LINEAR_16 sunau.AUDIO_FILE_ENCODING_LINEAR_24 sunau.AUDIO_FILE_ENCODING_LINEAR_32 sunau.AUDIO_FILE_ENCODING_ALAW_8 Values of the encoding field from the AU header which are supported by this module. sunau.AUDIO_FILE_ENCODING_FLOAT sunau.AUDIO_FILE_ENCODING_DOUBLE sunau.AUDIO_FILE_ENCODING_ADPCM_G721 sunau.AUDIO_FILE_ENCODING_ADPCM_G722 sunau.AUDIO_FILE_ENCODING_ADPCM_G723_3 sunau.AUDIO_FILE_ENCODING_ADPCM_G723_5 Additional known values of the encoding field from the AU header, but which are not supported by this module. AU_read Objects =============== AU_read objects, as returned by "open()" above, have the following methods: AU_read.close() Close the stream, and make the instance unusable. (This is called automatically on deletion.) AU_read.getnchannels() Returns number of audio channels (1 for mono, 2 for stereo). AU_read.getsampwidth() Returns sample width in bytes. AU_read.getframerate() Returns sampling frequency. AU_read.getnframes() Returns number of audio frames. AU_read.getcomptype() Returns compression type. Supported compression types are "'ULAW'", "'ALAW'" and "'NONE'". AU_read.getcompname() Human-readable version of "getcomptype()". The supported types have the respective names "'CCITT G.711 u-law'", "'CCITT G.711 A-law'" and "'not compressed'". AU_read.getparams() Returns a "namedtuple()" "(nchannels, sampwidth, framerate, nframes, comptype, compname)", equivalent to output of the "get*()" methods. AU_read.readframes(n) Reads and returns at most *n* frames of audio, as a "bytes" object. The data will be returned in linear format. If the original data is in u-LAW format, it will be converted. AU_read.rewind() Rewind the file pointer to the beginning of the audio stream. The following two methods define a term “position” which is compatible between them, and is otherwise implementation dependent. AU_read.setpos(pos) Set the file pointer to the specified position. Only values returned from "tell()" should be used for *pos*. AU_read.tell() Return current file pointer position. Note that the returned value has nothing to do with the actual position in the file. The following two functions are defined for compatibility with the "aifc", and don’t do anything interesting. AU_read.getmarkers() Returns "None". AU_read.getmark(id) Raise an error. AU_write Objects ================ AU_write objects, as returned by "open()" above, have the following methods: AU_write.setnchannels(n) Set the number of channels. AU_write.setsampwidth(n) Set the sample width (in bytes.) Changed in version 3.4: Added support for 24-bit samples. AU_write.setframerate(n) Set the frame rate. AU_write.setnframes(n) Set the number of frames. This can be later changed, when and if more frames are written. AU_write.setcomptype(type, name) Set the compression type and description. Only "'NONE'" and "'ULAW'" are supported on output. AU_write.setparams(tuple) The *tuple* should be "(nchannels, sampwidth, framerate, nframes, comptype, compname)", with values valid for the "set*()" methods. Set all parameters. AU_write.tell() Return current position in the file, with the same disclaimer for the "AU_read.tell()" and "AU_read.setpos()" methods. AU_write.writeframesraw(data) Write audio frames, without correcting *nframes*. Changed in version 3.4: Any *bytes-like object* is now accepted. AU_write.writeframes(data) Write audio frames and make sure *nframes* is correct. Changed in version 3.4: Any *bytes-like object* is now accepted. AU_write.close() Make sure *nframes* is correct, and close the file. This method is called upon deletion. Note that it is invalid to set any parameters after calling "writeframes()" or "writeframesraw()". Superseded Modules ****************** The modules described in this chapter are deprecated and only kept for backwards compatibility. They have been superseded by other modules. * "aifc" — Read and write AIFF and AIFC files * "asynchat" — Asynchronous socket command/response handler * asynchat Example * "asyncore" — Asynchronous socket handler * asyncore Example basic HTTP client * asyncore Example basic echo server * "audioop" — Manipulate raw audio data * "cgi" — Common Gateway Interface support * Introduction * Using the cgi module * Higher Level Interface * Functions * Caring about security * Installing your CGI script on a Unix system * Testing your CGI script * Debugging CGI scripts * Common problems and solutions * "cgitb" — Traceback manager for CGI scripts * "chunk" — Read IFF chunked data * "crypt" — Function to check Unix passwords * Hashing Methods * Module Attributes * Module Functions * Examples * "imghdr" — Determine the type of an image * "imp" — Access the *import* internals * Examples * "mailcap" — Mailcap file handling * "msilib" — Read and write Microsoft Installer files * Database Objects * View Objects * Summary Information Objects * Record Objects * Errors * CAB Objects * Directory Objects * Features * GUI classes * Precomputed tables * "nis" — Interface to Sun’s NIS (Yellow Pages) * "nntplib" — NNTP protocol client * NNTP Objects * Attributes * Methods * Utility functions * "optparse" — Parser for command line options * Background * Terminology * What are options for? * What are positional arguments for? * Tutorial * Understanding option actions * The store action * Handling boolean (flag) options * Other actions * Default values * Generating help * Grouping Options * Printing a version string * How "optparse" handles errors * Putting it all together * Reference Guide * Creating the parser * Populating the parser * Defining options * Option attributes * Standard option actions * Standard option types * Parsing arguments * Querying and manipulating your option parser * Conflicts between options * Cleanup * Other methods * Option Callbacks * Defining a callback option * How callbacks are called * Raising errors in a callback * Callback example 1: trivial callback * Callback example 2: check option order * Callback example 3: check option order (generalized) * Callback example 4: check arbitrary condition * Callback example 5: fixed arguments * Callback example 6: variable arguments * Extending "optparse" * Adding new types * Adding new actions * "ossaudiodev" — Access to OSS-compatible audio devices * Audio Device Objects * Mixer Device Objects * "pipes" — Interface to shell pipelines * Template Objects * "smtpd" — SMTP Server * SMTPServer Objects * DebuggingServer Objects * PureProxy Objects * MailmanProxy Objects * SMTPChannel Objects * "sndhdr" — Determine type of sound file * "spwd" — The shadow password database * "sunau" — Read and write Sun AU files * AU_read Objects * AU_write Objects * "telnetlib" — Telnet client * Telnet Objects * Telnet Example * "uu" — Encode and decode uuencode files * "xdrlib" — Encode and decode XDR data * Packer Objects * Unpacker Objects * Exceptions "symbol" — Constants used with Python parse trees ************************************************* **Source code:** Lib/symbol.py ====================================================================== This module provides constants which represent the numeric values of internal nodes of the parse tree. Unlike most Python constants, these use lower-case names. Refer to the file "Grammar/Grammar" in the Python distribution for the definitions of the names in the context of the language grammar. The specific numeric values which the names map to may change between Python versions. Warning: The symbol module is deprecated and will be removed in future versions of Python. This module also provides one additional data object: symbol.sym_name Dictionary mapping the numeric values of the constants defined in this module back to name strings, allowing more human-readable representation of parse trees to be generated. "symtable" — Access to the compiler’s symbol tables *************************************************** **Source code:** Lib/symtable.py ====================================================================== Symbol tables are generated by the compiler from AST just before bytecode is generated. The symbol table is responsible for calculating the scope of every identifier in the code. "symtable" provides an interface to examine these tables. Generating Symbol Tables ======================== symtable.symtable(code, filename, compile_type) Return the toplevel "SymbolTable" for the Python source *code*. *filename* is the name of the file containing the code. *compile_type* is like the *mode* argument to "compile()". Examining Symbol Tables ======================= class symtable.SymbolTable A namespace table for a block. The constructor is not public. get_type() Return the type of the symbol table. Possible values are "'class'", "'module'", and "'function'". get_id() Return the table’s identifier. get_name() Return the table’s name. This is the name of the class if the table is for a class, the name of the function if the table is for a function, or "'top'" if the table is global ("get_type()" returns "'module'"). get_lineno() Return the number of the first line in the block this table represents. is_optimized() Return "True" if the locals in this table can be optimized. is_nested() Return "True" if the block is a nested class or function. has_children() Return "True" if the block has nested namespaces within it. These can be obtained with "get_children()". get_identifiers() Return a list of names of symbols in this table. lookup(name) Lookup *name* in the table and return a "Symbol" instance. get_symbols() Return a list of "Symbol" instances for names in the table. get_children() Return a list of the nested symbol tables. class symtable.Function A namespace for a function or method. This class inherits "SymbolTable". get_parameters() Return a tuple containing names of parameters to this function. get_locals() Return a tuple containing names of locals in this function. get_globals() Return a tuple containing names of globals in this function. get_nonlocals() Return a tuple containing names of nonlocals in this function. get_frees() Return a tuple containing names of free variables in this function. class symtable.Class A namespace of a class. This class inherits "SymbolTable". get_methods() Return a tuple containing the names of methods declared in the class. class symtable.Symbol An entry in a "SymbolTable" corresponding to an identifier in the source. The constructor is not public. get_name() Return the symbol’s name. is_referenced() Return "True" if the symbol is used in its block. is_imported() Return "True" if the symbol is created from an import statement. is_parameter() Return "True" if the symbol is a parameter. is_global() Return "True" if the symbol is global. is_nonlocal() Return "True" if the symbol is nonlocal. is_declared_global() Return "True" if the symbol is declared global with a global statement. is_local() Return "True" if the symbol is local to its block. is_annotated() Return "True" if the symbol is annotated. New in version 3.6. is_free() Return "True" if the symbol is referenced in its block, but not assigned to. is_assigned() Return "True" if the symbol is assigned to in its block. is_namespace() Return "True" if name binding introduces new namespace. If the name is used as the target of a function or class statement, this will be true. For example: >>> table = symtable.symtable("def some_func(): pass", "string", "exec") >>> table.lookup("some_func").is_namespace() True Note that a single name can be bound to multiple objects. If the result is "True", the name may also be bound to other objects, like an int or list, that does not introduce a new namespace. get_namespaces() Return a list of namespaces bound to this name. get_namespace() Return the namespace bound to this name. If more than one namespace is bound, "ValueError" is raised. "sys" — System-specific parameters and functions ************************************************ ====================================================================== This module provides access to some variables used or maintained by the interpreter and to functions that interact strongly with the interpreter. It is always available. sys.abiflags On POSIX systems where Python was built with the standard "configure" script, this contains the ABI flags as specified by **PEP 3149**. Changed in version 3.8: Default flags became an empty string ("m" flag for pymalloc has been removed). New in version 3.2. sys.addaudithook(hook) Append the callable *hook* to the list of active auditing hooks for the current (sub)interpreter. When an auditing event is raised through the "sys.audit()" function, each hook will be called in the order it was added with the event name and the tuple of arguments. Native hooks added by "PySys_AddAuditHook()" are called first, followed by hooks added in the current (sub)interpreter. Hooks can then log the event, raise an exception to abort the operation, or terminate the process entirely. Calling "sys.addaudithook()" will itself raise an auditing event named "sys.addaudithook" with no arguments. If any existing hooks raise an exception derived from "RuntimeError", the new hook will not be added and the exception suppressed. As a result, callers cannot assume that their hook has been added unless they control all existing hooks. See the audit events table for all events raised by CPython, and **PEP 578** for the original design discussion. New in version 3.8. Changed in version 3.8.1: Exceptions derived from "Exception" but not "RuntimeError" are no longer suppressed. **CPython implementation detail:** When tracing is enabled (see "settrace()"), Python hooks are only traced if the callable has a "__cantrace__" member that is set to a true value. Otherwise, trace functions will skip the hook. sys.argv The list of command line arguments passed to a Python script. "argv[0]" is the script name (it is operating system dependent whether this is a full pathname or not). If the command was executed using the "-c" command line option to the interpreter, "argv[0]" is set to the string "'-c'". If no script name was passed to the Python interpreter, "argv[0]" is the empty string. To loop over the standard input, or the list of files given on the command line, see the "fileinput" module. Note: On Unix, command line arguments are passed by bytes from OS. Python decodes them with filesystem encoding and “surrogateescape” error handler. When you need original bytes, you can get it by "[os.fsencode(arg) for arg in sys.argv]". sys.audit(event, *args) Raise an auditing event and trigger any active auditing hooks. *event* is a string identifying the event, and *args* may contain optional arguments with more information about the event. The number and types of arguments for a given event are considered a public and stable API and should not be modified between releases. For example, one auditing event is named "os.chdir". This event has one argument called *path* that will contain the requested new working directory. "sys.audit()" will call the existing auditing hooks, passing the event name and arguments, and will re-raise the first exception from any hook. In general, if an exception is raised, it should not be handled and the process should be terminated as quickly as possible. This allows hook implementations to decide how to respond to particular events: they can merely log the event or abort the operation by raising an exception. Hooks are added using the "sys.addaudithook()" or "PySys_AddAuditHook()" functions. The native equivalent of this function is "PySys_Audit()". Using the native function is preferred when possible. See the audit events table for all events raised by CPython. New in version 3.8. sys.base_exec_prefix Set during Python startup, before "site.py" is run, to the same value as "exec_prefix". If not running in a virtual environment, the values will stay the same; if "site.py" finds that a virtual environment is in use, the values of "prefix" and "exec_prefix" will be changed to point to the virtual environment, whereas "base_prefix" and "base_exec_prefix" will remain pointing to the base Python installation (the one which the virtual environment was created from). New in version 3.3. sys.base_prefix Set during Python startup, before "site.py" is run, to the same value as "prefix". If not running in a virtual environment, the values will stay the same; if "site.py" finds that a virtual environment is in use, the values of "prefix" and "exec_prefix" will be changed to point to the virtual environment, whereas "base_prefix" and "base_exec_prefix" will remain pointing to the base Python installation (the one which the virtual environment was created from). New in version 3.3. sys.byteorder An indicator of the native byte order. This will have the value "'big'" on big-endian (most-significant byte first) platforms, and "'little'" on little-endian (least-significant byte first) platforms. sys.builtin_module_names A tuple of strings giving the names of all modules that are compiled into this Python interpreter. (This information is not available in any other way — "modules.keys()" only lists the imported modules.) sys.call_tracing(func, args) Call "func(*args)", while tracing is enabled. The tracing state is saved, and restored afterwards. This is intended to be called from a debugger from a checkpoint, to recursively debug some other code. sys.copyright A string containing the copyright pertaining to the Python interpreter. sys._clear_type_cache() Clear the internal type cache. The type cache is used to speed up attribute and method lookups. Use the function *only* to drop unnecessary references during reference leak debugging. This function should be used for internal and specialized purposes only. sys._current_frames() Return a dictionary mapping each thread’s identifier to the topmost stack frame currently active in that thread at the time the function is called. Note that functions in the "traceback" module can build the call stack given such a frame. This is most useful for debugging deadlock: this function does not require the deadlocked threads’ cooperation, and such threads’ call stacks are frozen for as long as they remain deadlocked. The frame returned for a non-deadlocked thread may bear no relationship to that thread’s current activity by the time calling code examines the frame. This function should be used for internal and specialized purposes only. Raises an auditing event "sys._current_frames" with no arguments. sys.breakpointhook() This hook function is called by built-in "breakpoint()". By default, it drops you into the "pdb" debugger, but it can be set to any other function so that you can choose which debugger gets used. The signature of this function is dependent on what it calls. For example, the default binding (e.g. "pdb.set_trace()") expects no arguments, but you might bind it to a function that expects additional arguments (positional and/or keyword). The built-in "breakpoint()" function passes its "*args" and "**kws" straight through. Whatever "breakpointhooks()" returns is returned from "breakpoint()". The default implementation first consults the environment variable "PYTHONBREAKPOINT". If that is set to ""0"" then this function returns immediately; i.e. it is a no-op. If the environment variable is not set, or is set to the empty string, "pdb.set_trace()" is called. Otherwise this variable should name a function to run, using Python’s dotted-import nomenclature, e.g. "package.subpackage.module.function". In this case, "package.subpackage.module" would be imported and the resulting module must have a callable named "function()". This is run, passing in "*args" and "**kws", and whatever "function()" returns, "sys.breakpointhook()" returns to the built-in "breakpoint()" function. Note that if anything goes wrong while importing the callable named by "PYTHONBREAKPOINT", a "RuntimeWarning" is reported and the breakpoint is ignored. Also note that if "sys.breakpointhook()" is overridden programmatically, "PYTHONBREAKPOINT" is *not* consulted. New in version 3.7. sys._debugmallocstats() Print low-level information to stderr about the state of CPython’s memory allocator. If Python is configured –with-pydebug, it also performs some expensive internal consistency checks. New in version 3.3. **CPython implementation detail:** This function is specific to CPython. The exact output format is not defined here, and may change. sys.dllhandle Integer specifying the handle of the Python DLL. Availability: Windows. sys.displayhook(value) If *value* is not "None", this function prints "repr(value)" to "sys.stdout", and saves *value* in "builtins._". If "repr(value)" is not encodable to "sys.stdout.encoding" with "sys.stdout.errors" error handler (which is probably "'strict'"), encode it to "sys.stdout.encoding" with "'backslashreplace'" error handler. "sys.displayhook" is called on the result of evaluating an *expression* entered in an interactive Python session. The display of these values can be customized by assigning another one-argument function to "sys.displayhook". Pseudo-code: def displayhook(value): if value is None: return # Set '_' to None to avoid recursion builtins._ = None text = repr(value) try: sys.stdout.write(text) except UnicodeEncodeError: bytes = text.encode(sys.stdout.encoding, 'backslashreplace') if hasattr(sys.stdout, 'buffer'): sys.stdout.buffer.write(bytes) else: text = bytes.decode(sys.stdout.encoding, 'strict') sys.stdout.write(text) sys.stdout.write("\n") builtins._ = value Changed in version 3.2: Use "'backslashreplace'" error handler on "UnicodeEncodeError". sys.dont_write_bytecode If this is true, Python won’t try to write ".pyc" files on the import of source modules. This value is initially set to "True" or "False" depending on the "-B" command line option and the "PYTHONDONTWRITEBYTECODE" environment variable, but you can set it yourself to control bytecode file generation. sys.pycache_prefix If this is set (not "None"), Python will write bytecode-cache ".pyc" files to (and read them from) a parallel directory tree rooted at this directory, rather than from "__pycache__" directories in the source code tree. Any "__pycache__" directories in the source code tree will be ignored and new *.pyc* files written within the pycache prefix. Thus if you use "compileall" as a pre-build step, you must ensure you run it with the same pycache prefix (if any) that you will use at runtime. A relative path is interpreted relative to the current working directory. This value is initially set based on the value of the "-X" "pycache_prefix=PATH" command-line option or the "PYTHONPYCACHEPREFIX" environment variable (command-line takes precedence). If neither are set, it is "None". New in version 3.8. sys.excepthook(type, value, traceback) This function prints out a given traceback and exception to "sys.stderr". When an exception is raised and uncaught, the interpreter calls "sys.excepthook" with three arguments, the exception class, exception instance, and a traceback object. In an interactive session this happens just before control is returned to the prompt; in a Python program this happens just before the program exits. The handling of such top-level exceptions can be customized by assigning another three-argument function to "sys.excepthook". Raise an auditing event "sys.excepthook" with arguments "hook", "type", "value", "traceback" when an uncaught exception occurs. If no hook has been set, "hook" may be "None". If any hook raises an exception derived from "RuntimeError" the call to the hook will be suppressed. Otherwise, the audit hook exception will be reported as unraisable and "sys.excepthook" will be called. See also: The "sys.unraisablehook()" function handles unraisable exceptions and the "threading.excepthook()" function handles exception raised by "threading.Thread.run()". sys.__breakpointhook__ sys.__displayhook__ sys.__excepthook__ sys.__unraisablehook__ These objects contain the original values of "breakpointhook", "displayhook", "excepthook", and "unraisablehook" at the start of the program. They are saved so that "breakpointhook", "displayhook" and "excepthook", "unraisablehook" can be restored in case they happen to get replaced with broken or alternative objects. New in version 3.7: __breakpointhook__ New in version 3.8: __unraisablehook__ sys.exc_info() This function returns a tuple of three values that give information about the exception that is currently being handled. The information returned is specific both to the current thread and to the current stack frame. If the current stack frame is not handling an exception, the information is taken from the calling stack frame, or its caller, and so on until a stack frame is found that is handling an exception. Here, “handling an exception” is defined as “executing an except clause.” For any stack frame, only information about the exception being currently handled is accessible. If no exception is being handled anywhere on the stack, a tuple containing three "None" values is returned. Otherwise, the values returned are "(type, value, traceback)". Their meaning is: *type* gets the type of the exception being handled (a subclass of "BaseException"); *value* gets the exception instance (an instance of the exception type); *traceback* gets a traceback object which encapsulates the call stack at the point where the exception originally occurred. sys.exec_prefix A string giving the site-specific directory prefix where the platform-dependent Python files are installed; by default, this is also "'/usr/local'". This can be set at build time with the "-- exec-prefix" argument to the **configure** script. Specifically, all configuration files (e.g. the "pyconfig.h" header file) are installed in the directory "*exec_prefix*/lib/python*X.Y*/config", and shared library modules are installed in "*exec_prefix*/lib/python*X.Y*/lib-dynload", where *X.Y* is the version number of Python, for example "3.2". Note: If a virtual environment is in effect, this value will be changed in "site.py" to point to the virtual environment. The value for the Python installation will still be available, via "base_exec_prefix". sys.executable A string giving the absolute path of the executable binary for the Python interpreter, on systems where this makes sense. If Python is unable to retrieve the real path to its executable, "sys.executable" will be an empty string or "None". sys.exit([arg]) Raise a "SystemExit" exception, signaling an intention to exit the interpreter. The optional argument *arg* can be an integer giving the exit status (defaulting to zero), or another type of object. If it is an integer, zero is considered “successful termination” and any nonzero value is considered “abnormal termination” by shells and the like. Most systems require it to be in the range 0–127, and produce undefined results otherwise. Some systems have a convention for assigning specific meanings to specific exit codes, but these are generally underdeveloped; Unix programs generally use 2 for command line syntax errors and 1 for all other kind of errors. If another type of object is passed, "None" is equivalent to passing zero, and any other object is printed to "stderr" and results in an exit code of 1. In particular, "sys.exit("some error message")" is a quick way to exit a program when an error occurs. Since "exit()" ultimately “only” raises an exception, it will only exit the process when called from the main thread, and the exception is not intercepted. Cleanup actions specified by finally clauses of "try" statements are honored, and it is possible to intercept the exit attempt at an outer level. Changed in version 3.6: If an error occurs in the cleanup after the Python interpreter has caught "SystemExit" (such as an error flushing buffered data in the standard streams), the exit status is changed to 120. sys.flags The *named tuple* *flags* exposes the status of command line flags. The attributes are read only. +-------------------------------+----------------------------------------------------------------------------------------------------------------+ | attribute | flag | |===============================|================================================================================================================| | "debug" | "-d" | +-------------------------------+----------------------------------------------------------------------------------------------------------------+ | "inspect" | "-i" | +-------------------------------+----------------------------------------------------------------------------------------------------------------+ | "interactive" | "-i" | +-------------------------------+----------------------------------------------------------------------------------------------------------------+ | "isolated" | "-I" | +-------------------------------+----------------------------------------------------------------------------------------------------------------+ | "optimize" | "-O" or "-OO" | +-------------------------------+----------------------------------------------------------------------------------------------------------------+ | "dont_write_bytecode" | "-B" | +-------------------------------+----------------------------------------------------------------------------------------------------------------+ | "no_user_site" | "-s" | +-------------------------------+----------------------------------------------------------------------------------------------------------------+ | "no_site" | "-S" | +-------------------------------+----------------------------------------------------------------------------------------------------------------+ | "ignore_environment" | "-E" | +-------------------------------+----------------------------------------------------------------------------------------------------------------+ | "verbose" | "-v" | +-------------------------------+----------------------------------------------------------------------------------------------------------------+ | "bytes_warning" | "-b" | +-------------------------------+----------------------------------------------------------------------------------------------------------------+ | "quiet" | "-q" | +-------------------------------+----------------------------------------------------------------------------------------------------------------+ | "hash_randomization" | "-R" | +-------------------------------+----------------------------------------------------------------------------------------------------------------+ | "dev_mode" | "-X dev" (Python Development Mode) | +-------------------------------+----------------------------------------------------------------------------------------------------------------+ | "utf8_mode" | "-X utf8" | +-------------------------------+----------------------------------------------------------------------------------------------------------------+ | "int_max_str_digits" | "-X int_max_str_digits" (integer string conversion length limitation) | +-------------------------------+----------------------------------------------------------------------------------------------------------------+ Changed in version 3.2: Added "quiet" attribute for the new "-q" flag. New in version 3.2.3: The "hash_randomization" attribute. Changed in version 3.3: Removed obsolete "division_warning" attribute. Changed in version 3.4: Added "isolated" attribute for "-I" "isolated" flag. Changed in version 3.7: Added the "dev_mode" attribute for the new Python Development Mode and the "utf8_mode" attribute for the new "-X" "utf8" flag. Changed in version 3.9.14: Added the "int_max_str_digits" attribute. sys.float_info A *named tuple* holding information about the float type. It contains low level information about the precision and internal representation. The values correspond to the various floating- point constants defined in the standard header file "float.h" for the ‘C’ programming language; see section 5.2.4.2.2 of the 1999 ISO/IEC C standard [C99], ‘Characteristics of floating types’, for details. +-----------------------+------------------+----------------------------------------------------+ | attribute | float.h macro | explanation | |=======================|==================|====================================================| | "epsilon" | DBL_EPSILON | difference between 1.0 and the least value greater | | | | than 1.0 that is representable as a float See | | | | also "math.ulp()". | +-----------------------+------------------+----------------------------------------------------+ | "dig" | DBL_DIG | maximum number of decimal digits that can be | | | | faithfully represented in a float; see below | +-----------------------+------------------+----------------------------------------------------+ | "mant_dig" | DBL_MANT_DIG | float precision: the number of base-"radix" digits | | | | in the significand of a float | +-----------------------+------------------+----------------------------------------------------+ | "max" | DBL_MAX | maximum representable positive finite float | +-----------------------+------------------+----------------------------------------------------+ | "max_exp" | DBL_MAX_EXP | maximum integer *e* such that "radix**(e-1)" is a | | | | representable finite float | +-----------------------+------------------+----------------------------------------------------+ | "max_10_exp" | DBL_MAX_10_EXP | maximum integer *e* such that "10**e" is in the | | | | range of representable finite floats | +-----------------------+------------------+----------------------------------------------------+ | "min" | DBL_MIN | minimum representable positive *normalized* float | | | | Use "math.ulp(0.0)" to get the smallest positive | | | | *denormalized* representable float. | +-----------------------+------------------+----------------------------------------------------+ | "min_exp" | DBL_MIN_EXP | minimum integer *e* such that "radix**(e-1)" is a | | | | normalized float | +-----------------------+------------------+----------------------------------------------------+ | "min_10_exp" | DBL_MIN_10_EXP | minimum integer *e* such that "10**e" is a | | | | normalized float | +-----------------------+------------------+----------------------------------------------------+ | "radix" | FLT_RADIX | radix of exponent representation | +-----------------------+------------------+----------------------------------------------------+ | "rounds" | FLT_ROUNDS | integer constant representing the rounding mode | | | | used for arithmetic operations. This reflects the | | | | value of the system FLT_ROUNDS macro at | | | | interpreter startup time. See section 5.2.4.2.2 | | | | of the C99 standard for an explanation of the | | | | possible values and their meanings. | +-----------------------+------------------+----------------------------------------------------+ The attribute "sys.float_info.dig" needs further explanation. If "s" is any string representing a decimal number with at most "sys.float_info.dig" significant digits, then converting "s" to a float and back again will recover a string representing the same decimal value: >>> import sys >>> sys.float_info.dig 15 >>> s = '3.14159265358979' # decimal string with 15 significant digits >>> format(float(s), '.15g') # convert to float and back -> same value '3.14159265358979' But for strings with more than "sys.float_info.dig" significant digits, this isn’t always true: >>> s = '9876543211234567' # 16 significant digits is too many! >>> format(float(s), '.16g') # conversion changes value '9876543211234568' sys.float_repr_style A string indicating how the "repr()" function behaves for floats. If the string has value "'short'" then for a finite float "x", "repr(x)" aims to produce a short string with the property that "float(repr(x)) == x". This is the usual behaviour in Python 3.1 and later. Otherwise, "float_repr_style" has value "'legacy'" and "repr(x)" behaves in the same way as it did in versions of Python prior to 3.1. New in version 3.1. sys.getallocatedblocks() Return the number of memory blocks currently allocated by the interpreter, regardless of their size. This function is mainly useful for tracking and debugging memory leaks. Because of the interpreter’s internal caches, the result can vary from call to call; you may have to call "_clear_type_cache()" and "gc.collect()" to get more predictable results. If a Python build or implementation cannot reasonably compute this information, "getallocatedblocks()" is allowed to return 0 instead. New in version 3.4. sys.getandroidapilevel() Return the build time API version of Android as an integer. Availability: Android. New in version 3.7. sys.getdefaultencoding() Return the name of the current default string encoding used by the Unicode implementation. sys.getdlopenflags() Return the current value of the flags that are used for "dlopen()" calls. Symbolic names for the flag values can be found in the "os" module ("RTLD_xxx" constants, e.g. "os.RTLD_LAZY"). Availability: Unix. sys.getfilesystemencoding() Return the name of the encoding used to convert between Unicode filenames and bytes filenames. For best compatibility, str should be used for filenames in all cases, although representing filenames as bytes is also supported. Functions accepting or returning filenames should support either str or bytes and internally convert to the system’s preferred representation. This encoding is always ASCII-compatible. "os.fsencode()" and "os.fsdecode()" should be used to ensure that the correct encoding and errors mode are used. * In the UTF-8 mode, the encoding is "utf-8" on any platform. * On macOS, the encoding is "'utf-8'". * On Unix, the encoding is the locale encoding. * On Windows, the encoding may be "'utf-8'" or "'mbcs'", depending on user configuration. * On Android, the encoding is "'utf-8'". * On VxWorks, the encoding is "'utf-8'". Changed in version 3.2: "getfilesystemencoding()" result cannot be "None" anymore. Changed in version 3.6: Windows is no longer guaranteed to return "'mbcs'". See **PEP 529** and "_enablelegacywindowsfsencoding()" for more information. Changed in version 3.7: Return ‘utf-8’ in the UTF-8 mode. sys.getfilesystemencodeerrors() Return the name of the error mode used to convert between Unicode filenames and bytes filenames. The encoding name is returned from "getfilesystemencoding()". "os.fsencode()" and "os.fsdecode()" should be used to ensure that the correct encoding and errors mode are used. New in version 3.6. sys.get_int_max_str_digits() Returns the current value for the integer string conversion length limitation. See also "set_int_max_str_digits()". New in version 3.9.14. sys.getrefcount(object) Return the reference count of the *object*. The count returned is generally one higher than you might expect, because it includes the (temporary) reference as an argument to "getrefcount()". sys.getrecursionlimit() Return the current value of the recursion limit, the maximum depth of the Python interpreter stack. This limit prevents infinite recursion from causing an overflow of the C stack and crashing Python. It can be set by "setrecursionlimit()". sys.getsizeof(object[, default]) Return the size of an object in bytes. The object can be any type of object. All built-in objects will return correct results, but this does not have to hold true for third-party extensions as it is implementation specific. Only the memory consumption directly attributed to the object is accounted for, not the memory consumption of objects it refers to. If given, *default* will be returned if the object does not provide means to retrieve the size. Otherwise a "TypeError" will be raised. "getsizeof()" calls the object’s "__sizeof__" method and adds an additional garbage collector overhead if the object is managed by the garbage collector. See recursive sizeof recipe for an example of using "getsizeof()" recursively to find the size of containers and all their contents. sys.getswitchinterval() Return the interpreter’s “thread switch interval”; see "setswitchinterval()". New in version 3.2. sys._getframe([depth]) Return a frame object from the call stack. If optional integer *depth* is given, return the frame object that many calls below the top of the stack. If that is deeper than the call stack, "ValueError" is raised. The default for *depth* is zero, returning the frame at the top of the call stack. Raises an auditing event "sys._getframe" with no arguments. **CPython implementation detail:** This function should be used for internal and specialized purposes only. It is not guaranteed to exist in all implementations of Python. sys.getprofile() Get the profiler function as set by "setprofile()". sys.gettrace() Get the trace function as set by "settrace()". **CPython implementation detail:** The "gettrace()" function is intended only for implementing debuggers, profilers, coverage tools and the like. Its behavior is part of the implementation platform, rather than part of the language definition, and thus may not be available in all Python implementations. sys.getwindowsversion() Return a named tuple describing the Windows version currently running. The named elements are *major*, *minor*, *build*, *platform*, *service_pack*, *service_pack_minor*, *service_pack_major*, *suite_mask*, *product_type* and *platform_version*. *service_pack* contains a string, *platform_version* a 3-tuple and all other values are integers. The components can also be accessed by name, so "sys.getwindowsversion()[0]" is equivalent to "sys.getwindowsversion().major". For compatibility with prior versions, only the first 5 elements are retrievable by indexing. *platform* will be "2 (VER_PLATFORM_WIN32_NT)". *product_type* may be one of the following values: +-----------------------------------------+-----------------------------------+ | Constant | Meaning | |=========================================|===================================| | "1 (VER_NT_WORKSTATION)" | The system is a workstation. | +-----------------------------------------+-----------------------------------+ | "2 (VER_NT_DOMAIN_CONTROLLER)" | The system is a domain | | | controller. | +-----------------------------------------+-----------------------------------+ | "3 (VER_NT_SERVER)" | The system is a server, but not a | | | domain controller. | +-----------------------------------------+-----------------------------------+ This function wraps the Win32 "GetVersionEx()" function; see the Microsoft documentation on "OSVERSIONINFOEX()" for more information about these fields. *platform_version* returns the major version, minor version and build number of the current operating system, rather than the version that is being emulated for the process. It is intended for use in logging rather than for feature detection. Note: *platform_version* derives the version from kernel32.dll which can be of a different version than the OS version. Please use "platform" module for achieving accurate OS version. Availability: Windows. Changed in version 3.2: Changed to a named tuple and added *service_pack_minor*, *service_pack_major*, *suite_mask*, and *product_type*. Changed in version 3.6: Added *platform_version* sys.get_asyncgen_hooks() Returns an *asyncgen_hooks* object, which is similar to a "namedtuple" of the form *(firstiter, finalizer)*, where *firstiter* and *finalizer* are expected to be either "None" or functions which take an *asynchronous generator iterator* as an argument, and are used to schedule finalization of an asynchronous generator by an event loop. New in version 3.6: See **PEP 525** for more details. Note: This function has been added on a provisional basis (see **PEP 411** for details.) sys.get_coroutine_origin_tracking_depth() Get the current coroutine origin tracking depth, as set by "set_coroutine_origin_tracking_depth()". New in version 3.7. Note: This function has been added on a provisional basis (see **PEP 411** for details.) Use it only for debugging purposes. sys.hash_info A *named tuple* giving parameters of the numeric hash implementation. For more details about hashing of numeric types, see Hashing of numeric types. +-----------------------+----------------------------------------------------+ | attribute | explanation | |=======================|====================================================| | "width" | width in bits used for hash values | +-----------------------+----------------------------------------------------+ | "modulus" | prime modulus P used for numeric hash scheme | +-----------------------+----------------------------------------------------+ | "inf" | hash value returned for a positive infinity | +-----------------------+----------------------------------------------------+ | "nan" | hash value returned for a nan | +-----------------------+----------------------------------------------------+ | "imag" | multiplier used for the imaginary part of a | | | complex number | +-----------------------+----------------------------------------------------+ | "algorithm" | name of the algorithm for hashing of str, bytes, | | | and memoryview | +-----------------------+----------------------------------------------------+ | "hash_bits" | internal output size of the hash algorithm | +-----------------------+----------------------------------------------------+ | "seed_bits" | size of the seed key of the hash algorithm | +-----------------------+----------------------------------------------------+ New in version 3.2. Changed in version 3.4: Added *algorithm*, *hash_bits* and *seed_bits* sys.hexversion The version number encoded as a single integer. This is guaranteed to increase with each version, including proper support for non- production releases. For example, to test that the Python interpreter is at least version 1.5.2, use: if sys.hexversion >= 0x010502F0: # use some advanced feature ... else: # use an alternative implementation or warn the user ... This is called "hexversion" since it only really looks meaningful when viewed as the result of passing it to the built-in "hex()" function. The *named tuple* "sys.version_info" may be used for a more human-friendly encoding of the same information. More details of "hexversion" can be found at API and ABI Versioning. sys.implementation An object containing information about the implementation of the currently running Python interpreter. The following attributes are required to exist in all Python implementations. *name* is the implementation’s identifier, e.g. "'cpython'". The actual string is defined by the Python implementation, but it is guaranteed to be lower case. *version* is a named tuple, in the same format as "sys.version_info". It represents the version of the Python *implementation*. This has a distinct meaning from the specific version of the Python *language* to which the currently running interpreter conforms, which "sys.version_info" represents. For example, for PyPy 1.8 "sys.implementation.version" might be "sys.version_info(1, 8, 0, 'final', 0)", whereas "sys.version_info" would be "sys.version_info(2, 7, 2, 'final', 0)". For CPython they are the same value, since it is the reference implementation. *hexversion* is the implementation version in hexadecimal format, like "sys.hexversion". *cache_tag* is the tag used by the import machinery in the filenames of cached modules. By convention, it would be a composite of the implementation’s name and version, like "'cpython-33'". However, a Python implementation may use some other value if appropriate. If "cache_tag" is set to "None", it indicates that module caching should be disabled. "sys.implementation" may contain additional attributes specific to the Python implementation. These non-standard attributes must start with an underscore, and are not described here. Regardless of its contents, "sys.implementation" will not change during a run of the interpreter, nor between implementation versions. (It may change between Python language versions, however.) See **PEP 421** for more information. New in version 3.3. Note: The addition of new required attributes must go through the normal PEP process. See **PEP 421** for more information. sys.int_info A *named tuple* that holds information about Python’s internal representation of integers. The attributes are read only. +------------------------------------------+-------------------------------------------------+ | Attribute | Explanation | |==========================================|=================================================| | "bits_per_digit" | number of bits held in each digit. Python | | | integers are stored internally in base | | | "2**int_info.bits_per_digit" | +------------------------------------------+-------------------------------------------------+ | "sizeof_digit" | size in bytes of the C type used to represent a | | | digit | +------------------------------------------+-------------------------------------------------+ | "default_max_str_digits" | default value for | | | "sys.get_int_max_str_digits()" when it is not | | | otherwise explicitly configured. | +------------------------------------------+-------------------------------------------------+ | "str_digits_check_threshold" | minimum non-zero value for | | | "sys.set_int_max_str_digits()", | | | "PYTHONINTMAXSTRDIGITS", or "-X | | | int_max_str_digits". | +------------------------------------------+-------------------------------------------------+ New in version 3.1. Changed in version 3.9.14: Added "default_max_str_digits" and "str_digits_check_threshold". sys.__interactivehook__ When this attribute exists, its value is automatically called (with no arguments) when the interpreter is launched in interactive mode. This is done after the "PYTHONSTARTUP" file is read, so that you can set this hook there. The "site" module sets this. Raises an auditing event "cpython.run_interactivehook" with the hook object as the argument when the hook is called on startup. New in version 3.4. sys.intern(string) Enter *string* in the table of “interned” strings and return the interned string – which is *string* itself or a copy. Interning strings is useful to gain a little performance on dictionary lookup – if the keys in a dictionary are interned, and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare. Normally, the names used in Python programs are automatically interned, and the dictionaries used to hold module, class or instance attributes have interned keys. Interned strings are not immortal; you must keep a reference to the return value of "intern()" around to benefit from it. sys.is_finalizing() Return "True" if the Python interpreter is *shutting down*, "False" otherwise. New in version 3.5. sys.last_type sys.last_value sys.last_traceback These three variables are not always defined; they are set when an exception is not handled and the interpreter prints an error message and a stack traceback. Their intended use is to allow an interactive user to import a debugger module and engage in post- mortem debugging without having to re-execute the command that caused the error. (Typical use is "import pdb; pdb.pm()" to enter the post-mortem debugger; see "pdb" module for more information.) The meaning of the variables is the same as that of the return values from "exc_info()" above. sys.maxsize An integer giving the maximum value a variable of type "Py_ssize_t" can take. It’s usually "2**31 - 1" on a 32-bit platform and "2**63 - 1" on a 64-bit platform. sys.maxunicode An integer giving the value of the largest Unicode code point, i.e. "1114111" ("0x10FFFF" in hexadecimal). Changed in version 3.3: Before **PEP 393**, "sys.maxunicode" used to be either "0xFFFF" or "0x10FFFF", depending on the configuration option that specified whether Unicode characters were stored as UCS-2 or UCS-4. sys.meta_path A list of *meta path finder* objects that have their "find_spec()" methods called to see if one of the objects can find the module to be imported. The "find_spec()" method is called with at least the absolute name of the module being imported. If the module to be imported is contained in a package, then the parent package’s "__path__" attribute is passed in as a second argument. The method returns a *module spec*, or "None" if the module cannot be found. See also: "importlib.abc.MetaPathFinder" The abstract base class defining the interface of finder objects on "meta_path". "importlib.machinery.ModuleSpec" The concrete class which "find_spec()" should return instances of. Changed in version 3.4: *Module specs* were introduced in Python 3.4, by **PEP 451**. Earlier versions of Python looked for a method called "find_module()". This is still called as a fallback if a "meta_path" entry doesn’t have a "find_spec()" method. sys.modules This is a dictionary that maps module names to modules which have already been loaded. This can be manipulated to force reloading of modules and other tricks. However, replacing the dictionary will not necessarily work as expected and deleting essential items from the dictionary may cause Python to fail. sys.path A list of strings that specifies the search path for modules. Initialized from the environment variable "PYTHONPATH", plus an installation-dependent default. As initialized upon program startup, the first item of this list, "path[0]", is the directory containing the script that was used to invoke the Python interpreter. If the script directory is not available (e.g. if the interpreter is invoked interactively or if the script is read from standard input), "path[0]" is the empty string, which directs Python to search modules in the current directory first. Notice that the script directory is inserted *before* the entries inserted as a result of "PYTHONPATH". A program is free to modify this list for its own purposes. Only strings and bytes should be added to "sys.path"; all other data types are ignored during import. See also: Module "site" This describes how to use .pth files to extend "sys.path". sys.path_hooks A list of callables that take a path argument to try to create a *finder* for the path. If a finder can be created, it is to be returned by the callable, else raise "ImportError". Originally specified in **PEP 302**. sys.path_importer_cache A dictionary acting as a cache for *finder* objects. The keys are paths that have been passed to "sys.path_hooks" and the values are the finders that are found. If a path is a valid file system path but no finder is found on "sys.path_hooks" then "None" is stored. Originally specified in **PEP 302**. Changed in version 3.3: "None" is stored instead of "imp.NullImporter" when no finder is found. sys.platform This string contains a platform identifier that can be used to append platform-specific components to "sys.path", for instance. For Unix systems, except on Linux and AIX, this is the lowercased OS name as returned by "uname -s" with the first part of the version as returned by "uname -r" appended, e.g. "'sunos5'" or "'freebsd8'", *at the time when Python was built*. Unless you want to test for a specific system version, it is therefore recommended to use the following idiom: if sys.platform.startswith('freebsd'): # FreeBSD-specific code here... elif sys.platform.startswith('linux'): # Linux-specific code here... elif sys.platform.startswith('aix'): # AIX-specific code here... For other systems, the values are: +------------------+-----------------------------+ | System | "platform" value | |==================|=============================| | AIX | "'aix'" | +------------------+-----------------------------+ | Linux | "'linux'" | +------------------+-----------------------------+ | Windows | "'win32'" | +------------------+-----------------------------+ | Windows/Cygwin | "'cygwin'" | +------------------+-----------------------------+ | macOS | "'darwin'" | +------------------+-----------------------------+ Changed in version 3.3: On Linux, "sys.platform" doesn’t contain the major version anymore. It is always "'linux'", instead of "'linux2'" or "'linux3'". Since older Python versions include the version number, it is recommended to always use the "startswith" idiom presented above. Changed in version 3.8: On AIX, "sys.platform" doesn’t contain the major version anymore. It is always "'aix'", instead of "'aix5'" or "'aix7'". Since older Python versions include the version number, it is recommended to always use the "startswith" idiom presented above. See also: "os.name" has a coarser granularity. "os.uname()" gives system- dependent version information. The "platform" module provides detailed checks for the system’s identity. sys.platlibdir Name of the platform-specific library directory. It is used to build the path of standard library and the paths of installed extension modules. It is equal to ""lib"" on most platforms. On Fedora and SuSE, it is equal to ""lib64"" on 64-bit platforms which gives the following "sys.path" paths (where "X.Y" is the Python "major.minor" version): * "/usr/lib64/pythonX.Y/": Standard library (like "os.py" of the "os" module) * "/usr/lib64/pythonX.Y/lib-dynload/": C extension modules of the standard library (like the "errno" module, the exact filename is platform specific) * "/usr/lib/pythonX.Y/site-packages/" (always use "lib", not "sys.platlibdir"): Third-party modules * "/usr/lib64/pythonX.Y/site-packages/": C extension modules of third-party packages New in version 3.9. sys.prefix A string giving the site-specific directory prefix where the platform independent Python files are installed; on Unix, the default is "'/usr/local'". This can be set at build time with the "--prefix" argument to the **configure** script. See Installation paths for derived paths. Note: If a virtual environment is in effect, this value will be changed in "site.py" to point to the virtual environment. The value for the Python installation will still be available, via "base_prefix". sys.ps1 sys.ps2 Strings specifying the primary and secondary prompt of the interpreter. These are only defined if the interpreter is in interactive mode. Their initial values in this case are "'>>> '" and "'... '". If a non-string object is assigned to either variable, its "str()" is re-evaluated each time the interpreter prepares to read a new interactive command; this can be used to implement a dynamic prompt. sys.setdlopenflags(n) Set the flags used by the interpreter for "dlopen()" calls, such as when the interpreter loads extension modules. Among other things, this will enable a lazy resolving of symbols when importing a module, if called as "sys.setdlopenflags(0)". To share symbols across extension modules, call as "sys.setdlopenflags(os.RTLD_GLOBAL)". Symbolic names for the flag values can be found in the "os" module ("RTLD_xxx" constants, e.g. "os.RTLD_LAZY"). Availability: Unix. sys.set_int_max_str_digits(n) Set the integer string conversion length limitation used by this interpreter. See also "get_int_max_str_digits()". New in version 3.9.14. sys.setprofile(profilefunc) Set the system’s profile function, which allows you to implement a Python source code profiler in Python. See chapter The Python Profilers for more information on the Python profiler. The system’s profile function is called similarly to the system’s trace function (see "settrace()"), but it is called with different events, for example it isn’t called for each executed line of code (only on call and return, but the return event is reported even when an exception has been set). The function is thread-specific, but there is no way for the profiler to know about context switches between threads, so it does not make sense to use this in the presence of multiple threads. Also, its return value is not used, so it can simply return "None". Error in the profile function will cause itself unset. Profile functions should have three arguments: *frame*, *event*, and *arg*. *frame* is the current stack frame. *event* is a string: "'call'", "'return'", "'c_call'", "'c_return'", or "'c_exception'". *arg* depends on the event type. Raises an auditing event "sys.setprofile" with no arguments. The events have the following meaning: "'call'" A function is called (or some other code block entered). The profile function is called; *arg* is "None". "'return'" A function (or other code block) is about to return. The profile function is called; *arg* is the value that will be returned, or "None" if the event is caused by an exception being raised. "'c_call'" A C function is about to be called. This may be an extension function or a built-in. *arg* is the C function object. "'c_return'" A C function has returned. *arg* is the C function object. "'c_exception'" A C function has raised an exception. *arg* is the C function object. sys.setrecursionlimit(limit) Set the maximum depth of the Python interpreter stack to *limit*. This limit prevents infinite recursion from causing an overflow of the C stack and crashing Python. The highest possible limit is platform-dependent. A user may need to set the limit higher when they have a program that requires deep recursion and a platform that supports a higher limit. This should be done with care, because a too-high limit can lead to a crash. If the new limit is too low at the current recursion depth, a "RecursionError" exception is raised. Changed in version 3.5.1: A "RecursionError" exception is now raised if the new limit is too low at the current recursion depth. sys.setswitchinterval(interval) Set the interpreter’s thread switch interval (in seconds). This floating-point value determines the ideal duration of the “timeslices” allocated to concurrently running Python threads. Please note that the actual value can be higher, especially if long-running internal functions or methods are used. Also, which thread becomes scheduled at the end of the interval is the operating system’s decision. The interpreter doesn’t have its own scheduler. New in version 3.2. sys.settrace(tracefunc) Set the system’s trace function, which allows you to implement a Python source code debugger in Python. The function is thread- specific; for a debugger to support multiple threads, it must register a trace function using "settrace()" for each thread being debugged or use "threading.settrace()". Trace functions should have three arguments: *frame*, *event*, and *arg*. *frame* is the current stack frame. *event* is a string: "'call'", "'line'", "'return'", "'exception'" or "'opcode'". *arg* depends on the event type. The trace function is invoked (with *event* set to "'call'") whenever a new local scope is entered; it should return a reference to a local trace function to be used for the new scope, or "None" if the scope shouldn’t be traced. The local trace function should return a reference to itself (or to another function for further tracing in that scope), or "None" to turn off tracing in that scope. If there is any error occurred in the trace function, it will be unset, just like "settrace(None)" is called. The events have the following meaning: "'call'" A function is called (or some other code block entered). The global trace function is called; *arg* is "None"; the return value specifies the local trace function. "'line'" The interpreter is about to execute a new line of code or re- execute the condition of a loop. The local trace function is called; *arg* is "None"; the return value specifies the new local trace function. See "Objects/lnotab_notes.txt" for a detailed explanation of how this works. Per-line events may be disabled for a frame by setting "f_trace_lines" to "False" on that frame. "'return'" A function (or other code block) is about to return. The local trace function is called; *arg* is the value that will be returned, or "None" if the event is caused by an exception being raised. The trace function’s return value is ignored. "'exception'" An exception has occurred. The local trace function is called; *arg* is a tuple "(exception, value, traceback)"; the return value specifies the new local trace function. "'opcode'" The interpreter is about to execute a new opcode (see "dis" for opcode details). The local trace function is called; *arg* is "None"; the return value specifies the new local trace function. Per-opcode events are not emitted by default: they must be explicitly requested by setting "f_trace_opcodes" to "True" on the frame. Note that as an exception is propagated down the chain of callers, an "'exception'" event is generated at each level. For more fine-grained usage, it’s possible to set a trace function by assigning "frame.f_trace = tracefunc" explicitly, rather than relying on it being set indirectly via the return value from an already installed trace function. This is also required for activating the trace function on the current frame, which "settrace()" doesn’t do. Note that in order for this to work, a global tracing function must have been installed with "settrace()" in order to enable the runtime tracing machinery, but it doesn’t need to be the same tracing function (e.g. it could be a low overhead tracing function that simply returns "None" to disable itself immediately on each frame). For more information on code and frame objects, refer to The standard type hierarchy. Raises an auditing event "sys.settrace" with no arguments. **CPython implementation detail:** The "settrace()" function is intended only for implementing debuggers, profilers, coverage tools and the like. Its behavior is part of the implementation platform, rather than part of the language definition, and thus may not be available in all Python implementations. Changed in version 3.7: "'opcode'" event type added; "f_trace_lines" and "f_trace_opcodes" attributes added to frames sys.set_asyncgen_hooks(firstiter, finalizer) Accepts two optional keyword arguments which are callables that accept an *asynchronous generator iterator* as an argument. The *firstiter* callable will be called when an asynchronous generator is iterated for the first time. The *finalizer* will be called when an asynchronous generator is about to be garbage collected. Raises an auditing event "sys.set_asyncgen_hooks_firstiter" with no arguments. Raises an auditing event "sys.set_asyncgen_hooks_finalizer" with no arguments. Two auditing events are raised because the underlying API consists of two calls, each of which must raise its own event. New in version 3.6: See **PEP 525** for more details, and for a reference example of a *finalizer* method see the implementation of "asyncio.Loop.shutdown_asyncgens" in Lib/asyncio/base_events.py Note: This function has been added on a provisional basis (see **PEP 411** for details.) sys.set_coroutine_origin_tracking_depth(depth) Allows enabling or disabling coroutine origin tracking. When enabled, the "cr_origin" attribute on coroutine objects will contain a tuple of (filename, line number, function name) tuples describing the traceback where the coroutine object was created, with the most recent call first. When disabled, "cr_origin" will be None. To enable, pass a *depth* value greater than zero; this sets the number of frames whose information will be captured. To disable, pass set *depth* to zero. This setting is thread-specific. New in version 3.7. Note: This function has been added on a provisional basis (see **PEP 411** for details.) Use it only for debugging purposes. sys._enablelegacywindowsfsencoding() Changes the default filesystem encoding and errors mode to ‘mbcs’ and ‘replace’ respectively, for consistency with versions of Python prior to 3.6. This is equivalent to defining the "PYTHONLEGACYWINDOWSFSENCODING" environment variable before launching Python. Availability: Windows. New in version 3.6: See **PEP 529** for more details. sys.stdin sys.stdout sys.stderr *File objects* used by the interpreter for standard input, output and errors: * "stdin" is used for all interactive input (including calls to "input()"); * "stdout" is used for the output of "print()" and *expression* statements and for the prompts of "input()"; * The interpreter’s own prompts and its error messages go to "stderr". These streams are regular *text files* like those returned by the "open()" function. Their parameters are chosen as follows: * The character encoding is platform-dependent. Non-Windows platforms use the locale encoding (see "locale.getpreferredencoding()"). On Windows, UTF-8 is used for the console device. Non-character devices such as disk files and pipes use the system locale encoding (i.e. the ANSI codepage). Non-console character devices such as NUL (i.e. where "isatty()" returns "True") use the value of the console input and output codepages at startup, respectively for stdin and stdout/stderr. This defaults to the system locale encoding if the process is not initially attached to a console. The special behaviour of the console can be overridden by setting the environment variable PYTHONLEGACYWINDOWSSTDIO before starting Python. In that case, the console codepages are used as for any other character device. Under all platforms, you can override the character encoding by setting the "PYTHONIOENCODING" environment variable before starting Python or by using the new "-X" "utf8" command line option and "PYTHONUTF8" environment variable. However, for the Windows console, this only applies when "PYTHONLEGACYWINDOWSSTDIO" is also set. * When interactive, the "stdout" stream is line-buffered. Otherwise, it is block-buffered like regular text files. The "stderr" stream is line-buffered in both cases. You can make both streams unbuffered by passing the "-u" command-line option or setting the "PYTHONUNBUFFERED" environment variable. Changed in version 3.9: Non-interactive "stderr" is now line- buffered instead of fully buffered. Note: To write or read binary data from/to the standard streams, use the underlying binary "buffer" object. For example, to write bytes to "stdout", use "sys.stdout.buffer.write(b'abc')".However, if you are writing a library (and do not control in which context its code will be executed), be aware that the standard streams may be replaced with file-like objects like "io.StringIO" which do not support the "buffer" attribute. sys.__stdin__ sys.__stdout__ sys.__stderr__ These objects contain the original values of "stdin", "stderr" and "stdout" at the start of the program. They are used during finalization, and could be useful to print to the actual standard stream no matter if the "sys.std*" object has been redirected. It can also be used to restore the actual files to known working file objects in case they have been overwritten with a broken object. However, the preferred way to do this is to explicitly save the previous stream before replacing it, and restore the saved object. Note: Under some conditions "stdin", "stdout" and "stderr" as well as the original values "__stdin__", "__stdout__" and "__stderr__" can be "None". It is usually the case for Windows GUI apps that aren’t connected to a console and Python apps started with **pythonw**. sys.thread_info A *named tuple* holding information about the thread implementation. +--------------------+-----------------------------------------------------------+ | Attribute | Explanation | |====================|===========================================================| | "name" | Name of the thread implementation: * "'nt'": Windows | | | threads * "'pthread'": POSIX threads * "'solaris'": | | | Solaris threads | +--------------------+-----------------------------------------------------------+ | "lock" | Name of the lock implementation: * "'semaphore'": a lock | | | uses a semaphore * "'mutex+cond'": a lock uses a mutex | | | and a condition variable * "None" if this information is | | | unknown | +--------------------+-----------------------------------------------------------+ | "version" | Name and version of the thread library. It is a string, | | | or "None" if this information is unknown. | +--------------------+-----------------------------------------------------------+ New in version 3.3. sys.tracebacklimit When this variable is set to an integer value, it determines the maximum number of levels of traceback information printed when an unhandled exception occurs. The default is "1000". When set to "0" or less, all traceback information is suppressed and only the exception type and value are printed. sys.unraisablehook(unraisable, /) Handle an unraisable exception. Called when an exception has occurred but there is no way for Python to handle it. For example, when a destructor raises an exception or during garbage collection ("gc.collect()"). The *unraisable* argument has the following attributes: * *exc_type*: Exception type. * *exc_value*: Exception value, can be "None". * *exc_traceback*: Exception traceback, can be "None". * *err_msg*: Error message, can be "None". * *object*: Object causing the exception, can be "None". The default hook formats *err_msg* and *object* as: "f'{err_msg}: {object!r}'"; use “Exception ignored in” error message if *err_msg* is "None". "sys.unraisablehook()" can be overridden to control how unraisable exceptions are handled. Storing *exc_value* using a custom hook can create a reference cycle. It should be cleared explicitly to break the reference cycle when the exception is no longer needed. Storing *object* using a custom hook can resurrect it if it is set to an object which is being finalized. Avoid storing *object* after the custom hook completes to avoid resurrecting objects. See also "excepthook()" which handles uncaught exceptions. Raise an auditing event "sys.unraisablehook" with arguments "hook", "unraisable" when an exception that cannot be handled occurs. The "unraisable" object is the same as what will be passed to the hook. If no hook has been set, "hook" may be "None". New in version 3.8. sys.version A string containing the version number of the Python interpreter plus additional information on the build number and compiler used. This string is displayed when the interactive interpreter is started. Do not extract version information out of it, rather, use "version_info" and the functions provided by the "platform" module. sys.api_version The C API version for this interpreter. Programmers may find this useful when debugging version conflicts between Python and extension modules. sys.version_info A tuple containing the five components of the version number: *major*, *minor*, *micro*, *releaselevel*, and *serial*. All values except *releaselevel* are integers; the release level is "'alpha'", "'beta'", "'candidate'", or "'final'". The "version_info" value corresponding to the Python version 2.0 is "(2, 0, 0, 'final', 0)". The components can also be accessed by name, so "sys.version_info[0]" is equivalent to "sys.version_info.major" and so on. Changed in version 3.1: Added named component attributes. sys.warnoptions This is an implementation detail of the warnings framework; do not modify this value. Refer to the "warnings" module for more information on the warnings framework. sys.winver The version number used to form registry keys on Windows platforms. This is stored as string resource 1000 in the Python DLL. The value is normally the first three characters of "version". It is provided in the "sys" module for informational purposes; modifying this value has no effect on the registry keys used by Python. Availability: Windows. sys._xoptions A dictionary of the various implementation-specific flags passed through the "-X" command-line option. Option names are either mapped to their values, if given explicitly, or to "True". Example: $ ./python -Xa=b -Xc Python 3.2a3+ (py3k, Oct 16 2010, 20:14:50) [GCC 4.4.3] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> import sys >>> sys._xoptions {'a': 'b', 'c': True} **CPython implementation detail:** This is a CPython-specific way of accessing options passed through "-X". Other implementations may export them through other means, or not at all. New in version 3.2. -[ Citations ]- [C99] ISO/IEC 9899:1999. “Programming languages – C.” A public draft of this standard is available at http://www.open- std.org/jtc1/sc22/wg14/www/docs/n1256.pdf. "sysconfig" — Provide access to Python’s configuration information ****************************************************************** New in version 3.2. **Source code:** Lib/sysconfig.py ====================================================================== The "sysconfig" module provides access to Python’s configuration information like the list of installation paths and the configuration variables relevant for the current platform. Configuration variables ======================= A Python distribution contains a "Makefile" and a "pyconfig.h" header file that are necessary to build both the Python binary itself and third-party C extensions compiled using "distutils". "sysconfig" puts all variables found in these files in a dictionary that can be accessed using "get_config_vars()" or "get_config_var()". Notice that on Windows, it’s a much smaller set. sysconfig.get_config_vars(*args) With no arguments, return a dictionary of all configuration variables relevant for the current platform. With arguments, return a list of values that result from looking up each argument in the configuration variable dictionary. For each argument, if the value is not found, return "None". sysconfig.get_config_var(name) Return the value of a single variable *name*. Equivalent to "get_config_vars().get(name)". If *name* is not found, return "None". Example of usage: >>> import sysconfig >>> sysconfig.get_config_var('Py_ENABLE_SHARED') 0 >>> sysconfig.get_config_var('LIBDIR') '/usr/local/lib' >>> sysconfig.get_config_vars('AR', 'CXX') ['ar', 'g++'] Installation paths ================== Python uses an installation scheme that differs depending on the platform and on the installation options. These schemes are stored in "sysconfig" under unique identifiers based on the value returned by "os.name". Every new component that is installed using "distutils" or a Distutils-based system will follow the same scheme to copy its file in the right places. Python currently supports six schemes: * *posix_prefix*: scheme for POSIX platforms like Linux or macOS. This is the default scheme used when Python or a component is installed. * *posix_home*: scheme for POSIX platforms used when a *home* option is used upon installation. This scheme is used when a component is installed through Distutils with a specific home prefix. * *posix_user*: scheme for POSIX platforms used when a component is installed through Distutils and the *user* option is used. This scheme defines paths located under the user home directory. * *nt*: scheme for NT platforms like Windows. * *nt_user*: scheme for NT platforms, when the *user* option is used. * *osx_framework_user*: scheme for macOS, when the *user* option is used. Each scheme is itself composed of a series of paths and each path has a unique identifier. Python currently uses eight paths: * *stdlib*: directory containing the standard Python library files that are not platform-specific. * *platstdlib*: directory containing the standard Python library files that are platform-specific. * *platlib*: directory for site-specific, platform-specific files. * *purelib*: directory for site-specific, non-platform-specific files. * *include*: directory for non-platform-specific header files. * *platinclude*: directory for platform-specific header files. * *scripts*: directory for script files. * *data*: directory for data files. "sysconfig" provides some functions to determine these paths. sysconfig.get_scheme_names() Return a tuple containing all schemes currently supported in "sysconfig". sysconfig.get_path_names() Return a tuple containing all path names currently supported in "sysconfig". sysconfig.get_path(name[, scheme[, vars[, expand]]]) Return an installation path corresponding to the path *name*, from the install scheme named *scheme*. *name* has to be a value from the list returned by "get_path_names()". "sysconfig" stores installation paths corresponding to each path name, for each platform, with variables to be expanded. For instance the *stdlib* path for the *nt* scheme is: "{base}/Lib". "get_path()" will use the variables returned by "get_config_vars()" to expand the path. All variables have default values for each platform so one may call this function and get the default value. If *scheme* is provided, it must be a value from the list returned by "get_scheme_names()". Otherwise, the default scheme for the current platform is used. If *vars* is provided, it must be a dictionary of variables that will update the dictionary return by "get_config_vars()". If *expand* is set to "False", the path will not be expanded using the variables. If *name* is not found, raise a "KeyError". sysconfig.get_paths([scheme[, vars[, expand]]]) Return a dictionary containing all installation paths corresponding to an installation scheme. See "get_path()" for more information. If *scheme* is not provided, will use the default scheme for the current platform. If *vars* is provided, it must be a dictionary of variables that will update the dictionary used to expand the paths. If *expand* is set to false, the paths will not be expanded. If *scheme* is not an existing scheme, "get_paths()" will raise a "KeyError". Other functions =============== sysconfig.get_python_version() Return the "MAJOR.MINOR" Python version number as a string. Similar to "'%d.%d' % sys.version_info[:2]". sysconfig.get_platform() Return a string that identifies the current platform. This is used mainly to distinguish platform-specific build directories and platform-specific built distributions. Typically includes the OS name and version and the architecture (as supplied by ‘os.uname()’), although the exact information included depends on the OS; e.g., on Linux, the kernel version isn’t particularly important. Examples of returned values: * linux-i586 * linux-alpha (?) * solaris-2.6-sun4u Windows will return one of: * win-amd64 (64bit Windows on AMD64, aka x86_64, Intel64, and EM64T) * win32 (all others - specifically, sys.platform is returned) macOS can return: * macosx-10.6-ppc * macosx-10.4-ppc64 * macosx-10.3-i386 * macosx-10.4-fat For other non-POSIX platforms, currently just returns "sys.platform". sysconfig.is_python_build() Return "True" if the running Python interpreter was built from source and is being run from its built location, and not from a location resulting from e.g. running "make install" or installing via a binary installer. sysconfig.parse_config_h(fp[, vars]) Parse a "config.h"-style file. *fp* is a file-like object pointing to the "config.h"-like file. A dictionary containing name/value pairs is returned. If an optional dictionary is passed in as the second argument, it is used instead of a new dictionary, and updated with the values read in the file. sysconfig.get_config_h_filename() Return the path of "pyconfig.h". sysconfig.get_makefile_filename() Return the path of "Makefile". Using "sysconfig" as a script ============================= You can use "sysconfig" as a script with Python’s *-m* option: $ python -m sysconfig Platform: "macosx-10.4-i386" Python version: "3.2" Current installation scheme: "posix_prefix" Paths: data = "/usr/local" include = "/Users/tarek/Dev/svn.python.org/py3k/Include" platinclude = "." platlib = "/usr/local/lib/python3.2/site-packages" platstdlib = "/usr/local/lib/python3.2" purelib = "/usr/local/lib/python3.2/site-packages" scripts = "/usr/local/bin" stdlib = "/usr/local/lib/python3.2" Variables: AC_APPLE_UNIVERSAL_BUILD = "0" AIX_GENUINE_CPLUSPLUS = "0" AR = "ar" ARFLAGS = "rc" ... This call will print in the standard output the information returned by "get_platform()", "get_python_version()", "get_path()" and "get_config_vars()". "syslog" — Unix syslog library routines *************************************** ====================================================================== This module provides an interface to the Unix "syslog" library routines. Refer to the Unix manual pages for a detailed description of the "syslog" facility. This module wraps the system "syslog" family of routines. A pure Python library that can speak to a syslog server is available in the "logging.handlers" module as "SysLogHandler". The module defines the following functions: syslog.syslog(message) syslog.syslog(priority, message) Send the string *message* to the system logger. A trailing newline is added if necessary. Each message is tagged with a priority composed of a *facility* and a *level*. The optional *priority* argument, which defaults to "LOG_INFO", determines the message priority. If the facility is not encoded in *priority* using logical-or ("LOG_INFO | LOG_USER"), the value given in the "openlog()" call is used. If "openlog()" has not been called prior to the call to "syslog()", "openlog()" will be called with no arguments. Raises an auditing event "syslog.syslog" with arguments "priority", "message". syslog.openlog([ident[, logoption[, facility]]]) Logging options of subsequent "syslog()" calls can be set by calling "openlog()". "syslog()" will call "openlog()" with no arguments if the log is not currently open. The optional *ident* keyword argument is a string which is prepended to every message, and defaults to "sys.argv[0]" with leading path components stripped. The optional *logoption* keyword argument (default is 0) is a bit field – see below for possible values to combine. The optional *facility* keyword argument (default is "LOG_USER") sets the default facility for messages which do not have a facility explicitly encoded. Raises an auditing event "syslog.openlog" with arguments "ident", "logoption", "facility". Changed in version 3.2: In previous versions, keyword arguments were not allowed, and *ident* was required. The default for *ident* was dependent on the system libraries, and often was "python" instead of the name of the Python program file. syslog.closelog() Reset the syslog module values and call the system library "closelog()". This causes the module to behave as it does when initially imported. For example, "openlog()" will be called on the first "syslog()" call (if "openlog()" hasn’t already been called), and *ident* and other "openlog()" parameters are reset to defaults. Raises an auditing event "syslog.closelog" with no arguments. syslog.setlogmask(maskpri) Set the priority mask to *maskpri* and return the previous mask value. Calls to "syslog()" with a priority level not set in *maskpri* are ignored. The default is to log all priorities. The function "LOG_MASK(pri)" calculates the mask for the individual priority *pri*. The function "LOG_UPTO(pri)" calculates the mask for all priorities up to and including *pri*. Raises an auditing event "syslog.setlogmask" with argument "maskpri". The module defines the following constants: Priority levels (high to low): "LOG_EMERG", "LOG_ALERT", "LOG_CRIT", "LOG_ERR", "LOG_WARNING", "LOG_NOTICE", "LOG_INFO", "LOG_DEBUG". Facilities: "LOG_KERN", "LOG_USER", "LOG_MAIL", "LOG_DAEMON", "LOG_AUTH", "LOG_LPR", "LOG_NEWS", "LOG_UUCP", "LOG_CRON", "LOG_SYSLOG", "LOG_LOCAL0" to "LOG_LOCAL7", and, if defined in "", "LOG_AUTHPRIV". Log options: "LOG_PID", "LOG_CONS", "LOG_NDELAY", and, if defined in "", "LOG_ODELAY", "LOG_NOWAIT", and "LOG_PERROR". Examples ======== Simple example -------------- A simple set of examples: import syslog syslog.syslog('Processing started') if error: syslog.syslog(syslog.LOG_ERR, 'Processing started') An example of setting some log options, these would include the process ID in logged messages, and write the messages to the destination facility used for mail logging: syslog.openlog(logoption=syslog.LOG_PID, facility=syslog.LOG_MAIL) syslog.syslog('E-mail processing initiated...') "tabnanny" — Detection of ambiguous indentation *********************************************** **Source code:** Lib/tabnanny.py ====================================================================== For the time being this module is intended to be called as a script. However it is possible to import it into an IDE and use the function "check()" described below. Note: The API provided by this module is likely to change in future releases; such changes may not be backward compatible. tabnanny.check(file_or_dir) If *file_or_dir* is a directory and not a symbolic link, then recursively descend the directory tree named by *file_or_dir*, checking all ".py" files along the way. If *file_or_dir* is an ordinary Python source file, it is checked for whitespace related problems. The diagnostic messages are written to standard output using the "print()" function. tabnanny.verbose Flag indicating whether to print verbose messages. This is incremented by the "-v" option if called as a script. tabnanny.filename_only Flag indicating whether to print only the filenames of files containing whitespace related problems. This is set to true by the "-q" option if called as a script. exception tabnanny.NannyNag Raised by "process_tokens()" if detecting an ambiguous indent. Captured and handled in "check()". tabnanny.process_tokens(tokens) This function is used by "check()" to process tokens generated by the "tokenize" module. See also: Module "tokenize" Lexical scanner for Python source code. "tarfile" — Read and write tar archive files ******************************************** **Source code:** Lib/tarfile.py ====================================================================== The "tarfile" module makes it possible to read and write tar archives, including those using gzip, bz2 and lzma compression. Use the "zipfile" module to read or write ".zip" files, or the higher-level functions in shutil. Some facts and figures: * reads and writes "gzip", "bz2" and "lzma" compressed archives if the respective modules are available. * read/write support for the POSIX.1-1988 (ustar) format. * read/write support for the GNU tar format including *longname* and *longlink* extensions, read-only support for all variants of the *sparse* extension including restoration of sparse files. * read/write support for the POSIX.1-2001 (pax) format. * handles directories, regular files, hardlinks, symbolic links, fifos, character devices and block devices and is able to acquire and restore file information like timestamp, access permissions and owner. Changed in version 3.3: Added support for "lzma" compression. tarfile.open(name=None, mode='r', fileobj=None, bufsize=10240, **kwargs) Return a "TarFile" object for the pathname *name*. For detailed information on "TarFile" objects and the keyword arguments that are allowed, see TarFile Objects. *mode* has to be a string of the form "'filemode[:compression]'", it defaults to "'r'". Here is a full list of mode combinations: +--------------------+-----------------------------------------------+ | mode | action | |====================|===============================================| | "'r' or 'r:*'" | Open for reading with transparent compression | | | (recommended). | +--------------------+-----------------------------------------------+ | "'r:'" | Open for reading exclusively without | | | compression. | +--------------------+-----------------------------------------------+ | "'r:gz'" | Open for reading with gzip compression. | +--------------------+-----------------------------------------------+ | "'r:bz2'" | Open for reading with bzip2 compression. | +--------------------+-----------------------------------------------+ | "'r:xz'" | Open for reading with lzma compression. | +--------------------+-----------------------------------------------+ | "'x'" or "'x:'" | Create a tarfile exclusively without | | | compression. Raise an "FileExistsError" | | | exception if it already exists. | +--------------------+-----------------------------------------------+ | "'x:gz'" | Create a tarfile with gzip compression. Raise | | | an "FileExistsError" exception if it already | | | exists. | +--------------------+-----------------------------------------------+ | "'x:bz2'" | Create a tarfile with bzip2 compression. | | | Raise an "FileExistsError" exception if it | | | already exists. | +--------------------+-----------------------------------------------+ | "'x:xz'" | Create a tarfile with lzma compression. Raise | | | an "FileExistsError" exception if it already | | | exists. | +--------------------+-----------------------------------------------+ | "'a' or 'a:'" | Open for appending with no compression. The | | | file is created if it does not exist. | +--------------------+-----------------------------------------------+ | "'w' or 'w:'" | Open for uncompressed writing. | +--------------------+-----------------------------------------------+ | "'w:gz'" | Open for gzip compressed writing. | +--------------------+-----------------------------------------------+ | "'w:bz2'" | Open for bzip2 compressed writing. | +--------------------+-----------------------------------------------+ | "'w:xz'" | Open for lzma compressed writing. | +--------------------+-----------------------------------------------+ Note that "'a:gz'", "'a:bz2'" or "'a:xz'" is not possible. If *mode* is not suitable to open a certain (compressed) file for reading, "ReadError" is raised. Use *mode* "'r'" to avoid this. If a compression method is not supported, "CompressionError" is raised. If *fileobj* is specified, it is used as an alternative to a *file object* opened in binary mode for *name*. It is supposed to be at position 0. For modes "'w:gz'", "'r:gz'", "'w:bz2'", "'r:bz2'", "'x:gz'", "'x:bz2'", "tarfile.open()" accepts the keyword argument *compresslevel* (default "9") to specify the compression level of the file. For modes "'w:xz'" and "'x:xz'", "tarfile.open()" accepts the keyword argument *preset* to specify the compression level of the file. For special purposes, there is a second format for *mode*: "'filemode|[compression]'". "tarfile.open()" will return a "TarFile" object that processes its data as a stream of blocks. No random seeking will be done on the file. If given, *fileobj* may be any object that has a "read()" or "write()" method (depending on the *mode*). *bufsize* specifies the blocksize and defaults to "20 * 512" bytes. Use this variant in combination with e.g. "sys.stdin", a socket *file object* or a tape device. However, such a "TarFile" object is limited in that it does not allow random access, see Examples. The currently possible modes: +---------------+----------------------------------------------+ | Mode | Action | |===============|==============================================| | "'r|*'" | Open a *stream* of tar blocks for reading | | | with transparent compression. | +---------------+----------------------------------------------+ | "'r|'" | Open a *stream* of uncompressed tar blocks | | | for reading. | +---------------+----------------------------------------------+ | "'r|gz'" | Open a gzip compressed *stream* for reading. | +---------------+----------------------------------------------+ | "'r|bz2'" | Open a bzip2 compressed *stream* for | | | reading. | +---------------+----------------------------------------------+ | "'r|xz'" | Open an lzma compressed *stream* for | | | reading. | +---------------+----------------------------------------------+ | "'w|'" | Open an uncompressed *stream* for writing. | +---------------+----------------------------------------------+ | "'w|gz'" | Open a gzip compressed *stream* for writing. | +---------------+----------------------------------------------+ | "'w|bz2'" | Open a bzip2 compressed *stream* for | | | writing. | +---------------+----------------------------------------------+ | "'w|xz'" | Open an lzma compressed *stream* for | | | writing. | +---------------+----------------------------------------------+ Changed in version 3.5: The "'x'" (exclusive creation) mode was added. Changed in version 3.6: The *name* parameter accepts a *path-like object*. class tarfile.TarFile Class for reading and writing tar archives. Do not use this class directly: use "tarfile.open()" instead. See TarFile Objects. tarfile.is_tarfile(name) Return "True" if *name* is a tar archive file, that the "tarfile" module can read. *name* may be a "str", file, or file-like object. Changed in version 3.9: Support for file and file-like objects. The "tarfile" module defines the following exceptions: exception tarfile.TarError Base class for all "tarfile" exceptions. exception tarfile.ReadError Is raised when a tar archive is opened, that either cannot be handled by the "tarfile" module or is somehow invalid. exception tarfile.CompressionError Is raised when a compression method is not supported or when the data cannot be decoded properly. exception tarfile.StreamError Is raised for the limitations that are typical for stream-like "TarFile" objects. exception tarfile.ExtractError Is raised for *non-fatal* errors when using "TarFile.extract()", but only if "TarFile.errorlevel""== 2". exception tarfile.HeaderError Is raised by "TarInfo.frombuf()" if the buffer it gets is invalid. exception tarfile.FilterError Base class for members refused by filters. tarinfo Information about the member that the filter refused to extract, as TarInfo. exception tarfile.AbsolutePathError Raised to refuse extracting a member with an absolute path. exception tarfile.OutsideDestinationError Raised to refuse extracting a member outside the destination directory. exception tarfile.SpecialFileError Raised to refuse extracting a special file (e.g. a device or pipe). exception tarfile.AbsoluteLinkError Raised to refuse extracting a symbolic link with an absolute path. exception tarfile.LinkOutsideDestinationError Raised to refuse extracting a symbolic link pointing outside the destination directory. exception tarfile.LinkFallbackError Raised to refuse emulating a link (hard or symbolic) by extracting another archive member, when that member would be rejected by the filter location. The exception that was raised to reject the replacement member is available as "BaseException.__context__". New in version 3.9.23 (unreleased). The following constants are available at the module level: tarfile.ENCODING The default character encoding: "'utf-8'" on Windows, the value returned by "sys.getfilesystemencoding()" otherwise. Each of the following constants defines a tar archive format that the "tarfile" module is able to create. See section Supported tar formats for details. tarfile.USTAR_FORMAT POSIX.1-1988 (ustar) format. tarfile.GNU_FORMAT GNU tar format. tarfile.PAX_FORMAT POSIX.1-2001 (pax) format. tarfile.DEFAULT_FORMAT The default format for creating archives. This is currently "PAX_FORMAT". Changed in version 3.8: The default format for new archives was changed to "PAX_FORMAT" from "GNU_FORMAT". See also: Module "zipfile" Documentation of the "zipfile" standard module. Archiving operations Documentation of the higher-level archiving facilities provided by the standard "shutil" module. GNU tar manual, Basic Tar Format Documentation for tar archive files, including GNU tar extensions. TarFile Objects =============== The "TarFile" object provides an interface to a tar archive. A tar archive is a sequence of blocks. An archive member (a stored file) is made up of a header block followed by data blocks. It is possible to store a file in a tar archive several times. Each archive member is represented by a "TarInfo" object, see TarInfo Objects for details. A "TarFile" object can be used as a context manager in a "with" statement. It will automatically be closed when the block is completed. Please note that in the event of an exception an archive opened for writing will not be finalized; only the internally used file object will be closed. See the Examples section for a use case. New in version 3.2: Added support for the context management protocol. class tarfile.TarFile(name=None, mode='r', fileobj=None, format=DEFAULT_FORMAT, tarinfo=TarInfo, dereference=False, ignore_zeros=False, encoding=ENCODING, errors='surrogateescape', pax_headers=None, debug=0, errorlevel=1) All following arguments are optional and can be accessed as instance attributes as well. *name* is the pathname of the archive. *name* may be a *path-like object*. It can be omitted if *fileobj* is given. In this case, the file object’s "name" attribute is used if it exists. *mode* is either "'r'" to read from an existing archive, "'a'" to append data to an existing file, "'w'" to create a new file overwriting an existing one, or "'x'" to create a new file only if it does not already exist. If *fileobj* is given, it is used for reading or writing data. If it can be determined, *mode* is overridden by *fileobj*’s mode. *fileobj* will be used from position 0. Note: *fileobj* is not closed, when "TarFile" is closed. *format* controls the archive format for writing. It must be one of the constants "USTAR_FORMAT", "GNU_FORMAT" or "PAX_FORMAT" that are defined at module level. When reading, format will be automatically detected, even if different formats are present in a single archive. The *tarinfo* argument can be used to replace the default "TarInfo" class with a different one. If *dereference* is "False", add symbolic and hard links to the archive. If it is "True", add the content of the target files to the archive. This has no effect on systems that do not support symbolic links. If *ignore_zeros* is "False", treat an empty block as the end of the archive. If it is "True", skip empty (and invalid) blocks and try to get as many members as possible. This is only useful for reading concatenated or damaged archives. *debug* can be set from "0" (no debug messages) up to "3" (all debug messages). The messages are written to "sys.stderr". *errorlevel* controls how extraction errors are handled, see "the corresponding attribute". The *encoding* and *errors* arguments define the character encoding to be used for reading or writing the archive and how conversion errors are going to be handled. The default settings will work for most users. See section Unicode issues for in-depth information. The *pax_headers* argument is an optional dictionary of strings which will be added as a pax global header if *format* is "PAX_FORMAT". Changed in version 3.2: Use "'surrogateescape'" as the default for the *errors* argument. Changed in version 3.5: The "'x'" (exclusive creation) mode was added. Changed in version 3.6: The *name* parameter accepts a *path-like object*. classmethod TarFile.open(...) Alternative constructor. The "tarfile.open()" function is actually a shortcut to this classmethod. TarFile.getmember(name) Return a "TarInfo" object for member *name*. If *name* can not be found in the archive, "KeyError" is raised. Note: If a member occurs more than once in the archive, its last occurrence is assumed to be the most up-to-date version. TarFile.getmembers() Return the members of the archive as a list of "TarInfo" objects. The list has the same order as the members in the archive. TarFile.getnames() Return the members as a list of their names. It has the same order as the list returned by "getmembers()". TarFile.list(verbose=True, *, members=None) Print a table of contents to "sys.stdout". If *verbose* is "False", only the names of the members are printed. If it is "True", output similar to that of **ls -l** is produced. If optional *members* is given, it must be a subset of the list returned by "getmembers()". Changed in version 3.5: Added the *members* parameter. TarFile.next() Return the next member of the archive as a "TarInfo" object, when "TarFile" is opened for reading. Return "None" if there is no more available. TarFile.extractall(path=".", members=None, *, numeric_owner=False, filter=None) Extract all members from the archive to the current working directory or directory *path*. If optional *members* is given, it must be a subset of the list returned by "getmembers()". Directory information like owner, modification time and permissions are set after all members have been extracted. This is done to work around two problems: A directory’s modification time is reset each time a file is created in it. And, if a directory’s permissions do not allow writing, extracting files to it will fail. If *numeric_owner* is "True", the uid and gid numbers from the tarfile are used to set the owner/group for the extracted files. Otherwise, the named values from the tarfile are used. The *filter* argument, which was added in Python 3.9.17, specifies how "members" are modified or rejected before extraction. See Extraction filters for details. It is recommended to set this explicitly depending on which *tar* features you need to support. Warning: Never extract archives from untrusted sources without prior inspection. It is possible that files are created outside of *path*, e.g. members that have absolute filenames starting with ""/"" or filenames with two dots "".."".Set "filter='data'" to prevent the most dangerous security issues, and read the Extraction filters section for details. Changed in version 3.5: Added the *numeric_owner* parameter. Changed in version 3.6: The *path* parameter accepts a *path-like object*. Changed in version 3.9.17: Added the *filter* parameter. TarFile.extract(member, path="", set_attrs=True, *, numeric_owner=False, filter=None) Extract a member from the archive to the current working directory, using its full name. Its file information is extracted as accurately as possible. *member* may be a filename or a "TarInfo" object. You can specify a different directory using *path*. *path* may be a *path-like object*. File attributes (owner, mtime, mode) are set unless *set_attrs* is false. The *numeric_owner* and *filter* arguments are the same as for "extractall()". Note: The "extract()" method does not take care of several extraction issues. In most cases you should consider using the "extractall()" method. Warning: See the warning for "extractall()".Set "filter='data'" to prevent the most dangerous security issues, and read the Extraction filters section for details. Changed in version 3.2: Added the *set_attrs* parameter. Changed in version 3.5: Added the *numeric_owner* parameter. Changed in version 3.6: The *path* parameter accepts a *path-like object*. Changed in version 3.9.17: Added the *filter* parameter. TarFile.extractfile(member) Extract a member from the archive as a file object. *member* may be a filename or a "TarInfo" object. If *member* is a regular file or a link, an "io.BufferedReader" object is returned. For all other existing members, "None" is returned. If *member* does not appear in the archive, "KeyError" is raised. Changed in version 3.3: Return an "io.BufferedReader" object. TarFile.errorlevel: int If *errorlevel* is "0", errors are ignored when using "TarFile.extract()" and "TarFile.extractall()". Nevertheless, they appear as error messages in the debug output when *debug* is greater than 0. If "1" (the default), all *fatal* errors are raised as "OSError" or "FilterError" exceptions. If "2", all *non-fatal* errors are raised as "TarError" exceptions as well. Some exceptions, e.g. ones caused by wrong argument types or data corruption, are always raised. Custom extraction filters should raise "FilterError" for *fatal* errors and "ExtractError" for *non-fatal* ones. Note that when an exception is raised, the archive may be partially extracted. It is the user’s responsibility to clean up. TarFile.extraction_filter New in version 3.9.17. The extraction filter used as a default for the *filter* argument of "extract()" and "extractall()". The attribute may be "None" or a callable. String names are not allowed for this attribute, unlike the *filter* argument to "extract()". If "extraction_filter" is "None" (the default), calling an extraction method without a *filter* argument will use the "fully_trusted" filter for compatibility with previous Python versions. In Python 3.12+, leaving "extraction_filter=None" will emit a "DeprecationWarning". In Python 3.14+, leaving "extraction_filter=None" will cause extraction methods to use the "data" filter by default. The attribute may be set on instances or overridden in subclasses. It also is possible to set it on the "TarFile" class itself to set a global default, although, since it affects all uses of *tarfile*, it is best practice to only do so in top-level applications or "site configuration". To set a global default this way, a filter function needs to be wrapped in "staticmethod()" to prevent injection of a "self" argument. TarFile.add(name, arcname=None, recursive=True, *, filter=None) Add the file *name* to the archive. *name* may be any type of file (directory, fifo, symbolic link, etc.). If given, *arcname* specifies an alternative name for the file in the archive. Directories are added recursively by default. This can be avoided by setting *recursive* to "False". Recursion adds entries in sorted order. If *filter* is given, it should be a function that takes a "TarInfo" object argument and returns the changed "TarInfo" object. If it instead returns "None" the "TarInfo" object will be excluded from the archive. See Examples for an example. Changed in version 3.2: Added the *filter* parameter. Changed in version 3.7: Recursion adds entries in sorted order. TarFile.addfile(tarinfo, fileobj=None) Add the "TarInfo" object *tarinfo* to the archive. If *fileobj* is given, it should be a *binary file*, and "tarinfo.size" bytes are read from it and added to the archive. You can create "TarInfo" objects directly, or by using "gettarinfo()". TarFile.gettarinfo(name=None, arcname=None, fileobj=None) Create a "TarInfo" object from the result of "os.stat()" or equivalent on an existing file. The file is either named by *name*, or specified as a *file object* *fileobj* with a file descriptor. *name* may be a *path-like object*. If given, *arcname* specifies an alternative name for the file in the archive, otherwise, the name is taken from *fileobj*’s "name" attribute, or the *name* argument. The name should be a text string. You can modify some of the "TarInfo"’s attributes before you add it using "addfile()". If the file object is not an ordinary file object positioned at the beginning of the file, attributes such as "size" may need modifying. This is the case for objects such as "GzipFile". The "name" may also be modified, in which case *arcname* could be a dummy string. Changed in version 3.6: The *name* parameter accepts a *path-like object*. TarFile.close() Close the "TarFile". In write mode, two finishing zero blocks are appended to the archive. TarFile.pax_headers A dictionary containing key-value pairs of pax global headers. TarInfo Objects =============== A "TarInfo" object represents one member in a "TarFile". Aside from storing all required attributes of a file (like file type, size, time, permissions, owner etc.), it provides some useful methods to determine its type. It does *not* contain the file’s data itself. "TarInfo" objects are returned by "TarFile"’s methods "getmember()", "getmembers()" and "gettarinfo()". Modifying the objects returned by "getmember()" or "getmembers()" will affect all subsequent operations on the archive. For cases where this is unwanted, you can use "copy.copy()" or call the "replace()" method to create a modified copy in one step. Several attributes can be set to "None" to indicate that a piece of metadata is unused or unknown. Different "TarInfo" methods handle "None" differently: * The "extract()" or "extractall()" methods will ignore the corresponding metadata, leaving it set to a default. * "addfile()" will fail. * "list()" will print a placeholder string. Changed in version 3.9.17: Added "replace()" and handling of "None". class tarfile.TarInfo(name="") Create a "TarInfo" object. classmethod TarInfo.frombuf(buf, encoding, errors) Create and return a "TarInfo" object from string buffer *buf*. Raises "HeaderError" if the buffer is invalid. classmethod TarInfo.fromtarfile(tarfile) Read the next member from the "TarFile" object *tarfile* and return it as a "TarInfo" object. TarInfo.tobuf(format=DEFAULT_FORMAT, encoding=ENCODING, errors='surrogateescape') Create a string buffer from a "TarInfo" object. For information on the arguments see the constructor of the "TarFile" class. Changed in version 3.2: Use "'surrogateescape'" as the default for the *errors* argument. A "TarInfo" object has the following public data attributes: TarInfo.name: str Name of the archive member. TarInfo.size: int Size in bytes. TarInfo.mtime: int | float Time of last modification in seconds since the epoch, as in "os.stat_result.st_mtime". Changed in version 3.9.17: Can be set to "None" for "extract()" and "extractall()", causing extraction to skip applying this attribute. TarInfo.mode: int Permission bits, as for "os.chmod()". Changed in version 3.9.17: Can be set to "None" for "extract()" and "extractall()", causing extraction to skip applying this attribute. TarInfo.type File type. *type* is usually one of these constants: "REGTYPE", "AREGTYPE", "LNKTYPE", "SYMTYPE", "DIRTYPE", "FIFOTYPE", "CONTTYPE", "CHRTYPE", "BLKTYPE", "GNUTYPE_SPARSE". To determine the type of a "TarInfo" object more conveniently, use the "is*()" methods below. TarInfo.linkname: str Name of the target file name, which is only present in "TarInfo" objects of type "LNKTYPE" and "SYMTYPE". For symbolic links ("SYMTYPE"), the *linkname* is relative to the directory that contains the link. For hard links ("LNKTYPE"), the *linkname* is relative to the root of the archive. TarInfo.uid: int User ID of the user who originally stored this member. Changed in version 3.9.17: Can be set to "None" for "extract()" and "extractall()", causing extraction to skip applying this attribute. TarInfo.gid: int Group ID of the user who originally stored this member. Changed in version 3.9.17: Can be set to "None" for "extract()" and "extractall()", causing extraction to skip applying this attribute. TarInfo.uname: str User name. Changed in version 3.9.17: Can be set to "None" for "extract()" and "extractall()", causing extraction to skip applying this attribute. TarInfo.gname: str Group name. Changed in version 3.9.17: Can be set to "None" for "extract()" and "extractall()", causing extraction to skip applying this attribute. TarInfo.pax_headers: dict A dictionary containing key-value pairs of an associated pax extended header. TarInfo.replace(name=..., mtime=..., mode=..., linkname=..., uid=..., gid=..., uname=..., gname=..., deep=True) New in version 3.9.17. Return a *new* copy of the "TarInfo" object with the given attributes changed. For example, to return a "TarInfo" with the group name set to "'staff'", use: new_tarinfo = old_tarinfo.replace(gname='staff') By default, a deep copy is made. If *deep* is false, the copy is shallow, i.e. "pax_headers" and any custom attributes are shared with the original "TarInfo" object. A "TarInfo" object also provides some convenient query methods: TarInfo.isfile() Return "True" if the "Tarinfo" object is a regular file. TarInfo.isreg() Same as "isfile()". TarInfo.isdir() Return "True" if it is a directory. TarInfo.issym() Return "True" if it is a symbolic link. TarInfo.islnk() Return "True" if it is a hard link. TarInfo.ischr() Return "True" if it is a character device. TarInfo.isblk() Return "True" if it is a block device. TarInfo.isfifo() Return "True" if it is a FIFO. TarInfo.isdev() Return "True" if it is one of character device, block device or FIFO. Extraction filters ================== New in version 3.9.17. The *tar* format is designed to capture all details of a UNIX-like filesystem, which makes it very powerful. Unfortunately, the features make it easy to create tar files that have unintended – and possibly malicious – effects when extracted. For example, extracting a tar file can overwrite arbitrary files in various ways (e.g. by using absolute paths, ".." path components, or symlinks that affect later members). In most cases, the full functionality is not needed. Therefore, *tarfile* supports extraction filters: a mechanism to limit functionality, and thus mitigate some of the security issues. See also: **PEP 706** Contains further motivation and rationale behind the design. The *filter* argument to "TarFile.extract()" or "extractall()" can be: * the string "'fully_trusted'": Honor all metadata as specified in the archive. Should be used if the user trusts the archive completely, or implements their own complex verification. * the string "'tar'": Honor most *tar*-specific features (i.e. features of UNIX-like filesystems), but block features that are very likely to be surprising or malicious. See "tar_filter()" for details. * the string "'data'": Ignore or block most features specific to UNIX- like filesystems. Intended for extracting cross-platform data archives. See "data_filter()" for details. * "None" (default): Use "TarFile.extraction_filter". If that is also "None" (the default), the "'fully_trusted'" filter will be used (for compatibility with earlier versions of Python). In Python 3.12, the default will emit a "DeprecationWarning". In Python 3.14, the "'data'" filter will become the default instead. It’s possible to switch earlier; see "TarFile.extraction_filter". * A callable which will be called for each extracted member with a TarInfo describing the member and the destination path to where the archive is extracted (i.e. the same path is used for all members): filter(/, member: TarInfo, path: str) -> TarInfo | None The callable is called just before each member is extracted, so it can take the current state of the disk into account. It can: * return a "TarInfo" object which will be used instead of the metadata in the archive, or * return "None", in which case the member will be skipped, or * raise an exception to abort the operation or skip the member, depending on "errorlevel". Note that when extraction is aborted, "extractall()" may leave the archive partially extracted. It does not attempt to clean up. Default named filters --------------------- The pre-defined, named filters are available as functions, so they can be reused in custom filters: tarfile.fully_trusted_filter(/, member, path) Return *member* unchanged. This implements the "'fully_trusted'" filter. tarfile.tar_filter(/, member, path) Implements the "'tar'" filter. * Strip leading slashes ("/" and "os.sep") from filenames. * Refuse to extract files with absolute paths (in case the name is absolute even after stripping slashes, e.g. "C:/foo" on Windows). This raises "AbsolutePathError". * Refuse to extract files whose absolute path (after following symlinks) would end up outside the destination. This raises "OutsideDestinationError". * Clear high mode bits (setuid, setgid, sticky) and group/other write bits ("S_IWOTH"). Return the modified "TarInfo" member. tarfile.data_filter(/, member, path) Implements the "'data'" filter. In addition to what "tar_filter" does: * Normalize link targets ("TarInfo.linkname") using "os.path.normpath()". Note that this removes internal ".." components, which may change the meaning of the link if the path in "TarInfo.linkname" traverses symbolic links. * Refuse to extract links (hard or soft) that link to absolute paths, or ones that link outside the destination. This raises "AbsoluteLinkError" or "LinkOutsideDestinationError". Note that such files are refused even on platforms that do not support symbolic links. * Refuse to extract device files (including pipes). This raises "SpecialFileError". * For regular files, including hard links: * Set the owner read and write permissions ("S_IWUSR"). * Remove the group & other executable permission ("S_IXOTH") if the owner doesn’t have it ("S_IXUSR"). * For other files (directories), set "mode" to "None", so that extraction methods skip applying permission bits. * Set user and group info ("uid", "gid", "uname", "gname") to "None", so that extraction methods skip setting it. Return the modified "TarInfo" member. Changed in version 3.9.23 (unreleased): Link targets are now normalized. Filter errors ------------- When a filter refuses to extract a file, it will raise an appropriate exception, a subclass of "FilterError". This will abort the extraction if "TarFile.errorlevel" is 1 or more. With "errorlevel=0" the error will be logged and the member will be skipped, but extraction will continue. Hints for further verification ------------------------------ Even with "filter='data'", *tarfile* is not suited for extracting untrusted files without prior inspection. Among other issues, the pre- defined filters do not prevent denial-of-service attacks. Users should do additional checks. Here is an incomplete list of things to consider: * Extract to a "new temporary directory" to prevent e.g. exploiting pre-existing links, and to make it easier to clean up after a failed extraction. * Disallow symbolic links if you do not need the functionality. * When working with untrusted data, use external (e.g. OS-level) limits on disk, memory and CPU usage. * Check filenames against an allow-list of characters (to filter out control characters, confusables, foreign path separators, etc.). * Check that filenames have expected extensions (discouraging files that execute when you “click on them”, or extension-less files like Windows special device names). * Limit the number of extracted files, total size of extracted data, filename length (including symlink length), and size of individual files. * Check for files that would be shadowed on case-insensitive filesystems. Also note that: * Tar files may contain multiple versions of the same file. Later ones are expected to overwrite any earlier ones. This feature is crucial to allow updating tape archives, but can be abused maliciously. * *tarfile* does not protect against issues with “live” data, e.g. an attacker tinkering with the destination (or source) directory while extraction (or archiving) is in progress. Supporting older Python versions -------------------------------- Extraction filters were added to Python 3.12, and are backported to older versions as security updates. To check whether the feature is available, use e.g. "hasattr(tarfile, 'data_filter')" rather than checking the Python version. The following examples show how to support Python versions with and without the feature. Note that setting "extraction_filter" will affect any subsequent operations. * Fully trusted archive: my_tarfile.extraction_filter = (lambda member, path: member) my_tarfile.extractall() * Use the "'data'" filter if available, but revert to Python 3.11 behavior ("'fully_trusted'") if this feature is not available: my_tarfile.extraction_filter = getattr(tarfile, 'data_filter', (lambda member, path: member)) my_tarfile.extractall() * Use the "'data'" filter; *fail* if it is not available: my_tarfile.extractall(filter=tarfile.data_filter) or: my_tarfile.extraction_filter = tarfile.data_filter my_tarfile.extractall() * Use the "'data'" filter; *warn* if it is not available: if hasattr(tarfile, 'data_filter'): my_tarfile.extractall(filter='data') else: # remove this when no longer needed warn_the_user('Extracting may be unsafe; consider updating Python') my_tarfile.extractall() Stateful extraction filter example ---------------------------------- While *tarfile*’s extraction methods take a simple *filter* callable, custom filters may be more complex objects with an internal state. It may be useful to write these as context managers, to be used like this: with StatefulFilter() as filter_func: tar.extractall(path, filter=filter_func) Such a filter can be written as, for example: class StatefulFilter: def __init__(self): self.file_count = 0 def __enter__(self): return self def __call__(self, member, path): self.file_count += 1 return member def __exit__(self, *exc_info): print(f'{self.file_count} files extracted') Command-Line Interface ====================== New in version 3.4. The "tarfile" module provides a simple command-line interface to interact with tar archives. If you want to create a new tar archive, specify its name after the "-c" option and then list the filename(s) that should be included: $ python -m tarfile -c monty.tar spam.txt eggs.txt Passing a directory is also acceptable: $ python -m tarfile -c monty.tar life-of-brian_1979/ If you want to extract a tar archive into the current directory, use the "-e" option: $ python -m tarfile -e monty.tar You can also extract a tar archive into a different directory by passing the directory’s name: $ python -m tarfile -e monty.tar other-dir/ For a list of the files in a tar archive, use the "-l" option: $ python -m tarfile -l monty.tar Command-line options -------------------- -l --list List files in a tarfile. -c ... --create ... Create tarfile from source files. -e [] --extract [] Extract tarfile into the current directory if *output_dir* is not specified. -t --test Test whether the tarfile is valid or not. -v, --verbose Verbose output. --filter Specifies the *filter* for "--extract". See Extraction filters for details. Only string names are accepted (that is, "fully_trusted", "tar", and "data"). New in version 3.9.17. Examples ======== How to extract an entire tar archive to the current working directory: import tarfile tar = tarfile.open("sample.tar.gz") tar.extractall() tar.close() How to extract a subset of a tar archive with "TarFile.extractall()" using a generator function instead of a list: import os import tarfile def py_files(members): for tarinfo in members: if os.path.splitext(tarinfo.name)[1] == ".py": yield tarinfo tar = tarfile.open("sample.tar.gz") tar.extractall(members=py_files(tar)) tar.close() How to create an uncompressed tar archive from a list of filenames: import tarfile tar = tarfile.open("sample.tar", "w") for name in ["foo", "bar", "quux"]: tar.add(name) tar.close() The same example using the "with" statement: import tarfile with tarfile.open("sample.tar", "w") as tar: for name in ["foo", "bar", "quux"]: tar.add(name) How to read a gzip compressed tar archive and display some member information: import tarfile tar = tarfile.open("sample.tar.gz", "r:gz") for tarinfo in tar: print(tarinfo.name, "is", tarinfo.size, "bytes in size and is ", end="") if tarinfo.isreg(): print("a regular file.") elif tarinfo.isdir(): print("a directory.") else: print("something else.") tar.close() How to create an archive and reset the user information using the *filter* parameter in "TarFile.add()": import tarfile def reset(tarinfo): tarinfo.uid = tarinfo.gid = 0 tarinfo.uname = tarinfo.gname = "root" return tarinfo tar = tarfile.open("sample.tar.gz", "w:gz") tar.add("foo", filter=reset) tar.close() Supported tar formats ===================== There are three tar formats that can be created with the "tarfile" module: * The POSIX.1-1988 ustar format ("USTAR_FORMAT"). It supports filenames up to a length of at best 256 characters and linknames up to 100 characters. The maximum file size is 8 GiB. This is an old and limited but widely supported format. * The GNU tar format ("GNU_FORMAT"). It supports long filenames and linknames, files bigger than 8 GiB and sparse files. It is the de facto standard on GNU/Linux systems. "tarfile" fully supports the GNU tar extensions for long names, sparse file support is read-only. * The POSIX.1-2001 pax format ("PAX_FORMAT"). It is the most flexible format with virtually no limits. It supports long filenames and linknames, large files and stores pathnames in a portable way. Modern tar implementations, including GNU tar, bsdtar/libarchive and star, fully support extended *pax* features; some old or unmaintained libraries may not, but should treat *pax* archives as if they were in the universally-supported *ustar* format. It is the current default format for new archives. It extends the existing *ustar* format with extra headers for information that cannot be stored otherwise. There are two flavours of pax headers: Extended headers only affect the subsequent file header, global headers are valid for the complete archive and affect all following files. All the data in a pax header is encoded in *UTF-8* for portability reasons. There are some more variants of the tar format which can be read, but not created: * The ancient V7 format. This is the first tar format from Unix Seventh Edition, storing only regular files and directories. Names must not be longer than 100 characters, there is no user/group name information. Some archives have miscalculated header checksums in case of fields with non-ASCII characters. * The SunOS tar extended format. This format is a variant of the POSIX.1-2001 pax format, but is not compatible. Unicode issues ============== The tar format was originally conceived to make backups on tape drives with the main focus on preserving file system information. Nowadays tar archives are commonly used for file distribution and exchanging archives over networks. One problem of the original format (which is the basis of all other formats) is that there is no concept of supporting different character encodings. For example, an ordinary tar archive created on a *UTF-8* system cannot be read correctly on a *Latin-1* system if it contains non-*ASCII* characters. Textual metadata (like filenames, linknames, user/group names) will appear damaged. Unfortunately, there is no way to autodetect the encoding of an archive. The pax format was designed to solve this problem. It stores non-ASCII metadata using the universal character encoding *UTF-8*. The details of character conversion in "tarfile" are controlled by the *encoding* and *errors* keyword arguments of the "TarFile" class. *encoding* defines the character encoding to use for the metadata in the archive. The default value is "sys.getfilesystemencoding()" or "'ascii'" as a fallback. Depending on whether the archive is read or written, the metadata must be either decoded or encoded. If *encoding* is not set appropriately, this conversion may fail. The *errors* argument defines how characters are treated that cannot be converted. Possible values are listed in section Error Handlers. The default scheme is "'surrogateescape'" which Python also uses for its file system calls, see File Names, Command Line Arguments, and Environment Variables. For "PAX_FORMAT" archives (the default), *encoding* is generally not needed because all the metadata is stored using *UTF-8*. *encoding* is only used in the rare cases when binary pax headers are decoded or when strings with surrogate characters are stored. "telnetlib" — Telnet client *************************** **Source code:** Lib/telnetlib.py Deprecated since version 3.11: The "telnetlib" module is deprecated (see **PEP 594** for details and alternatives). ====================================================================== The "telnetlib" module provides a "Telnet" class that implements the Telnet protocol. See **RFC 854** for details about the protocol. In addition, it provides symbolic constants for the protocol characters (see below), and for the telnet options. The symbolic names of the telnet options follow the definitions in "arpa/telnet.h", with the leading "TELOPT_" removed. For symbolic names of options which are traditionally not included in "arpa/telnet.h", see the module source itself. The symbolic constants for the telnet commands are: IAC, DONT, DO, WONT, WILL, SE (Subnegotiation End), NOP (No Operation), DM (Data Mark), BRK (Break), IP (Interrupt process), AO (Abort output), AYT (Are You There), EC (Erase Character), EL (Erase Line), GA (Go Ahead), SB (Subnegotiation Begin). class telnetlib.Telnet(host=None, port=0[, timeout]) "Telnet" represents a connection to a Telnet server. The instance is initially not connected by default; the "open()" method must be used to establish a connection. Alternatively, the host name and optional port number can be passed to the constructor too, in which case the connection to the server will be established before the constructor returns. The optional *timeout* parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). Do not reopen an already connected instance. This class has many "read_*()" methods. Note that some of them raise "EOFError" when the end of the connection is read, because they can return an empty string for other reasons. See the individual descriptions below. A "Telnet" object is a context manager and can be used in a "with" statement. When the "with" block ends, the "close()" method is called: >>> from telnetlib import Telnet >>> with Telnet('localhost', 23) as tn: ... tn.interact() ... Changed in version 3.6: Context manager support added See also: **RFC 854** - Telnet Protocol Specification Definition of the Telnet protocol. Telnet Objects ============== "Telnet" instances have the following methods: Telnet.read_until(expected, timeout=None) Read until a given byte string, *expected*, is encountered or until *timeout* seconds have passed. When no match is found, return whatever is available instead, possibly empty bytes. Raise "EOFError" if the connection is closed and no cooked data is available. Telnet.read_all() Read all data until EOF as bytes; block until connection closed. Telnet.read_some() Read at least one byte of cooked data unless EOF is hit. Return "b''" if EOF is hit. Block if no data is immediately available. Telnet.read_very_eager() Read everything that can be without blocking in I/O (eager). Raise "EOFError" if connection closed and no cooked data available. Return "b''" if no cooked data available otherwise. Do not block unless in the midst of an IAC sequence. Telnet.read_eager() Read readily available data. Raise "EOFError" if connection closed and no cooked data available. Return "b''" if no cooked data available otherwise. Do not block unless in the midst of an IAC sequence. Telnet.read_lazy() Process and return data already in the queues (lazy). Raise "EOFError" if connection closed and no data available. Return "b''" if no cooked data available otherwise. Do not block unless in the midst of an IAC sequence. Telnet.read_very_lazy() Return any data available in the cooked queue (very lazy). Raise "EOFError" if connection closed and no data available. Return "b''" if no cooked data available otherwise. This method never blocks. Telnet.read_sb_data() Return the data collected between a SB/SE pair (suboption begin/end). The callback should access these data when it was invoked with a "SE" command. This method never blocks. Telnet.open(host, port=0[, timeout]) Connect to a host. The optional second argument is the port number, which defaults to the standard Telnet port (23). The optional *timeout* parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). Do not try to reopen an already connected instance. Raises an auditing event "telnetlib.Telnet.open" with arguments "self", "host", "port". Telnet.msg(msg, *args) Print a debug message when the debug level is ">" 0. If extra arguments are present, they are substituted in the message using the standard string formatting operator. Telnet.set_debuglevel(debuglevel) Set the debug level. The higher the value of *debuglevel*, the more debug output you get (on "sys.stdout"). Telnet.close() Close the connection. Telnet.get_socket() Return the socket object used internally. Telnet.fileno() Return the file descriptor of the socket object used internally. Telnet.write(buffer) Write a byte string to the socket, doubling any IAC characters. This can block if the connection is blocked. May raise "OSError" if the connection is closed. Raises an auditing event "telnetlib.Telnet.write" with arguments "self", "buffer". Changed in version 3.3: This method used to raise "socket.error", which is now an alias of "OSError". Telnet.interact() Interaction function, emulates a very dumb Telnet client. Telnet.mt_interact() Multithreaded version of "interact()". Telnet.expect(list, timeout=None) Read until one from a list of a regular expressions matches. The first argument is a list of regular expressions, either compiled (regex objects) or uncompiled (byte strings). The optional second argument is a timeout, in seconds; the default is to block indefinitely. Return a tuple of three items: the index in the list of the first regular expression that matches; the match object returned; and the bytes read up till and including the match. If end of file is found and no bytes were read, raise "EOFError". Otherwise, when nothing matches, return "(-1, None, data)" where *data* is the bytes received so far (may be empty bytes if a timeout happened). If a regular expression ends with a greedy match (such as ".*") or if more than one expression can match the same input, the results are non-deterministic, and may depend on the I/O timing. Telnet.set_option_negotiation_callback(callback) Each time a telnet option is read on the input flow, this *callback* (if set) is called with the following parameters: callback(telnet socket, command (DO/DONT/WILL/WONT), option). No other action is done afterwards by telnetlib. Telnet Example ============== A simple example illustrating typical use: import getpass import telnetlib HOST = "localhost" user = input("Enter your remote account: ") password = getpass.getpass() tn = telnetlib.Telnet(HOST) tn.read_until(b"login: ") tn.write(user.encode('ascii') + b"\n") if password: tn.read_until(b"Password: ") tn.write(password.encode('ascii') + b"\n") tn.write(b"ls\n") tn.write(b"exit\n") print(tn.read_all().decode('ascii')) "tempfile" — Generate temporary files and directories ***************************************************** **Source code:** Lib/tempfile.py ====================================================================== This module creates temporary files and directories. It works on all supported platforms. "TemporaryFile", "NamedTemporaryFile", "TemporaryDirectory", and "SpooledTemporaryFile" are high-level interfaces which provide automatic cleanup and can be used as context managers. "mkstemp()" and "mkdtemp()" are lower-level functions which require manual cleanup. All the user-callable functions and constructors take additional arguments which allow direct control over the location and name of temporary files and directories. Files names used by this module include a string of random characters which allows those files to be securely created in shared temporary directories. To maintain backward compatibility, the argument order is somewhat odd; it is recommended to use keyword arguments for clarity. The module defines the following user-callable items: tempfile.TemporaryFile(mode='w+b', buffering=-1, encoding=None, newline=None, suffix=None, prefix=None, dir=None, *, errors=None) Return a *file-like object* that can be used as a temporary storage area. The file is created securely, using the same rules as "mkstemp()". It will be destroyed as soon as it is closed (including an implicit close when the object is garbage collected). Under Unix, the directory entry for the file is either not created at all or is removed immediately after the file is created. Other platforms do not support this; your code should not rely on a temporary file created using this function having or not having a visible name in the file system. The resulting object can be used as a context manager (see Examples). On completion of the context or destruction of the file object the temporary file will be removed from the filesystem. The *mode* parameter defaults to "'w+b'" so that the file created can be read and written without being closed. Binary mode is used so that it behaves consistently on all platforms without regard for the data that is stored. *buffering*, *encoding*, *errors* and *newline* are interpreted as for "open()". The *dir*, *prefix* and *suffix* parameters have the same meaning and defaults as with "mkstemp()". The returned object is a true file object on POSIX platforms. On other platforms, it is a file-like object whose "file" attribute is the underlying true file object. The "os.O_TMPFILE" flag is used if it is available and works (Linux-specific, requires Linux kernel 3.11 or later). On platforms that are neither Posix nor Cygwin, TemporaryFile is an alias for NamedTemporaryFile. Raises an auditing event "tempfile.mkstemp" with argument "fullpath". Changed in version 3.5: The "os.O_TMPFILE" flag is now used if available. Changed in version 3.8: Added *errors* parameter. tempfile.NamedTemporaryFile(mode='w+b', buffering=-1, encoding=None, newline=None, suffix=None, prefix=None, dir=None, delete=True, *, errors=None) This function operates exactly as "TemporaryFile()" does, except that the file is guaranteed to have a visible name in the file system (on Unix, the directory entry is not unlinked). That name can be retrieved from the "name" attribute of the returned file- like object. Whether the name can be used to open the file a second time, while the named temporary file is still open, varies across platforms (it can be so used on Unix; it cannot on Windows). If *delete* is true (the default), the file is deleted as soon as it is closed. The returned object is always a file-like object whose "file" attribute is the underlying true file object. This file-like object can be used in a "with" statement, just like a normal file. Raises an auditing event "tempfile.mkstemp" with argument "fullpath". Changed in version 3.8: Added *errors* parameter. tempfile.SpooledTemporaryFile(max_size=0, mode='w+b', buffering=-1, encoding=None, newline=None, suffix=None, prefix=None, dir=None, *, errors=None) This function operates exactly as "TemporaryFile()" does, except that data is spooled in memory until the file size exceeds *max_size*, or until the file’s "fileno()" method is called, at which point the contents are written to disk and operation proceeds as with "TemporaryFile()". The resulting file has one additional method, "rollover()", which causes the file to roll over to an on-disk file regardless of its size. The returned object is a file-like object whose "_file" attribute is either an "io.BytesIO" or "io.TextIOWrapper" object (depending on whether binary or text *mode* was specified) or a true file object, depending on whether "rollover()" has been called. This file-like object can be used in a "with" statement, just like a normal file. Changed in version 3.3: the truncate method now accepts a "size" argument. Changed in version 3.8: Added *errors* parameter. tempfile.TemporaryDirectory(suffix=None, prefix=None, dir=None) This function securely creates a temporary directory using the same rules as "mkdtemp()". The resulting object can be used as a context manager (see Examples). On completion of the context or destruction of the temporary directory object the newly created temporary directory and all its contents are removed from the filesystem. The directory name can be retrieved from the "name" attribute of the returned object. When the returned object is used as a context manager, the "name" will be assigned to the target of the "as" clause in the "with" statement, if there is one. The directory can be explicitly cleaned up by calling the "cleanup()" method. Raises an auditing event "tempfile.mkdtemp" with argument "fullpath". New in version 3.2. tempfile.mkstemp(suffix=None, prefix=None, dir=None, text=False) Creates a temporary file in the most secure manner possible. There are no race conditions in the file’s creation, assuming that the platform properly implements the "os.O_EXCL" flag for "os.open()". The file is readable and writable only by the creating user ID. If the platform uses permission bits to indicate whether a file is executable, the file is executable by no one. The file descriptor is not inherited by child processes. Unlike "TemporaryFile()", the user of "mkstemp()" is responsible for deleting the temporary file when done with it. If *suffix* is not "None", the file name will end with that suffix, otherwise there will be no suffix. "mkstemp()" does not put a dot between the file name and the suffix; if you need one, put it at the beginning of *suffix*. If *prefix* is not "None", the file name will begin with that prefix; otherwise, a default prefix is used. The default is the return value of "gettempprefix()" or "gettempprefixb()", as appropriate. If *dir* is not "None", the file will be created in that directory; otherwise, a default directory is used. The default directory is chosen from a platform-dependent list, but the user of the application can control the directory location by setting the *TMPDIR*, *TEMP* or *TMP* environment variables. There is thus no guarantee that the generated filename will have any nice properties, such as not requiring quoting when passed to external commands via "os.popen()". If any of *suffix*, *prefix*, and *dir* are not "None", they must be the same type. If they are bytes, the returned name will be bytes instead of str. If you want to force a bytes return value with otherwise default behavior, pass "suffix=b''". If *text* is specified and true, the file is opened in text mode. Otherwise, (the default) the file is opened in binary mode. "mkstemp()" returns a tuple containing an OS-level handle to an open file (as would be returned by "os.open()") and the absolute pathname of that file, in that order. Raises an auditing event "tempfile.mkstemp" with argument "fullpath". Changed in version 3.5: *suffix*, *prefix*, and *dir* may now be supplied in bytes in order to obtain a bytes return value. Prior to this, only str was allowed. *suffix* and *prefix* now accept and default to "None" to cause an appropriate default value to be used. Changed in version 3.6: The *dir* parameter now accepts a *path- like object*. tempfile.mkdtemp(suffix=None, prefix=None, dir=None) Creates a temporary directory in the most secure manner possible. There are no race conditions in the directory’s creation. The directory is readable, writable, and searchable only by the creating user ID. The user of "mkdtemp()" is responsible for deleting the temporary directory and its contents when done with it. The *prefix*, *suffix*, and *dir* arguments are the same as for "mkstemp()". "mkdtemp()" returns the absolute pathname of the new directory. Raises an auditing event "tempfile.mkdtemp" with argument "fullpath". Changed in version 3.5: *suffix*, *prefix*, and *dir* may now be supplied in bytes in order to obtain a bytes return value. Prior to this, only str was allowed. *suffix* and *prefix* now accept and default to "None" to cause an appropriate default value to be used. Changed in version 3.6: The *dir* parameter now accepts a *path- like object*. tempfile.gettempdir() Return the name of the directory used for temporary files. This defines the default value for the *dir* argument to all functions in this module. Python searches a standard list of directories to find one which the calling user can create files in. The list is: 1. The directory named by the "TMPDIR" environment variable. 2. The directory named by the "TEMP" environment variable. 3. The directory named by the "TMP" environment variable. 4. A platform-specific location: * On Windows, the directories "C:\TEMP", "C:\TMP", "\TEMP", and "\TMP", in that order. * On all other platforms, the directories "/tmp", "/var/tmp", and "/usr/tmp", in that order. 5. As a last resort, the current working directory. The result of this search is cached, see the description of "tempdir" below. tempfile.gettempdirb() Same as "gettempdir()" but the return value is in bytes. New in version 3.5. tempfile.gettempprefix() Return the filename prefix used to create temporary files. This does not contain the directory component. tempfile.gettempprefixb() Same as "gettempprefix()" but the return value is in bytes. New in version 3.5. The module uses a global variable to store the name of the directory used for temporary files returned by "gettempdir()". It can be set directly to override the selection process, but this is discouraged. All functions in this module take a *dir* argument which can be used to specify the directory and this is the recommended approach. tempfile.tempdir When set to a value other than "None", this variable defines the default value for the *dir* argument to the functions defined in this module. If "tempdir" is "None" (the default) at any call to any of the above functions except "gettempprefix()" it is initialized following the algorithm described in "gettempdir()". Examples ======== Here are some examples of typical usage of the "tempfile" module: >>> import tempfile # create a temporary file and write some data to it >>> fp = tempfile.TemporaryFile() >>> fp.write(b'Hello world!') # read data from file >>> fp.seek(0) >>> fp.read() b'Hello world!' # close the file, it will be removed >>> fp.close() # create a temporary file using a context manager >>> with tempfile.TemporaryFile() as fp: ... fp.write(b'Hello world!') ... fp.seek(0) ... fp.read() b'Hello world!' >>> # file is now closed and removed # create a temporary directory using the context manager >>> with tempfile.TemporaryDirectory() as tmpdirname: ... print('created temporary directory', tmpdirname) >>> # directory and contents have been removed Deprecated functions and variables ================================== A historical way to create temporary files was to first generate a file name with the "mktemp()" function and then create a file using this name. Unfortunately this is not secure, because a different process may create a file with this name in the time between the call to "mktemp()" and the subsequent attempt to create the file by the first process. The solution is to combine the two steps and create the file immediately. This approach is used by "mkstemp()" and the other functions described above. tempfile.mktemp(suffix='', prefix='tmp', dir=None) Deprecated since version 2.3: Use "mkstemp()" instead. Return an absolute pathname of a file that did not exist at the time the call is made. The *prefix*, *suffix*, and *dir* arguments are similar to those of "mkstemp()", except that bytes file names, "suffix=None" and "prefix=None" are not supported. Warning: Use of this function may introduce a security hole in your program. By the time you get around to doing anything with the file name it returns, someone else may have beaten you to the punch. "mktemp()" usage can be replaced easily with "NamedTemporaryFile()", passing it the "delete=False" parameter: >>> f = NamedTemporaryFile(delete=False) >>> f.name '/tmp/tmptjujjt' >>> f.write(b"Hello World!\n") 13 >>> f.close() >>> os.unlink(f.name) >>> os.path.exists(f.name) False "termios" — POSIX style tty control *********************************** ====================================================================== This module provides an interface to the POSIX calls for tty I/O control. For a complete description of these calls, see *termios(3)* Unix manual page. It is only available for those Unix versions that support POSIX *termios* style tty I/O control configured during installation. All functions in this module take a file descriptor *fd* as their first argument. This can be an integer file descriptor, such as returned by "sys.stdin.fileno()", or a *file object*, such as "sys.stdin" itself. This module also defines all the constants needed to work with the functions provided here; these have the same name as their counterparts in C. Please refer to your system documentation for more information on using these terminal control interfaces. The module defines the following functions: termios.tcgetattr(fd) Return a list containing the tty attributes for file descriptor *fd*, as follows: "[iflag, oflag, cflag, lflag, ispeed, ospeed, cc]" where *cc* is a list of the tty special characters (each a string of length 1, except the items with indices "VMIN" and "VTIME", which are integers when these fields are defined). The interpretation of the flags and the speeds as well as the indexing in the *cc* array must be done using the symbolic constants defined in the "termios" module. termios.tcsetattr(fd, when, attributes) Set the tty attributes for file descriptor *fd* from the *attributes*, which is a list like the one returned by "tcgetattr()". The *when* argument determines when the attributes are changed: "TCSANOW" to change immediately, "TCSADRAIN" to change after transmitting all queued output, or "TCSAFLUSH" to change after transmitting all queued output and discarding all queued input. termios.tcsendbreak(fd, duration) Send a break on file descriptor *fd*. A zero *duration* sends a break for 0.25–0.5 seconds; a nonzero *duration* has a system dependent meaning. termios.tcdrain(fd) Wait until all output written to file descriptor *fd* has been transmitted. termios.tcflush(fd, queue) Discard queued data on file descriptor *fd*. The *queue* selector specifies which queue: "TCIFLUSH" for the input queue, "TCOFLUSH" for the output queue, or "TCIOFLUSH" for both queues. termios.tcflow(fd, action) Suspend or resume input or output on file descriptor *fd*. The *action* argument can be "TCOOFF" to suspend output, "TCOON" to restart output, "TCIOFF" to suspend input, or "TCION" to restart input. See also: Module "tty" Convenience functions for common terminal control operations. Example ======= Here’s a function that prompts for a password with echoing turned off. Note the technique using a separate "tcgetattr()" call and a "try" … "finally" statement to ensure that the old tty attributes are restored exactly no matter what happens: def getpass(prompt="Password: "): import termios, sys fd = sys.stdin.fileno() old = termios.tcgetattr(fd) new = termios.tcgetattr(fd) new[3] = new[3] & ~termios.ECHO # lflags try: termios.tcsetattr(fd, termios.TCSADRAIN, new) passwd = input(prompt) finally: termios.tcsetattr(fd, termios.TCSADRAIN, old) return passwd "test" — Regression tests package for Python ******************************************** Note: The "test" package is meant for internal use by Python only. It is documented for the benefit of the core developers of Python. Any use of this package outside of Python’s standard library is discouraged as code mentioned here can change or be removed without notice between releases of Python. ====================================================================== The "test" package contains all regression tests for Python as well as the modules "test.support" and "test.regrtest". "test.support" is used to enhance your tests while "test.regrtest" drives the testing suite. Each module in the "test" package whose name starts with "test_" is a testing suite for a specific module or feature. All new tests should be written using the "unittest" or "doctest" module. Some older tests are written using a “traditional” testing style that compares output printed to "sys.stdout"; this style of test is considered deprecated. See also: Module "unittest" Writing PyUnit regression tests. Module "doctest" Tests embedded in documentation strings. Writing Unit Tests for the "test" package ========================================= It is preferred that tests that use the "unittest" module follow a few guidelines. One is to name the test module by starting it with "test_" and end it with the name of the module being tested. The test methods in the test module should start with "test_" and end with a description of what the method is testing. This is needed so that the methods are recognized by the test driver as test methods. Also, no documentation string for the method should be included. A comment (such as "# Tests function returns only True or False") should be used to provide documentation for test methods. This is done because documentation strings get printed out if they exist and thus what test is being run is not stated. A basic boilerplate is often used: import unittest from test import support class MyTestCase1(unittest.TestCase): # Only use setUp() and tearDown() if necessary def setUp(self): ... code to execute in preparation for tests ... def tearDown(self): ... code to execute to clean up after tests ... def test_feature_one(self): # Test feature one. ... testing code ... def test_feature_two(self): # Test feature two. ... testing code ... ... more test methods ... class MyTestCase2(unittest.TestCase): ... same structure as MyTestCase1 ... ... more test classes ... if __name__ == '__main__': unittest.main() This code pattern allows the testing suite to be run by "test.regrtest", on its own as a script that supports the "unittest" CLI, or via the "python -m unittest" CLI. The goal for regression testing is to try to break code. This leads to a few guidelines to be followed: * The testing suite should exercise all classes, functions, and constants. This includes not just the external API that is to be presented to the outside world but also “private” code. * Whitebox testing (examining the code being tested when the tests are being written) is preferred. Blackbox testing (testing only the published user interface) is not complete enough to make sure all boundary and edge cases are tested. * Make sure all possible values are tested including invalid ones. This makes sure that not only all valid values are acceptable but also that improper values are handled correctly. * Exhaust as many code paths as possible. Test where branching occurs and thus tailor input to make sure as many different paths through the code are taken. * Add an explicit test for any bugs discovered for the tested code. This will make sure that the error does not crop up again if the code is changed in the future. * Make sure to clean up after your tests (such as close and remove all temporary files). * If a test is dependent on a specific condition of the operating system then verify the condition already exists before attempting the test. * Import as few modules as possible and do it as soon as possible. This minimizes external dependencies of tests and also minimizes possible anomalous behavior from side-effects of importing a module. * Try to maximize code reuse. On occasion, tests will vary by something as small as what type of input is used. Minimize code duplication by subclassing a basic test class with a class that specifies the input: class TestFuncAcceptsSequencesMixin: func = mySuperWhammyFunction def test_func(self): self.func(self.arg) class AcceptLists(TestFuncAcceptsSequencesMixin, unittest.TestCase): arg = [1, 2, 3] class AcceptStrings(TestFuncAcceptsSequencesMixin, unittest.TestCase): arg = 'abc' class AcceptTuples(TestFuncAcceptsSequencesMixin, unittest.TestCase): arg = (1, 2, 3) When using this pattern, remember that all classes that inherit from "unittest.TestCase" are run as tests. The "Mixin" class in the example above does not have any data and so can’t be run by itself, thus it does not inherit from "unittest.TestCase". See also: Test Driven Development A book by Kent Beck on writing tests before code. Running tests using the command-line interface ============================================== The "test" package can be run as a script to drive Python’s regression test suite, thanks to the "-m" option: **python -m test**. Under the hood, it uses "test.regrtest"; the call **python -m test.regrtest** used in previous Python versions still works. Running the script by itself automatically starts running all regression tests in the "test" package. It does this by finding all modules in the package whose name starts with "test_", importing them, and executing the function "test_main()" if present or loading the tests via unittest.TestLoader.loadTestsFromModule if "test_main" does not exist. The names of tests to execute may also be passed to the script. Specifying a single regression test (**python -m test test_spam**) will minimize output and only print whether the test passed or failed. Running "test" directly allows what resources are available for tests to use to be set. You do this by using the "-u" command-line option. Specifying "all" as the value for the "-u" option enables all possible resources: **python -m test -uall**. If all but one resource is desired (a more common case), a comma-separated list of resources that are not desired may be listed after "all". The command **python -m test -uall,-audio,-largefile** will run "test" with all resources except the "audio" and "largefile" resources. For a list of all resources and more command-line options, run **python -m test -h**. Some other ways to execute the regression tests depend on what platform the tests are being executed on. On Unix, you can run **make test** at the top-level directory where Python was built. On Windows, executing **rt.bat** from your "PCbuild" directory will run all regression tests. "test.support" — Utilities for the Python test suite **************************************************** The "test.support" module provides support for Python’s regression test suite. Note: "test.support" is not a public module. It is documented here to help Python developers write tests. The API of this module is subject to change without backwards compatibility concerns between releases. This module defines the following exceptions: exception test.support.TestFailed Exception to be raised when a test fails. This is deprecated in favor of "unittest"-based tests and "unittest.TestCase"’s assertion methods. exception test.support.ResourceDenied Subclass of "unittest.SkipTest". Raised when a resource (such as a network connection) is not available. Raised by the "requires()" function. The "test.support" module defines the following constants: test.support.verbose "True" when verbose output is enabled. Should be checked when more detailed information is desired about a running test. *verbose* is set by "test.regrtest". test.support.is_jython "True" if the running interpreter is Jython. test.support.is_android "True" if the system is Android. test.support.unix_shell Path for shell if not on Windows; otherwise "None". test.support.FS_NONASCII A non-ASCII character encodable by "os.fsencode()". test.support.TESTFN Set to a name that is safe to use as the name of a temporary file. Any temporary file that is created should be closed and unlinked (removed). test.support.TESTFN_UNICODE Set to a non-ASCII name for a temporary file. test.support.TESTFN_ENCODING Set to "sys.getfilesystemencoding()". test.support.TESTFN_UNENCODABLE Set to a filename (str type) that should not be able to be encoded by file system encoding in strict mode. It may be "None" if it’s not possible to generate such a filename. test.support.TESTFN_UNDECODABLE Set to a filename (bytes type) that should not be able to be decoded by file system encoding in strict mode. It may be "None" if it’s not possible to generate such a filename. test.support.TESTFN_NONASCII Set to a filename containing the "FS_NONASCII" character. test.support.LOOPBACK_TIMEOUT Timeout in seconds for tests using a network server listening on the network local loopback interface like "127.0.0.1". The timeout is long enough to prevent test failure: it takes into account that the client and the server can run in different threads or even different processes. The timeout should be long enough for "connect()", "recv()" and "send()" methods of "socket.socket". Its default value is 5 seconds. See also "INTERNET_TIMEOUT". test.support.INTERNET_TIMEOUT Timeout in seconds for network requests going to the Internet. The timeout is short enough to prevent a test to wait for too long if the Internet request is blocked for whatever reason. Usually, a timeout using "INTERNET_TIMEOUT" should not mark a test as failed, but skip the test instead: see "transient_internet()". Its default value is 1 minute. See also "LOOPBACK_TIMEOUT". test.support.SHORT_TIMEOUT Timeout in seconds to mark a test as failed if the test takes “too long”. The timeout value depends on the regrtest "--timeout" command line option. If a test using "SHORT_TIMEOUT" starts to fail randomly on slow buildbots, use "LONG_TIMEOUT" instead. Its default value is 30 seconds. test.support.LONG_TIMEOUT Timeout in seconds to detect when a test hangs. It is long enough to reduce the risk of test failure on the slowest Python buildbots. It should not be used to mark a test as failed if the test takes “too long”. The timeout value depends on the regrtest "--timeout" command line option. Its default value is 5 minutes. See also "LOOPBACK_TIMEOUT", "INTERNET_TIMEOUT" and "SHORT_TIMEOUT". test.support.SAVEDCWD Set to "os.getcwd()". test.support.PGO Set when tests can be skipped when they are not useful for PGO. test.support.PIPE_MAX_SIZE A constant that is likely larger than the underlying OS pipe buffer size, to make writes blocking. test.support.SOCK_MAX_SIZE A constant that is likely larger than the underlying OS socket buffer size, to make writes blocking. test.support.TEST_SUPPORT_DIR Set to the top level directory that contains "test.support". test.support.TEST_HOME_DIR Set to the top level directory for the test package. test.support.TEST_DATA_DIR Set to the "data" directory within the test package. test.support.MAX_Py_ssize_t Set to "sys.maxsize" for big memory tests. test.support.max_memuse Set by "set_memlimit()" as the memory limit for big memory tests. Limited by "MAX_Py_ssize_t". test.support.real_max_memuse Set by "set_memlimit()" as the memory limit for big memory tests. Not limited by "MAX_Py_ssize_t". test.support.MISSING_C_DOCSTRINGS Return "True" if running on CPython, not on Windows, and configuration not set with "WITH_DOC_STRINGS". test.support.HAVE_DOCSTRINGS Check for presence of docstrings. test.support.TEST_HTTP_URL Define the URL of a dedicated HTTP server for the network tests. test.support.ALWAYS_EQ Object that is equal to anything. Used to test mixed type comparison. test.support.NEVER_EQ Object that is not equal to anything (even to "ALWAYS_EQ"). Used to test mixed type comparison. test.support.LARGEST Object that is greater than anything (except itself). Used to test mixed type comparison. test.support.SMALLEST Object that is less than anything (except itself). Used to test mixed type comparison. The "test.support" module defines the following functions: test.support.forget(module_name) Remove the module named *module_name* from "sys.modules" and delete any byte-compiled files of the module. test.support.unload(name) Delete *name* from "sys.modules". test.support.unlink(filename) Call "os.unlink()" on *filename*. On Windows platforms, this is wrapped with a wait loop that checks for the existence of the file. test.support.rmdir(filename) Call "os.rmdir()" on *filename*. On Windows platforms, this is wrapped with a wait loop that checks for the existence of the file. test.support.rmtree(path) Call "shutil.rmtree()" on *path* or call "os.lstat()" and "os.rmdir()" to remove a path and its contents. On Windows platforms, this is wrapped with a wait loop that checks for the existence of the files. test.support.make_legacy_pyc(source) Move a **PEP 3147**/**PEP 488** pyc file to its legacy pyc location and return the file system path to the legacy pyc file. The *source* value is the file system path to the source file. It does not need to exist, however the PEP 3147/488 pyc file must exist. test.support.is_resource_enabled(resource) Return "True" if *resource* is enabled and available. The list of available resources is only set when "test.regrtest" is executing the tests. test.support.python_is_optimized() Return "True" if Python was not built with "-O0" or "-Og". test.support.with_pymalloc() Return "_testcapi.WITH_PYMALLOC". test.support.requires(resource, msg=None) Raise "ResourceDenied" if *resource* is not available. *msg* is the argument to "ResourceDenied" if it is raised. Always returns "True" if called by a function whose "__name__" is "'__main__'". Used when tests are executed by "test.regrtest". test.support.system_must_validate_cert(f) Raise "unittest.SkipTest" on TLS certification validation failures. test.support.sortdict(dict) Return a repr of *dict* with keys sorted. test.support.findfile(filename, subdir=None) Return the path to the file named *filename*. If no match is found *filename* is returned. This does not equal a failure since it could be the path to the file. Setting *subdir* indicates a relative path to use to find the file rather than looking directly in the path directories. test.support.create_empty_file(filename) Create an empty file with *filename*. If it already exists, truncate it. test.support.fd_count() Count the number of open file descriptors. test.support.match_test(test) Match *test* to patterns set in "set_match_tests()". test.support.set_match_tests(patterns) Define match test with regular expression *patterns*. test.support.run_unittest(*classes) Execute "unittest.TestCase" subclasses passed to the function. The function scans the classes for methods starting with the prefix "test_" and executes the tests individually. It is also legal to pass strings as parameters; these should be keys in "sys.modules". Each associated module will be scanned by "unittest.TestLoader.loadTestsFromModule()". This is usually seen in the following "test_main()" function: def test_main(): support.run_unittest(__name__) This will run all tests defined in the named module. test.support.run_doctest(module, verbosity=None, optionflags=0) Run "doctest.testmod()" on the given *module*. Return "(failure_count, test_count)". If *verbosity* is "None", "doctest.testmod()" is run with verbosity set to "verbose". Otherwise, it is run with verbosity set to "None". *optionflags* is passed as "optionflags" to "doctest.testmod()". test.support.setswitchinterval(interval) Set the "sys.setswitchinterval()" to the given *interval*. Defines a minimum interval for Android systems to prevent the system from hanging. test.support.check_impl_detail(**guards) Use this check to guard CPython’s implementation-specific tests or to run them only on the implementations guarded by the arguments: check_impl_detail() # Only on CPython (default). check_impl_detail(jython=True) # Only on Jython. check_impl_detail(cpython=False) # Everywhere except CPython. test.support.check_warnings(*filters, quiet=True) A convenience wrapper for "warnings.catch_warnings()" that makes it easier to test that a warning was correctly raised. It is approximately equivalent to calling "warnings.catch_warnings(record=True)" with "warnings.simplefilter()" set to "always" and with the option to automatically validate the results that are recorded. "check_warnings" accepts 2-tuples of the form "("message regexp", WarningCategory)" as positional arguments. If one or more *filters* are provided, or if the optional keyword argument *quiet* is "False", it checks to make sure the warnings are as expected: each specified filter must match at least one of the warnings raised by the enclosed code or the test fails, and if any warnings are raised that do not match any of the specified filters the test fails. To disable the first of these checks, set *quiet* to "True". If no arguments are specified, it defaults to: check_warnings(("", Warning), quiet=True) In this case all warnings are caught and no errors are raised. On entry to the context manager, a "WarningRecorder" instance is returned. The underlying warnings list from "catch_warnings()" is available via the recorder object’s "warnings" attribute. As a convenience, the attributes of the object representing the most recent warning can also be accessed directly through the recorder object (see example below). If no warning has been raised, then any of the attributes that would otherwise be expected on an object representing a warning will return "None". The recorder object also has a "reset()" method, which clears the warnings list. The context manager is designed to be used like this: with check_warnings(("assertion is always true", SyntaxWarning), ("", UserWarning)): exec('assert(False, "Hey!")') warnings.warn(UserWarning("Hide me!")) In this case if either warning was not raised, or some other warning was raised, "check_warnings()" would raise an error. When a test needs to look more deeply into the warnings, rather than just checking whether or not they occurred, code like this can be used: with check_warnings(quiet=True) as w: warnings.warn("foo") assert str(w.args[0]) == "foo" warnings.warn("bar") assert str(w.args[0]) == "bar" assert str(w.warnings[0].args[0]) == "foo" assert str(w.warnings[1].args[0]) == "bar" w.reset() assert len(w.warnings) == 0 Here all warnings will be caught, and the test code tests the captured warnings directly. Changed in version 3.2: New optional arguments *filters* and *quiet*. test.support.check_no_resource_warning(testcase) Context manager to check that no "ResourceWarning" was raised. You must remove the object which may emit "ResourceWarning" before the end of the context manager. test.support.set_memlimit(limit) Set the values for "max_memuse" and "real_max_memuse" for big memory tests. test.support.record_original_stdout(stdout) Store the value from *stdout*. It is meant to hold the stdout at the time the regrtest began. test.support.get_original_stdout() Return the original stdout set by "record_original_stdout()" or "sys.stdout" if it’s not set. test.support.args_from_interpreter_flags() Return a list of command line arguments reproducing the current settings in "sys.flags" and "sys.warnoptions". test.support.optim_args_from_interpreter_flags() Return a list of command line arguments reproducing the current optimization settings in "sys.flags". test.support.captured_stdin() test.support.captured_stdout() test.support.captured_stderr() A context managers that temporarily replaces the named stream with "io.StringIO" object. Example use with output streams: with captured_stdout() as stdout, captured_stderr() as stderr: print("hello") print("error", file=sys.stderr) assert stdout.getvalue() == "hello\n" assert stderr.getvalue() == "error\n" Example use with input stream: with captured_stdin() as stdin: stdin.write('hello\n') stdin.seek(0) # call test code that consumes from sys.stdin captured = input() self.assertEqual(captured, "hello") test.support.temp_dir(path=None, quiet=False) A context manager that creates a temporary directory at *path* and yields the directory. If *path* is "None", the temporary directory is created using "tempfile.mkdtemp()". If *quiet* is "False", the context manager raises an exception on error. Otherwise, if *path* is specified and cannot be created, only a warning is issued. test.support.change_cwd(path, quiet=False) A context manager that temporarily changes the current working directory to *path* and yields the directory. If *quiet* is "False", the context manager raises an exception on error. Otherwise, it issues only a warning and keeps the current working directory the same. test.support.temp_cwd(name='tempcwd', quiet=False) A context manager that temporarily creates a new directory and changes the current working directory (CWD). The context manager creates a temporary directory in the current directory with name *name* before temporarily changing the current working directory. If *name* is "None", the temporary directory is created using "tempfile.mkdtemp()". If *quiet* is "False" and it is not possible to create or change the CWD, an error is raised. Otherwise, only a warning is raised and the original CWD is used. test.support.temp_umask(umask) A context manager that temporarily sets the process umask. test.support.disable_faulthandler() A context manager that replaces "sys.stderr" with "sys.__stderr__". test.support.gc_collect() Force as many objects as possible to be collected. This is needed because timely deallocation is not guaranteed by the garbage collector. This means that "__del__" methods may be called later than expected and weakrefs may remain alive for longer than expected. test.support.disable_gc() A context manager that disables the garbage collector upon entry and reenables it upon exit. test.support.swap_attr(obj, attr, new_val) Context manager to swap out an attribute with a new object. Usage: with swap_attr(obj, "attr", 5): ... This will set "obj.attr" to 5 for the duration of the "with" block, restoring the old value at the end of the block. If "attr" doesn’t exist on "obj", it will be created and then deleted at the end of the block. The old value (or "None" if it doesn’t exist) will be assigned to the target of the “as” clause, if there is one. test.support.swap_item(obj, attr, new_val) Context manager to swap out an item with a new object. Usage: with swap_item(obj, "item", 5): ... This will set "obj["item"]" to 5 for the duration of the "with" block, restoring the old value at the end of the block. If "item" doesn’t exist on "obj", it will be created and then deleted at the end of the block. The old value (or "None" if it doesn’t exist) will be assigned to the target of the “as” clause, if there is one. test.support.print_warning(msg) Print a warning into "sys.__stderr__". Format the message as: "f"Warning -- {msg}"". If *msg* is made of multiple lines, add ""Warning -- "" prefix to each line. New in version 3.9. test.support.wait_process(pid, *, exitcode, timeout=None) Wait until process *pid* completes and check that the process exit code is *exitcode*. Raise an "AssertionError" if the process exit code is not equal to *exitcode*. If the process runs longer than *timeout* seconds ("SHORT_TIMEOUT" by default), kill the process and raise an "AssertionError". The timeout feature is not available on Windows. New in version 3.9. test.support.wait_threads_exit(timeout=60.0) Context manager to wait until all threads created in the "with" statement exit. test.support.start_threads(threads, unlock=None) Context manager to start *threads*. It attempts to join the threads upon exit. test.support.calcobjsize(fmt) Return "struct.calcsize()" for "nP{fmt}0n" or, if "gettotalrefcount" exists, "2PnP{fmt}0P". test.support.calcvobjsize(fmt) Return "struct.calcsize()" for "nPn{fmt}0n" or, if "gettotalrefcount" exists, "2PnPn{fmt}0P". test.support.checksizeof(test, o, size) For testcase *test*, assert that the "sys.getsizeof" for *o* plus the GC header size equals *size*. test.support.can_symlink() Return "True" if the OS supports symbolic links, "False" otherwise. test.support.can_xattr() Return "True" if the OS supports xattr, "False" otherwise. @test.support.skip_unless_symlink A decorator for running tests that require support for symbolic links. @test.support.skip_unless_xattr A decorator for running tests that require support for xattr. @test.support.anticipate_failure(condition) A decorator to conditionally mark tests with "unittest.expectedFailure()". Any use of this decorator should have an associated comment identifying the relevant tracker issue. @test.support.run_with_locale(catstr, *locales) A decorator for running a function in a different locale, correctly resetting it after it has finished. *catstr* is the locale category as a string (for example ""LC_ALL""). The *locales* passed will be tried sequentially, and the first valid locale will be used. @test.support.run_with_tz(tz) A decorator for running a function in a specific timezone, correctly resetting it after it has finished. @test.support.requires_freebsd_version(*min_version) Decorator for the minimum version when running test on FreeBSD. If the FreeBSD version is less than the minimum, raise "unittest.SkipTest". @test.support.requires_linux_version(*min_version) Decorator for the minimum version when running test on Linux. If the Linux version is less than the minimum, raise "unittest.SkipTest". @test.support.requires_mac_version(*min_version) Decorator for the minimum version when running test on macOS. If the macOS version is less than the minimum, raise "unittest.SkipTest". @test.support.requires_IEEE_754 Decorator for skipping tests on non-IEEE 754 platforms. @test.support.requires_zlib Decorator for skipping tests if "zlib" doesn’t exist. @test.support.requires_gzip Decorator for skipping tests if "gzip" doesn’t exist. @test.support.requires_bz2 Decorator for skipping tests if "bz2" doesn’t exist. @test.support.requires_lzma Decorator for skipping tests if "lzma" doesn’t exist. @test.support.requires_resource(resource) Decorator for skipping tests if *resource* is not available. @test.support.requires_docstrings Decorator for only running the test if "HAVE_DOCSTRINGS". @test.support.cpython_only(test) Decorator for tests only applicable to CPython. @test.support.impl_detail(msg=None, **guards) Decorator for invoking "check_impl_detail()" on *guards*. If that returns "False", then uses *msg* as the reason for skipping the test. @test.support.no_tracing(func) Decorator to temporarily turn off tracing for the duration of the test. @test.support.refcount_test(test) Decorator for tests which involve reference counting. The decorator does not run the test if it is not run by CPython. Any trace function is unset for the duration of the test to prevent unexpected refcounts caused by the trace function. @test.support.reap_threads(func) Decorator to ensure the threads are cleaned up even if the test fails. @test.support.bigmemtest(size, memuse, dry_run=True) Decorator for bigmem tests. *size* is a requested size for the test (in arbitrary, test- interpreted units.) *memuse* is the number of bytes per unit for the test, or a good estimate of it. For example, a test that needs two byte buffers, of 4 GiB each, could be decorated with "@bigmemtest(size=_4G, memuse=2)". The *size* argument is normally passed to the decorated test method as an extra argument. If *dry_run* is "True", the value passed to the test method may be less than the requested value. If *dry_run* is "False", it means the test doesn’t support dummy runs when "-M" is not specified. @test.support.bigaddrspacetest(f) Decorator for tests that fill the address space. *f* is the function to wrap. test.support.make_bad_fd() Create an invalid file descriptor by opening and closing a temporary file, and returning its descriptor. test.support.check_syntax_error(testcase, statement, errtext='', *, lineno=None, offset=None) Test for syntax errors in *statement* by attempting to compile *statement*. *testcase* is the "unittest" instance for the test. *errtext* is the regular expression which should match the string representation of the raised "SyntaxError". If *lineno* is not "None", compares to the line of the exception. If *offset* is not "None", compares to the offset of the exception. test.support.check_syntax_warning(testcase, statement, errtext='', *, lineno=1, offset=None) Test for syntax warning in *statement* by attempting to compile *statement*. Test also that the "SyntaxWarning" is emitted only once, and that it will be converted to a "SyntaxError" when turned into error. *testcase* is the "unittest" instance for the test. *errtext* is the regular expression which should match the string representation of the emitted "SyntaxWarning" and raised "SyntaxError". If *lineno* is not "None", compares to the line of the warning and exception. If *offset* is not "None", compares to the offset of the exception. New in version 3.8. test.support.open_urlresource(url, *args, **kw) Open *url*. If open fails, raises "TestFailed". test.support.import_module(name, deprecated=False, *, required_on()) This function imports and returns the named module. Unlike a normal import, this function raises "unittest.SkipTest" if the module cannot be imported. Module and package deprecation messages are suppressed during this import if *deprecated* is "True". If a module is required on a platform but optional for others, set *required_on* to an iterable of platform prefixes which will be compared against "sys.platform". New in version 3.1. test.support.import_fresh_module(name, fresh=(), blocked=(), deprecated=False) This function imports and returns a fresh copy of the named Python module by removing the named module from "sys.modules" before doing the import. Note that unlike "reload()", the original module is not affected by this operation. *fresh* is an iterable of additional module names that are also removed from the "sys.modules" cache before doing the import. *blocked* is an iterable of module names that are replaced with "None" in the module cache during the import to ensure that attempts to import them raise "ImportError". The named module and any modules named in the *fresh* and *blocked* parameters are saved before starting the import and then reinserted into "sys.modules" when the fresh import is complete. Module and package deprecation messages are suppressed during this import if *deprecated* is "True". This function will raise "ImportError" if the named module cannot be imported. Example use: # Get copies of the warnings module for testing without affecting the # version being used by the rest of the test suite. One copy uses the # C implementation, the other is forced to use the pure Python fallback # implementation py_warnings = import_fresh_module('warnings', blocked=['_warnings']) c_warnings = import_fresh_module('warnings', fresh=['_warnings']) New in version 3.1. test.support.modules_setup() Return a copy of "sys.modules". test.support.modules_cleanup(oldmodules) Remove modules except for *oldmodules* and "encodings" in order to preserve internal cache. test.support.threading_setup() Return current thread count and copy of dangling threads. test.support.threading_cleanup(*original_values) Cleanup up threads not specified in *original_values*. Designed to emit a warning if a test leaves running threads in the background. test.support.join_thread(thread, timeout=30.0) Join a *thread* within *timeout*. Raise an "AssertionError" if thread is still alive after *timeout* seconds. test.support.reap_children() Use this at the end of "test_main" whenever sub-processes are started. This will help ensure that no extra children (zombies) stick around to hog resources and create problems when looking for refleaks. test.support.get_attribute(obj, name) Get an attribute, raising "unittest.SkipTest" if "AttributeError" is raised. test.support.catch_threading_exception() Context manager catching "threading.Thread" exception using "threading.excepthook()". Attributes set when an exception is caught: * "exc_type" * "exc_value" * "exc_traceback" * "thread" See "threading.excepthook()" documentation. These attributes are deleted at the context manager exit. Usage: with support.catch_threading_exception() as cm: # code spawning a thread which raises an exception ... # check the thread exception, use cm attributes: # exc_type, exc_value, exc_traceback, thread ... # exc_type, exc_value, exc_traceback, thread attributes of cm no longer # exists at this point # (to avoid reference cycles) New in version 3.8. test.support.catch_unraisable_exception() Context manager catching unraisable exception using "sys.unraisablehook()". Storing the exception value ("cm.unraisable.exc_value") creates a reference cycle. The reference cycle is broken explicitly when the context manager exits. Storing the object ("cm.unraisable.object") can resurrect it if it is set to an object which is being finalized. Exiting the context manager clears the stored object. Usage: with support.catch_unraisable_exception() as cm: # code creating an "unraisable exception" ... # check the unraisable exception: use cm.unraisable ... # cm.unraisable attribute no longer exists at this point # (to break a reference cycle) New in version 3.8. test.support.load_package_tests(pkg_dir, loader, standard_tests, pattern) Generic implementation of the "unittest" "load_tests" protocol for use in test packages. *pkg_dir* is the root directory of the package; *loader*, *standard_tests*, and *pattern* are the arguments expected by "load_tests". In simple cases, the test package’s "__init__.py" can be the following: import os from test.support import load_package_tests def load_tests(*args): return load_package_tests(os.path.dirname(__file__), *args) test.support.fs_is_case_insensitive(directory) Return "True" if the file system for *directory* is case- insensitive. test.support.detect_api_mismatch(ref_api, other_api, *, ignore=()) Returns the set of attributes, functions or methods of *ref_api* not found on *other_api*, except for a defined list of items to be ignored in this check specified in *ignore*. By default this skips private attributes beginning with ‘_’ but includes all magic methods, i.e. those starting and ending in ‘__’. New in version 3.5. test.support.patch(test_instance, object_to_patch, attr_name, new_value) Override *object_to_patch.attr_name* with *new_value*. Also add cleanup procedure to *test_instance* to restore *object_to_patch* for *attr_name*. The *attr_name* should be a valid attribute for *object_to_patch*. test.support.run_in_subinterp(code) Run *code* in subinterpreter. Raise "unittest.SkipTest" if "tracemalloc" is enabled. test.support.check_free_after_iterating(test, iter, cls, args=()) Assert that *iter* is deallocated after iterating. test.support.missing_compiler_executable(cmd_names=[]) Check for the existence of the compiler executables whose names are listed in *cmd_names* or all the compiler executables when *cmd_names* is empty and return the first missing executable or "None" when none is found missing. test.support.check__all__(test_case, module, name_of_module=None, extra=(), blacklist=()) Assert that the "__all__" variable of *module* contains all public names. The module’s public names (its API) are detected automatically based on whether they match the public name convention and were defined in *module*. The *name_of_module* argument can specify (as a string or tuple thereof) what module(s) an API could be defined in order to be detected as a public API. One case for this is when *module* imports part of its public API from other modules, possibly a C backend (like "csv" and its "_csv"). The *extra* argument can be a set of names that wouldn’t otherwise be automatically detected as “public”, like objects without a proper "__module__" attribute. If provided, it will be added to the automatically detected ones. The *blacklist* argument can be a set of names that must not be treated as part of the public API even though their names indicate otherwise. Example use: import bar import foo import unittest from test import support class MiscTestCase(unittest.TestCase): def test__all__(self): support.check__all__(self, foo) class OtherTestCase(unittest.TestCase): def test__all__(self): extra = {'BAR_CONST', 'FOO_CONST'} blacklist = {'baz'} # Undocumented name. # bar imports part of its API from _bar. support.check__all__(self, bar, ('bar', '_bar'), extra=extra, blacklist=blacklist) New in version 3.6. test.support.adjust_int_max_str_digits(max_digits) This function returns a context manager that will change the global "sys.set_int_max_str_digits()" setting for the duration of the context to allow execution of test code that needs a different limit on the number of digits when converting between an integer and string. New in version 3.9.14. The "test.support" module defines the following classes: class test.support.TransientResource(exc, **kwargs) Instances are a context manager that raises "ResourceDenied" if the specified exception type is raised. Any keyword arguments are treated as attribute/value pairs to be compared against any exception raised within the "with" statement. Only if all pairs match properly against attributes on the exception is "ResourceDenied" raised. class test.support.EnvironmentVarGuard Class used to temporarily set or unset environment variables. Instances can be used as a context manager and have a complete dictionary interface for querying/modifying the underlying "os.environ". After exit from the context manager all changes to environment variables done through this instance will be rolled back. Changed in version 3.1: Added dictionary interface. EnvironmentVarGuard.set(envvar, value) Temporarily set the environment variable "envvar" to the value of "value". EnvironmentVarGuard.unset(envvar) Temporarily unset the environment variable "envvar". class test.support.SuppressCrashReport A context manager used to try to prevent crash dialog popups on tests that are expected to crash a subprocess. On Windows, it disables Windows Error Reporting dialogs using SetErrorMode. On UNIX, "resource.setrlimit()" is used to set "resource.RLIMIT_CORE"’s soft limit to 0 to prevent coredump file creation. On both platforms, the old value is restored by "__exit__()". class test.support.CleanImport(*module_names) A context manager to force import to return a new module reference. This is useful for testing module-level behaviors, such as the emission of a DeprecationWarning on import. Example usage: with CleanImport('foo'): importlib.import_module('foo') # New reference. class test.support.DirsOnSysPath(*paths) A context manager to temporarily add directories to sys.path. This makes a copy of "sys.path", appends any directories given as positional arguments, then reverts "sys.path" to the copied settings when the context ends. Note that *all* "sys.path" modifications in the body of the context manager, including replacement of the object, will be reverted at the end of the block. class test.support.SaveSignals Class to save and restore signal handlers registered by the Python signal handler. class test.support.Matcher matches(self, d, **kwargs) Try to match a single dict with the supplied arguments. match_value(self, k, dv, v) Try to match a single stored value (*dv*) with a supplied value (*v*). class test.support.WarningsRecorder Class used to record warnings for unit tests. See documentation of "check_warnings()" above for more details. class test.support.BasicTestRunner run(test) Run *test* and return the result. class test.support.FakePath(path) Simple *path-like object*. It implements the "__fspath__()" method which just returns the *path* argument. If *path* is an exception, it will be raised in "__fspath__()". "test.support.socket_helper" — Utilities for socket tests ********************************************************* The "test.support.socket_helper" module provides support for socket tests. New in version 3.9. test.support.socket_helper.IPV6_ENABLED Set to "True" if IPv6 is enabled on this host, "False" otherwise. test.support.socket_helper.find_unused_port(family=socket.AF_INET, socktype=socket.SOCK_STREAM) Returns an unused port that should be suitable for binding. This is achieved by creating a temporary socket with the same family and type as the "sock" parameter (default is "AF_INET", "SOCK_STREAM"), and binding it to the specified host address (defaults to "0.0.0.0") with the port set to 0, eliciting an unused ephemeral port from the OS. The temporary socket is then closed and deleted, and the ephemeral port is returned. Either this method or "bind_port()" should be used for any tests where a server socket needs to be bound to a particular port for the duration of the test. Which one to use depends on whether the calling code is creating a Python socket, or if an unused port needs to be provided in a constructor or passed to an external program (i.e. the "-accept" argument to openssl’s s_server mode). Always prefer "bind_port()" over "find_unused_port()" where possible. Using a hard coded port is discouraged since it can make multiple instances of the test impossible to run simultaneously, which is a problem for buildbots. test.support.socket_helper.bind_port(sock, host=HOST) Bind the socket to a free port and return the port number. Relies on ephemeral ports in order to ensure we are using an unbound port. This is important as many tests may be running simultaneously, especially in a buildbot environment. This method raises an exception if the "sock.family" is "AF_INET" and "sock.type" is "SOCK_STREAM", and the socket has "SO_REUSEADDR" or "SO_REUSEPORT" set on it. Tests should never set these socket options for TCP/IP sockets. The only case for setting these options is testing multicasting via multiple UDP sockets. Additionally, if the "SO_EXCLUSIVEADDRUSE" socket option is available (i.e. on Windows), it will be set on the socket. This will prevent anyone else from binding to our host/port for the duration of the test. test.support.socket_helper.bind_unix_socket(sock, addr) Bind a unix socket, raising "unittest.SkipTest" if "PermissionError" is raised. @test.support.socket_helper.skip_unless_bind_unix_socket A decorator for running tests that require a functional "bind()" for Unix sockets. test.support.socket_helper.transient_internet(resource_name, *, timeout=30.0, errnos=()) A context manager that raises "ResourceDenied" when various issues with the internet connection manifest themselves as exceptions. "test.support.script_helper" — Utilities for the Python execution tests *********************************************************************** The "test.support.script_helper" module provides support for Python’s script execution tests. test.support.script_helper.interpreter_requires_environment() Return "True" if "sys.executable interpreter" requires environment variables in order to be able to run at all. This is designed to be used with "@unittest.skipIf()" to annotate tests that need to use an "assert_python*()" function to launch an isolated mode ("-I") or no environment mode ("-E") sub-interpreter process. A normal build & test does not run into this situation but it can happen when trying to run the standard library test suite from an interpreter that doesn’t have an obvious home with Python’s current home finding logic. Setting "PYTHONHOME" is one way to get most of the testsuite to run in that situation. "PYTHONPATH" or "PYTHONUSERSITE" are other common environment variables that might impact whether or not the interpreter can start. test.support.script_helper.run_python_until_end(*args, **env_vars) Set up the environment based on *env_vars* for running the interpreter in a subprocess. The values can include "__isolated", "__cleanenv", "__cwd", and "TERM". Changed in version 3.9: The function no longer strips whitespaces from *stderr*. test.support.script_helper.assert_python_ok(*args, **env_vars) Assert that running the interpreter with *args* and optional environment variables *env_vars* succeeds ("rc == 0") and return a "(return code, stdout, stderr)" tuple. If the "__cleanenv" keyword is set, *env_vars* is used as a fresh environment. Python is started in isolated mode (command line option "-I"), except if the "__isolated" keyword is set to "False". Changed in version 3.9: The function no longer strips whitespaces from *stderr*. test.support.script_helper.assert_python_failure(*args, **env_vars) Assert that running the interpreter with *args* and optional environment variables *env_vars* fails ("rc != 0") and return a "(return code, stdout, stderr)" tuple. See "assert_python_ok()" for more options. Changed in version 3.9: The function no longer strips whitespaces from *stderr*. test.support.script_helper.spawn_python(*args, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, **kw) Run a Python subprocess with the given arguments. *kw* is extra keyword args to pass to "subprocess.Popen()". Returns a "subprocess.Popen" object. test.support.script_helper.kill_python(p) Run the given "subprocess.Popen" process until completion and return stdout. test.support.script_helper.make_script(script_dir, script_basename, source, omit_suffix=False) Create script containing *source* in path *script_dir* and *script_basename*. If *omit_suffix* is "False", append ".py" to the name. Return the full script path. test.support.script_helper.make_zip_script(zip_dir, zip_basename, script_name, name_in_zip=None) Create zip file at *zip_dir* and *zip_basename* with extension "zip" which contains the files in *script_name*. *name_in_zip* is the archive name. Return a tuple containing "(full path, full path of archive name)". test.support.script_helper.make_pkg(pkg_dir, init_source='') Create a directory named *pkg_dir* containing an "__init__" file with *init_source* as its contents. test.support.script_helper.make_zip_pkg(zip_dir, zip_basename, pkg_name, script_basename, source, depth=1, compiled=False) Create a zip package directory with a path of *zip_dir* and *zip_basename* containing an empty "__init__" file and a file *script_basename* containing the *source*. If *compiled* is "True", both source files will be compiled and added to the zip package. Return a tuple of the full zip path and the archive name for the zip file. "test.support.bytecode_helper" — Support tools for testing correct bytecode generation ************************************************************************************** The "test.support.bytecode_helper" module provides support for testing and inspecting bytecode generation. New in version 3.9. The module defines the following class: class test.support.bytecode_helper.BytecodeTestCase(unittest.TestCase) This class has custom assertion methods for inspecting bytecode. BytecodeTestCase.get_disassembly_as_string(co) Return the disassembly of *co* as string. BytecodeTestCase.assertInBytecode(x, opname, argval=_UNSPECIFIED) Return instr if *opname* is found, otherwise throws "AssertionError". BytecodeTestCase.assertNotInBytecode(x, opname, argval=_UNSPECIFIED) Throws "AssertionError" if *opname* is found. Text Processing Services ************************ The modules described in this chapter provide a wide range of string manipulation operations and other text processing services. The "codecs" module described under Binary Data Services is also highly relevant to text processing. In addition, see the documentation for Python’s built-in string type in Text Sequence Type — str. * "string" — Common string operations * String constants * Custom String Formatting * Format String Syntax * Format Specification Mini-Language * Format examples * Template strings * Helper functions * "re" — Regular expression operations * Regular Expression Syntax * Module Contents * Regular Expression Objects * Match Objects * Regular Expression Examples * Checking for a Pair * Simulating scanf() * search() vs. match() * Making a Phonebook * Text Munging * Finding all Adverbs * Finding all Adverbs and their Positions * Raw String Notation * Writing a Tokenizer * "difflib" — Helpers for computing deltas * SequenceMatcher Objects * SequenceMatcher Examples * Differ Objects * Differ Example * A command-line interface to difflib * "textwrap" — Text wrapping and filling * "unicodedata" — Unicode Database * "stringprep" — Internet String Preparation * "readline" — GNU readline interface * Init file * Line buffer * History file * History list * Startup hooks * Completion * Example * "rlcompleter" — Completion function for GNU readline * Completer Objects "textwrap" — Text wrapping and filling ************************************** **Source code:** Lib/textwrap.py ====================================================================== The "textwrap" module provides some convenience functions, as well as "TextWrapper", the class that does all the work. If you’re just wrapping or filling one or two text strings, the convenience functions should be good enough; otherwise, you should use an instance of "TextWrapper" for efficiency. textwrap.wrap(text, width=70, *, initial_indent="", subsequent_indent="", expand_tabs=True, replace_whitespace=True, fix_sentence_endings=False, break_long_words=True, drop_whitespace=True, break_on_hyphens=True, tabsize=8, max_lines=None, placeholder=' [...]') Wraps the single paragraph in *text* (a string) so every line is at most *width* characters long. Returns a list of output lines, without final newlines. Optional keyword arguments correspond to the instance attributes of "TextWrapper", documented below. See the "TextWrapper.wrap()" method for additional details on how "wrap()" behaves. textwrap.fill(text, width=70, *, initial_indent="", subsequent_indent="", expand_tabs=True, replace_whitespace=True, fix_sentence_endings=False, break_long_words=True, drop_whitespace=True, break_on_hyphens=True, tabsize=8, max_lines=None, placeholder=' [...]') Wraps the single paragraph in *text*, and returns a single string containing the wrapped paragraph. "fill()" is shorthand for "\n".join(wrap(text, ...)) In particular, "fill()" accepts exactly the same keyword arguments as "wrap()". textwrap.shorten(text, width, *, fix_sentence_endings=False, break_long_words=True, break_on_hyphens=True, placeholder=' [...]') Collapse and truncate the given *text* to fit in the given *width*. First the whitespace in *text* is collapsed (all whitespace is replaced by single spaces). If the result fits in the *width*, it is returned. Otherwise, enough words are dropped from the end so that the remaining words plus the "placeholder" fit within "width": >>> textwrap.shorten("Hello world!", width=12) 'Hello world!' >>> textwrap.shorten("Hello world!", width=11) 'Hello [...]' >>> textwrap.shorten("Hello world", width=10, placeholder="...") 'Hello...' Optional keyword arguments correspond to the instance attributes of "TextWrapper", documented below. Note that the whitespace is collapsed before the text is passed to the "TextWrapper" "fill()" function, so changing the value of "tabsize", "expand_tabs", "drop_whitespace", and "replace_whitespace" will have no effect. New in version 3.4. textwrap.dedent(text) Remove any common leading whitespace from every line in *text*. This can be used to make triple-quoted strings line up with the left edge of the display, while still presenting them in the source code in indented form. Note that tabs and spaces are both treated as whitespace, but they are not equal: the lines "" hello"" and ""\thello"" are considered to have no common leading whitespace. Lines containing only whitespace are ignored in the input and normalized to a single newline character in the output. For example: def test(): # end first line with \ to avoid the empty line! s = '''\ hello world ''' print(repr(s)) # prints ' hello\n world\n ' print(repr(dedent(s))) # prints 'hello\n world\n' textwrap.indent(text, prefix, predicate=None) Add *prefix* to the beginning of selected lines in *text*. Lines are separated by calling "text.splitlines(True)". By default, *prefix* is added to all lines that do not consist solely of whitespace (including any line endings). For example: >>> s = 'hello\n\n \nworld' >>> indent(s, ' ') ' hello\n\n \n world' The optional *predicate* argument can be used to control which lines are indented. For example, it is easy to add *prefix* to even empty and whitespace-only lines: >>> print(indent(s, '+ ', lambda line: True)) + hello + + + world New in version 3.3. "wrap()", "fill()" and "shorten()" work by creating a "TextWrapper" instance and calling a single method on it. That instance is not reused, so for applications that process many text strings using "wrap()" and/or "fill()", it may be more efficient to create your own "TextWrapper" object. Text is preferably wrapped on whitespaces and right after the hyphens in hyphenated words; only then will long words be broken if necessary, unless "TextWrapper.break_long_words" is set to false. class textwrap.TextWrapper(**kwargs) The "TextWrapper" constructor accepts a number of optional keyword arguments. Each keyword argument corresponds to an instance attribute, so for example wrapper = TextWrapper(initial_indent="* ") is the same as wrapper = TextWrapper() wrapper.initial_indent = "* " You can re-use the same "TextWrapper" object many times, and you can change any of its options through direct assignment to instance attributes between uses. The "TextWrapper" instance attributes (and keyword arguments to the constructor) are as follows: width (default: "70") The maximum length of wrapped lines. As long as there are no individual words in the input text longer than "width", "TextWrapper" guarantees that no output line will be longer than "width" characters. expand_tabs (default: "True") If true, then all tab characters in *text* will be expanded to spaces using the "expandtabs()" method of *text*. tabsize (default: "8") If "expand_tabs" is true, then all tab characters in *text* will be expanded to zero or more spaces, depending on the current column and the given tab size. New in version 3.3. replace_whitespace (default: "True") If true, after tab expansion but before wrapping, the "wrap()" method will replace each whitespace character with a single space. The whitespace characters replaced are as follows: tab, newline, vertical tab, formfeed, and carriage return ("'\t\n\v\f\r'"). Note: If "expand_tabs" is false and "replace_whitespace" is true, each tab character will be replaced by a single space, which is *not* the same as tab expansion. Note: If "replace_whitespace" is false, newlines may appear in the middle of a line and cause strange output. For this reason, text should be split into paragraphs (using "str.splitlines()" or similar) which are wrapped separately. drop_whitespace (default: "True") If true, whitespace at the beginning and ending of every line (after wrapping but before indenting) is dropped. Whitespace at the beginning of the paragraph, however, is not dropped if non-whitespace follows it. If whitespace being dropped takes up an entire line, the whole line is dropped. initial_indent (default: "''") String that will be prepended to the first line of wrapped output. Counts towards the length of the first line. The empty string is not indented. subsequent_indent (default: "''") String that will be prepended to all lines of wrapped output except the first. Counts towards the length of each line except the first. fix_sentence_endings (default: "False") If true, "TextWrapper" attempts to detect sentence endings and ensure that sentences are always separated by exactly two spaces. This is generally desired for text in a monospaced font. However, the sentence detection algorithm is imperfect: it assumes that a sentence ending consists of a lowercase letter followed by one of "'.'", "'!'", or "'?'", possibly followed by one of "'"'" or ""'"", followed by a space. One problem with this is algorithm is that it is unable to detect the difference between “Dr.” in [...] Dr. Frankenstein's monster [...] and “Spot.” in [...] See Spot. See Spot run [...] "fix_sentence_endings" is false by default. Since the sentence detection algorithm relies on "string.lowercase" for the definition of “lowercase letter”, and a convention of using two spaces after a period to separate sentences on the same line, it is specific to English-language texts. break_long_words (default: "True") If true, then words longer than "width" will be broken in order to ensure that no lines are longer than "width". If it is false, long words will not be broken, and some lines may be longer than "width". (Long words will be put on a line by themselves, in order to minimize the amount by which "width" is exceeded.) break_on_hyphens (default: "True") If true, wrapping will occur preferably on whitespaces and right after hyphens in compound words, as it is customary in English. If false, only whitespaces will be considered as potentially good places for line breaks, but you need to set "break_long_words" to false if you want truly insecable words. Default behaviour in previous versions was to always allow breaking hyphenated words. max_lines (default: "None") If not "None", then the output will contain at most *max_lines* lines, with *placeholder* appearing at the end of the output. New in version 3.4. placeholder (default: "' [...]'") String that will appear at the end of the output text if it has been truncated. New in version 3.4. "TextWrapper" also provides some public methods, analogous to the module-level convenience functions: wrap(text) Wraps the single paragraph in *text* (a string) so every line is at most "width" characters long. All wrapping options are taken from instance attributes of the "TextWrapper" instance. Returns a list of output lines, without final newlines. If the wrapped output has no content, the returned list is empty. fill(text) Wraps the single paragraph in *text*, and returns a single string containing the wrapped paragraph. "threading" — Thread-based parallelism ************************************** **Source code:** Lib/threading.py ====================================================================== This module constructs higher-level threading interfaces on top of the lower level "_thread" module. See also the "queue" module. Changed in version 3.7: This module used to be optional, it is now always available. Note: While they are not listed below, the "camelCase" names used for some methods and functions in this module in the Python 2.x series are still supported by this module. **CPython implementation detail:** In CPython, due to the *Global Interpreter Lock*, only one thread can execute Python code at once (even though certain performance-oriented libraries might overcome this limitation). If you want your application to make better use of the computational resources of multi-core machines, you are advised to use "multiprocessing" or "concurrent.futures.ProcessPoolExecutor". However, threading is still an appropriate model if you want to run multiple I/O-bound tasks simultaneously. This module defines the following functions: threading.active_count() Return the number of "Thread" objects currently alive. The returned count is equal to the length of the list returned by "enumerate()". threading.current_thread() Return the current "Thread" object, corresponding to the caller’s thread of control. If the caller’s thread of control was not created through the "threading" module, a dummy thread object with limited functionality is returned. threading.excepthook(args, /) Handle uncaught exception raised by "Thread.run()". The *args* argument has the following attributes: * *exc_type*: Exception type. * *exc_value*: Exception value, can be "None". * *exc_traceback*: Exception traceback, can be "None". * *thread*: Thread which raised the exception, can be "None". If *exc_type* is "SystemExit", the exception is silently ignored. Otherwise, the exception is printed out on "sys.stderr". If this function raises an exception, "sys.excepthook()" is called to handle it. "threading.excepthook()" can be overridden to control how uncaught exceptions raised by "Thread.run()" are handled. Storing *exc_value* using a custom hook can create a reference cycle. It should be cleared explicitly to break the reference cycle when the exception is no longer needed. Storing *thread* using a custom hook can resurrect it if it is set to an object which is being finalized. Avoid storing *thread* after the custom hook completes to avoid resurrecting objects. See also: "sys.excepthook()" handles uncaught exceptions. New in version 3.8. threading.get_ident() Return the ‘thread identifier’ of the current thread. This is a nonzero integer. Its value has no direct meaning; it is intended as a magic cookie to be used e.g. to index a dictionary of thread- specific data. Thread identifiers may be recycled when a thread exits and another thread is created. New in version 3.3. threading.get_native_id() Return the native integral Thread ID of the current thread assigned by the kernel. This is a non-negative integer. Its value may be used to uniquely identify this particular thread system-wide (until the thread terminates, after which the value may be recycled by the OS). Availability: Windows, FreeBSD, Linux, macOS, OpenBSD, NetBSD, AIX. New in version 3.8. threading.enumerate() Return a list of all "Thread" objects currently active. The list includes daemonic threads and dummy thread objects created by "current_thread()". It excludes terminated threads and threads that have not yet been started. However, the main thread is always part of the result, even when terminated. threading.main_thread() Return the main "Thread" object. In normal conditions, the main thread is the thread from which the Python interpreter was started. New in version 3.4. threading.settrace(func) Set a trace function for all threads started from the "threading" module. The *func* will be passed to "sys.settrace()" for each thread, before its "run()" method is called. threading.setprofile(func) Set a profile function for all threads started from the "threading" module. The *func* will be passed to "sys.setprofile()" for each thread, before its "run()" method is called. threading.stack_size([size]) Return the thread stack size used when creating new threads. The optional *size* argument specifies the stack size to be used for subsequently created threads, and must be 0 (use platform or configured default) or a positive integer value of at least 32,768 (32 KiB). If *size* is not specified, 0 is used. If changing the thread stack size is unsupported, a "RuntimeError" is raised. If the specified stack size is invalid, a "ValueError" is raised and the stack size is unmodified. 32 KiB is currently the minimum supported stack size value to guarantee sufficient stack space for the interpreter itself. Note that some platforms may have particular restrictions on values for the stack size, such as requiring a minimum stack size > 32 KiB or requiring allocation in multiples of the system memory page size - platform documentation should be referred to for more information (4 KiB pages are common; using multiples of 4096 for the stack size is the suggested approach in the absence of more specific information). Availability: Windows, systems with POSIX threads. This module also defines the following constant: threading.TIMEOUT_MAX The maximum value allowed for the *timeout* parameter of blocking functions ("Lock.acquire()", "RLock.acquire()", "Condition.wait()", etc.). Specifying a timeout greater than this value will raise an "OverflowError". New in version 3.2. This module defines a number of classes, which are detailed in the sections below. The design of this module is loosely based on Java’s threading model. However, where Java makes locks and condition variables basic behavior of every object, they are separate objects in Python. Python’s "Thread" class supports a subset of the behavior of Java’s Thread class; currently, there are no priorities, no thread groups, and threads cannot be destroyed, stopped, suspended, resumed, or interrupted. The static methods of Java’s Thread class, when implemented, are mapped to module-level functions. All of the methods described below are executed atomically. Thread-Local Data ================= Thread-local data is data whose values are thread specific. To manage thread-local data, just create an instance of "local" (or a subclass) and store attributes on it: mydata = threading.local() mydata.x = 1 The instance’s values will be different for separate threads. class threading.local A class that represents thread-local data. For more details and extensive examples, see the documentation string of the "_threading_local" module. Thread Objects ============== The "Thread" class represents an activity that is run in a separate thread of control. There are two ways to specify the activity: by passing a callable object to the constructor, or by overriding the "run()" method in a subclass. No other methods (except for the constructor) should be overridden in a subclass. In other words, *only* override the "__init__()" and "run()" methods of this class. Once a thread object is created, its activity must be started by calling the thread’s "start()" method. This invokes the "run()" method in a separate thread of control. Once the thread’s activity is started, the thread is considered ‘alive’. It stops being alive when its "run()" method terminates – either normally, or by raising an unhandled exception. The "is_alive()" method tests whether the thread is alive. Other threads can call a thread’s "join()" method. This blocks the calling thread until the thread whose "join()" method is called is terminated. A thread has a name. The name can be passed to the constructor, and read or changed through the "name" attribute. If the "run()" method raises an exception, "threading.excepthook()" is called to handle it. By default, "threading.excepthook()" ignores silently "SystemExit". A thread can be flagged as a “daemon thread”. The significance of this flag is that the entire Python program exits when only daemon threads are left. The initial value is inherited from the creating thread. The flag can be set through the "daemon" property or the *daemon* constructor argument. Note: Daemon threads are abruptly stopped at shutdown. Their resources (such as open files, database transactions, etc.) may not be released properly. If you want your threads to stop gracefully, make them non-daemonic and use a suitable signalling mechanism such as an "Event". There is a “main thread” object; this corresponds to the initial thread of control in the Python program. It is not a daemon thread. There is the possibility that “dummy thread objects” are created. These are thread objects corresponding to “alien threads”, which are threads of control started outside the threading module, such as directly from C code. Dummy thread objects have limited functionality; they are always considered alive and daemonic, and cannot be "join()"ed. They are never deleted, since it is impossible to detect the termination of alien threads. class threading.Thread(group=None, target=None, name=None, args=(), kwargs={}, *, daemon=None) This constructor should always be called with keyword arguments. Arguments are: *group* should be "None"; reserved for future extension when a "ThreadGroup" class is implemented. *target* is the callable object to be invoked by the "run()" method. Defaults to "None", meaning nothing is called. *name* is the thread name. By default, a unique name is constructed of the form “Thread-*N*” where *N* is a small decimal number. *args* is the argument tuple for the target invocation. Defaults to "()". *kwargs* is a dictionary of keyword arguments for the target invocation. Defaults to "{}". If not "None", *daemon* explicitly sets whether the thread is daemonic. If "None" (the default), the daemonic property is inherited from the current thread. If the subclass overrides the constructor, it must make sure to invoke the base class constructor ("Thread.__init__()") before doing anything else to the thread. Changed in version 3.3: Added the *daemon* argument. start() Start the thread’s activity. It must be called at most once per thread object. It arranges for the object’s "run()" method to be invoked in a separate thread of control. This method will raise a "RuntimeError" if called more than once on the same thread object. run() Method representing the thread’s activity. You may override this method in a subclass. The standard "run()" method invokes the callable object passed to the object’s constructor as the *target* argument, if any, with positional and keyword arguments taken from the *args* and *kwargs* arguments, respectively. join(timeout=None) Wait until the thread terminates. This blocks the calling thread until the thread whose "join()" method is called terminates – either normally or through an unhandled exception – or until the optional timeout occurs. When the *timeout* argument is present and not "None", it should be a floating point number specifying a timeout for the operation in seconds (or fractions thereof). As "join()" always returns "None", you must call "is_alive()" after "join()" to decide whether a timeout happened – if the thread is still alive, the "join()" call timed out. When the *timeout* argument is not present or "None", the operation will block until the thread terminates. A thread can be "join()"ed many times. "join()" raises a "RuntimeError" if an attempt is made to join the current thread as that would cause a deadlock. It is also an error to "join()" a thread before it has been started and attempts to do so raise the same exception. name A string used for identification purposes only. It has no semantics. Multiple threads may be given the same name. The initial name is set by the constructor. getName() setName() Old getter/setter API for "name"; use it directly as a property instead. ident The ‘thread identifier’ of this thread or "None" if the thread has not been started. This is a nonzero integer. See the "get_ident()" function. Thread identifiers may be recycled when a thread exits and another thread is created. The identifier is available even after the thread has exited. native_id The native integral thread ID of this thread. This is a non- negative integer, or "None" if the thread has not been started. See the "get_native_id()" function. This represents the Thread ID ("TID") as assigned to the thread by the OS (kernel). Its value may be used to uniquely identify this particular thread system-wide (until the thread terminates, after which the value may be recycled by the OS). Note: Similar to Process IDs, Thread IDs are only valid (guaranteed unique system-wide) from the time the thread is created until the thread has been terminated. Availability: Requires "get_native_id()" function. New in version 3.8. is_alive() Return whether the thread is alive. This method returns "True" just before the "run()" method starts until just after the "run()" method terminates. The module function "enumerate()" returns a list of all alive threads. daemon A boolean value indicating whether this thread is a daemon thread (True) or not (False). This must be set before "start()" is called, otherwise "RuntimeError" is raised. Its initial value is inherited from the creating thread; the main thread is not a daemon thread and therefore all threads created in the main thread default to "daemon" = "False". The entire Python program exits when no alive non-daemon threads are left. isDaemon() setDaemon() Old getter/setter API for "daemon"; use it directly as a property instead. Lock Objects ============ A primitive lock is a synchronization primitive that is not owned by a particular thread when locked. In Python, it is currently the lowest level synchronization primitive available, implemented directly by the "_thread" extension module. A primitive lock is in one of two states, “locked” or “unlocked”. It is created in the unlocked state. It has two basic methods, "acquire()" and "release()". When the state is unlocked, "acquire()" changes the state to locked and returns immediately. When the state is locked, "acquire()" blocks until a call to "release()" in another thread changes it to unlocked, then the "acquire()" call resets it to locked and returns. The "release()" method should only be called in the locked state; it changes the state to unlocked and returns immediately. If an attempt is made to release an unlocked lock, a "RuntimeError" will be raised. Locks also support the context management protocol. When more than one thread is blocked in "acquire()" waiting for the state to turn to unlocked, only one thread proceeds when a "release()" call resets the state to unlocked; which one of the waiting threads proceeds is not defined, and may vary across implementations. All methods are executed atomically. class threading.Lock The class implementing primitive lock objects. Once a thread has acquired a lock, subsequent attempts to acquire it block, until it is released; any thread may release it. Note that "Lock" is actually a factory function which returns an instance of the most efficient version of the concrete Lock class that is supported by the platform. acquire(blocking=True, timeout=-1) Acquire a lock, blocking or non-blocking. When invoked with the *blocking* argument set to "True" (the default), block until the lock is unlocked, then set it to locked and return "True". When invoked with the *blocking* argument set to "False", do not block. If a call with *blocking* set to "True" would block, return "False" immediately; otherwise, set the lock to locked and return "True". When invoked with the floating-point *timeout* argument set to a positive value, block for at most the number of seconds specified by *timeout* and as long as the lock cannot be acquired. A *timeout* argument of "-1" specifies an unbounded wait. It is forbidden to specify a *timeout* when *blocking* is false. The return value is "True" if the lock is acquired successfully, "False" if not (for example if the *timeout* expired). Changed in version 3.2: The *timeout* parameter is new. Changed in version 3.2: Lock acquisition can now be interrupted by signals on POSIX if the underlying threading implementation supports it. release() Release a lock. This can be called from any thread, not only the thread which has acquired the lock. When the lock is locked, reset it to unlocked, and return. If any other threads are blocked waiting for the lock to become unlocked, allow exactly one of them to proceed. When invoked on an unlocked lock, a "RuntimeError" is raised. There is no return value. locked() Return true if the lock is acquired. RLock Objects ============= A reentrant lock is a synchronization primitive that may be acquired multiple times by the same thread. Internally, it uses the concepts of “owning thread” and “recursion level” in addition to the locked/unlocked state used by primitive locks. In the locked state, some thread owns the lock; in the unlocked state, no thread owns it. To lock the lock, a thread calls its "acquire()" method; this returns once the thread owns the lock. To unlock the lock, a thread calls its "release()" method. "acquire()"/"release()" call pairs may be nested; only the final "release()" (the "release()" of the outermost pair) resets the lock to unlocked and allows another thread blocked in "acquire()" to proceed. Reentrant locks also support the context management protocol. class threading.RLock This class implements reentrant lock objects. A reentrant lock must be released by the thread that acquired it. Once a thread has acquired a reentrant lock, the same thread may acquire it again without blocking; the thread must release it once for each time it has acquired it. Note that "RLock" is actually a factory function which returns an instance of the most efficient version of the concrete RLock class that is supported by the platform. acquire(blocking=True, timeout=-1) Acquire a lock, blocking or non-blocking. When invoked without arguments: if this thread already owns the lock, increment the recursion level by one, and return immediately. Otherwise, if another thread owns the lock, block until the lock is unlocked. Once the lock is unlocked (not owned by any thread), then grab ownership, set the recursion level to one, and return. If more than one thread is blocked waiting until the lock is unlocked, only one at a time will be able to grab ownership of the lock. There is no return value in this case. When invoked with the *blocking* argument set to true, do the same thing as when called without arguments, and return "True". When invoked with the *blocking* argument set to false, do not block. If a call without an argument would block, return "False" immediately; otherwise, do the same thing as when called without arguments, and return "True". When invoked with the floating-point *timeout* argument set to a positive value, block for at most the number of seconds specified by *timeout* and as long as the lock cannot be acquired. Return "True" if the lock has been acquired, false if the timeout has elapsed. Changed in version 3.2: The *timeout* parameter is new. release() Release a lock, decrementing the recursion level. If after the decrement it is zero, reset the lock to unlocked (not owned by any thread), and if any other threads are blocked waiting for the lock to become unlocked, allow exactly one of them to proceed. If after the decrement the recursion level is still nonzero, the lock remains locked and owned by the calling thread. Only call this method when the calling thread owns the lock. A "RuntimeError" is raised if this method is called when the lock is unlocked. There is no return value. Condition Objects ================= A condition variable is always associated with some kind of lock; this can be passed in or one will be created by default. Passing one in is useful when several condition variables must share the same lock. The lock is part of the condition object: you don’t have to track it separately. A condition variable obeys the context management protocol: using the "with" statement acquires the associated lock for the duration of the enclosed block. The "acquire()" and "release()" methods also call the corresponding methods of the associated lock. Other methods must be called with the associated lock held. The "wait()" method releases the lock, and then blocks until another thread awakens it by calling "notify()" or "notify_all()". Once awakened, "wait()" re-acquires the lock and returns. It is also possible to specify a timeout. The "notify()" method wakes up one of the threads waiting for the condition variable, if any are waiting. The "notify_all()" method wakes up all threads waiting for the condition variable. Note: the "notify()" and "notify_all()" methods don’t release the lock; this means that the thread or threads awakened will not return from their "wait()" call immediately, but only when the thread that called "notify()" or "notify_all()" finally relinquishes ownership of the lock. The typical programming style using condition variables uses the lock to synchronize access to some shared state; threads that are interested in a particular change of state call "wait()" repeatedly until they see the desired state, while threads that modify the state call "notify()" or "notify_all()" when they change the state in such a way that it could possibly be a desired state for one of the waiters. For example, the following code is a generic producer-consumer situation with unlimited buffer capacity: # Consume one item with cv: while not an_item_is_available(): cv.wait() get_an_available_item() # Produce one item with cv: make_an_item_available() cv.notify() The "while" loop checking for the application’s condition is necessary because "wait()" can return after an arbitrary long time, and the condition which prompted the "notify()" call may no longer hold true. This is inherent to multi-threaded programming. The "wait_for()" method can be used to automate the condition checking, and eases the computation of timeouts: # Consume an item with cv: cv.wait_for(an_item_is_available) get_an_available_item() To choose between "notify()" and "notify_all()", consider whether one state change can be interesting for only one or several waiting threads. E.g. in a typical producer-consumer situation, adding one item to the buffer only needs to wake up one consumer thread. class threading.Condition(lock=None) This class implements condition variable objects. A condition variable allows one or more threads to wait until they are notified by another thread. If the *lock* argument is given and not "None", it must be a "Lock" or "RLock" object, and it is used as the underlying lock. Otherwise, a new "RLock" object is created and used as the underlying lock. Changed in version 3.3: changed from a factory function to a class. acquire(*args) Acquire the underlying lock. This method calls the corresponding method on the underlying lock; the return value is whatever that method returns. release() Release the underlying lock. This method calls the corresponding method on the underlying lock; there is no return value. wait(timeout=None) Wait until notified or until a timeout occurs. If the calling thread has not acquired the lock when this method is called, a "RuntimeError" is raised. This method releases the underlying lock, and then blocks until it is awakened by a "notify()" or "notify_all()" call for the same condition variable in another thread, or until the optional timeout occurs. Once awakened or timed out, it re-acquires the lock and returns. When the *timeout* argument is present and not "None", it should be a floating point number specifying a timeout for the operation in seconds (or fractions thereof). When the underlying lock is an "RLock", it is not released using its "release()" method, since this may not actually unlock the lock when it was acquired multiple times recursively. Instead, an internal interface of the "RLock" class is used, which really unlocks it even when it has been recursively acquired several times. Another internal interface is then used to restore the recursion level when the lock is reacquired. The return value is "True" unless a given *timeout* expired, in which case it is "False". Changed in version 3.2: Previously, the method always returned "None". wait_for(predicate, timeout=None) Wait until a condition evaluates to true. *predicate* should be a callable which result will be interpreted as a boolean value. A *timeout* may be provided giving the maximum time to wait. This utility method may call "wait()" repeatedly until the predicate is satisfied, or until a timeout occurs. The return value is the last return value of the predicate and will evaluate to "False" if the method timed out. Ignoring the timeout feature, calling this method is roughly equivalent to writing: while not predicate(): cv.wait() Therefore, the same rules apply as with "wait()": The lock must be held when called and is re-acquired on return. The predicate is evaluated with the lock held. New in version 3.2. notify(n=1) By default, wake up one thread waiting on this condition, if any. If the calling thread has not acquired the lock when this method is called, a "RuntimeError" is raised. This method wakes up at most *n* of the threads waiting for the condition variable; it is a no-op if no threads are waiting. The current implementation wakes up exactly *n* threads, if at least *n* threads are waiting. However, it’s not safe to rely on this behavior. A future, optimized implementation may occasionally wake up more than *n* threads. Note: an awakened thread does not actually return from its "wait()" call until it can reacquire the lock. Since "notify()" does not release the lock, its caller should. notify_all() Wake up all threads waiting on this condition. This method acts like "notify()", but wakes up all waiting threads instead of one. If the calling thread has not acquired the lock when this method is called, a "RuntimeError" is raised. Semaphore Objects ================= This is one of the oldest synchronization primitives in the history of computer science, invented by the early Dutch computer scientist Edsger W. Dijkstra (he used the names "P()" and "V()" instead of "acquire()" and "release()"). A semaphore manages an internal counter which is decremented by each "acquire()" call and incremented by each "release()" call. The counter can never go below zero; when "acquire()" finds that it is zero, it blocks, waiting until some other thread calls "release()". Semaphores also support the context management protocol. class threading.Semaphore(value=1) This class implements semaphore objects. A semaphore manages an atomic counter representing the number of "release()" calls minus the number of "acquire()" calls, plus an initial value. The "acquire()" method blocks if necessary until it can return without making the counter negative. If not given, *value* defaults to 1. The optional argument gives the initial *value* for the internal counter; it defaults to "1". If the *value* given is less than 0, "ValueError" is raised. Changed in version 3.3: changed from a factory function to a class. acquire(blocking=True, timeout=None) Acquire a semaphore. When invoked without arguments: * If the internal counter is larger than zero on entry, decrement it by one and return "True" immediately. * If the internal counter is zero on entry, block until awoken by a call to "release()". Once awoken (and the counter is greater than 0), decrement the counter by 1 and return "True". Exactly one thread will be awoken by each call to "release()". The order in which threads are awoken should not be relied on. When invoked with *blocking* set to false, do not block. If a call without an argument would block, return "False" immediately; otherwise, do the same thing as when called without arguments, and return "True". When invoked with a *timeout* other than "None", it will block for at most *timeout* seconds. If acquire does not complete successfully in that interval, return "False". Return "True" otherwise. Changed in version 3.2: The *timeout* parameter is new. release(n=1) Release a semaphore, incrementing the internal counter by *n*. When it was zero on entry and other threads are waiting for it to become larger than zero again, wake up *n* of those threads. Changed in version 3.9: Added the *n* parameter to release multiple waiting threads at once. class threading.BoundedSemaphore(value=1) Class implementing bounded semaphore objects. A bounded semaphore checks to make sure its current value doesn’t exceed its initial value. If it does, "ValueError" is raised. In most situations semaphores are used to guard resources with limited capacity. If the semaphore is released too many times it’s a sign of a bug. If not given, *value* defaults to 1. Changed in version 3.3: changed from a factory function to a class. "Semaphore" Example ------------------- Semaphores are often used to guard resources with limited capacity, for example, a database server. In any situation where the size of the resource is fixed, you should use a bounded semaphore. Before spawning any worker threads, your main thread would initialize the semaphore: maxconnections = 5 # ... pool_sema = BoundedSemaphore(value=maxconnections) Once spawned, worker threads call the semaphore’s acquire and release methods when they need to connect to the server: with pool_sema: conn = connectdb() try: # ... use connection ... finally: conn.close() The use of a bounded semaphore reduces the chance that a programming error which causes the semaphore to be released more than it’s acquired will go undetected. Event Objects ============= This is one of the simplest mechanisms for communication between threads: one thread signals an event and other threads wait for it. An event object manages an internal flag that can be set to true with the "set()" method and reset to false with the "clear()" method. The "wait()" method blocks until the flag is true. class threading.Event Class implementing event objects. An event manages a flag that can be set to true with the "set()" method and reset to false with the "clear()" method. The "wait()" method blocks until the flag is true. The flag is initially false. Changed in version 3.3: changed from a factory function to a class. is_set() Return "True" if and only if the internal flag is true. set() Set the internal flag to true. All threads waiting for it to become true are awakened. Threads that call "wait()" once the flag is true will not block at all. clear() Reset the internal flag to false. Subsequently, threads calling "wait()" will block until "set()" is called to set the internal flag to true again. wait(timeout=None) Block until the internal flag is true. If the internal flag is true on entry, return immediately. Otherwise, block until another thread calls "set()" to set the flag to true, or until the optional timeout occurs. When the timeout argument is present and not "None", it should be a floating point number specifying a timeout for the operation in seconds (or fractions thereof). This method returns "True" if and only if the internal flag has been set to true, either before the wait call or after the wait starts, so it will always return "True" except if a timeout is given and the operation times out. Changed in version 3.1: Previously, the method always returned "None". Timer Objects ============= This class represents an action that should be run only after a certain amount of time has passed — a timer. "Timer" is a subclass of "Thread" and as such also functions as an example of creating custom threads. Timers are started, as with threads, by calling their "start()" method. The timer can be stopped (before its action has begun) by calling the "cancel()" method. The interval the timer will wait before executing its action may not be exactly the same as the interval specified by the user. For example: def hello(): print("hello, world") t = Timer(30.0, hello) t.start() # after 30 seconds, "hello, world" will be printed class threading.Timer(interval, function, args=None, kwargs=None) Create a timer that will run *function* with arguments *args* and keyword arguments *kwargs*, after *interval* seconds have passed. If *args* is "None" (the default) then an empty list will be used. If *kwargs* is "None" (the default) then an empty dict will be used. Changed in version 3.3: changed from a factory function to a class. cancel() Stop the timer, and cancel the execution of the timer’s action. This will only work if the timer is still in its waiting stage. Barrier Objects =============== New in version 3.2. This class provides a simple synchronization primitive for use by a fixed number of threads that need to wait for each other. Each of the threads tries to pass the barrier by calling the "wait()" method and will block until all of the threads have made their "wait()" calls. At this point, the threads are released simultaneously. The barrier can be reused any number of times for the same number of threads. As an example, here is a simple way to synchronize a client and server thread: b = Barrier(2, timeout=5) def server(): start_server() b.wait() while True: connection = accept_connection() process_server_connection(connection) def client(): b.wait() while True: connection = make_connection() process_client_connection(connection) class threading.Barrier(parties, action=None, timeout=None) Create a barrier object for *parties* number of threads. An *action*, when provided, is a callable to be called by one of the threads when they are released. *timeout* is the default timeout value if none is specified for the "wait()" method. wait(timeout=None) Pass the barrier. When all the threads party to the barrier have called this function, they are all released simultaneously. If a *timeout* is provided, it is used in preference to any that was supplied to the class constructor. The return value is an integer in the range 0 to *parties* – 1, different for each thread. This can be used to select a thread to do some special housekeeping, e.g.: i = barrier.wait() if i == 0: # Only one thread needs to print this print("passed the barrier") If an *action* was provided to the constructor, one of the threads will have called it prior to being released. Should this call raise an error, the barrier is put into the broken state. If the call times out, the barrier is put into the broken state. This method may raise a "BrokenBarrierError" exception if the barrier is broken or reset while a thread is waiting. reset() Return the barrier to the default, empty state. Any threads waiting on it will receive the "BrokenBarrierError" exception. Note that using this function may require some external synchronization if there are other threads whose state is unknown. If a barrier is broken it may be better to just leave it and create a new one. abort() Put the barrier into a broken state. This causes any active or future calls to "wait()" to fail with the "BrokenBarrierError". Use this for example if one of the threads needs to abort, to avoid deadlocking the application. It may be preferable to simply create the barrier with a sensible *timeout* value to automatically guard against one of the threads going awry. parties The number of threads required to pass the barrier. n_waiting The number of threads currently waiting in the barrier. broken A boolean that is "True" if the barrier is in the broken state. exception threading.BrokenBarrierError This exception, a subclass of "RuntimeError", is raised when the "Barrier" object is reset or broken. Using locks, conditions, and semaphores in the "with" statement =============================================================== All of the objects provided by this module that have "acquire()" and "release()" methods can be used as context managers for a "with" statement. The "acquire()" method will be called when the block is entered, and "release()" will be called when the block is exited. Hence, the following snippet: with some_lock: # do something... is equivalent to: some_lock.acquire() try: # do something... finally: some_lock.release() Currently, "Lock", "RLock", "Condition", "Semaphore", and "BoundedSemaphore" objects may be used as "with" statement context managers. "time" — Time access and conversions ************************************ ====================================================================== This module provides various time-related functions. For related functionality, see also the "datetime" and "calendar" modules. Although this module is always available, not all functions are available on all platforms. Most of the functions defined in this module call platform C library functions with the same name. It may sometimes be helpful to consult the platform documentation, because the semantics of these functions varies among platforms. An explanation of some terminology and conventions is in order. * The *epoch* is the point where the time starts, and is platform dependent. For Unix, the epoch is January 1, 1970, 00:00:00 (UTC). To find out what the epoch is on a given platform, look at "time.gmtime(0)". * The term *seconds since the epoch* refers to the total number of elapsed seconds since the epoch, typically excluding leap seconds. Leap seconds are excluded from this total on all POSIX-compliant platforms. * The functions in this module may not handle dates and times before the epoch or far in the future. The cut-off point in the future is determined by the C library; for 32-bit systems, it is typically in 2038. * Function "strptime()" can parse 2-digit years when given "%y" format code. When 2-digit years are parsed, they are converted according to the POSIX and ISO C standards: values 69–99 are mapped to 1969–1999, and values 0–68 are mapped to 2000–2068. * UTC is Coordinated Universal Time (formerly known as Greenwich Mean Time, or GMT). The acronym UTC is not a mistake but a compromise between English and French. * DST is Daylight Saving Time, an adjustment of the timezone by (usually) one hour during part of the year. DST rules are magic (determined by local law) and can change from year to year. The C library has a table containing the local rules (often it is read from a system file for flexibility) and is the only source of True Wisdom in this respect. * The precision of the various real-time functions may be less than suggested by the units in which their value or argument is expressed. E.g. on most Unix systems, the clock “ticks” only 50 or 100 times a second. * On the other hand, the precision of "time()" and "sleep()" is better than their Unix equivalents: times are expressed as floating point numbers, "time()" returns the most accurate time available (using Unix "gettimeofday()" where available), and "sleep()" will accept a time with a nonzero fraction (Unix "select()" is used to implement this, where available). * The time value as returned by "gmtime()", "localtime()", and "strptime()", and accepted by "asctime()", "mktime()" and "strftime()", is a sequence of 9 integers. The return values of "gmtime()", "localtime()", and "strptime()" also offer attribute names for individual fields. See "struct_time" for a description of these objects. Changed in version 3.3: The "struct_time" type was extended to provide the "tm_gmtoff" and "tm_zone" attributes when platform supports corresponding "struct tm" members. Changed in version 3.6: The "struct_time" attributes "tm_gmtoff" and "tm_zone" are now available on all platforms. * Use the following functions to convert between time representations: +---------------------------+---------------------------+---------------------------+ | From | To | Use | |===========================|===========================|===========================| | seconds since the epoch | "struct_time" in UTC | "gmtime()" | +---------------------------+---------------------------+---------------------------+ | seconds since the epoch | "struct_time" in local | "localtime()" | | | time | | +---------------------------+---------------------------+---------------------------+ | "struct_time" in UTC | seconds since the epoch | "calendar.timegm()" | +---------------------------+---------------------------+---------------------------+ | "struct_time" in local | seconds since the epoch | "mktime()" | | time | | | +---------------------------+---------------------------+---------------------------+ Functions ========= time.asctime([t]) Convert a tuple or "struct_time" representing a time as returned by "gmtime()" or "localtime()" to a string of the following form: "'Sun Jun 20 23:21:05 1993'". The day field is two characters long and is space padded if the day is a single digit, e.g.: "'Wed Jun 9 04:26:40 1993'". If *t* is not provided, the current time as returned by "localtime()" is used. Locale information is not used by "asctime()". Note: Unlike the C function of the same name, "asctime()" does not add a trailing newline. time.pthread_getcpuclockid(thread_id) Return the *clk_id* of the thread-specific CPU-time clock for the specified *thread_id*. Use "threading.get_ident()" or the "ident" attribute of "threading.Thread" objects to get a suitable value for *thread_id*. Warning: Passing an invalid or expired *thread_id* may result in undefined behavior, such as segmentation fault. Availability: Unix (see the man page for *pthread_getcpuclockid(3)* for further information). New in version 3.7. time.clock_getres(clk_id) Return the resolution (precision) of the specified clock *clk_id*. Refer to Clock ID Constants for a list of accepted values for *clk_id*. Availability: Unix. New in version 3.3. time.clock_gettime(clk_id) -> float Return the time of the specified clock *clk_id*. Refer to Clock ID Constants for a list of accepted values for *clk_id*. Availability: Unix. New in version 3.3. time.clock_gettime_ns(clk_id) -> int Similar to "clock_gettime()" but return time as nanoseconds. Availability: Unix. New in version 3.7. time.clock_settime(clk_id, time: float) Set the time of the specified clock *clk_id*. Currently, "CLOCK_REALTIME" is the only accepted value for *clk_id*. Availability: Unix. New in version 3.3. time.clock_settime_ns(clk_id, time: int) Similar to "clock_settime()" but set time with nanoseconds. Availability: Unix. New in version 3.7. time.ctime([secs]) Convert a time expressed in seconds since the epoch to a string of a form: "'Sun Jun 20 23:21:05 1993'" representing local time. The day field is two characters long and is space padded if the day is a single digit, e.g.: "'Wed Jun 9 04:26:40 1993'". If *secs* is not provided or "None", the current time as returned by "time()" is used. "ctime(secs)" is equivalent to "asctime(localtime(secs))". Locale information is not used by "ctime()". time.get_clock_info(name) Get information on the specified clock as a namespace object. Supported clock names and the corresponding functions to read their value are: * "'monotonic'": "time.monotonic()" * "'perf_counter'": "time.perf_counter()" * "'process_time'": "time.process_time()" * "'thread_time'": "time.thread_time()" * "'time'": "time.time()" The result has the following attributes: * *adjustable*: "True" if the clock can be changed automatically (e.g. by a NTP daemon) or manually by the system administrator, "False" otherwise * *implementation*: The name of the underlying C function used to get the clock value. Refer to Clock ID Constants for possible values. * *monotonic*: "True" if the clock cannot go backward, "False" otherwise * *resolution*: The resolution of the clock in seconds ("float") New in version 3.3. time.gmtime([secs]) Convert a time expressed in seconds since the epoch to a "struct_time" in UTC in which the dst flag is always zero. If *secs* is not provided or "None", the current time as returned by "time()" is used. Fractions of a second are ignored. See above for a description of the "struct_time" object. See "calendar.timegm()" for the inverse of this function. time.localtime([secs]) Like "gmtime()" but converts to local time. If *secs* is not provided or "None", the current time as returned by "time()" is used. The dst flag is set to "1" when DST applies to the given time. "localtime()" may raise "OverflowError", if the timestamp is outside the range of values supported by the platform C "localtime()" or "gmtime()" functions, and "OSError" on "localtime()" or "gmtime()" failure. It’s common for this to be restricted to years between 1970 and 2038. time.mktime(t) This is the inverse function of "localtime()". Its argument is the "struct_time" or full 9-tuple (since the dst flag is needed; use "-1" as the dst flag if it is unknown) which expresses the time in *local* time, not UTC. It returns a floating point number, for compatibility with "time()". If the input value cannot be represented as a valid time, either "OverflowError" or "ValueError" will be raised (which depends on whether the invalid value is caught by Python or the underlying C libraries). The earliest date for which it can generate a time is platform-dependent. time.monotonic() -> float Return the value (in fractional seconds) of a monotonic clock, i.e. a clock that cannot go backwards. The clock is not affected by system clock updates. The reference point of the returned value is undefined, so that only the difference between the results of two calls is valid. New in version 3.3. Changed in version 3.5: The function is now always available and always system-wide. time.monotonic_ns() -> int Similar to "monotonic()", but return time as nanoseconds. New in version 3.7. time.perf_counter() -> float Return the value (in fractional seconds) of a performance counter, i.e. a clock with the highest available resolution to measure a short duration. It does include time elapsed during sleep and is system-wide. The reference point of the returned value is undefined, so that only the difference between the results of two calls is valid. New in version 3.3. time.perf_counter_ns() -> int Similar to "perf_counter()", but return time as nanoseconds. New in version 3.7. time.process_time() -> float Return the value (in fractional seconds) of the sum of the system and user CPU time of the current process. It does not include time elapsed during sleep. It is process-wide by definition. The reference point of the returned value is undefined, so that only the difference between the results of two calls is valid. New in version 3.3. time.process_time_ns() -> int Similar to "process_time()" but return time as nanoseconds. New in version 3.7. time.sleep(secs) Suspend execution of the calling thread for the given number of seconds. The argument may be a floating point number to indicate a more precise sleep time. The actual suspension time may be less than that requested because any caught signal will terminate the "sleep()" following execution of that signal’s catching routine. Also, the suspension time may be longer than requested by an arbitrary amount because of the scheduling of other activity in the system. Changed in version 3.5: The function now sleeps at least *secs* even if the sleep is interrupted by a signal, except if the signal handler raises an exception (see **PEP 475** for the rationale). time.strftime(format[, t]) Convert a tuple or "struct_time" representing a time as returned by "gmtime()" or "localtime()" to a string as specified by the *format* argument. If *t* is not provided, the current time as returned by "localtime()" is used. *format* must be a string. "ValueError" is raised if any field in *t* is outside of the allowed range. 0 is a legal argument for any position in the time tuple; if it is normally illegal the value is forced to a correct one. The following directives can be embedded in the *format* string. They are shown without the optional field width and precision specification, and are replaced by the indicated characters in the "strftime()" result: +-------------+--------------------------------------------------+---------+ | Directive | Meaning | Notes | |=============|==================================================|=========| | "%a" | Locale’s abbreviated weekday name. | | +-------------+--------------------------------------------------+---------+ | "%A" | Locale’s full weekday name. | | +-------------+--------------------------------------------------+---------+ | "%b" | Locale’s abbreviated month name. | | +-------------+--------------------------------------------------+---------+ | "%B" | Locale’s full month name. | | +-------------+--------------------------------------------------+---------+ | "%c" | Locale’s appropriate date and time | | | | representation. | | +-------------+--------------------------------------------------+---------+ | "%d" | Day of the month as a decimal number [01,31]. | | +-------------+--------------------------------------------------+---------+ | "%H" | Hour (24-hour clock) as a decimal number | | | | [00,23]. | | +-------------+--------------------------------------------------+---------+ | "%I" | Hour (12-hour clock) as a decimal number | | | | [01,12]. | | +-------------+--------------------------------------------------+---------+ | "%j" | Day of the year as a decimal number [001,366]. | | +-------------+--------------------------------------------------+---------+ | "%m" | Month as a decimal number [01,12]. | | +-------------+--------------------------------------------------+---------+ | "%M" | Minute as a decimal number [00,59]. | | +-------------+--------------------------------------------------+---------+ | "%p" | Locale’s equivalent of either AM or PM. | (1) | +-------------+--------------------------------------------------+---------+ | "%S" | Second as a decimal number [00,61]. | (2) | +-------------+--------------------------------------------------+---------+ | "%U" | Week number of the year (Sunday as the first day | (3) | | | of the week) as a decimal number [00,53]. All | | | | days in a new year preceding the first Sunday | | | | are considered to be in week 0. | | +-------------+--------------------------------------------------+---------+ | "%w" | Weekday as a decimal number [0(Sunday),6]. | | +-------------+--------------------------------------------------+---------+ | "%W" | Week number of the year (Monday as the first day | (3) | | | of the week) as a decimal number [00,53]. All | | | | days in a new year preceding the first Monday | | | | are considered to be in week 0. | | +-------------+--------------------------------------------------+---------+ | "%x" | Locale’s appropriate date representation. | | +-------------+--------------------------------------------------+---------+ | "%X" | Locale’s appropriate time representation. | | +-------------+--------------------------------------------------+---------+ | "%y" | Year without century as a decimal number | | | | [00,99]. | | +-------------+--------------------------------------------------+---------+ | "%Y" | Year with century as a decimal number. | | +-------------+--------------------------------------------------+---------+ | "%z" | Time zone offset indicating a positive or | | | | negative time difference from UTC/GMT of the | | | | form +HHMM or -HHMM, where H represents decimal | | | | hour digits and M represents decimal minute | | | | digits [-23:59, +23:59]. [1] | | +-------------+--------------------------------------------------+---------+ | "%Z" | Time zone name (no characters if no time zone | | | | exists). Deprecated. [1] | | +-------------+--------------------------------------------------+---------+ | "%%" | A literal "'%'" character. | | +-------------+--------------------------------------------------+---------+ Notes: 1. When used with the "strptime()" function, the "%p" directive only affects the output hour field if the "%I" directive is used to parse the hour. 2. The range really is "0" to "61"; value "60" is valid in timestamps representing leap seconds and value "61" is supported for historical reasons. 3. When used with the "strptime()" function, "%U" and "%W" are only used in calculations when the day of the week and the year are specified. Here is an example, a format for dates compatible with that specified in the **RFC 2822** Internet email standard. [1] >>> from time import gmtime, strftime >>> strftime("%a, %d %b %Y %H:%M:%S +0000", gmtime()) 'Thu, 28 Jun 2001 14:17:15 +0000' Additional directives may be supported on certain platforms, but only the ones listed here have a meaning standardized by ANSI C. To see the full set of format codes supported on your platform, consult the *strftime(3)* documentation. On some platforms, an optional field width and precision specification can immediately follow the initial "'%'" of a directive in the following order; this is also not portable. The field width is normally 2 except for "%j" where it is 3. time.strptime(string[, format]) Parse a string representing a time according to a format. The return value is a "struct_time" as returned by "gmtime()" or "localtime()". The *format* parameter uses the same directives as those used by "strftime()"; it defaults to ""%a %b %d %H:%M:%S %Y"" which matches the formatting returned by "ctime()". If *string* cannot be parsed according to *format*, or if it has excess data after parsing, "ValueError" is raised. The default values used to fill in any missing data when more accurate values cannot be inferred are "(1900, 1, 1, 0, 0, 0, 0, 1, -1)". Both *string* and *format* must be strings. For example: >>> import time >>> time.strptime("30 Nov 00", "%d %b %y") time.struct_time(tm_year=2000, tm_mon=11, tm_mday=30, tm_hour=0, tm_min=0, tm_sec=0, tm_wday=3, tm_yday=335, tm_isdst=-1) Support for the "%Z" directive is based on the values contained in "tzname" and whether "daylight" is true. Because of this, it is platform-specific except for recognizing UTC and GMT which are always known (and are considered to be non-daylight savings timezones). Only the directives specified in the documentation are supported. Because "strftime()" is implemented per platform it can sometimes offer more directives than those listed. But "strptime()" is independent of any platform and thus does not necessarily support all directives available that are not documented as supported. class time.struct_time The type of the time value sequence returned by "gmtime()", "localtime()", and "strptime()". It is an object with a *named tuple* interface: values can be accessed by index and by attribute name. The following values are present: +---------+---------------------+-----------------------------------+ | Index | Attribute | Values | |=========|=====================|===================================| | 0 | "tm_year" | (for example, 1993) | +---------+---------------------+-----------------------------------+ | 1 | "tm_mon" | range [1, 12] | +---------+---------------------+-----------------------------------+ | 2 | "tm_mday" | range [1, 31] | +---------+---------------------+-----------------------------------+ | 3 | "tm_hour" | range [0, 23] | +---------+---------------------+-----------------------------------+ | 4 | "tm_min" | range [0, 59] | +---------+---------------------+-----------------------------------+ | 5 | "tm_sec" | range [0, 61]; see **(2)** in | | | | "strftime()" description | +---------+---------------------+-----------------------------------+ | 6 | "tm_wday" | range [0, 6], Monday is 0 | +---------+---------------------+-----------------------------------+ | 7 | "tm_yday" | range [1, 366] | +---------+---------------------+-----------------------------------+ | 8 | "tm_isdst" | 0, 1 or -1; see below | +---------+---------------------+-----------------------------------+ | N/A | "tm_zone" | abbreviation of timezone name | +---------+---------------------+-----------------------------------+ | N/A | "tm_gmtoff" | offset east of UTC in seconds | +---------+---------------------+-----------------------------------+ Note that unlike the C structure, the month value is a range of [1, 12], not [0, 11]. In calls to "mktime()", "tm_isdst" may be set to 1 when daylight savings time is in effect, and 0 when it is not. A value of -1 indicates that this is not known, and will usually result in the correct state being filled in. When a tuple with an incorrect length is passed to a function expecting a "struct_time", or having elements of the wrong type, a "TypeError" is raised. time.time() -> float Return the time in seconds since the epoch as a floating point number. The specific date of the epoch and the handling of leap seconds is platform dependent. On Windows and most Unix systems, the epoch is January 1, 1970, 00:00:00 (UTC) and leap seconds are not counted towards the time in seconds since the epoch. This is commonly referred to as Unix time. To find out what the epoch is on a given platform, look at "gmtime(0)". Note that even though the time is always returned as a floating point number, not all systems provide time with a better precision than 1 second. While this function normally returns non-decreasing values, it can return a lower value than a previous call if the system clock has been set back between the two calls. The number returned by "time()" may be converted into a more common time format (i.e. year, month, day, hour, etc…) in UTC by passing it to "gmtime()" function or in local time by passing it to the "localtime()" function. In both cases a "struct_time" object is returned, from which the components of the calendar date may be accessed as attributes. time.thread_time() -> float Return the value (in fractional seconds) of the sum of the system and user CPU time of the current thread. It does not include time elapsed during sleep. It is thread-specific by definition. The reference point of the returned value is undefined, so that only the difference between the results of two calls in the same thread is valid. Availability: Windows, Linux, Unix systems supporting "CLOCK_THREAD_CPUTIME_ID". New in version 3.7. time.thread_time_ns() -> int Similar to "thread_time()" but return time as nanoseconds. New in version 3.7. time.time_ns() -> int Similar to "time()" but returns time as an integer number of nanoseconds since the epoch. New in version 3.7. time.tzset() Reset the time conversion rules used by the library routines. The environment variable "TZ" specifies how this is done. It will also set the variables "tzname" (from the "TZ" environment variable), "timezone" (non-DST seconds West of UTC), "altzone" (DST seconds west of UTC) and "daylight" (to 0 if this timezone does not have any daylight saving time rules, or to nonzero if there is a time, past, present or future when daylight saving time applies). Availability: Unix. Note: Although in many cases, changing the "TZ" environment variable may affect the output of functions like "localtime()" without calling "tzset()", this behavior should not be relied on.The "TZ" environment variable should contain no whitespace. The standard format of the "TZ" environment variable is (whitespace added for clarity): std offset [dst [offset [,start[/time], end[/time]]]] Where the components are: "std" and "dst" Three or more alphanumerics giving the timezone abbreviations. These will be propagated into time.tzname "offset" The offset has the form: "± hh[:mm[:ss]]". This indicates the value added the local time to arrive at UTC. If preceded by a ‘-’, the timezone is east of the Prime Meridian; otherwise, it is west. If no offset follows dst, summer time is assumed to be one hour ahead of standard time. "start[/time], end[/time]" Indicates when to change to and back from DST. The format of the start and end dates are one of the following: "J*n*" The Julian day *n* (1 <= *n* <= 365). Leap days are not counted, so in all years February 28 is day 59 and March 1 is day 60. "*n*" The zero-based Julian day (0 <= *n* <= 365). Leap days are counted, and it is possible to refer to February 29. "M*m*.*n*.*d*" The *d*’th day (0 <= *d* <= 6) of week *n* of month *m* of the year (1 <= *n* <= 5, 1 <= *m* <= 12, where week 5 means “the last *d* day in month *m*” which may occur in either the fourth or the fifth week). Week 1 is the first week in which the *d*’th day occurs. Day zero is a Sunday. "time" has the same format as "offset" except that no leading sign (‘-’ or ‘+’) is allowed. The default, if time is not given, is 02:00:00. >>> os.environ['TZ'] = 'EST+05EDT,M4.1.0,M10.5.0' >>> time.tzset() >>> time.strftime('%X %x %Z') '02:07:36 05/08/03 EDT' >>> os.environ['TZ'] = 'AEST-10AEDT-11,M10.5.0,M3.5.0' >>> time.tzset() >>> time.strftime('%X %x %Z') '16:08:12 05/08/03 AEST' On many Unix systems (including *BSD, Linux, Solaris, and Darwin), it is more convenient to use the system’s zoneinfo (*tzfile(5)*) database to specify the timezone rules. To do this, set the "TZ" environment variable to the path of the required timezone datafile, relative to the root of the systems ‘zoneinfo’ timezone database, usually located at "/usr/share/zoneinfo". For example, "'US/Eastern'", "'Australia/Melbourne'", "'Egypt'" or "'Europe/Amsterdam'". >>> os.environ['TZ'] = 'US/Eastern' >>> time.tzset() >>> time.tzname ('EST', 'EDT') >>> os.environ['TZ'] = 'Egypt' >>> time.tzset() >>> time.tzname ('EET', 'EEST') Clock ID Constants ================== These constants are used as parameters for "clock_getres()" and "clock_gettime()". time.CLOCK_BOOTTIME Identical to "CLOCK_MONOTONIC", except it also includes any time that the system is suspended. This allows applications to get a suspend-aware monotonic clock without having to deal with the complications of "CLOCK_REALTIME", which may have discontinuities if the time is changed using "settimeofday()" or similar. Availability: Linux 2.6.39 or later. New in version 3.7. time.CLOCK_HIGHRES The Solaris OS has a "CLOCK_HIGHRES" timer that attempts to use an optimal hardware source, and may give close to nanosecond resolution. "CLOCK_HIGHRES" is the nonadjustable, high-resolution clock. Availability: Solaris. New in version 3.3. time.CLOCK_MONOTONIC Clock that cannot be set and represents monotonic time since some unspecified starting point. Availability: Unix. New in version 3.3. time.CLOCK_MONOTONIC_RAW Similar to "CLOCK_MONOTONIC", but provides access to a raw hardware-based time that is not subject to NTP adjustments. Availability: Linux 2.6.28 and newer, macOS 10.12 and newer. New in version 3.3. time.CLOCK_PROCESS_CPUTIME_ID High-resolution per-process timer from the CPU. Availability: Unix. New in version 3.3. time.CLOCK_PROF High-resolution per-process timer from the CPU. Availability: FreeBSD, NetBSD 7 or later, OpenBSD. New in version 3.7. time.CLOCK_TAI International Atomic Time The system must have a current leap second table in order for this to give the correct answer. PTP or NTP software can maintain a leap second table. Availability: Linux. New in version 3.9. time.CLOCK_THREAD_CPUTIME_ID Thread-specific CPU-time clock. Availability: Unix. New in version 3.3. time.CLOCK_UPTIME Time whose absolute value is the time the system has been running and not suspended, providing accurate uptime measurement, both absolute and interval. Availability: FreeBSD, OpenBSD 5.5 or later. New in version 3.7. time.CLOCK_UPTIME_RAW Clock that increments monotonically, tracking the time since an arbitrary point, unaffected by frequency or time adjustments and not incremented while the system is asleep. Availability: macOS 10.12 and newer. New in version 3.8. The following constant is the only parameter that can be sent to "clock_settime()". time.CLOCK_REALTIME System-wide real-time clock. Setting this clock requires appropriate privileges. Availability: Unix. New in version 3.3. Timezone Constants ================== time.altzone The offset of the local DST timezone, in seconds west of UTC, if one is defined. This is negative if the local DST timezone is east of UTC (as in Western Europe, including the UK). Only use this if "daylight" is nonzero. See note below. time.daylight Nonzero if a DST timezone is defined. See note below. time.timezone The offset of the local (non-DST) timezone, in seconds west of UTC (negative in most of Western Europe, positive in the US, zero in the UK). See note below. time.tzname A tuple of two strings: the first is the name of the local non-DST timezone, the second is the name of the local DST timezone. If no DST timezone is defined, the second string should not be used. See note below. Note: For the above Timezone constants ("altzone", "daylight", "timezone", and "tzname"), the value is determined by the timezone rules in effect at module load time or the last time "tzset()" is called and may be incorrect for times in the past. It is recommended to use the "tm_gmtoff" and "tm_zone" results from "localtime()" to obtain timezone information. See also: Module "datetime" More object-oriented interface to dates and times. Module "locale" Internationalization services. The locale setting affects the interpretation of many format specifiers in "strftime()" and "strptime()". Module "calendar" General calendar-related functions. "timegm()" is the inverse of "gmtime()" from this module. -[ Footnotes ]- [1] The use of "%Z" is now deprecated, but the "%z" escape that expands to the preferred hour/minute offset is not supported by all ANSI C libraries. Also, a strict reading of the original 1982 **RFC 822** standard calls for a two-digit year ("%y" rather than "%Y"), but practice moved to 4-digit years long before the year 2000. After that, **RFC 822** became obsolete and the 4-digit year has been first recommended by **RFC 1123** and then mandated by **RFC 2822**. "timeit" — Measure execution time of small code snippets ******************************************************** **Source code:** Lib/timeit.py ====================================================================== This module provides a simple way to time small bits of Python code. It has both a Command-Line Interface as well as a callable one. It avoids a number of common traps for measuring execution times. See also Tim Peters’ introduction to the “Algorithms” chapter in the second edition of *Python Cookbook*, published by O’Reilly. Basic Examples ============== The following example shows how the Command-Line Interface can be used to compare three different expressions: $ python3 -m timeit '"-".join(str(n) for n in range(100))' 10000 loops, best of 5: 30.2 usec per loop $ python3 -m timeit '"-".join([str(n) for n in range(100)])' 10000 loops, best of 5: 27.5 usec per loop $ python3 -m timeit '"-".join(map(str, range(100)))' 10000 loops, best of 5: 23.2 usec per loop This can be achieved from the Python Interface with: >>> import timeit >>> timeit.timeit('"-".join(str(n) for n in range(100))', number=10000) 0.3018611848820001 >>> timeit.timeit('"-".join([str(n) for n in range(100)])', number=10000) 0.2727368790656328 >>> timeit.timeit('"-".join(map(str, range(100)))', number=10000) 0.23702679807320237 A callable can also be passed from the Python Interface: >>> timeit.timeit(lambda: "-".join(map(str, range(100))), number=10000) 0.19665591977536678 Note however that "timeit()" will automatically determine the number of repetitions only when the command-line interface is used. In the Examples section you can find more advanced examples. Python Interface ================ The module defines three convenience functions and a public class: timeit.timeit(stmt='pass', setup='pass', timer=, number=1000000, globals=None) Create a "Timer" instance with the given statement, *setup* code and *timer* function and run its "timeit()" method with *number* executions. The optional *globals* argument specifies a namespace in which to execute the code. Changed in version 3.5: The optional *globals* parameter was added. timeit.repeat(stmt='pass', setup='pass', timer=, repeat=5, number=1000000, globals=None) Create a "Timer" instance with the given statement, *setup* code and *timer* function and run its "repeat()" method with the given *repeat* count and *number* executions. The optional *globals* argument specifies a namespace in which to execute the code. Changed in version 3.5: The optional *globals* parameter was added. Changed in version 3.7: Default value of *repeat* changed from 3 to 5. timeit.default_timer() The default timer, which is always "time.perf_counter()". Changed in version 3.3: "time.perf_counter()" is now the default timer. class timeit.Timer(stmt='pass', setup='pass', timer=, globals=None) Class for timing execution speed of small code snippets. The constructor takes a statement to be timed, an additional statement used for setup, and a timer function. Both statements default to "'pass'"; the timer function is platform-dependent (see the module doc string). *stmt* and *setup* may also contain multiple statements separated by ";" or newlines, as long as they don’t contain multi-line string literals. The statement will by default be executed within timeit’s namespace; this behavior can be controlled by passing a namespace to *globals*. To measure the execution time of the first statement, use the "timeit()" method. The "repeat()" and "autorange()" methods are convenience methods to call "timeit()" multiple times. The execution time of *setup* is excluded from the overall timed execution run. The *stmt* and *setup* parameters can also take objects that are callable without arguments. This will embed calls to them in a timer function that will then be executed by "timeit()". Note that the timing overhead is a little larger in this case because of the extra function calls. Changed in version 3.5: The optional *globals* parameter was added. timeit(number=1000000) Time *number* executions of the main statement. This executes the setup statement once, and then returns the time it takes to execute the main statement a number of times, measured in seconds as a float. The argument is the number of times through the loop, defaulting to one million. The main statement, the setup statement and the timer function to be used are passed to the constructor. Note: By default, "timeit()" temporarily turns off *garbage collection* during the timing. The advantage of this approach is that it makes independent timings more comparable. The disadvantage is that GC may be an important component of the performance of the function being measured. If so, GC can be re-enabled as the first statement in the *setup* string. For example: timeit.Timer('for i in range(10): oct(i)', 'gc.enable()').timeit() autorange(callback=None) Automatically determine how many times to call "timeit()". This is a convenience function that calls "timeit()" repeatedly so that the total time >= 0.2 second, returning the eventual (number of loops, time taken for that number of loops). It calls "timeit()" with increasing numbers from the sequence 1, 2, 5, 10, 20, 50, … until the time taken is at least 0.2 second. If *callback* is given and is not "None", it will be called after each trial with two arguments: "callback(number, time_taken)". New in version 3.6. repeat(repeat=5, number=1000000) Call "timeit()" a few times. This is a convenience function that calls the "timeit()" repeatedly, returning a list of results. The first argument specifies how many times to call "timeit()". The second argument specifies the *number* argument for "timeit()". Note: It’s tempting to calculate mean and standard deviation from the result vector and report these. However, this is not very useful. In a typical case, the lowest value gives a lower bound for how fast your machine can run the given code snippet; higher values in the result vector are typically not caused by variability in Python’s speed, but by other processes interfering with your timing accuracy. So the "min()" of the result is probably the only number you should be interested in. After that, you should look at the entire vector and apply common sense rather than statistics. Changed in version 3.7: Default value of *repeat* changed from 3 to 5. print_exc(file=None) Helper to print a traceback from the timed code. Typical use: t = Timer(...) # outside the try/except try: t.timeit(...) # or t.repeat(...) except Exception: t.print_exc() The advantage over the standard traceback is that source lines in the compiled template will be displayed. The optional *file* argument directs where the traceback is sent; it defaults to "sys.stderr". Command-Line Interface ====================== When called as a program from the command line, the following form is used: python -m timeit [-n N] [-r N] [-u U] [-s S] [-h] [statement ...] Where the following options are understood: -n N, --number=N how many times to execute ‘statement’ -r N, --repeat=N how many times to repeat the timer (default 5) -s S, --setup=S statement to be executed once initially (default "pass") -p, --process measure process time, not wallclock time, using "time.process_time()" instead of "time.perf_counter()", which is the default New in version 3.3. -u, --unit=U specify a time unit for timer output; can select nsec, usec, msec, or sec New in version 3.5. -v, --verbose print raw timing results; repeat for more digits precision -h, --help print a short usage message and exit A multi-line statement may be given by specifying each line as a separate statement argument; indented lines are possible by enclosing an argument in quotes and using leading spaces. Multiple "-s" options are treated similarly. If "-n" is not given, a suitable number of loops is calculated by trying increasing numbers from the sequence 1, 2, 5, 10, 20, 50, … until the total time is at least 0.2 seconds. "default_timer()" measurements can be affected by other programs running on the same machine, so the best thing to do when accurate timing is necessary is to repeat the timing a few times and use the best time. The "-r" option is good for this; the default of 5 repetitions is probably enough in most cases. You can use "time.process_time()" to measure CPU time. Note: There is a certain baseline overhead associated with executing a pass statement. The code here doesn’t try to hide it, but you should be aware of it. The baseline overhead can be measured by invoking the program without arguments, and it might differ between Python versions. Examples ======== It is possible to provide a setup statement that is executed only once at the beginning: $ python -m timeit -s 'text = "sample string"; char = "g"' 'char in text' 5000000 loops, best of 5: 0.0877 usec per loop $ python -m timeit -s 'text = "sample string"; char = "g"' 'text.find(char)' 1000000 loops, best of 5: 0.342 usec per loop In the output, there are three fields. The loop count, which tells you how many times the statement body was run per timing loop repetition. The repetition count (‘best of 5’) which tells you how many times the timing loop was repeated, and finally the time the statement body took on average within the best repetition of the timing loop. That is, the time the fastest repetition took divided by the loop count. >>> import timeit >>> timeit.timeit('char in text', setup='text = "sample string"; char = "g"') 0.41440500499993504 >>> timeit.timeit('text.find(char)', setup='text = "sample string"; char = "g"') 1.7246671520006203 The same can be done using the "Timer" class and its methods: >>> import timeit >>> t = timeit.Timer('char in text', setup='text = "sample string"; char = "g"') >>> t.timeit() 0.3955516149999312 >>> t.repeat() [0.40183617287970225, 0.37027556854118704, 0.38344867356679524, 0.3712595970846668, 0.37866875250654886] The following examples show how to time expressions that contain multiple lines. Here we compare the cost of using "hasattr()" vs. "try"/"except" to test for missing and present object attributes: $ python -m timeit 'try:' ' str.__bool__' 'except AttributeError:' ' pass' 20000 loops, best of 5: 15.7 usec per loop $ python -m timeit 'if hasattr(str, "__bool__"): pass' 50000 loops, best of 5: 4.26 usec per loop $ python -m timeit 'try:' ' int.__bool__' 'except AttributeError:' ' pass' 200000 loops, best of 5: 1.43 usec per loop $ python -m timeit 'if hasattr(int, "__bool__"): pass' 100000 loops, best of 5: 2.23 usec per loop >>> import timeit >>> # attribute is missing >>> s = """\ ... try: ... str.__bool__ ... except AttributeError: ... pass ... """ >>> timeit.timeit(stmt=s, number=100000) 0.9138244460009446 >>> s = "if hasattr(str, '__bool__'): pass" >>> timeit.timeit(stmt=s, number=100000) 0.5829014980008651 >>> >>> # attribute is present >>> s = """\ ... try: ... int.__bool__ ... except AttributeError: ... pass ... """ >>> timeit.timeit(stmt=s, number=100000) 0.04215312199994514 >>> s = "if hasattr(int, '__bool__'): pass" >>> timeit.timeit(stmt=s, number=100000) 0.08588060699912603 To give the "timeit" module access to functions you define, you can pass a *setup* parameter which contains an import statement: def test(): """Stupid test function""" L = [i for i in range(100)] if __name__ == '__main__': import timeit print(timeit.timeit("test()", setup="from __main__ import test")) Another option is to pass "globals()" to the *globals* parameter, which will cause the code to be executed within your current global namespace. This can be more convenient than individually specifying imports: def f(x): return x**2 def g(x): return x**4 def h(x): return x**8 import timeit print(timeit.timeit('[func(42) for func in (f,g,h)]', globals=globals())) Graphical User Interfaces with Tk ********************************* Tk/Tcl has long been an integral part of Python. It provides a robust and platform independent windowing toolkit, that is available to Python programmers using the "tkinter" package, and its extension, the "tkinter.tix" and the "tkinter.ttk" modules. The "tkinter" package is a thin object-oriented layer on top of Tcl/Tk. To use "tkinter", you don’t need to write Tcl code, but you will need to consult the Tk documentation, and occasionally the Tcl documentation. "tkinter" is a set of wrappers that implement the Tk widgets as Python classes. "tkinter"’s chief virtues are that it is fast, and that it usually comes bundled with Python. Although its standard documentation is weak, good material is available, which includes: references, tutorials, a book and others. "tkinter" is also famous for having an outdated look and feel, which has been vastly improved in Tk 8.5. Nevertheless, there are many other GUI libraries that you could be interested in. The Python wiki lists several alternative GUI frameworks and tools. * "tkinter" — Python interface to Tcl/Tk * Tkinter Modules * Tkinter Life Preserver * How To Use This Section * A Simple Hello World Program * A (Very) Quick Look at Tcl/Tk * Mapping Basic Tk into Tkinter * How Tk and Tkinter are Related * Handy Reference * Setting Options * The Packer * Packer Options * Coupling Widget Variables * The Window Manager * Tk Option Data Types * Bindings and Events * The index Parameter * Images * File Handlers * "tkinter.colorchooser" — Color choosing dialog * "tkinter.font" — Tkinter font wrapper * Tkinter Dialogs * "tkinter.simpledialog" — Standard Tkinter input dialogs * "tkinter.filedialog" — File selection dialogs * Native Load/Save Dialogs * "tkinter.commondialog" — Dialog window templates * "tkinter.messagebox" — Tkinter message prompts * "tkinter.scrolledtext" — Scrolled Text Widget * "tkinter.dnd" — Drag and drop support * "tkinter.ttk" — Tk themed widgets * Using Ttk * Ttk Widgets * Widget * Standard Options * Scrollable Widget Options * Label Options * Compatibility Options * Widget States * ttk.Widget * Combobox * Options * Virtual events * ttk.Combobox * Spinbox * Options * Virtual events * ttk.Spinbox * Notebook * Options * Tab Options * Tab Identifiers * Virtual Events * ttk.Notebook * Progressbar * Options * ttk.Progressbar * Separator * Options * Sizegrip * Platform-specific notes * Bugs * Treeview * Options * Item Options * Tag Options * Column Identifiers * Virtual Events * ttk.Treeview * Ttk Styling * Layouts * "tkinter.tix" — Extension widgets for Tk * Using Tix * Tix Widgets * Basic Widgets * File Selectors * Hierarchical ListBox * Tabular ListBox * Manager Widgets * Image Types * Miscellaneous Widgets * Form Geometry Manager * Tix Commands * IDLE * Menus * File menu (Shell and Editor) * Edit menu (Shell and Editor) * Format menu (Editor window only) * Run menu (Editor window only) * Shell menu (Shell window only) * Debug menu (Shell window only) * Options menu (Shell and Editor) * Window menu (Shell and Editor) * Help menu (Shell and Editor) * Context Menus * Editing and navigation * Editor windows * Key bindings * Automatic indentation * Completions * Calltips * Code Context * Python Shell window * Text colors * Startup and code execution * Command line usage * Startup failure * Running user code * User output in Shell * Developing tkinter applications * Running without a subprocess * Help and preferences * Help sources * Setting preferences * IDLE on macOS * Extensions "tkinter.colorchooser" — Color choosing dialog ********************************************** **Source code:** Lib/tkinter/colorchooser.py ====================================================================== The "tkinter.colorchooser" module provides the "Chooser" class as an interface to the native color picker dialog. "Chooser" implements a modal color choosing dialog window. The "Chooser" class inherits from the "Dialog" class. class tkinter.colorchooser.Chooser(master=None, **options) tkinter.colorchooser.askcolor(color=None, **options) Create a color choosing dialog. A call to this method will show the window, wait for the user to make a selection, and return the selected color (or "None") to the caller. See also: Module "tkinter.commondialog" Tkinter standard dialog module "tkinter.dnd" — Drag and drop support ************************************* **Source code:** Lib/tkinter/dnd.py ====================================================================== Note: This is experimental and due to be deprecated when it is replaced with the Tk DND. The "tkinter.dnd" module provides drag-and-drop support for objects within a single application, within the same window or between windows. To enable an object to be dragged, you must create an event binding for it that starts the drag-and-drop process. Typically, you bind a ButtonPress event to a callback function that you write (see Bindings and Events). The function should call "dnd_start()", where ‘source’ is the object to be dragged, and ‘event’ is the event that invoked the call (the argument to your callback function). Selection of a target object occurs as follows: 1. Top-down search of area under mouse for target widget * Target widget should have a callable *dnd_accept* attribute * If *dnd_accept* is not present or returns None, search moves to parent widget * If no target widget is found, then the target object is None 2. Call to *.dnd_leave(source, event)* 3. Call to *.dnd_enter(source, event)* 4. Call to *.dnd_commit(source, event)* to notify of drop 5. Call to *.dnd_end(target, event)* to signal end of drag- and-drop class tkinter.dnd.DndHandler(source, event) The *DndHandler* class handles drag-and-drop events tracking Motion and ButtonRelease events on the root of the event widget. cancel(event=None) Cancel the drag-and-drop process. finish(event, commit=0) Execute end of drag-and-drop functions. on_motion(event) Inspect area below mouse for target objects while drag is performed. on_release(event) Signal end of drag when the release pattern is triggered. tkinter.dnd.dnd_start(source, event) Factory function for drag-and-drop process. See also: Bindings and Events "tkinter.font" — Tkinter font wrapper ************************************* **Source code:** Lib/tkinter/font.py ====================================================================== The "tkinter.font" module provides the "Font" class for creating and using named fonts. The different font weights and slants are: tkinter.font.NORMAL tkinter.font.BOLD tkinter.font.ITALIC tkinter.font.ROMAN class tkinter.font.Font(root=None, font=None, name=None, exists=False, **options) The "Font" class represents a named font. *Font* instances are given unique names and can be specified by their family, size, and style configuration. Named fonts are Tk’s method of creating and identifying fonts as a single object, rather than specifying a font by its attributes with each occurrence. arguments: *font* - font specifier tuple (family, size, options) *name* - unique font name *exists* - self points to existing named font if true additional keyword options (ignored if *font* is specified): *family* - font family i.e. Courier, Times *size* - font size If *size* is positive it is interpreted as size in points. If *size* is a negative number its absolute value is treated as size in pixels. *weight* - font emphasis (NORMAL, BOLD) *slant* - ROMAN, ITALIC *underline* - font underlining (0 - none, 1 - underline) *overstrike* - font strikeout (0 - none, 1 - strikeout) actual(option=None, displayof=None) Return the attributes of the font. cget(option) Retrieve an attribute of the font. config(**options) Modify attributes of the font. copy() Return new instance of the current font. measure(text, displayof=None) Return amount of space the text would occupy on the specified display when formatted in the current font. If no display is specified then the main application window is assumed. metrics(*options, **kw) Return font-specific data. Options include: *ascent* - distance between baseline and highest point that a character of the font can occupy *descent* - distance between baseline and lowest point that a character of the font can occupy *linespace* - minimum vertical separation necessary between any two characters of the font that ensures no vertical overlap between lines. *fixed* - 1 if font is fixed-width else 0 tkinter.font.families(root=None, displayof=None) Return the different font families. tkinter.font.names(root=None) Return the names of defined fonts. tkinter.font.nametofont(name) Return a "Font" representation of a tk named font. "tkinter.messagebox" — Tkinter message prompts ********************************************** **Source code:** Lib/tkinter/messagebox.py ====================================================================== The "tkinter.messagebox" module provides a template base class as well as a variety of convenience methods for commonly used configurations. The message boxes are modal and will return a subset of (True, False, OK, None, Yes, No) based on the user’s selection. Common message box styles and layouts include but are not limited to: [image] class tkinter.messagebox.Message(master=None, **options) Create a default information message box. **Information message box** tkinter.messagebox.showinfo(title=None, message=None, **options) **Warning message boxes** tkinter.messagebox.showwarning(title=None, message=None, **options) tkinter.messagebox.showerror(title=None, message=None, **options) **Question message boxes** tkinter.messagebox.askquestion(title=None, message=None, **options) tkinter.messagebox.askokcancel(title=None, message=None, **options) tkinter.messagebox.askretrycancel(title=None, message=None, **options) tkinter.messagebox.askyesno(title=None, message=None, **options) tkinter.messagebox.askyesnocancel(title=None, message=None, **options) "tkinter.scrolledtext" — Scrolled Text Widget ********************************************* **Source code:** Lib/tkinter/scrolledtext.py ====================================================================== The "tkinter.scrolledtext" module provides a class of the same name which implements a basic text widget which has a vertical scroll bar configured to do the “right thing.” Using the "ScrolledText" class is a lot easier than setting up a text widget and scroll bar directly. The text widget and scrollbar are packed together in a "Frame", and the methods of the "Grid" and "Pack" geometry managers are acquired from the "Frame" object. This allows the "ScrolledText" widget to be used directly to achieve most normal geometry management behavior. Should more specific control be necessary, the following attributes are available: class tkinter.scrolledtext.ScrolledText(master=None, **kw) frame The frame which surrounds the text and scroll bar widgets. vbar The scroll bar widget. "tkinter.tix" — Extension widgets for Tk **************************************** **Source code:** Lib/tkinter/tix.py Deprecated since version 3.6: This Tk extension is unmaintained and should not be used in new code. Use "tkinter.ttk" instead. ====================================================================== The "tkinter.tix" (Tk Interface Extension) module provides an additional rich set of widgets. Although the standard Tk library has many useful widgets, they are far from complete. The "tkinter.tix" library provides most of the commonly needed widgets that are missing from standard Tk: "HList", "ComboBox", "Control" (a.k.a. SpinBox) and an assortment of scrollable widgets. "tkinter.tix" also includes many more widgets that are generally useful in a wide range of applications: "NoteBook", "FileEntry", "PanedWindow", etc; there are more than 40 of them. With all these new widgets, you can introduce new interaction techniques into applications, creating more useful and more intuitive user interfaces. You can design your application by choosing the most appropriate widgets to match the special needs of your application and users. See also: Tix Homepage The home page for "Tix". This includes links to additional documentation and downloads. Tix Man Pages On-line version of the man pages and reference material. Tix Programming Guide On-line version of the programmer’s reference material. Tix Development Applications Tix applications for development of Tix and Tkinter programs. Tide applications work under Tk or Tkinter, and include **TixInspect**, an inspector to remotely modify and debug Tix/Tk/Tkinter applications. Using Tix ========= class tkinter.tix.Tk(screenName=None, baseName=None, className='Tix') Toplevel widget of Tix which represents mostly the main window of an application. It has an associated Tcl interpreter. Classes in the "tkinter.tix" module subclasses the classes in the "tkinter". The former imports the latter, so to use "tkinter.tix" with Tkinter, all you need to do is to import one module. In general, you can just import "tkinter.tix", and replace the toplevel call to "tkinter.Tk" with "tix.Tk": from tkinter import tix from tkinter.constants import * root = tix.Tk() To use "tkinter.tix", you must have the Tix widgets installed, usually alongside your installation of the Tk widgets. To test your installation, try the following: from tkinter import tix root = tix.Tk() root.tk.eval('package require Tix') Tix Widgets =========== Tix introduces over 40 widget classes to the "tkinter" repertoire. Basic Widgets ------------- class tkinter.tix.Balloon A Balloon that pops up over a widget to provide help. When the user moves the cursor inside a widget to which a Balloon widget has been bound, a small pop-up window with a descriptive message will be shown on the screen. class tkinter.tix.ButtonBox The ButtonBox widget creates a box of buttons, such as is commonly used for "Ok Cancel". class tkinter.tix.ComboBox The ComboBox widget is similar to the combo box control in MS Windows. The user can select a choice by either typing in the entry subwidget or selecting from the listbox subwidget. class tkinter.tix.Control The Control widget is also known as the "SpinBox" widget. The user can adjust the value by pressing the two arrow buttons or by entering the value directly into the entry. The new value will be checked against the user-defined upper and lower limits. class tkinter.tix.LabelEntry The LabelEntry widget packages an entry widget and a label into one mega widget. It can be used to simplify the creation of “entry- form” type of interface. class tkinter.tix.LabelFrame The LabelFrame widget packages a frame widget and a label into one mega widget. To create widgets inside a LabelFrame widget, one creates the new widgets relative to the "frame" subwidget and manage them inside the "frame" subwidget. class tkinter.tix.Meter The Meter widget can be used to show the progress of a background job which may take a long time to execute. class tkinter.tix.OptionMenu The OptionMenu creates a menu button of options. class tkinter.tix.PopupMenu The PopupMenu widget can be used as a replacement of the "tk_popup" command. The advantage of the "Tix" "PopupMenu" widget is it requires less application code to manipulate. class tkinter.tix.Select The Select widget is a container of button subwidgets. It can be used to provide radio-box or check-box style of selection options for the user. class tkinter.tix.StdButtonBox The StdButtonBox widget is a group of standard buttons for Motif- like dialog boxes. File Selectors -------------- class tkinter.tix.DirList The DirList widget displays a list view of a directory, its previous directories and its sub-directories. The user can choose one of the directories displayed in the list or change to another directory. class tkinter.tix.DirTree The DirTree widget displays a tree view of a directory, its previous directories and its sub-directories. The user can choose one of the directories displayed in the list or change to another directory. class tkinter.tix.DirSelectDialog The DirSelectDialog widget presents the directories in the file system in a dialog window. The user can use this dialog window to navigate through the file system to select the desired directory. class tkinter.tix.DirSelectBox The "DirSelectBox" is similar to the standard Motif(TM) directory- selection box. It is generally used for the user to choose a directory. DirSelectBox stores the directories mostly recently selected into a ComboBox widget so that they can be quickly selected again. class tkinter.tix.ExFileSelectBox The ExFileSelectBox widget is usually embedded in a tixExFileSelectDialog widget. It provides a convenient method for the user to select files. The style of the "ExFileSelectBox" widget is very similar to the standard file dialog on MS Windows 3.1. class tkinter.tix.FileSelectBox The FileSelectBox is similar to the standard Motif(TM) file- selection box. It is generally used for the user to choose a file. FileSelectBox stores the files mostly recently selected into a "ComboBox" widget so that they can be quickly selected again. class tkinter.tix.FileEntry The FileEntry widget can be used to input a filename. The user can type in the filename manually. Alternatively, the user can press the button widget that sits next to the entry, which will bring up a file selection dialog. Hierarchical ListBox -------------------- class tkinter.tix.HList The HList widget can be used to display any data that have a hierarchical structure, for example, file system directory trees. The list entries are indented and connected by branch lines according to their places in the hierarchy. class tkinter.tix.CheckList The CheckList widget displays a list of items to be selected by the user. CheckList acts similarly to the Tk checkbutton or radiobutton widgets, except it is capable of handling many more items than checkbuttons or radiobuttons. class tkinter.tix.Tree The Tree widget can be used to display hierarchical data in a tree form. The user can adjust the view of the tree by opening or closing parts of the tree. Tabular ListBox --------------- class tkinter.tix.TList The TList widget can be used to display data in a tabular format. The list entries of a "TList" widget are similar to the entries in the Tk listbox widget. The main differences are (1) the "TList" widget can display the list entries in a two dimensional format and (2) you can use graphical images as well as multiple colors and fonts for the list entries. Manager Widgets --------------- class tkinter.tix.PanedWindow The PanedWindow widget allows the user to interactively manipulate the sizes of several panes. The panes can be arranged either vertically or horizontally. The user changes the sizes of the panes by dragging the resize handle between two panes. class tkinter.tix.ListNoteBook The ListNoteBook widget is very similar to the "TixNoteBook" widget: it can be used to display many windows in a limited space using a notebook metaphor. The notebook is divided into a stack of pages (windows). At one time only one of these pages can be shown. The user can navigate through these pages by choosing the name of the desired page in the "hlist" subwidget. class tkinter.tix.NoteBook The NoteBook widget can be used to display many windows in a limited space using a notebook metaphor. The notebook is divided into a stack of pages. At one time only one of these pages can be shown. The user can navigate through these pages by choosing the visual “tabs” at the top of the NoteBook widget. Image Types ----------- The "tkinter.tix" module adds: * pixmap capabilities to all "tkinter.tix" and "tkinter" widgets to create color images from XPM files. * Compound image types can be used to create images that consists of multiple horizontal lines; each line is composed of a series of items (texts, bitmaps, images or spaces) arranged from left to right. For example, a compound image can be used to display a bitmap and a text string simultaneously in a Tk "Button" widget. Miscellaneous Widgets --------------------- class tkinter.tix.InputOnly The InputOnly widgets are to accept inputs from the user, which can be done with the "bind" command (Unix only). Form Geometry Manager --------------------- In addition, "tkinter.tix" augments "tkinter" by providing: class tkinter.tix.Form The Form geometry manager based on attachment rules for all Tk widgets. Tix Commands ============ class tkinter.tix.tixCommand The tix commands provide access to miscellaneous elements of "Tix"’s internal state and the "Tix" application context. Most of the information manipulated by these methods pertains to the application as a whole, or to a screen or display, rather than to a particular window. To view the current settings, the common usage is: from tkinter import tix root = tix.Tk() print(root.tix_configure()) tixCommand.tix_configure(cnf=None, **kw) Query or modify the configuration options of the Tix application context. If no option is specified, returns a dictionary all of the available options. If option is specified with no value, then the method returns a list describing the one named option (this list will be identical to the corresponding sublist of the value returned if no option is specified). If one or more option-value pairs are specified, then the method modifies the given option(s) to have the given value(s); in this case the method returns an empty string. Option may be any of the configuration options. tixCommand.tix_cget(option) Returns the current value of the configuration option given by *option*. Option may be any of the configuration options. tixCommand.tix_getbitmap(name) Locates a bitmap file of the name "name.xpm" or "name" in one of the bitmap directories (see the "tix_addbitmapdir()" method). By using "tix_getbitmap()", you can avoid hard coding the pathnames of the bitmap files in your application. When successful, it returns the complete pathname of the bitmap file, prefixed with the character "@". The returned value can be used to configure the "bitmap" option of the Tk and Tix widgets. tixCommand.tix_addbitmapdir(directory) Tix maintains a list of directories under which the "tix_getimage()" and "tix_getbitmap()" methods will search for image files. The standard bitmap directory is "$TIX_LIBRARY/bitmaps". The "tix_addbitmapdir()" method adds *directory* into this list. By using this method, the image files of an applications can also be located using the "tix_getimage()" or "tix_getbitmap()" method. tixCommand.tix_filedialog([dlgclass]) Returns the file selection dialog that may be shared among different calls from this application. This method will create a file selection dialog widget when it is called the first time. This dialog will be returned by all subsequent calls to "tix_filedialog()". An optional dlgclass parameter can be passed as a string to specified what type of file selection dialog widget is desired. Possible options are "tix", "FileSelectDialog" or "tixExFileSelectDialog". tixCommand.tix_getimage(self, name) Locates an image file of the name "name.xpm", "name.xbm" or "name.ppm" in one of the bitmap directories (see the "tix_addbitmapdir()" method above). If more than one file with the same name (but different extensions) exist, then the image type is chosen according to the depth of the X display: xbm images are chosen on monochrome displays and color images are chosen on color displays. By using "tix_getimage()", you can avoid hard coding the pathnames of the image files in your application. When successful, this method returns the name of the newly created image, which can be used to configure the "image" option of the Tk and Tix widgets. tixCommand.tix_option_get(name) Gets the options maintained by the Tix scheme mechanism. tixCommand.tix_resetoptions(newScheme, newFontSet[, newScmPrio]) Resets the scheme and fontset of the Tix application to *newScheme* and *newFontSet*, respectively. This affects only those widgets created after this call. Therefore, it is best to call the resetoptions method before the creation of any widgets in a Tix application. The optional parameter *newScmPrio* can be given to reset the priority level of the Tk options set by the Tix schemes. Because of the way Tk handles the X option database, after Tix has been has imported and inited, it is not possible to reset the color schemes and font sets using the "tix_config()" method. Instead, the "tix_resetoptions()" method must be used. "tkinter.ttk" — Tk themed widgets ********************************* **Source code:** Lib/tkinter/ttk.py ====================================================================== The "tkinter.ttk" module provides access to the Tk themed widget set, introduced in Tk 8.5. If Python has not been compiled against Tk 8.5, this module can still be accessed if *Tile* has been installed. The former method using Tk 8.5 provides additional benefits including anti-aliased font rendering under X11 and window transparency (requiring a composition window manager on X11). The basic idea for "tkinter.ttk" is to separate, to the extent possible, the code implementing a widget’s behavior from the code implementing its appearance. See also: Tk Widget Styling Support A document introducing theming support for Tk Using Ttk ========= To start using Ttk, import its module: from tkinter import ttk To override the basic Tk widgets, the import should follow the Tk import: from tkinter import * from tkinter.ttk import * That code causes several "tkinter.ttk" widgets ("Button", "Checkbutton", "Entry", "Frame", "Label", "LabelFrame", "Menubutton", "PanedWindow", "Radiobutton", "Scale" and "Scrollbar") to automatically replace the Tk widgets. This has the direct benefit of using the new widgets which gives a better look and feel across platforms; however, the replacement widgets are not completely compatible. The main difference is that widget options such as “fg”, “bg” and others related to widget styling are no longer present in Ttk widgets. Instead, use the "ttk.Style" class for improved styling effects. See also: Converting existing applications to use Tile widgets A monograph (using Tcl terminology) about differences typically encountered when moving applications to use the new widgets. Ttk Widgets =========== Ttk comes with 18 widgets, twelve of which already existed in tkinter: "Button", "Checkbutton", "Entry", "Frame", "Label", "LabelFrame", "Menubutton", "PanedWindow", "Radiobutton", "Scale", "Scrollbar", and "Spinbox". The other six are new: "Combobox", "Notebook", "Progressbar", "Separator", "Sizegrip" and "Treeview". And all them are subclasses of "Widget". Using the Ttk widgets gives the application an improved look and feel. As discussed above, there are differences in how the styling is coded. Tk code: l1 = tkinter.Label(text="Test", fg="black", bg="white") l2 = tkinter.Label(text="Test", fg="black", bg="white") Ttk code: style = ttk.Style() style.configure("BW.TLabel", foreground="black", background="white") l1 = ttk.Label(text="Test", style="BW.TLabel") l2 = ttk.Label(text="Test", style="BW.TLabel") For more information about TtkStyling, see the "Style" class documentation. Widget ====== "ttk.Widget" defines standard options and methods supported by Tk themed widgets and is not supposed to be directly instantiated. Standard Options ---------------- All the "ttk" Widgets accepts the following options: +-------------+----------------------------------------------------------------+ | Option | Description | |=============|================================================================| | class | Specifies the window class. The class is used when querying | | | the option database for the window’s other options, to | | | determine the default bindtags for the window, and to select | | | the widget’s default layout and style. This option is read- | | | only, and may only be specified when the window is created. | +-------------+----------------------------------------------------------------+ | cursor | Specifies the mouse cursor to be used for the widget. If set | | | to the empty string (the default), the cursor is inherited for | | | the parent widget. | +-------------+----------------------------------------------------------------+ | takefocus | Determines whether the window accepts the focus during | | | keyboard traversal. 0, 1 or an empty string is returned. If 0 | | | is returned, it means that the window should be skipped | | | entirely during keyboard traversal. If 1, it means that the | | | window should receive the input focus as long as it is | | | viewable. And an empty string means that the traversal scripts | | | make the decision about whether or not to focus on the window. | +-------------+----------------------------------------------------------------+ | style | May be used to specify a custom widget style. | +-------------+----------------------------------------------------------------+ Scrollable Widget Options ------------------------- The following options are supported by widgets that are controlled by a scrollbar. +------------------+-----------------------------------------------------------+ | Option | Description | |==================|===========================================================| | xscrollcommand | Used to communicate with horizontal scrollbars. When the | | | view in the widget’s window change, the widget will | | | generate a Tcl command based on the scrollcommand. | | | Usually this option consists of the method | | | "Scrollbar.set()" of some scrollbar. This will cause the | | | scrollbar to be updated whenever the view in the window | | | changes. | +------------------+-----------------------------------------------------------+ | yscrollcommand | Used to communicate with vertical scrollbars. For some | | | more information, see above. | +------------------+-----------------------------------------------------------+ Label Options ------------- The following options are supported by labels, buttons and other button-like widgets. +----------------+-------------------------------------------------------------+ | Option | Description | |================|=============================================================| | text | Specifies a text string to be displayed inside the widget. | +----------------+-------------------------------------------------------------+ | textvariable | Specifies a name whose value will be used in place of the | | | text option resource. | +----------------+-------------------------------------------------------------+ | underline | If set, specifies the index (0-based) of a character to | | | underline in the text string. The underline character is | | | used for mnemonic activation. | +----------------+-------------------------------------------------------------+ | image | Specifies an image to display. This is a list of 1 or more | | | elements. The first element is the default image name. The | | | rest of the list if a sequence of statespec/value pairs as | | | defined by "Style.map()", specifying different images to | | | use when the widget is in a particular state or a | | | combination of states. All images in the list should have | | | the same size. | +----------------+-------------------------------------------------------------+ | compound | Specifies how to display the image relative to the text, in | | | the case both text and images options are present. Valid | | | values are: * text: display text only * image: display | | | image only * top, bottom, left, right: display image | | | above, below, left of, or right of the text, respectively. | | | * none: the default. display the image if present, | | | otherwise the text. | +----------------+-------------------------------------------------------------+ | width | If greater than zero, specifies how much space, in | | | character widths, to allocate for the text label, if less | | | than zero, specifies a minimum width. If zero or | | | unspecified, the natural width of the text label is used. | +----------------+-------------------------------------------------------------+ Compatibility Options --------------------- +----------+------------------------------------------------------------------+ | Option | Description | |==========|==================================================================| | state | May be set to “normal” or “disabled” to control the “disabled” | | | state bit. This is a write-only option: setting it changes the | | | widget state, but the "Widget.state()" method does not affect | | | this option. | +----------+------------------------------------------------------------------+ Widget States ------------- The widget state is a bitmap of independent state flags. +--------------+---------------------------------------------------------------+ | Flag | Description | |==============|===============================================================| | active | The mouse cursor is over the widget and pressing a mouse | | | button will cause some action to occur | +--------------+---------------------------------------------------------------+ | disabled | Widget is disabled under program control | +--------------+---------------------------------------------------------------+ | focus | Widget has keyboard focus | +--------------+---------------------------------------------------------------+ | pressed | Widget is being pressed | +--------------+---------------------------------------------------------------+ | selected | “On”, “true”, or “current” for things like Checkbuttons and | | | radiobuttons | +--------------+---------------------------------------------------------------+ | background | Windows and Mac have a notion of an “active” or foreground | | | window. The *background* state is set for widgets in a | | | background window, and cleared for those in the foreground | | | window | +--------------+---------------------------------------------------------------+ | readonly | Widget should not allow user modification | +--------------+---------------------------------------------------------------+ | alternate | A widget-specific alternate display format | +--------------+---------------------------------------------------------------+ | invalid | The widget’s value is invalid | +--------------+---------------------------------------------------------------+ A state specification is a sequence of state names, optionally prefixed with an exclamation point indicating that the bit is off. ttk.Widget ---------- Besides the methods described below, the "ttk.Widget" supports the methods "tkinter.Widget.cget()" and "tkinter.Widget.configure()". class tkinter.ttk.Widget identify(x, y) Returns the name of the element at position *x* *y*, or the empty string if the point does not lie within any element. *x* and *y* are pixel coordinates relative to the widget. instate(statespec, callback=None, *args, **kw) Test the widget’s state. If a callback is not specified, returns "True" if the widget state matches *statespec* and "False" otherwise. If callback is specified then it is called with args if widget state matches *statespec*. state(statespec=None) Modify or inquire widget state. If *statespec* is specified, sets the widget state according to it and return a new *statespec* indicating which flags were changed. If *statespec* is not specified, returns the currently-enabled state flags. *statespec* will usually be a list or a tuple. Combobox ======== The "ttk.Combobox" widget combines a text field with a pop-down list of values. This widget is a subclass of "Entry". Besides the methods inherited from "Widget": "Widget.cget()", "Widget.configure()", "Widget.identify()", "Widget.instate()" and "Widget.state()", and the following inherited from "Entry": "Entry.bbox()", "Entry.delete()", "Entry.icursor()", "Entry.index()", "Entry.insert()", "Entry.selection()", "Entry.xview()", it has some other methods, described at "ttk.Combobox". Options ------- This widget accepts the following specific options: +-------------------+----------------------------------------------------------+ | Option | Description | |===================|==========================================================| | exportselection | Boolean value. If set, the widget selection is linked to | | | the Window Manager selection (which can be returned by | | | invoking Misc.selection_get, for example). | +-------------------+----------------------------------------------------------+ | justify | Specifies how the text is aligned within the widget. One | | | of “left”, “center”, or “right”. | +-------------------+----------------------------------------------------------+ | height | Specifies the height of the pop-down listbox, in rows. | +-------------------+----------------------------------------------------------+ | postcommand | A script (possibly registered with Misc.register) that | | | is called immediately before displaying the values. It | | | may specify which values to display. | +-------------------+----------------------------------------------------------+ | state | One of “normal”, “readonly”, or “disabled”. In the | | | “readonly” state, the value may not be edited directly, | | | and the user can only selection of the values from the | | | dropdown list. In the “normal” state, the text field is | | | directly editable. In the “disabled” state, no | | | interaction is possible. | +-------------------+----------------------------------------------------------+ | textvariable | Specifies a name whose value is linked to the widget | | | value. Whenever the value associated with that name | | | changes, the widget value is updated, and vice versa. | | | See "tkinter.StringVar". | +-------------------+----------------------------------------------------------+ | values | Specifies the list of values to display in the drop-down | | | listbox. | +-------------------+----------------------------------------------------------+ | width | Specifies an integer value indicating the desired width | | | of the entry window, in average-size characters of the | | | widget’s font. | +-------------------+----------------------------------------------------------+ Virtual events -------------- The combobox widgets generates a **<>** virtual event when the user selects an element from the list of values. ttk.Combobox ------------ class tkinter.ttk.Combobox current(newindex=None) If *newindex* is specified, sets the combobox value to the element position *newindex*. Otherwise, returns the index of the current value or -1 if the current value is not in the values list. get() Returns the current value of the combobox. set(value) Sets the value of the combobox to *value*. Spinbox ======= The "ttk.Spinbox" widget is a "ttk.Entry" enhanced with increment and decrement arrows. It can be used for numbers or lists of string values. This widget is a subclass of "Entry". Besides the methods inherited from "Widget": "Widget.cget()", "Widget.configure()", "Widget.identify()", "Widget.instate()" and "Widget.state()", and the following inherited from "Entry": "Entry.bbox()", "Entry.delete()", "Entry.icursor()", "Entry.index()", "Entry.insert()", "Entry.xview()", it has some other methods, described at "ttk.Spinbox". Options ------- This widget accepts the following specific options: +------------------------+--------------------------------------------------------+ | Option | Description | |========================|========================================================| | from | Float value. If set, this is the minimum value to | | | which the decrement button will decrement. Must be | | | spelled as "from_" when used as an argument, since | | | "from" is a Python keyword. | +------------------------+--------------------------------------------------------+ | to | Float value. If set, this is the maximum value to | | | which the increment button will increment. | +------------------------+--------------------------------------------------------+ | increment | Float value. Specifies the amount which the | | | increment/decrement buttons change the value. Defaults | | | to 1.0. | +------------------------+--------------------------------------------------------+ | values | Sequence of string or float values. If specified, the | | | increment/decrement buttons will cycle through the | | | items in this sequence rather than incrementing or | | | decrementing numbers. | +------------------------+--------------------------------------------------------+ | wrap | Boolean value. If "True", increment and decrement | | | buttons will cycle from the "to" value to the "from" | | | value or the "from" value to the "to" value, | | | respectively. | +------------------------+--------------------------------------------------------+ | format | String value. This specifies the format of numbers | | | set by the increment/decrement buttons. It must be in | | | the form “%W.Pf”, where W is the padded width of the | | | value, P is the precision, and ‘%’ and ‘f’ are | | | literal. | +------------------------+--------------------------------------------------------+ | command | Python callable. Will be called with no arguments | | | whenever either of the increment or decrement buttons | | | are pressed. | +------------------------+--------------------------------------------------------+ Virtual events -------------- The spinbox widget generates an **<>** virtual event when the user presses , and a **<>** virtual event when the user presses . ttk.Spinbox ----------- class tkinter.ttk.Spinbox get() Returns the current value of the spinbox. set(value) Sets the value of the spinbox to *value*. Notebook ======== Ttk Notebook widget manages a collection of windows and displays a single one at a time. Each child window is associated with a tab, which the user may select to change the currently-displayed window. Options ------- This widget accepts the following specific options: +-----------+------------------------------------------------------------------+ | Option | Description | |===========|==================================================================| | height | If present and greater than zero, specifies the desired height | | | of the pane area (not including internal padding or tabs). | | | Otherwise, the maximum height of all panes is used. | +-----------+------------------------------------------------------------------+ | padding | Specifies the amount of extra space to add around the outside of | | | the notebook. The padding is a list up to four length | | | specifications left top right bottom. If fewer than four | | | elements are specified, bottom defaults to top, right defaults | | | to left, and top defaults to left. | +-----------+------------------------------------------------------------------+ | width | If present and greater than zero, specified the desired width of | | | the pane area (not including internal padding). Otherwise, the | | | maximum width of all panes is used. | +-----------+------------------------------------------------------------------+ Tab Options ----------- There are also specific options for tabs: +-------------+----------------------------------------------------------------+ | Option | Description | |=============|================================================================| | state | Either “normal”, “disabled” or “hidden”. If “disabled”, then | | | the tab is not selectable. If “hidden”, then the tab is not | | | shown. | +-------------+----------------------------------------------------------------+ | sticky | Specifies how the child window is positioned within the pane | | | area. Value is a string containing zero or more of the | | | characters “n”, “s”, “e” or “w”. Each letter refers to a side | | | (north, south, east or west) that the child window will stick | | | to, as per the "grid()" geometry manager. | +-------------+----------------------------------------------------------------+ | padding | Specifies the amount of extra space to add between the | | | notebook and this pane. Syntax is the same as for the option | | | padding used by this widget. | +-------------+----------------------------------------------------------------+ | text | Specifies a text to be displayed in the tab. | +-------------+----------------------------------------------------------------+ | image | Specifies an image to display in the tab. See the option image | | | described in "Widget". | +-------------+----------------------------------------------------------------+ | compound | Specifies how to display the image relative to the text, in | | | the case both options text and image are present. See Label | | | Options for legal values. | +-------------+----------------------------------------------------------------+ | underline | Specifies the index (0-based) of a character to underline in | | | the text string. The underlined character is used for mnemonic | | | activation if "Notebook.enable_traversal()" is called. | +-------------+----------------------------------------------------------------+ Tab Identifiers --------------- The tab_id present in several methods of "ttk.Notebook" may take any of the following forms: * An integer between zero and the number of tabs * The name of a child window * A positional specification of the form “@x,y”, which identifies the tab * The literal string “current”, which identifies the currently- selected tab * The literal string “end”, which returns the number of tabs (only valid for "Notebook.index()") Virtual Events -------------- This widget generates a **<>** virtual event after a new tab is selected. ttk.Notebook ------------ class tkinter.ttk.Notebook add(child, **kw) Adds a new tab to the notebook. If window is currently managed by the notebook but hidden, it is restored to its previous position. See Tab Options for the list of available options. forget(tab_id) Removes the tab specified by *tab_id*, unmaps and unmanages the associated window. hide(tab_id) Hides the tab specified by *tab_id*. The tab will not be displayed, but the associated window remains managed by the notebook and its configuration remembered. Hidden tabs may be restored with the "add()" command. identify(x, y) Returns the name of the tab element at position *x*, *y*, or the empty string if none. index(tab_id) Returns the numeric index of the tab specified by *tab_id*, or the total number of tabs if *tab_id* is the string “end”. insert(pos, child, **kw) Inserts a pane at the specified position. *pos* is either the string “end”, an integer index, or the name of a managed child. If *child* is already managed by the notebook, moves it to the specified position. See Tab Options for the list of available options. select(tab_id=None) Selects the specified *tab_id*. The associated child window will be displayed, and the previously-selected window (if different) is unmapped. If *tab_id* is omitted, returns the widget name of the currently selected pane. tab(tab_id, option=None, **kw) Query or modify the options of the specific *tab_id*. If *kw* is not given, returns a dictionary of the tab option values. If *option* is specified, returns the value of that *option*. Otherwise, sets the options to the corresponding values. tabs() Returns a list of windows managed by the notebook. enable_traversal() Enable keyboard traversal for a toplevel window containing this notebook. This will extend the bindings for the toplevel window containing the notebook as follows: * "Control-Tab": selects the tab following the currently selected one. * "Shift-Control-Tab": selects the tab preceding the currently selected one. * "Alt-K": where *K* is the mnemonic (underlined) character of any tab, will select that tab. Multiple notebooks in a single toplevel may be enabled for traversal, including nested notebooks. However, notebook traversal only works properly if all panes have the notebook they are in as master. Progressbar =========== The "ttk.Progressbar" widget shows the status of a long-running operation. It can operate in two modes: 1) the determinate mode which shows the amount completed relative to the total amount of work to be done and 2) the indeterminate mode which provides an animated display to let the user know that work is progressing. Options ------- This widget accepts the following specific options: +------------+-----------------------------------------------------------------+ | Option | Description | |============|=================================================================| | orient | One of “horizontal” or “vertical”. Specifies the orientation of | | | the progress bar. | +------------+-----------------------------------------------------------------+ | length | Specifies the length of the long axis of the progress bar | | | (width if horizontal, height if vertical). | +------------+-----------------------------------------------------------------+ | mode | One of “determinate” or “indeterminate”. | +------------+-----------------------------------------------------------------+ | maximum | A number specifying the maximum value. Defaults to 100. | +------------+-----------------------------------------------------------------+ | value | The current value of the progress bar. In “determinate” mode, | | | this represents the amount of work completed. In | | | “indeterminate” mode, it is interpreted as modulo *maximum*; | | | that is, the progress bar completes one “cycle” when its value | | | increases by *maximum*. | +------------+-----------------------------------------------------------------+ | variable | A name which is linked to the option value. If specified, the | | | value of the progress bar is automatically set to the value of | | | this name whenever the latter is modified. | +------------+-----------------------------------------------------------------+ | phase | Read-only option. The widget periodically increments the value | | | of this option whenever its value is greater than 0 and, in | | | determinate mode, less than maximum. This option may be used by | | | the current theme to provide additional animation effects. | +------------+-----------------------------------------------------------------+ ttk.Progressbar --------------- class tkinter.ttk.Progressbar start(interval=None) Begin autoincrement mode: schedules a recurring timer event that calls "Progressbar.step()" every *interval* milliseconds. If omitted, *interval* defaults to 50 milliseconds. step(amount=None) Increments the progress bar’s value by *amount*. *amount* defaults to 1.0 if omitted. stop() Stop autoincrement mode: cancels any recurring timer event initiated by "Progressbar.start()" for this progress bar. Separator ========= The "ttk.Separator" widget displays a horizontal or vertical separator bar. It has no other methods besides the ones inherited from "ttk.Widget". Options ------- This widget accepts the following specific option: +----------+------------------------------------------------------------------+ | Option | Description | |==========|==================================================================| | orient | One of “horizontal” or “vertical”. Specifies the orientation of | | | the separator. | +----------+------------------------------------------------------------------+ Sizegrip ======== The "ttk.Sizegrip" widget (also known as a grow box) allows the user to resize the containing toplevel window by pressing and dragging the grip. This widget has neither specific options nor specific methods, besides the ones inherited from "ttk.Widget". Platform-specific notes ----------------------- * On macOS, toplevel windows automatically include a built-in size grip by default. Adding a "Sizegrip" is harmless, since the built-in grip will just mask the widget. Bugs ---- * If the containing toplevel’s position was specified relative to the right or bottom of the screen (e.g. ….), the "Sizegrip" widget will not resize the window. * This widget supports only “southeast” resizing. Treeview ======== The "ttk.Treeview" widget displays a hierarchical collection of items. Each item has a textual label, an optional image, and an optional list of data values. The data values are displayed in successive columns after the tree label. The order in which data values are displayed may be controlled by setting the widget option "displaycolumns". The tree widget can also display column headings. Columns may be accessed by number or symbolic names listed in the widget option columns. See Column Identifiers. Each item is identified by a unique name. The widget will generate item IDs if they are not supplied by the caller. There is a distinguished root item, named "{}". The root item itself is not displayed; its children appear at the top level of the hierarchy. Each item also has a list of tags, which can be used to associate event bindings with individual items and control the appearance of the item. The Treeview widget supports horizontal and vertical scrolling, according to the options described in Scrollable Widget Options and the methods "Treeview.xview()" and "Treeview.yview()". Options ------- This widget accepts the following specific options: +------------------+----------------------------------------------------------+ | Option | Description | |==================|==========================================================| | columns | A list of column identifiers, specifying the number of | | | columns and their names. | +------------------+----------------------------------------------------------+ | displaycolumns | A list of column identifiers (either symbolic or integer | | | indices) specifying which data columns are displayed and | | | the order in which they appear, or the string “#all”. | +------------------+----------------------------------------------------------+ | height | Specifies the number of rows which should be visible. | | | Note: the requested width is determined from the sum of | | | the column widths. | +------------------+----------------------------------------------------------+ | padding | Specifies the internal padding for the widget. The | | | padding is a list of up to four length specifications. | +------------------+----------------------------------------------------------+ | selectmode | Controls how the built-in class bindings manage the | | | selection. One of “extended”, “browse” or “none”. If set | | | to “extended” (the default), multiple items may be | | | selected. If “browse”, only a single item will be | | | selected at a time. If “none”, the selection will not be | | | changed. Note that the application code and tag | | | bindings can set the selection however they wish, | | | regardless of the value of this option. | +------------------+----------------------------------------------------------+ | show | A list containing zero or more of the following values, | | | specifying which elements of the tree to display. * | | | tree: display tree labels in column #0. * headings: | | | display the heading row. The default is “tree | | | headings”, i.e., show all elements. **Note**: Column #0 | | | always refers to the tree column, even if show=”tree” is | | | not specified. | +------------------+----------------------------------------------------------+ Item Options ------------ The following item options may be specified for items in the insert and item widget commands. +----------+-----------------------------------------------------------------+ | Option | Description | |==========|=================================================================| | text | The textual label to display for the item. | +----------+-----------------------------------------------------------------+ | image | A Tk Image, displayed to the left of the label. | +----------+-----------------------------------------------------------------+ | values | The list of values associated with the item. Each item should | | | have the same number of values as the widget option columns. If | | | there are fewer values than columns, the remaining values are | | | assumed empty. If there are more values than columns, the extra | | | values are ignored. | +----------+-----------------------------------------------------------------+ | open | "True"/"False" value indicating whether the item’s children | | | should be displayed or hidden. | +----------+-----------------------------------------------------------------+ | tags | A list of tags associated with this item. | +----------+-----------------------------------------------------------------+ Tag Options ----------- The following options may be specified on tags: +--------------+-------------------------------------------------------------+ | Option | Description | |==============|=============================================================| | foreground | Specifies the text foreground color. | +--------------+-------------------------------------------------------------+ | background | Specifies the cell or item background color. | +--------------+-------------------------------------------------------------+ | font | Specifies the font to use when drawing text. | +--------------+-------------------------------------------------------------+ | image | Specifies the item image, in case the item’s image option | | | is empty. | +--------------+-------------------------------------------------------------+ Column Identifiers ------------------ Column identifiers take any of the following forms: * A symbolic name from the list of columns option. * An integer n, specifying the nth data column. * A string of the form #n, where n is an integer, specifying the nth display column. Notes: * Item’s option values may be displayed in a different order than the order in which they are stored. * Column #0 always refers to the tree column, even if show=”tree” is not specified. A data column number is an index into an item’s option values list; a display column number is the column number in the tree where the values are displayed. Tree labels are displayed in column #0. If option displaycolumns is not set, then data column n is displayed in column #n+1. Again, **column #0 always refers to the tree column**. Virtual Events -------------- The Treeview widget generates the following virtual events. +----------------------+----------------------------------------------------+ | Event | Description | |======================|====================================================| | <> | Generated whenever the selection changes. | +----------------------+----------------------------------------------------+ | <> | Generated just before settings the focus item to | | | open=True. | +----------------------+----------------------------------------------------+ | <> | Generated just after setting the focus item to | | | open=False. | +----------------------+----------------------------------------------------+ The "Treeview.focus()" and "Treeview.selection()" methods can be used to determine the affected item or items. ttk.Treeview ------------ class tkinter.ttk.Treeview bbox(item, column=None) Returns the bounding box (relative to the treeview widget’s window) of the specified *item* in the form (x, y, width, height). If *column* is specified, returns the bounding box of that cell. If the *item* is not visible (i.e., if it is a descendant of a closed item or is scrolled offscreen), returns an empty string. get_children(item=None) Returns the list of children belonging to *item*. If *item* is not specified, returns root children. set_children(item, *newchildren) Replaces *item*’s child with *newchildren*. Children present in *item* that are not present in *newchildren* are detached from the tree. No items in *newchildren* may be an ancestor of *item*. Note that not specifying *newchildren* results in detaching *item*’s children. column(column, option=None, **kw) Query or modify the options for the specified *column*. If *kw* is not given, returns a dict of the column option values. If *option* is specified then the value for that *option* is returned. Otherwise, sets the options to the corresponding values. The valid options/values are: * id Returns the column name. This is a read-only option. * anchor: One of the standard Tk anchor values. Specifies how the text in this column should be aligned with respect to the cell. * minwidth: width The minimum width of the column in pixels. The treeview widget will not make the column any smaller than specified by this option when the widget is resized or the user drags a column. * stretch: "True"/"False" Specifies whether the column’s width should be adjusted when the widget is resized. * width: width The width of the column in pixels. To configure the tree column, call this with column = “#0” delete(*items) Delete all specified *items* and all their descendants. The root item may not be deleted. detach(*items) Unlinks all of the specified *items* from the tree. The items and all of their descendants are still present, and may be reinserted at another point in the tree, but will not be displayed. The root item may not be detached. exists(item) Returns "True" if the specified *item* is present in the tree. focus(item=None) If *item* is specified, sets the focus item to *item*. Otherwise, returns the current focus item, or ‘’ if there is none. heading(column, option=None, **kw) Query or modify the heading options for the specified *column*. If *kw* is not given, returns a dict of the heading option values. If *option* is specified then the value for that *option* is returned. Otherwise, sets the options to the corresponding values. The valid options/values are: * text: text The text to display in the column heading. * image: imageName Specifies an image to display to the right of the column heading. * anchor: anchor Specifies how the heading text should be aligned. One of the standard Tk anchor values. * command: callback A callback to be invoked when the heading label is pressed. To configure the tree column heading, call this with column = “#0”. identify(component, x, y) Returns a description of the specified *component* under the point given by *x* and *y*, or the empty string if no such *component* is present at that position. identify_row(y) Returns the item ID of the item at position *y*. identify_column(x) Returns the data column identifier of the cell at position *x*. The tree column has ID #0. identify_region(x, y) Returns one of: +-------------+----------------------------------------+ | region | meaning | |=============|========================================| | heading | Tree heading area. | +-------------+----------------------------------------+ | separator | Space between two columns headings. | +-------------+----------------------------------------+ | tree | The tree area. | +-------------+----------------------------------------+ | cell | A data cell. | +-------------+----------------------------------------+ Availability: Tk 8.6. identify_element(x, y) Returns the element at position *x*, *y*. Availability: Tk 8.6. index(item) Returns the integer index of *item* within its parent’s list of children. insert(parent, index, iid=None, **kw) Creates a new item and returns the item identifier of the newly created item. *parent* is the item ID of the parent item, or the empty string to create a new top-level item. *index* is an integer, or the value “end”, specifying where in the list of parent’s children to insert the new item. If *index* is less than or equal to zero, the new node is inserted at the beginning; if *index* is greater than or equal to the current number of children, it is inserted at the end. If *iid* is specified, it is used as the item identifier; *iid* must not already exist in the tree. Otherwise, a new unique identifier is generated. See Item Options for the list of available points. item(item, option=None, **kw) Query or modify the options for the specified *item*. If no options are given, a dict with options/values for the item is returned. If *option* is specified then the value for that option is returned. Otherwise, sets the options to the corresponding values as given by *kw*. move(item, parent, index) Moves *item* to position *index* in *parent*’s list of children. It is illegal to move an item under one of its descendants. If *index* is less than or equal to zero, *item* is moved to the beginning; if greater than or equal to the number of children, it is moved to the end. If *item* was detached it is reattached. next(item) Returns the identifier of *item*’s next sibling, or ‘’ if *item* is the last child of its parent. parent(item) Returns the ID of the parent of *item*, or ‘’ if *item* is at the top level of the hierarchy. prev(item) Returns the identifier of *item*’s previous sibling, or ‘’ if *item* is the first child of its parent. reattach(item, parent, index) An alias for "Treeview.move()". see(item) Ensure that *item* is visible. Sets all of *item*’s ancestors open option to "True", and scrolls the widget if necessary so that *item* is within the visible portion of the tree. selection() Returns a tuple of selected items. Changed in version 3.8: "selection()" no longer takes arguments. For changing the selection state use the following selection methods. selection_set(*items) *items* becomes the new selection. Changed in version 3.6: *items* can be passed as separate arguments, not just as a single tuple. selection_add(*items) Add *items* to the selection. Changed in version 3.6: *items* can be passed as separate arguments, not just as a single tuple. selection_remove(*items) Remove *items* from the selection. Changed in version 3.6: *items* can be passed as separate arguments, not just as a single tuple. selection_toggle(*items) Toggle the selection state of each item in *items*. Changed in version 3.6: *items* can be passed as separate arguments, not just as a single tuple. set(item, column=None, value=None) With one argument, returns a dictionary of column/value pairs for the specified *item*. With two arguments, returns the current value of the specified *column*. With three arguments, sets the value of given *column* in given *item* to the specified *value*. tag_bind(tagname, sequence=None, callback=None) Bind a callback for the given event *sequence* to the tag *tagname*. When an event is delivered to an item, the callbacks for each of the item’s tags option are called. tag_configure(tagname, option=None, **kw) Query or modify the options for the specified *tagname*. If *kw* is not given, returns a dict of the option settings for *tagname*. If *option* is specified, returns the value for that *option* for the specified *tagname*. Otherwise, sets the options to the corresponding values for the given *tagname*. tag_has(tagname, item=None) If *item* is specified, returns 1 or 0 depending on whether the specified *item* has the given *tagname*. Otherwise, returns a list of all items that have the specified tag. Availability: Tk 8.6 xview(*args) Query or modify horizontal position of the treeview. yview(*args) Query or modify vertical position of the treeview. Ttk Styling =========== Each widget in "ttk" is assigned a style, which specifies the set of elements making up the widget and how they are arranged, along with dynamic and default settings for element options. By default the style name is the same as the widget’s class name, but it may be overridden by the widget’s style option. If you don’t know the class name of a widget, use the method "Misc.winfo_class()" (somewidget.winfo_class()). See also: Tcl’2004 conference presentation This document explains how the theme engine works class tkinter.ttk.Style This class is used to manipulate the style database. configure(style, query_opt=None, **kw) Query or set the default value of the specified option(s) in *style*. Each key in *kw* is an option and each value is a string identifying the value for that option. For example, to change every default button to be a flat button with some padding and a different background color: from tkinter import ttk import tkinter root = tkinter.Tk() ttk.Style().configure("TButton", padding=6, relief="flat", background="#ccc") btn = ttk.Button(text="Sample") btn.pack() root.mainloop() map(style, query_opt=None, **kw) Query or sets dynamic values of the specified option(s) in *style*. Each key in *kw* is an option and each value should be a list or a tuple (usually) containing statespecs grouped in tuples, lists, or some other preference. A statespec is a compound of one or more states and then a value. An example may make it more understandable: import tkinter from tkinter import ttk root = tkinter.Tk() style = ttk.Style() style.map("C.TButton", foreground=[('pressed', 'red'), ('active', 'blue')], background=[('pressed', '!disabled', 'black'), ('active', 'white')] ) colored_btn = ttk.Button(text="Test", style="C.TButton").pack() root.mainloop() Note that the order of the (states, value) sequences for an option does matter, if the order is changed to "[('active', 'blue'), ('pressed', 'red')]" in the foreground option, for example, the result would be a blue foreground when the widget were in active or pressed states. lookup(style, option, state=None, default=None) Returns the value specified for *option* in *style*. If *state* is specified, it is expected to be a sequence of one or more states. If the *default* argument is set, it is used as a fallback value in case no specification for option is found. To check what font a Button uses by default: from tkinter import ttk print(ttk.Style().lookup("TButton", "font")) layout(style, layoutspec=None) Define the widget layout for given *style*. If *layoutspec* is omitted, return the layout specification for given style. *layoutspec*, if specified, is expected to be a list or some other sequence type (excluding strings), where each item should be a tuple and the first item is the layout name and the second item should have the format described in Layouts. To understand the format, see the following example (it is not intended to do anything useful): from tkinter import ttk import tkinter root = tkinter.Tk() style = ttk.Style() style.layout("TMenubutton", [ ("Menubutton.background", None), ("Menubutton.button", {"children": [("Menubutton.focus", {"children": [("Menubutton.padding", {"children": [("Menubutton.label", {"side": "left", "expand": 1})] })] })] }), ]) mbtn = ttk.Menubutton(text='Text') mbtn.pack() root.mainloop() element_create(elementname, etype, *args, **kw) Create a new element in the current theme, of the given *etype* which is expected to be either “image”, “from” or “vsapi”. The latter is only available in Tk 8.6a for Windows XP and Vista and is not described here. If “image” is used, *args* should contain the default image name followed by statespec/value pairs (this is the imagespec), and *kw* may have the following options: * border=padding padding is a list of up to four integers, specifying the left, top, right, and bottom borders, respectively. * height=height Specifies a minimum height for the element. If less than zero, the base image’s height is used as a default. * padding=padding Specifies the element’s interior padding. Defaults to border’s value if not specified. * sticky=spec Specifies how the image is placed within the final parcel. spec contains zero or more characters “n”, “s”, “w”, or “e”. * width=width Specifies a minimum width for the element. If less than zero, the base image’s width is used as a default. If “from” is used as the value of *etype*, "element_create()" will clone an existing element. *args* is expected to contain a themename, from which the element will be cloned, and optionally an element to clone from. If this element to clone from is not specified, an empty element will be used. *kw* is discarded. element_names() Returns the list of elements defined in the current theme. element_options(elementname) Returns the list of *elementname*’s options. theme_create(themename, parent=None, settings=None) Create a new theme. It is an error if *themename* already exists. If *parent* is specified, the new theme will inherit styles, elements and layouts from the parent theme. If *settings* are present they are expected to have the same syntax used for "theme_settings()". theme_settings(themename, settings) Temporarily sets the current theme to *themename*, apply specified *settings* and then restore the previous theme. Each key in *settings* is a style and each value may contain the keys ‘configure’, ‘map’, ‘layout’ and ‘element create’ and they are expected to have the same format as specified by the methods "Style.configure()", "Style.map()", "Style.layout()" and "Style.element_create()" respectively. As an example, let’s change the Combobox for the default theme a bit: from tkinter import ttk import tkinter root = tkinter.Tk() style = ttk.Style() style.theme_settings("default", { "TCombobox": { "configure": {"padding": 5}, "map": { "background": [("active", "green2"), ("!disabled", "green4")], "fieldbackground": [("!disabled", "green3")], "foreground": [("focus", "OliveDrab1"), ("!disabled", "OliveDrab2")] } } }) combo = ttk.Combobox().pack() root.mainloop() theme_names() Returns a list of all known themes. theme_use(themename=None) If *themename* is not given, returns the theme in use. Otherwise, sets the current theme to *themename*, refreshes all widgets and emits a <> event. Layouts ------- A layout can be just "None", if it takes no options, or a dict of options specifying how to arrange the element. The layout mechanism uses a simplified version of the pack geometry manager: given an initial cavity, each element is allocated a parcel. Valid options/values are: * side: whichside Specifies which side of the cavity to place the element; one of top, right, bottom or left. If omitted, the element occupies the entire cavity. * sticky: nswe Specifies where the element is placed inside its allocated parcel. * unit: 0 or 1 If set to 1, causes the element and all of its descendants to be treated as a single element for the purposes of "Widget.identify()" et al. It’s used for things like scrollbar thumbs with grips. * children: [sublayout… ] Specifies a list of elements to place inside the element. Each element is a tuple (or other sequence type) where the first item is the layout name, and the other is a Layout. "tkinter" — Python interface to Tcl/Tk ************************************** **Source code:** Lib/tkinter/__init__.py ====================================================================== The "tkinter" package (“Tk interface”) is the standard Python interface to the Tcl/Tk GUI toolkit. Both Tk and "tkinter" are available on most Unix platforms, including macOS, as well as on Windows systems. Running "python -m tkinter" from the command line should open a window demonstrating a simple Tk interface, letting you know that "tkinter" is properly installed on your system, and also showing what version of Tcl/Tk is installed, so you can read the Tcl/Tk documentation specific to that version. See also: * TkDocs Extensive tutorial on creating user interfaces with Tkinter. Explains key concepts, and illustrates recommended approaches using the modern API. * Tkinter 8.5 reference: a GUI for Python Reference documentation for Tkinter 8.5 detailing available classes, methods, and options. Tcl/Tk Resources: * Tk commands Comprehensive reference to each of the underlying Tcl/Tk commands used by Tkinter. * Tcl/Tk Home Page Additional documentation, and links to Tcl/Tk core development. Books: * Modern Tkinter for Busy Python Developers By Mark Roseman. (ISBN 978-1999149567) * Python and Tkinter Programming By Alan Moore. (ISBN 978-1788835886) * Programming Python By Mark Lutz; has excellent coverage of Tkinter. (ISBN 978-0596158101) * Tcl and the Tk Toolkit (2nd edition) By John Ousterhout, inventor of Tcl/Tk, and Ken Jones; does not cover Tkinter. (ISBN 978-0321336330) Tkinter Modules =============== Support for Tkinter is spread across several modules. Most applications will need the main "tkinter" module, as well as the "tkinter.ttk" module, which provides the modern themed widget set and API: from tkinter import * from tkinter import ttk class tkinter.Tk(screenName=None, baseName=None, className='Tk', useTk=True, sync=False, use=None) Construct a toplevel Tk widget, which is usually the main window of an application, and initialize a Tcl interpreter for this widget. Each instance has its own associated Tcl interpreter. The "Tk" class is typically instantiated using all default values. However, the following keyword arguments are currently recognized: *screenName* When given (as a string), sets the "DISPLAY" environment variable. (X11 only) *baseName* Name of the profile file. By default, *baseName* is derived from the program name ("sys.argv[0]"). *className* Name of the widget class. Used as a profile file and also as the name with which Tcl is invoked (*argv0* in *interp*). *useTk* If "True", initialize the Tk subsystem. The "tkinter.Tcl()" function sets this to "False". *sync* If "True", execute all X server commands synchronously, so that errors are reported immediately. Can be used for debugging. (X11 only) *use* Specifies the *id* of the window in which to embed the application, instead of it being created as an independent toplevel window. *id* must be specified in the same way as the value for the -use option for toplevel widgets (that is, it has a form like that returned by "winfo_id()"). Note that on some platforms this will only work correctly if *id* refers to a Tk frame or toplevel that has its -container option enabled. "Tk" reads and interprets profile files, named ".*className*.tcl" and ".*baseName*.tcl", into the Tcl interpreter and calls "exec()" on the contents of ".*className*.py" and ".*baseName*.py". The path for the profile files is the "HOME" environment variable or, if that isn’t defined, then "os.curdir". tk The Tk application object created by instantiating "Tk". This provides access to the Tcl interpreter. Each widget that is attached the same instance of "Tk" has the same value for its "tk" attribute. master The widget object that contains this widget. For "Tk", the *master* is "None" because it is the main window. The terms *master* and *parent* are similar and sometimes used interchangeably as argument names; however, calling "winfo_parent()" returns a string of the widget name whereas "master" returns the object. *parent*/*child* reflects the tree- like relationship while *master*/*slave* reflects the container structure. children The immediate descendants of this widget as a "dict" with the child widget names as the keys and the child instance objects as the values. tkinter.Tcl(screenName=None, baseName=None, className='Tk', useTk=False) The "Tcl()" function is a factory function which creates an object much like that created by the "Tk" class, except that it does not initialize the Tk subsystem. This is most often useful when driving the Tcl interpreter in an environment where one doesn’t want to create extraneous toplevel windows, or where one cannot (such as Unix/Linux systems without an X server). An object created by the "Tcl()" object can have a Toplevel window created (and the Tk subsystem initialized) by calling its "loadtk()" method. The modules that provide Tk support include: "tkinter" Main Tkinter module. "tkinter.colorchooser" Dialog to let the user choose a color. "tkinter.commondialog" Base class for the dialogs defined in the other modules listed here. "tkinter.filedialog" Common dialogs to allow the user to specify a file to open or save. "tkinter.font" Utilities to help work with fonts. "tkinter.messagebox" Access to standard Tk dialog boxes. "tkinter.scrolledtext" Text widget with a vertical scroll bar built in. "tkinter.simpledialog" Basic dialogs and convenience functions. "tkinter.ttk" Themed widget set introduced in Tk 8.5, providing modern alternatives for many of the classic widgets in the main "tkinter" module. Additional modules: "_tkinter" A binary module that contains the low-level interface to Tcl/Tk. It is automatically imported by the main "tkinter" module, and should never be used directly by application programmers. It is usually a shared library (or DLL), but might in some cases be statically linked with the Python interpreter. "idlelib" Python’s Integrated Development and Learning Environment (IDLE). Based on "tkinter". "tkinter.constants" Symbolic constants that can be used in place of strings when passing various parameters to Tkinter calls. Automatically imported by the main "tkinter" module. "tkinter.dnd" (experimental) Drag-and-drop support for "tkinter". This will become deprecated when it is replaced with the Tk DND. "tkinter.tix" (deprecated) An older third-party Tcl/Tk package that adds several new widgets. Better alternatives for most can be found in "tkinter.ttk". "turtle" Turtle graphics in a Tk window. Tkinter Life Preserver ====================== This section is not designed to be an exhaustive tutorial on either Tk or Tkinter. Rather, it is intended as a stop gap, providing some introductory orientation on the system. Credits: * Tk was written by John Ousterhout while at Berkeley. * Tkinter was written by Steen Lumholt and Guido van Rossum. * This Life Preserver was written by Matt Conway at the University of Virginia. * The HTML rendering, and some liberal editing, was produced from a FrameMaker version by Ken Manheimer. * Fredrik Lundh elaborated and revised the class interface descriptions, to get them current with Tk 4.2. * Mike Clarkson converted the documentation to LaTeX, and compiled the User Interface chapter of the reference manual. How To Use This Section ----------------------- This section is designed in two parts: the first half (roughly) covers background material, while the second half can be taken to the keyboard as a handy reference. When trying to answer questions of the form “how do I do blah”, it is often best to find out how to do “blah” in straight Tk, and then convert this back into the corresponding "tkinter" call. Python programmers can often guess at the correct Python command by looking at the Tk documentation. This means that in order to use Tkinter, you will have to know a little bit about Tk. This document can’t fulfill that role, so the best we can do is point you to the best documentation that exists. Here are some hints: * The authors strongly suggest getting a copy of the Tk man pages. Specifically, the man pages in the "manN" directory are most useful. The "man3" man pages describe the C interface to the Tk library and thus are not especially helpful for script writers. * Addison-Wesley publishes a book called Tcl and the Tk Toolkit by John Ousterhout (ISBN 0-201-63337-X) which is a good introduction to Tcl and Tk for the novice. The book is not exhaustive, and for many details it defers to the man pages. * "tkinter/__init__.py" is a last resort for most, but can be a good place to go when nothing else makes sense. A Simple Hello World Program ---------------------------- import tkinter as tk class Application(tk.Frame): def __init__(self, master=None): super().__init__(master) self.master = master self.pack() self.create_widgets() def create_widgets(self): self.hi_there = tk.Button(self) self.hi_there["text"] = "Hello World\n(click me)" self.hi_there["command"] = self.say_hi self.hi_there.pack(side="top") self.quit = tk.Button(self, text="QUIT", fg="red", command=self.master.destroy) self.quit.pack(side="bottom") def say_hi(self): print("hi there, everyone!") root = tk.Tk() app = Application(master=root) app.mainloop() A (Very) Quick Look at Tcl/Tk ============================= The class hierarchy looks complicated, but in actual practice, application programmers almost always refer to the classes at the very bottom of the hierarchy. Notes: * These classes are provided for the purposes of organizing certain functions under one namespace. They aren’t meant to be instantiated independently. * The "Tk" class is meant to be instantiated only once in an application. Application programmers need not instantiate one explicitly, the system creates one whenever any of the other classes are instantiated. * The "Widget" class is not meant to be instantiated, it is meant only for subclassing to make “real” widgets (in C++, this is called an ‘abstract class’). To make use of this reference material, there will be times when you will need to know how to read short passages of Tk and how to identify the various parts of a Tk command. (See section Mapping Basic Tk into Tkinter for the "tkinter" equivalents of what’s below.) Tk scripts are Tcl programs. Like all Tcl programs, Tk scripts are just lists of tokens separated by spaces. A Tk widget is just its *class*, the *options* that help configure it, and the *actions* that make it do useful things. To make a widget in Tk, the command is always of the form: classCommand newPathname options *classCommand* denotes which kind of widget to make (a button, a label, a menu…) *newPathname* is the new name for this widget. All names in Tk must be unique. To help enforce this, widgets in Tk are named with *pathnames*, just like files in a file system. The top level widget, the *root*, is called "." (period) and children are delimited by more periods. For example, ".myApp.controlPanel.okButton" might be the name of a widget. *options* configure the widget’s appearance and in some cases, its behavior. The options come in the form of a list of flags and values. Flags are preceded by a ‘-‘, like Unix shell command flags, and values are put in quotes if they are more than one word. For example: button .fred -fg red -text "hi there" ^ ^ \______________________/ | | | class new options command widget (-opt val -opt val ...) Once created, the pathname to the widget becomes a new command. This new *widget command* is the programmer’s handle for getting the new widget to perform some *action*. In C, you’d express this as someAction(fred, someOptions), in C++, you would express this as fred.someAction(someOptions), and in Tk, you say: .fred someAction someOptions Note that the object name, ".fred", starts with a dot. As you’d expect, the legal values for *someAction* will depend on the widget’s class: ".fred disable" works if fred is a button (fred gets greyed out), but does not work if fred is a label (disabling of labels is not supported in Tk). The legal values of *someOptions* is action dependent. Some actions, like "disable", require no arguments, others, like a text-entry box’s "delete" command, would need arguments to specify what range of text to delete. Mapping Basic Tk into Tkinter ============================= Class commands in Tk correspond to class constructors in Tkinter. button .fred =====> fred = Button() The master of an object is implicit in the new name given to it at creation time. In Tkinter, masters are specified explicitly. button .panel.fred =====> fred = Button(panel) The configuration options in Tk are given in lists of hyphened tags followed by values. In Tkinter, options are specified as keyword- arguments in the instance constructor, and keyword-args for configure calls or as instance indices, in dictionary style, for established instances. See section Setting Options on setting options. button .fred -fg red =====> fred = Button(panel, fg="red") .fred configure -fg red =====> fred["fg"] = red OR ==> fred.config(fg="red") In Tk, to perform an action on a widget, use the widget name as a command, and follow it with an action name, possibly with arguments (options). In Tkinter, you call methods on the class instance to invoke actions on the widget. The actions (methods) that a given widget can perform are listed in "tkinter/__init__.py". .fred invoke =====> fred.invoke() To give a widget to the packer (geometry manager), you call pack with optional arguments. In Tkinter, the Pack class holds all this functionality, and the various forms of the pack command are implemented as methods. All widgets in "tkinter" are subclassed from the Packer, and so inherit all the packing methods. See the "tkinter.tix" module documentation for additional information on the Form geometry manager. pack .fred -side left =====> fred.pack(side="left") How Tk and Tkinter are Related ============================== From the top down: Your App Here (Python) A Python application makes a "tkinter" call. tkinter (Python Package) This call (say, for example, creating a button widget), is implemented in the "tkinter" package, which is written in Python. This Python function will parse the commands and the arguments and convert them into a form that makes them look as if they had come from a Tk script instead of a Python script. _tkinter (C) These commands and their arguments will be passed to a C function in the "_tkinter" - note the underscore - extension module. Tk Widgets (C and Tcl) This C function is able to make calls into other C modules, including the C functions that make up the Tk library. Tk is implemented in C and some Tcl. The Tcl part of the Tk widgets is used to bind certain default behaviors to widgets, and is executed once at the point where the Python "tkinter" package is imported. (The user never sees this stage). Tk (C) The Tk part of the Tk Widgets implement the final mapping to … Xlib (C) the Xlib library to draw graphics on the screen. Handy Reference =============== Setting Options --------------- Options control things like the color and border width of a widget. Options can be set in three ways: At object creation time, using keyword arguments fred = Button(self, fg="red", bg="blue") After object creation, treating the option name like a dictionary index fred["fg"] = "red" fred["bg"] = "blue" Use the config() method to update multiple attrs subsequent to object creation fred.config(fg="red", bg="blue") For a complete explanation of a given option and its behavior, see the Tk man pages for the widget in question. Note that the man pages list “STANDARD OPTIONS” and “WIDGET SPECIFIC OPTIONS” for each widget. The former is a list of options that are common to many widgets, the latter are the options that are idiosyncratic to that particular widget. The Standard Options are documented on the *options(3)* man page. No distinction between standard and widget-specific options is made in this document. Some options don’t apply to some kinds of widgets. Whether a given widget responds to a particular option depends on the class of the widget; buttons have a "command" option, labels do not. The options supported by a given widget are listed in that widget’s man page, or can be queried at runtime by calling the "config()" method without arguments, or by calling the "keys()" method on that widget. The return value of these calls is a dictionary whose key is the name of the option as a string (for example, "'relief'") and whose values are 5-tuples. Some options, like "bg" are synonyms for common options with long names ("bg" is shorthand for “background”). Passing the "config()" method the name of a shorthand option will return a 2-tuple, not 5-tuple. The 2-tuple passed back will contain the name of the synonym and the “real” option (such as "('bg', 'background')"). +---------+-----------------------------------+----------------+ | Index | Meaning | Example | |=========|===================================|================| | 0 | option name | "'relief'" | +---------+-----------------------------------+----------------+ | 1 | option name for database lookup | "'relief'" | +---------+-----------------------------------+----------------+ | 2 | option class for database lookup | "'Relief'" | +---------+-----------------------------------+----------------+ | 3 | default value | "'raised'" | +---------+-----------------------------------+----------------+ | 4 | current value | "'groove'" | +---------+-----------------------------------+----------------+ Example: >>> print(fred.config()) {'relief': ('relief', 'relief', 'Relief', 'raised', 'groove')} Of course, the dictionary printed will include all the options available and their values. This is meant only as an example. The Packer ---------- The packer is one of Tk’s geometry-management mechanisms. Geometry managers are used to specify the relative positioning of widgets within their container - their mutual *master*. In contrast to the more cumbersome *placer* (which is used less commonly, and we do not cover here), the packer takes qualitative relationship specification - *above*, *to the left of*, *filling*, etc - and works everything out to determine the exact placement coordinates for you. The size of any *master* widget is determined by the size of the “slave widgets” inside. The packer is used to control where slave widgets appear inside the master into which they are packed. You can pack widgets into frames, and frames into other frames, in order to achieve the kind of layout you desire. Additionally, the arrangement is dynamically adjusted to accommodate incremental changes to the configuration, once it is packed. Note that widgets do not appear until they have had their geometry specified with a geometry manager. It’s a common early mistake to leave out the geometry specification, and then be surprised when the widget is created but nothing appears. A widget will appear only after it has had, for example, the packer’s "pack()" method applied to it. The pack() method can be called with keyword-option/value pairs that control where the widget is to appear within its container, and how it is to behave when the main application window is resized. Here are some examples: fred.pack() # defaults to side = "top" fred.pack(side="left") fred.pack(expand=1) Packer Options -------------- For more extensive information on the packer and the options that it can take, see the man pages and page 183 of John Ousterhout’s book. anchor Anchor type. Denotes where the packer is to place each slave in its parcel. expand Boolean, "0" or "1". fill Legal values: "'x'", "'y'", "'both'", "'none'". ipadx and ipady A distance - designating internal padding on each side of the slave widget. padx and pady A distance - designating external padding on each side of the slave widget. side Legal values are: "'left'", "'right'", "'top'", "'bottom'". Coupling Widget Variables ------------------------- The current-value setting of some widgets (like text entry widgets) can be connected directly to application variables by using special options. These options are "variable", "textvariable", "onvalue", "offvalue", and "value". This connection works both ways: if the variable changes for any reason, the widget it’s connected to will be updated to reflect the new value. Unfortunately, in the current implementation of "tkinter" it is not possible to hand over an arbitrary Python variable to a widget through a "variable" or "textvariable" option. The only kinds of variables for which this works are variables that are subclassed from a class called Variable, defined in "tkinter". There are many useful subclasses of Variable already defined: "StringVar", "IntVar", "DoubleVar", and "BooleanVar". To read the current value of such a variable, call the "get()" method on it, and to change its value you call the "set()" method. If you follow this protocol, the widget will always track the value of the variable, with no further intervention on your part. For example: import tkinter as tk class App(tk.Frame): def __init__(self, master): super().__init__(master) self.pack() self.entrythingy = tk.Entry() self.entrythingy.pack() # Create the application variable. self.contents = tk.StringVar() # Set it to some value. self.contents.set("this is a variable") # Tell the entry widget to watch this variable. self.entrythingy["textvariable"] = self.contents # Define a callback for when the user hits return. # It prints the current value of the variable. self.entrythingy.bind('', self.print_contents) def print_contents(self, event): print("Hi. The current entry content is:", self.contents.get()) root = tk.Tk() myapp = App(root) myapp.mainloop() The Window Manager ------------------ In Tk, there is a utility command, "wm", for interacting with the window manager. Options to the "wm" command allow you to control things like titles, placement, icon bitmaps, and the like. In "tkinter", these commands have been implemented as methods on the "Wm" class. Toplevel widgets are subclassed from the "Wm" class, and so can call the "Wm" methods directly. To get at the toplevel window that contains a given widget, you can often just refer to the widget’s master. Of course if the widget has been packed inside of a frame, the master won’t represent a toplevel window. To get at the toplevel window that contains an arbitrary widget, you can call the "_root()" method. This method begins with an underscore to denote the fact that this function is part of the implementation, and not an interface to Tk functionality. Here are some examples of typical usage: import tkinter as tk class App(tk.Frame): def __init__(self, master=None): super().__init__(master) self.pack() # create the application myapp = App() # # here are method calls to the window manager class # myapp.master.title("My Do-Nothing Application") myapp.master.maxsize(1000, 400) # start the program myapp.mainloop() Tk Option Data Types -------------------- anchor Legal values are points of the compass: ""n"", ""ne"", ""e"", ""se"", ""s"", ""sw"", ""w"", ""nw"", and also ""center"". bitmap There are eight built-in, named bitmaps: "'error'", "'gray25'", "'gray50'", "'hourglass'", "'info'", "'questhead'", "'question'", "'warning'". To specify an X bitmap filename, give the full path to the file, preceded with an "@", as in ""@/usr/contrib/bitmap/gumby.bit"". boolean You can pass integers 0 or 1 or the strings ""yes"" or ""no"". callback This is any Python function that takes no arguments. For example: def print_it(): print("hi there") fred["command"] = print_it color Colors can be given as the names of X colors in the rgb.txt file, or as strings representing RGB values in 4 bit: ""#RGB"", 8 bit: ""#RRGGBB"", 12 bit: ""#RRRGGGBBB"", or 16 bit: ""#RRRRGGGGBBBB"" ranges, where R,G,B here represent any legal hex digit. See page 160 of Ousterhout’s book for details. cursor The standard X cursor names from "cursorfont.h" can be used, without the "XC_" prefix. For example to get a hand cursor ("XC_hand2"), use the string ""hand2"". You can also specify a bitmap and mask file of your own. See page 179 of Ousterhout’s book. distance Screen distances can be specified in either pixels or absolute distances. Pixels are given as numbers and absolute distances as strings, with the trailing character denoting units: "c" for centimetres, "i" for inches, "m" for millimetres, "p" for printer’s points. For example, 3.5 inches is expressed as ""3.5i"". font Tk uses a list font name format, such as "{courier 10 bold}". Font sizes with positive numbers are measured in points; sizes with negative numbers are measured in pixels. geometry This is a string of the form "widthxheight", where width and height are measured in pixels for most widgets (in characters for widgets displaying text). For example: "fred["geometry"] = "200x100"". justify Legal values are the strings: ""left"", ""center"", ""right"", and ""fill"". region This is a string with four space-delimited elements, each of which is a legal distance (see above). For example: ""2 3 4 5"" and ""3i 2i 4.5i 2i"" and ""3c 2c 4c 10.43c"" are all legal regions. relief Determines what the border style of a widget will be. Legal values are: ""raised"", ""sunken"", ""flat"", ""groove"", and ""ridge"". scrollcommand This is almost always the "set()" method of some scrollbar widget, but can be any widget method that takes a single argument. wrap Must be one of: ""none"", ""char"", or ""word"". Bindings and Events ------------------- The bind method from the widget command allows you to watch for certain events and to have a callback function trigger when that event type occurs. The form of the bind method is: def bind(self, sequence, func, add=''): where: sequence is a string that denotes the target kind of event. (See the bind man page and page 201 of John Ousterhout’s book for details). func is a Python function, taking one argument, to be invoked when the event occurs. An Event instance will be passed as the argument. (Functions deployed this way are commonly known as *callbacks*.) add is optional, either "''" or "'+'". Passing an empty string denotes that this binding is to replace any other bindings that this event is associated with. Passing a "'+'" means that this function is to be added to the list of functions bound to this event type. For example: def turn_red(self, event): event.widget["activeforeground"] = "red" self.button.bind("", self.turn_red) Notice how the widget field of the event is being accessed in the "turn_red()" callback. This field contains the widget that caught the X event. The following table lists the other event fields you can access, and how they are denoted in Tk, which can be useful when referring to the Tk man pages. +------+-----------------------+------+-----------------------+ | Tk | Tkinter Event Field | Tk | Tkinter Event Field | |======|=======================|======|=======================| | %f | focus | %A | char | +------+-----------------------+------+-----------------------+ | %h | height | %E | send_event | +------+-----------------------+------+-----------------------+ | %k | keycode | %K | keysym | +------+-----------------------+------+-----------------------+ | %s | state | %N | keysym_num | +------+-----------------------+------+-----------------------+ | %t | time | %T | type | +------+-----------------------+------+-----------------------+ | %w | width | %W | widget | +------+-----------------------+------+-----------------------+ | %x | x | %X | x_root | +------+-----------------------+------+-----------------------+ | %y | y | %Y | y_root | +------+-----------------------+------+-----------------------+ The index Parameter ------------------- A number of widgets require “index” parameters to be passed. These are used to point at a specific place in a Text widget, or to particular characters in an Entry widget, or to particular menu items in a Menu widget. Entry widget indexes (index, view index, etc.) Entry widgets have options that refer to character positions in the text being displayed. You can use these "tkinter" functions to access these special points in text widgets: Text widget indexes The index notation for Text widgets is very rich and is best described in the Tk man pages. Menu indexes (menu.invoke(), menu.entryconfig(), etc.) Some options and methods for menus manipulate specific menu entries. Anytime a menu index is needed for an option or a parameter, you may pass in: * an integer which refers to the numeric position of the entry in the widget, counted from the top, starting with 0; * the string ""active"", which refers to the menu position that is currently under the cursor; * the string ""last"" which refers to the last menu item; * An integer preceded by "@", as in "@6", where the integer is interpreted as a y pixel coordinate in the menu’s coordinate system; * the string ""none"", which indicates no menu entry at all, most often used with menu.activate() to deactivate all entries, and finally, * a text string that is pattern matched against the label of the menu entry, as scanned from the top of the menu to the bottom. Note that this index type is considered after all the others, which means that matches for menu items labelled "last", "active", or "none" may be interpreted as the above literals, instead. Images ------ Images of different formats can be created through the corresponding subclass of "tkinter.Image": * "BitmapImage" for images in XBM format. * "PhotoImage" for images in PGM, PPM, GIF and PNG formats. The latter is supported starting with Tk 8.6. Either type of image is created through either the "file" or the "data" option (other options are available as well). The image object can then be used wherever an "image" option is supported by some widget (e.g. labels, buttons, menus). In these cases, Tk will not keep a reference to the image. When the last Python reference to the image object is deleted, the image data is deleted as well, and Tk will display an empty box wherever the image was used. See also: The Pillow package adds support for formats such as BMP, JPEG, TIFF, and WebP, among others. File Handlers ============= Tk allows you to register and unregister a callback function which will be called from the Tk mainloop when I/O is possible on a file descriptor. Only one handler may be registered per file descriptor. Example code: import tkinter widget = tkinter.Tk() mask = tkinter.READABLE | tkinter.WRITABLE widget.tk.createfilehandler(file, mask, callback) ... widget.tk.deletefilehandler(file) This feature is not available on Windows. Since you don’t know how many bytes are available for reading, you may not want to use the "BufferedIOBase" or "TextIOBase" "read()" or "readline()" methods, since these will insist on reading a predefined number of bytes. For sockets, the "recv()" or "recvfrom()" methods will work fine; for other files, use raw reads or "os.read(file.fileno(), maxbytecount)". Widget.tk.createfilehandler(file, mask, func) Registers the file handler callback function *func*. The *file* argument may either be an object with a "fileno()" method (such as a file or socket object), or an integer file descriptor. The *mask* argument is an ORed combination of any of the three constants below. The callback is called as follows: callback(file, mask) Widget.tk.deletefilehandler(file) Unregisters a file handler. tkinter.READABLE tkinter.WRITABLE tkinter.EXCEPTION Constants used in the *mask* arguments. "token" — Constants used with Python parse trees ************************************************ **Source code:** Lib/token.py ====================================================================== This module provides constants which represent the numeric values of leaf nodes of the parse tree (terminal tokens). Refer to the file "Grammar/Grammar" in the Python distribution for the definitions of the names in the context of the language grammar. The specific numeric values which the names map to may change between Python versions. The module also provides a mapping from numeric codes to names and some functions. The functions mirror definitions in the Python C header files. token.tok_name Dictionary mapping the numeric values of the constants defined in this module back to name strings, allowing more human-readable representation of parse trees to be generated. token.ISTERMINAL(x) Return "True" for terminal token values. token.ISNONTERMINAL(x) Return "True" for non-terminal token values. token.ISEOF(x) Return "True" if *x* is the marker indicating the end of input. The token constants are: token.ENDMARKER token.NAME token.NUMBER token.STRING token.NEWLINE token.INDENT token.DEDENT token.LPAR Token value for ""("". token.RPAR Token value for "")"". token.LSQB Token value for ""["". token.RSQB Token value for ""]"". token.COLON Token value for "":"". token.COMMA Token value for "","". token.SEMI Token value for "";"". token.PLUS Token value for ""+"". token.MINUS Token value for ""-"". token.STAR Token value for ""*"". token.SLASH Token value for ""/"". token.VBAR Token value for ""|"". token.AMPER Token value for ""&"". token.LESS Token value for ""<"". token.GREATER Token value for "">"". token.EQUAL Token value for ""="". token.DOT Token value for ""."". token.PERCENT Token value for ""%"". token.LBRACE Token value for ""{"". token.RBRACE Token value for ""}"". token.EQEQUAL Token value for ""=="". token.NOTEQUAL Token value for ""!="". token.LESSEQUAL Token value for ""<="". token.GREATEREQUAL Token value for "">="". token.TILDE Token value for ""~"". token.CIRCUMFLEX Token value for ""^"". token.LEFTSHIFT Token value for ""<<"". token.RIGHTSHIFT Token value for "">>"". token.DOUBLESTAR Token value for ""**"". token.PLUSEQUAL Token value for ""+="". token.MINEQUAL Token value for ""-="". token.STAREQUAL Token value for ""*="". token.SLASHEQUAL Token value for ""/="". token.PERCENTEQUAL Token value for ""%="". token.AMPEREQUAL Token value for ""&="". token.VBAREQUAL Token value for ""|="". token.CIRCUMFLEXEQUAL Token value for ""^="". token.LEFTSHIFTEQUAL Token value for ""<<="". token.RIGHTSHIFTEQUAL Token value for "">>="". token.DOUBLESTAREQUAL Token value for ""**="". token.DOUBLESLASH Token value for ""//"". token.DOUBLESLASHEQUAL Token value for ""//="". token.AT Token value for ""@"". token.ATEQUAL Token value for ""@="". token.RARROW Token value for ""->"". token.ELLIPSIS Token value for ""..."". token.COLONEQUAL Token value for "":="". token.OP token.AWAIT token.ASYNC token.TYPE_IGNORE token.TYPE_COMMENT token.ERRORTOKEN token.N_TOKENS token.NT_OFFSET The following token type values aren’t used by the C tokenizer but are needed for the "tokenize" module. token.COMMENT Token value used to indicate a comment. token.NL Token value used to indicate a non-terminating newline. The "NEWLINE" token indicates the end of a logical line of Python code; "NL" tokens are generated when a logical line of code is continued over multiple physical lines. token.ENCODING Token value that indicates the encoding used to decode the source bytes into text. The first token returned by "tokenize.tokenize()" will always be an "ENCODING" token. token.TYPE_COMMENT Token value indicating that a type comment was recognized. Such tokens are only produced when "ast.parse()" is invoked with "type_comments=True". Changed in version 3.5: Added "AWAIT" and "ASYNC" tokens. Changed in version 3.7: Added "COMMENT", "NL" and "ENCODING" tokens. Changed in version 3.7: Removed "AWAIT" and "ASYNC" tokens. “async” and “await” are now tokenized as "NAME" tokens. Changed in version 3.8: Added "TYPE_COMMENT", "TYPE_IGNORE", "COLONEQUAL". Added "AWAIT" and "ASYNC" tokens back (they’re needed to support parsing older Python versions for "ast.parse()" with "feature_version" set to 6 or lower). "tokenize" — Tokenizer for Python source **************************************** **Source code:** Lib/tokenize.py ====================================================================== The "tokenize" module provides a lexical scanner for Python source code, implemented in Python. The scanner in this module returns comments as tokens as well, making it useful for implementing “pretty- printers”, including colorizers for on-screen displays. To simplify token stream handling, all operator and delimiter tokens and "Ellipsis" are returned using the generic "OP" token type. The exact type can be determined by checking the "exact_type" property on the *named tuple* returned from "tokenize.tokenize()". Tokenizing Input ================ The primary entry point is a *generator*: tokenize.tokenize(readline) The "tokenize()" generator requires one argument, *readline*, which must be a callable object which provides the same interface as the "io.IOBase.readline()" method of file objects. Each call to the function should return one line of input as bytes. The generator produces 5-tuples with these members: the token type; the token string; a 2-tuple "(srow, scol)" of ints specifying the row and column where the token begins in the source; a 2-tuple "(erow, ecol)" of ints specifying the row and column where the token ends in the source; and the line on which the token was found. The line passed (the last tuple item) is the *physical* line. The 5 tuple is returned as a *named tuple* with the field names: "type string start end line". The returned *named tuple* has an additional property named "exact_type" that contains the exact operator type for "OP" tokens. For all other token types "exact_type" equals the named tuple "type" field. Changed in version 3.1: Added support for named tuples. Changed in version 3.3: Added support for "exact_type". "tokenize()" determines the source encoding of the file by looking for a UTF-8 BOM or encoding cookie, according to **PEP 263**. tokenize.generate_tokens(readline) Tokenize a source reading unicode strings instead of bytes. Like "tokenize()", the *readline* argument is a callable returning a single line of input. However, "generate_tokens()" expects *readline* to return a str object rather than bytes. The result is an iterator yielding named tuples, exactly like "tokenize()". It does not yield an "ENCODING" token. All constants from the "token" module are also exported from "tokenize". Another function is provided to reverse the tokenization process. This is useful for creating tools that tokenize a script, modify the token stream, and write back the modified script. tokenize.untokenize(iterable) Converts tokens back into Python source code. The *iterable* must return sequences with at least two elements, the token type and the token string. Any additional sequence elements are ignored. The reconstructed script is returned as a single string. The result is guaranteed to tokenize back to match the input so that the conversion is lossless and round-trips are assured. The guarantee applies only to the token type and token string as the spacing between tokens (column positions) may change. It returns bytes, encoded using the "ENCODING" token, which is the first token sequence output by "tokenize()". If there is no encoding token in the input, it returns a str instead. "tokenize()" needs to detect the encoding of source files it tokenizes. The function it uses to do this is available: tokenize.detect_encoding(readline) The "detect_encoding()" function is used to detect the encoding that should be used to decode a Python source file. It requires one argument, readline, in the same way as the "tokenize()" generator. It will call readline a maximum of twice, and return the encoding used (as a string) and a list of any lines (not decoded from bytes) it has read in. It detects the encoding from the presence of a UTF-8 BOM or an encoding cookie as specified in **PEP 263**. If both a BOM and a cookie are present, but disagree, a "SyntaxError" will be raised. Note that if the BOM is found, "'utf-8-sig'" will be returned as an encoding. If no encoding is specified, then the default of "'utf-8'" will be returned. Use "open()" to open Python source files: it uses "detect_encoding()" to detect the file encoding. tokenize.open(filename) Open a file in read only mode using the encoding detected by "detect_encoding()". New in version 3.2. exception tokenize.TokenError Raised when either a docstring or expression that may be split over several lines is not completed anywhere in the file, for example: """Beginning of docstring or: [1, 2, 3 Note that unclosed single-quoted strings do not cause an error to be raised. They are tokenized as "ERRORTOKEN", followed by the tokenization of their contents. Command-Line Usage ================== New in version 3.3. The "tokenize" module can be executed as a script from the command line. It is as simple as: python -m tokenize [-e] [filename.py] The following options are accepted: -h, --help show this help message and exit -e, --exact display token names using the exact type If "filename.py" is specified its contents are tokenized to stdout. Otherwise, tokenization is performed on stdin. Examples ======== Example of a script rewriter that transforms float literals into Decimal objects: from tokenize import tokenize, untokenize, NUMBER, STRING, NAME, OP from io import BytesIO def decistmt(s): """Substitute Decimals for floats in a string of statements. >>> from decimal import Decimal >>> s = 'print(+21.3e-5*-.1234/81.7)' >>> decistmt(s) "print (+Decimal ('21.3e-5')*-Decimal ('.1234')/Decimal ('81.7'))" The format of the exponent is inherited from the platform C library. Known cases are "e-007" (Windows) and "e-07" (not Windows). Since we're only showing 12 digits, and the 13th isn't close to 5, the rest of the output should be platform-independent. >>> exec(s) #doctest: +ELLIPSIS -3.21716034272e-0...7 Output from calculations with Decimal should be identical across all platforms. >>> exec(decistmt(s)) -3.217160342717258261933904529E-7 """ result = [] g = tokenize(BytesIO(s.encode('utf-8')).readline) # tokenize the string for toknum, tokval, _, _, _ in g: if toknum == NUMBER and '.' in tokval: # replace NUMBER tokens result.extend([ (NAME, 'Decimal'), (OP, '('), (STRING, repr(tokval)), (OP, ')') ]) else: result.append((toknum, tokval)) return untokenize(result).decode('utf-8') Example of tokenizing from the command line. The script: def say_hello(): print("Hello, World!") say_hello() will be tokenized to the following output where the first column is the range of the line/column coordinates where the token is found, the second column is the name of the token, and the final column is the value of the token (if any) $ python -m tokenize hello.py 0,0-0,0: ENCODING 'utf-8' 1,0-1,3: NAME 'def' 1,4-1,13: NAME 'say_hello' 1,13-1,14: OP '(' 1,14-1,15: OP ')' 1,15-1,16: OP ':' 1,16-1,17: NEWLINE '\n' 2,0-2,4: INDENT ' ' 2,4-2,9: NAME 'print' 2,9-2,10: OP '(' 2,10-2,25: STRING '"Hello, World!"' 2,25-2,26: OP ')' 2,26-2,27: NEWLINE '\n' 3,0-3,1: NL '\n' 4,0-4,0: DEDENT '' 4,0-4,9: NAME 'say_hello' 4,9-4,10: OP '(' 4,10-4,11: OP ')' 4,11-4,12: NEWLINE '\n' 5,0-5,0: ENDMARKER '' The exact token type names can be displayed using the "-e" option: $ python -m tokenize -e hello.py 0,0-0,0: ENCODING 'utf-8' 1,0-1,3: NAME 'def' 1,4-1,13: NAME 'say_hello' 1,13-1,14: LPAR '(' 1,14-1,15: RPAR ')' 1,15-1,16: COLON ':' 1,16-1,17: NEWLINE '\n' 2,0-2,4: INDENT ' ' 2,4-2,9: NAME 'print' 2,9-2,10: LPAR '(' 2,10-2,25: STRING '"Hello, World!"' 2,25-2,26: RPAR ')' 2,26-2,27: NEWLINE '\n' 3,0-3,1: NL '\n' 4,0-4,0: DEDENT '' 4,0-4,9: NAME 'say_hello' 4,9-4,10: LPAR '(' 4,10-4,11: RPAR ')' 4,11-4,12: NEWLINE '\n' 5,0-5,0: ENDMARKER '' Example of tokenizing a file programmatically, reading unicode strings instead of bytes with "generate_tokens()": import tokenize with tokenize.open('hello.py') as f: tokens = tokenize.generate_tokens(f.readline) for token in tokens: print(token) Or reading bytes directly with "tokenize()": import tokenize with open('hello.py', 'rb') as f: tokens = tokenize.tokenize(f.readline) for token in tokens: print(token) "trace" — Trace or track Python statement execution *************************************************** **Source code:** Lib/trace.py ====================================================================== The "trace" module allows you to trace program execution, generate annotated statement coverage listings, print caller/callee relationships and list functions executed during a program run. It can be used in another program or from the command line. See also: Coverage.py A popular third-party coverage tool that provides HTML output along with advanced features such as branch coverage. Command-Line Usage ================== The "trace" module can be invoked from the command line. It can be as simple as python -m trace --count -C . somefile.py ... The above will execute "somefile.py" and generate annotated listings of all Python modules imported during the execution into the current directory. --help Display usage and exit. --version Display the version of the module and exit. New in version 3.8: Added "--module" option that allows to run an executable module. Main options ------------ At least one of the following options must be specified when invoking "trace". The "--listfuncs" option is mutually exclusive with the "-- trace" and "--count" options. When "--listfuncs" is provided, neither "--count" nor "--trace" are accepted, and vice versa. -c, --count Produce a set of annotated listing files upon program completion that shows how many times each statement was executed. See also " --coverdir", "--file" and "--no-report" below. -t, --trace Display lines as they are executed. -l, --listfuncs Display the functions executed by running the program. -r, --report Produce an annotated list from an earlier program run that used the "--count" and "--file" option. This does not execute any code. -T, --trackcalls Display the calling relationships exposed by running the program. Modifiers --------- -f, --file= Name of a file to accumulate counts over several tracing runs. Should be used with the "--count" option. -C, --coverdir= Directory where the report files go. The coverage report for "package.module" is written to file "*dir*/*package*/*module*.cover". -m, --missing When generating annotated listings, mark lines which were not executed with ">>>>>>". -s, --summary When using "--count" or "--report", write a brief summary to stdout for each file processed. -R, --no-report Do not generate annotated listings. This is useful if you intend to make several runs with "--count", and then produce a single set of annotated listings at the end. -g, --timing Prefix each line with the time since the program started. Only used while tracing. Filters ------- These options may be repeated multiple times. --ignore-module= Ignore each of the given module names and its submodules (if it is a package). The argument can be a list of names separated by a comma. --ignore-dir= Ignore all modules and packages in the named directory and subdirectories. The argument can be a list of directories separated by "os.pathsep". Programmatic Interface ====================== class trace.Trace(count=1, trace=1, countfuncs=0, countcallers=0, ignoremods=(), ignoredirs=(), infile=None, outfile=None, timing=False) Create an object to trace execution of a single statement or expression. All parameters are optional. *count* enables counting of line numbers. *trace* enables line execution tracing. *countfuncs* enables listing of the functions called during the run. *countcallers* enables call relationship tracking. *ignoremods* is a list of modules or packages to ignore. *ignoredirs* is a list of directories whose modules or packages should be ignored. *infile* is the name of the file from which to read stored count information. *outfile* is the name of the file in which to write updated count information. *timing* enables a timestamp relative to when tracing was started to be displayed. run(cmd) Execute the command and gather statistics from the execution with the current tracing parameters. *cmd* must be a string or code object, suitable for passing into "exec()". runctx(cmd, globals=None, locals=None) Execute the command and gather statistics from the execution with the current tracing parameters, in the defined global and local environments. If not defined, *globals* and *locals* default to empty dictionaries. runfunc(func, /, *args, **kwds) Call *func* with the given arguments under control of the "Trace" object with the current tracing parameters. results() Return a "CoverageResults" object that contains the cumulative results of all previous calls to "run", "runctx" and "runfunc" for the given "Trace" instance. Does not reset the accumulated trace results. class trace.CoverageResults A container for coverage results, created by "Trace.results()". Should not be created directly by the user. update(other) Merge in data from another "CoverageResults" object. write_results(show_missing=True, summary=False, coverdir=None) Write coverage results. Set *show_missing* to show lines that had no hits. Set *summary* to include in the output the coverage summary per module. *coverdir* specifies the directory into which the coverage result files will be output. If "None", the results for each source file are placed in its directory. A simple example demonstrating the use of the programmatic interface: import sys import trace # create a Trace object, telling it what to ignore, and whether to # do tracing or line-counting or both. tracer = trace.Trace( ignoredirs=[sys.prefix, sys.exec_prefix], trace=0, count=1) # run the new command using the given tracer tracer.run('main()') # make a report, placing output in the current directory r = tracer.results() r.write_results(show_missing=True, coverdir=".") "traceback" — Print or retrieve a stack traceback ************************************************* **Source code:** Lib/traceback.py ====================================================================== This module provides a standard interface to extract, format and print stack traces of Python programs. It exactly mimics the behavior of the Python interpreter when it prints a stack trace. This is useful when you want to print stack traces under program control, such as in a “wrapper” around the interpreter. The module uses traceback objects — this is the object type that is stored in the "sys.last_traceback" variable and returned as the third item from "sys.exc_info()". The module defines the following functions: traceback.print_tb(tb, limit=None, file=None) Print up to *limit* stack trace entries from traceback object *tb* (starting from the caller’s frame) if *limit* is positive. Otherwise, print the last "abs(limit)" entries. If *limit* is omitted or "None", all entries are printed. If *file* is omitted or "None", the output goes to "sys.stderr"; otherwise it should be an open file or file-like object to receive the output. Changed in version 3.5: Added negative *limit* support. traceback.print_exception(etype, value, tb, limit=None, file=None, chain=True) Print exception information and stack trace entries from traceback object *tb* to *file*. This differs from "print_tb()" in the following ways: * if *tb* is not "None", it prints a header "Traceback (most recent call last):" * it prints the exception *etype* and *value* after the stack trace * if *type(value)* is "SyntaxError" and *value* has the appropriate format, it prints the line where the syntax error occurred with a caret indicating the approximate position of the error. The optional *limit* argument has the same meaning as for "print_tb()". If *chain* is true (the default), then chained exceptions (the "__cause__" or "__context__" attributes of the exception) will be printed as well, like the interpreter itself does when printing an unhandled exception. Changed in version 3.5: The *etype* argument is ignored and inferred from the type of *value*. traceback.print_exc(limit=None, file=None, chain=True) This is a shorthand for "print_exception(*sys.exc_info(), limit, file, chain)". traceback.print_last(limit=None, file=None, chain=True) This is a shorthand for "print_exception(sys.last_type, sys.last_value, sys.last_traceback, limit, file, chain)". In general it will work only after an exception has reached an interactive prompt (see "sys.last_type"). traceback.print_stack(f=None, limit=None, file=None) Print up to *limit* stack trace entries (starting from the invocation point) if *limit* is positive. Otherwise, print the last "abs(limit)" entries. If *limit* is omitted or "None", all entries are printed. The optional *f* argument can be used to specify an alternate stack frame to start. The optional *file* argument has the same meaning as for "print_tb()". Changed in version 3.5: Added negative *limit* support. traceback.extract_tb(tb, limit=None) Return a "StackSummary" object representing a list of “pre- processed” stack trace entries extracted from the traceback object *tb*. It is useful for alternate formatting of stack traces. The optional *limit* argument has the same meaning as for "print_tb()". A “pre-processed” stack trace entry is a "FrameSummary" object containing attributes "filename", "lineno", "name", and "line" representing the information that is usually printed for a stack trace. The "line" is a string with leading and trailing whitespace stripped; if the source is not available it is "None". traceback.extract_stack(f=None, limit=None) Extract the raw traceback from the current stack frame. The return value has the same format as for "extract_tb()". The optional *f* and *limit* arguments have the same meaning as for "print_stack()". traceback.format_list(extracted_list) Given a list of tuples or "FrameSummary" objects as returned by "extract_tb()" or "extract_stack()", return a list of strings ready for printing. Each string in the resulting list corresponds to the item with the same index in the argument list. Each string ends in a newline; the strings may contain internal newlines as well, for those items whose source text line is not "None". traceback.format_exception_only(etype, value) Format the exception part of a traceback. The arguments are the exception type and value such as given by "sys.last_type" and "sys.last_value". The return value is a list of strings, each ending in a newline. Normally, the list contains a single string; however, for "SyntaxError" exceptions, it contains several lines that (when printed) display detailed information about where the syntax error occurred. The message indicating which exception occurred is the always last string in the list. traceback.format_exception(etype, value, tb, limit=None, chain=True) Format a stack trace and the exception information. The arguments have the same meaning as the corresponding arguments to "print_exception()". The return value is a list of strings, each ending in a newline and some containing internal newlines. When these lines are concatenated and printed, exactly the same text is printed as does "print_exception()". Changed in version 3.5: The *etype* argument is ignored and inferred from the type of *value*. traceback.format_exc(limit=None, chain=True) This is like "print_exc(limit)" but returns a string instead of printing to a file. traceback.format_tb(tb, limit=None) A shorthand for "format_list(extract_tb(tb, limit))". traceback.format_stack(f=None, limit=None) A shorthand for "format_list(extract_stack(f, limit))". traceback.clear_frames(tb) Clears the local variables of all the stack frames in a traceback *tb* by calling the "clear()" method of each frame object. New in version 3.4. traceback.walk_stack(f) Walk a stack following "f.f_back" from the given frame, yielding the frame and line number for each frame. If *f* is "None", the current stack is used. This helper is used with "StackSummary.extract()". New in version 3.5. traceback.walk_tb(tb) Walk a traceback following "tb_next" yielding the frame and line number for each frame. This helper is used with "StackSummary.extract()". New in version 3.5. The module also defines the following classes: "TracebackException" Objects ============================ New in version 3.5. "TracebackException" objects are created from actual exceptions to capture data for later printing in a lightweight fashion. class traceback.TracebackException(exc_type, exc_value, exc_traceback, *, limit=None, lookup_lines=True, capture_locals=False) Capture an exception for later rendering. *limit*, *lookup_lines* and *capture_locals* are as for the "StackSummary" class. Note that when locals are captured, they are also shown in the traceback. __cause__ A "TracebackException" of the original "__cause__". __context__ A "TracebackException" of the original "__context__". __suppress_context__ The "__suppress_context__" value from the original exception. stack A "StackSummary" representing the traceback. exc_type The class of the original traceback. filename For syntax errors - the file name where the error occurred. lineno For syntax errors - the line number where the error occurred. text For syntax errors - the text where the error occurred. offset For syntax errors - the offset into the text where the error occurred. msg For syntax errors - the compiler error message. classmethod from_exception(exc, *, limit=None, lookup_lines=True, capture_locals=False) Capture an exception for later rendering. *limit*, *lookup_lines* and *capture_locals* are as for the "StackSummary" class. Note that when locals are captured, they are also shown in the traceback. format(*, chain=True) Format the exception. If *chain* is not "True", "__cause__" and "__context__" will not be formatted. The return value is a generator of strings, each ending in a newline and some containing internal newlines. "print_exception()" is a wrapper around this method which just prints the lines to a file. The message indicating which exception occurred is always the last string in the output. format_exception_only() Format the exception part of the traceback. The return value is a generator of strings, each ending in a newline. Normally, the generator emits a single string; however, for "SyntaxError" exceptions, it emits several lines that (when printed) display detailed information about where the syntax error occurred. The message indicating which exception occurred is always the last string in the output. "StackSummary" Objects ====================== New in version 3.5. "StackSummary" objects represent a call stack ready for formatting. class traceback.StackSummary classmethod extract(frame_gen, *, limit=None, lookup_lines=True, capture_locals=False) Construct a "StackSummary" object from a frame generator (such as is returned by "walk_stack()" or "walk_tb()"). If *limit* is supplied, only this many frames are taken from *frame_gen*. If *lookup_lines* is "False", the returned "FrameSummary" objects will not have read their lines in yet, making the cost of creating the "StackSummary" cheaper (which may be valuable if it may not actually get formatted). If *capture_locals* is "True" the local variables in each "FrameSummary" are captured as object representations. classmethod from_list(a_list) Construct a "StackSummary" object from a supplied list of "FrameSummary" objects or old-style list of tuples. Each tuple should be a 4-tuple with filename, lineno, name, line as the elements. format() Returns a list of strings ready for printing. Each string in the resulting list corresponds to a single frame from the stack. Each string ends in a newline; the strings may contain internal newlines as well, for those items with source text lines. For long sequences of the same frame and line, the first few repetitions are shown, followed by a summary line stating the exact number of further repetitions. Changed in version 3.6: Long sequences of repeated frames are now abbreviated. "FrameSummary" Objects ====================== New in version 3.5. "FrameSummary" objects represent a single frame in a traceback. class traceback.FrameSummary(filename, lineno, name, lookup_line=True, locals=None, line=None) Represent a single frame in the traceback or stack that is being formatted or printed. It may optionally have a stringified version of the frames locals included in it. If *lookup_line* is "False", the source code is not looked up until the "FrameSummary" has the "line" attribute accessed (which also happens when casting it to a tuple). "line" may be directly provided, and will prevent line lookups happening at all. *locals* is an optional local variable dictionary, and if supplied the variable representations are stored in the summary for later display. Traceback Examples ================== This simple example implements a basic read-eval-print loop, similar to (but less useful than) the standard Python interactive interpreter loop. For a more complete implementation of the interpreter loop, refer to the "code" module. import sys, traceback def run_user_code(envdir): source = input(">>> ") try: exec(source, envdir) except Exception: print("Exception in user code:") print("-"*60) traceback.print_exc(file=sys.stdout) print("-"*60) envdir = {} while True: run_user_code(envdir) The following example demonstrates the different ways to print and format the exception and traceback: import sys, traceback def lumberjack(): bright_side_of_death() def bright_side_of_death(): return tuple()[0] try: lumberjack() except IndexError: exc_type, exc_value, exc_traceback = sys.exc_info() print("*** print_tb:") traceback.print_tb(exc_traceback, limit=1, file=sys.stdout) print("*** print_exception:") # exc_type below is ignored on 3.5 and later traceback.print_exception(exc_type, exc_value, exc_traceback, limit=2, file=sys.stdout) print("*** print_exc:") traceback.print_exc(limit=2, file=sys.stdout) print("*** format_exc, first and last line:") formatted_lines = traceback.format_exc().splitlines() print(formatted_lines[0]) print(formatted_lines[-1]) print("*** format_exception:") # exc_type below is ignored on 3.5 and later print(repr(traceback.format_exception(exc_type, exc_value, exc_traceback))) print("*** extract_tb:") print(repr(traceback.extract_tb(exc_traceback))) print("*** format_tb:") print(repr(traceback.format_tb(exc_traceback))) print("*** tb_lineno:", exc_traceback.tb_lineno) The output for the example would look similar to this: *** print_tb: File "", line 10, in lumberjack() *** print_exception: Traceback (most recent call last): File "", line 10, in lumberjack() File "", line 4, in lumberjack bright_side_of_death() IndexError: tuple index out of range *** print_exc: Traceback (most recent call last): File "", line 10, in lumberjack() File "", line 4, in lumberjack bright_side_of_death() IndexError: tuple index out of range *** format_exc, first and last line: Traceback (most recent call last): IndexError: tuple index out of range *** format_exception: ['Traceback (most recent call last):\n', ' File "", line 10, in \n lumberjack()\n', ' File "", line 4, in lumberjack\n bright_side_of_death()\n', ' File "", line 7, in bright_side_of_death\n return tuple()[0]\n', 'IndexError: tuple index out of range\n'] *** extract_tb: [, line 10 in >, , line 4 in lumberjack>, , line 7 in bright_side_of_death>] *** format_tb: [' File "", line 10, in \n lumberjack()\n', ' File "", line 4, in lumberjack\n bright_side_of_death()\n', ' File "", line 7, in bright_side_of_death\n return tuple()[0]\n'] *** tb_lineno: 10 The following example shows the different ways to print and format the stack: >>> import traceback >>> def another_function(): ... lumberstack() ... >>> def lumberstack(): ... traceback.print_stack() ... print(repr(traceback.extract_stack())) ... print(repr(traceback.format_stack())) ... >>> another_function() File "", line 10, in another_function() File "", line 3, in another_function lumberstack() File "", line 6, in lumberstack traceback.print_stack() [('', 10, '', 'another_function()'), ('', 3, 'another_function', 'lumberstack()'), ('', 7, 'lumberstack', 'print(repr(traceback.extract_stack()))')] [' File "", line 10, in \n another_function()\n', ' File "", line 3, in another_function\n lumberstack()\n', ' File "", line 8, in lumberstack\n print(repr(traceback.format_stack()))\n'] This last example demonstrates the final few formatting functions: >>> import traceback >>> traceback.format_list([('spam.py', 3, '', 'spam.eggs()'), ... ('eggs.py', 42, 'eggs', 'return "bacon"')]) [' File "spam.py", line 3, in \n spam.eggs()\n', ' File "eggs.py", line 42, in eggs\n return "bacon"\n'] >>> an_error = IndexError('tuple index out of range') >>> traceback.format_exception_only(type(an_error), an_error) ['IndexError: tuple index out of range\n'] "tracemalloc" — Trace memory allocations **************************************** New in version 3.4. **Source code:** Lib/tracemalloc.py ====================================================================== The tracemalloc module is a debug tool to trace memory blocks allocated by Python. It provides the following information: * Traceback where an object was allocated * Statistics on allocated memory blocks per filename and per line number: total size, number and average size of allocated memory blocks * Compute the differences between two snapshots to detect memory leaks To trace most memory blocks allocated by Python, the module should be started as early as possible by setting the "PYTHONTRACEMALLOC" environment variable to "1", or by using "-X" "tracemalloc" command line option. The "tracemalloc.start()" function can be called at runtime to start tracing Python memory allocations. By default, a trace of an allocated memory block only stores the most recent frame (1 frame). To store 25 frames at startup: set the "PYTHONTRACEMALLOC" environment variable to "25", or use the "-X" "tracemalloc=25" command line option. Examples ======== Display the top 10 ------------------ Display the 10 files allocating the most memory: import tracemalloc tracemalloc.start() # ... run your application ... snapshot = tracemalloc.take_snapshot() top_stats = snapshot.statistics('lineno') print("[ Top 10 ]") for stat in top_stats[:10]: print(stat) Example of output of the Python test suite: [ Top 10 ] :716: size=4855 KiB, count=39328, average=126 B :284: size=521 KiB, count=3199, average=167 B /usr/lib/python3.4/collections/__init__.py:368: size=244 KiB, count=2315, average=108 B /usr/lib/python3.4/unittest/case.py:381: size=185 KiB, count=779, average=243 B /usr/lib/python3.4/unittest/case.py:402: size=154 KiB, count=378, average=416 B /usr/lib/python3.4/abc.py:133: size=88.7 KiB, count=347, average=262 B :1446: size=70.4 KiB, count=911, average=79 B :1454: size=52.0 KiB, count=25, average=2131 B :5: size=49.7 KiB, count=148, average=344 B /usr/lib/python3.4/sysconfig.py:411: size=48.0 KiB, count=1, average=48.0 KiB We can see that Python loaded "4855 KiB" data (bytecode and constants) from modules and that the "collections" module allocated "244 KiB" to build "namedtuple" types. See "Snapshot.statistics()" for more options. Compute differences ------------------- Take two snapshots and display the differences: import tracemalloc tracemalloc.start() # ... start your application ... snapshot1 = tracemalloc.take_snapshot() # ... call the function leaking memory ... snapshot2 = tracemalloc.take_snapshot() top_stats = snapshot2.compare_to(snapshot1, 'lineno') print("[ Top 10 differences ]") for stat in top_stats[:10]: print(stat) Example of output before/after running some tests of the Python test suite: [ Top 10 differences ] :716: size=8173 KiB (+4428 KiB), count=71332 (+39369), average=117 B /usr/lib/python3.4/linecache.py:127: size=940 KiB (+940 KiB), count=8106 (+8106), average=119 B /usr/lib/python3.4/unittest/case.py:571: size=298 KiB (+298 KiB), count=589 (+589), average=519 B :284: size=1005 KiB (+166 KiB), count=7423 (+1526), average=139 B /usr/lib/python3.4/mimetypes.py:217: size=112 KiB (+112 KiB), count=1334 (+1334), average=86 B /usr/lib/python3.4/http/server.py:848: size=96.0 KiB (+96.0 KiB), count=1 (+1), average=96.0 KiB /usr/lib/python3.4/inspect.py:1465: size=83.5 KiB (+83.5 KiB), count=109 (+109), average=784 B /usr/lib/python3.4/unittest/mock.py:491: size=77.7 KiB (+77.7 KiB), count=143 (+143), average=557 B /usr/lib/python3.4/urllib/parse.py:476: size=71.8 KiB (+71.8 KiB), count=969 (+969), average=76 B /usr/lib/python3.4/contextlib.py:38: size=67.2 KiB (+67.2 KiB), count=126 (+126), average=546 B We can see that Python has loaded "8173 KiB" of module data (bytecode and constants), and that this is "4428 KiB" more than had been loaded before the tests, when the previous snapshot was taken. Similarly, the "linecache" module has cached "940 KiB" of Python source code to format tracebacks, all of it since the previous snapshot. If the system has little free memory, snapshots can be written on disk using the "Snapshot.dump()" method to analyze the snapshot offline. Then use the "Snapshot.load()" method reload the snapshot. Get the traceback of a memory block ----------------------------------- Code to display the traceback of the biggest memory block: import tracemalloc # Store 25 frames tracemalloc.start(25) # ... run your application ... snapshot = tracemalloc.take_snapshot() top_stats = snapshot.statistics('traceback') # pick the biggest memory block stat = top_stats[0] print("%s memory blocks: %.1f KiB" % (stat.count, stat.size / 1024)) for line in stat.traceback.format(): print(line) Example of output of the Python test suite (traceback limited to 25 frames): 903 memory blocks: 870.1 KiB File "", line 716 File "", line 1036 File "", line 934 File "", line 1068 File "", line 619 File "", line 1581 File "", line 1614 File "/usr/lib/python3.4/doctest.py", line 101 import pdb File "", line 284 File "", line 938 File "", line 1068 File "", line 619 File "", line 1581 File "", line 1614 File "/usr/lib/python3.4/test/support/__init__.py", line 1728 import doctest File "/usr/lib/python3.4/test/test_pickletools.py", line 21 support.run_doctest(pickletools) File "/usr/lib/python3.4/test/regrtest.py", line 1276 test_runner() File "/usr/lib/python3.4/test/regrtest.py", line 976 display_failure=not verbose) File "/usr/lib/python3.4/test/regrtest.py", line 761 match_tests=ns.match_tests) File "/usr/lib/python3.4/test/regrtest.py", line 1563 main() File "/usr/lib/python3.4/test/__main__.py", line 3 regrtest.main_in_temp_cwd() File "/usr/lib/python3.4/runpy.py", line 73 exec(code, run_globals) File "/usr/lib/python3.4/runpy.py", line 160 "__main__", fname, loader, pkg_name) We can see that the most memory was allocated in the "importlib" module to load data (bytecode and constants) from modules: "870.1 KiB". The traceback is where the "importlib" loaded data most recently: on the "import pdb" line of the "doctest" module. The traceback may change if a new module is loaded. Pretty top ---------- Code to display the 10 lines allocating the most memory with a pretty output, ignoring "" and "" files: import linecache import os import tracemalloc def display_top(snapshot, key_type='lineno', limit=10): snapshot = snapshot.filter_traces(( tracemalloc.Filter(False, ""), tracemalloc.Filter(False, ""), )) top_stats = snapshot.statistics(key_type) print("Top %s lines" % limit) for index, stat in enumerate(top_stats[:limit], 1): frame = stat.traceback[0] print("#%s: %s:%s: %.1f KiB" % (index, frame.filename, frame.lineno, stat.size / 1024)) line = linecache.getline(frame.filename, frame.lineno).strip() if line: print(' %s' % line) other = top_stats[limit:] if other: size = sum(stat.size for stat in other) print("%s other: %.1f KiB" % (len(other), size / 1024)) total = sum(stat.size for stat in top_stats) print("Total allocated size: %.1f KiB" % (total / 1024)) tracemalloc.start() # ... run your application ... snapshot = tracemalloc.take_snapshot() display_top(snapshot) Example of output of the Python test suite: Top 10 lines #1: Lib/base64.py:414: 419.8 KiB _b85chars2 = [(a + b) for a in _b85chars for b in _b85chars] #2: Lib/base64.py:306: 419.8 KiB _a85chars2 = [(a + b) for a in _a85chars for b in _a85chars] #3: collections/__init__.py:368: 293.6 KiB exec(class_definition, namespace) #4: Lib/abc.py:133: 115.2 KiB cls = super().__new__(mcls, name, bases, namespace) #5: unittest/case.py:574: 103.1 KiB testMethod() #6: Lib/linecache.py:127: 95.4 KiB lines = fp.readlines() #7: urllib/parse.py:476: 71.8 KiB for a in _hexdig for b in _hexdig} #8: :5: 62.0 KiB #9: Lib/_weakrefset.py:37: 60.0 KiB self.data = set() #10: Lib/base64.py:142: 59.8 KiB _b32tab2 = [a + b for a in _b32tab for b in _b32tab] 6220 other: 3602.8 KiB Total allocated size: 5303.1 KiB See "Snapshot.statistics()" for more options. Record the current and peak size of all traced memory blocks ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The following code computes two sums like "0 + 1 + 2 + ..." inefficiently, by creating a list of those numbers. This list consumes a lot of memory temporarily. We can use "get_traced_memory()" and "reset_peak()" to observe the small memory usage after the sum is computed as well as the peak memory usage during the computations: import tracemalloc tracemalloc.start() # Example code: compute a sum with a large temporary list large_sum = sum(list(range(100000))) first_size, first_peak = tracemalloc.get_traced_memory() tracemalloc.reset_peak() # Example code: compute a sum with a small temporary list small_sum = sum(list(range(1000))) second_size, second_peak = tracemalloc.get_traced_memory() print(f"{first_size=}, {first_peak=}") print(f"{second_size=}, {second_peak=}") Output: first_size=664, first_peak=3592984 second_size=804, second_peak=29704 Using "reset_peak()" ensured we could accurately record the peak during the computation of "small_sum", even though it is much smaller than the overall peak size of memory blocks since the "start()" call. Without the call to "reset_peak()", "second_peak" would still be the peak from the computation "large_sum" (that is, equal to "first_peak"). In this case, both peaks are much higher than the final memory usage, and which suggests we could optimise (by removing the unnecessary call to "list", and writing "sum(range(...))"). API === Functions --------- tracemalloc.clear_traces() Clear traces of memory blocks allocated by Python. See also "stop()". tracemalloc.get_object_traceback(obj) Get the traceback where the Python object *obj* was allocated. Return a "Traceback" instance, or "None" if the "tracemalloc" module is not tracing memory allocations or did not trace the allocation of the object. See also "gc.get_referrers()" and "sys.getsizeof()" functions. tracemalloc.get_traceback_limit() Get the maximum number of frames stored in the traceback of a trace. The "tracemalloc" module must be tracing memory allocations to get the limit, otherwise an exception is raised. The limit is set by the "start()" function. tracemalloc.get_traced_memory() Get the current size and peak size of memory blocks traced by the "tracemalloc" module as a tuple: "(current: int, peak: int)". tracemalloc.reset_peak() Set the peak size of memory blocks traced by the "tracemalloc" module to the current size. Do nothing if the "tracemalloc" module is not tracing memory allocations. This function only modifies the recorded peak size, and does not modify or clear any traces, unlike "clear_traces()". Snapshots taken with "take_snapshot()" before a call to "reset_peak()" can be meaningfully compared to snapshots taken after the call. See also "get_traced_memory()". New in version 3.9. tracemalloc.get_tracemalloc_memory() Get the memory usage in bytes of the "tracemalloc" module used to store traces of memory blocks. Return an "int". tracemalloc.is_tracing() "True" if the "tracemalloc" module is tracing Python memory allocations, "False" otherwise. See also "start()" and "stop()" functions. tracemalloc.start(nframe: int=1) Start tracing Python memory allocations: install hooks on Python memory allocators. Collected tracebacks of traces will be limited to *nframe* frames. By default, a trace of a memory block only stores the most recent frame: the limit is "1". *nframe* must be greater or equal to "1". You can still read the original number of total frames that composed the traceback by looking at the "Traceback.total_nframe" attribute. Storing more than "1" frame is only useful to compute statistics grouped by "'traceback'" or to compute cumulative statistics: see the "Snapshot.compare_to()" and "Snapshot.statistics()" methods. Storing more frames increases the memory and CPU overhead of the "tracemalloc" module. Use the "get_tracemalloc_memory()" function to measure how much memory is used by the "tracemalloc" module. The "PYTHONTRACEMALLOC" environment variable ("PYTHONTRACEMALLOC=NFRAME") and the "-X" "tracemalloc=NFRAME" command line option can be used to start tracing at startup. See also "stop()", "is_tracing()" and "get_traceback_limit()" functions. tracemalloc.stop() Stop tracing Python memory allocations: uninstall hooks on Python memory allocators. Also clears all previously collected traces of memory blocks allocated by Python. Call "take_snapshot()" function to take a snapshot of traces before clearing them. See also "start()", "is_tracing()" and "clear_traces()" functions. tracemalloc.take_snapshot() Take a snapshot of traces of memory blocks allocated by Python. Return a new "Snapshot" instance. The snapshot does not include memory blocks allocated before the "tracemalloc" module started to trace memory allocations. Tracebacks of traces are limited to "get_traceback_limit()" frames. Use the *nframe* parameter of the "start()" function to store more frames. The "tracemalloc" module must be tracing memory allocations to take a snapshot, see the "start()" function. See also the "get_object_traceback()" function. DomainFilter ------------ class tracemalloc.DomainFilter(inclusive: bool, domain: int) Filter traces of memory blocks by their address space (domain). New in version 3.6. inclusive If *inclusive* is "True" (include), match memory blocks allocated in the address space "domain". If *inclusive* is "False" (exclude), match memory blocks not allocated in the address space "domain". domain Address space of a memory block ("int"). Read-only property. Filter ------ class tracemalloc.Filter(inclusive: bool, filename_pattern: str, lineno: int=None, all_frames: bool=False, domain: int=None) Filter on traces of memory blocks. See the "fnmatch.fnmatch()" function for the syntax of *filename_pattern*. The "'.pyc'" file extension is replaced with "'.py'". Examples: * "Filter(True, subprocess.__file__)" only includes traces of the "subprocess" module * "Filter(False, tracemalloc.__file__)" excludes traces of the "tracemalloc" module * "Filter(False, "")" excludes empty tracebacks Changed in version 3.5: The "'.pyo'" file extension is no longer replaced with "'.py'". Changed in version 3.6: Added the "domain" attribute. domain Address space of a memory block ("int" or "None"). tracemalloc uses the domain "0" to trace memory allocations made by Python. C extensions can use other domains to trace other resources. inclusive If *inclusive* is "True" (include), only match memory blocks allocated in a file with a name matching "filename_pattern" at line number "lineno". If *inclusive* is "False" (exclude), ignore memory blocks allocated in a file with a name matching "filename_pattern" at line number "lineno". lineno Line number ("int") of the filter. If *lineno* is "None", the filter matches any line number. filename_pattern Filename pattern of the filter ("str"). Read-only property. all_frames If *all_frames* is "True", all frames of the traceback are checked. If *all_frames* is "False", only the most recent frame is checked. This attribute has no effect if the traceback limit is "1". See the "get_traceback_limit()" function and "Snapshot.traceback_limit" attribute. Frame ----- class tracemalloc.Frame Frame of a traceback. The "Traceback" class is a sequence of "Frame" instances. filename Filename ("str"). lineno Line number ("int"). Snapshot -------- class tracemalloc.Snapshot Snapshot of traces of memory blocks allocated by Python. The "take_snapshot()" function creates a snapshot instance. compare_to(old_snapshot: Snapshot, key_type: str, cumulative: bool=False) Compute the differences with an old snapshot. Get statistics as a sorted list of "StatisticDiff" instances grouped by *key_type*. See the "Snapshot.statistics()" method for *key_type* and *cumulative* parameters. The result is sorted from the biggest to the smallest by: absolute value of "StatisticDiff.size_diff", "StatisticDiff.size", absolute value of "StatisticDiff.count_diff", "Statistic.count" and then by "StatisticDiff.traceback". dump(filename) Write the snapshot into a file. Use "load()" to reload the snapshot. filter_traces(filters) Create a new "Snapshot" instance with a filtered "traces" sequence, *filters* is a list of "DomainFilter" and "Filter" instances. If *filters* is an empty list, return a new "Snapshot" instance with a copy of the traces. All inclusive filters are applied at once, a trace is ignored if no inclusive filters match it. A trace is ignored if at least one exclusive filter matches it. Changed in version 3.6: "DomainFilter" instances are now also accepted in *filters*. classmethod load(filename) Load a snapshot from a file. See also "dump()". statistics(key_type: str, cumulative: bool=False) Get statistics as a sorted list of "Statistic" instances grouped by *key_type*: +-----------------------+--------------------------+ | key_type | description | |=======================|==========================| | "'filename'" | filename | +-----------------------+--------------------------+ | "'lineno'" | filename and line number | +-----------------------+--------------------------+ | "'traceback'" | traceback | +-----------------------+--------------------------+ If *cumulative* is "True", cumulate size and count of memory blocks of all frames of the traceback of a trace, not only the most recent frame. The cumulative mode can only be used with *key_type* equals to "'filename'" and "'lineno'". The result is sorted from the biggest to the smallest by: "Statistic.size", "Statistic.count" and then by "Statistic.traceback". traceback_limit Maximum number of frames stored in the traceback of "traces": result of the "get_traceback_limit()" when the snapshot was taken. traces Traces of all memory blocks allocated by Python: sequence of "Trace" instances. The sequence has an undefined order. Use the "Snapshot.statistics()" method to get a sorted list of statistics. Statistic --------- class tracemalloc.Statistic Statistic on memory allocations. "Snapshot.statistics()" returns a list of "Statistic" instances. See also the "StatisticDiff" class. count Number of memory blocks ("int"). size Total size of memory blocks in bytes ("int"). traceback Traceback where the memory block was allocated, "Traceback" instance. StatisticDiff ------------- class tracemalloc.StatisticDiff Statistic difference on memory allocations between an old and a new "Snapshot" instance. "Snapshot.compare_to()" returns a list of "StatisticDiff" instances. See also the "Statistic" class. count Number of memory blocks in the new snapshot ("int"): "0" if the memory blocks have been released in the new snapshot. count_diff Difference of number of memory blocks between the old and the new snapshots ("int"): "0" if the memory blocks have been allocated in the new snapshot. size Total size of memory blocks in bytes in the new snapshot ("int"): "0" if the memory blocks have been released in the new snapshot. size_diff Difference of total size of memory blocks in bytes between the old and the new snapshots ("int"): "0" if the memory blocks have been allocated in the new snapshot. traceback Traceback where the memory blocks were allocated, "Traceback" instance. Trace ----- class tracemalloc.Trace Trace of a memory block. The "Snapshot.traces" attribute is a sequence of "Trace" instances. Changed in version 3.6: Added the "domain" attribute. domain Address space of a memory block ("int"). Read-only property. tracemalloc uses the domain "0" to trace memory allocations made by Python. C extensions can use other domains to trace other resources. size Size of the memory block in bytes ("int"). traceback Traceback where the memory block was allocated, "Traceback" instance. Traceback --------- class tracemalloc.Traceback Sequence of "Frame" instances sorted from the oldest frame to the most recent frame. A traceback contains at least "1" frame. If the "tracemalloc" module failed to get a frame, the filename """" at line number "0" is used. When a snapshot is taken, tracebacks of traces are limited to "get_traceback_limit()" frames. See the "take_snapshot()" function. The original number of frames of the traceback is stored in the "Traceback.total_nframe" attribute. That allows to know if a traceback has been truncated by the traceback limit. The "Trace.traceback" attribute is an instance of "Traceback" instance. Changed in version 3.7: Frames are now sorted from the oldest to the most recent, instead of most recent to oldest. total_nframe Total number of frames that composed the traceback before truncation. This attribute can be set to "None" if the information is not available. Changed in version 3.9: The "Traceback.total_nframe" attribute was added. format(limit=None, most_recent_first=False) Format the traceback as a list of lines. Use the "linecache" module to retrieve lines from the source code. If *limit* is set, format the *limit* most recent frames if *limit* is positive. Otherwise, format the "abs(limit)" oldest frames. If *most_recent_first* is "True", the order of the formatted frames is reversed, returning the most recent frame first instead of last. Similar to the "traceback.format_tb()" function, except that "format()" does not include newlines. Example: print("Traceback (most recent call first):") for line in traceback: print(line) Output: Traceback (most recent call first): File "test.py", line 9 obj = Object() File "test.py", line 12 tb = tracemalloc.get_object_traceback(f()) "tty" — Terminal control functions ********************************** **Source code:** Lib/tty.py ====================================================================== The "tty" module defines functions for putting the tty into cbreak and raw modes. Because it requires the "termios" module, it will work only on Unix. The "tty" module defines the following functions: tty.setraw(fd, when=termios.TCSAFLUSH) Change the mode of the file descriptor *fd* to raw. If *when* is omitted, it defaults to "termios.TCSAFLUSH", and is passed to "termios.tcsetattr()". tty.setcbreak(fd, when=termios.TCSAFLUSH) Change the mode of file descriptor *fd* to cbreak. If *when* is omitted, it defaults to "termios.TCSAFLUSH", and is passed to "termios.tcsetattr()". See also: Module "termios" Low-level terminal control interface. "turtle" — Turtle graphics ************************** **Source code:** Lib/turtle.py ====================================================================== Introduction ============ Turtle graphics is a popular way for introducing programming to kids. It was part of the original Logo programming language developed by Wally Feurzeig, Seymour Papert and Cynthia Solomon in 1967. Imagine a robotic turtle starting at (0, 0) in the x-y plane. After an "import turtle", give it the command "turtle.forward(15)", and it moves (on-screen!) 15 pixels in the direction it is facing, drawing a line as it moves. Give it the command "turtle.right(25)", and it rotates in-place 25 degrees clockwise. Turtle star ^^^^^^^^^^^ Turtle can draw intricate shapes using programs that repeat simple moves. [image] from turtle import * color('red', 'yellow') begin_fill() while True: forward(200) left(170) if abs(pos()) < 1: break end_fill() done() By combining together these and similar commands, intricate shapes and pictures can easily be drawn. The "turtle" module is an extended reimplementation of the same-named module from the Python standard distribution up to version Python 2.5. It tries to keep the merits of the old turtle module and to be (nearly) 100% compatible with it. This means in the first place to enable the learning programmer to use all the commands, classes and methods interactively when using the module from within IDLE run with the "-n" switch. The turtle module provides turtle graphics primitives, in both object- oriented and procedure-oriented ways. Because it uses "tkinter" for the underlying graphics, it needs a version of Python installed with Tk support. The object-oriented interface uses essentially two+two classes: 1. The "TurtleScreen" class defines graphics windows as a playground for the drawing turtles. Its constructor needs a "tkinter.Canvas" or a "ScrolledCanvas" as argument. It should be used when "turtle" is used as part of some application. The function "Screen()" returns a singleton object of a "TurtleScreen" subclass. This function should be used when "turtle" is used as a standalone tool for doing graphics. As a singleton object, inheriting from its class is not possible. All methods of TurtleScreen/Screen also exist as functions, i.e. as part of the procedure-oriented interface. 2. "RawTurtle" (alias: "RawPen") defines Turtle objects which draw on a "TurtleScreen". Its constructor needs a Canvas, ScrolledCanvas or TurtleScreen as argument, so the RawTurtle objects know where to draw. Derived from RawTurtle is the subclass "Turtle" (alias: "Pen"), which draws on “the” "Screen" instance which is automatically created, if not already present. All methods of RawTurtle/Turtle also exist as functions, i.e. part of the procedure-oriented interface. The procedural interface provides functions which are derived from the methods of the classes "Screen" and "Turtle". They have the same names as the corresponding methods. A screen object is automatically created whenever a function derived from a Screen method is called. An (unnamed) turtle object is automatically created whenever any of the functions derived from a Turtle method is called. To use multiple turtles on a screen one has to use the object-oriented interface. Note: In the following documentation the argument list for functions is given. Methods, of course, have the additional first argument *self* which is omitted here. Overview of available Turtle and Screen methods =============================================== Turtle methods -------------- Turtle motion Move and draw "forward()" | "fd()" "backward()" | "bk()" | "back()" "right()" | "rt()" "left()" | "lt()" "goto()" | "setpos()" | "setposition()" "setx()" "sety()" "setheading()" | "seth()" "home()" "circle()" "dot()" "stamp()" "clearstamp()" "clearstamps()" "undo()" "speed()" Tell Turtle’s state "position()" | "pos()" "towards()" "xcor()" "ycor()" "heading()" "distance()" Setting and measurement "degrees()" "radians()" Pen control Drawing state "pendown()" | "pd()" | "down()" "penup()" | "pu()" | "up()" "pensize()" | "width()" "pen()" "isdown()" Color control "color()" "pencolor()" "fillcolor()" Filling "filling()" "begin_fill()" "end_fill()" More drawing control "reset()" "clear()" "write()" Turtle state Visibility "showturtle()" | "st()" "hideturtle()" | "ht()" "isvisible()" Appearance "shape()" "resizemode()" "shapesize()" | "turtlesize()" "shearfactor()" "settiltangle()" "tiltangle()" "tilt()" "shapetransform()" "get_shapepoly()" Using events "onclick()" "onrelease()" "ondrag()" Special Turtle methods "begin_poly()" "end_poly()" "get_poly()" "clone()" "getturtle()" | "getpen()" "getscreen()" "setundobuffer()" "undobufferentries()" Methods of TurtleScreen/Screen ------------------------------ Window control "bgcolor()" "bgpic()" "clearscreen()" "resetscreen()" "screensize()" "setworldcoordinates()" Animation control "delay()" "tracer()" "update()" Using screen events "listen()" "onkey()" | "onkeyrelease()" "onkeypress()" "onclick()" | "onscreenclick()" "ontimer()" "mainloop()" | "done()" Settings and special methods "mode()" "colormode()" "getcanvas()" "getshapes()" "register_shape()" | "addshape()" "turtles()" "window_height()" "window_width()" Input methods "textinput()" "numinput()" Methods specific to Screen "bye()" "exitonclick()" "setup()" "title()" Methods of RawTurtle/Turtle and corresponding functions ======================================================= Most of the examples in this section refer to a Turtle instance called "turtle". Turtle motion ------------- turtle.forward(distance) turtle.fd(distance) Parameters: **distance** – a number (integer or float) Move the turtle forward by the specified *distance*, in the direction the turtle is headed. >>> turtle.position() (0.00,0.00) >>> turtle.forward(25) >>> turtle.position() (25.00,0.00) >>> turtle.forward(-75) >>> turtle.position() (-50.00,0.00) turtle.back(distance) turtle.bk(distance) turtle.backward(distance) Parameters: **distance** – a number Move the turtle backward by *distance*, opposite to the direction the turtle is headed. Do not change the turtle’s heading. >>> turtle.position() (0.00,0.00) >>> turtle.backward(30) >>> turtle.position() (-30.00,0.00) turtle.right(angle) turtle.rt(angle) Parameters: **angle** – a number (integer or float) Turn turtle right by *angle* units. (Units are by default degrees, but can be set via the "degrees()" and "radians()" functions.) Angle orientation depends on the turtle mode, see "mode()". >>> turtle.heading() 22.0 >>> turtle.right(45) >>> turtle.heading() 337.0 turtle.left(angle) turtle.lt(angle) Parameters: **angle** – a number (integer or float) Turn turtle left by *angle* units. (Units are by default degrees, but can be set via the "degrees()" and "radians()" functions.) Angle orientation depends on the turtle mode, see "mode()". >>> turtle.heading() 22.0 >>> turtle.left(45) >>> turtle.heading() 67.0 turtle.goto(x, y=None) turtle.setpos(x, y=None) turtle.setposition(x, y=None) Parameters: * **x** – a number or a pair/vector of numbers * **y** – a number or "None" If *y* is "None", *x* must be a pair of coordinates or a "Vec2D" (e.g. as returned by "pos()"). Move turtle to an absolute position. If the pen is down, draw line. Do not change the turtle’s orientation. >>> tp = turtle.pos() >>> tp (0.00,0.00) >>> turtle.setpos(60,30) >>> turtle.pos() (60.00,30.00) >>> turtle.setpos((20,80)) >>> turtle.pos() (20.00,80.00) >>> turtle.setpos(tp) >>> turtle.pos() (0.00,0.00) turtle.setx(x) Parameters: **x** – a number (integer or float) Set the turtle’s first coordinate to *x*, leave second coordinate unchanged. >>> turtle.position() (0.00,240.00) >>> turtle.setx(10) >>> turtle.position() (10.00,240.00) turtle.sety(y) Parameters: **y** – a number (integer or float) Set the turtle’s second coordinate to *y*, leave first coordinate unchanged. >>> turtle.position() (0.00,40.00) >>> turtle.sety(-10) >>> turtle.position() (0.00,-10.00) turtle.setheading(to_angle) turtle.seth(to_angle) Parameters: **to_angle** – a number (integer or float) Set the orientation of the turtle to *to_angle*. Here are some common directions in degrees: +---------------------+----------------------+ | standard mode | logo mode | |=====================|======================| | 0 - east | 0 - north | +---------------------+----------------------+ | 90 - north | 90 - east | +---------------------+----------------------+ | 180 - west | 180 - south | +---------------------+----------------------+ | 270 - south | 270 - west | +---------------------+----------------------+ >>> turtle.setheading(90) >>> turtle.heading() 90.0 turtle.home() Move turtle to the origin – coordinates (0,0) – and set its heading to its start-orientation (which depends on the mode, see "mode()"). >>> turtle.heading() 90.0 >>> turtle.position() (0.00,-10.00) >>> turtle.home() >>> turtle.position() (0.00,0.00) >>> turtle.heading() 0.0 turtle.circle(radius, extent=None, steps=None) Parameters: * **radius** – a number * **extent** – a number (or "None") * **steps** – an integer (or "None") Draw a circle with given *radius*. The center is *radius* units left of the turtle; *extent* – an angle – determines which part of the circle is drawn. If *extent* is not given, draw the entire circle. If *extent* is not a full circle, one endpoint of the arc is the current pen position. Draw the arc in counterclockwise direction if *radius* is positive, otherwise in clockwise direction. Finally the direction of the turtle is changed by the amount of *extent*. As the circle is approximated by an inscribed regular polygon, *steps* determines the number of steps to use. If not given, it will be calculated automatically. May be used to draw regular polygons. >>> turtle.home() >>> turtle.position() (0.00,0.00) >>> turtle.heading() 0.0 >>> turtle.circle(50) >>> turtle.position() (-0.00,0.00) >>> turtle.heading() 0.0 >>> turtle.circle(120, 180) # draw a semicircle >>> turtle.position() (0.00,240.00) >>> turtle.heading() 180.0 turtle.dot(size=None, *color) Parameters: * **size** – an integer >= 1 (if given) * **color** – a colorstring or a numeric color tuple Draw a circular dot with diameter *size*, using *color*. If *size* is not given, the maximum of pensize+4 and 2*pensize is used. >>> turtle.home() >>> turtle.dot() >>> turtle.fd(50); turtle.dot(20, "blue"); turtle.fd(50) >>> turtle.position() (100.00,-0.00) >>> turtle.heading() 0.0 turtle.stamp() Stamp a copy of the turtle shape onto the canvas at the current turtle position. Return a stamp_id for that stamp, which can be used to delete it by calling "clearstamp(stamp_id)". >>> turtle.color("blue") >>> turtle.stamp() 11 >>> turtle.fd(50) turtle.clearstamp(stampid) Parameters: **stampid** – an integer, must be return value of previous "stamp()" call Delete stamp with given *stampid*. >>> turtle.position() (150.00,-0.00) >>> turtle.color("blue") >>> astamp = turtle.stamp() >>> turtle.fd(50) >>> turtle.position() (200.00,-0.00) >>> turtle.clearstamp(astamp) >>> turtle.position() (200.00,-0.00) turtle.clearstamps(n=None) Parameters: **n** – an integer (or "None") Delete all or first/last *n* of turtle’s stamps. If *n* is "None", delete all stamps, if *n* > 0 delete first *n* stamps, else if *n* < 0 delete last *n* stamps. >>> for i in range(8): ... turtle.stamp(); turtle.fd(30) 13 14 15 16 17 18 19 20 >>> turtle.clearstamps(2) >>> turtle.clearstamps(-2) >>> turtle.clearstamps() turtle.undo() Undo (repeatedly) the last turtle action(s). Number of available undo actions is determined by the size of the undobuffer. >>> for i in range(4): ... turtle.fd(50); turtle.lt(80) ... >>> for i in range(8): ... turtle.undo() turtle.speed(speed=None) Parameters: **speed** – an integer in the range 0..10 or a speedstring (see below) Set the turtle’s speed to an integer value in the range 0..10. If no argument is given, return current speed. If input is a number greater than 10 or smaller than 0.5, speed is set to 0. Speedstrings are mapped to speedvalues as follows: * “fastest”: 0 * “fast”: 10 * “normal”: 6 * “slow”: 3 * “slowest”: 1 Speeds from 1 to 10 enforce increasingly faster animation of line drawing and turtle turning. Attention: *speed* = 0 means that *no* animation takes place. forward/back makes turtle jump and likewise left/right make the turtle turn instantly. >>> turtle.speed() 3 >>> turtle.speed('normal') >>> turtle.speed() 6 >>> turtle.speed(9) >>> turtle.speed() 9 Tell Turtle’s state ------------------- turtle.position() turtle.pos() Return the turtle’s current location (x,y) (as a "Vec2D" vector). >>> turtle.pos() (440.00,-0.00) turtle.towards(x, y=None) Parameters: * **x** – a number or a pair/vector of numbers or a turtle instance * **y** – a number if *x* is a number, else "None" Return the angle between the line from turtle position to position specified by (x,y), the vector or the other turtle. This depends on the turtle’s start orientation which depends on the mode - “standard”/”world” or “logo”. >>> turtle.goto(10, 10) >>> turtle.towards(0,0) 225.0 turtle.xcor() Return the turtle’s x coordinate. >>> turtle.home() >>> turtle.left(50) >>> turtle.forward(100) >>> turtle.pos() (64.28,76.60) >>> print(round(turtle.xcor(), 5)) 64.27876 turtle.ycor() Return the turtle’s y coordinate. >>> turtle.home() >>> turtle.left(60) >>> turtle.forward(100) >>> print(turtle.pos()) (50.00,86.60) >>> print(round(turtle.ycor(), 5)) 86.60254 turtle.heading() Return the turtle’s current heading (value depends on the turtle mode, see "mode()"). >>> turtle.home() >>> turtle.left(67) >>> turtle.heading() 67.0 turtle.distance(x, y=None) Parameters: * **x** – a number or a pair/vector of numbers or a turtle instance * **y** – a number if *x* is a number, else "None" Return the distance from the turtle to (x,y), the given vector, or the given other turtle, in turtle step units. >>> turtle.home() >>> turtle.distance(30,40) 50.0 >>> turtle.distance((30,40)) 50.0 >>> joe = Turtle() >>> joe.forward(77) >>> turtle.distance(joe) 77.0 Settings for measurement ------------------------ turtle.degrees(fullcircle=360.0) Parameters: **fullcircle** – a number Set angle measurement units, i.e. set number of “degrees” for a full circle. Default value is 360 degrees. >>> turtle.home() >>> turtle.left(90) >>> turtle.heading() 90.0 Change angle measurement unit to grad (also known as gon, grade, or gradian and equals 1/100-th of the right angle.) >>> turtle.degrees(400.0) >>> turtle.heading() 100.0 >>> turtle.degrees(360) >>> turtle.heading() 90.0 turtle.radians() Set the angle measurement units to radians. Equivalent to "degrees(2*math.pi)". >>> turtle.home() >>> turtle.left(90) >>> turtle.heading() 90.0 >>> turtle.radians() >>> turtle.heading() 1.5707963267948966 Pen control ----------- Drawing state ~~~~~~~~~~~~~ turtle.pendown() turtle.pd() turtle.down() Pull the pen down – drawing when moving. turtle.penup() turtle.pu() turtle.up() Pull the pen up – no drawing when moving. turtle.pensize(width=None) turtle.width(width=None) Parameters: **width** – a positive number Set the line thickness to *width* or return it. If resizemode is set to “auto” and turtleshape is a polygon, that polygon is drawn with the same line thickness. If no argument is given, the current pensize is returned. >>> turtle.pensize() 1 >>> turtle.pensize(10) # from here on lines of width 10 are drawn turtle.pen(pen=None, **pendict) Parameters: * **pen** – a dictionary with some or all of the below listed keys * **pendict** – one or more keyword-arguments with the below listed keys as keywords Return or set the pen’s attributes in a “pen-dictionary” with the following key/value pairs: * “shown”: True/False * “pendown”: True/False * “pencolor”: color-string or color-tuple * “fillcolor”: color-string or color-tuple * “pensize”: positive number * “speed”: number in range 0..10 * “resizemode”: “auto” or “user” or “noresize” * “stretchfactor”: (positive number, positive number) * “outline”: positive number * “tilt”: number This dictionary can be used as argument for a subsequent call to "pen()" to restore the former pen-state. Moreover one or more of these attributes can be provided as keyword-arguments. This can be used to set several pen attributes in one statement. >>> turtle.pen(fillcolor="black", pencolor="red", pensize=10) >>> sorted(turtle.pen().items()) [('fillcolor', 'black'), ('outline', 1), ('pencolor', 'red'), ('pendown', True), ('pensize', 10), ('resizemode', 'noresize'), ('shearfactor', 0.0), ('shown', True), ('speed', 9), ('stretchfactor', (1.0, 1.0)), ('tilt', 0.0)] >>> penstate=turtle.pen() >>> turtle.color("yellow", "") >>> turtle.penup() >>> sorted(turtle.pen().items())[:3] [('fillcolor', ''), ('outline', 1), ('pencolor', 'yellow')] >>> turtle.pen(penstate, fillcolor="green") >>> sorted(turtle.pen().items())[:3] [('fillcolor', 'green'), ('outline', 1), ('pencolor', 'red')] turtle.isdown() Return "True" if pen is down, "False" if it’s up. >>> turtle.penup() >>> turtle.isdown() False >>> turtle.pendown() >>> turtle.isdown() True Color control ~~~~~~~~~~~~~ turtle.pencolor(*args) Return or set the pencolor. Four input formats are allowed: "pencolor()" Return the current pencolor as color specification string or as a tuple (see example). May be used as input to another color/pencolor/fillcolor call. "pencolor(colorstring)" Set pencolor to *colorstring*, which is a Tk color specification string, such as ""red"", ""yellow"", or ""#33cc8c"". "pencolor((r, g, b))" Set pencolor to the RGB color represented by the tuple of *r*, *g*, and *b*. Each of *r*, *g*, and *b* must be in the range 0..colormode, where colormode is either 1.0 or 255 (see "colormode()"). "pencolor(r, g, b)" Set pencolor to the RGB color represented by *r*, *g*, and *b*. Each of *r*, *g*, and *b* must be in the range 0..colormode. If turtleshape is a polygon, the outline of that polygon is drawn with the newly set pencolor. >>> colormode() 1.0 >>> turtle.pencolor() 'red' >>> turtle.pencolor("brown") >>> turtle.pencolor() 'brown' >>> tup = (0.2, 0.8, 0.55) >>> turtle.pencolor(tup) >>> turtle.pencolor() (0.2, 0.8, 0.5490196078431373) >>> colormode(255) >>> turtle.pencolor() (51.0, 204.0, 140.0) >>> turtle.pencolor('#32c18f') >>> turtle.pencolor() (50.0, 193.0, 143.0) turtle.fillcolor(*args) Return or set the fillcolor. Four input formats are allowed: "fillcolor()" Return the current fillcolor as color specification string, possibly in tuple format (see example). May be used as input to another color/pencolor/fillcolor call. "fillcolor(colorstring)" Set fillcolor to *colorstring*, which is a Tk color specification string, such as ""red"", ""yellow"", or ""#33cc8c"". "fillcolor((r, g, b))" Set fillcolor to the RGB color represented by the tuple of *r*, *g*, and *b*. Each of *r*, *g*, and *b* must be in the range 0..colormode, where colormode is either 1.0 or 255 (see "colormode()"). "fillcolor(r, g, b)" Set fillcolor to the RGB color represented by *r*, *g*, and *b*. Each of *r*, *g*, and *b* must be in the range 0..colormode. If turtleshape is a polygon, the interior of that polygon is drawn with the newly set fillcolor. >>> turtle.fillcolor("violet") >>> turtle.fillcolor() 'violet' >>> turtle.pencolor() (50.0, 193.0, 143.0) >>> turtle.fillcolor((50, 193, 143)) # Integers, not floats >>> turtle.fillcolor() (50.0, 193.0, 143.0) >>> turtle.fillcolor('#ffffff') >>> turtle.fillcolor() (255.0, 255.0, 255.0) turtle.color(*args) Return or set pencolor and fillcolor. Several input formats are allowed. They use 0 to 3 arguments as follows: "color()" Return the current pencolor and the current fillcolor as a pair of color specification strings or tuples as returned by "pencolor()" and "fillcolor()". "color(colorstring)", "color((r,g,b))", "color(r,g,b)" Inputs as in "pencolor()", set both, fillcolor and pencolor, to the given value. "color(colorstring1, colorstring2)", "color((r1,g1,b1), (r2,g2,b2))" Equivalent to "pencolor(colorstring1)" and "fillcolor(colorstring2)" and analogously if the other input format is used. If turtleshape is a polygon, outline and interior of that polygon is drawn with the newly set colors. >>> turtle.color("red", "green") >>> turtle.color() ('red', 'green') >>> color("#285078", "#a0c8f0") >>> color() ((40.0, 80.0, 120.0), (160.0, 200.0, 240.0)) See also: Screen method "colormode()". Filling ~~~~~~~ turtle.filling() Return fillstate ("True" if filling, "False" else). >>> turtle.begin_fill() >>> if turtle.filling(): ... turtle.pensize(5) ... else: ... turtle.pensize(3) turtle.begin_fill() To be called just before drawing a shape to be filled. turtle.end_fill() Fill the shape drawn after the last call to "begin_fill()". Whether or not overlap regions for self-intersecting polygons or multiple shapes are filled depends on the operating system graphics, type of overlap, and number of overlaps. For example, the Turtle star above may be either all yellow or have some white regions. >>> turtle.color("black", "red") >>> turtle.begin_fill() >>> turtle.circle(80) >>> turtle.end_fill() More drawing control ~~~~~~~~~~~~~~~~~~~~ turtle.reset() Delete the turtle’s drawings from the screen, re-center the turtle and set variables to the default values. >>> turtle.goto(0,-22) >>> turtle.left(100) >>> turtle.position() (0.00,-22.00) >>> turtle.heading() 100.0 >>> turtle.reset() >>> turtle.position() (0.00,0.00) >>> turtle.heading() 0.0 turtle.clear() Delete the turtle’s drawings from the screen. Do not move turtle. State and position of the turtle as well as drawings of other turtles are not affected. turtle.write(arg, move=False, align="left", font=("Arial", 8, "normal")) Parameters: * **arg** – object to be written to the TurtleScreen * **move** – True/False * **align** – one of the strings “left”, “center” or right” * **font** – a triple (fontname, fontsize, fonttype) Write text - the string representation of *arg* - at the current turtle position according to *align* (“left”, “center” or “right”) and with the given font. If *move* is true, the pen is moved to the bottom-right corner of the text. By default, *move* is "False". >>> turtle.write("Home = ", True, align="center") >>> turtle.write((0,0), True) Turtle state ------------ Visibility ~~~~~~~~~~ turtle.hideturtle() turtle.ht() Make the turtle invisible. It’s a good idea to do this while you’re in the middle of doing some complex drawing, because hiding the turtle speeds up the drawing observably. >>> turtle.hideturtle() turtle.showturtle() turtle.st() Make the turtle visible. >>> turtle.showturtle() turtle.isvisible() Return "True" if the Turtle is shown, "False" if it’s hidden. >>> turtle.hideturtle() >>> turtle.isvisible() False >>> turtle.showturtle() >>> turtle.isvisible() True Appearance ~~~~~~~~~~ turtle.shape(name=None) Parameters: **name** – a string which is a valid shapename Set turtle shape to shape with given *name* or, if name is not given, return name of current shape. Shape with *name* must exist in the TurtleScreen’s shape dictionary. Initially there are the following polygon shapes: “arrow”, “turtle”, “circle”, “square”, “triangle”, “classic”. To learn about how to deal with shapes see Screen method "register_shape()". >>> turtle.shape() 'classic' >>> turtle.shape("turtle") >>> turtle.shape() 'turtle' turtle.resizemode(rmode=None) Parameters: **rmode** – one of the strings “auto”, “user”, “noresize” Set resizemode to one of the values: “auto”, “user”, “noresize”. If *rmode* is not given, return current resizemode. Different resizemodes have the following effects: * “auto”: adapts the appearance of the turtle corresponding to the value of pensize. * “user”: adapts the appearance of the turtle according to the values of stretchfactor and outlinewidth (outline), which are set by "shapesize()". * “noresize”: no adaption of the turtle’s appearance takes place. "resizemode("user")" is called by "shapesize()" when used with arguments. >>> turtle.resizemode() 'noresize' >>> turtle.resizemode("auto") >>> turtle.resizemode() 'auto' turtle.shapesize(stretch_wid=None, stretch_len=None, outline=None) turtle.turtlesize(stretch_wid=None, stretch_len=None, outline=None) Parameters: * **stretch_wid** – positive number * **stretch_len** – positive number * **outline** – positive number Return or set the pen’s attributes x/y-stretchfactors and/or outline. Set resizemode to “user”. If and only if resizemode is set to “user”, the turtle will be displayed stretched according to its stretchfactors: *stretch_wid* is stretchfactor perpendicular to its orientation, *stretch_len* is stretchfactor in direction of its orientation, *outline* determines the width of the shapes’s outline. >>> turtle.shapesize() (1.0, 1.0, 1) >>> turtle.resizemode("user") >>> turtle.shapesize(5, 5, 12) >>> turtle.shapesize() (5, 5, 12) >>> turtle.shapesize(outline=8) >>> turtle.shapesize() (5, 5, 8) turtle.shearfactor(shear=None) Parameters: **shear** – number (optional) Set or return the current shearfactor. Shear the turtleshape according to the given shearfactor shear, which is the tangent of the shear angle. Do *not* change the turtle’s heading (direction of movement). If shear is not given: return the current shearfactor, i. e. the tangent of the shear angle, by which lines parallel to the heading of the turtle are sheared. >>> turtle.shape("circle") >>> turtle.shapesize(5,2) >>> turtle.shearfactor(0.5) >>> turtle.shearfactor() 0.5 turtle.tilt(angle) Parameters: **angle** – a number Rotate the turtleshape by *angle* from its current tilt-angle, but do *not* change the turtle’s heading (direction of movement). >>> turtle.reset() >>> turtle.shape("circle") >>> turtle.shapesize(5,2) >>> turtle.tilt(30) >>> turtle.fd(50) >>> turtle.tilt(30) >>> turtle.fd(50) turtle.settiltangle(angle) Parameters: **angle** – a number Rotate the turtleshape to point in the direction specified by *angle*, regardless of its current tilt-angle. *Do not* change the turtle’s heading (direction of movement). >>> turtle.reset() >>> turtle.shape("circle") >>> turtle.shapesize(5,2) >>> turtle.settiltangle(45) >>> turtle.fd(50) >>> turtle.settiltangle(-45) >>> turtle.fd(50) Deprecated since version 3.1. turtle.tiltangle(angle=None) Parameters: **angle** – a number (optional) Set or return the current tilt-angle. If angle is given, rotate the turtleshape to point in the direction specified by angle, regardless of its current tilt-angle. Do *not* change the turtle’s heading (direction of movement). If angle is not given: return the current tilt-angle, i. e. the angle between the orientation of the turtleshape and the heading of the turtle (its direction of movement). >>> turtle.reset() >>> turtle.shape("circle") >>> turtle.shapesize(5,2) >>> turtle.tilt(45) >>> turtle.tiltangle() 45.0 turtle.shapetransform(t11=None, t12=None, t21=None, t22=None) Parameters: * **t11** – a number (optional) * **t12** – a number (optional) * **t21** – a number (optional) * **t12** – a number (optional) Set or return the current transformation matrix of the turtle shape. If none of the matrix elements are given, return the transformation matrix as a tuple of 4 elements. Otherwise set the given elements and transform the turtleshape according to the matrix consisting of first row t11, t12 and second row t21, t22. The determinant t11 * t22 - t12 * t21 must not be zero, otherwise an error is raised. Modify stretchfactor, shearfactor and tiltangle according to the given matrix. >>> turtle = Turtle() >>> turtle.shape("square") >>> turtle.shapesize(4,2) >>> turtle.shearfactor(-0.5) >>> turtle.shapetransform() (4.0, -1.0, -0.0, 2.0) turtle.get_shapepoly() Return the current shape polygon as tuple of coordinate pairs. This can be used to define a new shape or components of a compound shape. >>> turtle.shape("square") >>> turtle.shapetransform(4, -1, 0, 2) >>> turtle.get_shapepoly() ((50, -20), (30, 20), (-50, 20), (-30, -20)) Using events ------------ turtle.onclick(fun, btn=1, add=None) Parameters: * **fun** – a function with two arguments which will be called with the coordinates of the clicked point on the canvas * **btn** – number of the mouse-button, defaults to 1 (left mouse button) * **add** – "True" or "False" – if "True", a new binding will be added, otherwise it will replace a former binding Bind *fun* to mouse-click events on this turtle. If *fun* is "None", existing bindings are removed. Example for the anonymous turtle, i.e. the procedural way: >>> def turn(x, y): ... left(180) ... >>> onclick(turn) # Now clicking into the turtle will turn it. >>> onclick(None) # event-binding will be removed turtle.onrelease(fun, btn=1, add=None) Parameters: * **fun** – a function with two arguments which will be called with the coordinates of the clicked point on the canvas * **btn** – number of the mouse-button, defaults to 1 (left mouse button) * **add** – "True" or "False" – if "True", a new binding will be added, otherwise it will replace a former binding Bind *fun* to mouse-button-release events on this turtle. If *fun* is "None", existing bindings are removed. >>> class MyTurtle(Turtle): ... def glow(self,x,y): ... self.fillcolor("red") ... def unglow(self,x,y): ... self.fillcolor("") ... >>> turtle = MyTurtle() >>> turtle.onclick(turtle.glow) # clicking on turtle turns fillcolor red, >>> turtle.onrelease(turtle.unglow) # releasing turns it to transparent. turtle.ondrag(fun, btn=1, add=None) Parameters: * **fun** – a function with two arguments which will be called with the coordinates of the clicked point on the canvas * **btn** – number of the mouse-button, defaults to 1 (left mouse button) * **add** – "True" or "False" – if "True", a new binding will be added, otherwise it will replace a former binding Bind *fun* to mouse-move events on this turtle. If *fun* is "None", existing bindings are removed. Remark: Every sequence of mouse-move-events on a turtle is preceded by a mouse-click event on that turtle. >>> turtle.ondrag(turtle.goto) Subsequently, clicking and dragging the Turtle will move it across the screen thereby producing handdrawings (if pen is down). Special Turtle methods ---------------------- turtle.begin_poly() Start recording the vertices of a polygon. Current turtle position is first vertex of polygon. turtle.end_poly() Stop recording the vertices of a polygon. Current turtle position is last vertex of polygon. This will be connected with the first vertex. turtle.get_poly() Return the last recorded polygon. >>> turtle.home() >>> turtle.begin_poly() >>> turtle.fd(100) >>> turtle.left(20) >>> turtle.fd(30) >>> turtle.left(60) >>> turtle.fd(50) >>> turtle.end_poly() >>> p = turtle.get_poly() >>> register_shape("myFavouriteShape", p) turtle.clone() Create and return a clone of the turtle with same position, heading and turtle properties. >>> mick = Turtle() >>> joe = mick.clone() turtle.getturtle() turtle.getpen() Return the Turtle object itself. Only reasonable use: as a function to return the “anonymous turtle”: >>> pet = getturtle() >>> pet.fd(50) >>> pet turtle.getscreen() Return the "TurtleScreen" object the turtle is drawing on. TurtleScreen methods can then be called for that object. >>> ts = turtle.getscreen() >>> ts >>> ts.bgcolor("pink") turtle.setundobuffer(size) Parameters: **size** – an integer or "None" Set or disable undobuffer. If *size* is an integer, an empty undobuffer of given size is installed. *size* gives the maximum number of turtle actions that can be undone by the "undo()" method/function. If *size* is "None", the undobuffer is disabled. >>> turtle.setundobuffer(42) turtle.undobufferentries() Return number of entries in the undobuffer. >>> while undobufferentries(): ... undo() Compound shapes --------------- To use compound turtle shapes, which consist of several polygons of different color, you must use the helper class "Shape" explicitly as described below: 1. Create an empty Shape object of type “compound”. 2. Add as many components to this object as desired, using the "addcomponent()" method. For example: >>> s = Shape("compound") >>> poly1 = ((0,0),(10,-5),(0,10),(-10,-5)) >>> s.addcomponent(poly1, "red", "blue") >>> poly2 = ((0,0),(10,-5),(-10,-5)) >>> s.addcomponent(poly2, "blue", "red") 3. Now add the Shape to the Screen’s shapelist and use it: >>> register_shape("myshape", s) >>> shape("myshape") Note: The "Shape" class is used internally by the "register_shape()" method in different ways. The application programmer has to deal with the Shape class *only* when using compound shapes like shown above! Methods of TurtleScreen/Screen and corresponding functions ========================================================== Most of the examples in this section refer to a TurtleScreen instance called "screen". Window control -------------- turtle.bgcolor(*args) Parameters: **args** – a color string or three numbers in the range 0..colormode or a 3-tuple of such numbers Set or return background color of the TurtleScreen. >>> screen.bgcolor("orange") >>> screen.bgcolor() 'orange' >>> screen.bgcolor("#800080") >>> screen.bgcolor() (128.0, 0.0, 128.0) turtle.bgpic(picname=None) Parameters: **picname** – a string, name of a gif-file or ""nopic"", or "None" Set background image or return name of current backgroundimage. If *picname* is a filename, set the corresponding image as background. If *picname* is ""nopic"", delete background image, if present. If *picname* is "None", return the filename of the current backgroundimage. >>> screen.bgpic() 'nopic' >>> screen.bgpic("landscape.gif") >>> screen.bgpic() "landscape.gif" turtle.clear() Note: This TurtleScreen method is available as a global function only under the name "clearscreen". The global function "clear" is a different one derived from the Turtle method "clear". turtle.clearscreen() Delete all drawings and all turtles from the TurtleScreen. Reset the now empty TurtleScreen to its initial state: white background, no background image, no event bindings and tracing on. turtle.reset() Note: This TurtleScreen method is available as a global function only under the name "resetscreen". The global function "reset" is another one derived from the Turtle method "reset". turtle.resetscreen() Reset all Turtles on the Screen to their initial state. turtle.screensize(canvwidth=None, canvheight=None, bg=None) Parameters: * **canvwidth** – positive integer, new width of canvas in pixels * **canvheight** – positive integer, new height of canvas in pixels * **bg** – colorstring or color-tuple, new background color If no arguments are given, return current (canvaswidth, canvasheight). Else resize the canvas the turtles are drawing on. Do not alter the drawing window. To observe hidden parts of the canvas, use the scrollbars. With this method, one can make visible those parts of a drawing which were outside the canvas before. >>> screen.screensize() (400, 300) >>> screen.screensize(2000,1500) >>> screen.screensize() (2000, 1500) e.g. to search for an erroneously escaped turtle ;-) turtle.setworldcoordinates(llx, lly, urx, ury) Parameters: * **llx** – a number, x-coordinate of lower left corner of canvas * **lly** – a number, y-coordinate of lower left corner of canvas * **urx** – a number, x-coordinate of upper right corner of canvas * **ury** – a number, y-coordinate of upper right corner of canvas Set up user-defined coordinate system and switch to mode “world” if necessary. This performs a "screen.reset()". If mode “world” is already active, all drawings are redrawn according to the new coordinates. **ATTENTION**: in user-defined coordinate systems angles may appear distorted. >>> screen.reset() >>> screen.setworldcoordinates(-50,-7.5,50,7.5) >>> for _ in range(72): ... left(10) ... >>> for _ in range(8): ... left(45); fd(2) # a regular octagon Animation control ----------------- turtle.delay(delay=None) Parameters: **delay** – positive integer Set or return the drawing *delay* in milliseconds. (This is approximately the time interval between two consecutive canvas updates.) The longer the drawing delay, the slower the animation. Optional argument: >>> screen.delay() 10 >>> screen.delay(5) >>> screen.delay() 5 turtle.tracer(n=None, delay=None) Parameters: * **n** – nonnegative integer * **delay** – nonnegative integer Turn turtle animation on/off and set delay for update drawings. If *n* is given, only each n-th regular screen update is really performed. (Can be used to accelerate the drawing of complex graphics.) When called without arguments, returns the currently stored value of n. Second argument sets delay value (see "delay()"). >>> screen.tracer(8, 25) >>> dist = 2 >>> for i in range(200): ... fd(dist) ... rt(90) ... dist += 2 turtle.update() Perform a TurtleScreen update. To be used when tracer is turned off. See also the RawTurtle/Turtle method "speed()". Using screen events ------------------- turtle.listen(xdummy=None, ydummy=None) Set focus on TurtleScreen (in order to collect key-events). Dummy arguments are provided in order to be able to pass "listen()" to the onclick method. turtle.onkey(fun, key) turtle.onkeyrelease(fun, key) Parameters: * **fun** – a function with no arguments or "None" * **key** – a string: key (e.g. “a”) or key-symbol (e.g. “space”) Bind *fun* to key-release event of key. If *fun* is "None", event bindings are removed. Remark: in order to be able to register key- events, TurtleScreen must have the focus. (See method "listen()".) >>> def f(): ... fd(50) ... lt(60) ... >>> screen.onkey(f, "Up") >>> screen.listen() turtle.onkeypress(fun, key=None) Parameters: * **fun** – a function with no arguments or "None" * **key** – a string: key (e.g. “a”) or key-symbol (e.g. “space”) Bind *fun* to key-press event of key if key is given, or to any key-press-event if no key is given. Remark: in order to be able to register key-events, TurtleScreen must have focus. (See method "listen()".) >>> def f(): ... fd(50) ... >>> screen.onkey(f, "Up") >>> screen.listen() turtle.onclick(fun, btn=1, add=None) turtle.onscreenclick(fun, btn=1, add=None) Parameters: * **fun** – a function with two arguments which will be called with the coordinates of the clicked point on the canvas * **btn** – number of the mouse-button, defaults to 1 (left mouse button) * **add** – "True" or "False" – if "True", a new binding will be added, otherwise it will replace a former binding Bind *fun* to mouse-click events on this screen. If *fun* is "None", existing bindings are removed. Example for a TurtleScreen instance named "screen" and a Turtle instance named "turtle": >>> screen.onclick(turtle.goto) # Subsequently clicking into the TurtleScreen will >>> # make the turtle move to the clicked point. >>> screen.onclick(None) # remove event binding again Note: This TurtleScreen method is available as a global function only under the name "onscreenclick". The global function "onclick" is another one derived from the Turtle method "onclick". turtle.ontimer(fun, t=0) Parameters: * **fun** – a function with no arguments * **t** – a number >= 0 Install a timer that calls *fun* after *t* milliseconds. >>> running = True >>> def f(): ... if running: ... fd(50) ... lt(60) ... screen.ontimer(f, 250) >>> f() ### makes the turtle march around >>> running = False turtle.mainloop() turtle.done() Starts event loop - calling Tkinter’s mainloop function. Must be the last statement in a turtle graphics program. Must *not* be used if a script is run from within IDLE in -n mode (No subprocess) - for interactive use of turtle graphics. >>> screen.mainloop() Input methods ------------- turtle.textinput(title, prompt) Parameters: * **title** – string * **prompt** – string Pop up a dialog window for input of a string. Parameter title is the title of the dialog window, prompt is a text mostly describing what information to input. Return the string input. If the dialog is canceled, return "None". >>> screen.textinput("NIM", "Name of first player:") turtle.numinput(title, prompt, default=None, minval=None, maxval=None) Parameters: * **title** – string * **prompt** – string * **default** – number (optional) * **minval** – number (optional) * **maxval** – number (optional) Pop up a dialog window for input of a number. title is the title of the dialog window, prompt is a text mostly describing what numerical information to input. default: default value, minval: minimum value for input, maxval: maximum value for input The number input must be in the range minval .. maxval if these are given. If not, a hint is issued and the dialog remains open for correction. Return the number input. If the dialog is canceled, return "None". >>> screen.numinput("Poker", "Your stakes:", 1000, minval=10, maxval=10000) Settings and special methods ---------------------------- turtle.mode(mode=None) Parameters: **mode** – one of the strings “standard”, “logo” or “world” Set turtle mode (“standard”, “logo” or “world”) and perform reset. If mode is not given, current mode is returned. Mode “standard” is compatible with old "turtle". Mode “logo” is compatible with most Logo turtle graphics. Mode “world” uses user- defined “world coordinates”. **Attention**: in this mode angles appear distorted if "x/y" unit-ratio doesn’t equal 1. +--------------+---------------------------+---------------------+ | Mode | Initial turtle heading | positive angles | |==============|===========================|=====================| | “standard” | to the right (east) | counterclockwise | +--------------+---------------------------+---------------------+ | “logo” | upward (north) | clockwise | +--------------+---------------------------+---------------------+ >>> mode("logo") # resets turtle heading to north >>> mode() 'logo' turtle.colormode(cmode=None) Parameters: **cmode** – one of the values 1.0 or 255 Return the colormode or set it to 1.0 or 255. Subsequently *r*, *g*, *b* values of color triples have to be in the range 0..*cmode*. >>> screen.colormode(1) >>> turtle.pencolor(240, 160, 80) Traceback (most recent call last): ... TurtleGraphicsError: bad color sequence: (240, 160, 80) >>> screen.colormode() 1.0 >>> screen.colormode(255) >>> screen.colormode() 255 >>> turtle.pencolor(240,160,80) turtle.getcanvas() Return the Canvas of this TurtleScreen. Useful for insiders who know what to do with a Tkinter Canvas. >>> cv = screen.getcanvas() >>> cv turtle.getshapes() Return a list of names of all currently available turtle shapes. >>> screen.getshapes() ['arrow', 'blank', 'circle', ..., 'turtle'] turtle.register_shape(name, shape=None) turtle.addshape(name, shape=None) There are three different ways to call this function: 1. *name* is the name of a gif-file and *shape* is "None": Install the corresponding image shape. >>> screen.register_shape("turtle.gif") Note: Image shapes *do not* rotate when turning the turtle, so they do not display the heading of the turtle! 2. *name* is an arbitrary string and *shape* is a tuple of pairs of coordinates: Install the corresponding polygon shape. >>> screen.register_shape("triangle", ((5,-3), (0,5), (-5,-3))) 3. *name* is an arbitrary string and shape is a (compound) "Shape" object: Install the corresponding compound shape. Add a turtle shape to TurtleScreen’s shapelist. Only thusly registered shapes can be used by issuing the command "shape(shapename)". turtle.turtles() Return the list of turtles on the screen. >>> for turtle in screen.turtles(): ... turtle.color("red") turtle.window_height() Return the height of the turtle window. >>> screen.window_height() 480 turtle.window_width() Return the width of the turtle window. >>> screen.window_width() 640 Methods specific to Screen, not inherited from TurtleScreen ----------------------------------------------------------- turtle.bye() Shut the turtlegraphics window. turtle.exitonclick() Bind "bye()" method to mouse clicks on the Screen. If the value “using_IDLE” in the configuration dictionary is "False" (default value), also enter mainloop. Remark: If IDLE with the "-n" switch (no subprocess) is used, this value should be set to "True" in "turtle.cfg". In this case IDLE’s own mainloop is active also for the client script. turtle.setup(width=_CFG["width"], height=_CFG["height"], startx=_CFG["leftright"], starty=_CFG["topbottom"]) Set the size and position of the main window. Default values of arguments are stored in the configuration dictionary and can be changed via a "turtle.cfg" file. Parameters: * **width** – if an integer, a size in pixels, if a float, a fraction of the screen; default is 50% of screen * **height** – if an integer, the height in pixels, if a float, a fraction of the screen; default is 75% of screen * **startx** – if positive, starting position in pixels from the left edge of the screen, if negative from the right edge, if "None", center window horizontally * **starty** – if positive, starting position in pixels from the top edge of the screen, if negative from the bottom edge, if "None", center window vertically >>> screen.setup (width=200, height=200, startx=0, starty=0) >>> # sets window to 200x200 pixels, in upper left of screen >>> screen.setup(width=.75, height=0.5, startx=None, starty=None) >>> # sets window to 75% of screen by 50% of screen and centers turtle.title(titlestring) Parameters: **titlestring** – a string that is shown in the titlebar of the turtle graphics window Set title of turtle window to *titlestring*. >>> screen.title("Welcome to the turtle zoo!") Public classes ============== class turtle.RawTurtle(canvas) class turtle.RawPen(canvas) Parameters: **canvas** – a "tkinter.Canvas", a "ScrolledCanvas" or a "TurtleScreen" Create a turtle. The turtle has all methods described above as “methods of Turtle/RawTurtle”. class turtle.Turtle Subclass of RawTurtle, has the same interface but draws on a default "Screen" object created automatically when needed for the first time. class turtle.TurtleScreen(cv) Parameters: **cv** – a "tkinter.Canvas" Provides screen oriented methods like "setbg()" etc. that are described above. class turtle.Screen Subclass of TurtleScreen, with four methods added. class turtle.ScrolledCanvas(master) Parameters: **master** – some Tkinter widget to contain the ScrolledCanvas, i.e. a Tkinter-canvas with scrollbars added Used by class Screen, which thus automatically provides a ScrolledCanvas as playground for the turtles. class turtle.Shape(type_, data) Parameters: **type_** – one of the strings “polygon”, “image”, “compound” Data structure modeling shapes. The pair "(type_, data)" must follow this specification: +-------------+------------------------------------------------------------+ | *type_* | *data* | |=============|============================================================| | “polygon” | a polygon-tuple, i.e. a tuple of pairs of coordinates | +-------------+------------------------------------------------------------+ | “image” | an image (in this form only used internally!) | +-------------+------------------------------------------------------------+ | “compound” | "None" (a compound shape has to be constructed using the | | | "addcomponent()" method) | +-------------+------------------------------------------------------------+ addcomponent(poly, fill, outline=None) Parameters: * **poly** – a polygon, i.e. a tuple of pairs of numbers * **fill** – a color the *poly* will be filled with * **outline** – a color for the poly’s outline (if given) Example: >>> poly = ((0,0),(10,-5),(0,10),(-10,-5)) >>> s = Shape("compound") >>> s.addcomponent(poly, "red", "blue") >>> # ... add more components and then use register_shape() See Compound shapes. class turtle.Vec2D(x, y) A two-dimensional vector class, used as a helper class for implementing turtle graphics. May be useful for turtle graphics programs too. Derived from tuple, so a vector is a tuple! Provides (for *a*, *b* vectors, *k* number): * "a + b" vector addition * "a - b" vector subtraction * "a * b" inner product * "k * a" and "a * k" multiplication with scalar * "abs(a)" absolute value of a * "a.rotate(angle)" rotation Help and configuration ====================== How to use help --------------- The public methods of the Screen and Turtle classes are documented extensively via docstrings. So these can be used as online-help via the Python help facilities: * When using IDLE, tooltips show the signatures and first lines of the docstrings of typed in function-/method calls. * Calling "help()" on methods or functions displays the docstrings: >>> help(Screen.bgcolor) Help on method bgcolor in module turtle: bgcolor(self, *args) unbound turtle.Screen method Set or return backgroundcolor of the TurtleScreen. Arguments (if given): a color string or three numbers in the range 0..colormode or a 3-tuple of such numbers. >>> screen.bgcolor("orange") >>> screen.bgcolor() "orange" >>> screen.bgcolor(0.5,0,0.5) >>> screen.bgcolor() "#800080" >>> help(Turtle.penup) Help on method penup in module turtle: penup(self) unbound turtle.Turtle method Pull the pen up -- no drawing when moving. Aliases: penup | pu | up No argument >>> turtle.penup() * The docstrings of the functions which are derived from methods have a modified form: >>> help(bgcolor) Help on function bgcolor in module turtle: bgcolor(*args) Set or return backgroundcolor of the TurtleScreen. Arguments (if given): a color string or three numbers in the range 0..colormode or a 3-tuple of such numbers. Example:: >>> bgcolor("orange") >>> bgcolor() "orange" >>> bgcolor(0.5,0,0.5) >>> bgcolor() "#800080" >>> help(penup) Help on function penup in module turtle: penup() Pull the pen up -- no drawing when moving. Aliases: penup | pu | up No argument Example: >>> penup() These modified docstrings are created automatically together with the function definitions that are derived from the methods at import time. Translation of docstrings into different languages -------------------------------------------------- There is a utility to create a dictionary the keys of which are the method names and the values of which are the docstrings of the public methods of the classes Screen and Turtle. turtle.write_docstringdict(filename="turtle_docstringdict") Parameters: **filename** – a string, used as filename Create and write docstring-dictionary to a Python script with the given filename. This function has to be called explicitly (it is not used by the turtle graphics classes). The docstring dictionary will be written to the Python script "*filename*.py". It is intended to serve as a template for translation of the docstrings into different languages. If you (or your students) want to use "turtle" with online help in your native language, you have to translate the docstrings and save the resulting file as e.g. "turtle_docstringdict_german.py". If you have an appropriate entry in your "turtle.cfg" file this dictionary will be read in at import time and will replace the original English docstrings. At the time of this writing there are docstring dictionaries in German and in Italian. (Requests please to glingl@aon.at.) How to configure Screen and Turtles ----------------------------------- The built-in default configuration mimics the appearance and behaviour of the old turtle module in order to retain best possible compatibility with it. If you want to use a different configuration which better reflects the features of this module or which better fits to your needs, e.g. for use in a classroom, you can prepare a configuration file "turtle.cfg" which will be read at import time and modify the configuration according to its settings. The built in configuration would correspond to the following turtle.cfg: width = 0.5 height = 0.75 leftright = None topbottom = None canvwidth = 400 canvheight = 300 mode = standard colormode = 1.0 delay = 10 undobuffersize = 1000 shape = classic pencolor = black fillcolor = black resizemode = noresize visible = True language = english exampleturtle = turtle examplescreen = screen title = Python Turtle Graphics using_IDLE = False Short explanation of selected entries: * The first four lines correspond to the arguments of the "Screen.setup()" method. * Line 5 and 6 correspond to the arguments of the method "Screen.screensize()". * *shape* can be any of the built-in shapes, e.g: arrow, turtle, etc. For more info try "help(shape)". * If you want to use no fillcolor (i.e. make the turtle transparent), you have to write "fillcolor = """ (but all nonempty strings must not have quotes in the cfg-file). * If you want to reflect the turtle its state, you have to use "resizemode = auto". * If you set e.g. "language = italian" the docstringdict "turtle_docstringdict_italian.py" will be loaded at import time (if present on the import path, e.g. in the same directory as "turtle". * The entries *exampleturtle* and *examplescreen* define the names of these objects as they occur in the docstrings. The transformation of method-docstrings to function-docstrings will delete these names from the docstrings. * *using_IDLE*: Set this to "True" if you regularly work with IDLE and its -n switch (“no subprocess”). This will prevent "exitonclick()" to enter the mainloop. There can be a "turtle.cfg" file in the directory where "turtle" is stored and an additional one in the current working directory. The latter will override the settings of the first one. The "Lib/turtledemo" directory contains a "turtle.cfg" file. You can study it as an example and see its effects when running the demos (preferably not from within the demo-viewer). "turtledemo" — Demo scripts =========================== The "turtledemo" package includes a set of demo scripts. These scripts can be run and viewed using the supplied demo viewer as follows: python -m turtledemo Alternatively, you can run the demo scripts individually. For example, python -m turtledemo.bytedesign The "turtledemo" package directory contains: * A demo viewer "__main__.py" which can be used to view the sourcecode of the scripts and run them at the same time. * Multiple scripts demonstrating different features of the "turtle" module. Examples can be accessed via the Examples menu. They can also be run standalone. * A "turtle.cfg" file which serves as an example of how to write and use such files. The demo scripts are: +------------------+--------------------------------+-------------------------+ | Name | Description | Features | |==================|================================|=========================| | bytedesign | complex classical turtle | "tracer()", delay, | | | graphics pattern | "update()" | +------------------+--------------------------------+-------------------------+ | chaos | graphs Verhulst dynamics, | world coordinates | | | shows that computer’s | | | | computations can generate | | | | results sometimes against the | | | | common sense expectations | | +------------------+--------------------------------+-------------------------+ | clock | analog clock showing time of | turtles as clock’s | | | your computer | hands, ontimer | +------------------+--------------------------------+-------------------------+ | colormixer | experiment with r, g, b | "ondrag()" | +------------------+--------------------------------+-------------------------+ | forest | 3 breadth-first trees | randomization | +------------------+--------------------------------+-------------------------+ | fractalcurves | Hilbert & Koch curves | recursion | +------------------+--------------------------------+-------------------------+ | lindenmayer | ethnomathematics (indian | L-System | | | kolams) | | +------------------+--------------------------------+-------------------------+ | minimal_hanoi | Towers of Hanoi | Rectangular Turtles as | | | | Hanoi discs (shape, | | | | shapesize) | +------------------+--------------------------------+-------------------------+ | nim | play the classical nim game | turtles as nimsticks, | | | with three heaps of sticks | event driven (mouse, | | | against the computer. | keyboard) | +------------------+--------------------------------+-------------------------+ | paint | super minimalistic drawing | "onclick()" | | | program | | +------------------+--------------------------------+-------------------------+ | peace | elementary | turtle: appearance and | | | | animation | +------------------+--------------------------------+-------------------------+ | penrose | aperiodic tiling with kites | "stamp()" | | | and darts | | +------------------+--------------------------------+-------------------------+ | planet_and_moon | simulation of gravitational | compound shapes, | | | system | "Vec2D" | +------------------+--------------------------------+-------------------------+ | round_dance | dancing turtles rotating | compound shapes, clone | | | pairwise in opposite direction | shapesize, tilt, | | | | get_shapepoly, update | +------------------+--------------------------------+-------------------------+ | sorting_animate | visual demonstration of | simple alignment, | | | different sorting methods | randomization | +------------------+--------------------------------+-------------------------+ | tree | a (graphical) breadth first | "clone()" | | | tree (using generators) | | +------------------+--------------------------------+-------------------------+ | two_canvases | simple design | turtles on two canvases | +------------------+--------------------------------+-------------------------+ | wikipedia | a pattern from the wikipedia | "clone()", "undo()" | | | article on turtle graphics | | +------------------+--------------------------------+-------------------------+ | yinyang | another elementary example | "circle()" | +------------------+--------------------------------+-------------------------+ Have fun! Changes since Python 2.6 ======================== * The methods "Turtle.tracer()", "Turtle.window_width()" and "Turtle.window_height()" have been eliminated. Methods with these names and functionality are now available only as methods of "Screen". The functions derived from these remain available. (In fact already in Python 2.6 these methods were merely duplications of the corresponding "TurtleScreen"/"Screen"-methods.) * The method "Turtle.fill()" has been eliminated. The behaviour of "begin_fill()" and "end_fill()" have changed slightly: now every filling-process must be completed with an "end_fill()" call. * A method "Turtle.filling()" has been added. It returns a boolean value: "True" if a filling process is under way, "False" otherwise. This behaviour corresponds to a "fill()" call without arguments in Python 2.6. Changes since Python 3.0 ======================== * The methods "Turtle.shearfactor()", "Turtle.shapetransform()" and "Turtle.get_shapepoly()" have been added. Thus the full range of regular linear transforms is now available for transforming turtle shapes. "Turtle.tiltangle()" has been enhanced in functionality: it now can be used to get or set the tiltangle. "Turtle.settiltangle()" has been deprecated. * The method "Screen.onkeypress()" has been added as a complement to "Screen.onkey()" which in fact binds actions to the keyrelease event. Accordingly the latter has got an alias: "Screen.onkeyrelease()". * The method "Screen.mainloop()" has been added. So when working only with Screen and Turtle objects one must not additionally import "mainloop()" anymore. * Two input methods has been added "Screen.textinput()" and "Screen.numinput()". These popup input dialogs and return strings and numbers respectively. * Two example scripts "tdemo_nim.py" and "tdemo_round_dance.py" have been added to the "Lib/turtledemo" directory. "types" — Dynamic type creation and names for built-in types ************************************************************ **Source code:** Lib/types.py ====================================================================== This module defines utility functions to assist in dynamic creation of new types. It also defines names for some object types that are used by the standard Python interpreter, but not exposed as builtins like "int" or "str" are. Finally, it provides some additional type-related utility classes and functions that are not fundamental enough to be builtins. Dynamic Type Creation ===================== types.new_class(name, bases=(), kwds=None, exec_body=None) Creates a class object dynamically using the appropriate metaclass. The first three arguments are the components that make up a class definition header: the class name, the base classes (in order), the keyword arguments (such as "metaclass"). The *exec_body* argument is a callback that is used to populate the freshly created class namespace. It should accept the class namespace as its sole argument and update the namespace directly with the class contents. If no callback is provided, it has the same effect as passing in "lambda ns: None". New in version 3.3. types.prepare_class(name, bases=(), kwds=None) Calculates the appropriate metaclass and creates the class namespace. The arguments are the components that make up a class definition header: the class name, the base classes (in order) and the keyword arguments (such as "metaclass"). The return value is a 3-tuple: "metaclass, namespace, kwds" *metaclass* is the appropriate metaclass, *namespace* is the prepared class namespace and *kwds* is an updated copy of the passed in *kwds* argument with any "'metaclass'" entry removed. If no *kwds* argument is passed in, this will be an empty dict. New in version 3.3. Changed in version 3.6: The default value for the "namespace" element of the returned tuple has changed. Now an insertion-order- preserving mapping is used when the metaclass does not have a "__prepare__" method. See also: Metaclasses Full details of the class creation process supported by these functions **PEP 3115** - Metaclasses in Python 3000 Introduced the "__prepare__" namespace hook types.resolve_bases(bases) Resolve MRO entries dynamically as specified by **PEP 560**. This function looks for items in *bases* that are not instances of "type", and returns a tuple where each such object that has an "__mro_entries__" method is replaced with an unpacked result of calling this method. If a *bases* item is an instance of "type", or it doesn’t have an "__mro_entries__" method, then it is included in the return tuple unchanged. New in version 3.7. See also: **PEP 560** - Core support for typing module and generic types Standard Interpreter Types ========================== This module provides names for many of the types that are required to implement a Python interpreter. It deliberately avoids including some of the types that arise only incidentally during processing such as the "listiterator" type. Typical use of these names is for "isinstance()" or "issubclass()" checks. If you instantiate any of these types, note that signatures may vary between Python versions. Standard names are defined for the following types: types.FunctionType types.LambdaType The type of user-defined functions and functions created by "lambda" expressions. Raises an auditing event "function.__new__" with argument "code". The audit event only occurs for direct instantiation of function objects, and is not raised for normal compilation. types.GeneratorType The type of *generator*-iterator objects, created by generator functions. types.CoroutineType The type of *coroutine* objects, created by "async def" functions. New in version 3.5. types.AsyncGeneratorType The type of *asynchronous generator*-iterator objects, created by asynchronous generator functions. New in version 3.6. class types.CodeType(**kwargs) The type for code objects such as returned by "compile()". Raises an auditing event "code.__new__" with arguments "code", "filename", "name", "argcount", "posonlyargcount", "kwonlyargcount", "nlocals", "stacksize", "flags". Note that the audited arguments may not match the names or positions required by the initializer. The audit event only occurs for direct instantiation of code objects, and is not raised for normal compilation. replace(**kwargs) Return a copy of the code object with new values for the specified fields. New in version 3.8. types.CellType The type for cell objects: such objects are used as containers for a function’s free variables. New in version 3.8. types.MethodType The type of methods of user-defined class instances. types.BuiltinFunctionType types.BuiltinMethodType The type of built-in functions like "len()" or "sys.exit()", and methods of built-in classes. (Here, the term “built-in” means “written in C”.) types.WrapperDescriptorType The type of methods of some built-in data types and base classes such as "object.__init__()" or "object.__lt__()". New in version 3.7. types.MethodWrapperType The type of *bound* methods of some built-in data types and base classes. For example it is the type of "object().__str__". New in version 3.7. types.MethodDescriptorType The type of methods of some built-in data types such as "str.join()". New in version 3.7. types.ClassMethodDescriptorType The type of *unbound* class methods of some built-in data types such as "dict.__dict__['fromkeys']". New in version 3.7. class types.ModuleType(name, doc=None) The type of *modules*. The constructor takes the name of the module to be created and optionally its *docstring*. Note: Use "importlib.util.module_from_spec()" to create a new module if you wish to set the various import-controlled attributes. __doc__ The *docstring* of the module. Defaults to "None". __loader__ The *loader* which loaded the module. Defaults to "None". This attribute is to match "importlib.machinery.ModuleSpec.loader" as stored in the attr:*__spec__* object. Note: A future version of Python may stop setting this attribute by default. To guard against this potential change, preferably read from the "__spec__" attribute instead or use "getattr(module, "__loader__", None)" if you explicitly need to use this attribute. Changed in version 3.4: Defaults to "None". Previously the attribute was optional. __name__ The name of the module. Expected to match "importlib.machinery.ModuleSpec.name". __package__ Which *package* a module belongs to. If the module is top-level (i.e. not a part of any specific package) then the attribute should be set to "''", else it should be set to the name of the package (which can be "__name__" if the module is a package itself). Defaults to "None". This attribute is to match "importlib.machinery.ModuleSpec.parent" as stored in the attr:*__spec__* object. Note: A future version of Python may stop setting this attribute by default. To guard against this potential change, preferably read from the "__spec__" attribute instead or use "getattr(module, "__package__", None)" if you explicitly need to use this attribute. Changed in version 3.4: Defaults to "None". Previously the attribute was optional. __spec__ A record of the module’s import-system-related state. Expected to be an instance of "importlib.machinery.ModuleSpec". New in version 3.4. class types.GenericAlias(t_origin, t_args) The type of parameterized generics such as "list[int]". "t_origin" should be a non-parameterized generic class, such as "list", "tuple" or "dict". "t_args" should be a "tuple" (possibly of length 1) of types which parameterize "t_origin": >>> from types import GenericAlias >>> list[int] == GenericAlias(list, (int,)) True >>> dict[str, int] == GenericAlias(dict, (str, int)) True New in version 3.9. Changed in version 3.9.2: This type can now be subclassed. class types.TracebackType(tb_next, tb_frame, tb_lasti, tb_lineno) The type of traceback objects such as found in "sys.exc_info()[2]". See the language reference for details of the available attributes and operations, and guidance on creating tracebacks dynamically. types.FrameType The type of frame objects such as found in "tb.tb_frame" if "tb" is a traceback object. See the language reference for details of the available attributes and operations. types.GetSetDescriptorType The type of objects defined in extension modules with "PyGetSetDef", such as "FrameType.f_locals" or "array.array.typecode". This type is used as descriptor for object attributes; it has the same purpose as the "property" type, but for classes defined in extension modules. types.MemberDescriptorType The type of objects defined in extension modules with "PyMemberDef", such as "datetime.timedelta.days". This type is used as descriptor for simple C data members which use standard conversion functions; it has the same purpose as the "property" type, but for classes defined in extension modules. **CPython implementation detail:** In other implementations of Python, this type may be identical to "GetSetDescriptorType". class types.MappingProxyType(mapping) Read-only proxy of a mapping. It provides a dynamic view on the mapping’s entries, which means that when the mapping changes, the view reflects these changes. New in version 3.3. Changed in version 3.9: Updated to support the new union ("|") operator from **PEP 584**, which simply delegates to the underlying mapping. key in proxy Return "True" if the underlying mapping has a key *key*, else "False". proxy[key] Return the item of the underlying mapping with key *key*. Raises a "KeyError" if *key* is not in the underlying mapping. iter(proxy) Return an iterator over the keys of the underlying mapping. This is a shortcut for "iter(proxy.keys())". len(proxy) Return the number of items in the underlying mapping. copy() Return a shallow copy of the underlying mapping. get(key[, default]) Return the value for *key* if *key* is in the underlying mapping, else *default*. If *default* is not given, it defaults to "None", so that this method never raises a "KeyError". items() Return a new view of the underlying mapping’s items ("(key, value)" pairs). keys() Return a new view of the underlying mapping’s keys. values() Return a new view of the underlying mapping’s values. reversed(proxy) Return a reverse iterator over the keys of the underlying mapping. New in version 3.9. Additional Utility Classes and Functions ======================================== class types.SimpleNamespace A simple "object" subclass that provides attribute access to its namespace, as well as a meaningful repr. Unlike "object", with "SimpleNamespace" you can add and remove attributes. If a "SimpleNamespace" object is initialized with keyword arguments, those are directly added to the underlying namespace. The type is roughly equivalent to the following code: class SimpleNamespace: def __init__(self, /, **kwargs): self.__dict__.update(kwargs) def __repr__(self): items = (f"{k}={v!r}" for k, v in self.__dict__.items()) return "{}({})".format(type(self).__name__, ", ".join(items)) def __eq__(self, other): if isinstance(self, SimpleNamespace) and isinstance(other, SimpleNamespace): return self.__dict__ == other.__dict__ return NotImplemented "SimpleNamespace" may be useful as a replacement for "class NS: pass". However, for a structured record type use "namedtuple()" instead. New in version 3.3. Changed in version 3.9: Attribute order in the repr changed from alphabetical to insertion (like "dict"). types.DynamicClassAttribute(fget=None, fset=None, fdel=None, doc=None) Route attribute access on a class to __getattr__. This is a descriptor, used to define attributes that act differently when accessed through an instance and through a class. Instance access remains normal, but access to an attribute through a class will be routed to the class’s __getattr__ method; this is done by raising AttributeError. This allows one to have properties active on an instance, and have virtual attributes on the class with the same name (see "enum.Enum" for an example). New in version 3.4. Coroutine Utility Functions =========================== types.coroutine(gen_func) This function transforms a *generator* function into a *coroutine function* which returns a generator-based coroutine. The generator- based coroutine is still a *generator iterator*, but is also considered to be a *coroutine* object and is *awaitable*. However, it may not necessarily implement the "__await__()" method. If *gen_func* is a generator function, it will be modified in- place. If *gen_func* is not a generator function, it will be wrapped. If it returns an instance of "collections.abc.Generator", the instance will be wrapped in an *awaitable* proxy object. All other types of objects will be returned as is. New in version 3.5. "typing" — Support for type hints ********************************* New in version 3.5. **Source code:** Lib/typing.py Note: The Python runtime does not enforce function and variable type annotations. They can be used by third party tools such as type checkers, IDEs, linters, etc. ====================================================================== This module provides runtime support for type hints. The most fundamental support consists of the types "Any", "Union", "Callable", "TypeVar", and "Generic". For a full specification, please see **PEP 484**. For a simplified introduction to type hints, see **PEP 483**. The function below takes and returns a string and is annotated as follows: def greeting(name: str) -> str: return 'Hello ' + name In the function "greeting", the argument "name" is expected to be of type "str" and the return type "str". Subtypes are accepted as arguments. New features are frequently added to the "typing" module. The typing_extensions package provides backports of these new features to older versions of Python. Relevant PEPs ============= Since the initial introduction of type hints in **PEP 484** and **PEP 483**, a number of PEPs have modified and enhanced Python’s framework for type annotations. These include: * **PEP 526**: Syntax for Variable Annotations *Introducing* syntax for annotating variables outside of function definitions, and "ClassVar" * **PEP 544**: Protocols: Structural subtyping (static duck typing) *Introducing* "Protocol" and the "@runtime_checkable" decorator * **PEP 585**: Type Hinting Generics In Standard Collections *Introducing* "types.GenericAlias" and the ability to use standard library classes as generic types * **PEP 586**: Literal Types *Introducing* "Literal" * **PEP 589**: TypedDict: Type Hints for Dictionaries with a Fixed Set of Keys *Introducing* "TypedDict" * **PEP 591**: Adding a final qualifier to typing *Introducing* "Final" and the "@final" decorator * **PEP 593**: Flexible function and variable annotations *Introducing* "Annotated" Type aliases ============ A type alias is defined by assigning the type to the alias. In this example, "Vector" and "list[float]" will be treated as interchangeable synonyms: Vector = list[float] def scale(scalar: float, vector: Vector) -> Vector: return [scalar * num for num in vector] # typechecks; a list of floats qualifies as a Vector. new_vector = scale(2.0, [1.0, -4.2, 5.4]) Type aliases are useful for simplifying complex type signatures. For example: from collections.abc import Sequence ConnectionOptions = dict[str, str] Address = tuple[str, int] Server = tuple[Address, ConnectionOptions] def broadcast_message(message: str, servers: Sequence[Server]) -> None: ... # The static type checker will treat the previous type signature as # being exactly equivalent to this one. def broadcast_message( message: str, servers: Sequence[tuple[tuple[str, int], dict[str, str]]]) -> None: ... Note that "None" as a type hint is a special case and is replaced by "type(None)". NewType ======= Use the "NewType()" helper to create distinct types: from typing import NewType UserId = NewType('UserId', int) some_id = UserId(524313) The static type checker will treat the new type as if it were a subclass of the original type. This is useful in helping catch logical errors: def get_user_name(user_id: UserId) -> str: ... # typechecks user_a = get_user_name(UserId(42351)) # does not typecheck; an int is not a UserId user_b = get_user_name(-1) You may still perform all "int" operations on a variable of type "UserId", but the result will always be of type "int". This lets you pass in a "UserId" wherever an "int" might be expected, but will prevent you from accidentally creating a "UserId" in an invalid way: # 'output' is of type 'int', not 'UserId' output = UserId(23413) + UserId(54341) Note that these checks are enforced only by the static type checker. At runtime, the statement "Derived = NewType('Derived', Base)" will make "Derived" a callable that immediately returns whatever parameter you pass it. That means the expression "Derived(some_value)" does not create a new class or introduce any overhead beyond that of a regular function call. More precisely, the expression "some_value is Derived(some_value)" is always true at runtime. This also means that it is not possible to create a subtype of "Derived" since it is an identity function at runtime, not an actual type: from typing import NewType UserId = NewType('UserId', int) # Fails at runtime and does not typecheck class AdminUserId(UserId): pass However, it is possible to create a "NewType()" based on a ‘derived’ "NewType": from typing import NewType UserId = NewType('UserId', int) ProUserId = NewType('ProUserId', UserId) and typechecking for "ProUserId" will work as expected. See **PEP 484** for more details. Note: Recall that the use of a type alias declares two types to be *equivalent* to one another. Doing "Alias = Original" will make the static type checker treat "Alias" as being *exactly equivalent* to "Original" in all cases. This is useful when you want to simplify complex type signatures.In contrast, "NewType" declares one type to be a *subtype* of another. Doing "Derived = NewType('Derived', Original)" will make the static type checker treat "Derived" as a *subclass* of "Original", which means a value of type "Original" cannot be used in places where a value of type "Derived" is expected. This is useful when you want to prevent logic errors with minimal runtime cost. New in version 3.5.2. Callable ======== Frameworks expecting callback functions of specific signatures might be type hinted using "Callable[[Arg1Type, Arg2Type], ReturnType]". For example: from collections.abc import Callable def feeder(get_next_item: Callable[[], str]) -> None: # Body def async_query(on_success: Callable[[int], None], on_error: Callable[[int, Exception], None]) -> None: # Body async def on_update(value: str) -> None: # Body callback: Callable[[str], Awaitable[None]] = on_update It is possible to declare the return type of a callable without specifying the call signature by substituting a literal ellipsis for the list of arguments in the type hint: "Callable[..., ReturnType]". Generics ======== Since type information about objects kept in containers cannot be statically inferred in a generic way, abstract base classes have been extended to support subscription to denote expected types for container elements. from collections.abc import Mapping, Sequence def notify_by_email(employees: Sequence[Employee], overrides: Mapping[str, str]) -> None: ... Generics can be parameterized by using a factory available in typing called "TypeVar". from collections.abc import Sequence from typing import TypeVar T = TypeVar('T') # Declare type variable def first(l: Sequence[T]) -> T: # Generic function return l[0] User-defined generic types ========================== A user-defined class can be defined as a generic class. from typing import TypeVar, Generic from logging import Logger T = TypeVar('T') class LoggedVar(Generic[T]): def __init__(self, value: T, name: str, logger: Logger) -> None: self.name = name self.logger = logger self.value = value def set(self, new: T) -> None: self.log('Set ' + repr(self.value)) self.value = new def get(self) -> T: self.log('Get ' + repr(self.value)) return self.value def log(self, message: str) -> None: self.logger.info('%s: %s', self.name, message) "Generic[T]" as a base class defines that the class "LoggedVar" takes a single type parameter "T" . This also makes "T" valid as a type within the class body. The "Generic" base class defines "__class_getitem__()" so that "LoggedVar[t]" is valid as a type: from collections.abc import Iterable def zero_all_vars(vars: Iterable[LoggedVar[int]]) -> None: for var in vars: var.set(0) A generic type can have any number of type variables. All varieties of "TypeVar" are permissible as parameters for a generic type: from typing import TypeVar, Generic, Sequence T = TypeVar('T', contravariant=True) B = TypeVar('B', bound=Sequence[bytes], covariant=True) S = TypeVar('S', int, str) class WeirdTrio(Generic[T, B, S]): ... Each type variable argument to "Generic" must be distinct. This is thus invalid: from typing import TypeVar, Generic ... T = TypeVar('T') class Pair(Generic[T, T]): # INVALID ... You can use multiple inheritance with "Generic": from collections.abc import Sized from typing import TypeVar, Generic T = TypeVar('T') class LinkedList(Sized, Generic[T]): ... When inheriting from generic classes, some type variables could be fixed: from collections.abc import Mapping from typing import TypeVar T = TypeVar('T') class MyDict(Mapping[str, T]): ... In this case "MyDict" has a single parameter, "T". Using a generic class without specifying type parameters assumes "Any" for each position. In the following example, "MyIterable" is not generic but implicitly inherits from "Iterable[Any]": from collections.abc import Iterable class MyIterable(Iterable): # Same as Iterable[Any] User defined generic type aliases are also supported. Examples: from collections.abc import Iterable from typing import TypeVar, Union S = TypeVar('S') Response = Union[Iterable[S], int] # Return type here is same as Union[Iterable[str], int] def response(query: str) -> Response[str]: ... T = TypeVar('T', int, float, complex) Vec = Iterable[tuple[T, T]] def inproduct(v: Vec[T]) -> T: # Same as Iterable[tuple[T, T]] return sum(x*y for x, y in v) Changed in version 3.7: "Generic" no longer has a custom metaclass. A user-defined generic class can have ABCs as base classes without a metaclass conflict. Generic metaclasses are not supported. The outcome of parameterizing generics is cached, and most types in the typing module are hashable and comparable for equality. The "Any" type ============== A special kind of type is "Any". A static type checker will treat every type as being compatible with "Any" and "Any" as being compatible with every type. This means that it is possible to perform any operation or method call on a value of type "Any" and assign it to any variable: from typing import Any a: Any = None a = [] # OK a = 2 # OK s: str = '' s = a # OK def foo(item: Any) -> int: # Typechecks; 'item' could be any type, # and that type might have a 'bar' method item.bar() ... Notice that no typechecking is performed when assigning a value of type "Any" to a more precise type. For example, the static type checker did not report an error when assigning "a" to "s" even though "s" was declared to be of type "str" and receives an "int" value at runtime! Furthermore, all functions without a return type or parameter types will implicitly default to using "Any": def legacy_parser(text): ... return data # A static type checker will treat the above # as having the same signature as: def legacy_parser(text: Any) -> Any: ... return data This behavior allows "Any" to be used as an *escape hatch* when you need to mix dynamically and statically typed code. Contrast the behavior of "Any" with the behavior of "object". Similar to "Any", every type is a subtype of "object". However, unlike "Any", the reverse is not true: "object" is *not* a subtype of every other type. That means when the type of a value is "object", a type checker will reject almost all operations on it, and assigning it to a variable (or using it as a return value) of a more specialized type is a type error. For example: def hash_a(item: object) -> int: # Fails; an object does not have a 'magic' method. item.magic() ... def hash_b(item: Any) -> int: # Typechecks item.magic() ... # Typechecks, since ints and strs are subclasses of object hash_a(42) hash_a("foo") # Typechecks, since Any is compatible with all types hash_b(42) hash_b("foo") Use "object" to indicate that a value could be any type in a typesafe manner. Use "Any" to indicate that a value is dynamically typed. Nominal vs structural subtyping =============================== Initially **PEP 484** defined the Python static type system as using *nominal subtyping*. This means that a class "A" is allowed where a class "B" is expected if and only if "A" is a subclass of "B". This requirement previously also applied to abstract base classes, such as "Iterable". The problem with this approach is that a class had to be explicitly marked to support them, which is unpythonic and unlike what one would normally do in idiomatic dynamically typed Python code. For example, this conforms to **PEP 484**: from collections.abc import Sized, Iterable, Iterator class Bucket(Sized, Iterable[int]): ... def __len__(self) -> int: ... def __iter__(self) -> Iterator[int]: ... **PEP 544** allows to solve this problem by allowing users to write the above code without explicit base classes in the class definition, allowing "Bucket" to be implicitly considered a subtype of both "Sized" and "Iterable[int]" by static type checkers. This is known as *structural subtyping* (or static duck-typing): from collections.abc import Iterator, Iterable class Bucket: # Note: no base classes ... def __len__(self) -> int: ... def __iter__(self) -> Iterator[int]: ... def collect(items: Iterable[int]) -> int: ... result = collect(Bucket()) # Passes type check Moreover, by subclassing a special class "Protocol", a user can define new custom protocols to fully enjoy structural subtyping (see examples below). Module contents =============== The module defines the following classes, functions and decorators. Note: This module defines several types that are subclasses of pre- existing standard library classes which also extend "Generic" to support type variables inside "[]". These types became redundant in Python 3.9 when the corresponding pre-existing classes were enhanced to support "[]".The redundant types are deprecated as of Python 3.9 but no deprecation warnings will be issued by the interpreter. It is expected that type checkers will flag the deprecated types when the checked program targets Python 3.9 or newer.The deprecated types will be removed from the "typing" module in the first Python version released 5 years after the release of Python 3.9.0. See details in **PEP 585**—*Type Hinting Generics In Standard Collections*. Special typing primitives ------------------------- Special types ~~~~~~~~~~~~~ These can be used as types in annotations and do not support "[]". typing.Any Special type indicating an unconstrained type. * Every type is compatible with "Any". * "Any" is compatible with every type. typing.NoReturn Special type indicating that a function never returns. For example: from typing import NoReturn def stop() -> NoReturn: raise RuntimeError('no way') New in version 3.5.4. New in version 3.6.2. Special forms ~~~~~~~~~~~~~ These can be used as types in annotations using "[]", each having a unique syntax. typing.Tuple Tuple type; "Tuple[X, Y]" is the type of a tuple of two items with the first item of type X and the second of type Y. The type of the empty tuple can be written as "Tuple[()]". Example: "Tuple[T1, T2]" is a tuple of two elements corresponding to type variables T1 and T2. "Tuple[int, float, str]" is a tuple of an int, a float and a string. To specify a variable-length tuple of homogeneous type, use literal ellipsis, e.g. "Tuple[int, ...]". A plain "Tuple" is equivalent to "Tuple[Any, ...]", and in turn to "tuple". Deprecated since version 3.9: "builtins.tuple" now supports "[]". See **PEP 585** and Generic Alias Type. typing.Union Union type; "Union[X, Y]" means either X or Y. To define a union, use e.g. "Union[int, str]". Details: * The arguments must be types and there must be at least one. * Unions of unions are flattened, e.g.: Union[Union[int, str], float] == Union[int, str, float] * Unions of a single argument vanish, e.g.: Union[int] == int # The constructor actually returns int * Redundant arguments are skipped, e.g.: Union[int, str, int] == Union[int, str] * When comparing unions, the argument order is ignored, e.g.: Union[int, str] == Union[str, int] * You cannot subclass or instantiate a union. * You cannot write "Union[X][Y]". * You can use "Optional[X]" as a shorthand for "Union[X, None]". Changed in version 3.7: Don’t remove explicit subclasses from unions at runtime. typing.Optional Optional type. "Optional[X]" is equivalent to "Union[X, None]". Note that this is not the same concept as an optional argument, which is one that has a default. An optional argument with a default does not require the "Optional" qualifier on its type annotation just because it is optional. For example: def foo(arg: int = 0) -> None: ... On the other hand, if an explicit value of "None" is allowed, the use of "Optional" is appropriate, whether the argument is optional or not. For example: def foo(arg: Optional[int] = None) -> None: ... typing.Callable Callable type; "Callable[[int], str]" is a function of (int) -> str. The subscription syntax must always be used with exactly two values: the argument list and the return type. The argument list must be a list of types or an ellipsis; the return type must be a single type. There is no syntax to indicate optional or keyword arguments; such function types are rarely used as callback types. "Callable[..., ReturnType]" (literal ellipsis) can be used to type hint a callable taking any number of arguments and returning "ReturnType". A plain "Callable" is equivalent to "Callable[..., Any]", and in turn to "collections.abc.Callable". Deprecated since version 3.9: "collections.abc.Callable" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.Type(Generic[CT_co]) A variable annotated with "C" may accept a value of type "C". In contrast, a variable annotated with "Type[C]" may accept values that are classes themselves – specifically, it will accept the *class object* of "C". For example: a = 3 # Has type 'int' b = int # Has type 'Type[int]' c = type(a) # Also has type 'Type[int]' Note that "Type[C]" is covariant: class User: ... class BasicUser(User): ... class ProUser(User): ... class TeamUser(User): ... # Accepts User, BasicUser, ProUser, TeamUser, ... def make_new_user(user_class: Type[User]) -> User: # ... return user_class() The fact that "Type[C]" is covariant implies that all subclasses of "C" should implement the same constructor signature and class method signatures as "C". The type checker should flag violations of this, but should also allow constructor calls in subclasses that match the constructor calls in the indicated base class. How the type checker is required to handle this particular case may change in future revisions of **PEP 484**. The only legal parameters for "Type" are classes, "Any", type variables, and unions of any of these types. For example: def new_non_team_user(user_class: Type[Union[BasicUser, ProUser]]): ... "Type[Any]" is equivalent to "Type" which in turn is equivalent to "type", which is the root of Python’s metaclass hierarchy. New in version 3.5.2. Deprecated since version 3.9: "builtins.type" now supports "[]". See **PEP 585** and Generic Alias Type. typing.Literal A type that can be used to indicate to type checkers that the corresponding variable or function parameter has a value equivalent to the provided literal (or one of several literals). For example: def validate_simple(data: Any) -> Literal[True]: # always returns True ... MODE = Literal['r', 'rb', 'w', 'wb'] def open_helper(file: str, mode: MODE) -> str: ... open_helper('/some/path', 'r') # Passes type check open_helper('/other/path', 'typo') # Error in type checker "Literal[...]" cannot be subclassed. At runtime, an arbitrary value is allowed as type argument to "Literal[...]", but type checkers may impose restrictions. See **PEP 586** for more details about literal types. New in version 3.8. Changed in version 3.9.1: "Literal" now de-duplicates parameters. Equality comparisons of "Literal" objects are no longer order dependent. "Literal" objects will now raise a "TypeError" exception during equality comparisons if one of their parameters are not *hashable*. typing.ClassVar Special type construct to mark class variables. As introduced in **PEP 526**, a variable annotation wrapped in ClassVar indicates that a given attribute is intended to be used as a class variable and should not be set on instances of that class. Usage: class Starship: stats: ClassVar[dict[str, int]] = {} # class variable damage: int = 10 # instance variable "ClassVar" accepts only types and cannot be further subscribed. "ClassVar" is not a class itself, and should not be used with "isinstance()" or "issubclass()". "ClassVar" does not change Python runtime behavior, but it can be used by third-party type checkers. For example, a type checker might flag the following code as an error: enterprise_d = Starship(3000) enterprise_d.stats = {} # Error, setting class variable on instance Starship.stats = {} # This is OK New in version 3.5.3. typing.Final A special typing construct to indicate to type checkers that a name cannot be re-assigned or overridden in a subclass. For example: MAX_SIZE: Final = 9000 MAX_SIZE += 1 # Error reported by type checker class Connection: TIMEOUT: Final[int] = 10 class FastConnector(Connection): TIMEOUT = 1 # Error reported by type checker There is no runtime checking of these properties. See **PEP 591** for more details. New in version 3.8. typing.Annotated A type, introduced in **PEP 593** ("Flexible function and variable annotations"), to decorate existing types with context-specific metadata (possibly multiple pieces of it, as "Annotated" is variadic). Specifically, a type "T" can be annotated with metadata "x" via the typehint "Annotated[T, x]". This metadata can be used for either static analysis or at runtime. If a library (or tool) encounters a typehint "Annotated[T, x]" and has no special logic for metadata "x", it should ignore it and simply treat the type as "T". Unlike the "no_type_check" functionality that currently exists in the "typing" module which completely disables typechecking annotations on a function or a class, the "Annotated" type allows for both static typechecking of "T" (which can safely ignore "x") together with runtime access to "x" within a specific application. Ultimately, the responsibility of how to interpret the annotations (if at all) is the responsibility of the tool or library encountering the "Annotated" type. A tool or library encountering an "Annotated" type can scan through the annotations to determine if they are of interest (e.g., using "isinstance()"). When a tool or a library does not support annotations or encounters an unknown annotation it should just ignore it and treat annotated type as the underlying type. It’s up to the tool consuming the annotations to decide whether the client is allowed to have several annotations on one type and how to merge those annotations. Since the "Annotated" type allows you to put several annotations of the same (or different) type(s) on any node, the tools or libraries consuming those annotations are in charge of dealing with potential duplicates. For example, if you are doing value range analysis you might allow this: T1 = Annotated[int, ValueRange(-10, 5)] T2 = Annotated[T1, ValueRange(-20, 3)] Passing "include_extras=True" to "get_type_hints()" lets one access the extra annotations at runtime. The details of the syntax: * The first argument to "Annotated" must be a valid type * Multiple type annotations are supported ("Annotated" supports variadic arguments): Annotated[int, ValueRange(3, 10), ctype("char")] * "Annotated" must be called with at least two arguments ( "Annotated[int]" is not valid) * The order of the annotations is preserved and matters for equality checks: Annotated[int, ValueRange(3, 10), ctype("char")] != Annotated[ int, ctype("char"), ValueRange(3, 10) ] * Nested "Annotated" types are flattened, with metadata ordered starting with the innermost annotation: Annotated[Annotated[int, ValueRange(3, 10)], ctype("char")] == Annotated[ int, ValueRange(3, 10), ctype("char") ] * Duplicated annotations are not removed: Annotated[int, ValueRange(3, 10)] != Annotated[ int, ValueRange(3, 10), ValueRange(3, 10) ] * "Annotated" can be used with nested and generic aliases: T = TypeVar('T') Vec = Annotated[list[tuple[T, T]], MaxLen(10)] V = Vec[int] V == Annotated[list[tuple[int, int]], MaxLen(10)] New in version 3.9. Building generic types ~~~~~~~~~~~~~~~~~~~~~~ These are not used in annotations. They are building blocks for creating generic types. class typing.Generic Abstract base class for generic types. A generic type is typically declared by inheriting from an instantiation of this class with one or more type variables. For example, a generic mapping type might be defined as: class Mapping(Generic[KT, VT]): def __getitem__(self, key: KT) -> VT: ... # Etc. This class can then be used as follows: X = TypeVar('X') Y = TypeVar('Y') def lookup_name(mapping: Mapping[X, Y], key: X, default: Y) -> Y: try: return mapping[key] except KeyError: return default class typing.TypeVar Type variable. Usage: T = TypeVar('T') # Can be anything S = TypeVar('S', bound=str) # Can be any subtype of str A = TypeVar('A', str, bytes) # Must be exactly str or bytes Type variables exist primarily for the benefit of static type checkers. They serve as the parameters for generic types as well as for generic function definitions. See "Generic" for more information on generic types. Generic functions work as follows: def repeat(x: T, n: int) -> Sequence[T]: """Return a list containing n references to x.""" return [x]*n def print_capitalized(x: S) -> S: """Print x capitalized, and return x.""" print(x.capitalize()) return x def concatenate(x: A, y: A) -> A: """Add two strings or bytes objects together.""" return x + y Note that type variables can be *bound*, *constrained*, or neither, but cannot be both bound *and* constrained. Constrained type variables and bound type variables have different semantics in several important ways. Using a *constrained* type variable means that the "TypeVar" can only ever be solved as being exactly one of the constraints given: a = concatenate('one', 'two') # Ok, variable 'a' has type 'str' b = concatenate(StringSubclass('one'), StringSubclass('two')) # Inferred type of variable 'b' is 'str', # despite 'StringSubclass' being passed in c = concatenate('one', b'two') # error: type variable 'A' can be either 'str' or 'bytes' in a function call, but not both Using a *bound* type variable, however, means that the "TypeVar" will be solved using the most specific type possible: print_capitalized('a string') # Ok, output has type 'str' class StringSubclass(str): pass print_capitalized(StringSubclass('another string')) # Ok, output has type 'StringSubclass' print_capitalized(45) # error: int is not a subtype of str Type variables can be bound to concrete types, abstract types (ABCs or protocols), and even unions of types: U = TypeVar('U', bound=str|bytes) # Can be any subtype of the union str|bytes V = TypeVar('V', bound=SupportsAbs) # Can be anything with an __abs__ method Bound type variables are particularly useful for annotating "classmethods" that serve as alternative constructors. In the following example (© Raymond Hettinger), the type variable "C" is bound to the "Circle" class through the use of a forward reference. Using this type variable to annotate the "with_circumference" classmethod, rather than hardcoding the return type as "Circle", means that a type checker can correctly infer the return type even if the method is called on a subclass: import math C = TypeVar('C', bound='Circle') class Circle: """An abstract circle""" def __init__(self, radius: float) -> None: self.radius = radius # Use a type variable to show that the return type # will always be an instance of whatever ``cls`` is @classmethod def with_circumference(cls: type[C], circumference: float) -> C: """Create a circle with the specified circumference""" radius = circumference / (math.pi * 2) return cls(radius) class Tire(Circle): """A specialised circle (made out of rubber)""" MATERIAL = 'rubber' c = Circle.with_circumference(3) # Ok, variable 'c' has type 'Circle' t = Tire.with_circumference(4) # Ok, variable 't' has type 'Tire' (not 'Circle') At runtime, "isinstance(x, T)" will raise "TypeError". In general, "isinstance()" and "issubclass()" should not be used with types. Type variables may be marked covariant or contravariant by passing "covariant=True" or "contravariant=True". See **PEP 484** for more details. By default, type variables are invariant. typing.AnyStr "AnyStr" is a "constrained type variable" defined as "AnyStr = TypeVar('AnyStr', str, bytes)". It is meant to be used for functions that may accept any kind of string without allowing different kinds of strings to mix. For example: def concat(a: AnyStr, b: AnyStr) -> AnyStr: return a + b concat(u"foo", u"bar") # Ok, output has type 'unicode' concat(b"foo", b"bar") # Ok, output has type 'bytes' concat(u"foo", b"bar") # Error, cannot mix unicode and bytes class typing.Protocol(Generic) Base class for protocol classes. Protocol classes are defined like this: class Proto(Protocol): def meth(self) -> int: ... Such classes are primarily used with static type checkers that recognize structural subtyping (static duck-typing), for example: class C: def meth(self) -> int: return 0 def func(x: Proto) -> int: return x.meth() func(C()) # Passes static type check See **PEP 544** for details. Protocol classes decorated with "runtime_checkable()" (described later) act as simple-minded runtime protocols that check only the presence of given attributes, ignoring their type signatures. Protocol classes can be generic, for example: class GenProto(Protocol[T]): def meth(self) -> T: ... New in version 3.8. @typing.runtime_checkable Mark a protocol class as a runtime protocol. Such a protocol can be used with "isinstance()" and "issubclass()". This raises "TypeError" when applied to a non-protocol class. This allows a simple-minded structural check, very similar to “one trick ponies” in "collections.abc" such as "Iterable". For example: @runtime_checkable class Closable(Protocol): def close(self): ... assert isinstance(open('/some/file'), Closable) Note: "runtime_checkable()" will check only the presence of the required methods, not their type signatures! For example, "builtins.complex" implements "__float__()", therefore it passes an "issubclass()" check against "SupportsFloat". However, the "complex.__float__" method exists only to raise a "TypeError" with a more informative message. New in version 3.8. Other special directives ~~~~~~~~~~~~~~~~~~~~~~~~ These are not used in annotations. They are building blocks for declaring types. class typing.NamedTuple Typed version of "collections.namedtuple()". Usage: class Employee(NamedTuple): name: str id: int This is equivalent to: Employee = collections.namedtuple('Employee', ['name', 'id']) To give a field a default value, you can assign to it in the class body: class Employee(NamedTuple): name: str id: int = 3 employee = Employee('Guido') assert employee.id == 3 Fields with a default value must come after any fields without a default. The resulting class has an extra attribute "__annotations__" giving a dict that maps the field names to the field types. (The field names are in the "_fields" attribute and the default values are in the "_field_defaults" attribute, both of which are part of the "namedtuple()" API.) "NamedTuple" subclasses can also have docstrings and methods: class Employee(NamedTuple): """Represents an employee.""" name: str id: int = 3 def __repr__(self) -> str: return f'' Backward-compatible usage: Employee = NamedTuple('Employee', [('name', str), ('id', int)]) Changed in version 3.6: Added support for **PEP 526** variable annotation syntax. Changed in version 3.6.1: Added support for default values, methods, and docstrings. Changed in version 3.8: The "_field_types" and "__annotations__" attributes are now regular dictionaries instead of instances of "OrderedDict". Changed in version 3.9: Removed the "_field_types" attribute in favor of the more standard "__annotations__" attribute which has the same information. typing.NewType(name, tp) A helper function to indicate a distinct type to a typechecker, see NewType. At runtime it returns a function that returns its argument. Usage: UserId = NewType('UserId', int) first_user = UserId(1) New in version 3.5.2. class typing.TypedDict(dict) Special construct to add type hints to a dictionary. At runtime it is a plain "dict". "TypedDict" declares a dictionary type that expects all of its instances to have a certain set of keys, where each key is associated with a value of a consistent type. This expectation is not checked at runtime but is only enforced by type checkers. Usage: class Point2D(TypedDict): x: int y: int label: str a: Point2D = {'x': 1, 'y': 2, 'label': 'good'} # OK b: Point2D = {'z': 3, 'label': 'bad'} # Fails type check assert Point2D(x=1, y=2, label='first') == dict(x=1, y=2, label='first') To allow using this feature with older versions of Python that do not support **PEP 526**, "TypedDict" supports two additional equivalent syntactic forms: * Using a literal "dict" as the second argument: Point2D = TypedDict('Point2D', {'x': int, 'y': int, 'label': str}) * Using keyword arguments: Point2D = TypedDict('Point2D', x=int, y=int, label=str) The functional syntax should also be used when any of the keys are not valid identifiers, for example because they are keywords or contain hyphens. Example: # raises SyntaxError class Point2D(TypedDict): in: int # 'in' is a keyword x-y: int # name with hyphens # OK, functional syntax Point2D = TypedDict('Point2D', {'in': int, 'x-y': int}) By default, all keys must be present in a "TypedDict". It is possible to override this by specifying totality. Usage: class Point2D(TypedDict, total=False): x: int y: int # Alternative syntax Point2D = TypedDict('Point2D', {'x': int, 'y': int}, total=False) This means that a "Point2D" "TypedDict" can have any of the keys omitted. A type checker is only expected to support a literal "False" or "True" as the value of the "total" argument. "True" is the default, and makes all items defined in the class body required. It is possible for a "TypedDict" type to inherit from one or more other "TypedDict" types using the class-based syntax. Usage: class Point3D(Point2D): z: int "Point3D" has three items: "x", "y" and "z". It is equivalent to this definition: class Point3D(TypedDict): x: int y: int z: int A "TypedDict" cannot inherit from a non-"TypedDict" class, notably including "Generic". For example: class X(TypedDict): x: int class Y(TypedDict): y: int class Z(object): pass # A non-TypedDict class class XY(X, Y): pass # OK class XZ(X, Z): pass # raises TypeError T = TypeVar('T') class XT(X, Generic[T]): pass # raises TypeError A "TypedDict" can be introspected via "__annotations__", "__total__", "__required_keys__", and "__optional_keys__". __total__ "Point2D.__total__" gives the value of the "total" argument. Example: >>> from typing import TypedDict >>> class Point2D(TypedDict): pass >>> Point2D.__total__ True >>> class Point2D(TypedDict, total=False): pass >>> Point2D.__total__ False >>> class Point3D(Point2D): pass >>> Point3D.__total__ True __required_keys__ __optional_keys__ "Point2D.__required_keys__" and "Point2D.__optional_keys__" return "frozenset" objects containing required and non-required keys, respectively. Currently the only way to declare both required and non-required keys in the same "TypedDict" is mixed inheritance, declaring a "TypedDict" with one value for the "total" argument and then inheriting it from another "TypedDict" with a different value for "total". Usage: >>> class Point2D(TypedDict, total=False): ... x: int ... y: int ... >>> class Point3D(Point2D): ... z: int ... >>> Point3D.__required_keys__ == frozenset({'z'}) True >>> Point3D.__optional_keys__ == frozenset({'x', 'y'}) True See **PEP 589** for more examples and detailed rules of using "TypedDict". New in version 3.8. Generic concrete collections ---------------------------- Corresponding to built-in types ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ class typing.Dict(dict, MutableMapping[KT, VT]) A generic version of "dict". Useful for annotating return types. To annotate arguments it is preferred to use an abstract collection type such as "Mapping". This type can be used as follows: def count_words(text: str) -> Dict[str, int]: ... Deprecated since version 3.9: "builtins.dict" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.List(list, MutableSequence[T]) Generic version of "list". Useful for annotating return types. To annotate arguments it is preferred to use an abstract collection type such as "Sequence" or "Iterable". This type may be used as follows: T = TypeVar('T', int, float) def vec2(x: T, y: T) -> List[T]: return [x, y] def keep_positives(vector: Sequence[T]) -> List[T]: return [item for item in vector if item > 0] Deprecated since version 3.9: "builtins.list" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.Set(set, MutableSet[T]) A generic version of "builtins.set". Useful for annotating return types. To annotate arguments it is preferred to use an abstract collection type such as "AbstractSet". Deprecated since version 3.9: "builtins.set" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.FrozenSet(frozenset, AbstractSet[T_co]) A generic version of "builtins.frozenset". Deprecated since version 3.9: "builtins.frozenset" now supports "[]". See **PEP 585** and Generic Alias Type. Note: "Tuple" is a special form. Corresponding to types in "collections" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ class typing.DefaultDict(collections.defaultdict, MutableMapping[KT, VT]) A generic version of "collections.defaultdict". New in version 3.5.2. Deprecated since version 3.9: "collections.defaultdict" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.OrderedDict(collections.OrderedDict, MutableMapping[KT, VT]) A generic version of "collections.OrderedDict". New in version 3.7.2. Deprecated since version 3.9: "collections.OrderedDict" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.ChainMap(collections.ChainMap, MutableMapping[KT, VT]) A generic version of "collections.ChainMap". New in version 3.5.4. New in version 3.6.1. Deprecated since version 3.9: "collections.ChainMap" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.Counter(collections.Counter, Dict[T, int]) A generic version of "collections.Counter". New in version 3.5.4. New in version 3.6.1. Deprecated since version 3.9: "collections.Counter" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.Deque(deque, MutableSequence[T]) A generic version of "collections.deque". New in version 3.5.4. New in version 3.6.1. Deprecated since version 3.9: "collections.deque" now supports "[]". See **PEP 585** and Generic Alias Type. Other concrete types ~~~~~~~~~~~~~~~~~~~~ class typing.IO class typing.TextIO class typing.BinaryIO Generic type "IO[AnyStr]" and its subclasses "TextIO(IO[str])" and "BinaryIO(IO[bytes])" represent the types of I/O streams such as returned by "open()". Deprecated since version 3.8, will be removed in version 3.12: These types are also in the "typing.io" namespace, which was never supported by type checkers and will be removed. class typing.Pattern class typing.Match These type aliases correspond to the return types from "re.compile()" and "re.match()". These types (and the corresponding functions) are generic in "AnyStr" and can be made specific by writing "Pattern[str]", "Pattern[bytes]", "Match[str]", or "Match[bytes]". Deprecated since version 3.8, will be removed in version 3.12: These types are also in the "typing.re" namespace, which was never supported by type checkers and will be removed. Deprecated since version 3.9: Classes "Pattern" and "Match" from "re" now support "[]". See **PEP 585** and Generic Alias Type. class typing.Text "Text" is an alias for "str". It is provided to supply a forward compatible path for Python 2 code: in Python 2, "Text" is an alias for "unicode". Use "Text" to indicate that a value must contain a unicode string in a manner that is compatible with both Python 2 and Python 3: def add_unicode_checkmark(text: Text) -> Text: return text + u' \u2713' New in version 3.5.2. Abstract Base Classes --------------------- Corresponding to collections in "collections.abc" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ class typing.AbstractSet(Sized, Collection[T_co]) A generic version of "collections.abc.Set". Deprecated since version 3.9: "collections.abc.Set" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.ByteString(Sequence[int]) A generic version of "collections.abc.ByteString". This type represents the types "bytes", "bytearray", and "memoryview" of byte sequences. As a shorthand for this type, "bytes" can be used to annotate arguments of any of the types mentioned above. Deprecated since version 3.9: "collections.abc.ByteString" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.Collection(Sized, Iterable[T_co], Container[T_co]) A generic version of "collections.abc.Collection" New in version 3.6.0. Deprecated since version 3.9: "collections.abc.Collection" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.Container(Generic[T_co]) A generic version of "collections.abc.Container". Deprecated since version 3.9: "collections.abc.Container" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.ItemsView(MappingView, Generic[KT_co, VT_co]) A generic version of "collections.abc.ItemsView". Deprecated since version 3.9: "collections.abc.ItemsView" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.KeysView(MappingView[KT_co], AbstractSet[KT_co]) A generic version of "collections.abc.KeysView". Deprecated since version 3.9: "collections.abc.KeysView" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.Mapping(Sized, Collection[KT], Generic[VT_co]) A generic version of "collections.abc.Mapping". This type can be used as follows: def get_position_in_index(word_list: Mapping[str, int], word: str) -> int: return word_list[word] Deprecated since version 3.9: "collections.abc.Mapping" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.MappingView(Sized, Iterable[T_co]) A generic version of "collections.abc.MappingView". Deprecated since version 3.9: "collections.abc.MappingView" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.MutableMapping(Mapping[KT, VT]) A generic version of "collections.abc.MutableMapping". Deprecated since version 3.9: "collections.abc.MutableMapping" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.MutableSequence(Sequence[T]) A generic version of "collections.abc.MutableSequence". Deprecated since version 3.9: "collections.abc.MutableSequence" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.MutableSet(AbstractSet[T]) A generic version of "collections.abc.MutableSet". Deprecated since version 3.9: "collections.abc.MutableSet" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.Sequence(Reversible[T_co], Collection[T_co]) A generic version of "collections.abc.Sequence". Deprecated since version 3.9: "collections.abc.Sequence" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.ValuesView(MappingView[VT_co]) A generic version of "collections.abc.ValuesView". Deprecated since version 3.9: "collections.abc.ValuesView" now supports "[]". See **PEP 585** and Generic Alias Type. Corresponding to other types in "collections.abc" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ class typing.Iterable(Generic[T_co]) A generic version of "collections.abc.Iterable". Deprecated since version 3.9: "collections.abc.Iterable" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.Iterator(Iterable[T_co]) A generic version of "collections.abc.Iterator". Deprecated since version 3.9: "collections.abc.Iterator" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.Generator(Iterator[T_co], Generic[T_co, T_contra, V_co]) A generator can be annotated by the generic type "Generator[YieldType, SendType, ReturnType]". For example: def echo_round() -> Generator[int, float, str]: sent = yield 0 while sent >= 0: sent = yield round(sent) return 'Done' Note that unlike many other generics in the typing module, the "SendType" of "Generator" behaves contravariantly, not covariantly or invariantly. If your generator will only yield values, set the "SendType" and "ReturnType" to "None": def infinite_stream(start: int) -> Generator[int, None, None]: while True: yield start start += 1 Alternatively, annotate your generator as having a return type of either "Iterable[YieldType]" or "Iterator[YieldType]": def infinite_stream(start: int) -> Iterator[int]: while True: yield start start += 1 Deprecated since version 3.9: "collections.abc.Generator" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.Hashable An alias to "collections.abc.Hashable". class typing.Reversible(Iterable[T_co]) A generic version of "collections.abc.Reversible". Deprecated since version 3.9: "collections.abc.Reversible" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.Sized An alias to "collections.abc.Sized". Asynchronous programming ~~~~~~~~~~~~~~~~~~~~~~~~ class typing.Coroutine(Awaitable[V_co], Generic[T_co, T_contra, V_co]) A generic version of "collections.abc.Coroutine". The variance and order of type variables correspond to those of "Generator", for example: from collections.abc import Coroutine c: Coroutine[list[str], str, int] # Some coroutine defined elsewhere x = c.send('hi') # Inferred type of 'x' is list[str] async def bar() -> None: y = await c # Inferred type of 'y' is int New in version 3.5.3. Deprecated since version 3.9: "collections.abc.Coroutine" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.AsyncGenerator(AsyncIterator[T_co], Generic[T_co, T_contra]) An async generator can be annotated by the generic type "AsyncGenerator[YieldType, SendType]". For example: async def echo_round() -> AsyncGenerator[int, float]: sent = yield 0 while sent >= 0.0: rounded = await round(sent) sent = yield rounded Unlike normal generators, async generators cannot return a value, so there is no "ReturnType" type parameter. As with "Generator", the "SendType" behaves contravariantly. If your generator will only yield values, set the "SendType" to "None": async def infinite_stream(start: int) -> AsyncGenerator[int, None]: while True: yield start start = await increment(start) Alternatively, annotate your generator as having a return type of either "AsyncIterable[YieldType]" or "AsyncIterator[YieldType]": async def infinite_stream(start: int) -> AsyncIterator[int]: while True: yield start start = await increment(start) New in version 3.6.1. Deprecated since version 3.9: "collections.abc.AsyncGenerator" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.AsyncIterable(Generic[T_co]) A generic version of "collections.abc.AsyncIterable". New in version 3.5.2. Deprecated since version 3.9: "collections.abc.AsyncIterable" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.AsyncIterator(AsyncIterable[T_co]) A generic version of "collections.abc.AsyncIterator". New in version 3.5.2. Deprecated since version 3.9: "collections.abc.AsyncIterator" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.Awaitable(Generic[T_co]) A generic version of "collections.abc.Awaitable". New in version 3.5.2. Deprecated since version 3.9: "collections.abc.Awaitable" now supports "[]". See **PEP 585** and Generic Alias Type. Context manager types ~~~~~~~~~~~~~~~~~~~~~ class typing.ContextManager(Generic[T_co]) A generic version of "contextlib.AbstractContextManager". New in version 3.5.4. New in version 3.6.0. Deprecated since version 3.9: "contextlib.AbstractContextManager" now supports "[]". See **PEP 585** and Generic Alias Type. class typing.AsyncContextManager(Generic[T_co]) A generic version of "contextlib.AbstractAsyncContextManager". New in version 3.5.4. New in version 3.6.2. Deprecated since version 3.9: "contextlib.AbstractAsyncContextManager" now supports "[]". See **PEP 585** and Generic Alias Type. Protocols --------- These protocols are decorated with "runtime_checkable()". class typing.SupportsAbs An ABC with one abstract method "__abs__" that is covariant in its return type. class typing.SupportsBytes An ABC with one abstract method "__bytes__". class typing.SupportsComplex An ABC with one abstract method "__complex__". class typing.SupportsFloat An ABC with one abstract method "__float__". class typing.SupportsIndex An ABC with one abstract method "__index__". New in version 3.8. class typing.SupportsInt An ABC with one abstract method "__int__". class typing.SupportsRound An ABC with one abstract method "__round__" that is covariant in its return type. Functions and decorators ------------------------ typing.cast(typ, val) Cast a value to a type. This returns the value unchanged. To the type checker this signals that the return value has the designated type, but at runtime we intentionally don’t check anything (we want this to be as fast as possible). @typing.overload The "@overload" decorator allows describing functions and methods that support multiple different combinations of argument types. A series of "@overload"-decorated definitions must be followed by exactly one non-"@overload"-decorated definition (for the same function/method). The "@overload"-decorated definitions are for the benefit of the type checker only, since they will be overwritten by the non-"@overload"-decorated definition, while the latter is used at runtime but should be ignored by a type checker. At runtime, calling a "@overload"-decorated function directly will raise "NotImplementedError". An example of overload that gives a more precise type than can be expressed using a union or a type variable: @overload def process(response: None) -> None: ... @overload def process(response: int) -> tuple[int, str]: ... @overload def process(response: bytes) -> str: ... def process(response): See **PEP 484** for details and comparison with other typing semantics. @typing.final A decorator to indicate to type checkers that the decorated method cannot be overridden, and the decorated class cannot be subclassed. For example: class Base: @final def done(self) -> None: ... class Sub(Base): def done(self) -> None: # Error reported by type checker ... @final class Leaf: ... class Other(Leaf): # Error reported by type checker ... There is no runtime checking of these properties. See **PEP 591** for more details. New in version 3.8. @typing.no_type_check Decorator to indicate that annotations are not type hints. This works as class or function *decorator*. With a class, it applies recursively to all methods defined in that class (but not to methods defined in its superclasses or subclasses). This mutates the function(s) in place. @typing.no_type_check_decorator Decorator to give another decorator the "no_type_check()" effect. This wraps the decorator with something that wraps the decorated function in "no_type_check()". @typing.type_check_only Decorator to mark a class or function to be unavailable at runtime. This decorator is itself not available at runtime. It is mainly intended to mark classes that are defined in type stub files if an implementation returns an instance of a private class: @type_check_only class Response: # private or not available at runtime code: int def get_header(self, name: str) -> str: ... def fetch_response() -> Response: ... Note that returning instances of private classes is not recommended. It is usually preferable to make such classes public. Introspection helpers --------------------- typing.get_type_hints(obj, globalns=None, localns=None, include_extras=False) Return a dictionary containing type hints for a function, method, module or class object. This is often the same as "obj.__annotations__". In addition, forward references encoded as string literals are handled by evaluating them in "globals" and "locals" namespaces. If necessary, "Optional[t]" is added for function and method annotations if a default value equal to "None" is set. For a class "C", return a dictionary constructed by merging all the "__annotations__" along "C.__mro__" in reverse order. The function recursively replaces all "Annotated[T, ...]" with "T", unless "include_extras" is set to "True" (see "Annotated" for more information). For example: class Student(NamedTuple): name: Annotated[str, 'some marker'] get_type_hints(Student) == {'name': str} get_type_hints(Student, include_extras=False) == {'name': str} get_type_hints(Student, include_extras=True) == { 'name': Annotated[str, 'some marker'] } Changed in version 3.9: Added "include_extras" parameter as part of **PEP 593**. typing.get_args(tp) typing.get_origin(tp) Provide basic introspection for generic types and special typing forms. For a typing object of the form "X[Y, Z, ...]" these functions return "X" and "(Y, Z, ...)". If "X" is a generic alias for a builtin or "collections" class, it gets normalized to the original class. If "X" is a "Union" or "Literal" contained in another generic type, the order of "(Y, Z, ...)" may be different from the order of the original arguments "[Y, Z, ...]" due to type caching. For unsupported objects return "None" and "()" correspondingly. Examples: assert get_origin(Dict[str, int]) is dict assert get_args(Dict[int, str]) == (int, str) assert get_origin(Union[int, str]) is Union assert get_args(Union[int, str]) == (int, str) New in version 3.8. class typing.ForwardRef A class used for internal typing representation of string forward references. For example, "List["SomeClass"]" is implicitly transformed into "List[ForwardRef("SomeClass")]". This class should not be instantiated by a user, but may be used by introspection tools. Note: **PEP 585** generic types such as "list["SomeClass"]" will not be implicitly transformed into "list[ForwardRef("SomeClass")]" and thus will not automatically resolve to "list[SomeClass]". New in version 3.7.4. Constant -------- typing.TYPE_CHECKING A special constant that is assumed to be "True" by 3rd party static type checkers. It is "False" at runtime. Usage: if TYPE_CHECKING: import expensive_mod def fun(arg: 'expensive_mod.SomeType') -> None: local_var: expensive_mod.AnotherType = other_fun() The first type annotation must be enclosed in quotes, making it a “forward reference”, to hide the "expensive_mod" reference from the interpreter runtime. Type annotations for local variables are not evaluated, so the second annotation does not need to be enclosed in quotes. Note: If "from __future__ import annotations" is used, annotations are not evaluated at function definition time. Instead, they are stored as strings in "__annotations__". This makes it unnecessary to use quotes around the annotation (see **PEP 563**). New in version 3.5.2. "unicodedata" — Unicode Database ******************************** ====================================================================== This module provides access to the Unicode Character Database (UCD) which defines character properties for all Unicode characters. The data contained in this database is compiled from the UCD version 13.0.0. The module uses the same names and symbols as defined by Unicode Standard Annex #44, “Unicode Character Database”. It defines the following functions: unicodedata.lookup(name) Look up character by name. If a character with the given name is found, return the corresponding character. If not found, "KeyError" is raised. Changed in version 3.3: Support for name aliases [1] and named sequences [2] has been added. unicodedata.name(chr[, default]) Returns the name assigned to the character *chr* as a string. If no name is defined, *default* is returned, or, if not given, "ValueError" is raised. unicodedata.decimal(chr[, default]) Returns the decimal value assigned to the character *chr* as integer. If no such value is defined, *default* is returned, or, if not given, "ValueError" is raised. unicodedata.digit(chr[, default]) Returns the digit value assigned to the character *chr* as integer. If no such value is defined, *default* is returned, or, if not given, "ValueError" is raised. unicodedata.numeric(chr[, default]) Returns the numeric value assigned to the character *chr* as float. If no such value is defined, *default* is returned, or, if not given, "ValueError" is raised. unicodedata.category(chr) Returns the general category assigned to the character *chr* as string. unicodedata.bidirectional(chr) Returns the bidirectional class assigned to the character *chr* as string. If no such value is defined, an empty string is returned. unicodedata.combining(chr) Returns the canonical combining class assigned to the character *chr* as integer. Returns "0" if no combining class is defined. unicodedata.east_asian_width(chr) Returns the east asian width assigned to the character *chr* as string. unicodedata.mirrored(chr) Returns the mirrored property assigned to the character *chr* as integer. Returns "1" if the character has been identified as a “mirrored” character in bidirectional text, "0" otherwise. unicodedata.decomposition(chr) Returns the character decomposition mapping assigned to the character *chr* as string. An empty string is returned in case no such mapping is defined. unicodedata.normalize(form, unistr) Return the normal form *form* for the Unicode string *unistr*. Valid values for *form* are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’. The Unicode standard defines various normalization forms of a Unicode string, based on the definition of canonical equivalence and compatibility equivalence. In Unicode, several characters can be expressed in various way. For example, the character U+00C7 (LATIN CAPITAL LETTER C WITH CEDILLA) can also be expressed as the sequence U+0043 (LATIN CAPITAL LETTER C) U+0327 (COMBINING CEDILLA). For each character, there are two normal forms: normal form C and normal form D. Normal form D (NFD) is also known as canonical decomposition, and translates each character into its decomposed form. Normal form C (NFC) first applies a canonical decomposition, then composes pre-combined characters again. In addition to these two forms, there are two additional normal forms based on compatibility equivalence. In Unicode, certain characters are supported which normally would be unified with other characters. For example, U+2160 (ROMAN NUMERAL ONE) is really the same thing as U+0049 (LATIN CAPITAL LETTER I). However, it is supported in Unicode for compatibility with existing character sets (e.g. gb2312). The normal form KD (NFKD) will apply the compatibility decomposition, i.e. replace all compatibility characters with their equivalents. The normal form KC (NFKC) first applies the compatibility decomposition, followed by the canonical composition. Even if two unicode strings are normalized and look the same to a human reader, if one has combining characters and the other doesn’t, they may not compare equal. unicodedata.is_normalized(form, unistr) Return whether the Unicode string *unistr* is in the normal form *form*. Valid values for *form* are ‘NFC’, ‘NFKC’, ‘NFD’, and ‘NFKD’. New in version 3.8. In addition, the module exposes the following constant: unicodedata.unidata_version The version of the Unicode database used in this module. unicodedata.ucd_3_2_0 This is an object that has the same methods as the entire module, but uses the Unicode database version 3.2 instead, for applications that require this specific version of the Unicode database (such as IDNA). Examples: >>> import unicodedata >>> unicodedata.lookup('LEFT CURLY BRACKET') '{' >>> unicodedata.name('/') 'SOLIDUS' >>> unicodedata.decimal('9') 9 >>> unicodedata.decimal('a') Traceback (most recent call last): File "", line 1, in ValueError: not a decimal >>> unicodedata.category('A') # 'L'etter, 'u'ppercase 'Lu' >>> unicodedata.bidirectional('\u0660') # 'A'rabic, 'N'umber 'AN' -[ Footnotes ]- [1] https://www.unicode.org/Public/13.0.0/ucd/NameAliases.txt [2] https://www.unicode.org/Public/13.0.0/ucd/NamedSequences.txt "unittest.mock" — getting started ********************************* New in version 3.3. Using Mock ========== Mock Patching Methods --------------------- Common uses for "Mock" objects include: * Patching methods * Recording method calls on objects You might want to replace a method on an object to check that it is called with the correct arguments by another part of the system: >>> real = SomeClass() >>> real.method = MagicMock(name='method') >>> real.method(3, 4, 5, key='value') Once our mock has been used ("real.method" in this example) it has methods and attributes that allow you to make assertions about how it has been used. Note: In most of these examples the "Mock" and "MagicMock" classes are interchangeable. As the "MagicMock" is the more capable class it makes a sensible one to use by default. Once the mock has been called its "called" attribute is set to "True". More importantly we can use the "assert_called_with()" or "assert_called_once_with()" method to check that it was called with the correct arguments. This example tests that calling "ProductionClass().method" results in a call to the "something" method: >>> class ProductionClass: ... def method(self): ... self.something(1, 2, 3) ... def something(self, a, b, c): ... pass ... >>> real = ProductionClass() >>> real.something = MagicMock() >>> real.method() >>> real.something.assert_called_once_with(1, 2, 3) Mock for Method Calls on an Object ---------------------------------- In the last example we patched a method directly on an object to check that it was called correctly. Another common use case is to pass an object into a method (or some part of the system under test) and then check that it is used in the correct way. The simple "ProductionClass" below has a "closer" method. If it is called with an object then it calls "close" on it. >>> class ProductionClass: ... def closer(self, something): ... something.close() ... So to test it we need to pass in an object with a "close" method and check that it was called correctly. >>> real = ProductionClass() >>> mock = Mock() >>> real.closer(mock) >>> mock.close.assert_called_with() We don’t have to do any work to provide the ‘close’ method on our mock. Accessing close creates it. So, if ‘close’ hasn’t already been called then accessing it in the test will create it, but "assert_called_with()" will raise a failure exception. Mocking Classes --------------- A common use case is to mock out classes instantiated by your code under test. When you patch a class, then that class is replaced with a mock. Instances are created by *calling the class*. This means you access the “mock instance” by looking at the return value of the mocked class. In the example below we have a function "some_function" that instantiates "Foo" and calls a method on it. The call to "patch()" replaces the class "Foo" with a mock. The "Foo" instance is the result of calling the mock, so it is configured by modifying the mock "return_value". >>> def some_function(): ... instance = module.Foo() ... return instance.method() ... >>> with patch('module.Foo') as mock: ... instance = mock.return_value ... instance.method.return_value = 'the result' ... result = some_function() ... assert result == 'the result' Naming your mocks ----------------- It can be useful to give your mocks a name. The name is shown in the repr of the mock and can be helpful when the mock appears in test failure messages. The name is also propagated to attributes or methods of the mock: >>> mock = MagicMock(name='foo') >>> mock >>> mock.method Tracking all Calls ------------------ Often you want to track more than a single call to a method. The "mock_calls" attribute records all calls to child attributes of the mock - and also to their children. >>> mock = MagicMock() >>> mock.method() >>> mock.attribute.method(10, x=53) >>> mock.mock_calls [call.method(), call.attribute.method(10, x=53)] If you make an assertion about "mock_calls" and any unexpected methods have been called, then the assertion will fail. This is useful because as well as asserting that the calls you expected have been made, you are also checking that they were made in the right order and with no additional calls: You use the "call" object to construct lists for comparing with "mock_calls": >>> expected = [call.method(), call.attribute.method(10, x=53)] >>> mock.mock_calls == expected True However, parameters to calls that return mocks are not recorded, which means it is not possible to track nested calls where the parameters used to create ancestors are important: >>> m = Mock() >>> m.factory(important=True).deliver() >>> m.mock_calls[-1] == call.factory(important=False).deliver() True Setting Return Values and Attributes ------------------------------------ Setting the return values on a mock object is trivially easy: >>> mock = Mock() >>> mock.return_value = 3 >>> mock() 3 Of course you can do the same for methods on the mock: >>> mock = Mock() >>> mock.method.return_value = 3 >>> mock.method() 3 The return value can also be set in the constructor: >>> mock = Mock(return_value=3) >>> mock() 3 If you need an attribute setting on your mock, just do it: >>> mock = Mock() >>> mock.x = 3 >>> mock.x 3 Sometimes you want to mock up a more complex situation, like for example "mock.connection.cursor().execute("SELECT 1")". If we wanted this call to return a list, then we have to configure the result of the nested call. We can use "call" to construct the set of calls in a “chained call” like this for easy assertion afterwards: >>> mock = Mock() >>> cursor = mock.connection.cursor.return_value >>> cursor.execute.return_value = ['foo'] >>> mock.connection.cursor().execute("SELECT 1") ['foo'] >>> expected = call.connection.cursor().execute("SELECT 1").call_list() >>> mock.mock_calls [call.connection.cursor(), call.connection.cursor().execute('SELECT 1')] >>> mock.mock_calls == expected True It is the call to ".call_list()" that turns our call object into a list of calls representing the chained calls. Raising exceptions with mocks ----------------------------- A useful attribute is "side_effect". If you set this to an exception class or instance then the exception will be raised when the mock is called. >>> mock = Mock(side_effect=Exception('Boom!')) >>> mock() Traceback (most recent call last): ... Exception: Boom! Side effect functions and iterables ----------------------------------- "side_effect" can also be set to a function or an iterable. The use case for "side_effect" as an iterable is where your mock is going to be called several times, and you want each call to return a different value. When you set "side_effect" to an iterable every call to the mock returns the next value from the iterable: >>> mock = MagicMock(side_effect=[4, 5, 6]) >>> mock() 4 >>> mock() 5 >>> mock() 6 For more advanced use cases, like dynamically varying the return values depending on what the mock is called with, "side_effect" can be a function. The function will be called with the same arguments as the mock. Whatever the function returns is what the call returns: >>> vals = {(1, 2): 1, (2, 3): 2} >>> def side_effect(*args): ... return vals[args] ... >>> mock = MagicMock(side_effect=side_effect) >>> mock(1, 2) 1 >>> mock(2, 3) 2 Mocking asynchronous iterators ------------------------------ Since Python 3.8, "AsyncMock" and "MagicMock" have support to mock Asynchronous Iterators through "__aiter__". The "return_value" attribute of "__aiter__" can be used to set the return values to be used for iteration. >>> mock = MagicMock() # AsyncMock also works here >>> mock.__aiter__.return_value = [1, 2, 3] >>> async def main(): ... return [i async for i in mock] ... >>> asyncio.run(main()) [1, 2, 3] Mocking asynchronous context manager ------------------------------------ Since Python 3.8, "AsyncMock" and "MagicMock" have support to mock Asynchronous Context Managers through "__aenter__" and "__aexit__". By default, "__aenter__" and "__aexit__" are "AsyncMock" instances that return an async function. >>> class AsyncContextManager: ... async def __aenter__(self): ... return self ... async def __aexit__(self, exc_type, exc, tb): ... pass ... >>> mock_instance = MagicMock(AsyncContextManager()) # AsyncMock also works here >>> async def main(): ... async with mock_instance as result: ... pass ... >>> asyncio.run(main()) >>> mock_instance.__aenter__.assert_awaited_once() >>> mock_instance.__aexit__.assert_awaited_once() Creating a Mock from an Existing Object --------------------------------------- One problem with over use of mocking is that it couples your tests to the implementation of your mocks rather than your real code. Suppose you have a class that implements "some_method". In a test for another class, you provide a mock of this object that *also* provides "some_method". If later you refactor the first class, so that it no longer has "some_method" - then your tests will continue to pass even though your code is now broken! "Mock" allows you to provide an object as a specification for the mock, using the *spec* keyword argument. Accessing methods / attributes on the mock that don’t exist on your specification object will immediately raise an attribute error. If you change the implementation of your specification, then tests that use that class will start failing immediately without you having to instantiate the class in those tests. >>> mock = Mock(spec=SomeClass) >>> mock.old_method() Traceback (most recent call last): ... AttributeError: object has no attribute 'old_method' Using a specification also enables a smarter matching of calls made to the mock, regardless of whether some parameters were passed as positional or named arguments: >>> def f(a, b, c): pass ... >>> mock = Mock(spec=f) >>> mock(1, 2, 3) >>> mock.assert_called_with(a=1, b=2, c=3) If you want this smarter matching to also work with method calls on the mock, you can use auto-speccing. If you want a stronger form of specification that prevents the setting of arbitrary attributes as well as the getting of them then you can use *spec_set* instead of *spec*. Patch Decorators ================ Note: With "patch()" it matters that you patch objects in the namespace where they are looked up. This is normally straightforward, but for a quick guide read where to patch. A common need in tests is to patch a class attribute or a module attribute, for example patching a builtin or patching a class in a module to test that it is instantiated. Modules and classes are effectively global, so patching on them has to be undone after the test or the patch will persist into other tests and cause hard to diagnose problems. mock provides three convenient decorators for this: "patch()", "patch.object()" and "patch.dict()". "patch" takes a single string, of the form "package.module.Class.attribute" to specify the attribute you are patching. It also optionally takes a value that you want the attribute (or class or whatever) to be replaced with. ‘patch.object’ takes an object and the name of the attribute you would like patched, plus optionally the value to patch it with. "patch.object": >>> original = SomeClass.attribute >>> @patch.object(SomeClass, 'attribute', sentinel.attribute) ... def test(): ... assert SomeClass.attribute == sentinel.attribute ... >>> test() >>> assert SomeClass.attribute == original >>> @patch('package.module.attribute', sentinel.attribute) ... def test(): ... from package.module import attribute ... assert attribute is sentinel.attribute ... >>> test() If you are patching a module (including "builtins") then use "patch()" instead of "patch.object()": >>> mock = MagicMock(return_value=sentinel.file_handle) >>> with patch('builtins.open', mock): ... handle = open('filename', 'r') ... >>> mock.assert_called_with('filename', 'r') >>> assert handle == sentinel.file_handle, "incorrect file handle returned" The module name can be ‘dotted’, in the form "package.module" if needed: >>> @patch('package.module.ClassName.attribute', sentinel.attribute) ... def test(): ... from package.module import ClassName ... assert ClassName.attribute == sentinel.attribute ... >>> test() A nice pattern is to actually decorate test methods themselves: >>> class MyTest(unittest.TestCase): ... @patch.object(SomeClass, 'attribute', sentinel.attribute) ... def test_something(self): ... self.assertEqual(SomeClass.attribute, sentinel.attribute) ... >>> original = SomeClass.attribute >>> MyTest('test_something').test_something() >>> assert SomeClass.attribute == original If you want to patch with a Mock, you can use "patch()" with only one argument (or "patch.object()" with two arguments). The mock will be created for you and passed into the test function / method: >>> class MyTest(unittest.TestCase): ... @patch.object(SomeClass, 'static_method') ... def test_something(self, mock_method): ... SomeClass.static_method() ... mock_method.assert_called_with() ... >>> MyTest('test_something').test_something() You can stack up multiple patch decorators using this pattern: >>> class MyTest(unittest.TestCase): ... @patch('package.module.ClassName1') ... @patch('package.module.ClassName2') ... def test_something(self, MockClass2, MockClass1): ... self.assertIs(package.module.ClassName1, MockClass1) ... self.assertIs(package.module.ClassName2, MockClass2) ... >>> MyTest('test_something').test_something() When you nest patch decorators the mocks are passed in to the decorated function in the same order they applied (the normal *Python* order that decorators are applied). This means from the bottom up, so in the example above the mock for "test_module.ClassName2" is passed in first. There is also "patch.dict()" for setting values in a dictionary just during a scope and restoring the dictionary to its original state when the test ends: >>> foo = {'key': 'value'} >>> original = foo.copy() >>> with patch.dict(foo, {'newkey': 'newvalue'}, clear=True): ... assert foo == {'newkey': 'newvalue'} ... >>> assert foo == original "patch", "patch.object" and "patch.dict" can all be used as context managers. Where you use "patch()" to create a mock for you, you can get a reference to the mock using the “as” form of the with statement: >>> class ProductionClass: ... def method(self): ... pass ... >>> with patch.object(ProductionClass, 'method') as mock_method: ... mock_method.return_value = None ... real = ProductionClass() ... real.method(1, 2, 3) ... >>> mock_method.assert_called_with(1, 2, 3) As an alternative "patch", "patch.object" and "patch.dict" can be used as class decorators. When used in this way it is the same as applying the decorator individually to every method whose name starts with “test”. Further Examples ================ Here are some more examples for some slightly more advanced scenarios. Mocking chained calls --------------------- Mocking chained calls is actually straightforward with mock once you understand the "return_value" attribute. When a mock is called for the first time, or you fetch its "return_value" before it has been called, a new "Mock" is created. This means that you can see how the object returned from a call to a mocked object has been used by interrogating the "return_value" mock: >>> mock = Mock() >>> mock().foo(a=2, b=3) >>> mock.return_value.foo.assert_called_with(a=2, b=3) From here it is a simple step to configure and then make assertions about chained calls. Of course another alternative is writing your code in a more testable way in the first place… So, suppose we have some code that looks a little bit like this: >>> class Something: ... def __init__(self): ... self.backend = BackendProvider() ... def method(self): ... response = self.backend.get_endpoint('foobar').create_call('spam', 'eggs').start_call() ... # more code Assuming that "BackendProvider" is already well tested, how do we test "method()"? Specifically, we want to test that the code section "# more code" uses the response object in the correct way. As this chain of calls is made from an instance attribute we can monkey patch the "backend" attribute on a "Something" instance. In this particular case we are only interested in the return value from the final call to "start_call" so we don’t have much configuration to do. Let’s assume the object it returns is ‘file-like’, so we’ll ensure that our response object uses the builtin "open()" as its "spec". To do this we create a mock instance as our mock backend and create a mock response object for it. To set the response as the return value for that final "start_call" we could do this: mock_backend.get_endpoint.return_value.create_call.return_value.start_call.return_value = mock_response We can do that in a slightly nicer way using the "configure_mock()" method to directly set the return value for us: >>> something = Something() >>> mock_response = Mock(spec=open) >>> mock_backend = Mock() >>> config = {'get_endpoint.return_value.create_call.return_value.start_call.return_value': mock_response} >>> mock_backend.configure_mock(**config) With these we monkey patch the “mock backend” in place and can make the real call: >>> something.backend = mock_backend >>> something.method() Using "mock_calls" we can check the chained call with a single assert. A chained call is several calls in one line of code, so there will be several entries in "mock_calls". We can use "call.call_list()" to create this list of calls for us: >>> chained = call.get_endpoint('foobar').create_call('spam', 'eggs').start_call() >>> call_list = chained.call_list() >>> assert mock_backend.mock_calls == call_list Partial mocking --------------- In some tests I wanted to mock out a call to "datetime.date.today()" to return a known date, but I didn’t want to prevent the code under test from creating new date objects. Unfortunately "datetime.date" is written in C, and so I couldn’t just monkey-patch out the static "date.today()" method. I found a simple way of doing this that involved effectively wrapping the date class with a mock, but passing through calls to the constructor to the real class (and returning real instances). The "patch decorator" is used here to mock out the "date" class in the module under test. The "side_effect" attribute on the mock date class is then set to a lambda function that returns a real date. When the mock date class is called a real date will be constructed and returned by "side_effect". >>> from datetime import date >>> with patch('mymodule.date') as mock_date: ... mock_date.today.return_value = date(2010, 10, 8) ... mock_date.side_effect = lambda *args, **kw: date(*args, **kw) ... ... assert mymodule.date.today() == date(2010, 10, 8) ... assert mymodule.date(2009, 6, 8) == date(2009, 6, 8) Note that we don’t patch "datetime.date" globally, we patch "date" in the module that *uses* it. See where to patch. When "date.today()" is called a known date is returned, but calls to the "date(...)" constructor still return normal dates. Without this you can find yourself having to calculate an expected result using exactly the same algorithm as the code under test, which is a classic testing anti-pattern. Calls to the date constructor are recorded in the "mock_date" attributes ("call_count" and friends) which may also be useful for your tests. An alternative way of dealing with mocking dates, or other builtin classes, is discussed in this blog entry. Mocking a Generator Method -------------------------- A Python generator is a function or method that uses the "yield" statement to return a series of values when iterated over [1]. A generator method / function is called to return the generator object. It is the generator object that is then iterated over. The protocol method for iteration is "__iter__()", so we can mock this using a "MagicMock". Here’s an example class with an “iter” method implemented as a generator: >>> class Foo: ... def iter(self): ... for i in [1, 2, 3]: ... yield i ... >>> foo = Foo() >>> list(foo.iter()) [1, 2, 3] How would we mock this class, and in particular its “iter” method? To configure the values returned from the iteration (implicit in the call to "list"), we need to configure the object returned by the call to "foo.iter()". >>> mock_foo = MagicMock() >>> mock_foo.iter.return_value = iter([1, 2, 3]) >>> list(mock_foo.iter()) [1, 2, 3] [1] There are also generator expressions and more advanced uses of generators, but we aren’t concerned about them here. A very good introduction to generators and how powerful they are is: Generator Tricks for Systems Programmers. Applying the same patch to every test method -------------------------------------------- If you want several patches in place for multiple test methods the obvious way is to apply the patch decorators to every method. This can feel like unnecessary repetition. For Python 2.6 or more recent you can use "patch()" (in all its various forms) as a class decorator. This applies the patches to all test methods on the class. A test method is identified by methods whose names start with "test": >>> @patch('mymodule.SomeClass') ... class MyTest(unittest.TestCase): ... ... def test_one(self, MockSomeClass): ... self.assertIs(mymodule.SomeClass, MockSomeClass) ... ... def test_two(self, MockSomeClass): ... self.assertIs(mymodule.SomeClass, MockSomeClass) ... ... def not_a_test(self): ... return 'something' ... >>> MyTest('test_one').test_one() >>> MyTest('test_two').test_two() >>> MyTest('test_two').not_a_test() 'something' An alternative way of managing patches is to use the patch methods: start and stop. These allow you to move the patching into your "setUp" and "tearDown" methods. >>> class MyTest(unittest.TestCase): ... def setUp(self): ... self.patcher = patch('mymodule.foo') ... self.mock_foo = self.patcher.start() ... ... def test_foo(self): ... self.assertIs(mymodule.foo, self.mock_foo) ... ... def tearDown(self): ... self.patcher.stop() ... >>> MyTest('test_foo').run() If you use this technique you must ensure that the patching is “undone” by calling "stop". This can be fiddlier than you might think, because if an exception is raised in the setUp then tearDown is not called. "unittest.TestCase.addCleanup()" makes this easier: >>> class MyTest(unittest.TestCase): ... def setUp(self): ... patcher = patch('mymodule.foo') ... self.addCleanup(patcher.stop) ... self.mock_foo = patcher.start() ... ... def test_foo(self): ... self.assertIs(mymodule.foo, self.mock_foo) ... >>> MyTest('test_foo').run() Mocking Unbound Methods ----------------------- Whilst writing tests today I needed to patch an *unbound method* (patching the method on the class rather than on the instance). I needed self to be passed in as the first argument because I want to make asserts about which objects were calling this particular method. The issue is that you can’t patch with a mock for this, because if you replace an unbound method with a mock it doesn’t become a bound method when fetched from the instance, and so it doesn’t get self passed in. The workaround is to patch the unbound method with a real function instead. The "patch()" decorator makes it so simple to patch out methods with a mock that having to create a real function becomes a nuisance. If you pass "autospec=True" to patch then it does the patching with a *real* function object. This function object has the same signature as the one it is replacing, but delegates to a mock under the hood. You still get your mock auto-created in exactly the same way as before. What it means though, is that if you use it to patch out an unbound method on a class the mocked function will be turned into a bound method if it is fetched from an instance. It will have "self" passed in as the first argument, which is exactly what I wanted: >>> class Foo: ... def foo(self): ... pass ... >>> with patch.object(Foo, 'foo', autospec=True) as mock_foo: ... mock_foo.return_value = 'foo' ... foo = Foo() ... foo.foo() ... 'foo' >>> mock_foo.assert_called_once_with(foo) If we don’t use "autospec=True" then the unbound method is patched out with a Mock instance instead, and isn’t called with "self". Checking multiple calls with mock --------------------------------- mock has a nice API for making assertions about how your mock objects are used. >>> mock = Mock() >>> mock.foo_bar.return_value = None >>> mock.foo_bar('baz', spam='eggs') >>> mock.foo_bar.assert_called_with('baz', spam='eggs') If your mock is only being called once you can use the "assert_called_once_with()" method that also asserts that the "call_count" is one. >>> mock.foo_bar.assert_called_once_with('baz', spam='eggs') >>> mock.foo_bar() >>> mock.foo_bar.assert_called_once_with('baz', spam='eggs') Traceback (most recent call last): ... AssertionError: Expected to be called once. Called 2 times. Both "assert_called_with" and "assert_called_once_with" make assertions about the *most recent* call. If your mock is going to be called several times, and you want to make assertions about *all* those calls you can use "call_args_list": >>> mock = Mock(return_value=None) >>> mock(1, 2, 3) >>> mock(4, 5, 6) >>> mock() >>> mock.call_args_list [call(1, 2, 3), call(4, 5, 6), call()] The "call" helper makes it easy to make assertions about these calls. You can build up a list of expected calls and compare it to "call_args_list". This looks remarkably similar to the repr of the "call_args_list": >>> expected = [call(1, 2, 3), call(4, 5, 6), call()] >>> mock.call_args_list == expected True Coping with mutable arguments ----------------------------- Another situation is rare, but can bite you, is when your mock is called with mutable arguments. "call_args" and "call_args_list" store *references* to the arguments. If the arguments are mutated by the code under test then you can no longer make assertions about what the values were when the mock was called. Here’s some example code that shows the problem. Imagine the following functions defined in ‘mymodule’: def frob(val): pass def grob(val): "First frob and then clear val" frob(val) val.clear() When we try to test that "grob" calls "frob" with the correct argument look what happens: >>> with patch('mymodule.frob') as mock_frob: ... val = {6} ... mymodule.grob(val) ... >>> val set() >>> mock_frob.assert_called_with({6}) Traceback (most recent call last): ... AssertionError: Expected: (({6},), {}) Called with: ((set(),), {}) One possibility would be for mock to copy the arguments you pass in. This could then cause problems if you do assertions that rely on object identity for equality. Here’s one solution that uses the "side_effect" functionality. If you provide a "side_effect" function for a mock then "side_effect" will be called with the same args as the mock. This gives us an opportunity to copy the arguments and store them for later assertions. In this example I’m using *another* mock to store the arguments so that I can use the mock methods for doing the assertion. Again a helper function sets this up for me. >>> from copy import deepcopy >>> from unittest.mock import Mock, patch, DEFAULT >>> def copy_call_args(mock): ... new_mock = Mock() ... def side_effect(*args, **kwargs): ... args = deepcopy(args) ... kwargs = deepcopy(kwargs) ... new_mock(*args, **kwargs) ... return DEFAULT ... mock.side_effect = side_effect ... return new_mock ... >>> with patch('mymodule.frob') as mock_frob: ... new_mock = copy_call_args(mock_frob) ... val = {6} ... mymodule.grob(val) ... >>> new_mock.assert_called_with({6}) >>> new_mock.call_args call({6}) "copy_call_args" is called with the mock that will be called. It returns a new mock that we do the assertion on. The "side_effect" function makes a copy of the args and calls our "new_mock" with the copy. Note: If your mock is only going to be used once there is an easier way of checking arguments at the point they are called. You can simply do the checking inside a "side_effect" function. >>> def side_effect(arg): ... assert arg == {6} ... >>> mock = Mock(side_effect=side_effect) >>> mock({6}) >>> mock(set()) Traceback (most recent call last): ... AssertionError An alternative approach is to create a subclass of "Mock" or "MagicMock" that copies (using "copy.deepcopy()") the arguments. Here’s an example implementation: >>> from copy import deepcopy >>> class CopyingMock(MagicMock): ... def __call__(self, /, *args, **kwargs): ... args = deepcopy(args) ... kwargs = deepcopy(kwargs) ... return super().__call__(*args, **kwargs) ... >>> c = CopyingMock(return_value=None) >>> arg = set() >>> c(arg) >>> arg.add(1) >>> c.assert_called_with(set()) >>> c.assert_called_with(arg) Traceback (most recent call last): ... AssertionError: Expected call: mock({1}) Actual call: mock(set()) >>> c.foo When you subclass "Mock" or "MagicMock" all dynamically created attributes, and the "return_value" will use your subclass automatically. That means all children of a "CopyingMock" will also have the type "CopyingMock". Nesting Patches --------------- Using patch as a context manager is nice, but if you do multiple patches you can end up with nested with statements indenting further and further to the right: >>> class MyTest(unittest.TestCase): ... ... def test_foo(self): ... with patch('mymodule.Foo') as mock_foo: ... with patch('mymodule.Bar') as mock_bar: ... with patch('mymodule.Spam') as mock_spam: ... assert mymodule.Foo is mock_foo ... assert mymodule.Bar is mock_bar ... assert mymodule.Spam is mock_spam ... >>> original = mymodule.Foo >>> MyTest('test_foo').test_foo() >>> assert mymodule.Foo is original With unittest "cleanup" functions and the patch methods: start and stop we can achieve the same effect without the nested indentation. A simple helper method, "create_patch", puts the patch in place and returns the created mock for us: >>> class MyTest(unittest.TestCase): ... ... def create_patch(self, name): ... patcher = patch(name) ... thing = patcher.start() ... self.addCleanup(patcher.stop) ... return thing ... ... def test_foo(self): ... mock_foo = self.create_patch('mymodule.Foo') ... mock_bar = self.create_patch('mymodule.Bar') ... mock_spam = self.create_patch('mymodule.Spam') ... ... assert mymodule.Foo is mock_foo ... assert mymodule.Bar is mock_bar ... assert mymodule.Spam is mock_spam ... >>> original = mymodule.Foo >>> MyTest('test_foo').run() >>> assert mymodule.Foo is original Mocking a dictionary with MagicMock ----------------------------------- You may want to mock a dictionary, or other container object, recording all access to it whilst having it still behave like a dictionary. We can do this with "MagicMock", which will behave like a dictionary, and using "side_effect" to delegate dictionary access to a real underlying dictionary that is under our control. When the "__getitem__()" and "__setitem__()" methods of our "MagicMock" are called (normal dictionary access) then "side_effect" is called with the key (and in the case of "__setitem__" the value too). We can also control what is returned. After the "MagicMock" has been used we can use attributes like "call_args_list" to assert about how the dictionary was used: >>> my_dict = {'a': 1, 'b': 2, 'c': 3} >>> def getitem(name): ... return my_dict[name] ... >>> def setitem(name, val): ... my_dict[name] = val ... >>> mock = MagicMock() >>> mock.__getitem__.side_effect = getitem >>> mock.__setitem__.side_effect = setitem Note: An alternative to using "MagicMock" is to use "Mock" and *only* provide the magic methods you specifically want: >>> mock = Mock() >>> mock.__getitem__ = Mock(side_effect=getitem) >>> mock.__setitem__ = Mock(side_effect=setitem) A *third* option is to use "MagicMock" but passing in "dict" as the *spec* (or *spec_set*) argument so that the "MagicMock" created only has dictionary magic methods available: >>> mock = MagicMock(spec_set=dict) >>> mock.__getitem__.side_effect = getitem >>> mock.__setitem__.side_effect = setitem With these side effect functions in place, the "mock" will behave like a normal dictionary but recording the access. It even raises a "KeyError" if you try to access a key that doesn’t exist. >>> mock['a'] 1 >>> mock['c'] 3 >>> mock['d'] Traceback (most recent call last): ... KeyError: 'd' >>> mock['b'] = 'fish' >>> mock['d'] = 'eggs' >>> mock['b'] 'fish' >>> mock['d'] 'eggs' After it has been used you can make assertions about the access using the normal mock methods and attributes: >>> mock.__getitem__.call_args_list [call('a'), call('c'), call('d'), call('b'), call('d')] >>> mock.__setitem__.call_args_list [call('b', 'fish'), call('d', 'eggs')] >>> my_dict {'a': 1, 'b': 'fish', 'c': 3, 'd': 'eggs'} Mock subclasses and their attributes ------------------------------------ There are various reasons why you might want to subclass "Mock". One reason might be to add helper methods. Here’s a silly example: >>> class MyMock(MagicMock): ... def has_been_called(self): ... return self.called ... >>> mymock = MyMock(return_value=None) >>> mymock >>> mymock.has_been_called() False >>> mymock() >>> mymock.has_been_called() True The standard behaviour for "Mock" instances is that attributes and the return value mocks are of the same type as the mock they are accessed on. This ensures that "Mock" attributes are "Mocks" and "MagicMock" attributes are "MagicMocks" [2]. So if you’re subclassing to add helper methods then they’ll also be available on the attributes and return value mock of instances of your subclass. >>> mymock.foo >>> mymock.foo.has_been_called() False >>> mymock.foo() >>> mymock.foo.has_been_called() True Sometimes this is inconvenient. For example, one user is subclassing mock to created a Twisted adaptor. Having this applied to attributes too actually causes errors. "Mock" (in all its flavours) uses a method called "_get_child_mock" to create these “sub-mocks” for attributes and return values. You can prevent your subclass being used for attributes by overriding this method. The signature is that it takes arbitrary keyword arguments ("**kwargs") which are then passed onto the mock constructor: >>> class Subclass(MagicMock): ... def _get_child_mock(self, /, **kwargs): ... return MagicMock(**kwargs) ... >>> mymock = Subclass() >>> mymock.foo >>> assert isinstance(mymock, Subclass) >>> assert not isinstance(mymock.foo, Subclass) >>> assert not isinstance(mymock(), Subclass) [2] An exception to this rule are the non-callable mocks. Attributes use the callable variant because otherwise non-callable mocks couldn’t have callable methods. Mocking imports with patch.dict ------------------------------- One situation where mocking can be hard is where you have a local import inside a function. These are harder to mock because they aren’t using an object from the module namespace that we can patch out. Generally local imports are to be avoided. They are sometimes done to prevent circular dependencies, for which there is *usually* a much better way to solve the problem (refactor the code) or to prevent “up front costs” by delaying the import. This can also be solved in better ways than an unconditional local import (store the module as a class or module attribute and only do the import on first use). That aside there is a way to use "mock" to affect the results of an import. Importing fetches an *object* from the "sys.modules" dictionary. Note that it fetches an *object*, which need not be a module. Importing a module for the first time results in a module object being put in *sys.modules*, so usually when you import something you get a module back. This need not be the case however. This means you can use "patch.dict()" to *temporarily* put a mock in place in "sys.modules". Any imports whilst this patch is active will fetch the mock. When the patch is complete (the decorated function exits, the with statement body is complete or "patcher.stop()" is called) then whatever was there previously will be restored safely. Here’s an example that mocks out the ‘fooble’ module. >>> import sys >>> mock = Mock() >>> with patch.dict('sys.modules', {'fooble': mock}): ... import fooble ... fooble.blob() ... >>> assert 'fooble' not in sys.modules >>> mock.blob.assert_called_once_with() As you can see the "import fooble" succeeds, but on exit there is no ‘fooble’ left in "sys.modules". This also works for the "from module import name" form: >>> mock = Mock() >>> with patch.dict('sys.modules', {'fooble': mock}): ... from fooble import blob ... blob.blip() ... >>> mock.blob.blip.assert_called_once_with() With slightly more work you can also mock package imports: >>> mock = Mock() >>> modules = {'package': mock, 'package.module': mock.module} >>> with patch.dict('sys.modules', modules): ... from package.module import fooble ... fooble() ... >>> mock.module.fooble.assert_called_once_with() Tracking order of calls and less verbose call assertions -------------------------------------------------------- The "Mock" class allows you to track the *order* of method calls on your mock objects through the "method_calls" attribute. This doesn’t allow you to track the order of calls between separate mock objects, however we can use "mock_calls" to achieve the same effect. Because mocks track calls to child mocks in "mock_calls", and accessing an arbitrary attribute of a mock creates a child mock, we can create our separate mocks from a parent one. Calls to those child mock will then all be recorded, in order, in the "mock_calls" of the parent: >>> manager = Mock() >>> mock_foo = manager.foo >>> mock_bar = manager.bar >>> mock_foo.something() >>> mock_bar.other.thing() >>> manager.mock_calls [call.foo.something(), call.bar.other.thing()] We can then assert about the calls, including the order, by comparing with the "mock_calls" attribute on the manager mock: >>> expected_calls = [call.foo.something(), call.bar.other.thing()] >>> manager.mock_calls == expected_calls True If "patch" is creating, and putting in place, your mocks then you can attach them to a manager mock using the "attach_mock()" method. After attaching calls will be recorded in "mock_calls" of the manager. >>> manager = MagicMock() >>> with patch('mymodule.Class1') as MockClass1: ... with patch('mymodule.Class2') as MockClass2: ... manager.attach_mock(MockClass1, 'MockClass1') ... manager.attach_mock(MockClass2, 'MockClass2') ... MockClass1().foo() ... MockClass2().bar() >>> manager.mock_calls [call.MockClass1(), call.MockClass1().foo(), call.MockClass2(), call.MockClass2().bar()] If many calls have been made, but you’re only interested in a particular sequence of them then an alternative is to use the "assert_has_calls()" method. This takes a list of calls (constructed with the "call" object). If that sequence of calls are in "mock_calls" then the assert succeeds. >>> m = MagicMock() >>> m().foo().bar().baz() >>> m.one().two().three() >>> calls = call.one().two().three().call_list() >>> m.assert_has_calls(calls) Even though the chained call "m.one().two().three()" aren’t the only calls that have been made to the mock, the assert still succeeds. Sometimes a mock may have several calls made to it, and you are only interested in asserting about *some* of those calls. You may not even care about the order. In this case you can pass "any_order=True" to "assert_has_calls": >>> m = MagicMock() >>> m(1), m.two(2, 3), m.seven(7), m.fifty('50') (...) >>> calls = [call.fifty('50'), call(1), call.seven(7)] >>> m.assert_has_calls(calls, any_order=True) More complex argument matching ------------------------------ Using the same basic concept as "ANY" we can implement matchers to do more complex assertions on objects used as arguments to mocks. Suppose we expect some object to be passed to a mock that by default compares equal based on object identity (which is the Python default for user defined classes). To use "assert_called_with()" we would need to pass in the exact same object. If we are only interested in some of the attributes of this object then we can create a matcher that will check these attributes for us. You can see in this example how a ‘standard’ call to "assert_called_with" isn’t sufficient: >>> class Foo: ... def __init__(self, a, b): ... self.a, self.b = a, b ... >>> mock = Mock(return_value=None) >>> mock(Foo(1, 2)) >>> mock.assert_called_with(Foo(1, 2)) Traceback (most recent call last): ... AssertionError: Expected: call(<__main__.Foo object at 0x...>) Actual call: call(<__main__.Foo object at 0x...>) A comparison function for our "Foo" class might look something like this: >>> def compare(self, other): ... if not type(self) == type(other): ... return False ... if self.a != other.a: ... return False ... if self.b != other.b: ... return False ... return True ... And a matcher object that can use comparison functions like this for its equality operation would look something like this: >>> class Matcher: ... def __init__(self, compare, some_obj): ... self.compare = compare ... self.some_obj = some_obj ... def __eq__(self, other): ... return self.compare(self.some_obj, other) ... Putting all this together: >>> match_foo = Matcher(compare, Foo(1, 2)) >>> mock.assert_called_with(match_foo) The "Matcher" is instantiated with our compare function and the "Foo" object we want to compare against. In "assert_called_with" the "Matcher" equality method will be called, which compares the object the mock was called with against the one we created our matcher with. If they match then "assert_called_with" passes, and if they don’t an "AssertionError" is raised: >>> match_wrong = Matcher(compare, Foo(3, 4)) >>> mock.assert_called_with(match_wrong) Traceback (most recent call last): ... AssertionError: Expected: ((,), {}) Called with: ((,), {}) With a bit of tweaking you could have the comparison function raise the "AssertionError" directly and provide a more useful failure message. As of version 1.5, the Python testing library PyHamcrest provides similar functionality, that may be useful here, in the form of its equality matcher (hamcrest.library.integration.match_equality). "unittest.mock" — mock object library ************************************* New in version 3.3. **Source code:** Lib/unittest/mock.py ====================================================================== "unittest.mock" is a library for testing in Python. It allows you to replace parts of your system under test with mock objects and make assertions about how they have been used. "unittest.mock" provides a core "Mock" class removing the need to create a host of stubs throughout your test suite. After performing an action, you can make assertions about which methods / attributes were used and arguments they were called with. You can also specify return values and set needed attributes in the normal way. Additionally, mock provides a "patch()" decorator that handles patching module and class level attributes within the scope of a test, along with "sentinel" for creating unique objects. See the quick guide for some examples of how to use "Mock", "MagicMock" and "patch()". Mock is designed for use with "unittest" and is based on the ‘action -> assertion’ pattern instead of ‘record -> replay’ used by many mocking frameworks. There is a backport of "unittest.mock" for earlier versions of Python, available as mock on PyPI. Quick Guide =========== "Mock" and "MagicMock" objects create all attributes and methods as you access them and store details of how they have been used. You can configure them, to specify return values or limit what attributes are available, and then make assertions about how they have been used: >>> from unittest.mock import MagicMock >>> thing = ProductionClass() >>> thing.method = MagicMock(return_value=3) >>> thing.method(3, 4, 5, key='value') 3 >>> thing.method.assert_called_with(3, 4, 5, key='value') "side_effect" allows you to perform side effects, including raising an exception when a mock is called: >>> mock = Mock(side_effect=KeyError('foo')) >>> mock() Traceback (most recent call last): ... KeyError: 'foo' >>> values = {'a': 1, 'b': 2, 'c': 3} >>> def side_effect(arg): ... return values[arg] ... >>> mock.side_effect = side_effect >>> mock('a'), mock('b'), mock('c') (1, 2, 3) >>> mock.side_effect = [5, 4, 3, 2, 1] >>> mock(), mock(), mock() (5, 4, 3) Mock has many other ways you can configure it and control its behaviour. For example the *spec* argument configures the mock to take its specification from another object. Attempting to access attributes or methods on the mock that don’t exist on the spec will fail with an "AttributeError". The "patch()" decorator / context manager makes it easy to mock classes or objects in a module under test. The object you specify will be replaced with a mock (or other object) during the test and restored when the test ends: >>> from unittest.mock import patch >>> @patch('module.ClassName2') ... @patch('module.ClassName1') ... def test(MockClass1, MockClass2): ... module.ClassName1() ... module.ClassName2() ... assert MockClass1 is module.ClassName1 ... assert MockClass2 is module.ClassName2 ... assert MockClass1.called ... assert MockClass2.called ... >>> test() Note: When you nest patch decorators the mocks are passed in to the decorated function in the same order they applied (the normal *Python* order that decorators are applied). This means from the bottom up, so in the example above the mock for "module.ClassName1" is passed in first.With "patch()" it matters that you patch objects in the namespace where they are looked up. This is normally straightforward, but for a quick guide read where to patch. As well as a decorator "patch()" can be used as a context manager in a with statement: >>> with patch.object(ProductionClass, 'method', return_value=None) as mock_method: ... thing = ProductionClass() ... thing.method(1, 2, 3) ... >>> mock_method.assert_called_once_with(1, 2, 3) There is also "patch.dict()" for setting values in a dictionary just during a scope and restoring the dictionary to its original state when the test ends: >>> foo = {'key': 'value'} >>> original = foo.copy() >>> with patch.dict(foo, {'newkey': 'newvalue'}, clear=True): ... assert foo == {'newkey': 'newvalue'} ... >>> assert foo == original Mock supports the mocking of Python magic methods. The easiest way of using magic methods is with the "MagicMock" class. It allows you to do things like: >>> mock = MagicMock() >>> mock.__str__.return_value = 'foobarbaz' >>> str(mock) 'foobarbaz' >>> mock.__str__.assert_called_with() Mock allows you to assign functions (or other Mock instances) to magic methods and they will be called appropriately. The "MagicMock" class is just a Mock variant that has all of the magic methods pre-created for you (well, all the useful ones anyway). The following is an example of using magic methods with the ordinary Mock class: >>> mock = Mock() >>> mock.__str__ = Mock(return_value='wheeeeee') >>> str(mock) 'wheeeeee' For ensuring that the mock objects in your tests have the same api as the objects they are replacing, you can use auto-speccing. Auto- speccing can be done through the *autospec* argument to patch, or the "create_autospec()" function. Auto-speccing creates mock objects that have the same attributes and methods as the objects they are replacing, and any functions and methods (including constructors) have the same call signature as the real object. This ensures that your mocks will fail in the same way as your production code if they are used incorrectly: >>> from unittest.mock import create_autospec >>> def function(a, b, c): ... pass ... >>> mock_function = create_autospec(function, return_value='fishy') >>> mock_function(1, 2, 3) 'fishy' >>> mock_function.assert_called_once_with(1, 2, 3) >>> mock_function('wrong arguments') Traceback (most recent call last): ... TypeError: () takes exactly 3 arguments (1 given) "create_autospec()" can also be used on classes, where it copies the signature of the "__init__" method, and on callable objects where it copies the signature of the "__call__" method. The Mock Class ============== "Mock" is a flexible mock object intended to replace the use of stubs and test doubles throughout your code. Mocks are callable and create attributes as new mocks when you access them [1]. Accessing the same attribute will always return the same mock. Mocks record how you use them, allowing you to make assertions about what your code has done to them. "MagicMock" is a subclass of "Mock" with all the magic methods pre- created and ready to use. There are also non-callable variants, useful when you are mocking out objects that aren’t callable: "NonCallableMock" and "NonCallableMagicMock" The "patch()" decorators makes it easy to temporarily replace classes in a particular module with a "Mock" object. By default "patch()" will create a "MagicMock" for you. You can specify an alternative class of "Mock" using the *new_callable* argument to "patch()". class unittest.mock.Mock(spec=None, side_effect=None, return_value=DEFAULT, wraps=None, name=None, spec_set=None, unsafe=False, **kwargs) Create a new "Mock" object. "Mock" takes several optional arguments that specify the behaviour of the Mock object: * *spec*: This can be either a list of strings or an existing object (a class or instance) that acts as the specification for the mock object. If you pass in an object then a list of strings is formed by calling dir on the object (excluding unsupported magic attributes and methods). Accessing any attribute not in this list will raise an "AttributeError". If *spec* is an object (rather than a list of strings) then "__class__" returns the class of the spec object. This allows mocks to pass "isinstance()" tests. * *spec_set*: A stricter variant of *spec*. If used, attempting to *set* or get an attribute on the mock that isn’t on the object passed as *spec_set* will raise an "AttributeError". * *side_effect*: A function to be called whenever the Mock is called. See the "side_effect" attribute. Useful for raising exceptions or dynamically changing return values. The function is called with the same arguments as the mock, and unless it returns "DEFAULT", the return value of this function is used as the return value. Alternatively *side_effect* can be an exception class or instance. In this case the exception will be raised when the mock is called. If *side_effect* is an iterable then each call to the mock will return the next value from the iterable. A *side_effect* can be cleared by setting it to "None". * *return_value*: The value returned when the mock is called. By default this is a new Mock (created on first access). See the "return_value" attribute. * *unsafe*: By default if any attribute starts with *assert* or *assret* will raise an "AttributeError". Passing "unsafe=True" will allow access to these attributes. New in version 3.5. * *wraps*: Item for the mock object to wrap. If *wraps* is not "None" then calling the Mock will pass the call through to the wrapped object (returning the real result). Attribute access on the mock will return a Mock object that wraps the corresponding attribute of the wrapped object (so attempting to access an attribute that doesn’t exist will raise an "AttributeError"). If the mock has an explicit *return_value* set then calls are not passed to the wrapped object and the *return_value* is returned instead. * *name*: If the mock has a name then it will be used in the repr of the mock. This can be useful for debugging. The name is propagated to child mocks. Mocks can also be called with arbitrary keyword arguments. These will be used to set attributes on the mock after it is created. See the "configure_mock()" method for details. assert_called() Assert that the mock was called at least once. >>> mock = Mock() >>> mock.method() >>> mock.method.assert_called() New in version 3.6. assert_called_once() Assert that the mock was called exactly once. >>> mock = Mock() >>> mock.method() >>> mock.method.assert_called_once() >>> mock.method() >>> mock.method.assert_called_once() Traceback (most recent call last): ... AssertionError: Expected 'method' to have been called once. Called 2 times. New in version 3.6. assert_called_with(*args, **kwargs) This method is a convenient way of asserting that the last call has been made in a particular way: >>> mock = Mock() >>> mock.method(1, 2, 3, test='wow') >>> mock.method.assert_called_with(1, 2, 3, test='wow') assert_called_once_with(*args, **kwargs) Assert that the mock was called exactly once and that call was with the specified arguments. >>> mock = Mock(return_value=None) >>> mock('foo', bar='baz') >>> mock.assert_called_once_with('foo', bar='baz') >>> mock('other', bar='values') >>> mock.assert_called_once_with('other', bar='values') Traceback (most recent call last): ... AssertionError: Expected 'mock' to be called once. Called 2 times. assert_any_call(*args, **kwargs) assert the mock has been called with the specified arguments. The assert passes if the mock has *ever* been called, unlike "assert_called_with()" and "assert_called_once_with()" that only pass if the call is the most recent one, and in the case of "assert_called_once_with()" it must also be the only call. >>> mock = Mock(return_value=None) >>> mock(1, 2, arg='thing') >>> mock('some', 'thing', 'else') >>> mock.assert_any_call(1, 2, arg='thing') assert_has_calls(calls, any_order=False) assert the mock has been called with the specified calls. The "mock_calls" list is checked for the calls. If *any_order* is false then the calls must be sequential. There can be extra calls before or after the specified calls. If *any_order* is true then the calls can be in any order, but they must all appear in "mock_calls". >>> mock = Mock(return_value=None) >>> mock(1) >>> mock(2) >>> mock(3) >>> mock(4) >>> calls = [call(2), call(3)] >>> mock.assert_has_calls(calls) >>> calls = [call(4), call(2), call(3)] >>> mock.assert_has_calls(calls, any_order=True) assert_not_called() Assert the mock was never called. >>> m = Mock() >>> m.hello.assert_not_called() >>> obj = m.hello() >>> m.hello.assert_not_called() Traceback (most recent call last): ... AssertionError: Expected 'hello' to not have been called. Called 1 times. New in version 3.5. reset_mock(*, return_value=False, side_effect=False) The reset_mock method resets all the call attributes on a mock object: >>> mock = Mock(return_value=None) >>> mock('hello') >>> mock.called True >>> mock.reset_mock() >>> mock.called False Changed in version 3.6: Added two keyword only argument to the reset_mock function. This can be useful where you want to make a series of assertions that reuse the same object. Note that "reset_mock()" *doesn’t* clear the return value, "side_effect" or any child attributes you have set using normal assignment by default. In case you want to reset *return_value* or "side_effect", then pass the corresponding parameter as "True". Child mocks and the return value mock (if any) are reset as well. Note: *return_value*, and "side_effect" are keyword only argument. mock_add_spec(spec, spec_set=False) Add a spec to a mock. *spec* can either be an object or a list of strings. Only attributes on the *spec* can be fetched as attributes from the mock. If *spec_set* is true then only attributes on the spec can be set. attach_mock(mock, attribute) Attach a mock as an attribute of this one, replacing its name and parent. Calls to the attached mock will be recorded in the "method_calls" and "mock_calls" attributes of this one. configure_mock(**kwargs) Set attributes on the mock through keyword arguments. Attributes plus return values and side effects can be set on child mocks using standard dot notation and unpacking a dictionary in the method call: >>> mock = Mock() >>> attrs = {'method.return_value': 3, 'other.side_effect': KeyError} >>> mock.configure_mock(**attrs) >>> mock.method() 3 >>> mock.other() Traceback (most recent call last): ... KeyError The same thing can be achieved in the constructor call to mocks: >>> attrs = {'method.return_value': 3, 'other.side_effect': KeyError} >>> mock = Mock(some_attribute='eggs', **attrs) >>> mock.some_attribute 'eggs' >>> mock.method() 3 >>> mock.other() Traceback (most recent call last): ... KeyError "configure_mock()" exists to make it easier to do configuration after the mock has been created. __dir__() "Mock" objects limit the results of "dir(some_mock)" to useful results. For mocks with a *spec* this includes all the permitted attributes for the mock. See "FILTER_DIR" for what this filtering does, and how to switch it off. _get_child_mock(**kw) Create the child mocks for attributes and return value. By default child mocks will be the same type as the parent. Subclasses of Mock may want to override this to customize the way child mocks are made. For non-callable mocks the callable variant will be used (rather than any custom subclass). called A boolean representing whether or not the mock object has been called: >>> mock = Mock(return_value=None) >>> mock.called False >>> mock() >>> mock.called True call_count An integer telling you how many times the mock object has been called: >>> mock = Mock(return_value=None) >>> mock.call_count 0 >>> mock() >>> mock() >>> mock.call_count 2 return_value Set this to configure the value returned by calling the mock: >>> mock = Mock() >>> mock.return_value = 'fish' >>> mock() 'fish' The default return value is a mock object and you can configure it in the normal way: >>> mock = Mock() >>> mock.return_value.attribute = sentinel.Attribute >>> mock.return_value() >>> mock.return_value.assert_called_with() "return_value" can also be set in the constructor: >>> mock = Mock(return_value=3) >>> mock.return_value 3 >>> mock() 3 side_effect This can either be a function to be called when the mock is called, an iterable or an exception (class or instance) to be raised. If you pass in a function it will be called with same arguments as the mock and unless the function returns the "DEFAULT" singleton the call to the mock will then return whatever the function returns. If the function returns "DEFAULT" then the mock will return its normal value (from the "return_value"). If you pass in an iterable, it is used to retrieve an iterator which must yield a value on every call. This value can either be an exception instance to be raised, or a value to be returned from the call to the mock ("DEFAULT" handling is identical to the function case). An example of a mock that raises an exception (to test exception handling of an API): >>> mock = Mock() >>> mock.side_effect = Exception('Boom!') >>> mock() Traceback (most recent call last): ... Exception: Boom! Using "side_effect" to return a sequence of values: >>> mock = Mock() >>> mock.side_effect = [3, 2, 1] >>> mock(), mock(), mock() (3, 2, 1) Using a callable: >>> mock = Mock(return_value=3) >>> def side_effect(*args, **kwargs): ... return DEFAULT ... >>> mock.side_effect = side_effect >>> mock() 3 "side_effect" can be set in the constructor. Here’s an example that adds one to the value the mock is called with and returns it: >>> side_effect = lambda value: value + 1 >>> mock = Mock(side_effect=side_effect) >>> mock(3) 4 >>> mock(-8) -7 Setting "side_effect" to "None" clears it: >>> m = Mock(side_effect=KeyError, return_value=3) >>> m() Traceback (most recent call last): ... KeyError >>> m.side_effect = None >>> m() 3 call_args This is either "None" (if the mock hasn’t been called), or the arguments that the mock was last called with. This will be in the form of a tuple: the first member, which can also be accessed through the "args" property, is any ordered arguments the mock was called with (or an empty tuple) and the second member, which can also be accessed through the "kwargs" property, is any keyword arguments (or an empty dictionary). >>> mock = Mock(return_value=None) >>> print(mock.call_args) None >>> mock() >>> mock.call_args call() >>> mock.call_args == () True >>> mock(3, 4) >>> mock.call_args call(3, 4) >>> mock.call_args == ((3, 4),) True >>> mock.call_args.args (3, 4) >>> mock.call_args.kwargs {} >>> mock(3, 4, 5, key='fish', next='w00t!') >>> mock.call_args call(3, 4, 5, key='fish', next='w00t!') >>> mock.call_args.args (3, 4, 5) >>> mock.call_args.kwargs {'key': 'fish', 'next': 'w00t!'} "call_args", along with members of the lists "call_args_list", "method_calls" and "mock_calls" are "call" objects. These are tuples, so they can be unpacked to get at the individual arguments and make more complex assertions. See calls as tuples. Changed in version 3.8: Added "args" and "kwargs" properties. call_args_list This is a list of all the calls made to the mock object in sequence (so the length of the list is the number of times it has been called). Before any calls have been made it is an empty list. The "call" object can be used for conveniently constructing lists of calls to compare with "call_args_list". >>> mock = Mock(return_value=None) >>> mock() >>> mock(3, 4) >>> mock(key='fish', next='w00t!') >>> mock.call_args_list [call(), call(3, 4), call(key='fish', next='w00t!')] >>> expected = [(), ((3, 4),), ({'key': 'fish', 'next': 'w00t!'},)] >>> mock.call_args_list == expected True Members of "call_args_list" are "call" objects. These can be unpacked as tuples to get at the individual arguments. See calls as tuples. method_calls As well as tracking calls to themselves, mocks also track calls to methods and attributes, and *their* methods and attributes: >>> mock = Mock() >>> mock.method() >>> mock.property.method.attribute() >>> mock.method_calls [call.method(), call.property.method.attribute()] Members of "method_calls" are "call" objects. These can be unpacked as tuples to get at the individual arguments. See calls as tuples. mock_calls "mock_calls" records *all* calls to the mock object, its methods, magic methods *and* return value mocks. >>> mock = MagicMock() >>> result = mock(1, 2, 3) >>> mock.first(a=3) >>> mock.second() >>> int(mock) 1 >>> result(1) >>> expected = [call(1, 2, 3), call.first(a=3), call.second(), ... call.__int__(), call()(1)] >>> mock.mock_calls == expected True Members of "mock_calls" are "call" objects. These can be unpacked as tuples to get at the individual arguments. See calls as tuples. Note: The way "mock_calls" are recorded means that where nested calls are made, the parameters of ancestor calls are not recorded and so will always compare equal: >>> mock = MagicMock() >>> mock.top(a=3).bottom() >>> mock.mock_calls [call.top(a=3), call.top().bottom()] >>> mock.mock_calls[-1] == call.top(a=-1).bottom() True __class__ Normally the "__class__" attribute of an object will return its type. For a mock object with a "spec", "__class__" returns the spec class instead. This allows mock objects to pass "isinstance()" tests for the object they are replacing / masquerading as: >>> mock = Mock(spec=3) >>> isinstance(mock, int) True "__class__" is assignable to, this allows a mock to pass an "isinstance()" check without forcing you to use a spec: >>> mock = Mock() >>> mock.__class__ = dict >>> isinstance(mock, dict) True class unittest.mock.NonCallableMock(spec=None, wraps=None, name=None, spec_set=None, **kwargs) A non-callable version of "Mock". The constructor parameters have the same meaning of "Mock", with the exception of *return_value* and *side_effect* which have no meaning on a non-callable mock. Mock objects that use a class or an instance as a "spec" or "spec_set" are able to pass "isinstance()" tests: >>> mock = Mock(spec=SomeClass) >>> isinstance(mock, SomeClass) True >>> mock = Mock(spec_set=SomeClass()) >>> isinstance(mock, SomeClass) True The "Mock" classes have support for mocking magic methods. See magic methods for the full details. The mock classes and the "patch()" decorators all take arbitrary keyword arguments for configuration. For the "patch()" decorators the keywords are passed to the constructor of the mock being created. The keyword arguments are for configuring attributes of the mock: >>> m = MagicMock(attribute=3, other='fish') >>> m.attribute 3 >>> m.other 'fish' The return value and side effect of child mocks can be set in the same way, using dotted notation. As you can’t use dotted names directly in a call you have to create a dictionary and unpack it using "**": >>> attrs = {'method.return_value': 3, 'other.side_effect': KeyError} >>> mock = Mock(some_attribute='eggs', **attrs) >>> mock.some_attribute 'eggs' >>> mock.method() 3 >>> mock.other() Traceback (most recent call last): ... KeyError A callable mock which was created with a *spec* (or a *spec_set*) will introspect the specification object’s signature when matching calls to the mock. Therefore, it can match the actual call’s arguments regardless of whether they were passed positionally or by name: >>> def f(a, b, c): pass ... >>> mock = Mock(spec=f) >>> mock(1, 2, c=3) >>> mock.assert_called_with(1, 2, 3) >>> mock.assert_called_with(a=1, b=2, c=3) This applies to "assert_called_with()", "assert_called_once_with()", "assert_has_calls()" and "assert_any_call()". When Autospeccing, it will also apply to method calls on the mock object. Changed in version 3.4: Added signature introspection on specced and autospecced mock objects. class unittest.mock.PropertyMock(*args, **kwargs) A mock intended to be used as a property, or other descriptor, on a class. "PropertyMock" provides "__get__()" and "__set__()" methods so you can specify a return value when it is fetched. Fetching a "PropertyMock" instance from an object calls the mock, with no args. Setting it calls the mock with the value being set. >>> class Foo: ... @property ... def foo(self): ... return 'something' ... @foo.setter ... def foo(self, value): ... pass ... >>> with patch('__main__.Foo.foo', new_callable=PropertyMock) as mock_foo: ... mock_foo.return_value = 'mockity-mock' ... this_foo = Foo() ... print(this_foo.foo) ... this_foo.foo = 6 ... mockity-mock >>> mock_foo.mock_calls [call(), call(6)] Because of the way mock attributes are stored you can’t directly attach a "PropertyMock" to a mock object. Instead you can attach it to the mock type object: >>> m = MagicMock() >>> p = PropertyMock(return_value=3) >>> type(m).foo = p >>> m.foo 3 >>> p.assert_called_once_with() class unittest.mock.AsyncMock(spec=None, side_effect=None, return_value=DEFAULT, wraps=None, name=None, spec_set=None, unsafe=False, **kwargs) An asynchronous version of "MagicMock". The "AsyncMock" object will behave so the object is recognized as an async function, and the result of a call is an awaitable. >>> mock = AsyncMock() >>> asyncio.iscoroutinefunction(mock) True >>> inspect.isawaitable(mock()) True The result of "mock()" is an async function which will have the outcome of "side_effect" or "return_value" after it has been awaited: * if "side_effect" is a function, the async function will return the result of that function, * if "side_effect" is an exception, the async function will raise the exception, * if "side_effect" is an iterable, the async function will return the next value of the iterable, however, if the sequence of result is exhausted, "StopAsyncIteration" is raised immediately, * if "side_effect" is not defined, the async function will return the value defined by "return_value", hence, by default, the async function returns a new "AsyncMock" object. Setting the *spec* of a "Mock" or "MagicMock" to an async function will result in a coroutine object being returned after calling. >>> async def async_func(): pass ... >>> mock = MagicMock(async_func) >>> mock >>> mock() Setting the *spec* of a "Mock", "MagicMock", or "AsyncMock" to a class with asynchronous and synchronous functions will automatically detect the synchronous functions and set them as "MagicMock" (if the parent mock is "AsyncMock" or "MagicMock") or "Mock" (if the parent mock is "Mock"). All asynchronous functions will be "AsyncMock". >>> class ExampleClass: ... def sync_foo(): ... pass ... async def async_foo(): ... pass ... >>> a_mock = AsyncMock(ExampleClass) >>> a_mock.sync_foo >>> a_mock.async_foo >>> mock = Mock(ExampleClass) >>> mock.sync_foo >>> mock.async_foo New in version 3.8. assert_awaited() Assert that the mock was awaited at least once. Note that this is separate from the object having been called, the "await" keyword must be used: >>> mock = AsyncMock() >>> async def main(coroutine_mock): ... await coroutine_mock ... >>> coroutine_mock = mock() >>> mock.called True >>> mock.assert_awaited() Traceback (most recent call last): ... AssertionError: Expected mock to have been awaited. >>> asyncio.run(main(coroutine_mock)) >>> mock.assert_awaited() assert_awaited_once() Assert that the mock was awaited exactly once. >>> mock = AsyncMock() >>> async def main(): ... await mock() ... >>> asyncio.run(main()) >>> mock.assert_awaited_once() >>> asyncio.run(main()) >>> mock.method.assert_awaited_once() Traceback (most recent call last): ... AssertionError: Expected mock to have been awaited once. Awaited 2 times. assert_awaited_with(*args, **kwargs) Assert that the last await was with the specified arguments. >>> mock = AsyncMock() >>> async def main(*args, **kwargs): ... await mock(*args, **kwargs) ... >>> asyncio.run(main('foo', bar='bar')) >>> mock.assert_awaited_with('foo', bar='bar') >>> mock.assert_awaited_with('other') Traceback (most recent call last): ... AssertionError: expected call not found. Expected: mock('other') Actual: mock('foo', bar='bar') assert_awaited_once_with(*args, **kwargs) Assert that the mock was awaited exactly once and with the specified arguments. >>> mock = AsyncMock() >>> async def main(*args, **kwargs): ... await mock(*args, **kwargs) ... >>> asyncio.run(main('foo', bar='bar')) >>> mock.assert_awaited_once_with('foo', bar='bar') >>> asyncio.run(main('foo', bar='bar')) >>> mock.assert_awaited_once_with('foo', bar='bar') Traceback (most recent call last): ... AssertionError: Expected mock to have been awaited once. Awaited 2 times. assert_any_await(*args, **kwargs) Assert the mock has ever been awaited with the specified arguments. >>> mock = AsyncMock() >>> async def main(*args, **kwargs): ... await mock(*args, **kwargs) ... >>> asyncio.run(main('foo', bar='bar')) >>> asyncio.run(main('hello')) >>> mock.assert_any_await('foo', bar='bar') >>> mock.assert_any_await('other') Traceback (most recent call last): ... AssertionError: mock('other') await not found assert_has_awaits(calls, any_order=False) Assert the mock has been awaited with the specified calls. The "await_args_list" list is checked for the awaits. If *any_order* is false then the awaits must be sequential. There can be extra calls before or after the specified awaits. If *any_order* is true then the awaits can be in any order, but they must all appear in "await_args_list". >>> mock = AsyncMock() >>> async def main(*args, **kwargs): ... await mock(*args, **kwargs) ... >>> calls = [call("foo"), call("bar")] >>> mock.assert_has_awaits(calls) Traceback (most recent call last): ... AssertionError: Awaits not found. Expected: [call('foo'), call('bar')] Actual: [] >>> asyncio.run(main('foo')) >>> asyncio.run(main('bar')) >>> mock.assert_has_awaits(calls) assert_not_awaited() Assert that the mock was never awaited. >>> mock = AsyncMock() >>> mock.assert_not_awaited() reset_mock(*args, **kwargs) See "Mock.reset_mock()". Also sets "await_count" to 0, "await_args" to None, and clears the "await_args_list". await_count An integer keeping track of how many times the mock object has been awaited. >>> mock = AsyncMock() >>> async def main(): ... await mock() ... >>> asyncio.run(main()) >>> mock.await_count 1 >>> asyncio.run(main()) >>> mock.await_count 2 await_args This is either "None" (if the mock hasn’t been awaited), or the arguments that the mock was last awaited with. Functions the same as "Mock.call_args". >>> mock = AsyncMock() >>> async def main(*args): ... await mock(*args) ... >>> mock.await_args >>> asyncio.run(main('foo')) >>> mock.await_args call('foo') >>> asyncio.run(main('bar')) >>> mock.await_args call('bar') await_args_list This is a list of all the awaits made to the mock object in sequence (so the length of the list is the number of times it has been awaited). Before any awaits have been made it is an empty list. >>> mock = AsyncMock() >>> async def main(*args): ... await mock(*args) ... >>> mock.await_args_list [] >>> asyncio.run(main('foo')) >>> mock.await_args_list [call('foo')] >>> asyncio.run(main('bar')) >>> mock.await_args_list [call('foo'), call('bar')] Calling ------- Mock objects are callable. The call will return the value set as the "return_value" attribute. The default return value is a new Mock object; it is created the first time the return value is accessed (either explicitly or by calling the Mock) - but it is stored and the same one returned each time. Calls made to the object will be recorded in the attributes like "call_args" and "call_args_list". If "side_effect" is set then it will be called after the call has been recorded, so if "side_effect" raises an exception the call is still recorded. The simplest way to make a mock raise an exception when called is to make "side_effect" an exception class or instance: >>> m = MagicMock(side_effect=IndexError) >>> m(1, 2, 3) Traceback (most recent call last): ... IndexError >>> m.mock_calls [call(1, 2, 3)] >>> m.side_effect = KeyError('Bang!') >>> m('two', 'three', 'four') Traceback (most recent call last): ... KeyError: 'Bang!' >>> m.mock_calls [call(1, 2, 3), call('two', 'three', 'four')] If "side_effect" is a function then whatever that function returns is what calls to the mock return. The "side_effect" function is called with the same arguments as the mock. This allows you to vary the return value of the call dynamically, based on the input: >>> def side_effect(value): ... return value + 1 ... >>> m = MagicMock(side_effect=side_effect) >>> m(1) 2 >>> m(2) 3 >>> m.mock_calls [call(1), call(2)] If you want the mock to still return the default return value (a new mock), or any set return value, then there are two ways of doing this. Either return "mock.return_value" from inside "side_effect", or return "DEFAULT": >>> m = MagicMock() >>> def side_effect(*args, **kwargs): ... return m.return_value ... >>> m.side_effect = side_effect >>> m.return_value = 3 >>> m() 3 >>> def side_effect(*args, **kwargs): ... return DEFAULT ... >>> m.side_effect = side_effect >>> m() 3 To remove a "side_effect", and return to the default behaviour, set the "side_effect" to "None": >>> m = MagicMock(return_value=6) >>> def side_effect(*args, **kwargs): ... return 3 ... >>> m.side_effect = side_effect >>> m() 3 >>> m.side_effect = None >>> m() 6 The "side_effect" can also be any iterable object. Repeated calls to the mock will return values from the iterable (until the iterable is exhausted and a "StopIteration" is raised): >>> m = MagicMock(side_effect=[1, 2, 3]) >>> m() 1 >>> m() 2 >>> m() 3 >>> m() Traceback (most recent call last): ... StopIteration If any members of the iterable are exceptions they will be raised instead of returned: >>> iterable = (33, ValueError, 66) >>> m = MagicMock(side_effect=iterable) >>> m() 33 >>> m() Traceback (most recent call last): ... ValueError >>> m() 66 Deleting Attributes ------------------- Mock objects create attributes on demand. This allows them to pretend to be objects of any type. You may want a mock object to return "False" to a "hasattr()" call, or raise an "AttributeError" when an attribute is fetched. You can do this by providing an object as a "spec" for a mock, but that isn’t always convenient. You “block” attributes by deleting them. Once deleted, accessing an attribute will raise an "AttributeError". >>> mock = MagicMock() >>> hasattr(mock, 'm') True >>> del mock.m >>> hasattr(mock, 'm') False >>> del mock.f >>> mock.f Traceback (most recent call last): ... AttributeError: f Mock names and the name attribute --------------------------------- Since “name” is an argument to the "Mock" constructor, if you want your mock object to have a “name” attribute you can’t just pass it in at creation time. There are two alternatives. One option is to use "configure_mock()": >>> mock = MagicMock() >>> mock.configure_mock(name='my_name') >>> mock.name 'my_name' A simpler option is to simply set the “name” attribute after mock creation: >>> mock = MagicMock() >>> mock.name = "foo" Attaching Mocks as Attributes ----------------------------- When you attach a mock as an attribute of another mock (or as the return value) it becomes a “child” of that mock. Calls to the child are recorded in the "method_calls" and "mock_calls" attributes of the parent. This is useful for configuring child mocks and then attaching them to the parent, or for attaching mocks to a parent that records all calls to the children and allows you to make assertions about the order of calls between mocks: >>> parent = MagicMock() >>> child1 = MagicMock(return_value=None) >>> child2 = MagicMock(return_value=None) >>> parent.child1 = child1 >>> parent.child2 = child2 >>> child1(1) >>> child2(2) >>> parent.mock_calls [call.child1(1), call.child2(2)] The exception to this is if the mock has a name. This allows you to prevent the “parenting” if for some reason you don’t want it to happen. >>> mock = MagicMock() >>> not_a_child = MagicMock(name='not-a-child') >>> mock.attribute = not_a_child >>> mock.attribute() >>> mock.mock_calls [] Mocks created for you by "patch()" are automatically given names. To attach mocks that have names to a parent you use the "attach_mock()" method: >>> thing1 = object() >>> thing2 = object() >>> parent = MagicMock() >>> with patch('__main__.thing1', return_value=None) as child1: ... with patch('__main__.thing2', return_value=None) as child2: ... parent.attach_mock(child1, 'child1') ... parent.attach_mock(child2, 'child2') ... child1('one') ... child2('two') ... >>> parent.mock_calls [call.child1('one'), call.child2('two')] [1] The only exceptions are magic methods and attributes (those that have leading and trailing double underscores). Mock doesn’t create these but instead raises an "AttributeError". This is because the interpreter will often implicitly request these methods, and gets *very* confused to get a new Mock object when it expects a magic method. If you need magic method support see magic methods. The patchers ============ The patch decorators are used for patching objects only within the scope of the function they decorate. They automatically handle the unpatching for you, even if exceptions are raised. All of these functions can also be used in with statements or as class decorators. patch ----- Note: The key is to do the patching in the right namespace. See the section where to patch. unittest.mock.patch(target, new=DEFAULT, spec=None, create=False, spec_set=None, autospec=None, new_callable=None, **kwargs) "patch()" acts as a function decorator, class decorator or a context manager. Inside the body of the function or with statement, the *target* is patched with a *new* object. When the function/with statement exits the patch is undone. If *new* is omitted, then the target is replaced with an "AsyncMock" if the patched object is an async function or a "MagicMock" otherwise. If "patch()" is used as a decorator and *new* is omitted, the created mock is passed in as an extra argument to the decorated function. If "patch()" is used as a context manager the created mock is returned by the context manager. *target* should be a string in the form "'package.module.ClassName'". The *target* is imported and the specified object replaced with the *new* object, so the *target* must be importable from the environment you are calling "patch()" from. The target is imported when the decorated function is executed, not at decoration time. The *spec* and *spec_set* keyword arguments are passed to the "MagicMock" if patch is creating one for you. In addition you can pass "spec=True" or "spec_set=True", which causes patch to pass in the object being mocked as the spec/spec_set object. *new_callable* allows you to specify a different class, or callable object, that will be called to create the *new* object. By default "AsyncMock" is used for async functions and "MagicMock" for the rest. A more powerful form of *spec* is *autospec*. If you set "autospec=True" then the mock will be created with a spec from the object being replaced. All attributes of the mock will also have the spec of the corresponding attribute of the object being replaced. Methods and functions being mocked will have their arguments checked and will raise a "TypeError" if they are called with the wrong signature. For mocks replacing a class, their return value (the ‘instance’) will have the same spec as the class. See the "create_autospec()" function and Autospeccing. Instead of "autospec=True" you can pass "autospec=some_object" to use an arbitrary object as the spec instead of the one being replaced. By default "patch()" will fail to replace attributes that don’t exist. If you pass in "create=True", and the attribute doesn’t exist, patch will create the attribute for you when the patched function is called, and delete it again after the patched function has exited. This is useful for writing tests against attributes that your production code creates at runtime. It is off by default because it can be dangerous. With it switched on you can write passing tests against APIs that don’t actually exist! Note: Changed in version 3.5: If you are patching builtins in a module then you don’t need to pass "create=True", it will be added by default. Patch can be used as a "TestCase" class decorator. It works by decorating each test method in the class. This reduces the boilerplate code when your test methods share a common patchings set. "patch()" finds tests by looking for method names that start with "patch.TEST_PREFIX". By default this is "'test'", which matches the way "unittest" finds tests. You can specify an alternative prefix by setting "patch.TEST_PREFIX". Patch can be used as a context manager, with the with statement. Here the patching applies to the indented block after the with statement. If you use “as” then the patched object will be bound to the name after the “as”; very useful if "patch()" is creating a mock object for you. "patch()" takes arbitrary keyword arguments. These will be passed to "AsyncMock" if the patched object is asynchronous, to "MagicMock" otherwise or to *new_callable* if specified. "patch.dict(...)", "patch.multiple(...)" and "patch.object(...)" are available for alternate use-cases. "patch()" as function decorator, creating the mock for you and passing it into the decorated function: >>> @patch('__main__.SomeClass') ... def function(normal_argument, mock_class): ... print(mock_class is SomeClass) ... >>> function(None) True Patching a class replaces the class with a "MagicMock" *instance*. If the class is instantiated in the code under test then it will be the "return_value" of the mock that will be used. If the class is instantiated multiple times you could use "side_effect" to return a new mock each time. Alternatively you can set the *return_value* to be anything you want. To configure return values on methods of *instances* on the patched class you must do this on the "return_value". For example: >>> class Class: ... def method(self): ... pass ... >>> with patch('__main__.Class') as MockClass: ... instance = MockClass.return_value ... instance.method.return_value = 'foo' ... assert Class() is instance ... assert Class().method() == 'foo' ... If you use *spec* or *spec_set* and "patch()" is replacing a *class*, then the return value of the created mock will have the same spec. >>> Original = Class >>> patcher = patch('__main__.Class', spec=True) >>> MockClass = patcher.start() >>> instance = MockClass() >>> assert isinstance(instance, Original) >>> patcher.stop() The *new_callable* argument is useful where you want to use an alternative class to the default "MagicMock" for the created mock. For example, if you wanted a "NonCallableMock" to be used: >>> thing = object() >>> with patch('__main__.thing', new_callable=NonCallableMock) as mock_thing: ... assert thing is mock_thing ... thing() ... Traceback (most recent call last): ... TypeError: 'NonCallableMock' object is not callable Another use case might be to replace an object with an "io.StringIO" instance: >>> from io import StringIO >>> def foo(): ... print('Something') ... >>> @patch('sys.stdout', new_callable=StringIO) ... def test(mock_stdout): ... foo() ... assert mock_stdout.getvalue() == 'Something\n' ... >>> test() When "patch()" is creating a mock for you, it is common that the first thing you need to do is to configure the mock. Some of that configuration can be done in the call to patch. Any arbitrary keywords you pass into the call will be used to set attributes on the created mock: >>> patcher = patch('__main__.thing', first='one', second='two') >>> mock_thing = patcher.start() >>> mock_thing.first 'one' >>> mock_thing.second 'two' As well as attributes on the created mock attributes, like the "return_value" and "side_effect", of child mocks can also be configured. These aren’t syntactically valid to pass in directly as keyword arguments, but a dictionary with these as keys can still be expanded into a "patch()" call using "**": >>> config = {'method.return_value': 3, 'other.side_effect': KeyError} >>> patcher = patch('__main__.thing', **config) >>> mock_thing = patcher.start() >>> mock_thing.method() 3 >>> mock_thing.other() Traceback (most recent call last): ... KeyError By default, attempting to patch a function in a module (or a method or an attribute in a class) that does not exist will fail with "AttributeError": >>> @patch('sys.non_existing_attribute', 42) ... def test(): ... assert sys.non_existing_attribute == 42 ... >>> test() Traceback (most recent call last): ... AttributeError: does not have the attribute 'non_existing_attribute' but adding "create=True" in the call to "patch()" will make the previous example work as expected: >>> @patch('sys.non_existing_attribute', 42, create=True) ... def test(mock_stdout): ... assert sys.non_existing_attribute == 42 ... >>> test() Changed in version 3.8: "patch()" now returns an "AsyncMock" if the target is an async function. patch.object ------------ patch.object(target, attribute, new=DEFAULT, spec=None, create=False, spec_set=None, autospec=None, new_callable=None, **kwargs) patch the named member (*attribute*) on an object (*target*) with a mock object. "patch.object()" can be used as a decorator, class decorator or a context manager. Arguments *new*, *spec*, *create*, *spec_set*, *autospec* and *new_callable* have the same meaning as for "patch()". Like "patch()", "patch.object()" takes arbitrary keyword arguments for configuring the mock object it creates. When used as a class decorator "patch.object()" honours "patch.TEST_PREFIX" for choosing which methods to wrap. You can either call "patch.object()" with three arguments or two arguments. The three argument form takes the object to be patched, the attribute name and the object to replace the attribute with. When calling with the two argument form you omit the replacement object, and a mock is created for you and passed in as an extra argument to the decorated function: >>> @patch.object(SomeClass, 'class_method') ... def test(mock_method): ... SomeClass.class_method(3) ... mock_method.assert_called_with(3) ... >>> test() *spec*, *create* and the other arguments to "patch.object()" have the same meaning as they do for "patch()". patch.dict ---------- patch.dict(in_dict, values=(), clear=False, **kwargs) Patch a dictionary, or dictionary like object, and restore the dictionary to its original state after the test. *in_dict* can be a dictionary or a mapping like container. If it is a mapping then it must at least support getting, setting and deleting items plus iterating over keys. *in_dict* can also be a string specifying the name of the dictionary, which will then be fetched by importing it. *values* can be a dictionary of values to set in the dictionary. *values* can also be an iterable of "(key, value)" pairs. If *clear* is true then the dictionary will be cleared before the new values are set. "patch.dict()" can also be called with arbitrary keyword arguments to set values in the dictionary. Changed in version 3.8: "patch.dict()" now returns the patched dictionary when used as a context manager. "patch.dict()" can be used as a context manager, decorator or class decorator: >>> foo = {} >>> @patch.dict(foo, {'newkey': 'newvalue'}) ... def test(): ... assert foo == {'newkey': 'newvalue'} >>> test() >>> assert foo == {} When used as a class decorator "patch.dict()" honours "patch.TEST_PREFIX" (default to "'test'") for choosing which methods to wrap: >>> import os >>> import unittest >>> from unittest.mock import patch >>> @patch.dict('os.environ', {'newkey': 'newvalue'}) ... class TestSample(unittest.TestCase): ... def test_sample(self): ... self.assertEqual(os.environ['newkey'], 'newvalue') If you want to use a different prefix for your test, you can inform the patchers of the different prefix by setting "patch.TEST_PREFIX". For more details about how to change the value of see TEST_PREFIX. "patch.dict()" can be used to add members to a dictionary, or simply let a test change a dictionary, and ensure the dictionary is restored when the test ends. >>> foo = {} >>> with patch.dict(foo, {'newkey': 'newvalue'}) as patched_foo: ... assert foo == {'newkey': 'newvalue'} ... assert patched_foo == {'newkey': 'newvalue'} ... # You can add, update or delete keys of foo (or patched_foo, it's the same dict) ... patched_foo['spam'] = 'eggs' ... >>> assert foo == {} >>> assert patched_foo == {} >>> import os >>> with patch.dict('os.environ', {'newkey': 'newvalue'}): ... print(os.environ['newkey']) ... newvalue >>> assert 'newkey' not in os.environ Keywords can be used in the "patch.dict()" call to set values in the dictionary: >>> mymodule = MagicMock() >>> mymodule.function.return_value = 'fish' >>> with patch.dict('sys.modules', mymodule=mymodule): ... import mymodule ... mymodule.function('some', 'args') ... 'fish' "patch.dict()" can be used with dictionary like objects that aren’t actually dictionaries. At the very minimum they must support item getting, setting, deleting and either iteration or membership test. This corresponds to the magic methods "__getitem__()", "__setitem__()", "__delitem__()" and either "__iter__()" or "__contains__()". >>> class Container: ... def __init__(self): ... self.values = {} ... def __getitem__(self, name): ... return self.values[name] ... def __setitem__(self, name, value): ... self.values[name] = value ... def __delitem__(self, name): ... del self.values[name] ... def __iter__(self): ... return iter(self.values) ... >>> thing = Container() >>> thing['one'] = 1 >>> with patch.dict(thing, one=2, two=3): ... assert thing['one'] == 2 ... assert thing['two'] == 3 ... >>> assert thing['one'] == 1 >>> assert list(thing) == ['one'] patch.multiple -------------- patch.multiple(target, spec=None, create=False, spec_set=None, autospec=None, new_callable=None, **kwargs) Perform multiple patches in a single call. It takes the object to be patched (either as an object or a string to fetch the object by importing) and keyword arguments for the patches: with patch.multiple(settings, FIRST_PATCH='one', SECOND_PATCH='two'): ... Use "DEFAULT" as the value if you want "patch.multiple()" to create mocks for you. In this case the created mocks are passed into a decorated function by keyword, and a dictionary is returned when "patch.multiple()" is used as a context manager. "patch.multiple()" can be used as a decorator, class decorator or a context manager. The arguments *spec*, *spec_set*, *create*, *autospec* and *new_callable* have the same meaning as for "patch()". These arguments will be applied to *all* patches done by "patch.multiple()". When used as a class decorator "patch.multiple()" honours "patch.TEST_PREFIX" for choosing which methods to wrap. If you want "patch.multiple()" to create mocks for you, then you can use "DEFAULT" as the value. If you use "patch.multiple()" as a decorator then the created mocks are passed into the decorated function by keyword. >>> thing = object() >>> other = object() >>> @patch.multiple('__main__', thing=DEFAULT, other=DEFAULT) ... def test_function(thing, other): ... assert isinstance(thing, MagicMock) ... assert isinstance(other, MagicMock) ... >>> test_function() "patch.multiple()" can be nested with other "patch" decorators, but put arguments passed by keyword *after* any of the standard arguments created by "patch()": >>> @patch('sys.exit') ... @patch.multiple('__main__', thing=DEFAULT, other=DEFAULT) ... def test_function(mock_exit, other, thing): ... assert 'other' in repr(other) ... assert 'thing' in repr(thing) ... assert 'exit' in repr(mock_exit) ... >>> test_function() If "patch.multiple()" is used as a context manager, the value returned by the context manager is a dictionary where created mocks are keyed by name: >>> with patch.multiple('__main__', thing=DEFAULT, other=DEFAULT) as values: ... assert 'other' in repr(values['other']) ... assert 'thing' in repr(values['thing']) ... assert values['thing'] is thing ... assert values['other'] is other ... patch methods: start and stop ----------------------------- All the patchers have "start()" and "stop()" methods. These make it simpler to do patching in "setUp" methods or where you want to do multiple patches without nesting decorators or with statements. To use them call "patch()", "patch.object()" or "patch.dict()" as normal and keep a reference to the returned "patcher" object. You can then call "start()" to put the patch in place and "stop()" to undo it. If you are using "patch()" to create a mock for you then it will be returned by the call to "patcher.start". >>> patcher = patch('package.module.ClassName') >>> from package import module >>> original = module.ClassName >>> new_mock = patcher.start() >>> assert module.ClassName is not original >>> assert module.ClassName is new_mock >>> patcher.stop() >>> assert module.ClassName is original >>> assert module.ClassName is not new_mock A typical use case for this might be for doing multiple patches in the "setUp" method of a "TestCase": >>> class MyTest(unittest.TestCase): ... def setUp(self): ... self.patcher1 = patch('package.module.Class1') ... self.patcher2 = patch('package.module.Class2') ... self.MockClass1 = self.patcher1.start() ... self.MockClass2 = self.patcher2.start() ... ... def tearDown(self): ... self.patcher1.stop() ... self.patcher2.stop() ... ... def test_something(self): ... assert package.module.Class1 is self.MockClass1 ... assert package.module.Class2 is self.MockClass2 ... >>> MyTest('test_something').run() Caution: If you use this technique you must ensure that the patching is “undone” by calling "stop". This can be fiddlier than you might think, because if an exception is raised in the "setUp" then "tearDown" is not called. "unittest.TestCase.addCleanup()" makes this easier: >>> class MyTest(unittest.TestCase): ... def setUp(self): ... patcher = patch('package.module.Class') ... self.MockClass = patcher.start() ... self.addCleanup(patcher.stop) ... ... def test_something(self): ... assert package.module.Class is self.MockClass ... As an added bonus you no longer need to keep a reference to the "patcher" object. It is also possible to stop all patches which have been started by using "patch.stopall()". patch.stopall() Stop all active patches. Only stops patches started with "start". patch builtins -------------- You can patch any builtins within a module. The following example patches builtin "ord()": >>> @patch('__main__.ord') ... def test(mock_ord): ... mock_ord.return_value = 101 ... print(ord('c')) ... >>> test() 101 TEST_PREFIX ----------- All of the patchers can be used as class decorators. When used in this way they wrap every test method on the class. The patchers recognise methods that start with "'test'" as being test methods. This is the same way that the "unittest.TestLoader" finds test methods by default. It is possible that you want to use a different prefix for your tests. You can inform the patchers of the different prefix by setting "patch.TEST_PREFIX": >>> patch.TEST_PREFIX = 'foo' >>> value = 3 >>> >>> @patch('__main__.value', 'not three') ... class Thing: ... def foo_one(self): ... print(value) ... def foo_two(self): ... print(value) ... >>> >>> Thing().foo_one() not three >>> Thing().foo_two() not three >>> value 3 Nesting Patch Decorators ------------------------ If you want to perform multiple patches then you can simply stack up the decorators. You can stack up multiple patch decorators using this pattern: >>> @patch.object(SomeClass, 'class_method') ... @patch.object(SomeClass, 'static_method') ... def test(mock1, mock2): ... assert SomeClass.static_method is mock1 ... assert SomeClass.class_method is mock2 ... SomeClass.static_method('foo') ... SomeClass.class_method('bar') ... return mock1, mock2 ... >>> mock1, mock2 = test() >>> mock1.assert_called_once_with('foo') >>> mock2.assert_called_once_with('bar') Note that the decorators are applied from the bottom upwards. This is the standard way that Python applies decorators. The order of the created mocks passed into your test function matches this order. Where to patch -------------- "patch()" works by (temporarily) changing the object that a *name* points to with another one. There can be many names pointing to any individual object, so for patching to work you must ensure that you patch the name used by the system under test. The basic principle is that you patch where an object is *looked up*, which is not necessarily the same place as where it is defined. A couple of examples will help to clarify this. Imagine we have a project that we want to test with the following structure: a.py -> Defines SomeClass b.py -> from a import SomeClass -> some_function instantiates SomeClass Now we want to test "some_function" but we want to mock out "SomeClass" using "patch()". The problem is that when we import module b, which we will have to do then it imports "SomeClass" from module a. If we use "patch()" to mock out "a.SomeClass" then it will have no effect on our test; module b already has a reference to the *real* "SomeClass" and it looks like our patching had no effect. The key is to patch out "SomeClass" where it is used (or where it is looked up). In this case "some_function" will actually look up "SomeClass" in module b, where we have imported it. The patching should look like: @patch('b.SomeClass') However, consider the alternative scenario where instead of "from a import SomeClass" module b does "import a" and "some_function" uses "a.SomeClass". Both of these import forms are common. In this case the class we want to patch is being looked up in the module and so we have to patch "a.SomeClass" instead: @patch('a.SomeClass') Patching Descriptors and Proxy Objects -------------------------------------- Both patch and patch.object correctly patch and restore descriptors: class methods, static methods and properties. You should patch these on the *class* rather than an instance. They also work with *some* objects that proxy attribute access, like the django settings object. MagicMock and magic method support ================================== Mocking Magic Methods --------------------- "Mock" supports mocking the Python protocol methods, also known as “magic methods”. This allows mock objects to replace containers or other objects that implement Python protocols. Because magic methods are looked up differently from normal methods [2], this support has been specially implemented. This means that only specific magic methods are supported. The supported list includes *almost* all of them. If there are any missing that you need please let us know. You mock magic methods by setting the method you are interested in to a function or a mock instance. If you are using a function then it *must* take "self" as the first argument [3]. >>> def __str__(self): ... return 'fooble' ... >>> mock = Mock() >>> mock.__str__ = __str__ >>> str(mock) 'fooble' >>> mock = Mock() >>> mock.__str__ = Mock() >>> mock.__str__.return_value = 'fooble' >>> str(mock) 'fooble' >>> mock = Mock() >>> mock.__iter__ = Mock(return_value=iter([])) >>> list(mock) [] One use case for this is for mocking objects used as context managers in a "with" statement: >>> mock = Mock() >>> mock.__enter__ = Mock(return_value='foo') >>> mock.__exit__ = Mock(return_value=False) >>> with mock as m: ... assert m == 'foo' ... >>> mock.__enter__.assert_called_with() >>> mock.__exit__.assert_called_with(None, None, None) Calls to magic methods do not appear in "method_calls", but they are recorded in "mock_calls". Note: If you use the *spec* keyword argument to create a mock then attempting to set a magic method that isn’t in the spec will raise an "AttributeError". The full list of supported magic methods is: * "__hash__", "__sizeof__", "__repr__" and "__str__" * "__dir__", "__format__" and "__subclasses__" * "__round__", "__floor__", "__trunc__" and "__ceil__" * Comparisons: "__lt__", "__gt__", "__le__", "__ge__", "__eq__" and "__ne__" * Container methods: "__getitem__", "__setitem__", "__delitem__", "__contains__", "__len__", "__iter__", "__reversed__" and "__missing__" * Context manager: "__enter__", "__exit__", "__aenter__" and "__aexit__" * Unary numeric methods: "__neg__", "__pos__" and "__invert__" * The numeric methods (including right hand and in-place variants): "__add__", "__sub__", "__mul__", "__matmul__", "__div__", "__truediv__", "__floordiv__", "__mod__", "__divmod__", "__lshift__", "__rshift__", "__and__", "__xor__", "__or__", and "__pow__" * Numeric conversion methods: "__complex__", "__int__", "__float__" and "__index__" * Descriptor methods: "__get__", "__set__" and "__delete__" * Pickling: "__reduce__", "__reduce_ex__", "__getinitargs__", "__getnewargs__", "__getstate__" and "__setstate__" * File system path representation: "__fspath__" * Asynchronous iteration methods: "__aiter__" and "__anext__" Changed in version 3.8: Added support for "os.PathLike.__fspath__()". Changed in version 3.8: Added support for "__aenter__", "__aexit__", "__aiter__" and "__anext__". The following methods exist but are *not* supported as they are either in use by mock, can’t be set dynamically, or can cause problems: * "__getattr__", "__setattr__", "__init__" and "__new__" * "__prepare__", "__instancecheck__", "__subclasscheck__", "__del__" Magic Mock ---------- There are two "MagicMock" variants: "MagicMock" and "NonCallableMagicMock". class unittest.mock.MagicMock(*args, **kw) "MagicMock" is a subclass of "Mock" with default implementations of most of the magic methods. You can use "MagicMock" without having to configure the magic methods yourself. The constructor parameters have the same meaning as for "Mock". If you use the *spec* or *spec_set* arguments then *only* magic methods that exist in the spec will be created. class unittest.mock.NonCallableMagicMock(*args, **kw) A non-callable version of "MagicMock". The constructor parameters have the same meaning as for "MagicMock", with the exception of *return_value* and *side_effect* which have no meaning on a non-callable mock. The magic methods are setup with "MagicMock" objects, so you can configure them and use them in the usual way: >>> mock = MagicMock() >>> mock[3] = 'fish' >>> mock.__setitem__.assert_called_with(3, 'fish') >>> mock.__getitem__.return_value = 'result' >>> mock[2] 'result' By default many of the protocol methods are required to return objects of a specific type. These methods are preconfigured with a default return value, so that they can be used without you having to do anything if you aren’t interested in the return value. You can still *set* the return value manually if you want to change the default. Methods and their defaults: * "__lt__": "NotImplemented" * "__gt__": "NotImplemented" * "__le__": "NotImplemented" * "__ge__": "NotImplemented" * "__int__": "1" * "__contains__": "False" * "__len__": "0" * "__iter__": "iter([])" * "__exit__": "False" * "__aexit__": "False" * "__complex__": "1j" * "__float__": "1.0" * "__bool__": "True" * "__index__": "1" * "__hash__": default hash for the mock * "__str__": default str for the mock * "__sizeof__": default sizeof for the mock For example: >>> mock = MagicMock() >>> int(mock) 1 >>> len(mock) 0 >>> list(mock) [] >>> object() in mock False The two equality methods, "__eq__()" and "__ne__()", are special. They do the default equality comparison on identity, using the "side_effect" attribute, unless you change their return value to return something else: >>> MagicMock() == 3 False >>> MagicMock() != 3 True >>> mock = MagicMock() >>> mock.__eq__.return_value = True >>> mock == 3 True The return value of "MagicMock.__iter__()" can be any iterable object and isn’t required to be an iterator: >>> mock = MagicMock() >>> mock.__iter__.return_value = ['a', 'b', 'c'] >>> list(mock) ['a', 'b', 'c'] >>> list(mock) ['a', 'b', 'c'] If the return value *is* an iterator, then iterating over it once will consume it and subsequent iterations will result in an empty list: >>> mock.__iter__.return_value = iter(['a', 'b', 'c']) >>> list(mock) ['a', 'b', 'c'] >>> list(mock) [] "MagicMock" has all of the supported magic methods configured except for some of the obscure and obsolete ones. You can still set these up if you want. Magic methods that are supported but not setup by default in "MagicMock" are: * "__subclasses__" * "__dir__" * "__format__" * "__get__", "__set__" and "__delete__" * "__reversed__" and "__missing__" * "__reduce__", "__reduce_ex__", "__getinitargs__", "__getnewargs__", "__getstate__" and "__setstate__" * "__getformat__" and "__setformat__" [2] Magic methods *should* be looked up on the class rather than the instance. Different versions of Python are inconsistent about applying this rule. The supported protocol methods should work with all supported versions of Python. [3] The function is basically hooked up to the class, but each "Mock" instance is kept isolated from the others. Helpers ======= sentinel -------- unittest.mock.sentinel The "sentinel" object provides a convenient way of providing unique objects for your tests. Attributes are created on demand when you access them by name. Accessing the same attribute will always return the same object. The objects returned have a sensible repr so that test failure messages are readable. Changed in version 3.7: The "sentinel" attributes now preserve their identity when they are "copied" or "pickled". Sometimes when testing you need to test that a specific object is passed as an argument to another method, or returned. It can be common to create named sentinel objects to test this. "sentinel" provides a convenient way of creating and testing the identity of objects like this. In this example we monkey patch "method" to return "sentinel.some_object": >>> real = ProductionClass() >>> real.method = Mock(name="method") >>> real.method.return_value = sentinel.some_object >>> result = real.method() >>> assert result is sentinel.some_object >>> result sentinel.some_object DEFAULT ------- unittest.mock.DEFAULT The "DEFAULT" object is a pre-created sentinel (actually "sentinel.DEFAULT"). It can be used by "side_effect" functions to indicate that the normal return value should be used. call ---- unittest.mock.call(*args, **kwargs) "call()" is a helper object for making simpler assertions, for comparing with "call_args", "call_args_list", "mock_calls" and "method_calls". "call()" can also be used with "assert_has_calls()". >>> m = MagicMock(return_value=None) >>> m(1, 2, a='foo', b='bar') >>> m() >>> m.call_args_list == [call(1, 2, a='foo', b='bar'), call()] True call.call_list() For a call object that represents multiple calls, "call_list()" returns a list of all the intermediate calls as well as the final call. "call_list" is particularly useful for making assertions on “chained calls”. A chained call is multiple calls on a single line of code. This results in multiple entries in "mock_calls" on a mock. Manually constructing the sequence of calls can be tedious. "call_list()" can construct the sequence of calls from the same chained call: >>> m = MagicMock() >>> m(1).method(arg='foo').other('bar')(2.0) >>> kall = call(1).method(arg='foo').other('bar')(2.0) >>> kall.call_list() [call(1), call().method(arg='foo'), call().method().other('bar'), call().method().other()(2.0)] >>> m.mock_calls == kall.call_list() True A "call" object is either a tuple of (positional args, keyword args) or (name, positional args, keyword args) depending on how it was constructed. When you construct them yourself this isn’t particularly interesting, but the "call" objects that are in the "Mock.call_args", "Mock.call_args_list" and "Mock.mock_calls" attributes can be introspected to get at the individual arguments they contain. The "call" objects in "Mock.call_args" and "Mock.call_args_list" are two-tuples of (positional args, keyword args) whereas the "call" objects in "Mock.mock_calls", along with ones you construct yourself, are three-tuples of (name, positional args, keyword args). You can use their “tupleness” to pull out the individual arguments for more complex introspection and assertions. The positional arguments are a tuple (an empty tuple if there are no positional arguments) and the keyword arguments are a dictionary: >>> m = MagicMock(return_value=None) >>> m(1, 2, 3, arg='one', arg2='two') >>> kall = m.call_args >>> kall.args (1, 2, 3) >>> kall.kwargs {'arg': 'one', 'arg2': 'two'} >>> kall.args is kall[0] True >>> kall.kwargs is kall[1] True >>> m = MagicMock() >>> m.foo(4, 5, 6, arg='two', arg2='three') >>> kall = m.mock_calls[0] >>> name, args, kwargs = kall >>> name 'foo' >>> args (4, 5, 6) >>> kwargs {'arg': 'two', 'arg2': 'three'} >>> name is m.mock_calls[0][0] True create_autospec --------------- unittest.mock.create_autospec(spec, spec_set=False, instance=False, **kwargs) Create a mock object using another object as a spec. Attributes on the mock will use the corresponding attribute on the *spec* object as their spec. Functions or methods being mocked will have their arguments checked to ensure that they are called with the correct signature. If *spec_set* is "True" then attempting to set attributes that don’t exist on the spec object will raise an "AttributeError". If a class is used as a spec then the return value of the mock (the instance of the class) will have the same spec. You can use a class as the spec for an instance object by passing "instance=True". The returned mock will only be callable if instances of the mock are callable. "create_autospec()" also takes arbitrary keyword arguments that are passed to the constructor of the created mock. See Autospeccing for examples of how to use auto-speccing with "create_autospec()" and the *autospec* argument to "patch()". Changed in version 3.8: "create_autospec()" now returns an "AsyncMock" if the target is an async function. ANY --- unittest.mock.ANY Sometimes you may need to make assertions about *some* of the arguments in a call to mock, but either not care about some of the arguments or want to pull them individually out of "call_args" and make more complex assertions on them. To ignore certain arguments you can pass in objects that compare equal to *everything*. Calls to "assert_called_with()" and "assert_called_once_with()" will then succeed no matter what was passed in. >>> mock = Mock(return_value=None) >>> mock('foo', bar=object()) >>> mock.assert_called_once_with('foo', bar=ANY) "ANY" can also be used in comparisons with call lists like "mock_calls": >>> m = MagicMock(return_value=None) >>> m(1) >>> m(1, 2) >>> m(object()) >>> m.mock_calls == [call(1), call(1, 2), ANY] True FILTER_DIR ---------- unittest.mock.FILTER_DIR "FILTER_DIR" is a module level variable that controls the way mock objects respond to "dir()" (only for Python 2.6 or more recent). The default is "True", which uses the filtering described below, to only show useful members. If you dislike this filtering, or need to switch it off for diagnostic purposes, then set "mock.FILTER_DIR = False". With filtering on, "dir(some_mock)" shows only useful attributes and will include any dynamically created attributes that wouldn’t normally be shown. If the mock was created with a *spec* (or *autospec* of course) then all the attributes from the original are shown, even if they haven’t been accessed yet: >>> dir(Mock()) ['assert_any_call', 'assert_called', 'assert_called_once', 'assert_called_once_with', 'assert_called_with', 'assert_has_calls', 'assert_not_called', 'attach_mock', ... >>> from urllib import request >>> dir(Mock(spec=request)) ['AbstractBasicAuthHandler', 'AbstractDigestAuthHandler', 'AbstractHTTPHandler', 'BaseHandler', ... Many of the not-very-useful (private to "Mock" rather than the thing being mocked) underscore and double underscore prefixed attributes have been filtered from the result of calling "dir()" on a "Mock". If you dislike this behaviour you can switch it off by setting the module level switch "FILTER_DIR": >>> from unittest import mock >>> mock.FILTER_DIR = False >>> dir(mock.Mock()) ['_NonCallableMock__get_return_value', '_NonCallableMock__get_side_effect', '_NonCallableMock__return_value_doc', '_NonCallableMock__set_return_value', '_NonCallableMock__set_side_effect', '__call__', '__class__', ... Alternatively you can just use "vars(my_mock)" (instance members) and "dir(type(my_mock))" (type members) to bypass the filtering irrespective of "mock.FILTER_DIR". mock_open --------- unittest.mock.mock_open(mock=None, read_data=None) A helper function to create a mock to replace the use of "open()". It works for "open()" called directly or used as a context manager. The *mock* argument is the mock object to configure. If "None" (the default) then a "MagicMock" will be created for you, with the API limited to methods or attributes available on standard file handles. *read_data* is a string for the "read()", "readline()", and "readlines()" methods of the file handle to return. Calls to those methods will take data from *read_data* until it is depleted. The mock of these methods is pretty simplistic: every time the *mock* is called, the *read_data* is rewound to the start. If you need more control over the data that you are feeding to the tested code you will need to customize this mock for yourself. When that is insufficient, one of the in-memory filesystem packages on PyPI can offer a realistic filesystem for testing. Changed in version 3.4: Added "readline()" and "readlines()" support. The mock of "read()" changed to consume *read_data* rather than returning it on each call. Changed in version 3.5: *read_data* is now reset on each call to the *mock*. Changed in version 3.8: Added "__iter__()" to implementation so that iteration (such as in for loops) correctly consumes *read_data*. Using "open()" as a context manager is a great way to ensure your file handles are closed properly and is becoming common: with open('/some/path', 'w') as f: f.write('something') The issue is that even if you mock out the call to "open()" it is the *returned object* that is used as a context manager (and has "__enter__()" and "__exit__()" called). Mocking context managers with a "MagicMock" is common enough and fiddly enough that a helper function is useful. >>> m = mock_open() >>> with patch('__main__.open', m): ... with open('foo', 'w') as h: ... h.write('some stuff') ... >>> m.mock_calls [call('foo', 'w'), call().__enter__(), call().write('some stuff'), call().__exit__(None, None, None)] >>> m.assert_called_once_with('foo', 'w') >>> handle = m() >>> handle.write.assert_called_once_with('some stuff') And for reading files: >>> with patch('__main__.open', mock_open(read_data='bibble')) as m: ... with open('foo') as h: ... result = h.read() ... >>> m.assert_called_once_with('foo') >>> assert result == 'bibble' Autospeccing ------------ Autospeccing is based on the existing "spec" feature of mock. It limits the api of mocks to the api of an original object (the spec), but it is recursive (implemented lazily) so that attributes of mocks only have the same api as the attributes of the spec. In addition mocked functions / methods have the same call signature as the original so they raise a "TypeError" if they are called incorrectly. Before I explain how auto-speccing works, here’s why it is needed. "Mock" is a very powerful and flexible object, but it suffers from two flaws when used to mock out objects from a system under test. One of these flaws is specific to the "Mock" api and the other is a more general problem with using mock objects. First the problem specific to "Mock". "Mock" has two assert methods that are extremely handy: "assert_called_with()" and "assert_called_once_with()". >>> mock = Mock(name='Thing', return_value=None) >>> mock(1, 2, 3) >>> mock.assert_called_once_with(1, 2, 3) >>> mock(1, 2, 3) >>> mock.assert_called_once_with(1, 2, 3) Traceback (most recent call last): ... AssertionError: Expected 'mock' to be called once. Called 2 times. Because mocks auto-create attributes on demand, and allow you to call them with arbitrary arguments, if you misspell one of these assert methods then your assertion is gone: >>> mock = Mock(name='Thing', return_value=None) >>> mock(1, 2, 3) >>> mock.assret_called_once_with(4, 5, 6) # Intentional typo! Your tests can pass silently and incorrectly because of the typo. The second issue is more general to mocking. If you refactor some of your code, rename members and so on, any tests for code that is still using the *old api* but uses mocks instead of the real objects will still pass. This means your tests can all pass even though your code is broken. Note that this is another reason why you need integration tests as well as unit tests. Testing everything in isolation is all fine and dandy, but if you don’t test how your units are “wired together” there is still lots of room for bugs that tests might have caught. "mock" already provides a feature to help with this, called speccing. If you use a class or instance as the "spec" for a mock then you can only access attributes on the mock that exist on the real class: >>> from urllib import request >>> mock = Mock(spec=request.Request) >>> mock.assret_called_with # Intentional typo! Traceback (most recent call last): ... AttributeError: Mock object has no attribute 'assret_called_with' The spec only applies to the mock itself, so we still have the same issue with any methods on the mock: >>> mock.has_data() >>> mock.has_data.assret_called_with() # Intentional typo! Auto-speccing solves this problem. You can either pass "autospec=True" to "patch()" / "patch.object()" or use the "create_autospec()" function to create a mock with a spec. If you use the "autospec=True" argument to "patch()" then the object that is being replaced will be used as the spec object. Because the speccing is done “lazily” (the spec is created as attributes on the mock are accessed) you can use it with very complex or deeply nested objects (like modules that import modules that import modules) without a big performance hit. Here’s an example of it in use: >>> from urllib import request >>> patcher = patch('__main__.request', autospec=True) >>> mock_request = patcher.start() >>> request is mock_request True >>> mock_request.Request You can see that "request.Request" has a spec. "request.Request" takes two arguments in the constructor (one of which is *self*). Here’s what happens if we try to call it incorrectly: >>> req = request.Request() Traceback (most recent call last): ... TypeError: () takes at least 2 arguments (1 given) The spec also applies to instantiated classes (i.e. the return value of specced mocks): >>> req = request.Request('foo') >>> req "Request" objects are not callable, so the return value of instantiating our mocked out "request.Request" is a non-callable mock. With the spec in place any typos in our asserts will raise the correct error: >>> req.add_header('spam', 'eggs') >>> req.add_header.assret_called_with # Intentional typo! Traceback (most recent call last): ... AttributeError: Mock object has no attribute 'assret_called_with' >>> req.add_header.assert_called_with('spam', 'eggs') In many cases you will just be able to add "autospec=True" to your existing "patch()" calls and then be protected against bugs due to typos and api changes. As well as using *autospec* through "patch()" there is a "create_autospec()" for creating autospecced mocks directly: >>> from urllib import request >>> mock_request = create_autospec(request) >>> mock_request.Request('foo', 'bar') This isn’t without caveats and limitations however, which is why it is not the default behaviour. In order to know what attributes are available on the spec object, autospec has to introspect (access attributes) the spec. As you traverse attributes on the mock a corresponding traversal of the original object is happening under the hood. If any of your specced objects have properties or descriptors that can trigger code execution then you may not be able to use autospec. On the other hand it is much better to design your objects so that introspection is safe [4]. A more serious problem is that it is common for instance attributes to be created in the "__init__()" method and not to exist on the class at all. *autospec* can’t know about any dynamically created attributes and restricts the api to visible attributes. >>> class Something: ... def __init__(self): ... self.a = 33 ... >>> with patch('__main__.Something', autospec=True): ... thing = Something() ... thing.a ... Traceback (most recent call last): ... AttributeError: Mock object has no attribute 'a' There are a few different ways of resolving this problem. The easiest, but not necessarily the least annoying, way is to simply set the required attributes on the mock after creation. Just because *autospec* doesn’t allow you to fetch attributes that don’t exist on the spec it doesn’t prevent you setting them: >>> with patch('__main__.Something', autospec=True): ... thing = Something() ... thing.a = 33 ... There is a more aggressive version of both *spec* and *autospec* that *does* prevent you setting non-existent attributes. This is useful if you want to ensure your code only *sets* valid attributes too, but obviously it prevents this particular scenario: >>> with patch('__main__.Something', autospec=True, spec_set=True): ... thing = Something() ... thing.a = 33 ... Traceback (most recent call last): ... AttributeError: Mock object has no attribute 'a' Probably the best way of solving the problem is to add class attributes as default values for instance members initialised in "__init__()". Note that if you are only setting default attributes in "__init__()" then providing them via class attributes (shared between instances of course) is faster too. e.g. class Something: a = 33 This brings up another issue. It is relatively common to provide a default value of "None" for members that will later be an object of a different type. "None" would be useless as a spec because it wouldn’t let you access *any* attributes or methods on it. As "None" is *never* going to be useful as a spec, and probably indicates a member that will normally of some other type, autospec doesn’t use a spec for members that are set to "None". These will just be ordinary mocks (well - MagicMocks): >>> class Something: ... member = None ... >>> mock = create_autospec(Something) >>> mock.member.foo.bar.baz() If modifying your production classes to add defaults isn’t to your liking then there are more options. One of these is simply to use an instance as the spec rather than the class. The other is to create a subclass of the production class and add the defaults to the subclass without affecting the production class. Both of these require you to use an alternative object as the spec. Thankfully "patch()" supports this - you can simply pass the alternative object as the *autospec* argument: >>> class Something: ... def __init__(self): ... self.a = 33 ... >>> class SomethingForTest(Something): ... a = 33 ... >>> p = patch('__main__.Something', autospec=SomethingForTest) >>> mock = p.start() >>> mock.a [4] This only applies to classes or already instantiated objects. Calling a mocked class to create a mock instance *does not* create a real instance. It is only attribute lookups - along with calls to "dir()" - that are done. Sealing mocks ------------- unittest.mock.seal(mock) Seal will disable the automatic creation of mocks when accessing an attribute of the mock being sealed or any of its attributes that are already mocks recursively. If a mock instance with a name or a spec is assigned to an attribute it won’t be considered in the sealing chain. This allows one to prevent seal from fixing part of the mock object. >>> mock = Mock() >>> mock.submock.attribute1 = 2 >>> mock.not_submock = mock.Mock(name="sample_name") >>> seal(mock) >>> mock.new_attribute # This will raise AttributeError. >>> mock.submock.attribute2 # This will raise AttributeError. >>> mock.not_submock.attribute2 # This won't raise. New in version 3.7. "unittest" — Unit testing framework *********************************** **Source code:** Lib/unittest/__init__.py ====================================================================== (If you are already familiar with the basic concepts of testing, you might want to skip to the list of assert methods.) The "unittest" unit testing framework was originally inspired by JUnit and has a similar flavor as major unit testing frameworks in other languages. It supports test automation, sharing of setup and shutdown code for tests, aggregation of tests into collections, and independence of the tests from the reporting framework. To achieve this, "unittest" supports some important concepts in an object-oriented way: test fixture A *test fixture* represents the preparation needed to perform one or more tests, and any associated cleanup actions. This may involve, for example, creating temporary or proxy databases, directories, or starting a server process. test case A *test case* is the individual unit of testing. It checks for a specific response to a particular set of inputs. "unittest" provides a base class, "TestCase", which may be used to create new test cases. test suite A *test suite* is a collection of test cases, test suites, or both. It is used to aggregate tests that should be executed together. test runner A *test runner* is a component which orchestrates the execution of tests and provides the outcome to the user. The runner may use a graphical interface, a textual interface, or return a special value to indicate the results of executing the tests. See also: Module "doctest" Another test-support module with a very different flavor. Simple Smalltalk Testing: With Patterns Kent Beck’s original paper on testing frameworks using the pattern shared by "unittest". pytest Third-party unittest framework with a lighter-weight syntax for writing tests. For example, "assert func(10) == 42". The Python Testing Tools Taxonomy An extensive list of Python testing tools including functional testing frameworks and mock object libraries. Testing in Python Mailing List A special-interest-group for discussion of testing, and testing tools, in Python. The script "Tools/unittestgui/unittestgui.py" in the Python source distribution is a GUI tool for test discovery and execution. This is intended largely for ease of use for those new to unit testing. For production environments it is recommended that tests be driven by a continuous integration system such as Buildbot, Jenkins or Travis-CI, or AppVeyor. Basic example ============= The "unittest" module provides a rich set of tools for constructing and running tests. This section demonstrates that a small subset of the tools suffice to meet the needs of most users. Here is a short script to test three string methods: import unittest class TestStringMethods(unittest.TestCase): def test_upper(self): self.assertEqual('foo'.upper(), 'FOO') def test_isupper(self): self.assertTrue('FOO'.isupper()) self.assertFalse('Foo'.isupper()) def test_split(self): s = 'hello world' self.assertEqual(s.split(), ['hello', 'world']) # check that s.split fails when the separator is not a string with self.assertRaises(TypeError): s.split(2) if __name__ == '__main__': unittest.main() A testcase is created by subclassing "unittest.TestCase". The three individual tests are defined with methods whose names start with the letters "test". This naming convention informs the test runner about which methods represent tests. The crux of each test is a call to "assertEqual()" to check for an expected result; "assertTrue()" or "assertFalse()" to verify a condition; or "assertRaises()" to verify that a specific exception gets raised. These methods are used instead of the "assert" statement so the test runner can accumulate all test results and produce a report. The "setUp()" and "tearDown()" methods allow you to define instructions that will be executed before and after each test method. They are covered in more detail in the section Organizing test code. The final block shows a simple way to run the tests. "unittest.main()" provides a command-line interface to the test script. When run from the command line, the above script produces an output that looks like this: ... ---------------------------------------------------------------------- Ran 3 tests in 0.000s OK Passing the "-v" option to your test script will instruct "unittest.main()" to enable a higher level of verbosity, and produce the following output: test_isupper (__main__.TestStringMethods) ... ok test_split (__main__.TestStringMethods) ... ok test_upper (__main__.TestStringMethods) ... ok ---------------------------------------------------------------------- Ran 3 tests in 0.001s OK The above examples show the most commonly used "unittest" features which are sufficient to meet many everyday testing needs. The remainder of the documentation explores the full feature set from first principles. Command-Line Interface ====================== The unittest module can be used from the command line to run tests from modules, classes or even individual test methods: python -m unittest test_module1 test_module2 python -m unittest test_module.TestClass python -m unittest test_module.TestClass.test_method You can pass in a list with any combination of module names, and fully qualified class or method names. Test modules can be specified by file path as well: python -m unittest tests/test_something.py This allows you to use the shell filename completion to specify the test module. The file specified must still be importable as a module. The path is converted to a module name by removing the ‘.py’ and converting path separators into ‘.’. If you want to execute a test file that isn’t importable as a module you should execute the file directly instead. You can run tests with more detail (higher verbosity) by passing in the -v flag: python -m unittest -v test_module When executed without arguments Test Discovery is started: python -m unittest For a list of all the command-line options: python -m unittest -h Changed in version 3.2: In earlier versions it was only possible to run individual test methods and not modules or classes. Command-line options -------------------- **unittest** supports these command-line options: -b, --buffer The standard output and standard error streams are buffered during the test run. Output during a passing test is discarded. Output is echoed normally on test fail or error and is added to the failure messages. -c, --catch "Control-C" during the test run waits for the current test to end and then reports all the results so far. A second "Control-C" raises the normal "KeyboardInterrupt" exception. See Signal Handling for the functions that provide this functionality. -f, --failfast Stop the test run on the first error or failure. -k Only run test methods and classes that match the pattern or substring. This option may be used multiple times, in which case all test cases that match any of the given patterns are included. Patterns that contain a wildcard character ("*") are matched against the test name using "fnmatch.fnmatchcase()"; otherwise simple case-sensitive substring matching is used. Patterns are matched against the fully qualified test method name as imported by the test loader. For example, "-k foo" matches "foo_tests.SomeTest.test_something", "bar_tests.SomeTest.test_foo", but not "bar_tests.FooTest.test_something". --locals Show local variables in tracebacks. New in version 3.2: The command-line options "-b", "-c" and "-f" were added. New in version 3.5: The command-line option "--locals". New in version 3.7: The command-line option "-k". The command line can also be used for test discovery, for running all of the tests in a project or just a subset. Test Discovery ============== New in version 3.2. Unittest supports simple test discovery. In order to be compatible with test discovery, all of the test files must be modules or packages (including *namespace packages*) importable from the top-level directory of the project (this means that their filenames must be valid identifiers). Test discovery is implemented in "TestLoader.discover()", but can also be used from the command line. The basic command-line usage is: cd project_directory python -m unittest discover Note: As a shortcut, "python -m unittest" is the equivalent of "python -m unittest discover". If you want to pass arguments to test discovery the "discover" sub-command must be used explicitly. The "discover" sub-command has the following options: -v, --verbose Verbose output -s, --start-directory directory Directory to start discovery ("." default) -p, --pattern pattern Pattern to match test files ("test*.py" default) -t, --top-level-directory directory Top level directory of project (defaults to start directory) The "-s", "-p", and "-t" options can be passed in as positional arguments in that order. The following two command lines are equivalent: python -m unittest discover -s project_directory -p "*_test.py" python -m unittest discover project_directory "*_test.py" As well as being a path it is possible to pass a package name, for example "myproject.subpackage.test", as the start directory. The package name you supply will then be imported and its location on the filesystem will be used as the start directory. Caution: Test discovery loads tests by importing them. Once test discovery has found all the test files from the start directory you specify it turns the paths into package names to import. For example "foo/bar/baz.py" will be imported as "foo.bar.baz".If you have a package installed globally and attempt test discovery on a different copy of the package then the import *could* happen from the wrong place. If this happens test discovery will warn you and exit.If you supply the start directory as a package name rather than a path to a directory then discover assumes that whichever location it imports from is the location you intended, so you will not get the warning. Test modules and packages can customize test loading and discovery by through the load_tests protocol. Changed in version 3.4: Test discovery supports *namespace packages* for the start directory. Note that you need to specify the top level directory too (e.g. "python -m unittest discover -s root/namespace -t root"). Organizing test code ==================== The basic building blocks of unit testing are *test cases* — single scenarios that must be set up and checked for correctness. In "unittest", test cases are represented by "unittest.TestCase" instances. To make your own test cases you must write subclasses of "TestCase" or use "FunctionTestCase". The testing code of a "TestCase" instance should be entirely self contained, such that it can be run either in isolation or in arbitrary combination with any number of other test cases. The simplest "TestCase" subclass will simply implement a test method (i.e. a method whose name starts with "test") in order to perform specific testing code: import unittest class DefaultWidgetSizeTestCase(unittest.TestCase): def test_default_widget_size(self): widget = Widget('The widget') self.assertEqual(widget.size(), (50, 50)) Note that in order to test something, we use one of the "assert*()" methods provided by the "TestCase" base class. If the test fails, an exception will be raised with an explanatory message, and "unittest" will identify the test case as a *failure*. Any other exceptions will be treated as *errors*. Tests can be numerous, and their set-up can be repetitive. Luckily, we can factor out set-up code by implementing a method called "setUp()", which the testing framework will automatically call for every single test we run: import unittest class WidgetTestCase(unittest.TestCase): def setUp(self): self.widget = Widget('The widget') def test_default_widget_size(self): self.assertEqual(self.widget.size(), (50,50), 'incorrect default size') def test_widget_resize(self): self.widget.resize(100,150) self.assertEqual(self.widget.size(), (100,150), 'wrong size after resize') Note: The order in which the various tests will be run is determined by sorting the test method names with respect to the built-in ordering for strings. If the "setUp()" method raises an exception while the test is running, the framework will consider the test to have suffered an error, and the test method will not be executed. Similarly, we can provide a "tearDown()" method that tidies up after the test method has been run: import unittest class WidgetTestCase(unittest.TestCase): def setUp(self): self.widget = Widget('The widget') def tearDown(self): self.widget.dispose() If "setUp()" succeeded, "tearDown()" will be run whether the test method succeeded or not. Such a working environment for the testing code is called a *test fixture*. A new TestCase instance is created as a unique test fixture used to execute each individual test method. Thus "setUp()", "tearDown()", and "__init__()" will be called once per test. It is recommended that you use TestCase implementations to group tests together according to the features they test. "unittest" provides a mechanism for this: the *test suite*, represented by "unittest"’s "TestSuite" class. In most cases, calling "unittest.main()" will do the right thing and collect all the module’s test cases for you and execute them. However, should you want to customize the building of your test suite, you can do it yourself: def suite(): suite = unittest.TestSuite() suite.addTest(WidgetTestCase('test_default_widget_size')) suite.addTest(WidgetTestCase('test_widget_resize')) return suite if __name__ == '__main__': runner = unittest.TextTestRunner() runner.run(suite()) You can place the definitions of test cases and test suites in the same modules as the code they are to test (such as "widget.py"), but there are several advantages to placing the test code in a separate module, such as "test_widget.py": * The test module can be run standalone from the command line. * The test code can more easily be separated from shipped code. * There is less temptation to change test code to fit the code it tests without a good reason. * Test code should be modified much less frequently than the code it tests. * Tested code can be refactored more easily. * Tests for modules written in C must be in separate modules anyway, so why not be consistent? * If the testing strategy changes, there is no need to change the source code. Re-using old test code ====================== Some users will find that they have existing test code that they would like to run from "unittest", without converting every old test function to a "TestCase" subclass. For this reason, "unittest" provides a "FunctionTestCase" class. This subclass of "TestCase" can be used to wrap an existing test function. Set-up and tear-down functions can also be provided. Given the following test function: def testSomething(): something = makeSomething() assert something.name is not None # ... one can create an equivalent test case instance as follows, with optional set-up and tear-down methods: testcase = unittest.FunctionTestCase(testSomething, setUp=makeSomethingDB, tearDown=deleteSomethingDB) Note: Even though "FunctionTestCase" can be used to quickly convert an existing test base over to a "unittest"-based system, this approach is not recommended. Taking the time to set up proper "TestCase" subclasses will make future test refactorings infinitely easier. In some cases, the existing tests may have been written using the "doctest" module. If so, "doctest" provides a "DocTestSuite" class that can automatically build "unittest.TestSuite" instances from the existing "doctest"-based tests. Skipping tests and expected failures ==================================== New in version 3.1. Unittest supports skipping individual test methods and even whole classes of tests. In addition, it supports marking a test as an “expected failure,” a test that is broken and will fail, but shouldn’t be counted as a failure on a "TestResult". Skipping a test is simply a matter of using the "skip()" *decorator* or one of its conditional variants, calling "TestCase.skipTest()" within a "setUp()" or test method, or raising "SkipTest" directly. Basic skipping looks like this: class MyTestCase(unittest.TestCase): @unittest.skip("demonstrating skipping") def test_nothing(self): self.fail("shouldn't happen") @unittest.skipIf(mylib.__version__ < (1, 3), "not supported in this library version") def test_format(self): # Tests that work for only a certain version of the library. pass @unittest.skipUnless(sys.platform.startswith("win"), "requires Windows") def test_windows_support(self): # windows specific testing code pass def test_maybe_skipped(self): if not external_resource_available(): self.skipTest("external resource not available") # test code that depends on the external resource pass This is the output of running the example above in verbose mode: test_format (__main__.MyTestCase) ... skipped 'not supported in this library version' test_nothing (__main__.MyTestCase) ... skipped 'demonstrating skipping' test_maybe_skipped (__main__.MyTestCase) ... skipped 'external resource not available' test_windows_support (__main__.MyTestCase) ... skipped 'requires Windows' ---------------------------------------------------------------------- Ran 4 tests in 0.005s OK (skipped=4) Classes can be skipped just like methods: @unittest.skip("showing class skipping") class MySkippedTestCase(unittest.TestCase): def test_not_run(self): pass "TestCase.setUp()" can also skip the test. This is useful when a resource that needs to be set up is not available. Expected failures use the "expectedFailure()" decorator. class ExpectedFailureTestCase(unittest.TestCase): @unittest.expectedFailure def test_fail(self): self.assertEqual(1, 0, "broken") It’s easy to roll your own skipping decorators by making a decorator that calls "skip()" on the test when it wants it to be skipped. This decorator skips the test unless the passed object has a certain attribute: def skipUnlessHasattr(obj, attr): if hasattr(obj, attr): return lambda func: func return unittest.skip("{!r} doesn't have {!r}".format(obj, attr)) The following decorators and exception implement test skipping and expected failures: @unittest.skip(reason) Unconditionally skip the decorated test. *reason* should describe why the test is being skipped. @unittest.skipIf(condition, reason) Skip the decorated test if *condition* is true. @unittest.skipUnless(condition, reason) Skip the decorated test unless *condition* is true. @unittest.expectedFailure Mark the test as an expected failure or error. If the test fails or errors in the test function itself (rather than in one of the *test fixture* methods) then it will be considered a success. If the test passes, it will be considered a failure. exception unittest.SkipTest(reason) This exception is raised to skip a test. Usually you can use "TestCase.skipTest()" or one of the skipping decorators instead of raising this directly. Skipped tests will not have "setUp()" or "tearDown()" run around them. Skipped classes will not have "setUpClass()" or "tearDownClass()" run. Skipped modules will not have "setUpModule()" or "tearDownModule()" run. Distinguishing test iterations using subtests ============================================= New in version 3.4. When there are very small differences among your tests, for instance some parameters, unittest allows you to distinguish them inside the body of a test method using the "subTest()" context manager. For example, the following test: class NumbersTest(unittest.TestCase): def test_even(self): """ Test that numbers between 0 and 5 are all even. """ for i in range(0, 6): with self.subTest(i=i): self.assertEqual(i % 2, 0) will produce the following output: ====================================================================== FAIL: test_even (__main__.NumbersTest) (i=1) ---------------------------------------------------------------------- Traceback (most recent call last): File "subtests.py", line 32, in test_even self.assertEqual(i % 2, 0) AssertionError: 1 != 0 ====================================================================== FAIL: test_even (__main__.NumbersTest) (i=3) ---------------------------------------------------------------------- Traceback (most recent call last): File "subtests.py", line 32, in test_even self.assertEqual(i % 2, 0) AssertionError: 1 != 0 ====================================================================== FAIL: test_even (__main__.NumbersTest) (i=5) ---------------------------------------------------------------------- Traceback (most recent call last): File "subtests.py", line 32, in test_even self.assertEqual(i % 2, 0) AssertionError: 1 != 0 Without using a subtest, execution would stop after the first failure, and the error would be less easy to diagnose because the value of "i" wouldn’t be displayed: ====================================================================== FAIL: test_even (__main__.NumbersTest) ---------------------------------------------------------------------- Traceback (most recent call last): File "subtests.py", line 32, in test_even self.assertEqual(i % 2, 0) AssertionError: 1 != 0 Classes and functions ===================== This section describes in depth the API of "unittest". Test cases ---------- class unittest.TestCase(methodName='runTest') Instances of the "TestCase" class represent the logical test units in the "unittest" universe. This class is intended to be used as a base class, with specific tests being implemented by concrete subclasses. This class implements the interface needed by the test runner to allow it to drive the tests, and methods that the test code can use to check for and report various kinds of failure. Each instance of "TestCase" will run a single base method: the method named *methodName*. In most uses of "TestCase", you will neither change the *methodName* nor reimplement the default "runTest()" method. Changed in version 3.2: "TestCase" can be instantiated successfully without providing a *methodName*. This makes it easier to experiment with "TestCase" from the interactive interpreter. "TestCase" instances provide three groups of methods: one group used to run the test, another used by the test implementation to check conditions and report failures, and some inquiry methods allowing information about the test itself to be gathered. Methods in the first group (running the test) are: setUp() Method called to prepare the test fixture. This is called immediately before calling the test method; other than "AssertionError" or "SkipTest", any exception raised by this method will be considered an error rather than a test failure. The default implementation does nothing. tearDown() Method called immediately after the test method has been called and the result recorded. This is called even if the test method raised an exception, so the implementation in subclasses may need to be particularly careful about checking internal state. Any exception, other than "AssertionError" or "SkipTest", raised by this method will be considered an additional error rather than a test failure (thus increasing the total number of reported errors). This method will only be called if the "setUp()" succeeds, regardless of the outcome of the test method. The default implementation does nothing. setUpClass() A class method called before tests in an individual class are run. "setUpClass" is called with the class as the only argument and must be decorated as a "classmethod()": @classmethod def setUpClass(cls): ... See Class and Module Fixtures for more details. New in version 3.2. tearDownClass() A class method called after tests in an individual class have run. "tearDownClass" is called with the class as the only argument and must be decorated as a "classmethod()": @classmethod def tearDownClass(cls): ... See Class and Module Fixtures for more details. New in version 3.2. run(result=None) Run the test, collecting the result into the "TestResult" object passed as *result*. If *result* is omitted or "None", a temporary result object is created (by calling the "defaultTestResult()" method) and used. The result object is returned to "run()"’s caller. The same effect may be had by simply calling the "TestCase" instance. Changed in version 3.3: Previous versions of "run" did not return the result. Neither did calling an instance. skipTest(reason) Calling this during a test method or "setUp()" skips the current test. See Skipping tests and expected failures for more information. New in version 3.1. subTest(msg=None, **params) Return a context manager which executes the enclosed code block as a subtest. *msg* and *params* are optional, arbitrary values which are displayed whenever a subtest fails, allowing you to identify them clearly. A test case can contain any number of subtest declarations, and they can be arbitrarily nested. See Distinguishing test iterations using subtests for more information. New in version 3.4. debug() Run the test without collecting the result. This allows exceptions raised by the test to be propagated to the caller, and can be used to support running tests under a debugger. The "TestCase" class provides several assert methods to check for and report failures. The following table lists the most commonly used methods (see the tables below for more assert methods): +-------------------------------------------+-------------------------------+-----------------+ | Method | Checks that | New in | |===========================================|===============================|=================| | "assertEqual(a, b)" | "a == b" | | +-------------------------------------------+-------------------------------+-----------------+ | "assertNotEqual(a, b)" | "a != b" | | +-------------------------------------------+-------------------------------+-----------------+ | "assertTrue(x)" | "bool(x) is True" | | +-------------------------------------------+-------------------------------+-----------------+ | "assertFalse(x)" | "bool(x) is False" | | +-------------------------------------------+-------------------------------+-----------------+ | "assertIs(a, b)" | "a is b" | 3.1 | +-------------------------------------------+-------------------------------+-----------------+ | "assertIsNot(a, b)" | "a is not b" | 3.1 | +-------------------------------------------+-------------------------------+-----------------+ | "assertIsNone(x)" | "x is None" | 3.1 | +-------------------------------------------+-------------------------------+-----------------+ | "assertIsNotNone(x)" | "x is not None" | 3.1 | +-------------------------------------------+-------------------------------+-----------------+ | "assertIn(a, b)" | "a in b" | 3.1 | +-------------------------------------------+-------------------------------+-----------------+ | "assertNotIn(a, b)" | "a not in b" | 3.1 | +-------------------------------------------+-------------------------------+-----------------+ | "assertIsInstance(a, b)" | "isinstance(a, b)" | 3.2 | +-------------------------------------------+-------------------------------+-----------------+ | "assertNotIsInstance(a, b)" | "not isinstance(a, b)" | 3.2 | +-------------------------------------------+-------------------------------+-----------------+ All the assert methods accept a *msg* argument that, if specified, is used as the error message on failure (see also "longMessage"). Note that the *msg* keyword argument can be passed to "assertRaises()", "assertRaisesRegex()", "assertWarns()", "assertWarnsRegex()" only when they are used as a context manager. assertEqual(first, second, msg=None) Test that *first* and *second* are equal. If the values do not compare equal, the test will fail. In addition, if *first* and *second* are the exact same type and one of list, tuple, dict, set, frozenset or str or any type that a subclass registers with "addTypeEqualityFunc()" the type- specific equality function will be called in order to generate a more useful default error message (see also the list of type- specific methods). Changed in version 3.1: Added the automatic calling of type- specific equality function. Changed in version 3.2: "assertMultiLineEqual()" added as the default type equality function for comparing strings. assertNotEqual(first, second, msg=None) Test that *first* and *second* are not equal. If the values do compare equal, the test will fail. assertTrue(expr, msg=None) assertFalse(expr, msg=None) Test that *expr* is true (or false). Note that this is equivalent to "bool(expr) is True" and not to "expr is True" (use "assertIs(expr, True)" for the latter). This method should also be avoided when more specific methods are available (e.g. "assertEqual(a, b)" instead of "assertTrue(a == b)"), because they provide a better error message in case of failure. assertIs(first, second, msg=None) assertIsNot(first, second, msg=None) Test that *first* and *second* are (or are not) the same object. New in version 3.1. assertIsNone(expr, msg=None) assertIsNotNone(expr, msg=None) Test that *expr* is (or is not) "None". New in version 3.1. assertIn(member, container, msg=None) assertNotIn(member, container, msg=None) Test that *member* is (or is not) in *container*. New in version 3.1. assertIsInstance(obj, cls, msg=None) assertNotIsInstance(obj, cls, msg=None) Test that *obj* is (or is not) an instance of *cls* (which can be a class or a tuple of classes, as supported by "isinstance()"). To check for the exact type, use "assertIs(type(obj), cls)". New in version 3.2. It is also possible to check the production of exceptions, warnings, and log messages using the following methods: +-----------------------------------------------------------+----------------------------------------+--------------+ | Method | Checks that | New in | |===========================================================|========================================|==============| | "assertRaises(exc, fun, *args, **kwds)" | "fun(*args, **kwds)" raises *exc* | | +-----------------------------------------------------------+----------------------------------------+--------------+ | "assertRaisesRegex(exc, r, fun, *args, **kwds)" | "fun(*args, **kwds)" raises *exc* and | 3.1 | | | the message matches regex *r* | | +-----------------------------------------------------------+----------------------------------------+--------------+ | "assertWarns(warn, fun, *args, **kwds)" | "fun(*args, **kwds)" raises *warn* | 3.2 | +-----------------------------------------------------------+----------------------------------------+--------------+ | "assertWarnsRegex(warn, r, fun, *args, **kwds)" | "fun(*args, **kwds)" raises *warn* and | 3.2 | | | the message matches regex *r* | | +-----------------------------------------------------------+----------------------------------------+--------------+ | "assertLogs(logger, level)" | The "with" block logs on *logger* with | 3.4 | | | minimum *level* | | +-----------------------------------------------------------+----------------------------------------+--------------+ assertRaises(exception, callable, *args, **kwds) assertRaises(exception, *, msg=None) Test that an exception is raised when *callable* is called with any positional or keyword arguments that are also passed to "assertRaises()". The test passes if *exception* is raised, is an error if another exception is raised, or fails if no exception is raised. To catch any of a group of exceptions, a tuple containing the exception classes may be passed as *exception*. If only the *exception* and possibly the *msg* arguments are given, return a context manager so that the code under test can be written inline rather than as a function: with self.assertRaises(SomeException): do_something() When used as a context manager, "assertRaises()" accepts the additional keyword argument *msg*. The context manager will store the caught exception object in its "exception" attribute. This can be useful if the intention is to perform additional checks on the exception raised: with self.assertRaises(SomeException) as cm: do_something() the_exception = cm.exception self.assertEqual(the_exception.error_code, 3) Changed in version 3.1: Added the ability to use "assertRaises()" as a context manager. Changed in version 3.2: Added the "exception" attribute. Changed in version 3.3: Added the *msg* keyword argument when used as a context manager. assertRaisesRegex(exception, regex, callable, *args, **kwds) assertRaisesRegex(exception, regex, *, msg=None) Like "assertRaises()" but also tests that *regex* matches on the string representation of the raised exception. *regex* may be a regular expression object or a string containing a regular expression suitable for use by "re.search()". Examples: self.assertRaisesRegex(ValueError, "invalid literal for.*XYZ'$", int, 'XYZ') or: with self.assertRaisesRegex(ValueError, 'literal'): int('XYZ') New in version 3.1: Added under the name "assertRaisesRegexp". Changed in version 3.2: Renamed to "assertRaisesRegex()". Changed in version 3.3: Added the *msg* keyword argument when used as a context manager. assertWarns(warning, callable, *args, **kwds) assertWarns(warning, *, msg=None) Test that a warning is triggered when *callable* is called with any positional or keyword arguments that are also passed to "assertWarns()". The test passes if *warning* is triggered and fails if it isn’t. Any exception is an error. To catch any of a group of warnings, a tuple containing the warning classes may be passed as *warnings*. If only the *warning* and possibly the *msg* arguments are given, return a context manager so that the code under test can be written inline rather than as a function: with self.assertWarns(SomeWarning): do_something() When used as a context manager, "assertWarns()" accepts the additional keyword argument *msg*. The context manager will store the caught warning object in its "warning" attribute, and the source line which triggered the warnings in the "filename" and "lineno" attributes. This can be useful if the intention is to perform additional checks on the warning caught: with self.assertWarns(SomeWarning) as cm: do_something() self.assertIn('myfile.py', cm.filename) self.assertEqual(320, cm.lineno) This method works regardless of the warning filters in place when it is called. New in version 3.2. Changed in version 3.3: Added the *msg* keyword argument when used as a context manager. assertWarnsRegex(warning, regex, callable, *args, **kwds) assertWarnsRegex(warning, regex, *, msg=None) Like "assertWarns()" but also tests that *regex* matches on the message of the triggered warning. *regex* may be a regular expression object or a string containing a regular expression suitable for use by "re.search()". Example: self.assertWarnsRegex(DeprecationWarning, r'legacy_function\(\) is deprecated', legacy_function, 'XYZ') or: with self.assertWarnsRegex(RuntimeWarning, 'unsafe frobnicating'): frobnicate('/etc/passwd') New in version 3.2. Changed in version 3.3: Added the *msg* keyword argument when used as a context manager. assertLogs(logger=None, level=None) A context manager to test that at least one message is logged on the *logger* or one of its children, with at least the given *level*. If given, *logger* should be a "logging.Logger" object or a "str" giving the name of a logger. The default is the root logger, which will catch all messages that were not blocked by a non-propagating descendent logger. If given, *level* should be either a numeric logging level or its string equivalent (for example either ""ERROR"" or "logging.ERROR"). The default is "logging.INFO". The test passes if at least one message emitted inside the "with" block matches the *logger* and *level* conditions, otherwise it fails. The object returned by the context manager is a recording helper which keeps tracks of the matching log messages. It has two attributes: records A list of "logging.LogRecord" objects of the matching log messages. output A list of "str" objects with the formatted output of matching messages. Example: with self.assertLogs('foo', level='INFO') as cm: logging.getLogger('foo').info('first message') logging.getLogger('foo.bar').error('second message') self.assertEqual(cm.output, ['INFO:foo:first message', 'ERROR:foo.bar:second message']) New in version 3.4. There are also other methods used to perform more specific checks, such as: +-----------------------------------------+----------------------------------+----------------+ | Method | Checks that | New in | |=========================================|==================================|================| | "assertAlmostEqual(a, b)" | "round(a-b, 7) == 0" | | +-----------------------------------------+----------------------------------+----------------+ | "assertNotAlmostEqual(a, b)" | "round(a-b, 7) != 0" | | +-----------------------------------------+----------------------------------+----------------+ | "assertGreater(a, b)" | "a > b" | 3.1 | +-----------------------------------------+----------------------------------+----------------+ | "assertGreaterEqual(a, b)" | "a >= b" | 3.1 | +-----------------------------------------+----------------------------------+----------------+ | "assertLess(a, b)" | "a < b" | 3.1 | +-----------------------------------------+----------------------------------+----------------+ | "assertLessEqual(a, b)" | "a <= b" | 3.1 | +-----------------------------------------+----------------------------------+----------------+ | "assertRegex(s, r)" | "r.search(s)" | 3.1 | +-----------------------------------------+----------------------------------+----------------+ | "assertNotRegex(s, r)" | "not r.search(s)" | 3.2 | +-----------------------------------------+----------------------------------+----------------+ | "assertCountEqual(a, b)" | *a* and *b* have the same | 3.2 | | | elements in the same number, | | | | regardless of their order. | | +-----------------------------------------+----------------------------------+----------------+ assertAlmostEqual(first, second, places=7, msg=None, delta=None) assertNotAlmostEqual(first, second, places=7, msg=None, delta=None) Test that *first* and *second* are approximately (or not approximately) equal by computing the difference, rounding to the given number of decimal *places* (default 7), and comparing to zero. Note that these methods round the values to the given number of *decimal places* (i.e. like the "round()" function) and not *significant digits*. If *delta* is supplied instead of *places* then the difference between *first* and *second* must be less or equal to (or greater than) *delta*. Supplying both *delta* and *places* raises a "TypeError". Changed in version 3.2: "assertAlmostEqual()" automatically considers almost equal objects that compare equal. "assertNotAlmostEqual()" automatically fails if the objects compare equal. Added the *delta* keyword argument. assertGreater(first, second, msg=None) assertGreaterEqual(first, second, msg=None) assertLess(first, second, msg=None) assertLessEqual(first, second, msg=None) Test that *first* is respectively >, >=, < or <= than *second* depending on the method name. If not, the test will fail: >>> self.assertGreaterEqual(3, 4) AssertionError: "3" unexpectedly not greater than or equal to "4" New in version 3.1. assertRegex(text, regex, msg=None) assertNotRegex(text, regex, msg=None) Test that a *regex* search matches (or does not match) *text*. In case of failure, the error message will include the pattern and the *text* (or the pattern and the part of *text* that unexpectedly matched). *regex* may be a regular expression object or a string containing a regular expression suitable for use by "re.search()". New in version 3.1: Added under the name "assertRegexpMatches". Changed in version 3.2: The method "assertRegexpMatches()" has been renamed to "assertRegex()". New in version 3.2: "assertNotRegex()". New in version 3.5: The name "assertNotRegexpMatches" is a deprecated alias for "assertNotRegex()". assertCountEqual(first, second, msg=None) Test that sequence *first* contains the same elements as *second*, regardless of their order. When they don’t, an error message listing the differences between the sequences will be generated. Duplicate elements are *not* ignored when comparing *first* and *second*. It verifies whether each element has the same count in both sequences. Equivalent to: "assertEqual(Counter(list(first)), Counter(list(second)))" but works with sequences of unhashable objects as well. New in version 3.2. The "assertEqual()" method dispatches the equality check for objects of the same type to different type-specific methods. These methods are already implemented for most of the built-in types, but it’s also possible to register new methods using "addTypeEqualityFunc()": addTypeEqualityFunc(typeobj, function) Registers a type-specific method called by "assertEqual()" to check if two objects of exactly the same *typeobj* (not subclasses) compare equal. *function* must take two positional arguments and a third msg=None keyword argument just as "assertEqual()" does. It must raise "self.failureException(msg)" when inequality between the first two parameters is detected – possibly providing useful information and explaining the inequalities in details in the error message. New in version 3.1. The list of type-specific methods automatically used by "assertEqual()" are summarized in the following table. Note that it’s usually not necessary to invoke these methods directly. +-------------------------------------------+-------------------------------+----------------+ | Method | Used to compare | New in | |===========================================|===============================|================| | "assertMultiLineEqual(a, b)" | strings | 3.1 | +-------------------------------------------+-------------------------------+----------------+ | "assertSequenceEqual(a, b)" | sequences | 3.1 | +-------------------------------------------+-------------------------------+----------------+ | "assertListEqual(a, b)" | lists | 3.1 | +-------------------------------------------+-------------------------------+----------------+ | "assertTupleEqual(a, b)" | tuples | 3.1 | +-------------------------------------------+-------------------------------+----------------+ | "assertSetEqual(a, b)" | sets or frozensets | 3.1 | +-------------------------------------------+-------------------------------+----------------+ | "assertDictEqual(a, b)" | dicts | 3.1 | +-------------------------------------------+-------------------------------+----------------+ assertMultiLineEqual(first, second, msg=None) Test that the multiline string *first* is equal to the string *second*. When not equal a diff of the two strings highlighting the differences will be included in the error message. This method is used by default when comparing strings with "assertEqual()". New in version 3.1. assertSequenceEqual(first, second, msg=None, seq_type=None) Tests that two sequences are equal. If a *seq_type* is supplied, both *first* and *second* must be instances of *seq_type* or a failure will be raised. If the sequences are different an error message is constructed that shows the difference between the two. This method is not called directly by "assertEqual()", but it’s used to implement "assertListEqual()" and "assertTupleEqual()". New in version 3.1. assertListEqual(first, second, msg=None) assertTupleEqual(first, second, msg=None) Tests that two lists or tuples are equal. If not, an error message is constructed that shows only the differences between the two. An error is also raised if either of the parameters are of the wrong type. These methods are used by default when comparing lists or tuples with "assertEqual()". New in version 3.1. assertSetEqual(first, second, msg=None) Tests that two sets are equal. If not, an error message is constructed that lists the differences between the sets. This method is used by default when comparing sets or frozensets with "assertEqual()". Fails if either of *first* or *second* does not have a "set.difference()" method. New in version 3.1. assertDictEqual(first, second, msg=None) Test that two dictionaries are equal. If not, an error message is constructed that shows the differences in the dictionaries. This method will be used by default to compare dictionaries in calls to "assertEqual()". New in version 3.1. Finally the "TestCase" provides the following methods and attributes: fail(msg=None) Signals a test failure unconditionally, with *msg* or "None" for the error message. failureException This class attribute gives the exception raised by the test method. If a test framework needs to use a specialized exception, possibly to carry additional information, it must subclass this exception in order to “play fair” with the framework. The initial value of this attribute is "AssertionError". longMessage This class attribute determines what happens when a custom failure message is passed as the msg argument to an assertXYY call that fails. "True" is the default value. In this case, the custom message is appended to the end of the standard failure message. When set to "False", the custom message replaces the standard message. The class setting can be overridden in individual test methods by assigning an instance attribute, self.longMessage, to "True" or "False" before calling the assert methods. The class setting gets reset before each test call. New in version 3.1. maxDiff This attribute controls the maximum length of diffs output by assert methods that report diffs on failure. It defaults to 80*8 characters. Assert methods affected by this attribute are "assertSequenceEqual()" (including all the sequence comparison methods that delegate to it), "assertDictEqual()" and "assertMultiLineEqual()". Setting "maxDiff" to "None" means that there is no maximum length of diffs. New in version 3.2. Testing frameworks can use the following methods to collect information on the test: countTestCases() Return the number of tests represented by this test object. For "TestCase" instances, this will always be "1". defaultTestResult() Return an instance of the test result class that should be used for this test case class (if no other result instance is provided to the "run()" method). For "TestCase" instances, this will always be an instance of "TestResult"; subclasses of "TestCase" should override this as necessary. id() Return a string identifying the specific test case. This is usually the full name of the test method, including the module and class name. shortDescription() Returns a description of the test, or "None" if no description has been provided. The default implementation of this method returns the first line of the test method’s docstring, if available, or "None". Changed in version 3.1: In 3.1 this was changed to add the test name to the short description even in the presence of a docstring. This caused compatibility issues with unittest extensions and adding the test name was moved to the "TextTestResult" in Python 3.2. addCleanup(function, /, *args, **kwargs) Add a function to be called after "tearDown()" to cleanup resources used during the test. Functions will be called in reverse order to the order they are added (LIFO (last-in, first- out)). They are called with any arguments and keyword arguments passed into "addCleanup()" when they are added. If "setUp()" fails, meaning that "tearDown()" is not called, then any cleanup functions added will still be called. New in version 3.1. doCleanups() This method is called unconditionally after "tearDown()", or after "setUp()" if "setUp()" raises an exception. It is responsible for calling all the cleanup functions added by "addCleanup()". If you need cleanup functions to be called *prior* to "tearDown()" then you can call "doCleanups()" yourself. "doCleanups()" pops methods off the stack of cleanup functions one at a time, so it can be called at any time. New in version 3.1. classmethod addClassCleanup(function, /, *args, **kwargs) Add a function to be called after "tearDownClass()" to cleanup resources used during the test class. Functions will be called in reverse order to the order they are added (LIFO (last-in, first-out)). They are called with any arguments and keyword arguments passed into "addClassCleanup()" when they are added. If "setUpClass()" fails, meaning that "tearDownClass()" is not called, then any cleanup functions added will still be called. New in version 3.8. classmethod doClassCleanups() This method is called unconditionally after "tearDownClass()", or after "setUpClass()" if "setUpClass()" raises an exception. It is responsible for calling all the cleanup functions added by "addClassCleanup()". If you need cleanup functions to be called *prior* to "tearDownClass()" then you can call "doClassCleanups()" yourself. "doClassCleanups()" pops methods off the stack of cleanup functions one at a time, so it can be called at any time. New in version 3.8. class unittest.IsolatedAsyncioTestCase(methodName='runTest') This class provides an API similar to "TestCase" and also accepts coroutines as test functions. New in version 3.8. coroutine asyncSetUp() Method called to prepare the test fixture. This is called after "setUp()". This is called immediately before calling the test method; other than "AssertionError" or "SkipTest", any exception raised by this method will be considered an error rather than a test failure. The default implementation does nothing. coroutine asyncTearDown() Method called immediately after the test method has been called and the result recorded. This is called before "tearDown()". This is called even if the test method raised an exception, so the implementation in subclasses may need to be particularly careful about checking internal state. Any exception, other than "AssertionError" or "SkipTest", raised by this method will be considered an additional error rather than a test failure (thus increasing the total number of reported errors). This method will only be called if the "asyncSetUp()" succeeds, regardless of the outcome of the test method. The default implementation does nothing. addAsyncCleanup(function, /, *args, **kwargs) This method accepts a coroutine that can be used as a cleanup function. run(result=None) Sets up a new event loop to run the test, collecting the result into the "TestResult" object passed as *result*. If *result* is omitted or "None", a temporary result object is created (by calling the "defaultTestResult()" method) and used. The result object is returned to "run()"’s caller. At the end of the test all the tasks in the event loop are cancelled. An example illustrating the order: from unittest import IsolatedAsyncioTestCase events = [] class Test(IsolatedAsyncioTestCase): def setUp(self): events.append("setUp") async def asyncSetUp(self): self._async_connection = await AsyncConnection() events.append("asyncSetUp") async def test_response(self): events.append("test_response") response = await self._async_connection.get("https://example.com") self.assertEqual(response.status_code, 200) self.addAsyncCleanup(self.on_cleanup) def tearDown(self): events.append("tearDown") async def asyncTearDown(self): await self._async_connection.close() events.append("asyncTearDown") async def on_cleanup(self): events.append("cleanup") if __name__ == "__main__": unittest.main() After running the test, "events" would contain "["setUp", "asyncSetUp", "test_response", "asyncTearDown", "tearDown", "cleanup"]". class unittest.FunctionTestCase(testFunc, setUp=None, tearDown=None, description=None) This class implements the portion of the "TestCase" interface which allows the test runner to drive the test, but does not provide the methods which test code can use to check and report errors. This is used to create test cases using legacy test code, allowing it to be integrated into a "unittest"-based test framework. Deprecated aliases ~~~~~~~~~~~~~~~~~~ For historical reasons, some of the "TestCase" methods had one or more aliases that are now deprecated. The following table lists the correct names along with their deprecated aliases: +--------------------------------+------------------------+-------------------------+ | Method Name | Deprecated alias | Deprecated alias | |================================|========================|=========================| | "assertEqual()" | failUnlessEqual | assertEquals | +--------------------------------+------------------------+-------------------------+ | "assertNotEqual()" | failIfEqual | assertNotEquals | +--------------------------------+------------------------+-------------------------+ | "assertTrue()" | failUnless | assert_ | +--------------------------------+------------------------+-------------------------+ | "assertFalse()" | failIf | | +--------------------------------+------------------------+-------------------------+ | "assertRaises()" | failUnlessRaises | | +--------------------------------+------------------------+-------------------------+ | "assertAlmostEqual()" | failUnlessAlmostEqual | assertAlmostEquals | +--------------------------------+------------------------+-------------------------+ | "assertNotAlmostEqual()" | failIfAlmostEqual | assertNotAlmostEquals | +--------------------------------+------------------------+-------------------------+ | "assertRegex()" | | assertRegexpMatches | +--------------------------------+------------------------+-------------------------+ | "assertNotRegex()" | | assertNotRegexpMatches | +--------------------------------+------------------------+-------------------------+ | "assertRaisesRegex()" | | assertRaisesRegexp | +--------------------------------+------------------------+-------------------------+ Deprecated since version 3.1: The fail* aliases listed in the second column have been deprecated. Deprecated since version 3.2: The assert* aliases listed in the third column have been deprecated. Deprecated since version 3.2: "assertRegexpMatches" and "assertRaisesRegexp" have been renamed to "assertRegex()" and "assertRaisesRegex()". Deprecated since version 3.5: The "assertNotRegexpMatches" name is deprecated in favor of "assertNotRegex()". Grouping tests -------------- class unittest.TestSuite(tests=()) This class represents an aggregation of individual test cases and test suites. The class presents the interface needed by the test runner to allow it to be run as any other test case. Running a "TestSuite" instance is the same as iterating over the suite, running each test individually. If *tests* is given, it must be an iterable of individual test cases or other test suites that will be used to build the suite initially. Additional methods are provided to add test cases and suites to the collection later on. "TestSuite" objects behave much like "TestCase" objects, except they do not actually implement a test. Instead, they are used to aggregate tests into groups of tests that should be run together. Some additional methods are available to add tests to "TestSuite" instances: addTest(test) Add a "TestCase" or "TestSuite" to the suite. addTests(tests) Add all the tests from an iterable of "TestCase" and "TestSuite" instances to this test suite. This is equivalent to iterating over *tests*, calling "addTest()" for each element. "TestSuite" shares the following methods with "TestCase": run(result) Run the tests associated with this suite, collecting the result into the test result object passed as *result*. Note that unlike "TestCase.run()", "TestSuite.run()" requires the result object to be passed in. debug() Run the tests associated with this suite without collecting the result. This allows exceptions raised by the test to be propagated to the caller and can be used to support running tests under a debugger. countTestCases() Return the number of tests represented by this test object, including all individual tests and sub-suites. __iter__() Tests grouped by a "TestSuite" are always accessed by iteration. Subclasses can lazily provide tests by overriding "__iter__()". Note that this method may be called several times on a single suite (for example when counting tests or comparing for equality) so the tests returned by repeated iterations before "TestSuite.run()" must be the same for each call iteration. After "TestSuite.run()", callers should not rely on the tests returned by this method unless the caller uses a subclass that overrides "TestSuite._removeTestAtIndex()" to preserve test references. Changed in version 3.2: In earlier versions the "TestSuite" accessed tests directly rather than through iteration, so overriding "__iter__()" wasn’t sufficient for providing tests. Changed in version 3.4: In earlier versions the "TestSuite" held references to each "TestCase" after "TestSuite.run()". Subclasses can restore that behavior by overriding "TestSuite._removeTestAtIndex()". In the typical usage of a "TestSuite" object, the "run()" method is invoked by a "TestRunner" rather than by the end-user test harness. Loading and running tests ------------------------- class unittest.TestLoader The "TestLoader" class is used to create test suites from classes and modules. Normally, there is no need to create an instance of this class; the "unittest" module provides an instance that can be shared as "unittest.defaultTestLoader". Using a subclass or instance, however, allows customization of some configurable properties. "TestLoader" objects have the following attributes: errors A list of the non-fatal errors encountered while loading tests. Not reset by the loader at any point. Fatal errors are signalled by the relevant a method raising an exception to the caller. Non-fatal errors are also indicated by a synthetic test that will raise the original error when run. New in version 3.5. "TestLoader" objects have the following methods: loadTestsFromTestCase(testCaseClass) Return a suite of all test cases contained in the "TestCase"-derived "testCaseClass". A test case instance is created for each method named by "getTestCaseNames()". By default these are the method names beginning with "test". If "getTestCaseNames()" returns no methods, but the "runTest()" method is implemented, a single test case is created for that method instead. loadTestsFromModule(module, pattern=None) Return a suite of all test cases contained in the given module. This method searches *module* for classes derived from "TestCase" and creates an instance of the class for each test method defined for the class. Note: While using a hierarchy of "TestCase"-derived classes can be convenient in sharing fixtures and helper functions, defining test methods on base classes that are not intended to be instantiated directly does not play well with this method. Doing so, however, can be useful when the fixtures are different and defined in subclasses. If a module provides a "load_tests" function it will be called to load the tests. This allows modules to customize test loading. This is the load_tests protocol. The *pattern* argument is passed as the third argument to "load_tests". Changed in version 3.2: Support for "load_tests" added. Changed in version 3.5: The undocumented and unofficial *use_load_tests* default argument is deprecated and ignored, although it is still accepted for backward compatibility. The method also now accepts a keyword-only argument *pattern* which is passed to "load_tests" as the third argument. loadTestsFromName(name, module=None) Return a suite of all test cases given a string specifier. The specifier *name* is a “dotted name” that may resolve either to a module, a test case class, a test method within a test case class, a "TestSuite" instance, or a callable object which returns a "TestCase" or "TestSuite" instance. These checks are applied in the order listed here; that is, a method on a possible test case class will be picked up as “a test method within a test case class”, rather than “a callable object”. For example, if you have a module "SampleTests" containing a "TestCase"-derived class "SampleTestCase" with three test methods ("test_one()", "test_two()", and "test_three()"), the specifier "'SampleTests.SampleTestCase'" would cause this method to return a suite which will run all three test methods. Using the specifier "'SampleTests.SampleTestCase.test_two'" would cause it to return a test suite which will run only the "test_two()" test method. The specifier can refer to modules and packages which have not been imported; they will be imported as a side-effect. The method optionally resolves *name* relative to the given *module*. Changed in version 3.5: If an "ImportError" or "AttributeError" occurs while traversing *name* then a synthetic test that raises that error when run will be returned. These errors are included in the errors accumulated by self.errors. loadTestsFromNames(names, module=None) Similar to "loadTestsFromName()", but takes a sequence of names rather than a single name. The return value is a test suite which supports all the tests defined for each name. getTestCaseNames(testCaseClass) Return a sorted sequence of method names found within *testCaseClass*; this should be a subclass of "TestCase". discover(start_dir, pattern='test*.py', top_level_dir=None) Find all the test modules by recursing into subdirectories from the specified start directory, and return a TestSuite object containing them. Only test files that match *pattern* will be loaded. (Using shell style pattern matching.) Only module names that are importable (i.e. are valid Python identifiers) will be loaded. All test modules must be importable from the top level of the project. If the start directory is not the top level directory then the top level directory must be specified separately. If importing a module fails, for example due to a syntax error, then this will be recorded as a single error and discovery will continue. If the import failure is due to "SkipTest" being raised, it will be recorded as a skip instead of an error. If a package (a directory containing a file named "__init__.py") is found, the package will be checked for a "load_tests" function. If this exists then it will be called "package.load_tests(loader, tests, pattern)". Test discovery takes care to ensure that a package is only checked for tests once during an invocation, even if the load_tests function itself calls "loader.discover". If "load_tests" exists then discovery does *not* recurse into the package, "load_tests" is responsible for loading all tests in the package. The pattern is deliberately not stored as a loader attribute so that packages can continue discovery themselves. *top_level_dir* is stored so "load_tests" does not need to pass this argument in to "loader.discover()". *start_dir* can be a dotted module name as well as a directory. New in version 3.2. Changed in version 3.4: Modules that raise "SkipTest" on import are recorded as skips, not errors. Changed in version 3.4: *start_dir* can be a *namespace packages*. Changed in version 3.4: Paths are sorted before being imported so that execution order is the same even if the underlying file system’s ordering is not dependent on file name. Changed in version 3.5: Found packages are now checked for "load_tests" regardless of whether their path matches *pattern*, because it is impossible for a package name to match the default pattern. The following attributes of a "TestLoader" can be configured either by subclassing or assignment on an instance: testMethodPrefix String giving the prefix of method names which will be interpreted as test methods. The default value is "'test'". This affects "getTestCaseNames()" and all the "loadTestsFrom*()" methods. sortTestMethodsUsing Function to be used to compare method names when sorting them in "getTestCaseNames()" and all the "loadTestsFrom*()" methods. suiteClass Callable object that constructs a test suite from a list of tests. No methods on the resulting object are needed. The default value is the "TestSuite" class. This affects all the "loadTestsFrom*()" methods. testNamePatterns List of Unix shell-style wildcard test name patterns that test methods have to match to be included in test suites (see "-v" option). If this attribute is not "None" (the default), all test methods to be included in test suites must match one of the patterns in this list. Note that matches are always performed using "fnmatch.fnmatchcase()", so unlike patterns passed to the "-v" option, simple substring patterns will have to be converted using "*" wildcards. This affects all the "loadTestsFrom*()" methods. New in version 3.7. class unittest.TestResult This class is used to compile information about which tests have succeeded and which have failed. A "TestResult" object stores the results of a set of tests. The "TestCase" and "TestSuite" classes ensure that results are properly recorded; test authors do not need to worry about recording the outcome of tests. Testing frameworks built on top of "unittest" may want access to the "TestResult" object generated by running a set of tests for reporting purposes; a "TestResult" instance is returned by the "TestRunner.run()" method for this purpose. "TestResult" instances have the following attributes that will be of interest when inspecting the results of running a set of tests: errors A list containing 2-tuples of "TestCase" instances and strings holding formatted tracebacks. Each tuple represents a test which raised an unexpected exception. failures A list containing 2-tuples of "TestCase" instances and strings holding formatted tracebacks. Each tuple represents a test where a failure was explicitly signalled using the "TestCase.assert*()" methods. skipped A list containing 2-tuples of "TestCase" instances and strings holding the reason for skipping the test. New in version 3.1. expectedFailures A list containing 2-tuples of "TestCase" instances and strings holding formatted tracebacks. Each tuple represents an expected failure or error of the test case. unexpectedSuccesses A list containing "TestCase" instances that were marked as expected failures, but succeeded. shouldStop Set to "True" when the execution of tests should stop by "stop()". testsRun The total number of tests run so far. buffer If set to true, "sys.stdout" and "sys.stderr" will be buffered in between "startTest()" and "stopTest()" being called. Collected output will only be echoed onto the real "sys.stdout" and "sys.stderr" if the test fails or errors. Any output is also attached to the failure / error message. New in version 3.2. failfast If set to true "stop()" will be called on the first failure or error, halting the test run. New in version 3.2. tb_locals If set to true then local variables will be shown in tracebacks. New in version 3.5. wasSuccessful() Return "True" if all tests run so far have passed, otherwise returns "False". Changed in version 3.4: Returns "False" if there were any "unexpectedSuccesses" from tests marked with the "expectedFailure()" decorator. stop() This method can be called to signal that the set of tests being run should be aborted by setting the "shouldStop" attribute to "True". "TestRunner" objects should respect this flag and return without running any additional tests. For example, this feature is used by the "TextTestRunner" class to stop the test framework when the user signals an interrupt from the keyboard. Interactive tools which provide "TestRunner" implementations can use this in a similar manner. The following methods of the "TestResult" class are used to maintain the internal data structures, and may be extended in subclasses to support additional reporting requirements. This is particularly useful in building tools which support interactive reporting while tests are being run. startTest(test) Called when the test case *test* is about to be run. stopTest(test) Called after the test case *test* has been executed, regardless of the outcome. startTestRun() Called once before any tests are executed. New in version 3.1. stopTestRun() Called once after all tests are executed. New in version 3.1. addError(test, err) Called when the test case *test* raises an unexpected exception. *err* is a tuple of the form returned by "sys.exc_info()": "(type, value, traceback)". The default implementation appends a tuple "(test, formatted_err)" to the instance’s "errors" attribute, where *formatted_err* is a formatted traceback derived from *err*. addFailure(test, err) Called when the test case *test* signals a failure. *err* is a tuple of the form returned by "sys.exc_info()": "(type, value, traceback)". The default implementation appends a tuple "(test, formatted_err)" to the instance’s "failures" attribute, where *formatted_err* is a formatted traceback derived from *err*. addSuccess(test) Called when the test case *test* succeeds. The default implementation does nothing. addSkip(test, reason) Called when the test case *test* is skipped. *reason* is the reason the test gave for skipping. The default implementation appends a tuple "(test, reason)" to the instance’s "skipped" attribute. addExpectedFailure(test, err) Called when the test case *test* fails or errors, but was marked with the "expectedFailure()" decorator. The default implementation appends a tuple "(test, formatted_err)" to the instance’s "expectedFailures" attribute, where *formatted_err* is a formatted traceback derived from *err*. addUnexpectedSuccess(test) Called when the test case *test* was marked with the "expectedFailure()" decorator, but succeeded. The default implementation appends the test to the instance’s "unexpectedSuccesses" attribute. addSubTest(test, subtest, outcome) Called when a subtest finishes. *test* is the test case corresponding to the test method. *subtest* is a custom "TestCase" instance describing the subtest. If *outcome* is "None", the subtest succeeded. Otherwise, it failed with an exception where *outcome* is a tuple of the form returned by "sys.exc_info()": "(type, value, traceback)". The default implementation does nothing when the outcome is a success, and records subtest failures as normal failures. New in version 3.4. class unittest.TextTestResult(stream, descriptions, verbosity) A concrete implementation of "TestResult" used by the "TextTestRunner". New in version 3.2: This class was previously named "_TextTestResult". The old name still exists as an alias but is deprecated. unittest.defaultTestLoader Instance of the "TestLoader" class intended to be shared. If no customization of the "TestLoader" is needed, this instance can be used instead of repeatedly creating new instances. class unittest.TextTestRunner(stream=None, descriptions=True, verbosity=1, failfast=False, buffer=False, resultclass=None, warnings=None, *, tb_locals=False) A basic test runner implementation that outputs results to a stream. If *stream* is "None", the default, "sys.stderr" is used as the output stream. This class has a few configurable parameters, but is essentially very simple. Graphical applications which run test suites should provide alternate implementations. Such implementations should accept "**kwargs" as the interface to construct runners changes when features are added to unittest. By default this runner shows "DeprecationWarning", "PendingDeprecationWarning", "ResourceWarning" and "ImportWarning" even if they are ignored by default. Deprecation warnings caused by deprecated unittest methods are also special-cased and, when the warning filters are "'default'" or "'always'", they will appear only once per-module, in order to avoid too many warning messages. This behavior can be overridden using Python’s "-Wd" or "-Wa" options (see Warning control) and leaving *warnings* to "None". Changed in version 3.2: Added the "warnings" argument. Changed in version 3.2: The default stream is set to "sys.stderr" at instantiation time rather than import time. Changed in version 3.5: Added the tb_locals parameter. _makeResult() This method returns the instance of "TestResult" used by "run()". It is not intended to be called directly, but can be overridden in subclasses to provide a custom "TestResult". "_makeResult()" instantiates the class or callable passed in the "TextTestRunner" constructor as the "resultclass" argument. It defaults to "TextTestResult" if no "resultclass" is provided. The result class is instantiated with the following arguments: stream, descriptions, verbosity run(test) This method is the main public interface to the "TextTestRunner". This method takes a "TestSuite" or "TestCase" instance. A "TestResult" is created by calling "_makeResult()" and the test(s) are run and the results printed to stdout. unittest.main(module='__main__', defaultTest=None, argv=None, testRunner=None, testLoader=unittest.defaultTestLoader, exit=True, verbosity=1, failfast=None, catchbreak=None, buffer=None, warnings=None) A command-line program that loads a set of tests from *module* and runs them; this is primarily for making test modules conveniently executable. The simplest use for this function is to include the following line at the end of a test script: if __name__ == '__main__': unittest.main() You can run tests with more detailed information by passing in the verbosity argument: if __name__ == '__main__': unittest.main(verbosity=2) The *defaultTest* argument is either the name of a single test or an iterable of test names to run if no test names are specified via *argv*. If not specified or "None" and no test names are provided via *argv*, all tests found in *module* are run. The *argv* argument can be a list of options passed to the program, with the first element being the program name. If not specified or "None", the values of "sys.argv" are used. The *testRunner* argument can either be a test runner class or an already created instance of it. By default "main" calls "sys.exit()" with an exit code indicating success or failure of the tests run. The *testLoader* argument has to be a "TestLoader" instance, and defaults to "defaultTestLoader". "main" supports being used from the interactive interpreter by passing in the argument "exit=False". This displays the result on standard output without calling "sys.exit()": >>> from unittest import main >>> main(module='test_module', exit=False) The *failfast*, *catchbreak* and *buffer* parameters have the same effect as the same-name command-line options. The *warnings* argument specifies the warning filter that should be used while running the tests. If it’s not specified, it will remain "None" if a "-W" option is passed to **python** (see Warning control), otherwise it will be set to "'default'". Calling "main" actually returns an instance of the "TestProgram" class. This stores the result of the tests run as the "result" attribute. Changed in version 3.1: The *exit* parameter was added. Changed in version 3.2: The *verbosity*, *failfast*, *catchbreak*, *buffer* and *warnings* parameters were added. Changed in version 3.4: The *defaultTest* parameter was changed to also accept an iterable of test names. load_tests Protocol ~~~~~~~~~~~~~~~~~~~ New in version 3.2. Modules or packages can customize how tests are loaded from them during normal test runs or test discovery by implementing a function called "load_tests". If a test module defines "load_tests" it will be called by "TestLoader.loadTestsFromModule()" with the following arguments: load_tests(loader, standard_tests, pattern) where *pattern* is passed straight through from "loadTestsFromModule". It defaults to "None". It should return a "TestSuite". *loader* is the instance of "TestLoader" doing the loading. *standard_tests* are the tests that would be loaded by default from the module. It is common for test modules to only want to add or remove tests from the standard set of tests. The third argument is used when loading packages as part of test discovery. A typical "load_tests" function that loads tests from a specific set of "TestCase" classes may look like: test_cases = (TestCase1, TestCase2, TestCase3) def load_tests(loader, tests, pattern): suite = TestSuite() for test_class in test_cases: tests = loader.loadTestsFromTestCase(test_class) suite.addTests(tests) return suite If discovery is started in a directory containing a package, either from the command line or by calling "TestLoader.discover()", then the package "__init__.py" will be checked for "load_tests". If that function does not exist, discovery will recurse into the package as though it were just another directory. Otherwise, discovery of the package’s tests will be left up to "load_tests" which is called with the following arguments: load_tests(loader, standard_tests, pattern) This should return a "TestSuite" representing all the tests from the package. ("standard_tests" will only contain tests collected from "__init__.py".) Because the pattern is passed into "load_tests" the package is free to continue (and potentially modify) test discovery. A ‘do nothing’ "load_tests" function for a test package would look like: def load_tests(loader, standard_tests, pattern): # top level directory cached on loader instance this_dir = os.path.dirname(__file__) package_tests = loader.discover(start_dir=this_dir, pattern=pattern) standard_tests.addTests(package_tests) return standard_tests Changed in version 3.5: Discovery no longer checks package names for matching *pattern* due to the impossibility of package names matching the default pattern. Class and Module Fixtures ========================= Class and module level fixtures are implemented in "TestSuite". When the test suite encounters a test from a new class then "tearDownClass()" from the previous class (if there is one) is called, followed by "setUpClass()" from the new class. Similarly if a test is from a different module from the previous test then "tearDownModule" from the previous module is run, followed by "setUpModule" from the new module. After all the tests have run the final "tearDownClass" and "tearDownModule" are run. Note that shared fixtures do not play well with [potential] features like test parallelization and they break test isolation. They should be used with care. The default ordering of tests created by the unittest test loaders is to group all tests from the same modules and classes together. This will lead to "setUpClass" / "setUpModule" (etc) being called exactly once per class and module. If you randomize the order, so that tests from different modules and classes are adjacent to each other, then these shared fixture functions may be called multiple times in a single test run. Shared fixtures are not intended to work with suites with non-standard ordering. A "BaseTestSuite" still exists for frameworks that don’t want to support shared fixtures. If there are any exceptions raised during one of the shared fixture functions the test is reported as an error. Because there is no corresponding test instance an "_ErrorHolder" object (that has the same interface as a "TestCase") is created to represent the error. If you are just using the standard unittest test runner then this detail doesn’t matter, but if you are a framework author it may be relevant. setUpClass and tearDownClass ---------------------------- These must be implemented as class methods: import unittest class Test(unittest.TestCase): @classmethod def setUpClass(cls): cls._connection = createExpensiveConnectionObject() @classmethod def tearDownClass(cls): cls._connection.destroy() If you want the "setUpClass" and "tearDownClass" on base classes called then you must call up to them yourself. The implementations in "TestCase" are empty. If an exception is raised during a "setUpClass" then the tests in the class are not run and the "tearDownClass" is not run. Skipped classes will not have "setUpClass" or "tearDownClass" run. If the exception is a "SkipTest" exception then the class will be reported as having been skipped instead of as an error. setUpModule and tearDownModule ------------------------------ These should be implemented as functions: def setUpModule(): createConnection() def tearDownModule(): closeConnection() If an exception is raised in a "setUpModule" then none of the tests in the module will be run and the "tearDownModule" will not be run. If the exception is a "SkipTest" exception then the module will be reported as having been skipped instead of as an error. To add cleanup code that must be run even in the case of an exception, use "addModuleCleanup": unittest.addModuleCleanup(function, /, *args, **kwargs) Add a function to be called after "tearDownModule()" to cleanup resources used during the test class. Functions will be called in reverse order to the order they are added (LIFO (last-in, first- out)). They are called with any arguments and keyword arguments passed into "addModuleCleanup()" when they are added. If "setUpModule()" fails, meaning that "tearDownModule()" is not called, then any cleanup functions added will still be called. New in version 3.8. unittest.doModuleCleanups() This function is called unconditionally after "tearDownModule()", or after "setUpModule()" if "setUpModule()" raises an exception. It is responsible for calling all the cleanup functions added by "addModuleCleanup()". If you need cleanup functions to be called *prior* to "tearDownModule()" then you can call "doModuleCleanups()" yourself. "doModuleCleanups()" pops methods off the stack of cleanup functions one at a time, so it can be called at any time. New in version 3.8. Signal Handling =============== New in version 3.2. The "-c/--catch" command-line option to unittest, along with the "catchbreak" parameter to "unittest.main()", provide more friendly handling of control-C during a test run. With catch break behavior enabled control-C will allow the currently running test to complete, and the test run will then end and report all the results so far. A second control-c will raise a "KeyboardInterrupt" in the usual way. The control-c handling signal handler attempts to remain compatible with code or tests that install their own "signal.SIGINT" handler. If the "unittest" handler is called but *isn’t* the installed "signal.SIGINT" handler, i.e. it has been replaced by the system under test and delegated to, then it calls the default handler. This will normally be the expected behavior by code that replaces an installed handler and delegates to it. For individual tests that need "unittest" control-c handling disabled the "removeHandler()" decorator can be used. There are a few utility functions for framework authors to enable control-c handling functionality within test frameworks. unittest.installHandler() Install the control-c handler. When a "signal.SIGINT" is received (usually in response to the user pressing control-c) all registered results have "stop()" called. unittest.registerResult(result) Register a "TestResult" object for control-c handling. Registering a result stores a weak reference to it, so it doesn’t prevent the result from being garbage collected. Registering a "TestResult" object has no side-effects if control-c handling is not enabled, so test frameworks can unconditionally register all results they create independently of whether or not handling is enabled. unittest.removeResult(result) Remove a registered result. Once a result has been removed then "stop()" will no longer be called on that result object in response to a control-c. unittest.removeHandler(function=None) When called without arguments this function removes the control-c handler if it has been installed. This function can also be used as a test decorator to temporarily remove the handler while the test is being executed: @unittest.removeHandler def test_signal_handling(self): ... Unix Specific Services ********************** The modules described in this chapter provide interfaces to features that are unique to the Unix operating system, or in some cases to some or many variants of it. Here’s an overview: * "posix" — The most common POSIX system calls * Large File Support * Notable Module Contents * "pwd" — The password database * "grp" — The group database * "termios" — POSIX style tty control * Example * "tty" — Terminal control functions * "pty" — Pseudo-terminal utilities * Example * "fcntl" — The "fcntl" and "ioctl" system calls * "resource" — Resource usage information * Resource Limits * Resource Usage * "syslog" — Unix syslog library routines * Examples * Simple example "urllib.error" — Exception classes raised by urllib.request *********************************************************** **Source code:** Lib/urllib/error.py ====================================================================== The "urllib.error" module defines the exception classes for exceptions raised by "urllib.request". The base exception class is "URLError". The following exceptions are raised by "urllib.error" as appropriate: exception urllib.error.URLError The handlers raise this exception (or derived exceptions) when they run into a problem. It is a subclass of "OSError". reason The reason for this error. It can be a message string or another exception instance. Changed in version 3.3: "URLError" has been made a subclass of "OSError" instead of "IOError". exception urllib.error.HTTPError Though being an exception (a subclass of "URLError"), an "HTTPError" can also function as a non-exceptional file-like return value (the same thing that "urlopen()" returns). This is useful when handling exotic HTTP errors, such as requests for authentication. code An HTTP status code as defined in **RFC 2616**. This numeric value corresponds to a value found in the dictionary of codes as found in "http.server.BaseHTTPRequestHandler.responses". reason This is usually a string explaining the reason for this error. headers The HTTP response headers for the HTTP request that caused the "HTTPError". New in version 3.4. exception urllib.error.ContentTooShortError(msg, content) This exception is raised when the "urlretrieve()" function detects that the amount of the downloaded data is less than the expected amount (given by the *Content-Length* header). The "content" attribute stores the downloaded (and supposedly truncated) data. "urllib.parse" — Parse URLs into components ******************************************* **Source code:** Lib/urllib/parse.py ====================================================================== This module defines a standard interface to break Uniform Resource Locator (URL) strings up in components (addressing scheme, network location, path etc.), to combine the components back into a URL string, and to convert a “relative URL” to an absolute URL given a “base URL.” The module has been designed to match the Internet RFC on Relative Uniform Resource Locators. It supports the following URL schemes: "file", "ftp", "gopher", "hdl", "http", "https", "imap", "mailto", "mms", "news", "nntp", "prospero", "rsync", "rtsp", "rtspu", "sftp", "shttp", "sip", "sips", "snews", "svn", "svn+ssh", "telnet", "wais", "ws", "wss". The "urllib.parse" module defines functions that fall into two broad categories: URL parsing and URL quoting. These are covered in detail in the following sections. URL Parsing =========== The URL parsing functions focus on splitting a URL string into its components, or on combining URL components into a URL string. urllib.parse.urlparse(urlstring, scheme='', allow_fragments=True) Parse a URL into six components, returning a 6-item *named tuple*. This corresponds to the general structure of a URL: "scheme://netloc/path;parameters?query#fragment". Each tuple item is a string, possibly empty. The components are not broken up into smaller parts (for example, the network location is a single string), and % escapes are not expanded. The delimiters as shown above are not part of the result, except for a leading slash in the *path* component, which is retained if present. For example: >>> from urllib.parse import urlparse >>> urlparse("scheme://netloc/path;parameters?query#fragment") ParseResult(scheme='scheme', netloc='netloc', path='/path;parameters', params='', query='query', fragment='fragment') >>> o = urlparse("http://docs.python.org:80/3/library/urllib.parse.html?" ... "highlight=params#url-parsing") >>> o ParseResult(scheme='http', netloc='docs.python.org:80', path='/3/library/urllib.parse.html', params='', query='highlight=params', fragment='url-parsing') >>> o.scheme 'http' >>> o.netloc 'docs.python.org:80' >>> o.hostname 'docs.python.org' >>> o.port 80 >>> o._replace(fragment="").geturl() 'http://docs.python.org:80/3/library/urllib.parse.html?highlight=params' Following the syntax specifications in **RFC 1808**, urlparse recognizes a netloc only if it is properly introduced by ‘//’. Otherwise the input is presumed to be a relative URL and thus to start with a path component. >>> from urllib.parse import urlparse >>> urlparse('//www.cwi.nl:80/%7Eguido/Python.html') ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html', params='', query='', fragment='') >>> urlparse('www.cwi.nl/%7Eguido/Python.html') ParseResult(scheme='', netloc='', path='www.cwi.nl/%7Eguido/Python.html', params='', query='', fragment='') >>> urlparse('help/Python.html') ParseResult(scheme='', netloc='', path='help/Python.html', params='', query='', fragment='') The *scheme* argument gives the default addressing scheme, to be used only if the URL does not specify one. It should be the same type (text or bytes) as *urlstring*, except that the default value "''" is always allowed, and is automatically converted to "b''" if appropriate. If the *allow_fragments* argument is false, fragment identifiers are not recognized. Instead, they are parsed as part of the path, parameters or query component, and "fragment" is set to the empty string in the return value. The return value is a *named tuple*, which means that its items can be accessed by index or as named attributes, which are: +--------------------+---------+---------------------------+--------------------------+ | Attribute | Index | Value | Value if not present | |====================|=========|===========================|==========================| | "scheme" | 0 | URL scheme specifier | *scheme* parameter | +--------------------+---------+---------------------------+--------------------------+ | "netloc" | 1 | Network location part | empty string | +--------------------+---------+---------------------------+--------------------------+ | "path" | 2 | Hierarchical path | empty string | +--------------------+---------+---------------------------+--------------------------+ | "params" | 3 | Parameters for last path | empty string | | | | element | | +--------------------+---------+---------------------------+--------------------------+ | "query" | 4 | Query component | empty string | +--------------------+---------+---------------------------+--------------------------+ | "fragment" | 5 | Fragment identifier | empty string | +--------------------+---------+---------------------------+--------------------------+ | "username" | | User name | "None" | +--------------------+---------+---------------------------+--------------------------+ | "password" | | Password | "None" | +--------------------+---------+---------------------------+--------------------------+ | "hostname" | | Host name (lower case) | "None" | +--------------------+---------+---------------------------+--------------------------+ | "port" | | Port number as integer, | "None" | | | | if present | | +--------------------+---------+---------------------------+--------------------------+ Reading the "port" attribute will raise a "ValueError" if an invalid port is specified in the URL. See section Structured Parse Results for more information on the result object. Unmatched square brackets in the "netloc" attribute will raise a "ValueError". Characters in the "netloc" attribute that decompose under NFKC normalization (as used by the IDNA encoding) into any of "/", "?", "#", "@", or ":" will raise a "ValueError". If the URL is decomposed before parsing, no error will be raised. As is the case with all named tuples, the subclass has a few additional methods and attributes that are particularly useful. One such method is "_replace()". The "_replace()" method will return a new ParseResult object replacing specified fields with new values. >>> from urllib.parse import urlparse >>> u = urlparse('//www.cwi.nl:80/%7Eguido/Python.html') >>> u ParseResult(scheme='', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html', params='', query='', fragment='') >>> u._replace(scheme='http') ParseResult(scheme='http', netloc='www.cwi.nl:80', path='/%7Eguido/Python.html', params='', query='', fragment='') Warning: "urlparse()" does not perform validation. See URL parsing security for details. Changed in version 3.2: Added IPv6 URL parsing capabilities. Changed in version 3.3: The fragment is now parsed for all URL schemes (unless *allow_fragment* is false), in accordance with **RFC 3986**. Previously, a whitelist of schemes that support fragments existed. Changed in version 3.6: Out-of-range port numbers now raise "ValueError", instead of returning "None". Changed in version 3.8: Characters that affect netloc parsing under NFKC normalization will now raise "ValueError". urllib.parse.parse_qs(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&') Parse a query string given as a string argument (data of type *application/x-www-form-urlencoded*). Data are returned as a dictionary. The dictionary keys are the unique query variable names and the values are lists of values for each name. The optional argument *keep_blank_values* is a flag indicating whether blank values in percent-encoded queries should be treated as blank strings. A true value indicates that blanks should be retained as blank strings. The default false value indicates that blank values are to be ignored and treated as if they were not included. The optional argument *strict_parsing* is a flag indicating what to do with parsing errors. If false (the default), errors are silently ignored. If true, errors raise a "ValueError" exception. The optional *encoding* and *errors* parameters specify how to decode percent-encoded sequences into Unicode characters, as accepted by the "bytes.decode()" method. The optional argument *max_num_fields* is the maximum number of fields to read. If set, then throws a "ValueError" if there are more than *max_num_fields* fields read. The optional argument *separator* is the symbol to use for separating the query arguments. It defaults to "&". Use the "urllib.parse.urlencode()" function (with the "doseq" parameter set to "True") to convert such dictionaries into query strings. Changed in version 3.2: Add *encoding* and *errors* parameters. Changed in version 3.8: Added *max_num_fields* parameter. Changed in version 3.9.2: Added *separator* parameter with the default value of "&". Python versions earlier than Python 3.9.2 allowed using both ";" and "&" as query parameter separator. This has been changed to allow only a single separator key, with "&" as the default separator. urllib.parse.parse_qsl(qs, keep_blank_values=False, strict_parsing=False, encoding='utf-8', errors='replace', max_num_fields=None, separator='&') Parse a query string given as a string argument (data of type *application/x-www-form-urlencoded*). Data are returned as a list of name, value pairs. The optional argument *keep_blank_values* is a flag indicating whether blank values in percent-encoded queries should be treated as blank strings. A true value indicates that blanks should be retained as blank strings. The default false value indicates that blank values are to be ignored and treated as if they were not included. The optional argument *strict_parsing* is a flag indicating what to do with parsing errors. If false (the default), errors are silently ignored. If true, errors raise a "ValueError" exception. The optional *encoding* and *errors* parameters specify how to decode percent-encoded sequences into Unicode characters, as accepted by the "bytes.decode()" method. The optional argument *max_num_fields* is the maximum number of fields to read. If set, then throws a "ValueError" if there are more than *max_num_fields* fields read. The optional argument *separator* is the symbol to use for separating the query arguments. It defaults to "&". Use the "urllib.parse.urlencode()" function to convert such lists of pairs into query strings. Changed in version 3.2: Add *encoding* and *errors* parameters. Changed in version 3.8: Added *max_num_fields* parameter. Changed in version 3.9.2: Added *separator* parameter with the default value of "&". Python versions earlier than Python 3.9.2 allowed using both ";" and "&" as query parameter separator. This has been changed to allow only a single separator key, with "&" as the default separator. urllib.parse.urlunparse(parts) Construct a URL from a tuple as returned by "urlparse()". The *parts* argument can be any six-item iterable. This may result in a slightly different, but equivalent URL, if the URL that was parsed originally had unnecessary delimiters (for example, a "?" with an empty query; the RFC states that these are equivalent). urllib.parse.urlsplit(urlstring, scheme='', allow_fragments=True) This is similar to "urlparse()", but does not split the params from the URL. This should generally be used instead of "urlparse()" if the more recent URL syntax allowing parameters to be applied to each segment of the *path* portion of the URL (see **RFC 2396**) is wanted. A separate function is needed to separate the path segments and parameters. This function returns a 5-item *named tuple*: (addressing scheme, network location, path, query, fragment identifier). The return value is a *named tuple*, its items can be accessed by index or as named attributes: +--------------------+---------+---------------------------+------------------------+ | Attribute | Index | Value | Value if not present | |====================|=========|===========================|========================| | "scheme" | 0 | URL scheme specifier | *scheme* parameter | +--------------------+---------+---------------------------+------------------------+ | "netloc" | 1 | Network location part | empty string | +--------------------+---------+---------------------------+------------------------+ | "path" | 2 | Hierarchical path | empty string | +--------------------+---------+---------------------------+------------------------+ | "query" | 3 | Query component | empty string | +--------------------+---------+---------------------------+------------------------+ | "fragment" | 4 | Fragment identifier | empty string | +--------------------+---------+---------------------------+------------------------+ | "username" | | User name | "None" | +--------------------+---------+---------------------------+------------------------+ | "password" | | Password | "None" | +--------------------+---------+---------------------------+------------------------+ | "hostname" | | Host name (lower case) | "None" | +--------------------+---------+---------------------------+------------------------+ | "port" | | Port number as integer, | "None" | | | | if present | | +--------------------+---------+---------------------------+------------------------+ Reading the "port" attribute will raise a "ValueError" if an invalid port is specified in the URL. See section Structured Parse Results for more information on the result object. Unmatched square brackets in the "netloc" attribute will raise a "ValueError". Characters in the "netloc" attribute that decompose under NFKC normalization (as used by the IDNA encoding) into any of "/", "?", "#", "@", or ":" will raise a "ValueError". If the URL is decomposed before parsing, no error will be raised. Following some of the WHATWG spec that updates RFC 3986, leading C0 control and space characters are stripped from the URL. "\n", "\r" and tab "\t" characters are removed from the URL at any position. Warning: "urlsplit()" does not perform validation. See URL parsing security for details. Changed in version 3.6: Out-of-range port numbers now raise "ValueError", instead of returning "None". Changed in version 3.8: Characters that affect netloc parsing under NFKC normalization will now raise "ValueError". Changed in version 3.9.5: ASCII newline and tab characters are stripped from the URL. Changed in version 3.9.17: Leading WHATWG C0 control and space characters are stripped from the URL. urllib.parse.urlunsplit(parts) Combine the elements of a tuple as returned by "urlsplit()" into a complete URL as a string. The *parts* argument can be any five-item iterable. This may result in a slightly different, but equivalent URL, if the URL that was parsed originally had unnecessary delimiters (for example, a ? with an empty query; the RFC states that these are equivalent). urllib.parse.urljoin(base, url, allow_fragments=True) Construct a full (“absolute”) URL by combining a “base URL” (*base*) with another URL (*url*). Informally, this uses components of the base URL, in particular the addressing scheme, the network location and (part of) the path, to provide missing components in the relative URL. For example: >>> from urllib.parse import urljoin >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', 'FAQ.html') 'http://www.cwi.nl/%7Eguido/FAQ.html' The *allow_fragments* argument has the same meaning and default as for "urlparse()". Note: If *url* is an absolute URL (that is, it starts with "//" or "scheme://"), the *url*’s hostname and/or scheme will be present in the result. For example: >>> urljoin('http://www.cwi.nl/%7Eguido/Python.html', ... '//www.python.org/%7Eguido') 'http://www.python.org/%7Eguido' If you do not want that behavior, preprocess the *url* with "urlsplit()" and "urlunsplit()", removing possible *scheme* and *netloc* parts. Changed in version 3.5: Behavior updated to match the semantics defined in **RFC 3986**. urllib.parse.urldefrag(url) If *url* contains a fragment identifier, return a modified version of *url* with no fragment identifier, and the fragment identifier as a separate string. If there is no fragment identifier in *url*, return *url* unmodified and an empty string. The return value is a *named tuple*, its items can be accessed by index or as named attributes: +--------------------+---------+---------------------------+------------------------+ | Attribute | Index | Value | Value if not present | |====================|=========|===========================|========================| | "url" | 0 | URL with no fragment | empty string | +--------------------+---------+---------------------------+------------------------+ | "fragment" | 1 | Fragment identifier | empty string | +--------------------+---------+---------------------------+------------------------+ See section Structured Parse Results for more information on the result object. Changed in version 3.2: Result is a structured object rather than a simple 2-tuple. urllib.parse.unwrap(url) Extract the url from a wrapped URL (that is, a string formatted as "", "", "URL:scheme://host/path" or "scheme://host/path"). If *url* is not a wrapped URL, it is returned without changes. URL parsing security ==================== The "urlsplit()" and "urlparse()" APIs do not perform **validation** of inputs. They may not raise errors on inputs that other applications consider invalid. They may also succeed on some inputs that might not be considered URLs elsewhere. Their purpose is for practical functionality rather than purity. Instead of raising an exception on unusual input, they may instead return some component parts as empty strings. Or components may contain more than perhaps they should. We recommend that users of these APIs where the values may be used anywhere with security implications code defensively. Do some verification within your code before trusting a returned component part. Does that "scheme" make sense? Is that a sensible "path"? Is there anything strange about that "hostname"? etc. Parsing ASCII Encoded Bytes =========================== The URL parsing functions were originally designed to operate on character strings only. In practice, it is useful to be able to manipulate properly quoted and encoded URLs as sequences of ASCII bytes. Accordingly, the URL parsing functions in this module all operate on "bytes" and "bytearray" objects in addition to "str" objects. If "str" data is passed in, the result will also contain only "str" data. If "bytes" or "bytearray" data is passed in, the result will contain only "bytes" data. Attempting to mix "str" data with "bytes" or "bytearray" in a single function call will result in a "TypeError" being raised, while attempting to pass in non-ASCII byte values will trigger "UnicodeDecodeError". To support easier conversion of result objects between "str" and "bytes", all return values from URL parsing functions provide either an "encode()" method (when the result contains "str" data) or a "decode()" method (when the result contains "bytes" data). The signatures of these methods match those of the corresponding "str" and "bytes" methods (except that the default encoding is "'ascii'" rather than "'utf-8'"). Each produces a value of a corresponding type that contains either "bytes" data (for "encode()" methods) or "str" data (for "decode()" methods). Applications that need to operate on potentially improperly quoted URLs that may contain non-ASCII data will need to do their own decoding from bytes to characters before invoking the URL parsing methods. The behaviour described in this section applies only to the URL parsing functions. The URL quoting functions use their own rules when producing or consuming byte sequences as detailed in the documentation of the individual URL quoting functions. Changed in version 3.2: URL parsing functions now accept ASCII encoded byte sequences Structured Parse Results ======================== The result objects from the "urlparse()", "urlsplit()" and "urldefrag()" functions are subclasses of the "tuple" type. These subclasses add the attributes listed in the documentation for those functions, the encoding and decoding support described in the previous section, as well as an additional method: urllib.parse.SplitResult.geturl() Return the re-combined version of the original URL as a string. This may differ from the original URL in that the scheme may be normalized to lower case and empty components may be dropped. Specifically, empty parameters, queries, and fragment identifiers will be removed. For "urldefrag()" results, only empty fragment identifiers will be removed. For "urlsplit()" and "urlparse()" results, all noted changes will be made to the URL returned by this method. The result of this method remains unchanged if passed back through the original parsing function: >>> from urllib.parse import urlsplit >>> url = 'HTTP://www.Python.org/doc/#' >>> r1 = urlsplit(url) >>> r1.geturl() 'http://www.Python.org/doc/' >>> r2 = urlsplit(r1.geturl()) >>> r2.geturl() 'http://www.Python.org/doc/' The following classes provide the implementations of the structured parse results when operating on "str" objects: class urllib.parse.DefragResult(url, fragment) Concrete class for "urldefrag()" results containing "str" data. The "encode()" method returns a "DefragResultBytes" instance. New in version 3.2. class urllib.parse.ParseResult(scheme, netloc, path, params, query, fragment) Concrete class for "urlparse()" results containing "str" data. The "encode()" method returns a "ParseResultBytes" instance. class urllib.parse.SplitResult(scheme, netloc, path, query, fragment) Concrete class for "urlsplit()" results containing "str" data. The "encode()" method returns a "SplitResultBytes" instance. The following classes provide the implementations of the parse results when operating on "bytes" or "bytearray" objects: class urllib.parse.DefragResultBytes(url, fragment) Concrete class for "urldefrag()" results containing "bytes" data. The "decode()" method returns a "DefragResult" instance. New in version 3.2. class urllib.parse.ParseResultBytes(scheme, netloc, path, params, query, fragment) Concrete class for "urlparse()" results containing "bytes" data. The "decode()" method returns a "ParseResult" instance. New in version 3.2. class urllib.parse.SplitResultBytes(scheme, netloc, path, query, fragment) Concrete class for "urlsplit()" results containing "bytes" data. The "decode()" method returns a "SplitResult" instance. New in version 3.2. URL Quoting =========== The URL quoting functions focus on taking program data and making it safe for use as URL components by quoting special characters and appropriately encoding non-ASCII text. They also support reversing these operations to recreate the original data from the contents of a URL component if that task isn’t already covered by the URL parsing functions above. urllib.parse.quote(string, safe='/', encoding=None, errors=None) Replace special characters in *string* using the "%xx" escape. Letters, digits, and the characters "'_.-~'" are never quoted. By default, this function is intended for quoting the path section of a URL. The optional *safe* parameter specifies additional ASCII characters that should not be quoted — its default value is "'/'". *string* may be either a "str" or a "bytes" object. Changed in version 3.7: Moved from **RFC 2396** to **RFC 3986** for quoting URL strings. “~” is now included in the set of unreserved characters. The optional *encoding* and *errors* parameters specify how to deal with non-ASCII characters, as accepted by the "str.encode()" method. *encoding* defaults to "'utf-8'". *errors* defaults to "'strict'", meaning unsupported characters raise a "UnicodeEncodeError". *encoding* and *errors* must not be supplied if *string* is a "bytes", or a "TypeError" is raised. Note that "quote(string, safe, encoding, errors)" is equivalent to "quote_from_bytes(string.encode(encoding, errors), safe)". Example: "quote('/El Niño/')" yields "'/El%20Ni%C3%B1o/'". urllib.parse.quote_plus(string, safe='', encoding=None, errors=None) Like "quote()", but also replace spaces with plus signs, as required for quoting HTML form values when building up a query string to go into a URL. Plus signs in the original string are escaped unless they are included in *safe*. It also does not have *safe* default to "'/'". Example: "quote_plus('/El Niño/')" yields "'%2FEl+Ni%C3%B1o%2F'". urllib.parse.quote_from_bytes(bytes, safe='/') Like "quote()", but accepts a "bytes" object rather than a "str", and does not perform string-to-bytes encoding. Example: "quote_from_bytes(b'a&\xef')" yields "'a%26%EF'". urllib.parse.unquote(string, encoding='utf-8', errors='replace') Replace "%xx" escapes with their single-character equivalent. The optional *encoding* and *errors* parameters specify how to decode percent-encoded sequences into Unicode characters, as accepted by the "bytes.decode()" method. *string* may be either a "str" or a "bytes" object. *encoding* defaults to "'utf-8'". *errors* defaults to "'replace'", meaning invalid sequences are replaced by a placeholder character. Example: "unquote('/El%20Ni%C3%B1o/')" yields "'/El Niño/'". Changed in version 3.9: *string* parameter supports bytes and str objects (previously only str). urllib.parse.unquote_plus(string, encoding='utf-8', errors='replace') Like "unquote()", but also replace plus signs with spaces, as required for unquoting HTML form values. *string* must be a "str". Example: "unquote_plus('/El+Ni%C3%B1o/')" yields "'/El Niño/'". urllib.parse.unquote_to_bytes(string) Replace "%xx" escapes with their single-octet equivalent, and return a "bytes" object. *string* may be either a "str" or a "bytes" object. If it is a "str", unescaped non-ASCII characters in *string* are encoded into UTF-8 bytes. Example: "unquote_to_bytes('a%26%EF')" yields "b'a&\xef'". urllib.parse.urlencode(query, doseq=False, safe='', encoding=None, errors=None, quote_via=quote_plus) Convert a mapping object or a sequence of two-element tuples, which may contain "str" or "bytes" objects, to a percent-encoded ASCII text string. If the resultant string is to be used as a *data* for POST operation with the "urlopen()" function, then it should be encoded to bytes, otherwise it would result in a "TypeError". The resulting string is a series of "key=value" pairs separated by "'&'" characters, where both *key* and *value* are quoted using the *quote_via* function. By default, "quote_plus()" is used to quote the values, which means spaces are quoted as a "'+'" character and ‘/’ characters are encoded as "%2F", which follows the standard for GET requests ("application/x-www-form-urlencoded"). An alternate function that can be passed as *quote_via* is "quote()", which will encode spaces as "%20" and not encode ‘/’ characters. For maximum control of what is quoted, use "quote" and specify a value for *safe*. When a sequence of two-element tuples is used as the *query* argument, the first element of each tuple is a key and the second is a value. The value element in itself can be a sequence and in that case, if the optional parameter *doseq* evaluates to "True", individual "key=value" pairs separated by "'&'" are generated for each element of the value sequence for the key. The order of parameters in the encoded string will match the order of parameter tuples in the sequence. The *safe*, *encoding*, and *errors* parameters are passed down to *quote_via* (the *encoding* and *errors* parameters are only passed when a query element is a "str"). To reverse this encoding process, "parse_qs()" and "parse_qsl()" are provided in this module to parse query strings into Python data structures. Refer to urllib examples to find out how the "urllib.parse.urlencode()" method can be used for generating the query string of a URL or data for a POST request. Changed in version 3.2: *query* supports bytes and string objects. New in version 3.5: *quote_via* parameter. See also: WHATWG - URL Living standard Working Group for the URL Standard that defines URLs, domains, IP addresses, the application/x-www-form-urlencoded format, and their API. **RFC 3986** - Uniform Resource Identifiers This is the current standard (STD66). Any changes to urllib.parse module should conform to this. Certain deviations could be observed, which are mostly for backward compatibility purposes and for certain de-facto parsing requirements as commonly observed in major browsers. **RFC 2732** - Format for Literal IPv6 Addresses in URL’s. This specifies the parsing requirements of IPv6 URLs. **RFC 2396** - Uniform Resource Identifiers (URI): Generic Syntax Document describing the generic syntactic requirements for both Uniform Resource Names (URNs) and Uniform Resource Locators (URLs). **RFC 2368** - The mailto URL scheme. Parsing requirements for mailto URL schemes. **RFC 1808** - Relative Uniform Resource Locators This Request For Comments includes the rules for joining an absolute and a relative URL, including a fair number of “Abnormal Examples” which govern the treatment of border cases. **RFC 1738** - Uniform Resource Locators (URL) This specifies the formal syntax and semantics of absolute URLs. "urllib.request" — Extensible library for opening URLs ****************************************************** **Source code:** Lib/urllib/request.py ====================================================================== The "urllib.request" module defines functions and classes which help in opening URLs (mostly HTTP) in a complex world — basic and digest authentication, redirections, cookies and more. See also: The Requests package is recommended for a higher-level HTTP client interface. The "urllib.request" module defines the following functions: urllib.request.urlopen(url, data=None[, timeout], *, cafile=None, capath=None, cadefault=False, context=None) Open the URL *url*, which can be either a string or a "Request" object. *data* must be an object specifying additional data to be sent to the server, or "None" if no such data is needed. See "Request" for details. urllib.request module uses HTTP/1.1 and includes "Connection:close" header in its HTTP requests. The optional *timeout* parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). This actually only works for HTTP, HTTPS and FTP connections. If *context* is specified, it must be a "ssl.SSLContext" instance describing the various SSL options. See "HTTPSConnection" for more details. The optional *cafile* and *capath* parameters specify a set of trusted CA certificates for HTTPS requests. *cafile* should point to a single file containing a bundle of CA certificates, whereas *capath* should point to a directory of hashed certificate files. More information can be found in "ssl.SSLContext.load_verify_locations()". The *cadefault* parameter is ignored. This function always returns an object which can work as a *context manager* and has the properties *url*, *headers*, and *status*. See "urllib.response.addinfourl" for more detail on these properties. For HTTP and HTTPS URLs, this function returns a "http.client.HTTPResponse" object slightly modified. In addition to the three new methods above, the msg attribute contains the same information as the "reason" attribute — the reason phrase returned by server — instead of the response headers as it is specified in the documentation for "HTTPResponse". For FTP, file, and data URLs and requests explicitly handled by legacy "URLopener" and "FancyURLopener" classes, this function returns a "urllib.response.addinfourl" object. Raises "URLError" on protocol errors. Note that "None" may be returned if no handler handles the request (though the default installed global "OpenerDirector" uses "UnknownHandler" to ensure this never happens). In addition, if proxy settings are detected (for example, when a "*_proxy" environment variable like "http_proxy" is set), "ProxyHandler" is default installed and makes sure the requests are handled through the proxy. The legacy "urllib.urlopen" function from Python 2.6 and earlier has been discontinued; "urllib.request.urlopen()" corresponds to the old "urllib2.urlopen". Proxy handling, which was done by passing a dictionary parameter to "urllib.urlopen", can be obtained by using "ProxyHandler" objects. The default opener raises an auditing event "urllib.Request" with arguments "fullurl", "data", "headers", "method" taken from the request object. Changed in version 3.2: *cafile* and *capath* were added. Changed in version 3.2: HTTPS virtual hosts are now supported if possible (that is, if "ssl.HAS_SNI" is true). New in version 3.2: *data* can be an iterable object. Changed in version 3.3: *cadefault* was added. Changed in version 3.4.3: *context* was added. Deprecated since version 3.6: *cafile*, *capath* and *cadefault* are deprecated in favor of *context*. Please use "ssl.SSLContext.load_cert_chain()" instead, or let "ssl.create_default_context()" select the system’s trusted CA certificates for you. urllib.request.install_opener(opener) Install an "OpenerDirector" instance as the default global opener. Installing an opener is only necessary if you want urlopen to use that opener; otherwise, simply call "OpenerDirector.open()" instead of "urlopen()". The code does not check for a real "OpenerDirector", and any class with the appropriate interface will work. urllib.request.build_opener([handler, ...]) Return an "OpenerDirector" instance, which chains the handlers in the order given. *handler*s can be either instances of "BaseHandler", or subclasses of "BaseHandler" (in which case it must be possible to call the constructor without any parameters). Instances of the following classes will be in front of the *handler*s, unless the *handler*s contain them, instances of them or subclasses of them: "ProxyHandler" (if proxy settings are detected), "UnknownHandler", "HTTPHandler", "HTTPDefaultErrorHandler", "HTTPRedirectHandler", "FTPHandler", "FileHandler", "HTTPErrorProcessor". If the Python installation has SSL support (i.e., if the "ssl" module can be imported), "HTTPSHandler" will also be added. A "BaseHandler" subclass may also change its "handler_order" attribute to modify its position in the handlers list. urllib.request.pathname2url(path) Convert the pathname *path* from the local syntax for a path to the form used in the path component of a URL. This does not produce a complete URL. The return value will already be quoted using the "quote()" function. urllib.request.url2pathname(path) Convert the path component *path* from a percent-encoded URL to the local syntax for a path. This does not accept a complete URL. This function uses "unquote()" to decode *path*. urllib.request.getproxies() This helper function returns a dictionary of scheme to proxy server URL mappings. It scans the environment for variables named "_proxy", in a case insensitive approach, for all operating systems first, and when it cannot find it, looks for proxy information from System Configuration for macOS and Windows Systems Registry for Windows. If both lowercase and uppercase environment variables exist (and disagree), lowercase is preferred. Note: If the environment variable "REQUEST_METHOD" is set, which usually indicates your script is running in a CGI environment, the environment variable "HTTP_PROXY" (uppercase "_PROXY") will be ignored. This is because that variable can be injected by a client using the “Proxy:” HTTP header. If you need to use an HTTP proxy in a CGI environment, either use "ProxyHandler" explicitly, or make sure the variable name is in lowercase (or at least the "_proxy" suffix). The following classes are provided: class urllib.request.Request(url, data=None, headers={}, origin_req_host=None, unverifiable=False, method=None) This class is an abstraction of a URL request. *url* should be a string containing a valid URL. *data* must be an object specifying additional data to send to the server, or "None" if no such data is needed. Currently HTTP requests are the only ones that use *data*. The supported object types include bytes, file-like objects, and iterables of bytes-like objects. If no "Content-Length" nor "Transfer-Encoding" header field has been provided, "HTTPHandler" will set these headers according to the type of *data*. "Content-Length" will be used to send bytes objects, while "Transfer-Encoding: chunked" as specified in **RFC 7230**, Section 3.3.1 will be used to send files and other iterables. For an HTTP POST request method, *data* should be a buffer in the standard *application/x-www-form-urlencoded* format. The "urllib.parse.urlencode()" function takes a mapping or sequence of 2-tuples and returns an ASCII string in this format. It should be encoded to bytes before being used as the *data* parameter. *headers* should be a dictionary, and will be treated as if "add_header()" was called with each key and value as arguments. This is often used to “spoof” the "User-Agent" header value, which is used by a browser to identify itself – some HTTP servers only allow requests coming from common browsers as opposed to scripts. For example, Mozilla Firefox may identify itself as ""Mozilla/5.0 (X11; U; Linux i686) Gecko/20071127 Firefox/2.0.0.11"", while "urllib"’s default user agent string is ""Python-urllib/2.6"" (on Python 2.6). All header keys are sent in camel case. An appropriate "Content-Type" header should be included if the *data* argument is present. If this header has not been provided and *data* is not None, "Content-Type: application/x-www-form- urlencoded" will be added as a default. The next two arguments are only of interest for correct handling of third-party HTTP cookies: *origin_req_host* should be the request-host of the origin transaction, as defined by **RFC 2965**. It defaults to "http.cookiejar.request_host(self)". This is the host name or IP address of the original request that was initiated by the user. For example, if the request is for an image in an HTML document, this should be the request-host of the request for the page containing the image. *unverifiable* should indicate whether the request is unverifiable, as defined by **RFC 2965**. It defaults to "False". An unverifiable request is one whose URL the user did not have the option to approve. For example, if the request is for an image in an HTML document, and the user had no option to approve the automatic fetching of the image, this should be true. *method* should be a string that indicates the HTTP request method that will be used (e.g. "'HEAD'"). If provided, its value is stored in the "method" attribute and is used by "get_method()". The default is "'GET'" if *data* is "None" or "'POST'" otherwise. Subclasses may indicate a different default method by setting the "method" attribute in the class itself. Note: The request will not work as expected if the data object is unable to deliver its content more than once (e.g. a file or an iterable that can produce the content only once) and the request is retried for HTTP redirects or authentication. The *data* is sent to the HTTP server right away after the headers. There is no support for a 100-continue expectation in the library. Changed in version 3.3: "Request.method" argument is added to the Request class. Changed in version 3.4: Default "Request.method" may be indicated at the class level. Changed in version 3.6: Do not raise an error if the "Content- Length" has not been provided and *data* is neither "None" nor a bytes object. Fall back to use chunked transfer encoding instead. class urllib.request.OpenerDirector The "OpenerDirector" class opens URLs via "BaseHandler"s chained together. It manages the chaining of handlers, and recovery from errors. class urllib.request.BaseHandler This is the base class for all registered handlers — and handles only the simple mechanics of registration. class urllib.request.HTTPDefaultErrorHandler A class which defines a default handler for HTTP error responses; all responses are turned into "HTTPError" exceptions. class urllib.request.HTTPRedirectHandler A class to handle redirections. class urllib.request.HTTPCookieProcessor(cookiejar=None) A class to handle HTTP Cookies. class urllib.request.ProxyHandler(proxies=None) Cause requests to go through a proxy. If *proxies* is given, it must be a dictionary mapping protocol names to URLs of proxies. The default is to read the list of proxies from the environment variables "_proxy". If no proxy environment variables are set, then in a Windows environment proxy settings are obtained from the registry’s Internet Settings section, and in a macOS environment proxy information is retrieved from the System Configuration Framework. To disable autodetected proxy pass an empty dictionary. The "no_proxy" environment variable can be used to specify hosts which shouldn’t be reached via proxy; if set, it should be a comma- separated list of hostname suffixes, optionally with ":port" appended, for example "cern.ch,ncsa.uiuc.edu,some.host:8080". Note: "HTTP_PROXY" will be ignored if a variable "REQUEST_METHOD" is set; see the documentation on "getproxies()". class urllib.request.HTTPPasswordMgr Keep a database of "(realm, uri) -> (user, password)" mappings. class urllib.request.HTTPPasswordMgrWithDefaultRealm Keep a database of "(realm, uri) -> (user, password)" mappings. A realm of "None" is considered a catch-all realm, which is searched if no other realm fits. class urllib.request.HTTPPasswordMgrWithPriorAuth A variant of "HTTPPasswordMgrWithDefaultRealm" that also has a database of "uri -> is_authenticated" mappings. Can be used by a BasicAuth handler to determine when to send authentication credentials immediately instead of waiting for a "401" response first. New in version 3.5. class urllib.request.AbstractBasicAuthHandler(password_mgr=None) This is a mixin class that helps with HTTP authentication, both to the remote host and to a proxy. *password_mgr*, if given, should be something that is compatible with "HTTPPasswordMgr"; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported. If *passwd_mgr* also provides "is_authenticated" and "update_authenticated" methods (see HTTPPasswordMgrWithPriorAuth Objects), then the handler will use the "is_authenticated" result for a given URI to determine whether or not to send authentication credentials with the request. If "is_authenticated" returns "True" for the URI, credentials are sent. If "is_authenticated" is "False", credentials are not sent, and then if a "401" response is received the request is re-sent with the authentication credentials. If authentication succeeds, "update_authenticated" is called to set "is_authenticated" "True" for the URI, so that subsequent requests to the URI or any of its super-URIs will automatically include the authentication credentials. New in version 3.5: Added "is_authenticated" support. class urllib.request.HTTPBasicAuthHandler(password_mgr=None) Handle authentication with the remote host. *password_mgr*, if given, should be something that is compatible with "HTTPPasswordMgr"; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported. HTTPBasicAuthHandler will raise a "ValueError" when presented with a wrong Authentication scheme. class urllib.request.ProxyBasicAuthHandler(password_mgr=None) Handle authentication with the proxy. *password_mgr*, if given, should be something that is compatible with "HTTPPasswordMgr"; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported. class urllib.request.AbstractDigestAuthHandler(password_mgr=None) This is a mixin class that helps with HTTP authentication, both to the remote host and to a proxy. *password_mgr*, if given, should be something that is compatible with "HTTPPasswordMgr"; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported. class urllib.request.HTTPDigestAuthHandler(password_mgr=None) Handle authentication with the remote host. *password_mgr*, if given, should be something that is compatible with "HTTPPasswordMgr"; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported. When both Digest Authentication Handler and Basic Authentication Handler are both added, Digest Authentication is always tried first. If the Digest Authentication returns a 40x response again, it is sent to Basic Authentication handler to Handle. This Handler method will raise a "ValueError" when presented with an authentication scheme other than Digest or Basic. Changed in version 3.3: Raise "ValueError" on unsupported Authentication Scheme. class urllib.request.ProxyDigestAuthHandler(password_mgr=None) Handle authentication with the proxy. *password_mgr*, if given, should be something that is compatible with "HTTPPasswordMgr"; refer to section HTTPPasswordMgr Objects for information on the interface that must be supported. class urllib.request.HTTPHandler A class to handle opening of HTTP URLs. class urllib.request.HTTPSHandler(debuglevel=0, context=None, check_hostname=None) A class to handle opening of HTTPS URLs. *context* and *check_hostname* have the same meaning as in "http.client.HTTPSConnection". Changed in version 3.2: *context* and *check_hostname* were added. class urllib.request.FileHandler Open local files. class urllib.request.DataHandler Open data URLs. New in version 3.4. class urllib.request.FTPHandler Open FTP URLs. class urllib.request.CacheFTPHandler Open FTP URLs, keeping a cache of open FTP connections to minimize delays. class urllib.request.UnknownHandler A catch-all class to handle unknown URLs. class urllib.request.HTTPErrorProcessor Process HTTP error responses. Request Objects =============== The following methods describe "Request"’s public interface, and so all may be overridden in subclasses. It also defines several public attributes that can be used by clients to inspect the parsed request. Request.full_url The original URL passed to the constructor. Changed in version 3.4. Request.full_url is a property with setter, getter and a deleter. Getting "full_url" returns the original request URL with the fragment, if it was present. Request.type The URI scheme. Request.host The URI authority, typically a host, but may also contain a port separated by a colon. Request.origin_req_host The original host for the request, without port. Request.selector The URI path. If the "Request" uses a proxy, then selector will be the full URL that is passed to the proxy. Request.data The entity body for the request, or "None" if not specified. Changed in version 3.4: Changing value of "Request.data" now deletes “Content-Length” header if it was previously set or calculated. Request.unverifiable boolean, indicates whether the request is unverifiable as defined by **RFC 2965**. Request.method The HTTP request method to use. By default its value is "None", which means that "get_method()" will do its normal computation of the method to be used. Its value can be set (thus overriding the default computation in "get_method()") either by providing a default value by setting it at the class level in a "Request" subclass, or by passing a value in to the "Request" constructor via the *method* argument. New in version 3.3. Changed in version 3.4: A default value can now be set in subclasses; previously it could only be set via the constructor argument. Request.get_method() Return a string indicating the HTTP request method. If "Request.method" is not "None", return its value, otherwise return "'GET'" if "Request.data" is "None", or "'POST'" if it’s not. This is only meaningful for HTTP requests. Changed in version 3.3: get_method now looks at the value of "Request.method". Request.add_header(key, val) Add another header to the request. Headers are currently ignored by all handlers except HTTP handlers, where they are added to the list of headers sent to the server. Note that there cannot be more than one header with the same name, and later calls will overwrite previous calls in case the *key* collides. Currently, this is no loss of HTTP functionality, since all headers which have meaning when used more than once have a (header-specific) way of gaining the same functionality using only one header. Note that headers added using this method are also added to redirected requests. Request.add_unredirected_header(key, header) Add a header that will not be added to a redirected request. Request.has_header(header) Return whether the instance has the named header (checks both regular and unredirected). Request.remove_header(header) Remove named header from the request instance (both from regular and unredirected headers). New in version 3.4. Request.get_full_url() Return the URL given in the constructor. Changed in version 3.4. Returns "Request.full_url" Request.set_proxy(host, type) Prepare the request by connecting to a proxy server. The *host* and *type* will replace those of the instance, and the instance’s selector will be the original URL given in the constructor. Request.get_header(header_name, default=None) Return the value of the given header. If the header is not present, return the default value. Request.header_items() Return a list of tuples (header_name, header_value) of the Request headers. Changed in version 3.4: The request methods add_data, has_data, get_data, get_type, get_host, get_selector, get_origin_req_host and is_unverifiable that were deprecated since 3.3 have been removed. OpenerDirector Objects ====================== "OpenerDirector" instances have the following methods: OpenerDirector.add_handler(handler) *handler* should be an instance of "BaseHandler". The following methods are searched, and added to the possible chains (note that HTTP errors are a special case). Note that, in the following, *protocol* should be replaced with the actual protocol to handle, for example "http_response()" would be the HTTP protocol response handler. Also *type* should be replaced with the actual HTTP code, for example "http_error_404()" would handle HTTP 404 errors. * "_open()" — signal that the handler knows how to open *protocol* URLs. See "BaseHandler._open()" for more information. * "http_error_()" — signal that the handler knows how to handle HTTP errors with HTTP error code *type*. See "BaseHandler.http_error_()" for more information. * "_error()" — signal that the handler knows how to handle errors from (non-"http") *protocol*. * "_request()" — signal that the handler knows how to pre-process *protocol* requests. See "BaseHandler._request()" for more information. * "_response()" — signal that the handler knows how to post-process *protocol* responses. See "BaseHandler._response()" for more information. OpenerDirector.open(url, data=None[, timeout]) Open the given *url* (which can be a request object or a string), optionally passing the given *data*. Arguments, return values and exceptions raised are the same as those of "urlopen()" (which simply calls the "open()" method on the currently installed global "OpenerDirector"). The optional *timeout* parameter specifies a timeout in seconds for blocking operations like the connection attempt (if not specified, the global default timeout setting will be used). The timeout feature actually works only for HTTP, HTTPS and FTP connections. OpenerDirector.error(proto, *args) Handle an error of the given protocol. This will call the registered error handlers for the given protocol with the given arguments (which are protocol specific). The HTTP protocol is a special case which uses the HTTP response code to determine the specific error handler; refer to the "http_error_()" methods of the handler classes. Return values and exceptions raised are the same as those of "urlopen()". OpenerDirector objects open URLs in three stages: The order in which these methods are called within each stage is determined by sorting the handler instances. 1. Every handler with a method named like "_request()" has that method called to pre-process the request. 2. Handlers with a method named like "_open()" are called to handle the request. This stage ends when a handler either returns a non-"None" value (ie. a response), or raises an exception (usually "URLError"). Exceptions are allowed to propagate. In fact, the above algorithm is first tried for methods named "default_open()". If all such methods return "None", the algorithm is repeated for methods named like "_open()". If all such methods return "None", the algorithm is repeated for methods named "unknown_open()". Note that the implementation of these methods may involve calls of the parent "OpenerDirector" instance’s "open()" and "error()" methods. 3. Every handler with a method named like "_response()" has that method called to post-process the response. BaseHandler Objects =================== "BaseHandler" objects provide a couple of methods that are directly useful, and others that are meant to be used by derived classes. These are intended for direct use: BaseHandler.add_parent(director) Add a director as parent. BaseHandler.close() Remove any parents. The following attribute and methods should only be used by classes derived from "BaseHandler". Note: The convention has been adopted that subclasses defining "_request()" or "_response()" methods are named "*Processor"; all others are named "*Handler". BaseHandler.parent A valid "OpenerDirector", which can be used to open using a different protocol, or handle errors. BaseHandler.default_open(req) This method is *not* defined in "BaseHandler", but subclasses should define it if they want to catch all URLs. This method, if implemented, will be called by the parent "OpenerDirector". It should return a file-like object as described in the return value of the "open()" method of "OpenerDirector", or "None". It should raise "URLError", unless a truly exceptional thing happens (for example, "MemoryError" should not be mapped to "URLError"). This method will be called before any protocol-specific open method. BaseHandler._open(req) This method is *not* defined in "BaseHandler", but subclasses should define it if they want to handle URLs with the given protocol. This method, if defined, will be called by the parent "OpenerDirector". Return values should be the same as for "default_open()". BaseHandler.unknown_open(req) This method is *not* defined in "BaseHandler", but subclasses should define it if they want to catch all URLs with no specific registered handler to open it. This method, if implemented, will be called by the "parent" "OpenerDirector". Return values should be the same as for "default_open()". BaseHandler.http_error_default(req, fp, code, msg, hdrs) This method is *not* defined in "BaseHandler", but subclasses should override it if they intend to provide a catch-all for otherwise unhandled HTTP errors. It will be called automatically by the "OpenerDirector" getting the error, and should not normally be called in other circumstances. *req* will be a "Request" object, *fp* will be a file-like object with the HTTP error body, *code* will be the three-digit code of the error, *msg* will be the user-visible explanation of the code and *hdrs* will be a mapping object with the headers of the error. Return values and exceptions raised should be the same as those of "urlopen()". BaseHandler.http_error_(req, fp, code, msg, hdrs) *nnn* should be a three-digit HTTP error code. This method is also not defined in "BaseHandler", but will be called, if it exists, on an instance of a subclass, when an HTTP error with code *nnn* occurs. Subclasses should override this method to handle specific HTTP errors. Arguments, return values and exceptions raised should be the same as for "http_error_default()". BaseHandler._request(req) This method is *not* defined in "BaseHandler", but subclasses should define it if they want to pre-process requests of the given protocol. This method, if defined, will be called by the parent "OpenerDirector". *req* will be a "Request" object. The return value should be a "Request" object. BaseHandler._response(req, response) This method is *not* defined in "BaseHandler", but subclasses should define it if they want to post-process responses of the given protocol. This method, if defined, will be called by the parent "OpenerDirector". *req* will be a "Request" object. *response* will be an object implementing the same interface as the return value of "urlopen()". The return value should implement the same interface as the return value of "urlopen()". HTTPRedirectHandler Objects =========================== Note: Some HTTP redirections require action from this module’s client code. If this is the case, "HTTPError" is raised. See **RFC 2616** for details of the precise meanings of the various redirection codes.An "HTTPError" exception raised as a security consideration if the HTTPRedirectHandler is presented with a redirected URL which is not an HTTP, HTTPS or FTP URL. HTTPRedirectHandler.redirect_request(req, fp, code, msg, hdrs, newurl) Return a "Request" or "None" in response to a redirect. This is called by the default implementations of the "http_error_30*()" methods when a redirection is received from the server. If a redirection should take place, return a new "Request" to allow "http_error_30*()" to perform the redirect to *newurl*. Otherwise, raise "HTTPError" if no other handler should try to handle this URL, or return "None" if you can’t but another handler might. Note: The default implementation of this method does not strictly follow **RFC 2616**, which says that 301 and 302 responses to "POST" requests must not be automatically redirected without confirmation by the user. In reality, browsers do allow automatic redirection of these responses, changing the POST to a "GET", and the default implementation reproduces this behavior. HTTPRedirectHandler.http_error_301(req, fp, code, msg, hdrs) Redirect to the "Location:" or "URI:" URL. This method is called by the parent "OpenerDirector" when getting an HTTP ‘moved permanently’ response. HTTPRedirectHandler.http_error_302(req, fp, code, msg, hdrs) The same as "http_error_301()", but called for the ‘found’ response. HTTPRedirectHandler.http_error_303(req, fp, code, msg, hdrs) The same as "http_error_301()", but called for the ‘see other’ response. HTTPRedirectHandler.http_error_307(req, fp, code, msg, hdrs) The same as "http_error_301()", but called for the ‘temporary redirect’ response. HTTPCookieProcessor Objects =========================== "HTTPCookieProcessor" instances have one attribute: HTTPCookieProcessor.cookiejar The "http.cookiejar.CookieJar" in which cookies are stored. ProxyHandler Objects ==================== ProxyHandler._open(request) The "ProxyHandler" will have a method "_open()" for every *protocol* which has a proxy in the *proxies* dictionary given in the constructor. The method will modify requests to go through the proxy, by calling "request.set_proxy()", and call the next handler in the chain to actually execute the protocol. HTTPPasswordMgr Objects ======================= These methods are available on "HTTPPasswordMgr" and "HTTPPasswordMgrWithDefaultRealm" objects. HTTPPasswordMgr.add_password(realm, uri, user, passwd) *uri* can be either a single URI, or a sequence of URIs. *realm*, *user* and *passwd* must be strings. This causes "(user, passwd)" to be used as authentication tokens when authentication for *realm* and a super-URI of any of the given URIs is given. HTTPPasswordMgr.find_user_password(realm, authuri) Get user/password for given realm and URI, if any. This method will return "(None, None)" if there is no matching user/password. For "HTTPPasswordMgrWithDefaultRealm" objects, the realm "None" will be searched if the given *realm* has no matching user/password. HTTPPasswordMgrWithPriorAuth Objects ==================================== This password manager extends "HTTPPasswordMgrWithDefaultRealm" to support tracking URIs for which authentication credentials should always be sent. HTTPPasswordMgrWithPriorAuth.add_password(realm, uri, user, passwd, is_authenticated=False) *realm*, *uri*, *user*, *passwd* are as for "HTTPPasswordMgr.add_password()". *is_authenticated* sets the initial value of the "is_authenticated" flag for the given URI or list of URIs. If *is_authenticated* is specified as "True", *realm* is ignored. HTTPPasswordMgrWithPriorAuth.find_user_password(realm, authuri) Same as for "HTTPPasswordMgrWithDefaultRealm" objects HTTPPasswordMgrWithPriorAuth.update_authenticated(self, uri, is_authenticated=False) Update the "is_authenticated" flag for the given *uri* or list of URIs. HTTPPasswordMgrWithPriorAuth.is_authenticated(self, authuri) Returns the current state of the "is_authenticated" flag for the given URI. AbstractBasicAuthHandler Objects ================================ AbstractBasicAuthHandler.http_error_auth_reqed(authreq, host, req, headers) Handle an authentication request by getting a user/password pair, and re-trying the request. *authreq* should be the name of the header where the information about the realm is included in the request, *host* specifies the URL and path to authenticate for, *req* should be the (failed) "Request" object, and *headers* should be the error headers. *host* is either an authority (e.g. ""python.org"") or a URL containing an authority component (e.g. ""http://python.org/""). In either case, the authority must not contain a userinfo component (so, ""python.org"" and ""python.org:80"" are fine, ""joe:password@python.org"" is not). HTTPBasicAuthHandler Objects ============================ HTTPBasicAuthHandler.http_error_401(req, fp, code, msg, hdrs) Retry the request with authentication information, if available. ProxyBasicAuthHandler Objects ============================= ProxyBasicAuthHandler.http_error_407(req, fp, code, msg, hdrs) Retry the request with authentication information, if available. AbstractDigestAuthHandler Objects ================================= AbstractDigestAuthHandler.http_error_auth_reqed(authreq, host, req, headers) *authreq* should be the name of the header where the information about the realm is included in the request, *host* should be the host to authenticate to, *req* should be the (failed) "Request" object, and *headers* should be the error headers. HTTPDigestAuthHandler Objects ============================= HTTPDigestAuthHandler.http_error_401(req, fp, code, msg, hdrs) Retry the request with authentication information, if available. ProxyDigestAuthHandler Objects ============================== ProxyDigestAuthHandler.http_error_407(req, fp, code, msg, hdrs) Retry the request with authentication information, if available. HTTPHandler Objects =================== HTTPHandler.http_open(req) Send an HTTP request, which can be either GET or POST, depending on "req.has_data()". HTTPSHandler Objects ==================== HTTPSHandler.https_open(req) Send an HTTPS request, which can be either GET or POST, depending on "req.has_data()". FileHandler Objects =================== FileHandler.file_open(req) Open the file locally, if there is no host name, or the host name is "'localhost'". Changed in version 3.2: This method is applicable only for local hostnames. When a remote hostname is given, an "URLError" is raised. DataHandler Objects =================== DataHandler.data_open(req) Read a data URL. This kind of URL contains the content encoded in the URL itself. The data URL syntax is specified in **RFC 2397**. This implementation ignores white spaces in base64 encoded data URLs so the URL may be wrapped in whatever source file it comes from. But even though some browsers don’t mind about a missing padding at the end of a base64 encoded data URL, this implementation will raise an "ValueError" in that case. FTPHandler Objects ================== FTPHandler.ftp_open(req) Open the FTP file indicated by *req*. The login is always done with empty username and password. CacheFTPHandler Objects ======================= "CacheFTPHandler" objects are "FTPHandler" objects with the following additional methods: CacheFTPHandler.setTimeout(t) Set timeout of connections to *t* seconds. CacheFTPHandler.setMaxConns(m) Set maximum number of cached connections to *m*. UnknownHandler Objects ====================== UnknownHandler.unknown_open() Raise a "URLError" exception. HTTPErrorProcessor Objects ========================== HTTPErrorProcessor.http_response(request, response) Process HTTP error responses. For 200 error codes, the response object is returned immediately. For non-200 error codes, this simply passes the job on to the "http_error_()" handler methods, via "OpenerDirector.error()". Eventually, "HTTPDefaultErrorHandler" will raise an "HTTPError" if no other handler handles the error. HTTPErrorProcessor.https_response(request, response) Process HTTPS error responses. The behavior is same as "http_response()". Examples ======== In addition to the examples below, more examples are given in HOWTO Fetch Internet Resources Using The urllib Package. This example gets the python.org main page and displays the first 300 bytes of it. >>> import urllib.request >>> with urllib.request.urlopen('http://www.python.org/') as f: ... print(f.read(300)) ... b'\n\n\n\n\n\n \n Python Programming ' Note that urlopen returns a bytes object. This is because there is no way for urlopen to automatically determine the encoding of the byte stream it receives from the HTTP server. In general, a program will decode the returned bytes object to string once it determines or guesses the appropriate encoding. The following W3C document, https://www.w3.org/International/O-charset, lists the various ways in which an (X)HTML or an XML document could have specified its encoding information. As the python.org website uses *utf-8* encoding as specified in its meta tag, we will use the same for decoding the bytes object. >>> with urllib.request.urlopen('http://www.python.org/') as f: ... print(f.read(100).decode('utf-8')) ... <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtm It is also possible to achieve the same result without using the *context manager* approach. >>> import urllib.request >>> f = urllib.request.urlopen('http://www.python.org/') >>> print(f.read(100).decode('utf-8')) <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtm In the following example, we are sending a data-stream to the stdin of a CGI and reading the data it returns to us. Note that this example will only work when the Python installation supports SSL. >>> import urllib.request >>> req = urllib.request.Request(url='https://localhost/cgi-bin/test.cgi', ... data=b'This data is passed to stdin of the CGI') >>> with urllib.request.urlopen(req) as f: ... print(f.read().decode('utf-8')) ... Got Data: "This data is passed to stdin of the CGI" The code for the sample CGI used in the above example is: #!/usr/bin/env python import sys data = sys.stdin.read() print('Content-type: text/plain\n\nGot Data: "%s"' % data) Here is an example of doing a "PUT" request using "Request": import urllib.request DATA = b'some data' req = urllib.request.Request(url='http://localhost:8080', data=DATA,method='PUT') with urllib.request.urlopen(req) as f: pass print(f.status) print(f.reason) Use of Basic HTTP Authentication: import urllib.request # Create an OpenerDirector with support for Basic HTTP Authentication... auth_handler = urllib.request.HTTPBasicAuthHandler() auth_handler.add_password(realm='PDQ Application', uri='https://mahler:8092/site-updates.py', user='klem', passwd='kadidd!ehopper') opener = urllib.request.build_opener(auth_handler) # ...and install it globally so it can be used with urlopen. urllib.request.install_opener(opener) urllib.request.urlopen('http://www.example.com/login.html') "build_opener()" provides many handlers by default, including a "ProxyHandler". By default, "ProxyHandler" uses the environment variables named "<scheme>_proxy", where "<scheme>" is the URL scheme involved. For example, the "http_proxy" environment variable is read to obtain the HTTP proxy’s URL. This example replaces the default "ProxyHandler" with one that uses programmatically-supplied proxy URLs, and adds proxy authorization support with "ProxyBasicAuthHandler". proxy_handler = urllib.request.ProxyHandler({'http': 'http://www.example.com:3128/'}) proxy_auth_handler = urllib.request.ProxyBasicAuthHandler() proxy_auth_handler.add_password('realm', 'host', 'username', 'password') opener = urllib.request.build_opener(proxy_handler, proxy_auth_handler) # This time, rather than install the OpenerDirector, we use it directly: opener.open('http://www.example.com/login.html') Adding HTTP headers: Use the *headers* argument to the "Request" constructor, or: import urllib.request req = urllib.request.Request('http://www.example.com/') req.add_header('Referer', 'http://www.python.org/') # Customize the default User-Agent header value: req.add_header('User-Agent', 'urllib-example/0.1 (Contact: . . .)') r = urllib.request.urlopen(req) "OpenerDirector" automatically adds a *User-Agent* header to every "Request". To change this: import urllib.request opener = urllib.request.build_opener() opener.addheaders = [('User-agent', 'Mozilla/5.0')] opener.open('http://www.example.com/') Also, remember that a few standard headers (*Content-Length*, *Content-Type* and *Host*) are added when the "Request" is passed to "urlopen()" (or "OpenerDirector.open()"). Here is an example session that uses the "GET" method to retrieve a URL containing parameters: >>> import urllib.request >>> import urllib.parse >>> params = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0}) >>> url = "http://www.musi-cal.com/cgi-bin/query?%s" % params >>> with urllib.request.urlopen(url) as f: ... print(f.read().decode('utf-8')) ... The following example uses the "POST" method instead. Note that params output from urlencode is encoded to bytes before it is sent to urlopen as data: >>> import urllib.request >>> import urllib.parse >>> data = urllib.parse.urlencode({'spam': 1, 'eggs': 2, 'bacon': 0}) >>> data = data.encode('ascii') >>> with urllib.request.urlopen("http://requestb.in/xrbl82xr", data) as f: ... print(f.read().decode('utf-8')) ... The following example uses an explicitly specified HTTP proxy, overriding environment settings: >>> import urllib.request >>> proxies = {'http': 'http://proxy.example.com:8080/'} >>> opener = urllib.request.FancyURLopener(proxies) >>> with opener.open("http://www.python.org") as f: ... f.read().decode('utf-8') ... The following example uses no proxies at all, overriding environment settings: >>> import urllib.request >>> opener = urllib.request.FancyURLopener({}) >>> with opener.open("http://www.python.org/") as f: ... f.read().decode('utf-8') ... Legacy interface ================ The following functions and classes are ported from the Python 2 module "urllib" (as opposed to "urllib2"). They might become deprecated at some point in the future. urllib.request.urlretrieve(url, filename=None, reporthook=None, data=None) Copy a network object denoted by a URL to a local file. If the URL points to a local file, the object will not be copied unless filename is supplied. Return a tuple "(filename, headers)" where *filename* is the local file name under which the object can be found, and *headers* is whatever the "info()" method of the object returned by "urlopen()" returned (for a remote object). Exceptions are the same as for "urlopen()". The second argument, if present, specifies the file location to copy to (if absent, the location will be a tempfile with a generated name). The third argument, if present, is a callable that will be called once on establishment of the network connection and once after each block read thereafter. The callable will be passed three arguments; a count of blocks transferred so far, a block size in bytes, and the total size of the file. The third argument may be "-1" on older FTP servers which do not return a file size in response to a retrieval request. The following example illustrates the most common usage scenario: >>> import urllib.request >>> local_filename, headers = urllib.request.urlretrieve('http://python.org/') >>> html = open(local_filename) >>> html.close() If the *url* uses the "http:" scheme identifier, the optional *data* argument may be given to specify a "POST" request (normally the request type is "GET"). The *data* argument must be a bytes object in standard *application/x-www-form-urlencoded* format; see the "urllib.parse.urlencode()" function. "urlretrieve()" will raise "ContentTooShortError" when it detects that the amount of data available was less than the expected amount (which is the size reported by a *Content-Length* header). This can occur, for example, when the download is interrupted. The *Content-Length* is treated as a lower bound: if there’s more data to read, urlretrieve reads more data, but if less data is available, it raises the exception. You can still retrieve the downloaded data in this case, it is stored in the "content" attribute of the exception instance. If no *Content-Length* header was supplied, urlretrieve can not check the size of the data it has downloaded, and just returns it. In this case you just have to assume that the download was successful. urllib.request.urlcleanup() Cleans up temporary files that may have been left behind by previous calls to "urlretrieve()". class urllib.request.URLopener(proxies=None, **x509) Deprecated since version 3.3. Base class for opening and reading URLs. Unless you need to support opening objects using schemes other than "http:", "ftp:", or "file:", you probably want to use "FancyURLopener". By default, the "URLopener" class sends a *User-Agent* header of "urllib/VVV", where *VVV* is the "urllib" version number. Applications can define their own *User-Agent* header by subclassing "URLopener" or "FancyURLopener" and setting the class attribute "version" to an appropriate string value in the subclass definition. The optional *proxies* parameter should be a dictionary mapping scheme names to proxy URLs, where an empty dictionary turns proxies off completely. Its default value is "None", in which case environmental proxy settings will be used if present, as discussed in the definition of "urlopen()", above. Additional keyword parameters, collected in *x509*, may be used for authentication of the client when using the "https:" scheme. The keywords *key_file* and *cert_file* are supported to provide an SSL key and certificate; both are needed to support client authentication. "URLopener" objects will raise an "OSError" exception if the server returns an error code. open(fullurl, data=None) Open *fullurl* using the appropriate protocol. This method sets up cache and proxy information, then calls the appropriate open method with its input arguments. If the scheme is not recognized, "open_unknown()" is called. The *data* argument has the same meaning as the *data* argument of "urlopen()". This method always quotes *fullurl* using "quote()". open_unknown(fullurl, data=None) Overridable interface to open unknown URL types. retrieve(url, filename=None, reporthook=None, data=None) Retrieves the contents of *url* and places it in *filename*. The return value is a tuple consisting of a local filename and either an "email.message.Message" object containing the response headers (for remote URLs) or "None" (for local URLs). The caller must then open and read the contents of *filename*. If *filename* is not given and the URL refers to a local file, the input filename is returned. If the URL is non-local and *filename* is not given, the filename is the output of "tempfile.mktemp()" with a suffix that matches the suffix of the last path component of the input URL. If *reporthook* is given, it must be a function accepting three numeric parameters: A chunk number, the maximum size chunks are read in and the total size of the download (-1 if unknown). It will be called once at the start and after each chunk of data is read from the network. *reporthook* is ignored for local URLs. If the *url* uses the "http:" scheme identifier, the optional *data* argument may be given to specify a "POST" request (normally the request type is "GET"). The *data* argument must in standard *application/x-www-form-urlencoded* format; see the "urllib.parse.urlencode()" function. version Variable that specifies the user agent of the opener object. To get "urllib" to tell servers that it is a particular user agent, set this in a subclass as a class variable or in the constructor before calling the base constructor. class urllib.request.FancyURLopener(...) Deprecated since version 3.3. "FancyURLopener" subclasses "URLopener" providing default handling for the following HTTP response codes: 301, 302, 303, 307 and 401. For the 30x response codes listed above, the *Location* header is used to fetch the actual URL. For 401 response codes (authentication required), basic HTTP authentication is performed. For the 30x response codes, recursion is bounded by the value of the *maxtries* attribute, which defaults to 10. For all other response codes, the method "http_error_default()" is called which you can override in subclasses to handle the error appropriately. Note: According to the letter of **RFC 2616**, 301 and 302 responses to POST requests must not be automatically redirected without confirmation by the user. In reality, browsers do allow automatic redirection of these responses, changing the POST to a GET, and "urllib" reproduces this behaviour. The parameters to the constructor are the same as those for "URLopener". Note: When performing basic authentication, a "FancyURLopener" instance calls its "prompt_user_passwd()" method. The default implementation asks the users for the required information on the controlling terminal. A subclass may override this method to support more appropriate behavior if needed. The "FancyURLopener" class offers one additional method that should be overloaded to provide the appropriate behavior: prompt_user_passwd(host, realm) Return information needed to authenticate the user at the given host in the specified security realm. The return value should be a tuple, "(user, password)", which can be used for basic authentication. The implementation prompts for this information on the terminal; an application should override this method to use an appropriate interaction model in the local environment. "urllib.request" Restrictions ============================= * Currently, only the following protocols are supported: HTTP (versions 0.9 and 1.0), FTP, local files, and data URLs. Changed in version 3.4: Added support for data URLs. * The caching feature of "urlretrieve()" has been disabled until someone finds the time to hack proper processing of Expiration time headers. * There should be a function to query whether a particular URL is in the cache. * For backward compatibility, if a URL appears to point to a local file but the file can’t be opened, the URL is re-interpreted using the FTP protocol. This can sometimes cause confusing error messages. * The "urlopen()" and "urlretrieve()" functions can cause arbitrarily long delays while waiting for a network connection to be set up. This means that it is difficult to build an interactive Web client using these functions without using threads. * The data returned by "urlopen()" or "urlretrieve()" is the raw data returned by the server. This may be binary data (such as an image), plain text or (for example) HTML. The HTTP protocol provides type information in the reply header, which can be inspected by looking at the *Content-Type* header. If the returned data is HTML, you can use the module "html.parser" to parse it. * The code handling the FTP protocol cannot differentiate between a file and a directory. This can lead to unexpected behavior when attempting to read a URL that points to a file that is not accessible. If the URL ends in a "/", it is assumed to refer to a directory and will be handled accordingly. But if an attempt to read a file leads to a 550 error (meaning the URL cannot be found or is not accessible, often for permission reasons), then the path is treated as a directory in order to handle the case when a directory is specified by a URL but the trailing "/" has been left off. This can cause misleading results when you try to fetch a file whose read permissions make it inaccessible; the FTP code will try to read it, fail with a 550 error, and then perform a directory listing for the unreadable file. If fine-grained control is needed, consider using the "ftplib" module, subclassing "FancyURLopener", or changing *_urlopener* to meet your needs. "urllib.response" — Response classes used by urllib *************************************************** The "urllib.response" module defines functions and classes which define a minimal file-like interface, including "read()" and "readline()". Functions defined by this module are used internally by the "urllib.request" module. The typical response object is a "urllib.response.addinfourl" instance: class urllib.response.addinfourl url URL of the resource retrieved, commonly used to determine if a redirect was followed. headers Returns the headers of the response in the form of an "EmailMessage" instance. status New in version 3.9. Status code returned by server. geturl() Deprecated since version 3.9: Deprecated in favor of "url". info() Deprecated since version 3.9: Deprecated in favor of "headers". code Deprecated since version 3.9: Deprecated in favor of "status". getstatus() Deprecated since version 3.9: Deprecated in favor of "status". "urllib.robotparser" — Parser for robots.txt ********************************************* **Source code:** Lib/urllib/robotparser.py ====================================================================== This module provides a single class, "RobotFileParser", which answers questions about whether or not a particular user agent can fetch a URL on the Web site that published the "robots.txt" file. For more details on the structure of "robots.txt" files, see http://www.robotstxt.org/orig.html. class urllib.robotparser.RobotFileParser(url='') This class provides methods to read, parse and answer questions about the "robots.txt" file at *url*. set_url(url) Sets the URL referring to a "robots.txt" file. read() Reads the "robots.txt" URL and feeds it to the parser. parse(lines) Parses the lines argument. can_fetch(useragent, url) Returns "True" if the *useragent* is allowed to fetch the *url* according to the rules contained in the parsed "robots.txt" file. mtime() Returns the time the "robots.txt" file was last fetched. This is useful for long-running web spiders that need to check for new "robots.txt" files periodically. modified() Sets the time the "robots.txt" file was last fetched to the current time. crawl_delay(useragent) Returns the value of the "Crawl-delay" parameter from "robots.txt" for the *useragent* in question. If there is no such parameter or it doesn’t apply to the *useragent* specified or the "robots.txt" entry for this parameter has invalid syntax, return "None". New in version 3.6. request_rate(useragent) Returns the contents of the "Request-rate" parameter from "robots.txt" as a *named tuple* "RequestRate(requests, seconds)". If there is no such parameter or it doesn’t apply to the *useragent* specified or the "robots.txt" entry for this parameter has invalid syntax, return "None". New in version 3.6. site_maps() Returns the contents of the "Sitemap" parameter from "robots.txt" in the form of a "list()". If there is no such parameter or the "robots.txt" entry for this parameter has invalid syntax, return "None". New in version 3.8. The following example demonstrates basic use of the "RobotFileParser" class: >>> import urllib.robotparser >>> rp = urllib.robotparser.RobotFileParser() >>> rp.set_url("http://www.musi-cal.com/robots.txt") >>> rp.read() >>> rrate = rp.request_rate("*") >>> rrate.requests 3 >>> rrate.seconds 20 >>> rp.crawl_delay("*") 6 >>> rp.can_fetch("*", "http://www.musi-cal.com/cgi-bin/search?city=San+Francisco") False >>> rp.can_fetch("*", "http://www.musi-cal.com/") True "urllib" — URL handling modules ******************************* **Source code:** Lib/urllib/ ====================================================================== "urllib" is a package that collects several modules for working with URLs: * "urllib.request" for opening and reading URLs * "urllib.error" containing the exceptions raised by "urllib.request" * "urllib.parse" for parsing URLs * "urllib.robotparser" for parsing "robots.txt" files "uu" — Encode and decode uuencode files *************************************** **Source code:** Lib/uu.py Deprecated since version 3.11: The "uu" module is deprecated (see **PEP 594** for details). "base64" is a modern alternative. ====================================================================== This module encodes and decodes files in uuencode format, allowing arbitrary binary data to be transferred over ASCII-only connections. Wherever a file argument is expected, the methods accept a file-like object. For backwards compatibility, a string containing a pathname is also accepted, and the corresponding file will be opened for reading and writing; the pathname "'-'" is understood to mean the standard input or output. However, this interface is deprecated; it’s better for the caller to open the file itself, and be sure that, when required, the mode is "'rb'" or "'wb'" on Windows. This code was contributed by Lance Ellinghouse, and modified by Jack Jansen. The "uu" module defines the following functions: uu.encode(in_file, out_file, name=None, mode=None, *, backtick=False) Uuencode file *in_file* into file *out_file*. The uuencoded file will have the header specifying *name* and *mode* as the defaults for the results of decoding the file. The default defaults are taken from *in_file*, or "'-'" and "0o666" respectively. If *backtick* is true, zeros are represented by "'`'" instead of spaces. Changed in version 3.7: Added the *backtick* parameter. uu.decode(in_file, out_file=None, mode=None, quiet=False) This call decodes uuencoded file *in_file* placing the result on file *out_file*. If *out_file* is a pathname, *mode* is used to set the permission bits if the file must be created. Defaults for *out_file* and *mode* are taken from the uuencode header. However, if the file specified in the header already exists, a "uu.Error" is raised. "decode()" may print a warning to standard error if the input was produced by an incorrect uuencoder and Python could recover from that error. Setting *quiet* to a true value silences this warning. exception uu.Error Subclass of "Exception", this can be raised by "uu.decode()" under various situations, such as described above, but also including a badly formatted header, or truncated input file. See also: Module "binascii" Support module containing ASCII-to-binary and binary-to-ASCII conversions. "uuid" — UUID objects according to **RFC 4122** *********************************************** **Source code:** Lib/uuid.py ====================================================================== This module provides immutable "UUID" objects (the "UUID" class) and the functions "uuid1()", "uuid3()", "uuid4()", "uuid5()" for generating version 1, 3, 4, and 5 UUIDs as specified in **RFC 4122**. If all you want is a unique ID, you should probably call "uuid1()" or "uuid4()". Note that "uuid1()" may compromise privacy since it creates a UUID containing the computer’s network address. "uuid4()" creates a random UUID. Depending on support from the underlying platform, "uuid1()" may or may not return a “safe” UUID. A safe UUID is one which is generated using synchronization methods that ensure no two processes can obtain the same UUID. All instances of "UUID" have an "is_safe" attribute which relays any information about the UUID’s safety, using this enumeration: class uuid.SafeUUID New in version 3.7. safe The UUID was generated by the platform in a multiprocessing-safe way. unsafe The UUID was not generated in a multiprocessing-safe way. unknown The platform does not provide information on whether the UUID was generated safely or not. class uuid.UUID(hex=None, bytes=None, bytes_le=None, fields=None, int=None, version=None, *, is_safe=SafeUUID.unknown) Create a UUID from either a string of 32 hexadecimal digits, a string of 16 bytes in big-endian order as the *bytes* argument, a string of 16 bytes in little-endian order as the *bytes_le* argument, a tuple of six integers (32-bit *time_low*, 16-bit *time_mid*, 16-bit *time_hi_version*, 8-bit *clock_seq_hi_variant*, 8-bit *clock_seq_low*, 48-bit *node*) as the *fields* argument, or a single 128-bit integer as the *int* argument. When a string of hex digits is given, curly braces, hyphens, and a URN prefix are all optional. For example, these expressions all yield the same UUID: UUID('{12345678-1234-5678-1234-567812345678}') UUID('12345678123456781234567812345678') UUID('urn:uuid:12345678-1234-5678-1234-567812345678') UUID(bytes=b'\x12\x34\x56\x78'*4) UUID(bytes_le=b'\x78\x56\x34\x12\x34\x12\x78\x56' + b'\x12\x34\x56\x78\x12\x34\x56\x78') UUID(fields=(0x12345678, 0x1234, 0x5678, 0x12, 0x34, 0x567812345678)) UUID(int=0x12345678123456781234567812345678) Exactly one of *hex*, *bytes*, *bytes_le*, *fields*, or *int* must be given. The *version* argument is optional; if given, the resulting UUID will have its variant and version number set according to **RFC 4122**, overriding bits in the given *hex*, *bytes*, *bytes_le*, *fields*, or *int*. Comparison of UUID objects are made by way of comparing their "UUID.int" attributes. Comparison with a non-UUID object raises a "TypeError". "str(uuid)" returns a string in the form "12345678-1234-5678-1234-567812345678" where the 32 hexadecimal digits represent the UUID. "UUID" instances have these read-only attributes: UUID.bytes The UUID as a 16-byte string (containing the six integer fields in big-endian byte order). UUID.bytes_le The UUID as a 16-byte string (with *time_low*, *time_mid*, and *time_hi_version* in little-endian byte order). UUID.fields A tuple of the six integer fields of the UUID, which are also available as six individual attributes and two derived attributes: +--------------------------------+---------------------------------+ | Field | Meaning | |================================|=================================| | "time_low" | the first 32 bits of the UUID | +--------------------------------+---------------------------------+ | "time_mid" | the next 16 bits of the UUID | +--------------------------------+---------------------------------+ | "time_hi_version" | the next 16 bits of the UUID | +--------------------------------+---------------------------------+ | "clock_seq_hi_variant" | the next 8 bits of the UUID | +--------------------------------+---------------------------------+ | "clock_seq_low" | the next 8 bits of the UUID | +--------------------------------+---------------------------------+ | "node" | the last 48 bits of the UUID | +--------------------------------+---------------------------------+ | "time" | the 60-bit timestamp | +--------------------------------+---------------------------------+ | "clock_seq" | the 14-bit sequence number | +--------------------------------+---------------------------------+ UUID.hex The UUID as a 32-character lowercase hexadecimal string. UUID.int The UUID as a 128-bit integer. UUID.urn The UUID as a URN as specified in **RFC 4122**. UUID.variant The UUID variant, which determines the internal layout of the UUID. This will be one of the constants "RESERVED_NCS", "RFC_4122", "RESERVED_MICROSOFT", or "RESERVED_FUTURE". UUID.version The UUID version number (1 through 5, meaningful only when the variant is "RFC_4122"). UUID.is_safe An enumeration of "SafeUUID" which indicates whether the platform generated the UUID in a multiprocessing-safe way. New in version 3.7. The "uuid" module defines the following functions: uuid.getnode() Get the hardware address as a 48-bit positive integer. The first time this runs, it may launch a separate program, which could be quite slow. If all attempts to obtain the hardware address fail, we choose a random 48-bit number with the multicast bit (least significant bit of the first octet) set to 1 as recommended in **RFC 4122**. “Hardware address” means the MAC address of a network interface. On a machine with multiple network interfaces, universally administered MAC addresses (i.e. where the second least significant bit of the first octet is *unset*) will be preferred over locally administered MAC addresses, but with no other ordering guarantees. Changed in version 3.7: Universally administered MAC addresses are preferred over locally administered MAC addresses, since the former are guaranteed to be globally unique, while the latter are not. uuid.uuid1(node=None, clock_seq=None) Generate a UUID from a host ID, sequence number, and the current time. If *node* is not given, "getnode()" is used to obtain the hardware address. If *clock_seq* is given, it is used as the sequence number; otherwise a random 14-bit sequence number is chosen. uuid.uuid3(namespace, name) Generate a UUID based on the MD5 hash of a namespace identifier (which is a UUID) and a name (which is a string). uuid.uuid4() Generate a random UUID. uuid.uuid5(namespace, name) Generate a UUID based on the SHA-1 hash of a namespace identifier (which is a UUID) and a name (which is a string). The "uuid" module defines the following namespace identifiers for use with "uuid3()" or "uuid5()". uuid.NAMESPACE_DNS When this namespace is specified, the *name* string is a fully- qualified domain name. uuid.NAMESPACE_URL When this namespace is specified, the *name* string is a URL. uuid.NAMESPACE_OID When this namespace is specified, the *name* string is an ISO OID. uuid.NAMESPACE_X500 When this namespace is specified, the *name* string is an X.500 DN in DER or a text output format. The "uuid" module defines the following constants for the possible values of the "variant" attribute: uuid.RESERVED_NCS Reserved for NCS compatibility. uuid.RFC_4122 Specifies the UUID layout given in **RFC 4122**. uuid.RESERVED_MICROSOFT Reserved for Microsoft compatibility. uuid.RESERVED_FUTURE Reserved for future definition. See also: **RFC 4122** - A Universally Unique IDentifier (UUID) URN Namespace This specification defines a Uniform Resource Name namespace for UUIDs, the internal format of UUIDs, and methods of generating UUIDs. Example ======= Here are some examples of typical usage of the "uuid" module: >>> import uuid >>> # make a UUID based on the host ID and current time >>> uuid.uuid1() UUID('a8098c1a-f86e-11da-bd1a-00112444be1e') >>> # make a UUID using an MD5 hash of a namespace UUID and a name >>> uuid.uuid3(uuid.NAMESPACE_DNS, 'python.org') UUID('6fa459ea-ee8a-3ca4-894e-db77e160355e') >>> # make a random UUID >>> uuid.uuid4() UUID('16fd2706-8baf-433b-82eb-8c7fada847da') >>> # make a UUID using a SHA-1 hash of a namespace UUID and a name >>> uuid.uuid5(uuid.NAMESPACE_DNS, 'python.org') UUID('886313e1-3b8a-5372-9b90-0c9aee199e5d') >>> # make a UUID from a string of hex digits (braces and hyphens ignored) >>> x = uuid.UUID('{00010203-0405-0607-0809-0a0b0c0d0e0f}') >>> # convert a UUID to a string of hex digits in standard form >>> str(x) '00010203-0405-0607-0809-0a0b0c0d0e0f' >>> # get the raw 16 bytes of the UUID >>> x.bytes b'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f' >>> # make a UUID from a 16-byte string >>> uuid.UUID(bytes=x.bytes) UUID('00010203-0405-0607-0809-0a0b0c0d0e0f') "venv" — Creation of virtual environments ***************************************** New in version 3.3. **Source code:** Lib/venv/ ====================================================================== The "venv" module provides support for creating lightweight “virtual environments” with their own site directories, optionally isolated from system site directories. Each virtual environment has its own Python binary (which matches the version of the binary that was used to create this environment) and can have its own independent set of installed Python packages in its site directories. See **PEP 405** for more information about Python virtual environments. See also: Python Packaging User Guide: Creating and using virtual environments Creating virtual environments ============================= Creation of virtual environments is done by executing the command "venv": python3 -m venv /path/to/new/virtual/environment Running this command creates the target directory (creating any parent directories that don’t exist already) and places a "pyvenv.cfg" file in it with a "home" key pointing to the Python installation from which the command was run (a common name for the target directory is ".venv"). It also creates a "bin" (or "Scripts" on Windows) subdirectory containing a copy/symlink of the Python binary/binaries (as appropriate for the platform or arguments used at environment creation time). It also creates an (initially empty) "lib/pythonX.Y /site-packages" subdirectory (on Windows, this is "Lib\site- packages"). If an existing directory is specified, it will be re-used. Deprecated since version 3.6: "pyvenv" was the recommended tool for creating virtual environments for Python 3.3 and 3.4, and is deprecated in Python 3.6. Changed in version 3.5: The use of "venv" is now recommended for creating virtual environments. On Windows, invoke the "venv" command as follows: c:\>c:\Python35\python -m venv c:\path\to\myenv Alternatively, if you configured the "PATH" and "PATHEXT" variables for your Python installation: c:\>python -m venv c:\path\to\myenv The command, if run with "-h", will show the available options: usage: venv [-h] [--system-site-packages] [--symlinks | --copies] [--clear] [--upgrade] [--without-pip] [--prompt PROMPT] [--upgrade-deps] ENV_DIR [ENV_DIR ...] Creates virtual Python environments in one or more target directories. positional arguments: ENV_DIR A directory to create the environment in. optional arguments: -h, --help show this help message and exit --system-site-packages Give the virtual environment access to the system site-packages dir. --symlinks Try to use symlinks rather than copies, when symlinks are not the default for the platform. --copies Try to use copies rather than symlinks, even when symlinks are the default for the platform. --clear Delete the contents of the environment directory if it already exists, before environment creation. --upgrade Upgrade the environment directory to use this version of Python, assuming Python has been upgraded in-place. --without-pip Skips installing or upgrading pip in the virtual environment (pip is bootstrapped by default) --prompt PROMPT Provides an alternative prompt prefix for this environment. --upgrade-deps Upgrade core dependencies: pip setuptools to the latest version in PyPI Once an environment has been created, you may wish to activate it, e.g. by sourcing an activate script in its bin directory. Changed in version 3.9: Add "--upgrade-deps" option to upgrade pip + setuptools to the latest on PyPI Changed in version 3.4: Installs pip by default, added the "--without- pip" and "--copies" options Changed in version 3.4: In earlier versions, if the target directory already existed, an error was raised, unless the "--clear" or "-- upgrade" option was provided. Note: While symlinks are supported on Windows, they are not recommended. Of particular note is that double-clicking "python.exe" in File Explorer will resolve the symlink eagerly and ignore the virtual environment. Note: On Microsoft Windows, it may be required to enable the "Activate.ps1" script by setting the execution policy for the user. You can do this by issuing the following PowerShell command:PS C:> Set-ExecutionPolicy -ExecutionPolicy RemoteSigned -Scope CurrentUserSee About Execution Policies for more information. The created "pyvenv.cfg" file also includes the "include-system-site- packages" key, set to "true" if "venv" is run with the "--system-site- packages" option, "false" otherwise. Unless the "--without-pip" option is given, "ensurepip" will be invoked to bootstrap "pip" into the virtual environment. Multiple paths can be given to "venv", in which case an identical virtual environment will be created, according to the given options, at each provided path. Once a virtual environment has been created, it can be “activated” using a script in the virtual environment’s binary directory. The invocation of the script is platform-specific (*<venv>* must be replaced by the path of the directory containing the virtual environment): +---------------+-------------------+-------------------------------------------+ | Platform | Shell | Command to activate virtual environment | |===============|===================|===========================================| | POSIX | bash/zsh | $ source <venv>/bin/activate | +---------------+-------------------+-------------------------------------------+ | | fish | $ source <venv>/bin/activate.fish | +---------------+-------------------+-------------------------------------------+ | | csh/tcsh | $ source <venv>/bin/activate.csh | +---------------+-------------------+-------------------------------------------+ | | PowerShell Core | $ <venv>/bin/Activate.ps1 | +---------------+-------------------+-------------------------------------------+ | Windows | cmd.exe | C:\> <venv>\Scripts\activate.bat | +---------------+-------------------+-------------------------------------------+ | | PowerShell | PS C:\> <venv>\Scripts\Activate.ps1 | +---------------+-------------------+-------------------------------------------+ When a virtual environment is active, the "VIRTUAL_ENV" environment variable is set to the path of the virtual environment. This can be used to check if one is running inside a virtual environment. You don’t specifically *need* to activate an environment; activation just prepends the virtual environment’s binary directory to your path, so that “python” invokes the virtual environment’s Python interpreter and you can run installed scripts without having to use their full path. However, all scripts installed in a virtual environment should be runnable without activating it, and run with the virtual environment’s Python automatically. You can deactivate a virtual environment by typing “deactivate” in your shell. The exact mechanism is platform-specific and is an internal implementation detail (typically a script or shell function will be used). New in version 3.4: "fish" and "csh" activation scripts. New in version 3.8: PowerShell activation scripts installed under POSIX for PowerShell Core support. Note: A virtual environment is a Python environment such that the Python interpreter, libraries and scripts installed into it are isolated from those installed in other virtual environments, and (by default) any libraries installed in a “system” Python, i.e., one which is installed as part of your operating system.A virtual environment is a directory tree which contains Python executable files and other files which indicate that it is a virtual environment.Common installation tools such as setuptools and pip work as expected with virtual environments. In other words, when a virtual environment is active, they install Python packages into the virtual environment without needing to be told to do so explicitly.When a virtual environment is active (i.e., the virtual environment’s Python interpreter is running), the attributes "sys.prefix" and "sys.exec_prefix" point to the base directory of the virtual environment, whereas "sys.base_prefix" and "sys.base_exec_prefix" point to the non-virtual environment Python installation which was used to create the virtual environment. If a virtual environment is not active, then "sys.prefix" is the same as "sys.base_prefix" and "sys.exec_prefix" is the same as "sys.base_exec_prefix" (they all point to a non-virtual environment Python installation).When a virtual environment is active, any options that change the installation path will be ignored from all "distutils" configuration files to prevent projects being inadvertently installed outside of the virtual environment.When working in a command shell, users can make a virtual environment active by running an "activate" script in the virtual environment’s executables directory (the precise filename and command to use the file is shell-dependent), which prepends the virtual environment’s directory for executables to the "PATH" environment variable for the running shell. There should be no need in other circumstances to activate a virtual environment; scripts installed into virtual environments have a “shebang” line which points to the virtual environment’s Python interpreter. This means that the script will run with that interpreter regardless of the value of "PATH". On Windows, “shebang” line processing is supported if you have the Python Launcher for Windows installed (this was added to Python in 3.3 - see **PEP 397** for more details). Thus, double-clicking an installed script in a Windows Explorer window should run the script with the correct interpreter without there needing to be any reference to its virtual environment in "PATH". API === The high-level method described above makes use of a simple API which provides mechanisms for third-party virtual environment creators to customize environment creation according to their needs, the "EnvBuilder" class. class venv.EnvBuilder(system_site_packages=False, clear=False, symlinks=False, upgrade=False, with_pip=False, prompt=None, upgrade_deps=False) The "EnvBuilder" class accepts the following keyword arguments on instantiation: * "system_site_packages" – a Boolean value indicating that the system Python site-packages should be available to the environment (defaults to "False"). * "clear" – a Boolean value which, if true, will delete the contents of any existing target directory, before creating the environment. * "symlinks" – a Boolean value indicating whether to attempt to symlink the Python binary rather than copying. * "upgrade" – a Boolean value which, if true, will upgrade an existing environment with the running Python - for use when that Python has been upgraded in-place (defaults to "False"). * "with_pip" – a Boolean value which, if true, ensures pip is installed in the virtual environment. This uses "ensurepip" with the "--default-pip" option. * "prompt" – a String to be used after virtual environment is activated (defaults to "None" which means directory name of the environment would be used). If the special string ""."" is provided, the basename of the current directory is used as the prompt. * "upgrade_deps" – Update the base venv modules to the latest on PyPI Changed in version 3.4: Added the "with_pip" parameter New in version 3.6: Added the "prompt" parameter New in version 3.9: Added the "upgrade_deps" parameter Creators of third-party virtual environment tools will be free to use the provided "EnvBuilder" class as a base class. The returned env-builder is an object which has a method, "create": create(env_dir) Create a virtual environment by specifying the target directory (absolute or relative to the current directory) which is to contain the virtual environment. The "create" method will either create the environment in the specified directory, or raise an appropriate exception. The "create" method of the "EnvBuilder" class illustrates the hooks available for subclass customization: def create(self, env_dir): """ Create a virtualized Python environment in a directory. env_dir is the target directory to create an environment in. """ env_dir = os.path.abspath(env_dir) context = self.ensure_directories(env_dir) self.create_configuration(context) self.setup_python(context) self.setup_scripts(context) self.post_setup(context) Each of the methods "ensure_directories()", "create_configuration()", "setup_python()", "setup_scripts()" and "post_setup()" can be overridden. ensure_directories(env_dir) Creates the environment directory and all necessary directories, and returns a context object. This is just a holder for attributes (such as paths), for use by the other methods. The directories are allowed to exist already, as long as either "clear" or "upgrade" were specified to allow operating on an existing environment directory. create_configuration(context) Creates the "pyvenv.cfg" configuration file in the environment. setup_python(context) Creates a copy or symlink to the Python executable in the environment. On POSIX systems, if a specific executable "python3.x" was used, symlinks to "python" and "python3" will be created pointing to that executable, unless files with those names already exist. setup_scripts(context) Installs activation scripts appropriate to the platform into the virtual environment. upgrade_dependencies(context) Upgrades the core venv dependency packages (currently "pip" and "setuptools") in the environment. This is done by shelling out to the "pip" executable in the environment. New in version 3.9. post_setup(context) A placeholder method which can be overridden in third party implementations to pre-install packages in the virtual environment or perform other post-creation steps. Changed in version 3.7.2: Windows now uses redirector scripts for "python[w].exe" instead of copying the actual binaries. In 3.7.2 only "setup_python()" does nothing unless running from a build in the source tree. Changed in version 3.7.3: Windows copies the redirector scripts as part of "setup_python()" instead of "setup_scripts()". This was not the case in 3.7.2. When using symlinks, the original executables will be linked. In addition, "EnvBuilder" provides this utility method that can be called from "setup_scripts()" or "post_setup()" in subclasses to assist in installing custom scripts into the virtual environment. install_scripts(context, path) *path* is the path to a directory that should contain subdirectories “common”, “posix”, “nt”, each containing scripts destined for the bin directory in the environment. The contents of “common” and the directory corresponding to "os.name" are copied after some text replacement of placeholders: * "__VENV_DIR__" is replaced with the absolute path of the environment directory. * "__VENV_NAME__" is replaced with the environment name (final path segment of environment directory). * "__VENV_PROMPT__" is replaced with the prompt (the environment name surrounded by parentheses and with a following space) * "__VENV_BIN_NAME__" is replaced with the name of the bin directory (either "bin" or "Scripts"). * "__VENV_PYTHON__" is replaced with the absolute path of the environment’s executable. The directories are allowed to exist (for when an existing environment is being upgraded). There is also a module-level convenience function: venv.create(env_dir, system_site_packages=False, clear=False, symlinks=False, with_pip=False, prompt=None) Create an "EnvBuilder" with the given keyword arguments, and call its "create()" method with the *env_dir* argument. New in version 3.3. Changed in version 3.4: Added the "with_pip" parameter Changed in version 3.6: Added the "prompt" parameter An example of extending "EnvBuilder" ==================================== The following script shows how to extend "EnvBuilder" by implementing a subclass which installs setuptools and pip into a created virtual environment: import os import os.path from subprocess import Popen, PIPE import sys from threading import Thread from urllib.parse import urlparse from urllib.request import urlretrieve import venv class ExtendedEnvBuilder(venv.EnvBuilder): """ This builder installs setuptools and pip so that you can pip or easy_install other packages into the created virtual environment. :param nodist: If true, setuptools and pip are not installed into the created virtual environment. :param nopip: If true, pip is not installed into the created virtual environment. :param progress: If setuptools or pip are installed, the progress of the installation can be monitored by passing a progress callable. If specified, it is called with two arguments: a string indicating some progress, and a context indicating where the string is coming from. The context argument can have one of three values: 'main', indicating that it is called from virtualize() itself, and 'stdout' and 'stderr', which are obtained by reading lines from the output streams of a subprocess which is used to install the app. If a callable is not specified, default progress information is output to sys.stderr. """ def __init__(self, *args, **kwargs): self.nodist = kwargs.pop('nodist', False) self.nopip = kwargs.pop('nopip', False) self.progress = kwargs.pop('progress', None) self.verbose = kwargs.pop('verbose', False) super().__init__(*args, **kwargs) def post_setup(self, context): """ Set up any packages which need to be pre-installed into the virtual environment being created. :param context: The information for the virtual environment creation request being processed. """ os.environ['VIRTUAL_ENV'] = context.env_dir if not self.nodist: self.install_setuptools(context) # Can't install pip without setuptools if not self.nopip and not self.nodist: self.install_pip(context) def reader(self, stream, context): """ Read lines from a subprocess' output stream and either pass to a progress callable (if specified) or write progress information to sys.stderr. """ progress = self.progress while True: s = stream.readline() if not s: break if progress is not None: progress(s, context) else: if not self.verbose: sys.stderr.write('.') else: sys.stderr.write(s.decode('utf-8')) sys.stderr.flush() stream.close() def install_script(self, context, name, url): _, _, path, _, _, _ = urlparse(url) fn = os.path.split(path)[-1] binpath = context.bin_path distpath = os.path.join(binpath, fn) # Download script into the virtual environment's binaries folder urlretrieve(url, distpath) progress = self.progress if self.verbose: term = '\n' else: term = '' if progress is not None: progress('Installing %s ...%s' % (name, term), 'main') else: sys.stderr.write('Installing %s ...%s' % (name, term)) sys.stderr.flush() # Install in the virtual environment args = [context.env_exe, fn] p = Popen(args, stdout=PIPE, stderr=PIPE, cwd=binpath) t1 = Thread(target=self.reader, args=(p.stdout, 'stdout')) t1.start() t2 = Thread(target=self.reader, args=(p.stderr, 'stderr')) t2.start() p.wait() t1.join() t2.join() if progress is not None: progress('done.', 'main') else: sys.stderr.write('done.\n') # Clean up - no longer needed os.unlink(distpath) def install_setuptools(self, context): """ Install setuptools in the virtual environment. :param context: The information for the virtual environment creation request being processed. """ url = 'https://bitbucket.org/pypa/setuptools/downloads/ez_setup.py' self.install_script(context, 'setuptools', url) # clear up the setuptools archive which gets downloaded pred = lambda o: o.startswith('setuptools-') and o.endswith('.tar.gz') files = filter(pred, os.listdir(context.bin_path)) for f in files: f = os.path.join(context.bin_path, f) os.unlink(f) def install_pip(self, context): """ Install pip in the virtual environment. :param context: The information for the virtual environment creation request being processed. """ url = 'https://bootstrap.pypa.io/get-pip.py' self.install_script(context, 'pip', url) def main(args=None): compatible = True if sys.version_info < (3, 3): compatible = False elif not hasattr(sys, 'base_prefix'): compatible = False if not compatible: raise ValueError('This script is only for use with ' 'Python 3.3 or later') else: import argparse parser = argparse.ArgumentParser(prog=__name__, description='Creates virtual Python ' 'environments in one or ' 'more target ' 'directories.') parser.add_argument('dirs', metavar='ENV_DIR', nargs='+', help='A directory in which to create the ' 'virtual environment.') parser.add_argument('--no-setuptools', default=False, action='store_true', dest='nodist', help="Don't install setuptools or pip in the " "virtual environment.") parser.add_argument('--no-pip', default=False, action='store_true', dest='nopip', help="Don't install pip in the virtual " "environment.") parser.add_argument('--system-site-packages', default=False, action='store_true', dest='system_site', help='Give the virtual environment access to the ' 'system site-packages dir.') if os.name == 'nt': use_symlinks = False else: use_symlinks = True parser.add_argument('--symlinks', default=use_symlinks, action='store_true', dest='symlinks', help='Try to use symlinks rather than copies, ' 'when symlinks are not the default for ' 'the platform.') parser.add_argument('--clear', default=False, action='store_true', dest='clear', help='Delete the contents of the ' 'virtual environment ' 'directory if it already ' 'exists, before virtual ' 'environment creation.') parser.add_argument('--upgrade', default=False, action='store_true', dest='upgrade', help='Upgrade the virtual ' 'environment directory to ' 'use this version of ' 'Python, assuming Python ' 'has been upgraded ' 'in-place.') parser.add_argument('--verbose', default=False, action='store_true', dest='verbose', help='Display the output ' 'from the scripts which ' 'install setuptools and pip.') options = parser.parse_args(args) if options.upgrade and options.clear: raise ValueError('you cannot supply --upgrade and --clear together.') builder = ExtendedEnvBuilder(system_site_packages=options.system_site, clear=options.clear, symlinks=options.symlinks, upgrade=options.upgrade, nodist=options.nodist, nopip=options.nopip, verbose=options.verbose) for d in options.dirs: builder.create(d) if __name__ == '__main__': rc = 1 try: main() rc = 0 except Exception as e: print('Error: %s' % e, file=sys.stderr) sys.exit(rc) This script is also available for download online. "warnings" — Warning control **************************** **Source code:** Lib/warnings.py ====================================================================== Warning messages are typically issued in situations where it is useful to alert the user of some condition in a program, where that condition (normally) doesn’t warrant raising an exception and terminating the program. For example, one might want to issue a warning when a program uses an obsolete module. Python programmers issue warnings by calling the "warn()" function defined in this module. (C programmers use "PyErr_WarnEx()"; see Exception Handling for details). Warning messages are normally written to "sys.stderr", but their disposition can be changed flexibly, from ignoring all warnings to turning them into exceptions. The disposition of warnings can vary based on the warning category, the text of the warning message, and the source location where it is issued. Repetitions of a particular warning for the same source location are typically suppressed. There are two stages in warning control: first, each time a warning is issued, a determination is made whether a message should be issued or not; next, if a message is to be issued, it is formatted and printed using a user-settable hook. The determination whether to issue a warning message is controlled by the warning filter, which is a sequence of matching rules and actions. Rules can be added to the filter by calling "filterwarnings()" and reset to its default state by calling "resetwarnings()". The printing of warning messages is done by calling "showwarning()", which may be overridden; the default implementation of this function formats the message by calling "formatwarning()", which is also available for use by custom implementations. See also: "logging.captureWarnings()" allows you to handle all warnings with the standard logging infrastructure. Warning Categories ================== There are a number of built-in exceptions that represent warning categories. This categorization is useful to be able to filter out groups of warnings. While these are technically built-in exceptions, they are documented here, because conceptually they belong to the warnings mechanism. User code can define additional warning categories by subclassing one of the standard warning categories. A warning category must always be a subclass of the "Warning" class. The following warnings category classes are currently defined: +------------------------------------+-------------------------------------------------+ | Class | Description | |====================================|=================================================| | "Warning" | This is the base class of all warning category | | | classes. It is a subclass of "Exception". | +------------------------------------+-------------------------------------------------+ | "UserWarning" | The default category for "warn()". | +------------------------------------+-------------------------------------------------+ | "DeprecationWarning" | Base category for warnings about deprecated | | | features when those warnings are intended for | | | other Python developers (ignored by default, | | | unless triggered by code in "__main__"). | +------------------------------------+-------------------------------------------------+ | "SyntaxWarning" | Base category for warnings about dubious | | | syntactic features. | +------------------------------------+-------------------------------------------------+ | "RuntimeWarning" | Base category for warnings about dubious | | | runtime features. | +------------------------------------+-------------------------------------------------+ | "FutureWarning" | Base category for warnings about deprecated | | | features when those warnings are intended for | | | end users of applications that are written in | | | Python. | +------------------------------------+-------------------------------------------------+ | "PendingDeprecationWarning" | Base category for warnings about features that | | | will be deprecated in the future (ignored by | | | default). | +------------------------------------+-------------------------------------------------+ | "ImportWarning" | Base category for warnings triggered during the | | | process of importing a module (ignored by | | | default). | +------------------------------------+-------------------------------------------------+ | "UnicodeWarning" | Base category for warnings related to Unicode. | +------------------------------------+-------------------------------------------------+ | "BytesWarning" | Base category for warnings related to "bytes" | | | and "bytearray". | +------------------------------------+-------------------------------------------------+ | "ResourceWarning" | Base category for warnings related to resource | | | usage (ignored by default). | +------------------------------------+-------------------------------------------------+ Changed in version 3.7: Previously "DeprecationWarning" and "FutureWarning" were distinguished based on whether a feature was being removed entirely or changing its behaviour. They are now distinguished based on their intended audience and the way they’re handled by the default warnings filters. The Warnings Filter =================== The warnings filter controls whether warnings are ignored, displayed, or turned into errors (raising an exception). Conceptually, the warnings filter maintains an ordered list of filter specifications; any specific warning is matched against each filter specification in the list in turn until a match is found; the filter determines the disposition of the match. Each entry is a tuple of the form (*action*, *message*, *category*, *module*, *lineno*), where: * *action* is one of the following strings: +-----------------+------------------------------------------------+ | Value | Disposition | |=================|================================================| | ""default"" | print the first occurrence of matching | | | warnings for each location (module + line | | | number) where the warning is issued | +-----------------+------------------------------------------------+ | ""error"" | turn matching warnings into exceptions | +-----------------+------------------------------------------------+ | ""ignore"" | never print matching warnings | +-----------------+------------------------------------------------+ | ""always"" | always print matching warnings | +-----------------+------------------------------------------------+ | ""module"" | print the first occurrence of matching | | | warnings for each module where the warning is | | | issued (regardless of line number) | +-----------------+------------------------------------------------+ | ""once"" | print only the first occurrence of matching | | | warnings, regardless of location | +-----------------+------------------------------------------------+ * *message* is a string containing a regular expression that the start of the warning message must match. The expression is compiled to always be case-insensitive. * *category* is a class (a subclass of "Warning") of which the warning category must be a subclass in order to match. * *module* is a string containing a regular expression that the module name must match. The expression is compiled to be case-sensitive. * *lineno* is an integer that the line number where the warning occurred must match, or "0" to match all line numbers. Since the "Warning" class is derived from the built-in "Exception" class, to turn a warning into an error we simply raise "category(message)". If a warning is reported and doesn’t match any registered filter then the “default” action is applied (hence its name). Describing Warning Filters -------------------------- The warnings filter is initialized by "-W" options passed to the Python interpreter command line and the "PYTHONWARNINGS" environment variable. The interpreter saves the arguments for all supplied entries without interpretation in "sys.warnoptions"; the "warnings" module parses these when it is first imported (invalid options are ignored, after printing a message to "sys.stderr"). Individual warnings filters are specified as a sequence of fields separated by colons: action:message:category:module:line The meaning of each of these fields is as described in The Warnings Filter. When listing multiple filters on a single line (as for "PYTHONWARNINGS"), the individual filters are separated by commas and the filters listed later take precedence over those listed before them (as they’re applied left-to-right, and the most recently applied filters take precedence over earlier ones). Commonly used warning filters apply to either all warnings, warnings in a particular category, or warnings raised by particular modules or packages. Some examples: default # Show all warnings (even those ignored by default) ignore # Ignore all warnings error # Convert all warnings to errors error::ResourceWarning # Treat ResourceWarning messages as errors default::DeprecationWarning # Show DeprecationWarning messages ignore,default:::mymodule # Only report warnings triggered by "mymodule" error:::mymodule[.*] # Convert warnings to errors in "mymodule" # and any subpackages of "mymodule" Default Warning Filter ---------------------- By default, Python installs several warning filters, which can be overridden by the "-W" command-line option, the "PYTHONWARNINGS" environment variable and calls to "filterwarnings()". In regular release builds, the default warning filter has the following entries (in order of precedence): default::DeprecationWarning:__main__ ignore::DeprecationWarning ignore::PendingDeprecationWarning ignore::ImportWarning ignore::ResourceWarning In debug builds, the list of default warning filters is empty. Changed in version 3.2: "DeprecationWarning" is now ignored by default in addition to "PendingDeprecationWarning". Changed in version 3.7: "DeprecationWarning" is once again shown by default when triggered directly by code in "__main__". Changed in version 3.7: "BytesWarning" no longer appears in the default filter list and is instead configured via "sys.warnoptions" when "-b" is specified twice. Overriding the default filter ----------------------------- Developers of applications written in Python may wish to hide *all* Python level warnings from their users by default, and only display them when running tests or otherwise working on the application. The "sys.warnoptions" attribute used to pass filter configurations to the interpreter can be used as a marker to indicate whether or not warnings should be disabled: import sys if not sys.warnoptions: import warnings warnings.simplefilter("ignore") Developers of test runners for Python code are advised to instead ensure that *all* warnings are displayed by default for the code under test, using code like: import sys if not sys.warnoptions: import os, warnings warnings.simplefilter("default") # Change the filter in this process os.environ["PYTHONWARNINGS"] = "default" # Also affect subprocesses Finally, developers of interactive shells that run user code in a namespace other than "__main__" are advised to ensure that "DeprecationWarning" messages are made visible by default, using code like the following (where "user_ns" is the module used to execute code entered interactively): import warnings warnings.filterwarnings("default", category=DeprecationWarning, module=user_ns.get("__name__")) Temporarily Suppressing Warnings ================================ If you are using code that you know will raise a warning, such as a deprecated function, but do not want to see the warning (even when warnings have been explicitly configured via the command line), then it is possible to suppress the warning using the "catch_warnings" context manager: import warnings def fxn(): warnings.warn("deprecated", DeprecationWarning) with warnings.catch_warnings(): warnings.simplefilter("ignore") fxn() While within the context manager all warnings will simply be ignored. This allows you to use known-deprecated code without having to see the warning while not suppressing the warning for other code that might not be aware of its use of deprecated code. Note: this can only be guaranteed in a single-threaded application. If two or more threads use the "catch_warnings" context manager at the same time, the behavior is undefined. Testing Warnings ================ To test warnings raised by code, use the "catch_warnings" context manager. With it you can temporarily mutate the warnings filter to facilitate your testing. For instance, do the following to capture all raised warnings to check: import warnings def fxn(): warnings.warn("deprecated", DeprecationWarning) with warnings.catch_warnings(record=True) as w: # Cause all warnings to always be triggered. warnings.simplefilter("always") # Trigger a warning. fxn() # Verify some things assert len(w) == 1 assert issubclass(w[-1].category, DeprecationWarning) assert "deprecated" in str(w[-1].message) One can also cause all warnings to be exceptions by using "error" instead of "always". One thing to be aware of is that if a warning has already been raised because of a "once"/"default" rule, then no matter what filters are set the warning will not be seen again unless the warnings registry related to the warning has been cleared. Once the context manager exits, the warnings filter is restored to its state when the context was entered. This prevents tests from changing the warnings filter in unexpected ways between tests and leading to indeterminate test results. The "showwarning()" function in the module is also restored to its original value. Note: this can only be guaranteed in a single-threaded application. If two or more threads use the "catch_warnings" context manager at the same time, the behavior is undefined. When testing multiple operations that raise the same kind of warning, it is important to test them in a manner that confirms each operation is raising a new warning (e.g. set warnings to be raised as exceptions and check the operations raise exceptions, check that the length of the warning list continues to increase after each operation, or else delete the previous entries from the warnings list before each new operation). Updating Code For New Versions of Dependencies ============================================== Warning categories that are primarily of interest to Python developers (rather than end users of applications written in Python) are ignored by default. Notably, this “ignored by default” list includes "DeprecationWarning" (for every module except "__main__"), which means developers should make sure to test their code with typically ignored warnings made visible in order to receive timely notifications of future breaking API changes (whether in the standard library or third party packages). In the ideal case, the code will have a suitable test suite, and the test runner will take care of implicitly enabling all warnings when running tests (the test runner provided by the "unittest" module does this). In less ideal cases, applications can be checked for use of deprecated interfaces by passing "-Wd" to the Python interpreter (this is shorthand for "-W default") or setting "PYTHONWARNINGS=default" in the environment. This enables default handling for all warnings, including those that are ignored by default. To change what action is taken for encountered warnings you can change what argument is passed to "-W" (e.g. "-W error"). See the "-W" flag for more details on what is possible. Available Functions =================== warnings.warn(message, category=None, stacklevel=1, source=None) Issue a warning, or maybe ignore it or raise an exception. The *category* argument, if given, must be a warning category class; it defaults to "UserWarning". Alternatively, *message* can be a "Warning" instance, in which case *category* will be ignored and "message.__class__" will be used. In this case, the message text will be "str(message)". This function raises an exception if the particular warning issued is changed into an error by the warnings filter. The *stacklevel* argument can be used by wrapper functions written in Python, like this: def deprecation(message): warnings.warn(message, DeprecationWarning, stacklevel=2) This makes the warning refer to "deprecation()"’s caller, rather than to the source of "deprecation()" itself (since the latter would defeat the purpose of the warning message). *source*, if supplied, is the destroyed object which emitted a "ResourceWarning". Changed in version 3.6: Added *source* parameter. warnings.warn_explicit(message, category, filename, lineno, module=None, registry=None, module_globals=None, source=None) This is a low-level interface to the functionality of "warn()", passing in explicitly the message, category, filename and line number, and optionally the module name and the registry (which should be the "__warningregistry__" dictionary of the module). The module name defaults to the filename with ".py" stripped; if no registry is passed, the warning is never suppressed. *message* must be a string and *category* a subclass of "Warning" or *message* may be a "Warning" instance, in which case *category* will be ignored. *module_globals*, if supplied, should be the global namespace in use by the code for which the warning is issued. (This argument is used to support displaying source for modules found in zipfiles or other non-filesystem import sources). *source*, if supplied, is the destroyed object which emitted a "ResourceWarning". Changed in version 3.6: Add the *source* parameter. warnings.showwarning(message, category, filename, lineno, file=None, line=None) Write a warning to a file. The default implementation calls "formatwarning(message, category, filename, lineno, line)" and writes the resulting string to *file*, which defaults to "sys.stderr". You may replace this function with any callable by assigning to "warnings.showwarning". *line* is a line of source code to be included in the warning message; if *line* is not supplied, "showwarning()" will try to read the line specified by *filename* and *lineno*. warnings.formatwarning(message, category, filename, lineno, line=None) Format a warning the standard way. This returns a string which may contain embedded newlines and ends in a newline. *line* is a line of source code to be included in the warning message; if *line* is not supplied, "formatwarning()" will try to read the line specified by *filename* and *lineno*. warnings.filterwarnings(action, message='', category=Warning, module='', lineno=0, append=False) Insert an entry into the list of warnings filter specifications. The entry is inserted at the front by default; if *append* is true, it is inserted at the end. This checks the types of the arguments, compiles the *message* and *module* regular expressions, and inserts them as a tuple in the list of warnings filters. Entries closer to the front of the list override entries later in the list, if both match a particular warning. Omitted arguments default to a value that matches everything. warnings.simplefilter(action, category=Warning, lineno=0, append=False) Insert a simple entry into the list of warnings filter specifications. The meaning of the function parameters is as for "filterwarnings()", but regular expressions are not needed as the filter inserted always matches any message in any module as long as the category and line number match. warnings.resetwarnings() Reset the warnings filter. This discards the effect of all previous calls to "filterwarnings()", including that of the "-W" command line options and calls to "simplefilter()". Available Context Managers ========================== class warnings.catch_warnings(*, record=False, module=None) A context manager that copies and, upon exit, restores the warnings filter and the "showwarning()" function. If the *record* argument is "False" (the default) the context manager returns "None" on entry. If *record* is "True", a list is returned that is progressively populated with objects as seen by a custom "showwarning()" function (which also suppresses output to "sys.stdout"). Each object in the list has attributes with the same names as the arguments to "showwarning()". The *module* argument takes a module that will be used instead of the module returned when you import "warnings" whose filter will be protected. This argument exists primarily for testing the "warnings" module itself. Note: The "catch_warnings" manager works by replacing and then later restoring the module’s "showwarning()" function and internal list of filter specifications. This means the context manager is modifying global state and therefore is not thread-safe. "wave" — Read and write WAV files ********************************* **Source code:** Lib/wave.py ====================================================================== The "wave" module provides a convenient interface to the WAV sound format. It does not support compression/decompression, but it does support mono/stereo. The "wave" module defines the following function and exception: wave.open(file, mode=None) If *file* is a string, open the file by that name, otherwise treat it as a file-like object. *mode* can be: "'rb'" Read only mode. "'wb'" Write only mode. Note that it does not allow read/write WAV files. A *mode* of "'rb'" returns a "Wave_read" object, while a *mode* of "'wb'" returns a "Wave_write" object. If *mode* is omitted and a file-like object is passed as *file*, "file.mode" is used as the default value for *mode*. If you pass in a file-like object, the wave object will not close it when its "close()" method is called; it is the caller’s responsibility to close the file object. The "open()" function may be used in a "with" statement. When the "with" block completes, the "Wave_read.close()" or "Wave_write.close()" method is called. Changed in version 3.4: Added support for unseekable files. exception wave.Error An error raised when something is impossible because it violates the WAV specification or hits an implementation deficiency. Wave_read Objects ================= Wave_read objects, as returned by "open()", have the following methods: Wave_read.close() Close the stream if it was opened by "wave", and make the instance unusable. This is called automatically on object collection. Wave_read.getnchannels() Returns number of audio channels ("1" for mono, "2" for stereo). Wave_read.getsampwidth() Returns sample width in bytes. Wave_read.getframerate() Returns sampling frequency. Wave_read.getnframes() Returns number of audio frames. Wave_read.getcomptype() Returns compression type ("'NONE'" is the only supported type). Wave_read.getcompname() Human-readable version of "getcomptype()". Usually "'not compressed'" parallels "'NONE'". Wave_read.getparams() Returns a "namedtuple()" "(nchannels, sampwidth, framerate, nframes, comptype, compname)", equivalent to output of the "get*()" methods. Wave_read.readframes(n) Reads and returns at most *n* frames of audio, as a "bytes" object. Wave_read.rewind() Rewind the file pointer to the beginning of the audio stream. The following two methods are defined for compatibility with the "aifc" module, and don’t do anything interesting. Wave_read.getmarkers() Returns "None". Wave_read.getmark(id) Raise an error. The following two methods define a term “position” which is compatible between them, and is otherwise implementation dependent. Wave_read.setpos(pos) Set the file pointer to the specified position. Wave_read.tell() Return current file pointer position. Wave_write Objects ================== For seekable output streams, the "wave" header will automatically be updated to reflect the number of frames actually written. For unseekable streams, the *nframes* value must be accurate when the first frame data is written. An accurate *nframes* value can be achieved either by calling "setnframes()" or "setparams()" with the number of frames that will be written before "close()" is called and then using "writeframesraw()" to write the frame data, or by calling "writeframes()" with all of the frame data to be written. In the latter case "writeframes()" will calculate the number of frames in the data and set *nframes* accordingly before writing the frame data. Wave_write objects, as returned by "open()", have the following methods: Changed in version 3.4: Added support for unseekable files. Wave_write.close() Make sure *nframes* is correct, and close the file if it was opened by "wave". This method is called upon object collection. It will raise an exception if the output stream is not seekable and *nframes* does not match the number of frames actually written. Wave_write.setnchannels(n) Set the number of channels. Wave_write.setsampwidth(n) Set the sample width to *n* bytes. Wave_write.setframerate(n) Set the frame rate to *n*. Changed in version 3.2: A non-integral input to this method is rounded to the nearest integer. Wave_write.setnframes(n) Set the number of frames to *n*. This will be changed later if the number of frames actually written is different (this update attempt will raise an error if the output stream is not seekable). Wave_write.setcomptype(type, name) Set the compression type and description. At the moment, only compression type "NONE" is supported, meaning no compression. Wave_write.setparams(tuple) The *tuple* should be "(nchannels, sampwidth, framerate, nframes, comptype, compname)", with values valid for the "set*()" methods. Sets all parameters. Wave_write.tell() Return current position in the file, with the same disclaimer for the "Wave_read.tell()" and "Wave_read.setpos()" methods. Wave_write.writeframesraw(data) Write audio frames, without correcting *nframes*. Changed in version 3.4: Any *bytes-like object* is now accepted. Wave_write.writeframes(data) Write audio frames and make sure *nframes* is correct. It will raise an error if the output stream is not seekable and the total number of frames that have been written after *data* has been written does not match the previously set value for *nframes*. Changed in version 3.4: Any *bytes-like object* is now accepted. Note that it is invalid to set any parameters after calling "writeframes()" or "writeframesraw()", and any attempt to do so will raise "wave.Error". "weakref" — Weak references *************************** **Source code:** Lib/weakref.py ====================================================================== The "weakref" module allows the Python programmer to create *weak references* to objects. In the following, the term *referent* means the object which is referred to by a weak reference. A weak reference to an object is not enough to keep the object alive: when the only remaining references to a referent are weak references, *garbage collection* is free to destroy the referent and reuse its memory for something else. However, until the object is actually destroyed the weak reference may return the object even if there are no strong references to it. A primary use for weak references is to implement caches or mappings holding large objects, where it’s desired that a large object not be kept alive solely because it appears in a cache or mapping. For example, if you have a number of large binary image objects, you may wish to associate a name with each. If you used a Python dictionary to map names to images, or images to names, the image objects would remain alive just because they appeared as values or keys in the dictionaries. The "WeakKeyDictionary" and "WeakValueDictionary" classes supplied by the "weakref" module are an alternative, using weak references to construct mappings that don’t keep objects alive solely because they appear in the mapping objects. If, for example, an image object is a value in a "WeakValueDictionary", then when the last remaining references to that image object are the weak references held by weak mappings, garbage collection can reclaim the object, and its corresponding entries in weak mappings are simply deleted. "WeakKeyDictionary" and "WeakValueDictionary" use weak references in their implementation, setting up callback functions on the weak references that notify the weak dictionaries when a key or value has been reclaimed by garbage collection. "WeakSet" implements the "set" interface, but keeps weak references to its elements, just like a "WeakKeyDictionary" does. "finalize" provides a straight forward way to register a cleanup function to be called when an object is garbage collected. This is simpler to use than setting up a callback function on a raw weak reference, since the module automatically ensures that the finalizer remains alive until the object is collected. Most programs should find that using one of these weak container types or "finalize" is all they need – it’s not usually necessary to create your own weak references directly. The low-level machinery is exposed by the "weakref" module for the benefit of advanced uses. Not all objects can be weakly referenced; those objects which can include class instances, functions written in Python (but not in C), instance methods, sets, frozensets, some *file objects*, *generators*, type objects, sockets, arrays, deques, regular expression pattern objects, and code objects. Changed in version 3.2: Added support for thread.lock, threading.Lock, and code objects. Several built-in types such as "list" and "dict" do not directly support weak references but can add support through subclassing: class Dict(dict): pass obj = Dict(red=1, green=2, blue=3) # this object is weak referenceable **CPython implementation detail:** Other built-in types such as "tuple" and "int" do not support weak references even when subclassed. Extension types can easily be made to support weak references; see Weak Reference Support. When "__slots__" are defined for a given type, weak reference support is disabled unless a "'__weakref__'" string is also present in the sequence of strings in the "__slots__" declaration. See __slots__ documentation for details. class weakref.ref(object[, callback]) Return a weak reference to *object*. The original object can be retrieved by calling the reference object if the referent is still alive; if the referent is no longer alive, calling the reference object will cause "None" to be returned. If *callback* is provided and not "None", and the returned weakref object is still alive, the callback will be called when the object is about to be finalized; the weak reference object will be passed as the only parameter to the callback; the referent will no longer be available. It is allowable for many weak references to be constructed for the same object. Callbacks registered for each weak reference will be called from the most recently registered callback to the oldest registered callback. Exceptions raised by the callback will be noted on the standard error output, but cannot be propagated; they are handled in exactly the same way as exceptions raised from an object’s "__del__()" method. Weak references are *hashable* if the *object* is hashable. They will maintain their hash value even after the *object* was deleted. If "hash()" is called the first time only after the *object* was deleted, the call will raise "TypeError". Weak references support tests for equality, but not ordering. If the referents are still alive, two references have the same equality relationship as their referents (regardless of the *callback*). If either referent has been deleted, the references are equal only if the reference objects are the same object. This is a subclassable type rather than a factory function. __callback__ This read-only attribute returns the callback currently associated to the weakref. If there is no callback or if the referent of the weakref is no longer alive then this attribute will have value "None". Changed in version 3.4: Added the "__callback__" attribute. weakref.proxy(object[, callback]) Return a proxy to *object* which uses a weak reference. This supports use of the proxy in most contexts instead of requiring the explicit dereferencing used with weak reference objects. The returned object will have a type of either "ProxyType" or "CallableProxyType", depending on whether *object* is callable. Proxy objects are not *hashable* regardless of the referent; this avoids a number of problems related to their fundamentally mutable nature, and prevent their use as dictionary keys. *callback* is the same as the parameter of the same name to the "ref()" function. Changed in version 3.8: Extended the operator support on proxy objects to include the matrix multiplication operators "@" and "@=". weakref.getweakrefcount(object) Return the number of weak references and proxies which refer to *object*. weakref.getweakrefs(object) Return a list of all weak reference and proxy objects which refer to *object*. class weakref.WeakKeyDictionary([dict]) Mapping class that references keys weakly. Entries in the dictionary will be discarded when there is no longer a strong reference to the key. This can be used to associate additional data with an object owned by other parts of an application without adding attributes to those objects. This can be especially useful with objects that override attribute accesses. Changed in version 3.9: Added support for "|" and "|=" operators, specified in **PEP 584**. "WeakKeyDictionary" objects have an additional method that exposes the internal references directly. The references are not guaranteed to be “live” at the time they are used, so the result of calling the references needs to be checked before being used. This can be used to avoid creating references that will cause the garbage collector to keep the keys around longer than needed. WeakKeyDictionary.keyrefs() Return an iterable of the weak references to the keys. class weakref.WeakValueDictionary([dict]) Mapping class that references values weakly. Entries in the dictionary will be discarded when no strong reference to the value exists any more. Changed in version 3.9: Added support for "|" and "|=" operators, as specified in **PEP 584**. "WeakValueDictionary" objects have an additional method that has the same issues as the "keyrefs()" method of "WeakKeyDictionary" objects. WeakValueDictionary.valuerefs() Return an iterable of the weak references to the values. class weakref.WeakSet([elements]) Set class that keeps weak references to its elements. An element will be discarded when no strong reference to it exists any more. class weakref.WeakMethod(method) A custom "ref" subclass which simulates a weak reference to a bound method (i.e., a method defined on a class and looked up on an instance). Since a bound method is ephemeral, a standard weak reference cannot keep hold of it. "WeakMethod" has special code to recreate the bound method until either the object or the original function dies: >>> class C: ... def method(self): ... print("method called!") ... >>> c = C() >>> r = weakref.ref(c.method) >>> r() >>> r = weakref.WeakMethod(c.method) >>> r() <bound method C.method of <__main__.C object at 0x7fc859830220>> >>> r()() method called! >>> del c >>> gc.collect() 0 >>> r() >>> New in version 3.4. class weakref.finalize(obj, func, /, *args, **kwargs) Return a callable finalizer object which will be called when *obj* is garbage collected. Unlike an ordinary weak reference, a finalizer will always survive until the reference object is collected, greatly simplifying lifecycle management. A finalizer is considered *alive* until it is called (either explicitly or at garbage collection), and after that it is *dead*. Calling a live finalizer returns the result of evaluating "func(*arg, **kwargs)", whereas calling a dead finalizer returns "None". Exceptions raised by finalizer callbacks during garbage collection will be shown on the standard error output, but cannot be propagated. They are handled in the same way as exceptions raised from an object’s "__del__()" method or a weak reference’s callback. When the program exits, each remaining live finalizer is called unless its "atexit" attribute has been set to false. They are called in reverse order of creation. A finalizer will never invoke its callback during the later part of the *interpreter shutdown* when module globals are liable to have been replaced by "None". __call__() If *self* is alive then mark it as dead and return the result of calling "func(*args, **kwargs)". If *self* is dead then return "None". detach() If *self* is alive then mark it as dead and return the tuple "(obj, func, args, kwargs)". If *self* is dead then return "None". peek() If *self* is alive then return the tuple "(obj, func, args, kwargs)". If *self* is dead then return "None". alive Property which is true if the finalizer is alive, false otherwise. atexit A writable boolean property which by default is true. When the program exits, it calls all remaining live finalizers for which "atexit" is true. They are called in reverse order of creation. Note: It is important to ensure that *func*, *args* and *kwargs* do not own any references to *obj*, either directly or indirectly, since otherwise *obj* will never be garbage collected. In particular, *func* should not be a bound method of *obj*. New in version 3.4. weakref.ReferenceType The type object for weak references objects. weakref.ProxyType The type object for proxies of objects which are not callable. weakref.CallableProxyType The type object for proxies of callable objects. weakref.ProxyTypes Sequence containing all the type objects for proxies. This can make it simpler to test if an object is a proxy without being dependent on naming both proxy types. See also: **PEP 205** - Weak References The proposal and rationale for this feature, including links to earlier implementations and information about similar features in other languages. Weak Reference Objects ====================== Weak reference objects have no methods and no attributes besides "ref.__callback__". A weak reference object allows the referent to be obtained, if it still exists, by calling it: >>> import weakref >>> class Object: ... pass ... >>> o = Object() >>> r = weakref.ref(o) >>> o2 = r() >>> o is o2 True If the referent no longer exists, calling the reference object returns "None": >>> del o, o2 >>> print(r()) None Testing that a weak reference object is still live should be done using the expression "ref() is not None". Normally, application code that needs to use a reference object should follow this pattern: # r is a weak reference object o = r() if o is None: # referent has been garbage collected print("Object has been deallocated; can't frobnicate.") else: print("Object is still live!") o.do_something_useful() Using a separate test for “liveness” creates race conditions in threaded applications; another thread can cause a weak reference to become invalidated before the weak reference is called; the idiom shown above is safe in threaded applications as well as single- threaded applications. Specialized versions of "ref" objects can be created through subclassing. This is used in the implementation of the "WeakValueDictionary" to reduce the memory overhead for each entry in the mapping. This may be most useful to associate additional information with a reference, but could also be used to insert additional processing on calls to retrieve the referent. This example shows how a subclass of "ref" can be used to store additional information about an object and affect the value that’s returned when the referent is accessed: import weakref class ExtendedRef(weakref.ref): def __init__(self, ob, callback=None, /, **annotations): super().__init__(ob, callback) self.__counter = 0 for k, v in annotations.items(): setattr(self, k, v) def __call__(self): """Return a pair containing the referent and the number of times the reference has been called. """ ob = super().__call__() if ob is not None: self.__counter += 1 ob = (ob, self.__counter) return ob Example ======= This simple example shows how an application can use object IDs to retrieve objects that it has seen before. The IDs of the objects can then be used in other data structures without forcing the objects to remain alive, but the objects can still be retrieved by ID if they do. import weakref _id2obj_dict = weakref.WeakValueDictionary() def remember(obj): oid = id(obj) _id2obj_dict[oid] = obj return oid def id2obj(oid): return _id2obj_dict[oid] Finalizer Objects ================= The main benefit of using "finalize" is that it makes it simple to register a callback without needing to preserve the returned finalizer object. For instance >>> import weakref >>> class Object: ... pass ... >>> kenny = Object() >>> weakref.finalize(kenny, print, "You killed Kenny!") <finalize object at ...; for 'Object' at ...> >>> del kenny You killed Kenny! The finalizer can be called directly as well. However the finalizer will invoke the callback at most once. >>> def callback(x, y, z): ... print("CALLBACK") ... return x + y + z ... >>> obj = Object() >>> f = weakref.finalize(obj, callback, 1, 2, z=3) >>> assert f.alive >>> assert f() == 6 CALLBACK >>> assert not f.alive >>> f() # callback not called because finalizer dead >>> del obj # callback not called because finalizer dead You can unregister a finalizer using its "detach()" method. This kills the finalizer and returns the arguments passed to the constructor when it was created. >>> obj = Object() >>> f = weakref.finalize(obj, callback, 1, 2, z=3) >>> f.detach() (<...Object object ...>, <function callback ...>, (1, 2), {'z': 3}) >>> newobj, func, args, kwargs = _ >>> assert not f.alive >>> assert newobj is obj >>> assert func(*args, **kwargs) == 6 CALLBACK Unless you set the "atexit" attribute to "False", a finalizer will be called when the program exits if it is still alive. For instance >>> obj = Object() >>> weakref.finalize(obj, print, "obj dead or exiting") <finalize object at ...; for 'Object' at ...> >>> exit() obj dead or exiting Comparing finalizers with "__del__()" methods ============================================= Suppose we want to create a class whose instances represent temporary directories. The directories should be deleted with their contents when the first of the following events occurs: * the object is garbage collected, * the object’s "remove()" method is called, or * the program exits. We might try to implement the class using a "__del__()" method as follows: class TempDir: def __init__(self): self.name = tempfile.mkdtemp() def remove(self): if self.name is not None: shutil.rmtree(self.name) self.name = None @property def removed(self): return self.name is None def __del__(self): self.remove() Starting with Python 3.4, "__del__()" methods no longer prevent reference cycles from being garbage collected, and module globals are no longer forced to "None" during *interpreter shutdown*. So this code should work without any issues on CPython. However, handling of "__del__()" methods is notoriously implementation specific, since it depends on internal details of the interpreter’s garbage collector implementation. A more robust alternative can be to define a finalizer which only references the specific functions and objects that it needs, rather than having access to the full state of the object: class TempDir: def __init__(self): self.name = tempfile.mkdtemp() self._finalizer = weakref.finalize(self, shutil.rmtree, self.name) def remove(self): self._finalizer() @property def removed(self): return not self._finalizer.alive Defined like this, our finalizer only receives a reference to the details it needs to clean up the directory appropriately. If the object never gets garbage collected the finalizer will still be called at exit. The other advantage of weakref based finalizers is that they can be used to register finalizers for classes where the definition is controlled by a third party, such as running code when a module is unloaded: import weakref, sys def unloading_module(): # implicit reference to the module globals from the function body weakref.finalize(sys.modules[__name__], unloading_module) Note: If you create a finalizer object in a daemonic thread just as the program exits then there is the possibility that the finalizer does not get called at exit. However, in a daemonic thread "atexit.register()", "try: ... finally: ..." and "with: ..." do not guarantee that cleanup occurs either. "webbrowser" — Convenient Web-browser controller ************************************************ **Source code:** Lib/webbrowser.py ====================================================================== The "webbrowser" module provides a high-level interface to allow displaying Web-based documents to users. Under most circumstances, simply calling the "open()" function from this module will do the right thing. Under Unix, graphical browsers are preferred under X11, but text-mode browsers will be used if graphical browsers are not available or an X11 display isn’t available. If text-mode browsers are used, the calling process will block until the user exits the browser. If the environment variable "BROWSER" exists, it is interpreted as the "os.pathsep"-separated list of browsers to try ahead of the platform defaults. When the value of a list part contains the string "%s", then it is interpreted as a literal browser command line to be used with the argument URL substituted for "%s"; if the part does not contain "%s", it is simply interpreted as the name of the browser to launch. [1] For non-Unix platforms, or when a remote browser is available on Unix, the controlling process will not wait for the user to finish with the browser, but allow the remote browser to maintain its own windows on the display. If remote browsers are not available on Unix, the controlling process will launch a new browser and wait. The script **webbrowser** can be used as a command-line interface for the module. It accepts a URL as the argument. It accepts the following optional parameters: "-n" opens the URL in a new browser window, if possible; "-t" opens the URL in a new browser page (“tab”). The options are, naturally, mutually exclusive. Usage example: python -m webbrowser -t "https://www.python.org" The following exception is defined: exception webbrowser.Error Exception raised when a browser control error occurs. The following functions are defined: webbrowser.open(url, new=0, autoraise=True) Display *url* using the default browser. If *new* is 0, the *url* is opened in the same browser window if possible. If *new* is 1, a new browser window is opened if possible. If *new* is 2, a new browser page (“tab”) is opened if possible. If *autoraise* is "True", the window is raised if possible (note that under many window managers this will occur regardless of the setting of this variable). Note that on some platforms, trying to open a filename using this function, may work and start the operating system’s associated program. However, this is neither supported nor portable. Raises an auditing event "webbrowser.open" with argument "url". webbrowser.open_new(url) Open *url* in a new window of the default browser, if possible, otherwise, open *url* in the only browser window. webbrowser.open_new_tab(url) Open *url* in a new page (“tab”) of the default browser, if possible, otherwise equivalent to "open_new()". webbrowser.get(using=None) Return a controller object for the browser type *using*. If *using* is "None", return a controller for a default browser appropriate to the caller’s environment. webbrowser.register(name, constructor, instance=None, *, preferred=False) Register the browser type *name*. Once a browser type is registered, the "get()" function can return a controller for that browser type. If *instance* is not provided, or is "None", *constructor* will be called without parameters to create an instance when needed. If *instance* is provided, *constructor* will never be called, and may be "None". Setting *preferred* to "True" makes this browser a preferred result for a "get()" call with no argument. Otherwise, this entry point is only useful if you plan to either set the "BROWSER" variable or call "get()" with a nonempty argument matching the name of a handler you declare. Changed in version 3.7: *preferred* keyword-only parameter was added. A number of browser types are predefined. This table gives the type names that may be passed to the "get()" function and the corresponding instantiations for the controller classes, all defined in this module. +--------------------------+-------------------------------------------+---------+ | Type Name | Class Name | Notes | |==========================|===========================================|=========| | "'mozilla'" | "Mozilla('mozilla')" | | +--------------------------+-------------------------------------------+---------+ | "'firefox'" | "Mozilla('mozilla')" | | +--------------------------+-------------------------------------------+---------+ | "'netscape'" | "Mozilla('netscape')" | | +--------------------------+-------------------------------------------+---------+ | "'galeon'" | "Galeon('galeon')" | | +--------------------------+-------------------------------------------+---------+ | "'epiphany'" | "Galeon('epiphany')" | | +--------------------------+-------------------------------------------+---------+ | "'skipstone'" | "BackgroundBrowser('skipstone')" | | +--------------------------+-------------------------------------------+---------+ | "'kfmclient'" | "Konqueror()" | (1) | +--------------------------+-------------------------------------------+---------+ | "'konqueror'" | "Konqueror()" | (1) | +--------------------------+-------------------------------------------+---------+ | "'kfm'" | "Konqueror()" | (1) | +--------------------------+-------------------------------------------+---------+ | "'mosaic'" | "BackgroundBrowser('mosaic')" | | +--------------------------+-------------------------------------------+---------+ | "'opera'" | "Opera()" | | +--------------------------+-------------------------------------------+---------+ | "'grail'" | "Grail()" | | +--------------------------+-------------------------------------------+---------+ | "'links'" | "GenericBrowser('links')" | | +--------------------------+-------------------------------------------+---------+ | "'elinks'" | "Elinks('elinks')" | | +--------------------------+-------------------------------------------+---------+ | "'lynx'" | "GenericBrowser('lynx')" | | +--------------------------+-------------------------------------------+---------+ | "'w3m'" | "GenericBrowser('w3m')" | | +--------------------------+-------------------------------------------+---------+ | "'windows-default'" | "WindowsDefault" | (2) | +--------------------------+-------------------------------------------+---------+ | "'macosx'" | "MacOSXOSAScript('default')" | (3) | +--------------------------+-------------------------------------------+---------+ | "'safari'" | "MacOSXOSAScript('safari')" | (3) | +--------------------------+-------------------------------------------+---------+ | "'google-chrome'" | "Chrome('google-chrome')" | | +--------------------------+-------------------------------------------+---------+ | "'chrome'" | "Chrome('chrome')" | | +--------------------------+-------------------------------------------+---------+ | "'chromium'" | "Chromium('chromium')" | | +--------------------------+-------------------------------------------+---------+ | "'chromium-browser'" | "Chromium('chromium-browser')" | | +--------------------------+-------------------------------------------+---------+ Notes: 1. “Konqueror” is the file manager for the KDE desktop environment for Unix, and only makes sense to use if KDE is running. Some way of reliably detecting KDE would be nice; the "KDEDIR" variable is not sufficient. Note also that the name “kfm” is used even when using the **konqueror** command with KDE 2 — the implementation selects the best strategy for running Konqueror. 2. Only on Windows platforms. 3. Only on macOS platform. New in version 3.3: Support for Chrome/Chromium has been added. Here are some simple examples: url = 'https://docs.python.org/' # Open URL in a new tab, if a browser window is already open. webbrowser.open_new_tab(url) # Open URL in new window, raising the window if possible. webbrowser.open_new(url) Browser Controller Objects ========================== Browser controllers provide these methods which parallel three of the module-level convenience functions: controller.open(url, new=0, autoraise=True) Display *url* using the browser handled by this controller. If *new* is 1, a new browser window is opened if possible. If *new* is 2, a new browser page (“tab”) is opened if possible. controller.open_new(url) Open *url* in a new window of the browser handled by this controller, if possible, otherwise, open *url* in the only browser window. Alias "open_new()". controller.open_new_tab(url) Open *url* in a new page (“tab”) of the browser handled by this controller, if possible, otherwise equivalent to "open_new()". -[ Footnotes ]- [1] Executables named here without a full path will be searched in the directories given in the "PATH" environment variable. MS Windows Specific Services **************************** This chapter describes modules that are only available on MS Windows platforms. * "msvcrt" — Useful routines from the MS VC++ runtime * File Operations * Console I/O * Other Functions * "winreg" — Windows registry access * Functions * Constants * HKEY_* Constants * Access Rights * 64-bit Specific * Value Types * Registry Handle Objects * "winsound" — Sound-playing interface for Windows "winreg" — Windows registry access ********************************** ====================================================================== These functions expose the Windows registry API to Python. Instead of using an integer as the registry handle, a handle object is used to ensure that the handles are closed correctly, even if the programmer neglects to explicitly close them. Changed in version 3.3: Several functions in this module used to raise a "WindowsError", which is now an alias of "OSError". Functions ========= This module offers the following functions: winreg.CloseKey(hkey) Closes a previously opened registry key. The *hkey* argument specifies a previously opened key. Note: If *hkey* is not closed using this method (or via "hkey.Close()"), it is closed when the *hkey* object is destroyed by Python. winreg.ConnectRegistry(computer_name, key) Establishes a connection to a predefined registry handle on another computer, and returns a handle object. *computer_name* is the name of the remote computer, of the form "r"\\computername"". If "None", the local computer is used. *key* is the predefined handle to connect to. The return value is the handle of the opened key. If the function fails, an "OSError" exception is raised. Raises an auditing event "winreg.ConnectRegistry" with arguments "computer_name", "key". Changed in version 3.3: See above. winreg.CreateKey(key, sub_key) Creates or opens the specified key, returning a handle object. *key* is an already open key, or one of the predefined HKEY_* constants. *sub_key* is a string that names the key this method opens or creates. If *key* is one of the predefined keys, *sub_key* may be "None". In that case, the handle returned is the same key handle passed in to the function. If the key already exists, this function opens the existing key. The return value is the handle of the opened key. If the function fails, an "OSError" exception is raised. Raises an auditing event "winreg.CreateKey" with arguments "key", "sub_key", "access". Raises an auditing event "winreg.OpenKey/result" with argument "key". Changed in version 3.3: See above. winreg.CreateKeyEx(key, sub_key, reserved=0, access=KEY_WRITE) Creates or opens the specified key, returning a handle object. *key* is an already open key, or one of the predefined HKEY_* constants. *sub_key* is a string that names the key this method opens or creates. *reserved* is a reserved integer, and must be zero. The default is zero. *access* is an integer that specifies an access mask that describes the desired security access for the key. Default is "KEY_WRITE". See Access Rights for other allowed values. If *key* is one of the predefined keys, *sub_key* may be "None". In that case, the handle returned is the same key handle passed in to the function. If the key already exists, this function opens the existing key. The return value is the handle of the opened key. If the function fails, an "OSError" exception is raised. Raises an auditing event "winreg.CreateKey" with arguments "key", "sub_key", "access". Raises an auditing event "winreg.OpenKey/result" with argument "key". New in version 3.2. Changed in version 3.3: See above. winreg.DeleteKey(key, sub_key) Deletes the specified key. *key* is an already open key, or one of the predefined HKEY_* constants. *sub_key* is a string that must be a subkey of the key identified by the *key* parameter. This value must not be "None", and the key may not have subkeys. *This method can not delete keys with subkeys.* If the method succeeds, the entire key, including all of its values, is removed. If the method fails, an "OSError" exception is raised. Raises an auditing event "winreg.DeleteKey" with arguments "key", "sub_key", "access". Changed in version 3.3: See above. winreg.DeleteKeyEx(key, sub_key, access=KEY_WOW64_64KEY, reserved=0) Deletes the specified key. Note: The "DeleteKeyEx()" function is implemented with the RegDeleteKeyEx Windows API function, which is specific to 64-bit versions of Windows. See the RegDeleteKeyEx documentation. *key* is an already open key, or one of the predefined HKEY_* constants. *sub_key* is a string that must be a subkey of the key identified by the *key* parameter. This value must not be "None", and the key may not have subkeys. *reserved* is a reserved integer, and must be zero. The default is zero. *access* is an integer that specifies an access mask that describes the desired security access for the key. Default is "KEY_WOW64_64KEY". See Access Rights for other allowed values. *This method can not delete keys with subkeys.* If the method succeeds, the entire key, including all of its values, is removed. If the method fails, an "OSError" exception is raised. On unsupported Windows versions, "NotImplementedError" is raised. Raises an auditing event "winreg.DeleteKey" with arguments "key", "sub_key", "access". New in version 3.2. Changed in version 3.3: See above. winreg.DeleteValue(key, value) Removes a named value from a registry key. *key* is an already open key, or one of the predefined HKEY_* constants. *value* is a string that identifies the value to remove. Raises an auditing event "winreg.DeleteValue" with arguments "key", "value". winreg.EnumKey(key, index) Enumerates subkeys of an open registry key, returning a string. *key* is an already open key, or one of the predefined HKEY_* constants. *index* is an integer that identifies the index of the key to retrieve. The function retrieves the name of one subkey each time it is called. It is typically called repeatedly until an "OSError" exception is raised, indicating, no more values are available. Raises an auditing event "winreg.EnumKey" with arguments "key", "index". Changed in version 3.3: See above. winreg.EnumValue(key, index) Enumerates values of an open registry key, returning a tuple. *key* is an already open key, or one of the predefined HKEY_* constants. *index* is an integer that identifies the index of the value to retrieve. The function retrieves the name of one subkey each time it is called. It is typically called repeatedly, until an "OSError" exception is raised, indicating no more values. The result is a tuple of 3 items: +---------+----------------------------------------------+ | Index | Meaning | |=========|==============================================| | "0" | A string that identifies the value name | +---------+----------------------------------------------+ | "1" | An object that holds the value data, and | | | whose type depends on the underlying | | | registry type | +---------+----------------------------------------------+ | "2" | An integer that identifies the type of the | | | value data (see table in docs for | | | "SetValueEx()") | +---------+----------------------------------------------+ Raises an auditing event "winreg.EnumValue" with arguments "key", "index". Changed in version 3.3: See above. winreg.ExpandEnvironmentStrings(str) Expands environment variable placeholders "%NAME%" in strings like "REG_EXPAND_SZ": >>> ExpandEnvironmentStrings('%windir%') 'C:\\Windows' Raises an auditing event "winreg.ExpandEnvironmentStrings" with argument "str". winreg.FlushKey(key) Writes all the attributes of a key to the registry. *key* is an already open key, or one of the predefined HKEY_* constants. It is not necessary to call "FlushKey()" to change a key. Registry changes are flushed to disk by the registry using its lazy flusher. Registry changes are also flushed to disk at system shutdown. Unlike "CloseKey()", the "FlushKey()" method returns only when all the data has been written to the registry. An application should only call "FlushKey()" if it requires absolute certainty that registry changes are on disk. Note: If you don’t know whether a "FlushKey()" call is required, it probably isn’t. winreg.LoadKey(key, sub_key, file_name) Creates a subkey under the specified key and stores registration information from a specified file into that subkey. *key* is a handle returned by "ConnectRegistry()" or one of the constants "HKEY_USERS" or "HKEY_LOCAL_MACHINE". *sub_key* is a string that identifies the subkey to load. *file_name* is the name of the file to load registry data from. This file must have been created with the "SaveKey()" function. Under the file allocation table (FAT) file system, the filename may not have an extension. A call to "LoadKey()" fails if the calling process does not have the "SE_RESTORE_PRIVILEGE" privilege. Note that privileges are different from permissions – see the RegLoadKey documentation for more details. If *key* is a handle returned by "ConnectRegistry()", then the path specified in *file_name* is relative to the remote computer. Raises an auditing event "winreg.LoadKey" with arguments "key", "sub_key", "file_name". winreg.OpenKey(key, sub_key, reserved=0, access=KEY_READ) winreg.OpenKeyEx(key, sub_key, reserved=0, access=KEY_READ) Opens the specified key, returning a handle object. *key* is an already open key, or one of the predefined HKEY_* constants. *sub_key* is a string that identifies the sub_key to open. *reserved* is a reserved integer, and must be zero. The default is zero. *access* is an integer that specifies an access mask that describes the desired security access for the key. Default is "KEY_READ". See Access Rights for other allowed values. The result is a new handle to the specified key. If the function fails, "OSError" is raised. Raises an auditing event "winreg.OpenKey" with arguments "key", "sub_key", "access". Raises an auditing event "winreg.OpenKey/result" with argument "key". Changed in version 3.2: Allow the use of named arguments. Changed in version 3.3: See above. winreg.QueryInfoKey(key) Returns information about a key, as a tuple. *key* is an already open key, or one of the predefined HKEY_* constants. The result is a tuple of 3 items: +---------+-----------------------------------------------+ | Index | Meaning | |=========|===============================================| | "0" | An integer giving the number of sub keys this | | | key has. | +---------+-----------------------------------------------+ | "1" | An integer giving the number of values this | | | key has. | +---------+-----------------------------------------------+ | "2" | An integer giving when the key was last | | | modified (if available) as 100’s of | | | nanoseconds since Jan 1, 1601. | +---------+-----------------------------------------------+ Raises an auditing event "winreg.QueryInfoKey" with argument "key". winreg.QueryValue(key, sub_key) Retrieves the unnamed value for a key, as a string. *key* is an already open key, or one of the predefined HKEY_* constants. *sub_key* is a string that holds the name of the subkey with which the value is associated. If this parameter is "None" or empty, the function retrieves the value set by the "SetValue()" method for the key identified by *key*. Values in the registry have name, type, and data components. This method retrieves the data for a key’s first value that has a "NULL" name. But the underlying API call doesn’t return the type, so always use "QueryValueEx()" if possible. Raises an auditing event "winreg.QueryValue" with arguments "key", "sub_key", "value_name". winreg.QueryValueEx(key, value_name) Retrieves the type and data for a specified value name associated with an open registry key. *key* is an already open key, or one of the predefined HKEY_* constants. *value_name* is a string indicating the value to query. The result is a tuple of 2 items: +---------+-------------------------------------------+ | Index | Meaning | |=========|===========================================| | "0" | The value of the registry item. | +---------+-------------------------------------------+ | "1" | An integer giving the registry type for | | | this value (see table in docs for | | | "SetValueEx()") | +---------+-------------------------------------------+ Raises an auditing event "winreg.QueryValue" with arguments "key", "sub_key", "value_name". winreg.SaveKey(key, file_name) Saves the specified key, and all its subkeys to the specified file. *key* is an already open key, or one of the predefined HKEY_* constants. *file_name* is the name of the file to save registry data to. This file cannot already exist. If this filename includes an extension, it cannot be used on file allocation table (FAT) file systems by the "LoadKey()" method. If *key* represents a key on a remote computer, the path described by *file_name* is relative to the remote computer. The caller of this method must possess the "SeBackupPrivilege" security privilege. Note that privileges are different than permissions – see the Conflicts Between User Rights and Permissions documentation for more details. This function passes "NULL" for *security_attributes* to the API. Raises an auditing event "winreg.SaveKey" with arguments "key", "file_name". winreg.SetValue(key, sub_key, type, value) Associates a value with a specified key. *key* is an already open key, or one of the predefined HKEY_* constants. *sub_key* is a string that names the subkey with which the value is associated. *type* is an integer that specifies the type of the data. Currently this must be "REG_SZ", meaning only strings are supported. Use the "SetValueEx()" function for support for other data types. *value* is a string that specifies the new value. If the key specified by the *sub_key* parameter does not exist, the SetValue function creates it. Value lengths are limited by available memory. Long values (more than 2048 bytes) should be stored as files with the filenames stored in the configuration registry. This helps the registry perform efficiently. The key identified by the *key* parameter must have been opened with "KEY_SET_VALUE" access. Raises an auditing event "winreg.SetValue" with arguments "key", "sub_key", "type", "value". winreg.SetValueEx(key, value_name, reserved, type, value) Stores data in the value field of an open registry key. *key* is an already open key, or one of the predefined HKEY_* constants. *value_name* is a string that names the subkey with which the value is associated. *reserved* can be anything – zero is always passed to the API. *type* is an integer that specifies the type of the data. See Value Types for the available types. *value* is a string that specifies the new value. This method can also set additional value and type information for the specified key. The key identified by the key parameter must have been opened with "KEY_SET_VALUE" access. To open the key, use the "CreateKey()" or "OpenKey()" methods. Value lengths are limited by available memory. Long values (more than 2048 bytes) should be stored as files with the filenames stored in the configuration registry. This helps the registry perform efficiently. Raises an auditing event "winreg.SetValue" with arguments "key", "sub_key", "type", "value". winreg.DisableReflectionKey(key) Disables registry reflection for 32-bit processes running on a 64-bit operating system. *key* is an already open key, or one of the predefined HKEY_* constants. Will generally raise "NotImplementedError" if executed on a 32-bit operating system. If the key is not on the reflection list, the function succeeds but has no effect. Disabling reflection for a key does not affect reflection of any subkeys. Raises an auditing event "winreg.DisableReflectionKey" with argument "key". winreg.EnableReflectionKey(key) Restores registry reflection for the specified disabled key. *key* is an already open key, or one of the predefined HKEY_* constants. Will generally raise "NotImplementedError" if executed on a 32-bit operating system. Restoring reflection for a key does not affect reflection of any subkeys. Raises an auditing event "winreg.EnableReflectionKey" with argument "key". winreg.QueryReflectionKey(key) Determines the reflection state for the specified key. *key* is an already open key, or one of the predefined HKEY_* constants. Returns "True" if reflection is disabled. Will generally raise "NotImplementedError" if executed on a 32-bit operating system. Raises an auditing event "winreg.QueryReflectionKey" with argument "key". Constants ========= The following constants are defined for use in many "_winreg" functions. HKEY_* Constants ---------------- winreg.HKEY_CLASSES_ROOT Registry entries subordinate to this key define types (or classes) of documents and the properties associated with those types. Shell and COM applications use the information stored under this key. winreg.HKEY_CURRENT_USER Registry entries subordinate to this key define the preferences of the current user. These preferences include the settings of environment variables, data about program groups, colors, printers, network connections, and application preferences. winreg.HKEY_LOCAL_MACHINE Registry entries subordinate to this key define the physical state of the computer, including data about the bus type, system memory, and installed hardware and software. winreg.HKEY_USERS Registry entries subordinate to this key define the default user configuration for new users on the local computer and the user configuration for the current user. winreg.HKEY_PERFORMANCE_DATA Registry entries subordinate to this key allow you to access performance data. The data is not actually stored in the registry; the registry functions cause the system to collect the data from its source. winreg.HKEY_CURRENT_CONFIG Contains information about the current hardware profile of the local computer system. winreg.HKEY_DYN_DATA This key is not used in versions of Windows after 98. Access Rights ------------- For more information, see Registry Key Security and Access. winreg.KEY_ALL_ACCESS Combines the STANDARD_RIGHTS_REQUIRED, "KEY_QUERY_VALUE", "KEY_SET_VALUE", "KEY_CREATE_SUB_KEY", "KEY_ENUMERATE_SUB_KEYS", "KEY_NOTIFY", and "KEY_CREATE_LINK" access rights. winreg.KEY_WRITE Combines the STANDARD_RIGHTS_WRITE, "KEY_SET_VALUE", and "KEY_CREATE_SUB_KEY" access rights. winreg.KEY_READ Combines the STANDARD_RIGHTS_READ, "KEY_QUERY_VALUE", "KEY_ENUMERATE_SUB_KEYS", and "KEY_NOTIFY" values. winreg.KEY_EXECUTE Equivalent to "KEY_READ". winreg.KEY_QUERY_VALUE Required to query the values of a registry key. winreg.KEY_SET_VALUE Required to create, delete, or set a registry value. winreg.KEY_CREATE_SUB_KEY Required to create a subkey of a registry key. winreg.KEY_ENUMERATE_SUB_KEYS Required to enumerate the subkeys of a registry key. winreg.KEY_NOTIFY Required to request change notifications for a registry key or for subkeys of a registry key. winreg.KEY_CREATE_LINK Reserved for system use. 64-bit Specific ~~~~~~~~~~~~~~~ For more information, see Accessing an Alternate Registry View. winreg.KEY_WOW64_64KEY Indicates that an application on 64-bit Windows should operate on the 64-bit registry view. winreg.KEY_WOW64_32KEY Indicates that an application on 64-bit Windows should operate on the 32-bit registry view. Value Types ----------- For more information, see Registry Value Types. winreg.REG_BINARY Binary data in any form. winreg.REG_DWORD 32-bit number. winreg.REG_DWORD_LITTLE_ENDIAN A 32-bit number in little-endian format. Equivalent to "REG_DWORD". winreg.REG_DWORD_BIG_ENDIAN A 32-bit number in big-endian format. winreg.REG_EXPAND_SZ Null-terminated string containing references to environment variables ("%PATH%"). winreg.REG_LINK A Unicode symbolic link. winreg.REG_MULTI_SZ A sequence of null-terminated strings, terminated by two null characters. (Python handles this termination automatically.) winreg.REG_NONE No defined value type. winreg.REG_QWORD A 64-bit number. New in version 3.6. winreg.REG_QWORD_LITTLE_ENDIAN A 64-bit number in little-endian format. Equivalent to "REG_QWORD". New in version 3.6. winreg.REG_RESOURCE_LIST A device-driver resource list. winreg.REG_FULL_RESOURCE_DESCRIPTOR A hardware setting. winreg.REG_RESOURCE_REQUIREMENTS_LIST A hardware resource list. winreg.REG_SZ A null-terminated string. Registry Handle Objects ======================= This object wraps a Windows HKEY object, automatically closing it when the object is destroyed. To guarantee cleanup, you can call either the "Close()" method on the object, or the "CloseKey()" function. All registry functions in this module return one of these objects. All registry functions in this module which accept a handle object also accept an integer, however, use of the handle object is encouraged. Handle objects provide semantics for "__bool__()" – thus if handle: print("Yes") will print "Yes" if the handle is currently valid (has not been closed or detached). The object also support comparison semantics, so handle objects will compare true if they both reference the same underlying Windows handle value. Handle objects can be converted to an integer (e.g., using the built- in "int()" function), in which case the underlying Windows handle value is returned. You can also use the "Detach()" method to return the integer handle, and also disconnect the Windows handle from the handle object. PyHKEY.Close() Closes the underlying Windows handle. If the handle is already closed, no error is raised. PyHKEY.Detach() Detaches the Windows handle from the handle object. The result is an integer that holds the value of the handle before it is detached. If the handle is already detached or closed, this will return zero. After calling this function, the handle is effectively invalidated, but the handle is not closed. You would call this function when you need the underlying Win32 handle to exist beyond the lifetime of the handle object. Raises an auditing event "winreg.PyHKEY.Detach" with argument "key". PyHKEY.__enter__() PyHKEY.__exit__(*exc_info) The HKEY object implements "__enter__()" and "__exit__()" and thus supports the context protocol for the "with" statement: with OpenKey(HKEY_LOCAL_MACHINE, "foo") as key: ... # work with key will automatically close *key* when control leaves the "with" block. "winsound" — Sound-playing interface for Windows ************************************************ ====================================================================== The "winsound" module provides access to the basic sound-playing machinery provided by Windows platforms. It includes functions and several constants. winsound.Beep(frequency, duration) Beep the PC’s speaker. The *frequency* parameter specifies frequency, in hertz, of the sound, and must be in the range 37 through 32,767. The *duration* parameter specifies the number of milliseconds the sound should last. If the system is not able to beep the speaker, "RuntimeError" is raised. winsound.PlaySound(sound, flags) Call the underlying "PlaySound()" function from the Platform API. The *sound* parameter may be a filename, a system sound alias, audio data as a *bytes-like object*, or "None". Its interpretation depends on the value of *flags*, which can be a bitwise ORed combination of the constants described below. If the *sound* parameter is "None", any currently playing waveform sound is stopped. If the system indicates an error, "RuntimeError" is raised. winsound.MessageBeep(type=MB_OK) Call the underlying "MessageBeep()" function from the Platform API. This plays a sound as specified in the registry. The *type* argument specifies which sound to play; possible values are "-1", "MB_ICONASTERISK", "MB_ICONEXCLAMATION", "MB_ICONHAND", "MB_ICONQUESTION", and "MB_OK", all described below. The value "-1" produces a “simple beep”; this is the final fallback if a sound cannot be played otherwise. If the system indicates an error, "RuntimeError" is raised. winsound.SND_FILENAME The *sound* parameter is the name of a WAV file. Do not use with "SND_ALIAS". winsound.SND_ALIAS The *sound* parameter is a sound association name from the registry. If the registry contains no such name, play the system default sound unless "SND_NODEFAULT" is also specified. If no default sound is registered, raise "RuntimeError". Do not use with "SND_FILENAME". All Win32 systems support at least the following; most systems support many more: +----------------------------+------------------------------------------+ | "PlaySound()" *name* | Corresponding Control Panel Sound name | |============================|==========================================| | "'SystemAsterisk'" | Asterisk | +----------------------------+------------------------------------------+ | "'SystemExclamation'" | Exclamation | +----------------------------+------------------------------------------+ | "'SystemExit'" | Exit Windows | +----------------------------+------------------------------------------+ | "'SystemHand'" | Critical Stop | +----------------------------+------------------------------------------+ | "'SystemQuestion'" | Question | +----------------------------+------------------------------------------+ For example: import winsound # Play Windows exit sound. winsound.PlaySound("SystemExit", winsound.SND_ALIAS) # Probably play Windows default sound, if any is registered (because # "*" probably isn't the registered name of any sound). winsound.PlaySound("*", winsound.SND_ALIAS) winsound.SND_LOOP Play the sound repeatedly. The "SND_ASYNC" flag must also be used to avoid blocking. Cannot be used with "SND_MEMORY". winsound.SND_MEMORY The *sound* parameter to "PlaySound()" is a memory image of a WAV file, as a *bytes-like object*. Note: This module does not support playing from a memory image asynchronously, so a combination of this flag and "SND_ASYNC" will raise "RuntimeError". winsound.SND_PURGE Stop playing all instances of the specified sound. Note: This flag is not supported on modern Windows platforms. winsound.SND_ASYNC Return immediately, allowing sounds to play asynchronously. winsound.SND_NODEFAULT If the specified sound cannot be found, do not play the system default sound. winsound.SND_NOSTOP Do not interrupt sounds currently playing. winsound.SND_NOWAIT Return immediately if the sound driver is busy. Note: This flag is not supported on modern Windows platforms. winsound.MB_ICONASTERISK Play the "SystemDefault" sound. winsound.MB_ICONEXCLAMATION Play the "SystemExclamation" sound. winsound.MB_ICONHAND Play the "SystemHand" sound. winsound.MB_ICONQUESTION Play the "SystemQuestion" sound. winsound.MB_OK Play the "SystemDefault" sound. "wsgiref" — WSGI Utilities and Reference Implementation ******************************************************* ====================================================================== The Web Server Gateway Interface (WSGI) is a standard interface between web server software and web applications written in Python. Having a standard interface makes it easy to use an application that supports WSGI with a number of different web servers. Only authors of web servers and programming frameworks need to know every detail and corner case of the WSGI design. You don’t need to understand every detail of WSGI just to install a WSGI application or to write a web application using an existing framework. "wsgiref" is a reference implementation of the WSGI specification that can be used to add WSGI support to a web server or framework. It provides utilities for manipulating WSGI environment variables and response headers, base classes for implementing WSGI servers, a demo HTTP server that serves WSGI applications, and a validation tool that checks WSGI servers and applications for conformance to the WSGI specification (**PEP 3333**). See wsgi.readthedocs.io for more information about WSGI, and links to tutorials and other resources. "wsgiref.util" – WSGI environment utilities =========================================== This module provides a variety of utility functions for working with WSGI environments. A WSGI environment is a dictionary containing HTTP request variables as described in **PEP 3333**. All of the functions taking an *environ* parameter expect a WSGI-compliant dictionary to be supplied; please see **PEP 3333** for a detailed specification. wsgiref.util.guess_scheme(environ) Return a guess for whether "wsgi.url_scheme" should be “http” or “https”, by checking for a "HTTPS" environment variable in the *environ* dictionary. The return value is a string. This function is useful when creating a gateway that wraps CGI or a CGI-like protocol such as FastCGI. Typically, servers providing such protocols will include a "HTTPS" variable with a value of “1”, “yes”, or “on” when a request is received via SSL. So, this function returns “https” if such a value is found, and “http” otherwise. wsgiref.util.request_uri(environ, include_query=True) Return the full request URI, optionally including the query string, using the algorithm found in the “URL Reconstruction” section of **PEP 3333**. If *include_query* is false, the query string is not included in the resulting URI. wsgiref.util.application_uri(environ) Similar to "request_uri()", except that the "PATH_INFO" and "QUERY_STRING" variables are ignored. The result is the base URI of the application object addressed by the request. wsgiref.util.shift_path_info(environ) Shift a single name from "PATH_INFO" to "SCRIPT_NAME" and return the name. The *environ* dictionary is *modified* in-place; use a copy if you need to keep the original "PATH_INFO" or "SCRIPT_NAME" intact. If there are no remaining path segments in "PATH_INFO", "None" is returned. Typically, this routine is used to process each portion of a request URI path, for example to treat the path as a series of dictionary keys. This routine modifies the passed-in environment to make it suitable for invoking another WSGI application that is located at the target URI. For example, if there is a WSGI application at "/foo", and the request URI path is "/foo/bar/baz", and the WSGI application at "/foo" calls "shift_path_info()", it will receive the string “bar”, and the environment will be updated to be suitable for passing to a WSGI application at "/foo/bar". That is, "SCRIPT_NAME" will change from "/foo" to "/foo/bar", and "PATH_INFO" will change from "/bar/baz" to "/baz". When "PATH_INFO" is just a “/”, this routine returns an empty string and appends a trailing slash to "SCRIPT_NAME", even though empty path segments are normally ignored, and "SCRIPT_NAME" doesn’t normally end in a slash. This is intentional behavior, to ensure that an application can tell the difference between URIs ending in "/x" from ones ending in "/x/" when using this routine to do object traversal. wsgiref.util.setup_testing_defaults(environ) Update *environ* with trivial defaults for testing purposes. This routine adds various parameters required for WSGI, including "HTTP_HOST", "SERVER_NAME", "SERVER_PORT", "REQUEST_METHOD", "SCRIPT_NAME", "PATH_INFO", and all of the **PEP 3333**-defined "wsgi.*" variables. It only supplies default values, and does not replace any existing settings for these variables. This routine is intended to make it easier for unit tests of WSGI servers and applications to set up dummy environments. It should NOT be used by actual WSGI servers or applications, since the data is fake! Example usage: from wsgiref.util import setup_testing_defaults from wsgiref.simple_server import make_server # A relatively simple WSGI application. It's going to print out the # environment dictionary after being updated by setup_testing_defaults def simple_app(environ, start_response): setup_testing_defaults(environ) status = '200 OK' headers = [('Content-type', 'text/plain; charset=utf-8')] start_response(status, headers) ret = [("%s: %s\n" % (key, value)).encode("utf-8") for key, value in environ.items()] return ret with make_server('', 8000, simple_app) as httpd: print("Serving on port 8000...") httpd.serve_forever() In addition to the environment functions above, the "wsgiref.util" module also provides these miscellaneous utilities: wsgiref.util.is_hop_by_hop(header_name) Return "True" if ‘header_name’ is an HTTP/1.1 “Hop-by-Hop” header, as defined by **RFC 2616**. class wsgiref.util.FileWrapper(filelike, blksize=8192) A wrapper to convert a file-like object to an *iterator*. The resulting objects support both "__getitem__()" and "__iter__()" iteration styles, for compatibility with Python 2.1 and Jython. As the object is iterated over, the optional *blksize* parameter will be repeatedly passed to the *filelike* object’s "read()" method to obtain bytestrings to yield. When "read()" returns an empty bytestring, iteration is ended and is not resumable. If *filelike* has a "close()" method, the returned object will also have a "close()" method, and it will invoke the *filelike* object’s "close()" method when called. Example usage: from io import StringIO from wsgiref.util import FileWrapper # We're using a StringIO-buffer for as the file-like object filelike = StringIO("This is an example file-like object"*10) wrapper = FileWrapper(filelike, blksize=5) for chunk in wrapper: print(chunk) Deprecated since version 3.8: Support for "sequence protocol" is deprecated. "wsgiref.headers" – WSGI response header tools ============================================== This module provides a single class, "Headers", for convenient manipulation of WSGI response headers using a mapping-like interface. class wsgiref.headers.Headers([headers]) Create a mapping-like object wrapping *headers*, which must be a list of header name/value tuples as described in **PEP 3333**. The default value of *headers* is an empty list. "Headers" objects support typical mapping operations including "__getitem__()", "get()", "__setitem__()", "setdefault()", "__delitem__()" and "__contains__()". For each of these methods, the key is the header name (treated case-insensitively), and the value is the first value associated with that header name. Setting a header deletes any existing values for that header, then adds a new value at the end of the wrapped header list. Headers’ existing order is generally maintained, with new headers added to the end of the wrapped list. Unlike a dictionary, "Headers" objects do not raise an error when you try to get or delete a key that isn’t in the wrapped header list. Getting a nonexistent header just returns "None", and deleting a nonexistent header does nothing. "Headers" objects also support "keys()", "values()", and "items()" methods. The lists returned by "keys()" and "items()" can include the same key more than once if there is a multi-valued header. The "len()" of a "Headers" object is the same as the length of its "items()", which is the same as the length of the wrapped header list. In fact, the "items()" method just returns a copy of the wrapped header list. Calling "bytes()" on a "Headers" object returns a formatted bytestring suitable for transmission as HTTP response headers. Each header is placed on a line with its value, separated by a colon and a space. Each line is terminated by a carriage return and line feed, and the bytestring is terminated with a blank line. In addition to their mapping interface and formatting features, "Headers" objects also have the following methods for querying and adding multi-valued headers, and for adding headers with MIME parameters: get_all(name) Return a list of all the values for the named header. The returned list will be sorted in the order they appeared in the original header list or were added to this instance, and may contain duplicates. Any fields deleted and re-inserted are always appended to the header list. If no fields exist with the given name, returns an empty list. add_header(name, value, **_params) Add a (possibly multi-valued) header, with optional MIME parameters specified via keyword arguments. *name* is the header field to add. Keyword arguments can be used to set MIME parameters for the header field. Each parameter must be a string or "None". Underscores in parameter names are converted to dashes, since dashes are illegal in Python identifiers, but many MIME parameter names include dashes. If the parameter value is a string, it is added to the header value parameters in the form "name="value"". If it is "None", only the parameter name is added. (This is used for MIME parameters without a value.) Example usage: h.add_header('content-disposition', 'attachment', filename='bud.gif') The above will add a header that looks like this: Content-Disposition: attachment; filename="bud.gif" Changed in version 3.5: *headers* parameter is optional. "wsgiref.simple_server" – a simple WSGI HTTP server =================================================== This module implements a simple HTTP server (based on "http.server") that serves WSGI applications. Each server instance serves a single WSGI application on a given host and port. If you want to serve multiple applications on a single host and port, you should create a WSGI application that parses "PATH_INFO" to select which application to invoke for each request. (E.g., using the "shift_path_info()" function from "wsgiref.util".) wsgiref.simple_server.make_server(host, port, app, server_class=WSGIServer, handler_class=WSGIRequestHandler) Create a new WSGI server listening on *host* and *port*, accepting connections for *app*. The return value is an instance of the supplied *server_class*, and will process requests using the specified *handler_class*. *app* must be a WSGI application object, as defined by **PEP 3333**. Example usage: from wsgiref.simple_server import make_server, demo_app with make_server('', 8000, demo_app) as httpd: print("Serving HTTP on port 8000...") # Respond to requests until process is killed httpd.serve_forever() # Alternative: serve one request, then exit httpd.handle_request() wsgiref.simple_server.demo_app(environ, start_response) This function is a small but complete WSGI application that returns a text page containing the message “Hello world!” and a list of the key/value pairs provided in the *environ* parameter. It’s useful for verifying that a WSGI server (such as "wsgiref.simple_server") is able to run a simple WSGI application correctly. class wsgiref.simple_server.WSGIServer(server_address, RequestHandlerClass) Create a "WSGIServer" instance. *server_address* should be a "(host,port)" tuple, and *RequestHandlerClass* should be the subclass of "http.server.BaseHTTPRequestHandler" that will be used to process requests. You do not normally need to call this constructor, as the "make_server()" function can handle all the details for you. "WSGIServer" is a subclass of "http.server.HTTPServer", so all of its methods (such as "serve_forever()" and "handle_request()") are available. "WSGIServer" also provides these WSGI-specific methods: set_app(application) Sets the callable *application* as the WSGI application that will receive requests. get_app() Returns the currently-set application callable. Normally, however, you do not need to use these additional methods, as "set_app()" is normally called by "make_server()", and the "get_app()" exists mainly for the benefit of request handler instances. class wsgiref.simple_server.WSGIRequestHandler(request, client_address, server) Create an HTTP handler for the given *request* (i.e. a socket), *client_address* (a "(host,port)" tuple), and *server* ("WSGIServer" instance). You do not need to create instances of this class directly; they are automatically created as needed by "WSGIServer" objects. You can, however, subclass this class and supply it as a *handler_class* to the "make_server()" function. Some possibly relevant methods for overriding in subclasses: get_environ() Returns a dictionary containing the WSGI environment for a request. The default implementation copies the contents of the "WSGIServer" object’s "base_environ" dictionary attribute and then adds various headers derived from the HTTP request. Each call to this method should return a new dictionary containing all of the relevant CGI environment variables as specified in **PEP 3333**. get_stderr() Return the object that should be used as the "wsgi.errors" stream. The default implementation just returns "sys.stderr". handle() Process the HTTP request. The default implementation creates a handler instance using a "wsgiref.handlers" class to implement the actual WSGI application interface. "wsgiref.validate" — WSGI conformance checker ============================================= When creating new WSGI application objects, frameworks, servers, or middleware, it can be useful to validate the new code’s conformance using "wsgiref.validate". This module provides a function that creates WSGI application objects that validate communications between a WSGI server or gateway and a WSGI application object, to check both sides for protocol conformance. Note that this utility does not guarantee complete **PEP 3333** compliance; an absence of errors from this module does not necessarily mean that errors do not exist. However, if this module does produce an error, then it is virtually certain that either the server or application is not 100% compliant. This module is based on the "paste.lint" module from Ian Bicking’s “Python Paste” library. wsgiref.validate.validator(application) Wrap *application* and return a new WSGI application object. The returned application will forward all requests to the original *application*, and will check that both the *application* and the server invoking it are conforming to the WSGI specification and to **RFC 2616**. Any detected nonconformance results in an "AssertionError" being raised; note, however, that how these errors are handled is server- dependent. For example, "wsgiref.simple_server" and other servers based on "wsgiref.handlers" (that don’t override the error handling methods to do something else) will simply output a message that an error has occurred, and dump the traceback to "sys.stderr" or some other error stream. This wrapper may also generate output using the "warnings" module to indicate behaviors that are questionable but which may not actually be prohibited by **PEP 3333**. Unless they are suppressed using Python command-line options or the "warnings" API, any such warnings will be written to "sys.stderr" (*not* "wsgi.errors", unless they happen to be the same object). Example usage: from wsgiref.validate import validator from wsgiref.simple_server import make_server # Our callable object which is intentionally not compliant to the # standard, so the validator is going to break def simple_app(environ, start_response): status = '200 OK' # HTTP Status headers = [('Content-type', 'text/plain')] # HTTP Headers start_response(status, headers) # This is going to break because we need to return a list, and # the validator is going to inform us return b"Hello World" # This is the application wrapped in a validator validator_app = validator(simple_app) with make_server('', 8000, validator_app) as httpd: print("Listening on port 8000....") httpd.serve_forever() "wsgiref.handlers" – server/gateway base classes ================================================ This module provides base handler classes for implementing WSGI servers and gateways. These base classes handle most of the work of communicating with a WSGI application, as long as they are given a CGI-like environment, along with input, output, and error streams. class wsgiref.handlers.CGIHandler CGI-based invocation via "sys.stdin", "sys.stdout", "sys.stderr" and "os.environ". This is useful when you have a WSGI application and want to run it as a CGI script. Simply invoke "CGIHandler().run(app)", where "app" is the WSGI application object you wish to invoke. This class is a subclass of "BaseCGIHandler" that sets "wsgi.run_once" to true, "wsgi.multithread" to false, and "wsgi.multiprocess" to true, and always uses "sys" and "os" to obtain the necessary CGI streams and environment. class wsgiref.handlers.IISCGIHandler A specialized alternative to "CGIHandler", for use when deploying on Microsoft’s IIS web server, without having set the config allowPathInfo option (IIS>=7) or metabase allowPathInfoForScriptMappings (IIS<7). By default, IIS gives a "PATH_INFO" that duplicates the "SCRIPT_NAME" at the front, causing problems for WSGI applications that wish to implement routing. This handler strips any such duplicated path. IIS can be configured to pass the correct "PATH_INFO", but this causes another bug where "PATH_TRANSLATED" is wrong. Luckily this variable is rarely used and is not guaranteed by WSGI. On IIS<7, though, the setting can only be made on a vhost level, affecting all other script mappings, many of which break when exposed to the "PATH_TRANSLATED" bug. For this reason IIS<7 is almost never deployed with the fix (Even IIS7 rarely uses it because there is still no UI for it.). There is no way for CGI code to tell whether the option was set, so a separate handler class is provided. It is used in the same way as "CGIHandler", i.e., by calling "IISCGIHandler().run(app)", where "app" is the WSGI application object you wish to invoke. New in version 3.2. class wsgiref.handlers.BaseCGIHandler(stdin, stdout, stderr, environ, multithread=True, multiprocess=False) Similar to "CGIHandler", but instead of using the "sys" and "os" modules, the CGI environment and I/O streams are specified explicitly. The *multithread* and *multiprocess* values are used to set the "wsgi.multithread" and "wsgi.multiprocess" flags for any applications run by the handler instance. This class is a subclass of "SimpleHandler" intended for use with software other than HTTP “origin servers”. If you are writing a gateway protocol implementation (such as CGI, FastCGI, SCGI, etc.) that uses a "Status:" header to send an HTTP status, you probably want to subclass this instead of "SimpleHandler". class wsgiref.handlers.SimpleHandler(stdin, stdout, stderr, environ, multithread=True, multiprocess=False) Similar to "BaseCGIHandler", but designed for use with HTTP origin servers. If you are writing an HTTP server implementation, you will probably want to subclass this instead of "BaseCGIHandler". This class is a subclass of "BaseHandler". It overrides the "__init__()", "get_stdin()", "get_stderr()", "add_cgi_vars()", "_write()", and "_flush()" methods to support explicitly setting the environment and streams via the constructor. The supplied environment and streams are stored in the "stdin", "stdout", "stderr", and "environ" attributes. The "write()" method of *stdout* should write each chunk in full, like "io.BufferedIOBase". class wsgiref.handlers.BaseHandler This is an abstract base class for running WSGI applications. Each instance will handle a single HTTP request, although in principle you could create a subclass that was reusable for multiple requests. "BaseHandler" instances have only one method intended for external use: run(app) Run the specified WSGI application, *app*. All of the other "BaseHandler" methods are invoked by this method in the process of running the application, and thus exist primarily to allow customizing the process. The following methods MUST be overridden in a subclass: _write(data) Buffer the bytes *data* for transmission to the client. It’s okay if this method actually transmits the data; "BaseHandler" just separates write and flush operations for greater efficiency when the underlying system actually has such a distinction. _flush() Force buffered data to be transmitted to the client. It’s okay if this method is a no-op (i.e., if "_write()" actually sends the data). get_stdin() Return an input stream object suitable for use as the "wsgi.input" of the request currently being processed. get_stderr() Return an output stream object suitable for use as the "wsgi.errors" of the request currently being processed. add_cgi_vars() Insert CGI variables for the current request into the "environ" attribute. Here are some other methods and attributes you may wish to override. This list is only a summary, however, and does not include every method that can be overridden. You should consult the docstrings and source code for additional information before attempting to create a customized "BaseHandler" subclass. Attributes and methods for customizing the WSGI environment: wsgi_multithread The value to be used for the "wsgi.multithread" environment variable. It defaults to true in "BaseHandler", but may have a different default (or be set by the constructor) in the other subclasses. wsgi_multiprocess The value to be used for the "wsgi.multiprocess" environment variable. It defaults to true in "BaseHandler", but may have a different default (or be set by the constructor) in the other subclasses. wsgi_run_once The value to be used for the "wsgi.run_once" environment variable. It defaults to false in "BaseHandler", but "CGIHandler" sets it to true by default. os_environ The default environment variables to be included in every request’s WSGI environment. By default, this is a copy of "os.environ" at the time that "wsgiref.handlers" was imported, but subclasses can either create their own at the class or instance level. Note that the dictionary should be considered read-only, since the default value is shared between multiple classes and instances. server_software If the "origin_server" attribute is set, this attribute’s value is used to set the default "SERVER_SOFTWARE" WSGI environment variable, and also to set a default "Server:" header in HTTP responses. It is ignored for handlers (such as "BaseCGIHandler" and "CGIHandler") that are not HTTP origin servers. Changed in version 3.3: The term “Python” is replaced with implementation specific term like “CPython”, “Jython” etc. get_scheme() Return the URL scheme being used for the current request. The default implementation uses the "guess_scheme()" function from "wsgiref.util" to guess whether the scheme should be “http” or “https”, based on the current request’s "environ" variables. setup_environ() Set the "environ" attribute to a fully-populated WSGI environment. The default implementation uses all of the above methods and attributes, plus the "get_stdin()", "get_stderr()", and "add_cgi_vars()" methods and the "wsgi_file_wrapper" attribute. It also inserts a "SERVER_SOFTWARE" key if not present, as long as the "origin_server" attribute is a true value and the "server_software" attribute is set. Methods and attributes for customizing exception handling: log_exception(exc_info) Log the *exc_info* tuple in the server log. *exc_info* is a "(type, value, traceback)" tuple. The default implementation simply writes the traceback to the request’s "wsgi.errors" stream and flushes it. Subclasses can override this method to change the format or retarget the output, mail the traceback to an administrator, or whatever other action may be deemed suitable. traceback_limit The maximum number of frames to include in tracebacks output by the default "log_exception()" method. If "None", all frames are included. error_output(environ, start_response) This method is a WSGI application to generate an error page for the user. It is only invoked if an error occurs before headers are sent to the client. This method can access the current error information using "sys.exc_info()", and should pass that information to *start_response* when calling it (as described in the “Error Handling” section of **PEP 3333**). The default implementation just uses the "error_status", "error_headers", and "error_body" attributes to generate an output page. Subclasses can override this to produce more dynamic error output. Note, however, that it’s not recommended from a security perspective to spit out diagnostics to any old user; ideally, you should have to do something special to enable diagnostic output, which is why the default implementation doesn’t include any. error_status The HTTP status used for error responses. This should be a status string as defined in **PEP 3333**; it defaults to a 500 code and message. error_headers The HTTP headers used for error responses. This should be a list of WSGI response headers ("(name, value)" tuples), as described in **PEP 3333**. The default list just sets the content type to "text/plain". error_body The error response body. This should be an HTTP response body bytestring. It defaults to the plain text, “A server error occurred. Please contact the administrator.” Methods and attributes for **PEP 3333**’s “Optional Platform- Specific File Handling” feature: wsgi_file_wrapper A "wsgi.file_wrapper" factory, or "None". The default value of this attribute is the "wsgiref.util.FileWrapper" class. sendfile() Override to implement platform-specific file transmission. This method is called only if the application’s return value is an instance of the class specified by the "wsgi_file_wrapper" attribute. It should return a true value if it was able to successfully transmit the file, so that the default transmission code will not be executed. The default implementation of this method just returns a false value. Miscellaneous methods and attributes: origin_server This attribute should be set to a true value if the handler’s "_write()" and "_flush()" are being used to communicate directly to the client, rather than via a CGI-like gateway protocol that wants the HTTP status in a special "Status:" header. This attribute’s default value is true in "BaseHandler", but false in "BaseCGIHandler" and "CGIHandler". http_version If "origin_server" is true, this string attribute is used to set the HTTP version of the response set to the client. It defaults to ""1.0"". wsgiref.handlers.read_environ() Transcode CGI variables from "os.environ" to **PEP 3333** “bytes in unicode” strings, returning a new dictionary. This function is used by "CGIHandler" and "IISCGIHandler" in place of directly using "os.environ", which is not necessarily WSGI-compliant on all platforms and web servers using Python 3 – specifically, ones where the OS’s actual environment is Unicode (i.e. Windows), or ones where the environment is bytes, but the system encoding used by Python to decode it is anything other than ISO-8859-1 (e.g. Unix systems using UTF-8). If you are implementing a CGI-based handler of your own, you probably want to use this routine instead of just copying values out of "os.environ" directly. New in version 3.2. Examples ======== This is a working “Hello World” WSGI application: from wsgiref.simple_server import make_server # Every WSGI application must have an application object - a callable # object that accepts two arguments. For that purpose, we're going to # use a function (note that you're not limited to a function, you can # use a class for example). The first argument passed to the function # is a dictionary containing CGI-style environment variables and the # second variable is the callable object. def hello_world_app(environ, start_response): status = '200 OK' # HTTP Status headers = [('Content-type', 'text/plain; charset=utf-8')] # HTTP Headers start_response(status, headers) # The returned object is going to be printed return [b"Hello World"] with make_server('', 8000, hello_world_app) as httpd: print("Serving on port 8000...") # Serve until process is killed httpd.serve_forever() Example of a WSGI application serving the current directory, accept optional directory and port number (default: 8000) on the command line: #!/usr/bin/env python3 ''' Small wsgiref based web server. Takes a path to serve from and an optional port number (defaults to 8000), then tries to serve files. Mime types are guessed from the file names, 404 errors are raised if the file is not found. Used for the make serve target in Doc. ''' import sys import os import mimetypes from wsgiref import simple_server, util def app(environ, respond): fn = os.path.join(path, environ['PATH_INFO'][1:]) if '.' not in fn.split(os.path.sep)[-1]: fn = os.path.join(fn, 'index.html') type = mimetypes.guess_type(fn)[0] if os.path.exists(fn): respond('200 OK', [('Content-Type', type)]) return util.FileWrapper(open(fn, "rb")) else: respond('404 Not Found', [('Content-Type', 'text/plain')]) return [b'not found'] if __name__ == '__main__': path = sys.argv[1] if len(sys.argv) > 1 else os.getcwd() port = int(sys.argv[2]) if len(sys.argv) > 2 else 8000 httpd = simple_server.make_server('', port, app) print("Serving {} on port {}, control-C to stop".format(path, port)) try: httpd.serve_forever() except KeyboardInterrupt: print("Shutting down.") httpd.server_close() "xdrlib" — Encode and decode XDR data ************************************* **Source code:** Lib/xdrlib.py Deprecated since version 3.11: The "xdrlib" module is deprecated (see **PEP 594** for details). ====================================================================== The "xdrlib" module supports the External Data Representation Standard as described in **RFC 1014**, written by Sun Microsystems, Inc. June 1987. It supports most of the data types described in the RFC. The "xdrlib" module defines two classes, one for packing variables into XDR representation, and another for unpacking from XDR representation. There are also two exception classes. class xdrlib.Packer "Packer" is the class for packing data into XDR representation. The "Packer" class is instantiated with no arguments. class xdrlib.Unpacker(data) "Unpacker" is the complementary class which unpacks XDR data values from a string buffer. The input buffer is given as *data*. See also: **RFC 1014** - XDR: External Data Representation Standard This RFC defined the encoding of data which was XDR at the time this module was originally written. It has apparently been obsoleted by **RFC 1832**. **RFC 1832** - XDR: External Data Representation Standard Newer RFC that provides a revised definition of XDR. Packer Objects ============== "Packer" instances have the following methods: Packer.get_buffer() Returns the current pack buffer as a string. Packer.reset() Resets the pack buffer to the empty string. In general, you can pack any of the most common XDR data types by calling the appropriate "pack_type()" method. Each method takes a single argument, the value to pack. The following simple data type packing methods are supported: "pack_uint()", "pack_int()", "pack_enum()", "pack_bool()", "pack_uhyper()", and "pack_hyper()". Packer.pack_float(value) Packs the single-precision floating point number *value*. Packer.pack_double(value) Packs the double-precision floating point number *value*. The following methods support packing strings, bytes, and opaque data: Packer.pack_fstring(n, s) Packs a fixed length string, *s*. *n* is the length of the string but it is *not* packed into the data buffer. The string is padded with null bytes if necessary to guaranteed 4 byte alignment. Packer.pack_fopaque(n, data) Packs a fixed length opaque data stream, similarly to "pack_fstring()". Packer.pack_string(s) Packs a variable length string, *s*. The length of the string is first packed as an unsigned integer, then the string data is packed with "pack_fstring()". Packer.pack_opaque(data) Packs a variable length opaque data string, similarly to "pack_string()". Packer.pack_bytes(bytes) Packs a variable length byte stream, similarly to "pack_string()". The following methods support packing arrays and lists: Packer.pack_list(list, pack_item) Packs a *list* of homogeneous items. This method is useful for lists with an indeterminate size; i.e. the size is not available until the entire list has been walked. For each item in the list, an unsigned integer "1" is packed first, followed by the data value from the list. *pack_item* is the function that is called to pack the individual item. At the end of the list, an unsigned integer "0" is packed. For example, to pack a list of integers, the code might appear like this: import xdrlib p = xdrlib.Packer() p.pack_list([1, 2, 3], p.pack_int) Packer.pack_farray(n, array, pack_item) Packs a fixed length list (*array*) of homogeneous items. *n* is the length of the list; it is *not* packed into the buffer, but a "ValueError" exception is raised if "len(array)" is not equal to *n*. As above, *pack_item* is the function used to pack each element. Packer.pack_array(list, pack_item) Packs a variable length *list* of homogeneous items. First, the length of the list is packed as an unsigned integer, then each element is packed as in "pack_farray()" above. Unpacker Objects ================ The "Unpacker" class offers the following methods: Unpacker.reset(data) Resets the string buffer with the given *data*. Unpacker.get_position() Returns the current unpack position in the data buffer. Unpacker.set_position(position) Sets the data buffer unpack position to *position*. You should be careful about using "get_position()" and "set_position()". Unpacker.get_buffer() Returns the current unpack data buffer as a string. Unpacker.done() Indicates unpack completion. Raises an "Error" exception if all of the data has not been unpacked. In addition, every data type that can be packed with a "Packer", can be unpacked with an "Unpacker". Unpacking methods are of the form "unpack_type()", and take no arguments. They return the unpacked object. Unpacker.unpack_float() Unpacks a single-precision floating point number. Unpacker.unpack_double() Unpacks a double-precision floating point number, similarly to "unpack_float()". In addition, the following methods unpack strings, bytes, and opaque data: Unpacker.unpack_fstring(n) Unpacks and returns a fixed length string. *n* is the number of characters expected. Padding with null bytes to guaranteed 4 byte alignment is assumed. Unpacker.unpack_fopaque(n) Unpacks and returns a fixed length opaque data stream, similarly to "unpack_fstring()". Unpacker.unpack_string() Unpacks and returns a variable length string. The length of the string is first unpacked as an unsigned integer, then the string data is unpacked with "unpack_fstring()". Unpacker.unpack_opaque() Unpacks and returns a variable length opaque data string, similarly to "unpack_string()". Unpacker.unpack_bytes() Unpacks and returns a variable length byte stream, similarly to "unpack_string()". The following methods support unpacking arrays and lists: Unpacker.unpack_list(unpack_item) Unpacks and returns a list of homogeneous items. The list is unpacked one element at a time by first unpacking an unsigned integer flag. If the flag is "1", then the item is unpacked and appended to the list. A flag of "0" indicates the end of the list. *unpack_item* is the function that is called to unpack the items. Unpacker.unpack_farray(n, unpack_item) Unpacks and returns (as a list) a fixed length array of homogeneous items. *n* is number of list elements to expect in the buffer. As above, *unpack_item* is the function used to unpack each element. Unpacker.unpack_array(unpack_item) Unpacks and returns a variable length *list* of homogeneous items. First, the length of the list is unpacked as an unsigned integer, then each element is unpacked as in "unpack_farray()" above. Exceptions ========== Exceptions in this module are coded as class instances: exception xdrlib.Error The base exception class. "Error" has a single public attribute "msg" containing the description of the error. exception xdrlib.ConversionError Class derived from "Error". Contains no additional instance variables. Here is an example of how you would catch one of these exceptions: import xdrlib p = xdrlib.Packer() try: p.pack_double(8.01) except xdrlib.ConversionError as instance: print('packing the double failed:', instance.msg) "xml.dom.minidom" — Minimal DOM implementation ********************************************** **Source code:** Lib/xml/dom/minidom.py ====================================================================== "xml.dom.minidom" is a minimal implementation of the Document Object Model interface, with an API similar to that in other languages. It is intended to be simpler than the full DOM and also significantly smaller. Users who are not already proficient with the DOM should consider using the "xml.etree.ElementTree" module for their XML processing instead. Warning: The "xml.dom.minidom" module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data see XML vulnerabilities. DOM applications typically start by parsing some XML into a DOM. With "xml.dom.minidom", this is done through the parse functions: from xml.dom.minidom import parse, parseString dom1 = parse('c:\\temp\\mydata.xml') # parse an XML file by name datasource = open('c:\\temp\\mydata.xml') dom2 = parse(datasource) # parse an open file dom3 = parseString('<myxml>Some data<empty/> some more data</myxml>') The "parse()" function can take either a filename or an open file object. xml.dom.minidom.parse(filename_or_file, parser=None, bufsize=None) Return a "Document" from the given input. *filename_or_file* may be either a file name, or a file-like object. *parser*, if given, must be a SAX2 parser object. This function will change the document handler of the parser and activate namespace support; other parser configuration (like setting an entity resolver) must have been done in advance. If you have XML in a string, you can use the "parseString()" function instead: xml.dom.minidom.parseString(string, parser=None) Return a "Document" that represents the *string*. This method creates an "io.StringIO" object for the string and passes that on to "parse()". Both functions return a "Document" object representing the content of the document. What the "parse()" and "parseString()" functions do is connect an XML parser with a “DOM builder” that can accept parse events from any SAX parser and convert them into a DOM tree. The name of the functions are perhaps misleading, but are easy to grasp when learning the interfaces. The parsing of the document will be completed before these functions return; it’s simply that these functions do not provide a parser implementation themselves. You can also create a "Document" by calling a method on a “DOM Implementation” object. You can get this object either by calling the "getDOMImplementation()" function in the "xml.dom" package or the "xml.dom.minidom" module. Once you have a "Document", you can add child nodes to it to populate the DOM: from xml.dom.minidom import getDOMImplementation impl = getDOMImplementation() newdoc = impl.createDocument(None, "some_tag", None) top_element = newdoc.documentElement text = newdoc.createTextNode('Some textual content.') top_element.appendChild(text) Once you have a DOM document object, you can access the parts of your XML document through its properties and methods. These properties are defined in the DOM specification. The main property of the document object is the "documentElement" property. It gives you the main element in the XML document: the one that holds all others. Here is an example program: dom3 = parseString("<myxml>Some data</myxml>") assert dom3.documentElement.tagName == "myxml" When you are finished with a DOM tree, you may optionally call the "unlink()" method to encourage early cleanup of the now-unneeded objects. "unlink()" is an "xml.dom.minidom"-specific extension to the DOM API that renders the node and its descendants are essentially useless. Otherwise, Python’s garbage collector will eventually take care of the objects in the tree. See also: Document Object Model (DOM) Level 1 Specification The W3C recommendation for the DOM supported by "xml.dom.minidom". DOM Objects =========== The definition of the DOM API for Python is given as part of the "xml.dom" module documentation. This section lists the differences between the API and "xml.dom.minidom". Node.unlink() Break internal references within the DOM so that it will be garbage collected on versions of Python without cyclic GC. Even when cyclic GC is available, using this can make large amounts of memory available sooner, so calling this on DOM objects as soon as they are no longer needed is good practice. This only needs to be called on the "Document" object, but may be called on child nodes to discard children of that node. You can avoid calling this method explicitly by using the "with" statement. The following code will automatically unlink *dom* when the "with" block is exited: with xml.dom.minidom.parse(datasource) as dom: ... # Work with dom. Node.writexml(writer, indent="", addindent="", newl="", encoding=None, standalone=None) Write XML to the writer object. The writer receives texts but not bytes as input, it should have a "write()" method which matches that of the file object interface. The *indent* parameter is the indentation of the current node. The *addindent* parameter is the incremental indentation to use for subnodes of the current one. The *newl* parameter specifies the string to use to terminate newlines. For the "Document" node, an additional keyword argument *encoding* can be used to specify the encoding field of the XML header. Similarly, explicitly stating the *standalone* argument causes the standalone document declarations to be added to the prologue of the XML document. If the value is set to *True*, *standalone=”yes”* is added, otherwise it is set to *“no”*. Not stating the argument will omit the declaration from the document. Changed in version 3.8: The "writexml()" method now preserves the attribute order specified by the user. Changed in version 3.9: The *standalone* parameter was added. Node.toxml(encoding=None, standalone=None) Return a string or byte string containing the XML represented by the DOM node. With an explicit *encoding* [1] argument, the result is a byte string in the specified encoding. With no *encoding* argument, the result is a Unicode string, and the XML declaration in the resulting string does not specify an encoding. Encoding this string in an encoding other than UTF-8 is likely incorrect, since UTF-8 is the default encoding of XML. The *standalone* argument behaves exactly as in "writexml()". Changed in version 3.8: The "toxml()" method now preserves the attribute order specified by the user. Changed in version 3.9: The *standalone* parameter was added. Node.toprettyxml(indent="\t", newl="\n", encoding=None, standalone=None) Return a pretty-printed version of the document. *indent* specifies the indentation string and defaults to a tabulator; *newl* specifies the string emitted at the end of each line and defaults to "\n". The *encoding* argument behaves like the corresponding argument of "toxml()". The *standalone* argument behaves exactly as in "writexml()". Changed in version 3.8: The "toprettyxml()" method now preserves the attribute order specified by the user. Changed in version 3.9: The *standalone* parameter was added. DOM Example =========== This example program is a fairly realistic example of a simple program. In this particular case, we do not take much advantage of the flexibility of the DOM. import xml.dom.minidom document = """\ <slideshow> <title>Demo slideshow Slide title This is a demo Of a program for processing slides Another demo slide It is important To have more than one slide """ dom = xml.dom.minidom.parseString(document) def getText(nodelist): rc = [] for node in nodelist: if node.nodeType == node.TEXT_NODE: rc.append(node.data) return ''.join(rc) def handleSlideshow(slideshow): print("") handleSlideshowTitle(slideshow.getElementsByTagName("title")[0]) slides = slideshow.getElementsByTagName("slide") handleToc(slides) handleSlides(slides) print("") def handleSlides(slides): for slide in slides: handleSlide(slide) def handleSlide(slide): handleSlideTitle(slide.getElementsByTagName("title")[0]) handlePoints(slide.getElementsByTagName("point")) def handleSlideshowTitle(title): print("%s" % getText(title.childNodes)) def handleSlideTitle(title): print("

%s

" % getText(title.childNodes)) def handlePoints(points): print("
    ") for point in points: handlePoint(point) print("
") def handlePoint(point): print("
  • %s
  • " % getText(point.childNodes)) def handleToc(slides): for slide in slides: title = slide.getElementsByTagName("title")[0] print("

    %s

    " % getText(title.childNodes)) handleSlideshow(dom) minidom and the DOM standard ============================ The "xml.dom.minidom" module is essentially a DOM 1.0-compatible DOM with some DOM 2 features (primarily namespace features). Usage of the DOM interface in Python is straight-forward. The following mapping rules apply: * Interfaces are accessed through instance objects. Applications should not instantiate the classes themselves; they should use the creator functions available on the "Document" object. Derived interfaces support all operations (and attributes) from the base interfaces, plus any new operations. * Operations are used as methods. Since the DOM uses only "in" parameters, the arguments are passed in normal order (from left to right). There are no optional arguments. "void" operations return "None". * IDL attributes map to instance attributes. For compatibility with the OMG IDL language mapping for Python, an attribute "foo" can also be accessed through accessor methods "_get_foo()" and "_set_foo()". "readonly" attributes must not be changed; this is not enforced at runtime. * The types "short int", "unsigned int", "unsigned long long", and "boolean" all map to Python integer objects. * The type "DOMString" maps to Python strings. "xml.dom.minidom" supports either bytes or strings, but will normally produce strings. Values of type "DOMString" may also be "None" where allowed to have the IDL "null" value by the DOM specification from the W3C. * "const" declarations map to variables in their respective scope (e.g. "xml.dom.minidom.Node.PROCESSING_INSTRUCTION_NODE"); they must not be changed. * "DOMException" is currently not supported in "xml.dom.minidom". Instead, "xml.dom.minidom" uses standard Python exceptions such as "TypeError" and "AttributeError". * "NodeList" objects are implemented using Python’s built-in list type. These objects provide the interface defined in the DOM specification, but with earlier versions of Python they do not support the official API. They are, however, much more “Pythonic” than the interface defined in the W3C recommendations. The following interfaces have no implementation in "xml.dom.minidom": * "DOMTimeStamp" * "EntityReference" Most of these reflect information in the XML document that is not of general utility to most DOM users. -[ Footnotes ]- [1] The encoding name included in the XML output should conform to the appropriate standards. For example, “UTF-8” is valid, but “UTF8” is not valid in an XML document’s declaration, even though Python accepts it as an encoding name. See https://www.w3.org/TR/2006 /REC-xml11-20060816/#NT-EncodingDecl and https://www.iana.org/assignments/character-sets/character- sets.xhtml. "xml.dom.pulldom" — Support for building partial DOM trees ********************************************************** **Source code:** Lib/xml/dom/pulldom.py ====================================================================== The "xml.dom.pulldom" module provides a “pull parser” which can also be asked to produce DOM-accessible fragments of the document where necessary. The basic concept involves pulling “events” from a stream of incoming XML and processing them. In contrast to SAX which also employs an event-driven processing model together with callbacks, the user of a pull parser is responsible for explicitly pulling events from the stream, looping over those events until either processing is finished or an error condition occurs. Warning: The "xml.dom.pulldom" module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data see XML vulnerabilities. Changed in version 3.7.1: The SAX parser no longer processes general external entities by default to increase security by default. To enable processing of external entities, pass a custom parser instance in: from xml.dom.pulldom import parse from xml.sax import make_parser from xml.sax.handler import feature_external_ges parser = make_parser() parser.setFeature(feature_external_ges, True) parse(filename, parser=parser) Example: from xml.dom import pulldom doc = pulldom.parse('sales_items.xml') for event, node in doc: if event == pulldom.START_ELEMENT and node.tagName == 'item': if int(node.getAttribute('price')) > 50: doc.expandNode(node) print(node.toxml()) "event" is a constant and can be one of: * "START_ELEMENT" * "END_ELEMENT" * "COMMENT" * "START_DOCUMENT" * "END_DOCUMENT" * "CHARACTERS" * "PROCESSING_INSTRUCTION" * "IGNORABLE_WHITESPACE" "node" is an object of type "xml.dom.minidom.Document", "xml.dom.minidom.Element" or "xml.dom.minidom.Text". Since the document is treated as a “flat” stream of events, the document “tree” is implicitly traversed and the desired elements are found regardless of their depth in the tree. In other words, one does not need to consider hierarchical issues such as recursive searching of the document nodes, although if the context of elements were important, one would either need to maintain some context-related state (i.e. remembering where one is in the document at any given point) or to make use of the "DOMEventStream.expandNode()" method and switch to DOM-related processing. class xml.dom.pulldom.PullDom(documentFactory=None) Subclass of "xml.sax.handler.ContentHandler". class xml.dom.pulldom.SAX2DOM(documentFactory=None) Subclass of "xml.sax.handler.ContentHandler". xml.dom.pulldom.parse(stream_or_string, parser=None, bufsize=None) Return a "DOMEventStream" from the given input. *stream_or_string* may be either a file name, or a file-like object. *parser*, if given, must be an "XMLReader" object. This function will change the document handler of the parser and activate namespace support; other parser configuration (like setting an entity resolver) must have been done in advance. If you have XML in a string, you can use the "parseString()" function instead: xml.dom.pulldom.parseString(string, parser=None) Return a "DOMEventStream" that represents the (Unicode) *string*. xml.dom.pulldom.default_bufsize Default value for the *bufsize* parameter to "parse()". The value of this variable can be changed before calling "parse()" and the new value will take effect. DOMEventStream Objects ====================== class xml.dom.pulldom.DOMEventStream(stream, parser, bufsize) Deprecated since version 3.8: Support for "sequence protocol" is deprecated. getEvent() Return a tuple containing *event* and the current *node* as "xml.dom.minidom.Document" if event equals "START_DOCUMENT", "xml.dom.minidom.Element" if event equals "START_ELEMENT" or "END_ELEMENT" or "xml.dom.minidom.Text" if event equals "CHARACTERS". The current node does not contain information about its children, unless "expandNode()" is called. expandNode(node) Expands all children of *node* into *node*. Example: from xml.dom import pulldom xml = 'Foo

    Some text

    and more

    ' doc = pulldom.parseString(xml) for event, node in doc: if event == pulldom.START_ELEMENT and node.tagName == 'p': # Following statement only prints '

    ' print(node.toxml()) doc.expandNode(node) # Following statement prints node with all its children '

    Some text

    and more

    ' print(node.toxml()) reset() "xml.dom" — The Document Object Model API ***************************************** **Source code:** Lib/xml/dom/__init__.py ====================================================================== The Document Object Model, or “DOM,” is a cross-language API from the World Wide Web Consortium (W3C) for accessing and modifying XML documents. A DOM implementation presents an XML document as a tree structure, or allows client code to build such a structure from scratch. It then gives access to the structure through a set of objects which provided well-known interfaces. The DOM is extremely useful for random-access applications. SAX only allows you a view of one bit of the document at a time. If you are looking at one SAX element, you have no access to another. If you are looking at a text node, you have no access to a containing element. When you write a SAX application, you need to keep track of your program’s position in the document somewhere in your own code. SAX does not do it for you. Also, if you need to look ahead in the XML document, you are just out of luck. Some applications are simply impossible in an event driven model with no access to a tree. Of course you could build some sort of tree yourself in SAX events, but the DOM allows you to avoid writing that code. The DOM is a standard tree representation for XML data. The Document Object Model is being defined by the W3C in stages, or “levels” in their terminology. The Python mapping of the API is substantially based on the DOM Level 2 recommendation. DOM applications typically start by parsing some XML into a DOM. How this is accomplished is not covered at all by DOM Level 1, and Level 2 provides only limited improvements: There is a "DOMImplementation" object class which provides access to "Document" creation methods, but no way to access an XML reader/parser/Document builder in an implementation-independent way. There is also no well-defined way to access these methods without an existing "Document" object. In Python, each DOM implementation will provide a function "getDOMImplementation()". DOM Level 3 adds a Load/Store specification, which defines an interface to the reader, but this is not yet available in the Python standard library. Once you have a DOM document object, you can access the parts of your XML document through its properties and methods. These properties are defined in the DOM specification; this portion of the reference manual describes the interpretation of the specification in Python. The specification provided by the W3C defines the DOM API for Java, ECMAScript, and OMG IDL. The Python mapping defined here is based in large part on the IDL version of the specification, but strict compliance is not required (though implementations are free to support the strict mapping from IDL). See section Conformance for a detailed discussion of mapping requirements. See also: Document Object Model (DOM) Level 2 Specification The W3C recommendation upon which the Python DOM API is based. Document Object Model (DOM) Level 1 Specification The W3C recommendation for the DOM supported by "xml.dom.minidom". Python Language Mapping Specification This specifies the mapping from OMG IDL to Python. Module Contents =============== The "xml.dom" contains the following functions: xml.dom.registerDOMImplementation(name, factory) Register the *factory* function with the name *name*. The factory function should return an object which implements the "DOMImplementation" interface. The factory function can return the same object every time, or a new one for each call, as appropriate for the specific implementation (e.g. if that implementation supports some customization). xml.dom.getDOMImplementation(name=None, features=()) Return a suitable DOM implementation. The *name* is either well- known, the module name of a DOM implementation, or "None". If it is not "None", imports the corresponding module and returns a "DOMImplementation" object if the import succeeds. If no name is given, and if the environment variable "PYTHON_DOM" is set, this variable is used to find the implementation. If name is not given, this examines the available implementations to find one with the required feature set. If no implementation can be found, raise an "ImportError". The features list must be a sequence of "(feature, version)" pairs which are passed to the "hasFeature()" method on available "DOMImplementation" objects. Some convenience constants are also provided: xml.dom.EMPTY_NAMESPACE The value used to indicate that no namespace is associated with a node in the DOM. This is typically found as the "namespaceURI" of a node, or used as the *namespaceURI* parameter to a namespaces- specific method. xml.dom.XML_NAMESPACE The namespace URI associated with the reserved prefix "xml", as defined by Namespaces in XML (section 4). xml.dom.XMLNS_NAMESPACE The namespace URI for namespace declarations, as defined by Document Object Model (DOM) Level 2 Core Specification (section 1.1.8). xml.dom.XHTML_NAMESPACE The URI of the XHTML namespace as defined by XHTML 1.0: The Extensible HyperText Markup Language (section 3.1.1). In addition, "xml.dom" contains a base "Node" class and the DOM exception classes. The "Node" class provided by this module does not implement any of the methods or attributes defined by the DOM specification; concrete DOM implementations must provide those. The "Node" class provided as part of this module does provide the constants used for the "nodeType" attribute on concrete "Node" objects; they are located within the class rather than at the module level to conform with the DOM specifications. Objects in the DOM ================== The definitive documentation for the DOM is the DOM specification from the W3C. Note that DOM attributes may also be manipulated as nodes instead of as simple strings. It is fairly rare that you must do this, however, so this usage is not yet documented. +----------------------------------+-------------------------------------+-----------------------------------+ | Interface | Section | Purpose | |==================================|=====================================|===================================| | "DOMImplementation" | DOMImplementation Objects | Interface to the underlying | | | | implementation. | +----------------------------------+-------------------------------------+-----------------------------------+ | "Node" | Node Objects | Base interface for most objects | | | | in a document. | +----------------------------------+-------------------------------------+-----------------------------------+ | "NodeList" | NodeList Objects | Interface for a sequence of | | | | nodes. | +----------------------------------+-------------------------------------+-----------------------------------+ | "DocumentType" | DocumentType Objects | Information about the | | | | declarations needed to process a | | | | document. | +----------------------------------+-------------------------------------+-----------------------------------+ | "Document" | Document Objects | Object which represents an entire | | | | document. | +----------------------------------+-------------------------------------+-----------------------------------+ | "Element" | Element Objects | Element nodes in the document | | | | hierarchy. | +----------------------------------+-------------------------------------+-----------------------------------+ | "Attr" | Attr Objects | Attribute value nodes on element | | | | nodes. | +----------------------------------+-------------------------------------+-----------------------------------+ | "Comment" | Comment Objects | Representation of comments in the | | | | source document. | +----------------------------------+-------------------------------------+-----------------------------------+ | "Text" | Text and CDATASection Objects | Nodes containing textual content | | | | from the document. | +----------------------------------+-------------------------------------+-----------------------------------+ | "ProcessingInstruction" | ProcessingInstruction Objects | Processing instruction | | | | representation. | +----------------------------------+-------------------------------------+-----------------------------------+ An additional section describes the exceptions defined for working with the DOM in Python. DOMImplementation Objects ------------------------- The "DOMImplementation" interface provides a way for applications to determine the availability of particular features in the DOM they are using. DOM Level 2 added the ability to create new "Document" and "DocumentType" objects using the "DOMImplementation" as well. DOMImplementation.hasFeature(feature, version) Return "True" if the feature identified by the pair of strings *feature* and *version* is implemented. DOMImplementation.createDocument(namespaceUri, qualifiedName, doctype) Return a new "Document" object (the root of the DOM), with a child "Element" object having the given *namespaceUri* and *qualifiedName*. The *doctype* must be a "DocumentType" object created by "createDocumentType()", or "None". In the Python DOM API, the first two arguments can also be "None" in order to indicate that no "Element" child is to be created. DOMImplementation.createDocumentType(qualifiedName, publicId, systemId) Return a new "DocumentType" object that encapsulates the given *qualifiedName*, *publicId*, and *systemId* strings, representing the information contained in an XML document type declaration. Node Objects ------------ All of the components of an XML document are subclasses of "Node". Node.nodeType An integer representing the node type. Symbolic constants for the types are on the "Node" object: "ELEMENT_NODE", "ATTRIBUTE_NODE", "TEXT_NODE", "CDATA_SECTION_NODE", "ENTITY_NODE", "PROCESSING_INSTRUCTION_NODE", "COMMENT_NODE", "DOCUMENT_NODE", "DOCUMENT_TYPE_NODE", "NOTATION_NODE". This is a read-only attribute. Node.parentNode The parent of the current node, or "None" for the document node. The value is always a "Node" object or "None". For "Element" nodes, this will be the parent element, except for the root element, in which case it will be the "Document" object. For "Attr" nodes, this is always "None". This is a read-only attribute. Node.attributes A "NamedNodeMap" of attribute objects. Only elements have actual values for this; others provide "None" for this attribute. This is a read-only attribute. Node.previousSibling The node that immediately precedes this one with the same parent. For instance the element with an end-tag that comes just before the *self* element’s start-tag. Of course, XML documents are made up of more than just elements so the previous sibling could be text, a comment, or something else. If this node is the first child of the parent, this attribute will be "None". This is a read-only attribute. Node.nextSibling The node that immediately follows this one with the same parent. See also "previousSibling". If this is the last child of the parent, this attribute will be "None". This is a read-only attribute. Node.childNodes A list of nodes contained within this node. This is a read-only attribute. Node.firstChild The first child of the node, if there are any, or "None". This is a read-only attribute. Node.lastChild The last child of the node, if there are any, or "None". This is a read-only attribute. Node.localName The part of the "tagName" following the colon if there is one, else the entire "tagName". The value is a string. Node.prefix The part of the "tagName" preceding the colon if there is one, else the empty string. The value is a string, or "None". Node.namespaceURI The namespace associated with the element name. This will be a string or "None". This is a read-only attribute. Node.nodeName This has a different meaning for each node type; see the DOM specification for details. You can always get the information you would get here from another property such as the "tagName" property for elements or the "name" property for attributes. For all node types, the value of this attribute will be either a string or "None". This is a read-only attribute. Node.nodeValue This has a different meaning for each node type; see the DOM specification for details. The situation is similar to that with "nodeName". The value is a string or "None". Node.hasAttributes() Return "True" if the node has any attributes. Node.hasChildNodes() Return "True" if the node has any child nodes. Node.isSameNode(other) Return "True" if *other* refers to the same node as this node. This is especially useful for DOM implementations which use any sort of proxy architecture (because more than one object can refer to the same node). Note: This is based on a proposed DOM Level 3 API which is still in the “working draft” stage, but this particular interface appears uncontroversial. Changes from the W3C will not necessarily affect this method in the Python DOM interface (though any new W3C API for this would also be supported). Node.appendChild(newChild) Add a new child node to this node at the end of the list of children, returning *newChild*. If the node was already in the tree, it is removed first. Node.insertBefore(newChild, refChild) Insert a new child node before an existing child. It must be the case that *refChild* is a child of this node; if not, "ValueError" is raised. *newChild* is returned. If *refChild* is "None", it inserts *newChild* at the end of the children’s list. Node.removeChild(oldChild) Remove a child node. *oldChild* must be a child of this node; if not, "ValueError" is raised. *oldChild* is returned on success. If *oldChild* will not be used further, its "unlink()" method should be called. Node.replaceChild(newChild, oldChild) Replace an existing node with a new node. It must be the case that *oldChild* is a child of this node; if not, "ValueError" is raised. Node.normalize() Join adjacent text nodes so that all stretches of text are stored as single "Text" instances. This simplifies processing text from a DOM tree for many applications. Node.cloneNode(deep) Clone this node. Setting *deep* means to clone all child nodes as well. This returns the clone. NodeList Objects ---------------- A "NodeList" represents a sequence of nodes. These objects are used in two ways in the DOM Core recommendation: an "Element" object provides one as its list of child nodes, and the "getElementsByTagName()" and "getElementsByTagNameNS()" methods of "Node" return objects with this interface to represent query results. The DOM Level 2 recommendation defines one method and one attribute for these objects: NodeList.item(i) Return the *i*’th item from the sequence, if there is one, or "None". The index *i* is not allowed to be less than zero or greater than or equal to the length of the sequence. NodeList.length The number of nodes in the sequence. In addition, the Python DOM interface requires that some additional support is provided to allow "NodeList" objects to be used as Python sequences. All "NodeList" implementations must include support for "__len__()" and "__getitem__()"; this allows iteration over the "NodeList" in "for" statements and proper support for the "len()" built-in function. If a DOM implementation supports modification of the document, the "NodeList" implementation must also support the "__setitem__()" and "__delitem__()" methods. DocumentType Objects -------------------- Information about the notations and entities declared by a document (including the external subset if the parser uses it and can provide the information) is available from a "DocumentType" object. The "DocumentType" for a document is available from the "Document" object’s "doctype" attribute; if there is no "DOCTYPE" declaration for the document, the document’s "doctype" attribute will be set to "None" instead of an instance of this interface. "DocumentType" is a specialization of "Node", and adds the following attributes: DocumentType.publicId The public identifier for the external subset of the document type definition. This will be a string or "None". DocumentType.systemId The system identifier for the external subset of the document type definition. This will be a URI as a string, or "None". DocumentType.internalSubset A string giving the complete internal subset from the document. This does not include the brackets which enclose the subset. If the document has no internal subset, this should be "None". DocumentType.name The name of the root element as given in the "DOCTYPE" declaration, if present. DocumentType.entities This is a "NamedNodeMap" giving the definitions of external entities. For entity names defined more than once, only the first definition is provided (others are ignored as required by the XML recommendation). This may be "None" if the information is not provided by the parser, or if no entities are defined. DocumentType.notations This is a "NamedNodeMap" giving the definitions of notations. For notation names defined more than once, only the first definition is provided (others are ignored as required by the XML recommendation). This may be "None" if the information is not provided by the parser, or if no notations are defined. Document Objects ---------------- A "Document" represents an entire XML document, including its constituent elements, attributes, processing instructions, comments etc. Remember that it inherits properties from "Node". Document.documentElement The one and only root element of the document. Document.createElement(tagName) Create and return a new element node. The element is not inserted into the document when it is created. You need to explicitly insert it with one of the other methods such as "insertBefore()" or "appendChild()". Document.createElementNS(namespaceURI, tagName) Create and return a new element with a namespace. The *tagName* may have a prefix. The element is not inserted into the document when it is created. You need to explicitly insert it with one of the other methods such as "insertBefore()" or "appendChild()". Document.createTextNode(data) Create and return a text node containing the data passed as a parameter. As with the other creation methods, this one does not insert the node into the tree. Document.createComment(data) Create and return a comment node containing the data passed as a parameter. As with the other creation methods, this one does not insert the node into the tree. Document.createProcessingInstruction(target, data) Create and return a processing instruction node containing the *target* and *data* passed as parameters. As with the other creation methods, this one does not insert the node into the tree. Document.createAttribute(name) Create and return an attribute node. This method does not associate the attribute node with any particular element. You must use "setAttributeNode()" on the appropriate "Element" object to use the newly created attribute instance. Document.createAttributeNS(namespaceURI, qualifiedName) Create and return an attribute node with a namespace. The *tagName* may have a prefix. This method does not associate the attribute node with any particular element. You must use "setAttributeNode()" on the appropriate "Element" object to use the newly created attribute instance. Document.getElementsByTagName(tagName) Search for all descendants (direct children, children’s children, etc.) with a particular element type name. Document.getElementsByTagNameNS(namespaceURI, localName) Search for all descendants (direct children, children’s children, etc.) with a particular namespace URI and localname. The localname is the part of the namespace after the prefix. Element Objects --------------- "Element" is a subclass of "Node", so inherits all the attributes of that class. Element.tagName The element type name. In a namespace-using document it may have colons in it. The value is a string. Element.getElementsByTagName(tagName) Same as equivalent method in the "Document" class. Element.getElementsByTagNameNS(namespaceURI, localName) Same as equivalent method in the "Document" class. Element.hasAttribute(name) Return "True" if the element has an attribute named by *name*. Element.hasAttributeNS(namespaceURI, localName) Return "True" if the element has an attribute named by *namespaceURI* and *localName*. Element.getAttribute(name) Return the value of the attribute named by *name* as a string. If no such attribute exists, an empty string is returned, as if the attribute had no value. Element.getAttributeNode(attrname) Return the "Attr" node for the attribute named by *attrname*. Element.getAttributeNS(namespaceURI, localName) Return the value of the attribute named by *namespaceURI* and *localName* as a string. If no such attribute exists, an empty string is returned, as if the attribute had no value. Element.getAttributeNodeNS(namespaceURI, localName) Return an attribute value as a node, given a *namespaceURI* and *localName*. Element.removeAttribute(name) Remove an attribute by name. If there is no matching attribute, a "NotFoundErr" is raised. Element.removeAttributeNode(oldAttr) Remove and return *oldAttr* from the attribute list, if present. If *oldAttr* is not present, "NotFoundErr" is raised. Element.removeAttributeNS(namespaceURI, localName) Remove an attribute by name. Note that it uses a localName, not a qname. No exception is raised if there is no matching attribute. Element.setAttribute(name, value) Set an attribute value from a string. Element.setAttributeNode(newAttr) Add a new attribute node to the element, replacing an existing attribute if necessary if the "name" attribute matches. If a replacement occurs, the old attribute node will be returned. If *newAttr* is already in use, "InuseAttributeErr" will be raised. Element.setAttributeNodeNS(newAttr) Add a new attribute node to the element, replacing an existing attribute if necessary if the "namespaceURI" and "localName" attributes match. If a replacement occurs, the old attribute node will be returned. If *newAttr* is already in use, "InuseAttributeErr" will be raised. Element.setAttributeNS(namespaceURI, qname, value) Set an attribute value from a string, given a *namespaceURI* and a *qname*. Note that a qname is the whole attribute name. This is different than above. Attr Objects ------------ "Attr" inherits from "Node", so inherits all its attributes. Attr.name The attribute name. In a namespace-using document it may include a colon. Attr.localName The part of the name following the colon if there is one, else the entire name. This is a read-only attribute. Attr.prefix The part of the name preceding the colon if there is one, else the empty string. Attr.value The text value of the attribute. This is a synonym for the "nodeValue" attribute. NamedNodeMap Objects -------------------- "NamedNodeMap" does *not* inherit from "Node". NamedNodeMap.length The length of the attribute list. NamedNodeMap.item(index) Return an attribute with a particular index. The order you get the attributes in is arbitrary but will be consistent for the life of a DOM. Each item is an attribute node. Get its value with the "value" attribute. There are also experimental methods that give this class more mapping behavior. You can use them or you can use the standardized "getAttribute*()" family of methods on the "Element" objects. Comment Objects --------------- "Comment" represents a comment in the XML document. It is a subclass of "Node", but cannot have child nodes. Comment.data The content of the comment as a string. The attribute contains all characters between the leading "", but does not include them. Text and CDATASection Objects ----------------------------- The "Text" interface represents text in the XML document. If the parser and DOM implementation support the DOM’s XML extension, portions of the text enclosed in CDATA marked sections are stored in "CDATASection" objects. These two interfaces are identical, but provide different values for the "nodeType" attribute. These interfaces extend the "Node" interface. They cannot have child nodes. Text.data The content of the text node as a string. Note: The use of a "CDATASection" node does not indicate that the node represents a complete CDATA marked section, only that the content of the node was part of a CDATA section. A single CDATA section may be represented by more than one node in the document tree. There is no way to determine whether two adjacent "CDATASection" nodes represent different CDATA marked sections. ProcessingInstruction Objects ----------------------------- Represents a processing instruction in the XML document; this inherits from the "Node" interface and cannot have child nodes. ProcessingInstruction.target The content of the processing instruction up to the first whitespace character. This is a read-only attribute. ProcessingInstruction.data The content of the processing instruction following the first whitespace character. Exceptions ---------- The DOM Level 2 recommendation defines a single exception, "DOMException", and a number of constants that allow applications to determine what sort of error occurred. "DOMException" instances carry a "code" attribute that provides the appropriate value for the specific exception. The Python DOM interface provides the constants, but also expands the set of exceptions so that a specific exception exists for each of the exception codes defined by the DOM. The implementations must raise the appropriate specific exception, each of which carries the appropriate value for the "code" attribute. exception xml.dom.DOMException Base exception class used for all specific DOM exceptions. This exception class cannot be directly instantiated. exception xml.dom.DomstringSizeErr Raised when a specified range of text does not fit into a string. This is not known to be used in the Python DOM implementations, but may be received from DOM implementations not written in Python. exception xml.dom.HierarchyRequestErr Raised when an attempt is made to insert a node where the node type is not allowed. exception xml.dom.IndexSizeErr Raised when an index or size parameter to a method is negative or exceeds the allowed values. exception xml.dom.InuseAttributeErr Raised when an attempt is made to insert an "Attr" node that is already present elsewhere in the document. exception xml.dom.InvalidAccessErr Raised if a parameter or an operation is not supported on the underlying object. exception xml.dom.InvalidCharacterErr This exception is raised when a string parameter contains a character that is not permitted in the context it’s being used in by the XML 1.0 recommendation. For example, attempting to create an "Element" node with a space in the element type name will cause this error to be raised. exception xml.dom.InvalidModificationErr Raised when an attempt is made to modify the type of a node. exception xml.dom.InvalidStateErr Raised when an attempt is made to use an object that is not defined or is no longer usable. exception xml.dom.NamespaceErr If an attempt is made to change any object in a way that is not permitted with regard to the Namespaces in XML recommendation, this exception is raised. exception xml.dom.NotFoundErr Exception when a node does not exist in the referenced context. For example, "NamedNodeMap.removeNamedItem()" will raise this if the node passed in does not exist in the map. exception xml.dom.NotSupportedErr Raised when the implementation does not support the requested type of object or operation. exception xml.dom.NoDataAllowedErr This is raised if data is specified for a node which does not support data. exception xml.dom.NoModificationAllowedErr Raised on attempts to modify an object where modifications are not allowed (such as for read-only nodes). exception xml.dom.SyntaxErr Raised when an invalid or illegal string is specified. exception xml.dom.WrongDocumentErr Raised when a node is inserted in a different document than it currently belongs to, and the implementation does not support migrating the node from one document to the other. The exception codes defined in the DOM recommendation map to the exceptions described above according to this table: +----------------------------------------+-----------------------------------+ | Constant | Exception | |========================================|===================================| | "DOMSTRING_SIZE_ERR" | "DomstringSizeErr" | +----------------------------------------+-----------------------------------+ | "HIERARCHY_REQUEST_ERR" | "HierarchyRequestErr" | +----------------------------------------+-----------------------------------+ | "INDEX_SIZE_ERR" | "IndexSizeErr" | +----------------------------------------+-----------------------------------+ | "INUSE_ATTRIBUTE_ERR" | "InuseAttributeErr" | +----------------------------------------+-----------------------------------+ | "INVALID_ACCESS_ERR" | "InvalidAccessErr" | +----------------------------------------+-----------------------------------+ | "INVALID_CHARACTER_ERR" | "InvalidCharacterErr" | +----------------------------------------+-----------------------------------+ | "INVALID_MODIFICATION_ERR" | "InvalidModificationErr" | +----------------------------------------+-----------------------------------+ | "INVALID_STATE_ERR" | "InvalidStateErr" | +----------------------------------------+-----------------------------------+ | "NAMESPACE_ERR" | "NamespaceErr" | +----------------------------------------+-----------------------------------+ | "NOT_FOUND_ERR" | "NotFoundErr" | +----------------------------------------+-----------------------------------+ | "NOT_SUPPORTED_ERR" | "NotSupportedErr" | +----------------------------------------+-----------------------------------+ | "NO_DATA_ALLOWED_ERR" | "NoDataAllowedErr" | +----------------------------------------+-----------------------------------+ | "NO_MODIFICATION_ALLOWED_ERR" | "NoModificationAllowedErr" | +----------------------------------------+-----------------------------------+ | "SYNTAX_ERR" | "SyntaxErr" | +----------------------------------------+-----------------------------------+ | "WRONG_DOCUMENT_ERR" | "WrongDocumentErr" | +----------------------------------------+-----------------------------------+ Conformance =========== This section describes the conformance requirements and relationships between the Python DOM API, the W3C DOM recommendations, and the OMG IDL mapping for Python. Type Mapping ------------ The IDL types used in the DOM specification are mapped to Python types according to the following table. +--------------------+---------------------------------------------+ | IDL Type | Python Type | |====================|=============================================| | "boolean" | "bool" or "int" | +--------------------+---------------------------------------------+ | "int" | "int" | +--------------------+---------------------------------------------+ | "long int" | "int" | +--------------------+---------------------------------------------+ | "unsigned int" | "int" | +--------------------+---------------------------------------------+ | "DOMString" | "str" or "bytes" | +--------------------+---------------------------------------------+ | "null" | "None" | +--------------------+---------------------------------------------+ Accessor Methods ---------------- The mapping from OMG IDL to Python defines accessor functions for IDL "attribute" declarations in much the way the Java mapping does. Mapping the IDL declarations readonly attribute string someValue; attribute string anotherValue; yields three accessor functions: a “get” method for "someValue" ("_get_someValue()"), and “get” and “set” methods for "anotherValue" ("_get_anotherValue()" and "_set_anotherValue()"). The mapping, in particular, does not require that the IDL attributes are accessible as normal Python attributes: "object.someValue" is *not* required to work, and may raise an "AttributeError". The Python DOM API, however, *does* require that normal attribute access work. This means that the typical surrogates generated by Python IDL compilers are not likely to work, and wrapper objects may be needed on the client if the DOM objects are accessed via CORBA. While this does require some additional consideration for CORBA DOM clients, the implementers with experience using DOM over CORBA from Python do not consider this a problem. Attributes that are declared "readonly" may not restrict write access in all DOM implementations. In the Python DOM API, accessor functions are not required. If provided, they should take the form defined by the Python IDL mapping, but these methods are considered unnecessary since the attributes are accessible directly from Python. “Set” accessors should never be provided for "readonly" attributes. The IDL definitions do not fully embody the requirements of the W3C DOM API, such as the notion of certain objects, such as the return value of "getElementsByTagName()", being “live”. The Python DOM API does not require implementations to enforce such requirements. "xml.etree.ElementTree" — The ElementTree XML API ************************************************* **Source code:** Lib/xml/etree/ElementTree.py ====================================================================== The "xml.etree.ElementTree" module implements a simple and efficient API for parsing and creating XML data. Changed in version 3.3: This module will use a fast implementation whenever available. Deprecated since version 3.3: The "xml.etree.cElementTree" module is deprecated. Warning: The "xml.etree.ElementTree" module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data see XML vulnerabilities. Tutorial ======== This is a short tutorial for using "xml.etree.ElementTree" ("ET" in short). The goal is to demonstrate some of the building blocks and basic concepts of the module. XML tree and elements --------------------- XML is an inherently hierarchical data format, and the most natural way to represent it is with a tree. "ET" has two classes for this purpose - "ElementTree" represents the whole XML document as a tree, and "Element" represents a single node in this tree. Interactions with the whole document (reading and writing to/from files) are usually done on the "ElementTree" level. Interactions with a single XML element and its sub-elements are done on the "Element" level. Parsing XML ----------- We’ll be using the following XML document as the sample data for this section: 1 2008 141100 4 2011 59900 68 2011 13600 We can import this data by reading from a file: import xml.etree.ElementTree as ET tree = ET.parse('country_data.xml') root = tree.getroot() Or directly from a string: root = ET.fromstring(country_data_as_string) "fromstring()" parses XML from a string directly into an "Element", which is the root element of the parsed tree. Other parsing functions may create an "ElementTree". Check the documentation to be sure. As an "Element", "root" has a tag and a dictionary of attributes: >>> root.tag 'data' >>> root.attrib {} It also has children nodes over which we can iterate: >>> for child in root: ... print(child.tag, child.attrib) ... country {'name': 'Liechtenstein'} country {'name': 'Singapore'} country {'name': 'Panama'} Children are nested, and we can access specific child nodes by index: >>> root[0][1].text '2008' Note: Not all elements of the XML input will end up as elements of the parsed tree. Currently, this module skips over any XML comments, processing instructions, and document type declarations in the input. Nevertheless, trees built using this module’s API rather than parsing from XML text can have comments and processing instructions in them; they will be included when generating XML output. A document type declaration may be accessed by passing a custom "TreeBuilder" instance to the "XMLParser" constructor. Pull API for non-blocking parsing --------------------------------- Most parsing functions provided by this module require the whole document to be read at once before returning any result. It is possible to use an "XMLParser" and feed data into it incrementally, but it is a push API that calls methods on a callback target, which is too low-level and inconvenient for most needs. Sometimes what the user really wants is to be able to parse XML incrementally, without blocking operations, while enjoying the convenience of fully constructed "Element" objects. The most powerful tool for doing this is "XMLPullParser". It does not require a blocking read to obtain the XML data, and is instead fed with data incrementally with "XMLPullParser.feed()" calls. To get the parsed XML elements, call "XMLPullParser.read_events()". Here is an example: >>> parser = ET.XMLPullParser(['start', 'end']) >>> parser.feed('sometext') >>> list(parser.read_events()) [('start', )] >>> parser.feed(' more text') >>> for event, elem in parser.read_events(): ... print(event) ... print(elem.tag, 'text=', elem.text) ... end The obvious use case is applications that operate in a non-blocking fashion where the XML data is being received from a socket or read incrementally from some storage device. In such cases, blocking reads are unacceptable. Because it’s so flexible, "XMLPullParser" can be inconvenient to use for simpler use-cases. If you don’t mind your application blocking on reading XML data but would still like to have incremental parsing capabilities, take a look at "iterparse()". It can be useful when you’re reading a large XML document and don’t want to hold it wholly in memory. Where *immediate* feedback through events is wanted, calling method "XMLPullParser.flush()" can help reduce delay; please make sure to study the related security notes. Finding interesting elements ---------------------------- "Element" has some useful methods that help iterate recursively over all the sub-tree below it (its children, their children, and so on). For example, "Element.iter()": >>> for neighbor in root.iter('neighbor'): ... print(neighbor.attrib) ... {'name': 'Austria', 'direction': 'E'} {'name': 'Switzerland', 'direction': 'W'} {'name': 'Malaysia', 'direction': 'N'} {'name': 'Costa Rica', 'direction': 'W'} {'name': 'Colombia', 'direction': 'E'} "Element.findall()" finds only elements with a tag which are direct children of the current element. "Element.find()" finds the *first* child with a particular tag, and "Element.text" accesses the element’s text content. "Element.get()" accesses the element’s attributes: >>> for country in root.findall('country'): ... rank = country.find('rank').text ... name = country.get('name') ... print(name, rank) ... Liechtenstein 1 Singapore 4 Panama 68 More sophisticated specification of which elements to look for is possible by using XPath. Modifying an XML File --------------------- "ElementTree" provides a simple way to build XML documents and write them to files. The "ElementTree.write()" method serves this purpose. Once created, an "Element" object may be manipulated by directly changing its fields (such as "Element.text"), adding and modifying attributes ("Element.set()" method), as well as adding new children (for example with "Element.append()"). Let’s say we want to add one to each country’s rank, and add an "updated" attribute to the rank element: >>> for rank in root.iter('rank'): ... new_rank = int(rank.text) + 1 ... rank.text = str(new_rank) ... rank.set('updated', 'yes') ... >>> tree.write('output.xml') Our XML now looks like this: 2 2008 141100 5 2011 59900 69 2011 13600 We can remove elements using "Element.remove()". Let’s say we want to remove all countries with a rank higher than 50: >>> for country in root.findall('country'): ... # using root.findall() to avoid removal during traversal ... rank = int(country.find('rank').text) ... if rank > 50: ... root.remove(country) ... >>> tree.write('output.xml') Note that concurrent modification while iterating can lead to problems, just like when iterating and modifying Python lists or dicts. Therefore, the example first collects all matching elements with "root.findall()", and only then iterates over the list of matches. Our XML now looks like this: 2 2008 141100 5 2011 59900 Building XML documents ---------------------- The "SubElement()" function also provides a convenient way to create new sub-elements for a given element: >>> a = ET.Element('a') >>> b = ET.SubElement(a, 'b') >>> c = ET.SubElement(a, 'c') >>> d = ET.SubElement(c, 'd') >>> ET.dump(a)
    Parsing XML with Namespaces --------------------------- If the XML input has namespaces, tags and attributes with prefixes in the form "prefix:sometag" get expanded to "{uri}sometag" where the *prefix* is replaced by the full *URI*. Also, if there is a default namespace, that full URI gets prepended to all of the non-prefixed tags. Here is an XML example that incorporates two namespaces, one with the prefix “fictional” and the other serving as the default namespace: John Cleese Lancelot Archie Leach Eric Idle Sir Robin Gunther Commander Clement One way to search and explore this XML example is to manually add the URI to every tag or attribute in the xpath of a "find()" or "findall()": root = fromstring(xml_text) for actor in root.findall('{http://people.example.com}actor'): name = actor.find('{http://people.example.com}name') print(name.text) for char in actor.findall('{http://characters.example.com}character'): print(' |-->', char.text) A better way to search the namespaced XML example is to create a dictionary with your own prefixes and use those in the search functions: ns = {'real_person': 'http://people.example.com', 'role': 'http://characters.example.com'} for actor in root.findall('real_person:actor', ns): name = actor.find('real_person:name', ns) print(name.text) for char in actor.findall('role:character', ns): print(' |-->', char.text) These two approaches both output: John Cleese |--> Lancelot |--> Archie Leach Eric Idle |--> Sir Robin |--> Gunther |--> Commander Clement XPath support ============= This module provides limited support for XPath expressions for locating elements in a tree. The goal is to support a small subset of the abbreviated syntax; a full XPath engine is outside the scope of the module. Example ------- Here’s an example that demonstrates some of the XPath capabilities of the module. We’ll be using the "countrydata" XML document from the Parsing XML section: import xml.etree.ElementTree as ET root = ET.fromstring(countrydata) # Top-level elements root.findall(".") # All 'neighbor' grand-children of 'country' children of the top-level # elements root.findall("./country/neighbor") # Nodes with name='Singapore' that have a 'year' child root.findall(".//year/..[@name='Singapore']") # 'year' nodes that are children of nodes with name='Singapore' root.findall(".//*[@name='Singapore']/year") # All 'neighbor' nodes that are the second child of their parent root.findall(".//neighbor[2]") For XML with namespaces, use the usual qualified "{namespace}tag" notation: # All dublin-core "title" tags in the document root.findall(".//{http://purl.org/dc/elements/1.1/}title") Supported XPath syntax ---------------------- +-------------------------+--------------------------------------------------------+ | Syntax | Meaning | |=========================|========================================================| | "tag" | Selects all child elements with the given tag. For | | | example, "spam" selects all child elements named | | | "spam", and "spam/egg" selects all grandchildren named | | | "egg" in all children named "spam". "{namespace}*" | | | selects all tags in the given namespace, "{*}spam" | | | selects tags named "spam" in any (or no) namespace, | | | and "{}*" only selects tags that are not in a | | | namespace. Changed in version 3.8: Support for star- | | | wildcards was added. | +-------------------------+--------------------------------------------------------+ | "*" | Selects all child elements, including comments and | | | processing instructions. For example, "*/egg" selects | | | all grandchildren named "egg". | +-------------------------+--------------------------------------------------------+ | "." | Selects the current node. This is mostly useful at | | | the beginning of the path, to indicate that it’s a | | | relative path. | +-------------------------+--------------------------------------------------------+ | "//" | Selects all subelements, on all levels beneath the | | | current element. For example, ".//egg" selects all | | | "egg" elements in the entire tree. | +-------------------------+--------------------------------------------------------+ | ".." | Selects the parent element. Returns "None" if the | | | path attempts to reach the ancestors of the start | | | element (the element "find" was called on). | +-------------------------+--------------------------------------------------------+ | "[@attrib]" | Selects all elements that have the given attribute. | +-------------------------+--------------------------------------------------------+ | "[@attrib='value']" | Selects all elements for which the given attribute has | | | the given value. The value cannot contain quotes. | +-------------------------+--------------------------------------------------------+ | "[tag]" | Selects all elements that have a child named "tag". | | | Only immediate children are supported. | +-------------------------+--------------------------------------------------------+ | "[.='text']" | Selects all elements whose complete text content, | | | including descendants, equals the given "text". New | | | in version 3.7. | +-------------------------+--------------------------------------------------------+ | "[tag='text']" | Selects all elements that have a child named "tag" | | | whose complete text content, including descendants, | | | equals the given "text". | +-------------------------+--------------------------------------------------------+ | "[position]" | Selects all elements that are located at the given | | | position. The position can be either an integer (1 is | | | the first position), the expression "last()" (for the | | | last position), or a position relative to the last | | | position (e.g. "last()-1"). | +-------------------------+--------------------------------------------------------+ Predicates (expressions within square brackets) must be preceded by a tag name, an asterisk, or another predicate. "position" predicates must be preceded by a tag name. Reference ========= Functions --------- xml.etree.ElementTree.canonicalize(xml_data=None, *, out=None, from_file=None, **options) C14N 2.0 transformation function. Canonicalization is a way to normalise XML output in a way that allows byte-by-byte comparisons and digital signatures. It reduced the freedom that XML serializers have and instead generates a more constrained XML representation. The main restrictions regard the placement of namespace declarations, the ordering of attributes, and ignorable whitespace. This function takes an XML data string (*xml_data*) or a file path or file-like object (*from_file*) as input, converts it to the canonical form, and writes it out using the *out* file(-like) object, if provided, or returns it as a text string if not. The output file receives text, not bytes. It should therefore be opened in text mode with "utf-8" encoding. Typical uses: xml_data = "..." print(canonicalize(xml_data)) with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file: canonicalize(xml_data, out=out_file) with open("c14n_output.xml", mode='w', encoding='utf-8') as out_file: canonicalize(from_file="inputfile.xml", out=out_file) The configuration *options* are as follows: * *with_comments*: set to true to include comments (default: false) * *strip_text*: set to true to strip whitespace before and after text content (default: false) * *rewrite_prefixes*: set to true to replace namespace prefixes by “n{number}” (default: false) * *qname_aware_tags*: a set of qname aware tag names in which prefixes should be replaced in text content (default: empty) * *qname_aware_attrs*: a set of qname aware attribute names in which prefixes should be replaced in text content (default: empty) * *exclude_attrs*: a set of attribute names that should not be serialised * *exclude_tags*: a set of tag names that should not be serialised In the option list above, “a set” refers to any collection or iterable of strings, no ordering is expected. New in version 3.8. xml.etree.ElementTree.Comment(text=None) Comment element factory. This factory function creates a special element that will be serialized as an XML comment by the standard serializer. The comment string can be either a bytestring or a Unicode string. *text* is a string containing the comment string. Returns an element instance representing a comment. Note that "XMLParser" skips over comments in the input instead of creating comment objects for them. An "ElementTree" will only contain comment nodes if they have been inserted into to the tree using one of the "Element" methods. xml.etree.ElementTree.dump(elem) Writes an element tree or element structure to sys.stdout. This function should be used for debugging only. The exact output format is implementation dependent. In this version, it’s written as an ordinary XML file. *elem* is an element tree or an individual element. Changed in version 3.8: The "dump()" function now preserves the attribute order specified by the user. xml.etree.ElementTree.fromstring(text, parser=None) Parses an XML section from a string constant. Same as "XML()". *text* is a string containing XML data. *parser* is an optional parser instance. If not given, the standard "XMLParser" parser is used. Returns an "Element" instance. xml.etree.ElementTree.fromstringlist(sequence, parser=None) Parses an XML document from a sequence of string fragments. *sequence* is a list or other sequence containing XML data fragments. *parser* is an optional parser instance. If not given, the standard "XMLParser" parser is used. Returns an "Element" instance. New in version 3.2. xml.etree.ElementTree.indent(tree, space=" ", level=0) Appends whitespace to the subtree to indent the tree visually. This can be used to generate pretty-printed XML output. *tree* can be an Element or ElementTree. *space* is the whitespace string that will be inserted for each indentation level, two space characters by default. For indenting partial subtrees inside of an already indented tree, pass the initial indentation level as *level*. New in version 3.9. xml.etree.ElementTree.iselement(element) Check if an object appears to be a valid element object. *element* is an element instance. Return "True" if this is an element object. xml.etree.ElementTree.iterparse(source, events=None, parser=None) Parses an XML section into an element tree incrementally, and reports what’s going on to the user. *source* is a filename or *file object* containing XML data. *events* is a sequence of events to report back. The supported events are the strings ""start"", ""end"", ""comment"", ""pi"", ""start-ns"" and ""end- ns"" (the “ns” events are used to get detailed namespace information). If *events* is omitted, only ""end"" events are reported. *parser* is an optional parser instance. If not given, the standard "XMLParser" parser is used. *parser* must be a subclass of "XMLParser" and can only use the default "TreeBuilder" as a target. Returns an *iterator* providing "(event, elem)" pairs. Note that while "iterparse()" builds the tree incrementally, it issues blocking reads on *source* (or the file it names). As such, it’s unsuitable for applications where blocking reads can’t be made. For fully non-blocking parsing, see "XMLPullParser". Note: "iterparse()" only guarantees that it has seen the “>” character of a starting tag when it emits a “start” event, so the attributes are defined, but the contents of the text and tail attributes are undefined at that point. The same applies to the element children; they may or may not be present.If you need a fully populated element, look for “end” events instead. Deprecated since version 3.4: The *parser* argument. Changed in version 3.8: The "comment" and "pi" events were added. xml.etree.ElementTree.parse(source, parser=None) Parses an XML section into an element tree. *source* is a filename or file object containing XML data. *parser* is an optional parser instance. If not given, the standard "XMLParser" parser is used. Returns an "ElementTree" instance. xml.etree.ElementTree.ProcessingInstruction(target, text=None) PI element factory. This factory function creates a special element that will be serialized as an XML processing instruction. *target* is a string containing the PI target. *text* is a string containing the PI contents, if given. Returns an element instance, representing a processing instruction. Note that "XMLParser" skips over processing instructions in the input instead of creating comment objects for them. An "ElementTree" will only contain processing instruction nodes if they have been inserted into to the tree using one of the "Element" methods. xml.etree.ElementTree.register_namespace(prefix, uri) Registers a namespace prefix. The registry is global, and any existing mapping for either the given prefix or the namespace URI will be removed. *prefix* is a namespace prefix. *uri* is a namespace uri. Tags and attributes in this namespace will be serialized with the given prefix, if at all possible. New in version 3.2. xml.etree.ElementTree.SubElement(parent, tag, attrib={}, **extra) Subelement factory. This function creates an element instance, and appends it to an existing element. The element name, attribute names, and attribute values can be either bytestrings or Unicode strings. *parent* is the parent element. *tag* is the subelement name. *attrib* is an optional dictionary, containing element attributes. *extra* contains additional attributes, given as keyword arguments. Returns an element instance. xml.etree.ElementTree.tostring(element, encoding="us-ascii", method="xml", *, xml_declaration=None, default_namespace=None, short_empty_elements=True) Generates a string representation of an XML element, including all subelements. *element* is an "Element" instance. *encoding* [1] is the output encoding (default is US-ASCII). Use "encoding="unicode"" to generate a Unicode string (otherwise, a bytestring is generated). *method* is either ""xml"", ""html"" or ""text"" (default is ""xml""). *xml_declaration*, *default_namespace* and *short_empty_elements* has the same meaning as in "ElementTree.write()". Returns an (optionally) encoded string containing the XML data. New in version 3.4: The *short_empty_elements* parameter. New in version 3.8: The *xml_declaration* and *default_namespace* parameters. Changed in version 3.8: The "tostring()" function now preserves the attribute order specified by the user. xml.etree.ElementTree.tostringlist(element, encoding="us-ascii", method="xml", *, xml_declaration=None, default_namespace=None, short_empty_elements=True) Generates a string representation of an XML element, including all subelements. *element* is an "Element" instance. *encoding* [1] is the output encoding (default is US-ASCII). Use "encoding="unicode"" to generate a Unicode string (otherwise, a bytestring is generated). *method* is either ""xml"", ""html"" or ""text"" (default is ""xml""). *xml_declaration*, *default_namespace* and *short_empty_elements* has the same meaning as in "ElementTree.write()". Returns a list of (optionally) encoded strings containing the XML data. It does not guarantee any specific sequence, except that "b"".join(tostringlist(element)) == tostring(element)". New in version 3.2. New in version 3.4: The *short_empty_elements* parameter. New in version 3.8: The *xml_declaration* and *default_namespace* parameters. Changed in version 3.8: The "tostringlist()" function now preserves the attribute order specified by the user. xml.etree.ElementTree.XML(text, parser=None) Parses an XML section from a string constant. This function can be used to embed “XML literals” in Python code. *text* is a string containing XML data. *parser* is an optional parser instance. If not given, the standard "XMLParser" parser is used. Returns an "Element" instance. xml.etree.ElementTree.XMLID(text, parser=None) Parses an XML section from a string constant, and also returns a dictionary which maps from element id:s to elements. *text* is a string containing XML data. *parser* is an optional parser instance. If not given, the standard "XMLParser" parser is used. Returns a tuple containing an "Element" instance and a dictionary. XInclude support ================ This module provides limited support for XInclude directives, via the "xml.etree.ElementInclude" helper module. This module can be used to insert subtrees and text strings into element trees, based on information in the tree. Example ------- Here’s an example that demonstrates use of the XInclude module. To include an XML document in the current document, use the "{http://www.w3.org/2001/XInclude}include" element and set the **parse** attribute to ""xml"", and use the **href** attribute to specify the document to include. By default, the **href** attribute is treated as a file name. You can use custom loaders to override this behaviour. Also note that the standard helper does not support XPointer syntax. To process this file, load it as usual, and pass the root element to the "xml.etree.ElementTree" module: from xml.etree import ElementTree, ElementInclude tree = ElementTree.parse("document.xml") root = tree.getroot() ElementInclude.include(root) The ElementInclude module replaces the "{http://www.w3.org/2001/XInclude}include" element with the root element from the **source.xml** document. The result might look something like this: This is a paragraph. If the **parse** attribute is omitted, it defaults to “xml”. The href attribute is required. To include a text document, use the "{http://www.w3.org/2001/XInclude}include" element, and set the **parse** attribute to “text”: Copyright (c) . The result might look something like: Copyright (c) 2003. Reference ========= Functions --------- xml.etree.ElementInclude.default_loader(href, parse, encoding=None) Default loader. This default loader reads an included resource from disk. *href* is a URL. *parse* is for parse mode either “xml” or “text”. *encoding* is an optional text encoding. If not given, encoding is "utf-8". Returns the expanded resource. If the parse mode is ""xml"", this is an ElementTree instance. If the parse mode is “text”, this is a Unicode string. If the loader fails, it can return None or raise an exception. xml.etree.ElementInclude.include(elem, loader=None, base_url=None, max_depth=6) This function expands XInclude directives. *elem* is the root element. *loader* is an optional resource loader. If omitted, it defaults to "default_loader()". If given, it should be a callable that implements the same interface as "default_loader()". *base_url* is base URL of the original file, to resolve relative include file references. *max_depth* is the maximum number of recursive inclusions. Limited to reduce the risk of malicious content explosion. Pass a negative value to disable the limitation. Returns the expanded resource. If the parse mode is ""xml"", this is an ElementTree instance. If the parse mode is “text”, this is a Unicode string. If the loader fails, it can return None or raise an exception. New in version 3.9: The *base_url* and *max_depth* parameters. Element Objects --------------- class xml.etree.ElementTree.Element(tag, attrib={}, **extra) Element class. This class defines the Element interface, and provides a reference implementation of this interface. The element name, attribute names, and attribute values can be either bytestrings or Unicode strings. *tag* is the element name. *attrib* is an optional dictionary, containing element attributes. *extra* contains additional attributes, given as keyword arguments. tag A string identifying what kind of data this element represents (the element type, in other words). text tail These attributes can be used to hold additional data associated with the element. Their values are usually strings but may be any application-specific object. If the element is created from an XML file, the *text* attribute holds either the text between the element’s start tag and its first child or end tag, or "None", and the *tail* attribute holds either the text between the element’s end tag and the next tag, or "None". For the XML data 1234 the *a* element has "None" for both *text* and *tail* attributes, the *b* element has *text* ""1"" and *tail* ""4"", the *c* element has *text* ""2"" and *tail* "None", and the *d* element has *text* "None" and *tail* ""3"". To collect the inner text of an element, see "itertext()", for example """.join(element.itertext())". Applications may store arbitrary objects in these attributes. attrib A dictionary containing the element’s attributes. Note that while the *attrib* value is always a real mutable Python dictionary, an ElementTree implementation may choose to use another internal representation, and create the dictionary only if someone asks for it. To take advantage of such implementations, use the dictionary methods below whenever possible. The following dictionary-like methods work on the element attributes. clear() Resets an element. This function removes all subelements, clears all attributes, and sets the text and tail attributes to "None". get(key, default=None) Gets the element attribute named *key*. Returns the attribute value, or *default* if the attribute was not found. items() Returns the element attributes as a sequence of (name, value) pairs. The attributes are returned in an arbitrary order. keys() Returns the elements attribute names as a list. The names are returned in an arbitrary order. set(key, value) Set the attribute *key* on the element to *value*. The following methods work on the element’s children (subelements). append(subelement) Adds the element *subelement* to the end of this element’s internal list of subelements. Raises "TypeError" if *subelement* is not an "Element". extend(subelements) Appends *subelements* from a sequence object with zero or more elements. Raises "TypeError" if a subelement is not an "Element". New in version 3.2. find(match, namespaces=None) Finds the first subelement matching *match*. *match* may be a tag name or a path. Returns an element instance or "None". *namespaces* is an optional mapping from namespace prefix to full name. Pass "''" as prefix to move all unprefixed tag names in the expression into the given namespace. findall(match, namespaces=None) Finds all matching subelements, by tag name or path. Returns a list containing all matching elements in document order. *namespaces* is an optional mapping from namespace prefix to full name. Pass "''" as prefix to move all unprefixed tag names in the expression into the given namespace. findtext(match, default=None, namespaces=None) Finds text for the first subelement matching *match*. *match* may be a tag name or a path. Returns the text content of the first matching element, or *default* if no element was found. Note that if the matching element has no text content an empty string is returned. *namespaces* is an optional mapping from namespace prefix to full name. Pass "''" as prefix to move all unprefixed tag names in the expression into the given namespace. insert(index, subelement) Inserts *subelement* at the given position in this element. Raises "TypeError" if *subelement* is not an "Element". iter(tag=None) Creates a tree *iterator* with the current element as the root. The iterator iterates over this element and all elements below it, in document (depth first) order. If *tag* is not "None" or "'*'", only elements whose tag equals *tag* are returned from the iterator. If the tree structure is modified during iteration, the result is undefined. New in version 3.2. iterfind(match, namespaces=None) Finds all matching subelements, by tag name or path. Returns an iterable yielding all matching elements in document order. *namespaces* is an optional mapping from namespace prefix to full name. New in version 3.2. itertext() Creates a text iterator. The iterator loops over this element and all subelements, in document order, and returns all inner text. New in version 3.2. makeelement(tag, attrib) Creates a new element object of the same type as this element. Do not call this method, use the "SubElement()" factory function instead. remove(subelement) Removes *subelement* from the element. Unlike the find* methods this method compares elements based on the instance identity, not on tag value or contents. "Element" objects also support the following sequence type methods for working with subelements: "__delitem__()", "__getitem__()", "__setitem__()", "__len__()". Caution: Elements with no subelements will test as "False". This behavior will change in future versions. Use specific "len(elem)" or "elem is None" test instead. element = root.find('foo') if not element: # careful! print("element not found, or element has no subelements") if element is None: print("element not found") Prior to Python 3.8, the serialisation order of the XML attributes of elements was artificially made predictable by sorting the attributes by their name. Based on the now guaranteed ordering of dicts, this arbitrary reordering was removed in Python 3.8 to preserve the order in which attributes were originally parsed or created by user code. In general, user code should try not to depend on a specific ordering of attributes, given that the XML Information Set explicitly excludes the attribute order from conveying information. Code should be prepared to deal with any ordering on input. In cases where deterministic XML output is required, e.g. for cryptographic signing or test data sets, canonical serialisation is available with the "canonicalize()" function. In cases where canonical output is not applicable but a specific attribute order is still desirable on output, code should aim for creating the attributes directly in the desired order, to avoid perceptual mismatches for readers of the code. In cases where this is difficult to achieve, a recipe like the following can be applied prior to serialisation to enforce an order independently from the Element creation: def reorder_attributes(root): for el in root.iter(): attrib = el.attrib if len(attrib) > 1: # adjust attribute order, e.g. by sorting attribs = sorted(attrib.items()) attrib.clear() attrib.update(attribs) ElementTree Objects ------------------- class xml.etree.ElementTree.ElementTree(element=None, file=None) ElementTree wrapper class. This class represents an entire element hierarchy, and adds some extra support for serialization to and from standard XML. *element* is the root element. The tree is initialized with the contents of the XML *file* if given. _setroot(element) Replaces the root element for this tree. This discards the current contents of the tree, and replaces it with the given element. Use with care. *element* is an element instance. find(match, namespaces=None) Same as "Element.find()", starting at the root of the tree. findall(match, namespaces=None) Same as "Element.findall()", starting at the root of the tree. findtext(match, default=None, namespaces=None) Same as "Element.findtext()", starting at the root of the tree. getroot() Returns the root element for this tree. iter(tag=None) Creates and returns a tree iterator for the root element. The iterator loops over all elements in this tree, in section order. *tag* is the tag to look for (default is to return all elements). iterfind(match, namespaces=None) Same as "Element.iterfind()", starting at the root of the tree. New in version 3.2. parse(source, parser=None) Loads an external XML section into this element tree. *source* is a file name or *file object*. *parser* is an optional parser instance. If not given, the standard "XMLParser" parser is used. Returns the section root element. write(file, encoding="us-ascii", xml_declaration=None, default_namespace=None, method="xml", *, short_empty_elements=True) Writes the element tree to a file, as XML. *file* is a file name, or a *file object* opened for writing. *encoding* [1] is the output encoding (default is US-ASCII). *xml_declaration* controls if an XML declaration should be added to the file. Use "False" for never, "True" for always, "None" for only if not US- ASCII or UTF-8 or Unicode (default is "None"). *default_namespace* sets the default XML namespace (for “xmlns”). *method* is either ""xml"", ""html"" or ""text"" (default is ""xml""). The keyword-only *short_empty_elements* parameter controls the formatting of elements that contain no content. If "True" (the default), they are emitted as a single self-closed tag, otherwise they are emitted as a pair of start/end tags. The output is either a string ("str") or binary ("bytes"). This is controlled by the *encoding* argument. If *encoding* is ""unicode"", the output is a string; otherwise, it’s binary. Note that this may conflict with the type of *file* if it’s an open *file object*; make sure you do not try to write a string to a binary stream and vice versa. New in version 3.4: The *short_empty_elements* parameter. Changed in version 3.8: The "write()" method now preserves the attribute order specified by the user. This is the XML file that is going to be manipulated: Example page

    Moved to example.org or example.com.

    Example of changing the attribute “target” of every link in first paragraph: >>> from xml.etree.ElementTree import ElementTree >>> tree = ElementTree() >>> tree.parse("index.xhtml") >>> p = tree.find("body/p") # Finds first occurrence of tag p in body >>> p >>> links = list(p.iter("a")) # Returns list of all links >>> links [, ] >>> for i in links: # Iterates through all found links ... i.attrib["target"] = "blank" >>> tree.write("output.xhtml") QName Objects ------------- class xml.etree.ElementTree.QName(text_or_uri, tag=None) QName wrapper. This can be used to wrap a QName attribute value, in order to get proper namespace handling on output. *text_or_uri* is a string containing the QName value, in the form {uri}local, or, if the tag argument is given, the URI part of a QName. If *tag* is given, the first argument is interpreted as a URI, and this argument is interpreted as a local name. "QName" instances are opaque. TreeBuilder Objects ------------------- class xml.etree.ElementTree.TreeBuilder(element_factory=None, *, comment_factory=None, pi_factory=None, insert_comments=False, insert_pis=False) Generic element structure builder. This builder converts a sequence of start, data, end, comment and pi method calls to a well-formed element structure. You can use this class to build an element structure using a custom XML parser, or a parser for some other XML-like format. *element_factory*, when given, must be a callable accepting two positional arguments: a tag and a dict of attributes. It is expected to return a new element instance. The *comment_factory* and *pi_factory* functions, when given, should behave like the "Comment()" and "ProcessingInstruction()" functions to create comments and processing instructions. When not given, the default factories will be used. When *insert_comments* and/or *insert_pis* is true, comments/pis will be inserted into the tree if they appear within the root element (but not outside of it). close() Flushes the builder buffers, and returns the toplevel document element. Returns an "Element" instance. data(data) Adds text to the current element. *data* is a string. This should be either a bytestring, or a Unicode string. end(tag) Closes the current element. *tag* is the element name. Returns the closed element. start(tag, attrs) Opens a new element. *tag* is the element name. *attrs* is a dictionary containing element attributes. Returns the opened element. comment(text) Creates a comment with the given *text*. If "insert_comments" is true, this will also add it to the tree. New in version 3.8. pi(target, text) Creates a comment with the given *target* name and *text*. If "insert_pis" is true, this will also add it to the tree. New in version 3.8. In addition, a custom "TreeBuilder" object can provide the following methods: doctype(name, pubid, system) Handles a doctype declaration. *name* is the doctype name. *pubid* is the public identifier. *system* is the system identifier. This method does not exist on the default "TreeBuilder" class. New in version 3.2. start_ns(prefix, uri) Is called whenever the parser encounters a new namespace declaration, before the "start()" callback for the opening element that defines it. *prefix* is "''" for the default namespace and the declared namespace prefix name otherwise. *uri* is the namespace URI. New in version 3.8. end_ns(prefix) Is called after the "end()" callback of an element that declared a namespace prefix mapping, with the name of the *prefix* that went out of scope. New in version 3.8. class xml.etree.ElementTree.C14NWriterTarget(write, *, with_comments=False, strip_text=False, rewrite_prefixes=False, qname_aware_tags=None, qname_aware_attrs=None, exclude_attrs=None, exclude_tags=None) A C14N 2.0 writer. Arguments are the same as for the "canonicalize()" function. This class does not build a tree but translates the callback events directly into a serialised form using the *write* function. New in version 3.8. XMLParser Objects ----------------- class xml.etree.ElementTree.XMLParser(*, target=None, encoding=None) This class is the low-level building block of the module. It uses "xml.parsers.expat" for efficient, event-based parsing of XML. It can be fed XML data incrementally with the "feed()" method, and parsing events are translated to a push API - by invoking callbacks on the *target* object. If *target* is omitted, the standard "TreeBuilder" is used. If *encoding* [1] is given, the value overrides the encoding specified in the XML file. Changed in version 3.8: Parameters are now keyword-only. The *html* argument no longer supported. close() Finishes feeding data to the parser. Returns the result of calling the "close()" method of the *target* passed during construction; by default, this is the toplevel document element. feed(data) Feeds data to the parser. *data* is encoded data. flush() Triggers parsing of any previously fed unparsed data, which can be used to ensure more immediate feedback, in particular with Expat >=2.6.0. The implementation of "flush()" temporarily disables reparse deferral with Expat (if currently enabled) and triggers a reparse. Disabling reparse deferral has security consequences; please see "xml.parsers.expat.xmlparser.SetReparseDeferralEnabled()" for details. Note that "flush()" has been backported to some prior releases of CPython as a security fix. Check for availability of "flush()" using "hasattr()" if used in code running across a variety of Python versions. New in version 3.9.19. "XMLParser.feed()" calls *target*’s "start(tag, attrs_dict)" method for each opening tag, its "end(tag)" method for each closing tag, and data is processed by method "data(data)". For further supported callback methods, see the "TreeBuilder" class. "XMLParser.close()" calls *target*’s method "close()". "XMLParser" can be used not only for building a tree structure. This is an example of counting the maximum depth of an XML file: >>> from xml.etree.ElementTree import XMLParser >>> class MaxDepth: # The target object of the parser ... maxDepth = 0 ... depth = 0 ... def start(self, tag, attrib): # Called for each opening tag. ... self.depth += 1 ... if self.depth > self.maxDepth: ... self.maxDepth = self.depth ... def end(self, tag): # Called for each closing tag. ... self.depth -= 1 ... def data(self, data): ... pass # We do not need to do anything with data. ... def close(self): # Called when all data has been parsed. ... return self.maxDepth ... >>> target = MaxDepth() >>> parser = XMLParser(target=target) >>> exampleXml = """ ... ... ... ... ... ... ... ... ... ... """ >>> parser.feed(exampleXml) >>> parser.close() 4 XMLPullParser Objects --------------------- class xml.etree.ElementTree.XMLPullParser(events=None) A pull parser suitable for non-blocking applications. Its input- side API is similar to that of "XMLParser", but instead of pushing calls to a callback target, "XMLPullParser" collects an internal list of parsing events and lets the user read from it. *events* is a sequence of events to report back. The supported events are the strings ""start"", ""end"", ""comment"", ""pi"", ""start-ns"" and ""end-ns"" (the “ns” events are used to get detailed namespace information). If *events* is omitted, only ""end"" events are reported. feed(data) Feed the given bytes data to the parser. flush() Triggers parsing of any previously fed unparsed data, which can be used to ensure more immediate feedback, in particular with Expat >=2.6.0. The implementation of "flush()" temporarily disables reparse deferral with Expat (if currently enabled) and triggers a reparse. Disabling reparse deferral has security consequences; please see "xml.parsers.expat.xmlparser.SetReparseDeferralEnabled()" for details. Note that "flush()" has been backported to some prior releases of CPython as a security fix. Check for availability of "flush()" using "hasattr()" if used in code running across a variety of Python versions. New in version 3.9.19. close() Signal the parser that the data stream is terminated. Unlike "XMLParser.close()", this method always returns "None". Any events not yet retrieved when the parser is closed can still be read with "read_events()". read_events() Return an iterator over the events which have been encountered in the data fed to the parser. The iterator yields "(event, elem)" pairs, where *event* is a string representing the type of event (e.g. ""end"") and *elem* is the encountered "Element" object, or other context value as follows. * "start", "end": the current Element. * "comment", "pi": the current comment / processing instruction * "start-ns": a tuple "(prefix, uri)" naming the declared namespace mapping. * "end-ns": "None" (this may change in a future version) Events provided in a previous call to "read_events()" will not be yielded again. Events are consumed from the internal queue only when they are retrieved from the iterator, so multiple readers iterating in parallel over iterators obtained from "read_events()" will have unpredictable results. Note: "XMLPullParser" only guarantees that it has seen the “>” character of a starting tag when it emits a “start” event, so the attributes are defined, but the contents of the text and tail attributes are undefined at that point. The same applies to the element children; they may or may not be present.If you need a fully populated element, look for “end” events instead. New in version 3.4. Changed in version 3.8: The "comment" and "pi" events were added. Exceptions ---------- class xml.etree.ElementTree.ParseError XML parse error, raised by the various parsing methods in this module when parsing fails. The string representation of an instance of this exception will contain a user-friendly error message. In addition, it will have the following attributes available: code A numeric error code from the expat parser. See the documentation of "xml.parsers.expat" for the list of error codes and their meanings. position A tuple of *line*, *column* numbers, specifying where the error occurred. -[ Footnotes ]- [1] The encoding string included in XML output should conform to the appropriate standards. For example, “UTF-8” is valid, but “UTF8” is not. See https://www.w3.org/TR/2006/REC-xml11-20060816/#NT- EncodingDecl and https://www.iana.org/assignments/character-sets /character-sets.xhtml. "xml.sax.handler" — Base classes for SAX handlers ************************************************* **Source code:** Lib/xml/sax/handler.py ====================================================================== The SAX API defines four kinds of handlers: content handlers, DTD handlers, error handlers, and entity resolvers. Applications normally only need to implement those interfaces whose events they are interested in; they can implement the interfaces in a single object or in multiple objects. Handler implementations should inherit from the base classes provided in the module "xml.sax.handler", so that all methods get default implementations. class xml.sax.handler.ContentHandler This is the main callback interface in SAX, and the one most important to applications. The order of events in this interface mirrors the order of the information in the document. class xml.sax.handler.DTDHandler Handle DTD events. This interface specifies only those DTD events required for basic parsing (unparsed entities and attributes). class xml.sax.handler.EntityResolver Basic interface for resolving entities. If you create an object implementing this interface, then register the object with your Parser, the parser will call the method in your object to resolve all external entities. class xml.sax.handler.ErrorHandler Interface used by the parser to present error and warning messages to the application. The methods of this object control whether errors are immediately converted to exceptions or are handled in some other way. In addition to these classes, "xml.sax.handler" provides symbolic constants for the feature and property names. xml.sax.handler.feature_namespaces value: ""http://xml.org/sax/features/namespaces"" true: Perform Namespace processing. false: Optionally do not perform Namespace processing (implies namespace-prefixes; default). access: (parsing) read-only; (not parsing) read/write xml.sax.handler.feature_namespace_prefixes value: ""http://xml.org/sax/features/namespace-prefixes"" true: Report the original prefixed names and attributes used for Namespace declarations. false: Do not report attributes used for Namespace declarations, and optionally do not report original prefixed names (default). access: (parsing) read-only; (not parsing) read/write xml.sax.handler.feature_string_interning value: ""http://xml.org/sax/features/string-interning"" true: All element names, prefixes, attribute names, Namespace URIs, and local names are interned using the built-in intern function. false: Names are not necessarily interned, although they may be (default). access: (parsing) read-only; (not parsing) read/write xml.sax.handler.feature_validation value: ""http://xml.org/sax/features/validation"" true: Report all validation errors (implies external-general-entities and external-parameter-entities). false: Do not report validation errors. access: (parsing) read-only; (not parsing) read/write xml.sax.handler.feature_external_ges value: ""http://xml.org/sax/features/external-general-entities"" true: Include all external general (text) entities. false: Do not include external general entities. access: (parsing) read-only; (not parsing) read/write xml.sax.handler.feature_external_pes value: ""http://xml.org/sax/features/external-parameter-entities"" true: Include all external parameter entities, including the external DTD subset. false: Do not include any external parameter entities, even the external DTD subset. access: (parsing) read-only; (not parsing) read/write xml.sax.handler.all_features List of all features. xml.sax.handler.property_lexical_handler value: ""http://xml.org/sax/properties/lexical-handler"" data type: xml.sax.sax2lib.LexicalHandler (not supported in Python 2) description: An optional extension handler for lexical events like comments. access: read/write xml.sax.handler.property_declaration_handler value: ""http://xml.org/sax/properties/declaration-handler"" data type: xml.sax.sax2lib.DeclHandler (not supported in Python 2) description: An optional extension handler for DTD-related events other than notations and unparsed entities. access: read/write xml.sax.handler.property_dom_node value: ""http://xml.org/sax/properties/dom-node"" data type: org.w3c.dom.Node (not supported in Python 2) description: When parsing, the current DOM node being visited if this is a DOM iterator; when not parsing, the root DOM node for iteration. access: (parsing) read-only; (not parsing) read/write xml.sax.handler.property_xml_string value: ""http://xml.org/sax/properties/xml-string"" data type: Bytes description: The literal string of characters that was the source for the current event. access: read-only xml.sax.handler.all_properties List of all known property names. ContentHandler Objects ====================== Users are expected to subclass "ContentHandler" to support their application. The following methods are called by the parser on the appropriate events in the input document: ContentHandler.setDocumentLocator(locator) Called by the parser to give the application a locator for locating the origin of document events. SAX parsers are strongly encouraged (though not absolutely required) to supply a locator: if it does so, it must supply the locator to the application by invoking this method before invoking any of the other methods in the DocumentHandler interface. The locator allows the application to determine the end position of any document-related event, even if the parser is not reporting an error. Typically, the application will use this information for reporting its own errors (such as character content that does not match an application’s business rules). The information returned by the locator is probably not sufficient for use with a search engine. Note that the locator will return correct information only during the invocation of the events in this interface. The application should not attempt to use it at any other time. ContentHandler.startDocument() Receive notification of the beginning of a document. The SAX parser will invoke this method only once, before any other methods in this interface or in DTDHandler (except for "setDocumentLocator()"). ContentHandler.endDocument() Receive notification of the end of a document. The SAX parser will invoke this method only once, and it will be the last method invoked during the parse. The parser shall not invoke this method until it has either abandoned parsing (because of an unrecoverable error) or reached the end of input. ContentHandler.startPrefixMapping(prefix, uri) Begin the scope of a prefix-URI Namespace mapping. The information from this event is not necessary for normal Namespace processing: the SAX XML reader will automatically replace prefixes for element and attribute names when the "feature_namespaces" feature is enabled (the default). There are cases, however, when applications need to use prefixes in character data or in attribute values, where they cannot safely be expanded automatically; the "startPrefixMapping()" and "endPrefixMapping()" events supply the information to the application to expand prefixes in those contexts itself, if necessary. Note that "startPrefixMapping()" and "endPrefixMapping()" events are not guaranteed to be properly nested relative to each-other: all "startPrefixMapping()" events will occur before the corresponding "startElement()" event, and all "endPrefixMapping()" events will occur after the corresponding "endElement()" event, but their order is not guaranteed. ContentHandler.endPrefixMapping(prefix) End the scope of a prefix-URI mapping. See "startPrefixMapping()" for details. This event will always occur after the corresponding "endElement()" event, but the order of "endPrefixMapping()" events is not otherwise guaranteed. ContentHandler.startElement(name, attrs) Signals the start of an element in non-namespace mode. The *name* parameter contains the raw XML 1.0 name of the element type as a string and the *attrs* parameter holds an object of the "Attributes" interface (see The Attributes Interface) containing the attributes of the element. The object passed as *attrs* may be re-used by the parser; holding on to a reference to it is not a reliable way to keep a copy of the attributes. To keep a copy of the attributes, use the "copy()" method of the *attrs* object. ContentHandler.endElement(name) Signals the end of an element in non-namespace mode. The *name* parameter contains the name of the element type, just as with the "startElement()" event. ContentHandler.startElementNS(name, qname, attrs) Signals the start of an element in namespace mode. The *name* parameter contains the name of the element type as a "(uri, localname)" tuple, the *qname* parameter contains the raw XML 1.0 name used in the source document, and the *attrs* parameter holds an instance of the "AttributesNS" interface (see The AttributesNS Interface) containing the attributes of the element. If no namespace is associated with the element, the *uri* component of *name* will be "None". The object passed as *attrs* may be re- used by the parser; holding on to a reference to it is not a reliable way to keep a copy of the attributes. To keep a copy of the attributes, use the "copy()" method of the *attrs* object. Parsers may set the *qname* parameter to "None", unless the "feature_namespace_prefixes" feature is activated. ContentHandler.endElementNS(name, qname) Signals the end of an element in namespace mode. The *name* parameter contains the name of the element type, just as with the "startElementNS()" method, likewise the *qname* parameter. ContentHandler.characters(content) Receive notification of character data. The Parser will call this method to report each chunk of character data. SAX parsers may return all contiguous character data in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity so that the Locator provides useful information. *content* may be a string or bytes instance; the "expat" reader module always produces strings. Note: The earlier SAX 1 interface provided by the Python XML Special Interest Group used a more Java-like interface for this method. Since most parsers used from Python did not take advantage of the older interface, the simpler signature was chosen to replace it. To convert old code to the new interface, use *content* instead of slicing content with the old *offset* and *length* parameters. ContentHandler.ignorableWhitespace(whitespace) Receive notification of ignorable whitespace in element content. Validating Parsers must use this method to report each chunk of ignorable whitespace (see the W3C XML 1.0 recommendation, section 2.10): non-validating parsers may also use this method if they are capable of parsing and using content models. SAX parsers may return all contiguous whitespace in a single chunk, or they may split it into several chunks; however, all of the characters in any single event must come from the same external entity, so that the Locator provides useful information. ContentHandler.processingInstruction(target, data) Receive notification of a processing instruction. The Parser will invoke this method once for each processing instruction found: note that processing instructions may occur before or after the main document element. A SAX parser should never report an XML declaration (XML 1.0, section 2.8) or a text declaration (XML 1.0, section 4.3.1) using this method. ContentHandler.skippedEntity(name) Receive notification of a skipped entity. The Parser will invoke this method once for each entity skipped. Non-validating processors may skip entities if they have not seen the declarations (because, for example, the entity was declared in an external DTD subset). All processors may skip external entities, depending on the values of the "feature_external_ges" and the "feature_external_pes" properties. DTDHandler Objects ================== "DTDHandler" instances provide the following methods: DTDHandler.notationDecl(name, publicId, systemId) Handle a notation declaration event. DTDHandler.unparsedEntityDecl(name, publicId, systemId, ndata) Handle an unparsed entity declaration event. EntityResolver Objects ====================== EntityResolver.resolveEntity(publicId, systemId) Resolve the system identifier of an entity and return either the system identifier to read from as a string, or an InputSource to read from. The default implementation returns *systemId*. ErrorHandler Objects ==================== Objects with this interface are used to receive error and warning information from the "XMLReader". If you create an object that implements this interface, then register the object with your "XMLReader", the parser will call the methods in your object to report all warnings and errors. There are three levels of errors available: warnings, (possibly) recoverable errors, and unrecoverable errors. All methods take a "SAXParseException" as the only parameter. Errors and warnings may be converted to an exception by raising the passed-in exception object. ErrorHandler.error(exception) Called when the parser encounters a recoverable error. If this method does not raise an exception, parsing may continue, but further document information should not be expected by the application. Allowing the parser to continue may allow additional errors to be discovered in the input document. ErrorHandler.fatalError(exception) Called when the parser encounters an error it cannot recover from; parsing is expected to terminate when this method returns. ErrorHandler.warning(exception) Called when the parser presents minor warning information to the application. Parsing is expected to continue when this method returns, and document information will continue to be passed to the application. Raising an exception in this method will cause parsing to end. "xml.sax.xmlreader" — Interface for XML parsers *********************************************** **Source code:** Lib/xml/sax/xmlreader.py ====================================================================== SAX parsers implement the "XMLReader" interface. They are implemented in a Python module, which must provide a function "create_parser()". This function is invoked by "xml.sax.make_parser()" with no arguments to create a new parser object. class xml.sax.xmlreader.XMLReader Base class which can be inherited by SAX parsers. class xml.sax.xmlreader.IncrementalParser In some cases, it is desirable not to parse an input source at once, but to feed chunks of the document as they get available. Note that the reader will normally not read the entire file, but read it in chunks as well; still "parse()" won’t return until the entire document is processed. So these interfaces should be used if the blocking behaviour of "parse()" is not desirable. When the parser is instantiated it is ready to begin accepting data from the feed method immediately. After parsing has been finished with a call to close the reset method must be called to make the parser ready to accept new data, either from feed or using the parse method. Note that these methods must *not* be called during parsing, that is, after parse has been called and before it returns. By default, the class also implements the parse method of the XMLReader interface using the feed, close and reset methods of the IncrementalParser interface as a convenience to SAX 2.0 driver writers. class xml.sax.xmlreader.Locator Interface for associating a SAX event with a document location. A locator object will return valid results only during calls to DocumentHandler methods; at any other time, the results are unpredictable. If information is not available, methods may return "None". class xml.sax.xmlreader.InputSource(system_id=None) Encapsulation of the information needed by the "XMLReader" to read entities. This class may include information about the public identifier, system identifier, byte stream (possibly with character encoding information) and/or the character stream of an entity. Applications will create objects of this class for use in the "XMLReader.parse()" method and for returning from EntityResolver.resolveEntity. An "InputSource" belongs to the application, the "XMLReader" is not allowed to modify "InputSource" objects passed to it from the application, although it may make copies and modify those. class xml.sax.xmlreader.AttributesImpl(attrs) This is an implementation of the "Attributes" interface (see section The Attributes Interface). This is a dictionary-like object which represents the element attributes in a "startElement()" call. In addition to the most useful dictionary operations, it supports a number of other methods as described by the interface. Objects of this class should be instantiated by readers; *attrs* must be a dictionary-like object containing a mapping from attribute names to attribute values. class xml.sax.xmlreader.AttributesNSImpl(attrs, qnames) Namespace-aware variant of "AttributesImpl", which will be passed to "startElementNS()". It is derived from "AttributesImpl", but understands attribute names as two-tuples of *namespaceURI* and *localname*. In addition, it provides a number of methods expecting qualified names as they appear in the original document. This class implements the "AttributesNS" interface (see section The AttributesNS Interface). XMLReader Objects ================= The "XMLReader" interface supports the following methods: XMLReader.parse(source) Process an input source, producing SAX events. The *source* object can be a system identifier (a string identifying the input source – typically a file name or a URL), a "pathlib.Path" or *path-like* object, or an "InputSource" object. When "parse()" returns, the input is completely processed, and the parser object can be discarded or reset. Changed in version 3.5: Added support of character streams. Changed in version 3.8: Added support of path-like objects. XMLReader.getContentHandler() Return the current "ContentHandler". XMLReader.setContentHandler(handler) Set the current "ContentHandler". If no "ContentHandler" is set, content events will be discarded. XMLReader.getDTDHandler() Return the current "DTDHandler". XMLReader.setDTDHandler(handler) Set the current "DTDHandler". If no "DTDHandler" is set, DTD events will be discarded. XMLReader.getEntityResolver() Return the current "EntityResolver". XMLReader.setEntityResolver(handler) Set the current "EntityResolver". If no "EntityResolver" is set, attempts to resolve an external entity will result in opening the system identifier for the entity, and fail if it is not available. XMLReader.getErrorHandler() Return the current "ErrorHandler". XMLReader.setErrorHandler(handler) Set the current error handler. If no "ErrorHandler" is set, errors will be raised as exceptions, and warnings will be printed. XMLReader.setLocale(locale) Allow an application to set the locale for errors and warnings. SAX parsers are not required to provide localization for errors and warnings; if they cannot support the requested locale, however, they must raise a SAX exception. Applications may request a locale change in the middle of a parse. XMLReader.getFeature(featurename) Return the current setting for feature *featurename*. If the feature is not recognized, "SAXNotRecognizedException" is raised. The well-known featurenames are listed in the module "xml.sax.handler". XMLReader.setFeature(featurename, value) Set the *featurename* to *value*. If the feature is not recognized, "SAXNotRecognizedException" is raised. If the feature or its setting is not supported by the parser, *SAXNotSupportedException* is raised. XMLReader.getProperty(propertyname) Return the current setting for property *propertyname*. If the property is not recognized, a "SAXNotRecognizedException" is raised. The well-known propertynames are listed in the module "xml.sax.handler". XMLReader.setProperty(propertyname, value) Set the *propertyname* to *value*. If the property is not recognized, "SAXNotRecognizedException" is raised. If the property or its setting is not supported by the parser, *SAXNotSupportedException* is raised. IncrementalParser Objects ========================= Instances of "IncrementalParser" offer the following additional methods: IncrementalParser.feed(data) Process a chunk of *data*. IncrementalParser.close() Assume the end of the document. That will check well-formedness conditions that can be checked only at the end, invoke handlers, and may clean up resources allocated during parsing. IncrementalParser.reset() This method is called after close has been called to reset the parser so that it is ready to parse new documents. The results of calling parse or feed after close without calling reset are undefined. Locator Objects =============== Instances of "Locator" provide these methods: Locator.getColumnNumber() Return the column number where the current event begins. Locator.getLineNumber() Return the line number where the current event begins. Locator.getPublicId() Return the public identifier for the current event. Locator.getSystemId() Return the system identifier for the current event. InputSource Objects =================== InputSource.setPublicId(id) Sets the public identifier of this "InputSource". InputSource.getPublicId() Returns the public identifier of this "InputSource". InputSource.setSystemId(id) Sets the system identifier of this "InputSource". InputSource.getSystemId() Returns the system identifier of this "InputSource". InputSource.setEncoding(encoding) Sets the character encoding of this "InputSource". The encoding must be a string acceptable for an XML encoding declaration (see section 4.3.3 of the XML recommendation). The encoding attribute of the "InputSource" is ignored if the "InputSource" also contains a character stream. InputSource.getEncoding() Get the character encoding of this InputSource. InputSource.setByteStream(bytefile) Set the byte stream (a *binary file*) for this input source. The SAX parser will ignore this if there is also a character stream specified, but it will use a byte stream in preference to opening a URI connection itself. If the application knows the character encoding of the byte stream, it should set it with the setEncoding method. InputSource.getByteStream() Get the byte stream for this input source. The getEncoding method will return the character encoding for this byte stream, or "None" if unknown. InputSource.setCharacterStream(charfile) Set the character stream (a *text file*) for this input source. If there is a character stream specified, the SAX parser will ignore any byte stream and will not attempt to open a URI connection to the system identifier. InputSource.getCharacterStream() Get the character stream for this input source. The "Attributes" Interface ========================== "Attributes" objects implement a portion of the *mapping protocol*, including the methods "copy()", "get()", "__contains__()", "items()", "keys()", and "values()". The following methods are also provided: Attributes.getLength() Return the number of attributes. Attributes.getNames() Return the names of the attributes. Attributes.getType(name) Returns the type of the attribute *name*, which is normally "'CDATA'". Attributes.getValue(name) Return the value of attribute *name*. The "AttributesNS" Interface ============================ This interface is a subtype of the "Attributes" interface (see section The Attributes Interface). All methods supported by that interface are also available on "AttributesNS" objects. The following methods are also available: AttributesNS.getValueByQName(name) Return the value for a qualified name. AttributesNS.getNameByQName(name) Return the "(namespace, localname)" pair for a qualified *name*. AttributesNS.getQNameByName(name) Return the qualified name for a "(namespace, localname)" pair. AttributesNS.getQNames() Return the qualified names of all attributes. "xml.sax" — Support for SAX2 parsers ************************************ **Source code:** Lib/xml/sax/__init__.py ====================================================================== The "xml.sax" package provides a number of modules which implement the Simple API for XML (SAX) interface for Python. The package itself provides the SAX exceptions and the convenience functions which will be most used by users of the SAX API. Warning: The "xml.sax" module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data see XML vulnerabilities. Changed in version 3.7.1: The SAX parser no longer processes general external entities by default to increase security. Before, the parser created network connections to fetch remote files or loaded local files from the file system for DTD and entities. The feature can be enabled again with method "setFeature()" on the parser object and argument "feature_external_ges". The convenience functions are: xml.sax.make_parser(parser_list=[]) Create and return a SAX "XMLReader" object. The first parser found will be used. If *parser_list* is provided, it must be an iterable of strings which name modules that have a function named "create_parser()". Modules listed in *parser_list* will be used before modules in the default list of parsers. Changed in version 3.8: The *parser_list* argument can be any iterable, not just a list. xml.sax.parse(filename_or_stream, handler, error_handler=handler.ErrorHandler()) Create a SAX parser and use it to parse a document. The document, passed in as *filename_or_stream*, can be a filename or a file object. The *handler* parameter needs to be a SAX "ContentHandler" instance. If *error_handler* is given, it must be a SAX "ErrorHandler" instance; if omitted, "SAXParseException" will be raised on all errors. There is no return value; all work must be done by the *handler* passed in. xml.sax.parseString(string, handler, error_handler=handler.ErrorHandler()) Similar to "parse()", but parses from a buffer *string* received as a parameter. *string* must be a "str" instance or a *bytes-like object*. Changed in version 3.5: Added support of "str" instances. A typical SAX application uses three kinds of objects: readers, handlers and input sources. “Reader” in this context is another term for parser, i.e. some piece of code that reads the bytes or characters from the input source, and produces a sequence of events. The events then get distributed to the handler objects, i.e. the reader invokes a method on the handler. A SAX application must therefore obtain a reader object, create or open the input sources, create the handlers, and connect these objects all together. As the final step of preparation, the reader is called to parse the input. During parsing, methods on the handler objects are called based on structural and syntactic events from the input data. For these objects, only the interfaces are relevant; they are normally not instantiated by the application itself. Since Python does not have an explicit notion of interface, they are formally introduced as classes, but applications may use implementations which do not inherit from the provided classes. The "InputSource", "Locator", "Attributes", "AttributesNS", and "XMLReader" interfaces are defined in the module "xml.sax.xmlreader". The handler interfaces are defined in "xml.sax.handler". For convenience, "InputSource" (which is often instantiated directly) and the handler classes are also available from "xml.sax". These interfaces are described below. In addition to these classes, "xml.sax" provides the following exception classes. exception xml.sax.SAXException(msg, exception=None) Encapsulate an XML error or warning. This class can contain basic error or warning information from either the XML parser or the application: it can be subclassed to provide additional functionality or to add localization. Note that although the handlers defined in the "ErrorHandler" interface receive instances of this exception, it is not required to actually raise the exception — it is also useful as a container for information. When instantiated, *msg* should be a human-readable description of the error. The optional *exception* parameter, if given, should be "None" or an exception that was caught by the parsing code and is being passed along as information. This is the base class for the other SAX exception classes. exception xml.sax.SAXParseException(msg, exception, locator) Subclass of "SAXException" raised on parse errors. Instances of this class are passed to the methods of the SAX "ErrorHandler" interface to provide information about the parse error. This class supports the SAX "Locator" interface as well as the "SAXException" interface. exception xml.sax.SAXNotRecognizedException(msg, exception=None) Subclass of "SAXException" raised when a SAX "XMLReader" is confronted with an unrecognized feature or property. SAX applications and extensions may use this class for similar purposes. exception xml.sax.SAXNotSupportedException(msg, exception=None) Subclass of "SAXException" raised when a SAX "XMLReader" is asked to enable a feature that is not supported, or to set a property to a value that the implementation does not support. SAX applications and extensions may use this class for similar purposes. See also: SAX: The Simple API for XML This site is the focal point for the definition of the SAX API. It provides a Java implementation and online documentation. Links to implementations and historical information are also available. Module "xml.sax.handler" Definitions of the interfaces for application-provided objects. Module "xml.sax.saxutils" Convenience functions for use in SAX applications. Module "xml.sax.xmlreader" Definitions of the interfaces for parser-provided objects. SAXException Objects ==================== The "SAXException" exception class supports the following methods: SAXException.getMessage() Return a human-readable message describing the error condition. SAXException.getException() Return an encapsulated exception object, or "None". "xml.sax.saxutils" — SAX Utilities ********************************** **Source code:** Lib/xml/sax/saxutils.py ====================================================================== The module "xml.sax.saxutils" contains a number of classes and functions that are commonly useful when creating SAX applications, either in direct use, or as base classes. xml.sax.saxutils.escape(data, entities={}) Escape "'&'", "'<'", and "'>'" in a string of data. You can escape other strings of data by passing a dictionary as the optional *entities* parameter. The keys and values must all be strings; each key will be replaced with its corresponding value. The characters "'&'", "'<'" and "'>'" are always escaped, even if *entities* is provided. xml.sax.saxutils.unescape(data, entities={}) Unescape "'&'", "'<'", and "'>'" in a string of data. You can unescape other strings of data by passing a dictionary as the optional *entities* parameter. The keys and values must all be strings; each key will be replaced with its corresponding value. "'&'", "'<'", and "'>'" are always unescaped, even if *entities* is provided. xml.sax.saxutils.quoteattr(data, entities={}) Similar to "escape()", but also prepares *data* to be used as an attribute value. The return value is a quoted version of *data* with any additional required replacements. "quoteattr()" will select a quote character based on the content of *data*, attempting to avoid encoding any quote characters in the string. If both single- and double-quote characters are already in *data*, the double-quote characters will be encoded and *data* will be wrapped in double-quotes. The resulting string can be used directly as an attribute value: >>> print("" % quoteattr("ab ' cd \" ef")) This function is useful when generating attribute values for HTML or any SGML using the reference concrete syntax. class xml.sax.saxutils.XMLGenerator(out=None, encoding='iso-8859-1', short_empty_elements=False) This class implements the "ContentHandler" interface by writing SAX events back into an XML document. In other words, using an "XMLGenerator" as the content handler will reproduce the original document being parsed. *out* should be a file-like object which will default to *sys.stdout*. *encoding* is the encoding of the output stream which defaults to "'iso-8859-1'". *short_empty_elements* controls the formatting of elements that contain no content: if "False" (the default) they are emitted as a pair of start/end tags, if set to "True" they are emitted as a single self-closed tag. New in version 3.2: The *short_empty_elements* parameter. class xml.sax.saxutils.XMLFilterBase(base) This class is designed to sit between an "XMLReader" and the client application’s event handlers. By default, it does nothing but pass requests up to the reader and events on to the handlers unmodified, but subclasses can override specific methods to modify the event stream or the configuration requests as they pass through. xml.sax.saxutils.prepare_input_source(source, base='') This function takes an input source and an optional base URL and returns a fully resolved "InputSource" object ready for reading. The input source can be given as a string, a file-like object, or an "InputSource" object; parsers will use this function to implement the polymorphic *source* argument to their "parse()" method. XML Processing Modules ********************** **Source code:** Lib/xml/ ====================================================================== Python’s interfaces for processing XML are grouped in the "xml" package. Warning: The XML modules are not secure against erroneous or maliciously constructed data. If you need to parse untrusted or unauthenticated data see the XML vulnerabilities and The defusedxml Package sections. It is important to note that modules in the "xml" package require that there be at least one SAX-compliant XML parser available. The Expat parser is included with Python, so the "xml.parsers.expat" module will always be available. The documentation for the "xml.dom" and "xml.sax" packages are the definition of the Python bindings for the DOM and SAX interfaces. The XML handling submodules are: * "xml.etree.ElementTree": the ElementTree API, a simple and lightweight XML processor * "xml.dom": the DOM API definition * "xml.dom.minidom": a minimal DOM implementation * "xml.dom.pulldom": support for building partial DOM trees * "xml.sax": SAX2 base classes and convenience functions * "xml.parsers.expat": the Expat parser binding XML vulnerabilities =================== The XML processing modules are not secure against maliciously constructed data. An attacker can abuse XML features to carry out denial of service attacks, access local files, generate network connections to other machines, or circumvent firewalls. The following table gives an overview of the known attacks and whether the various modules are vulnerable to them. +---------------------------+--------------------+--------------------+--------------------+--------------------+--------------------+ | kind | sax | etree | minidom | pulldom | xmlrpc | |===========================|====================|====================|====================|====================|====================| | billion laughs | **Vulnerable** (1) | **Vulnerable** (1) | **Vulnerable** (1) | **Vulnerable** (1) | **Vulnerable** (1) | +---------------------------+--------------------+--------------------+--------------------+--------------------+--------------------+ | quadratic blowup | **Vulnerable** (1) | **Vulnerable** (1) | **Vulnerable** (1) | **Vulnerable** (1) | **Vulnerable** (1) | +---------------------------+--------------------+--------------------+--------------------+--------------------+--------------------+ | external entity expansion | Safe (5) | Safe (2) | Safe (3) | Safe (5) | Safe (4) | +---------------------------+--------------------+--------------------+--------------------+--------------------+--------------------+ | DTD retrieval | Safe (5) | Safe | Safe | Safe (5) | Safe | +---------------------------+--------------------+--------------------+--------------------+--------------------+--------------------+ | decompression bomb | Safe | Safe | Safe | Safe | **Vulnerable** | +---------------------------+--------------------+--------------------+--------------------+--------------------+--------------------+ | large tokens | **Vulnerable** (6) | **Vulnerable** (6) | **Vulnerable** (6) | **Vulnerable** (6) | **Vulnerable** (6) | +---------------------------+--------------------+--------------------+--------------------+--------------------+--------------------+ 1. Expat 2.4.1 and newer is not vulnerable to the “billion laughs” and “quadratic blowup” vulnerabilities. Items still listed as vulnerable due to potential reliance on system-provided libraries. Check "pyexpat.EXPAT_VERSION". 2. "xml.etree.ElementTree" doesn’t expand external entities and raises a "ParserError" when an entity occurs. 3. "xml.dom.minidom" doesn’t expand external entities and simply returns the unexpanded entity verbatim. 4. "xmlrpclib" doesn’t expand external entities and omits them. 5. Since Python 3.7.1, external general entities are no longer processed by default. 6. Expat 2.6.0 and newer is not vulnerable to denial of service through quadratic runtime caused by parsing large tokens. Items still listed as vulnerable due to potential reliance on system- provided libraries. Check "pyexpat.EXPAT_VERSION". billion laughs / exponential entity expansion The Billion Laughs attack – also known as exponential entity expansion – uses multiple levels of nested entities. Each entity refers to another entity several times, and the final entity definition contains a small string. The exponential expansion results in several gigabytes of text and consumes lots of memory and CPU time. quadratic blowup entity expansion A quadratic blowup attack is similar to a Billion Laughs attack; it abuses entity expansion, too. Instead of nested entities it repeats one large entity with a couple of thousand chars over and over again. The attack isn’t as efficient as the exponential case but it avoids triggering parser countermeasures that forbid deeply-nested entities. external entity expansion Entity declarations can contain more than just text for replacement. They can also point to external resources or local files. The XML parser accesses the resource and embeds the content into the XML document. DTD retrieval Some XML libraries like Python’s "xml.dom.pulldom" retrieve document type definitions from remote or local locations. The feature has similar implications as the external entity expansion issue. decompression bomb Decompression bombs (aka ZIP bomb) apply to all XML libraries that can parse compressed XML streams such as gzipped HTTP streams or LZMA-compressed files. For an attacker it can reduce the amount of transmitted data by three magnitudes or more. large tokens Expat needs to re-parse unfinished tokens; without the protection introduced in Expat 2.6.0, this can lead to quadratic runtime that can be used to cause denial of service in the application parsing XML. The issue is known as CVE-2023-52425. The documentation for defusedxml on PyPI has further information about all known attack vectors with examples and references. The "defusedxml" Package ======================== defusedxml is a pure Python package with modified subclasses of all stdlib XML parsers that prevent any potentially malicious operation. Use of this package is recommended for any server code that parses untrusted XML data. The package also ships with example exploits and extended documentation on more XML exploits such as XPath injection. "xmlrpc.client" — XML-RPC client access *************************************** **Source code:** Lib/xmlrpc/client.py ====================================================================== XML-RPC is a Remote Procedure Call method that uses XML passed via HTTP(S) as a transport. With it, a client can call methods with parameters on a remote server (the server is named by a URI) and get back structured data. This module supports writing XML-RPC client code; it handles all the details of translating between conformable Python objects and XML on the wire. Warning: The "xmlrpc.client" module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data see XML vulnerabilities. Changed in version 3.5: For HTTPS URIs, "xmlrpc.client" now performs all the necessary certificate and hostname checks by default. class xmlrpc.client.ServerProxy(uri, transport=None, encoding=None, verbose=False, allow_none=False, use_datetime=False, use_builtin_types=False, *, headers=(), context=None) A "ServerProxy" instance is an object that manages communication with a remote XML-RPC server. The required first argument is a URI (Uniform Resource Indicator), and will normally be the URL of the server. The optional second argument is a transport factory instance; by default it is an internal "SafeTransport" instance for https: URLs and an internal HTTP "Transport" instance otherwise. The optional third argument is an encoding, by default UTF-8. The optional fourth argument is a debugging flag. The following parameters govern the use of the returned proxy instance. If *allow_none* is true, the Python constant "None" will be translated into XML; the default behaviour is for "None" to raise a "TypeError". This is a commonly-used extension to the XML- RPC specification, but isn’t supported by all clients and servers; see http://ontosys.com/xml-rpc/extensions.php for a description. The *use_builtin_types* flag can be used to cause date/time values to be presented as "datetime.datetime" objects and binary data to be presented as "bytes" objects; this flag is false by default. "datetime.datetime", "bytes" and "bytearray" objects may be passed to calls. The *headers* parameter is an optional sequence of HTTP headers to send with each request, expressed as a sequence of 2-tuples representing the header name and value. (e.g. *[(‘Header- Name’, ‘value’)]*). The obsolete *use_datetime* flag is similar to *use_builtin_types* but it applies only to date/time values. Changed in version 3.3: The *use_builtin_types* flag was added. Changed in version 3.8: The *headers* parameter was added.Both the HTTP and HTTPS transports support the URL syntax extension for HTTP Basic Authentication: "http://user:pass@host:port/path". The "user:pass" portion will be base64-encoded as an HTTP ‘Authorization’ header, and sent to the remote server as part of the connection process when invoking an XML-RPC method. You only need to use this if the remote server requires a Basic Authentication user and password. If an HTTPS URL is provided, *context* may be "ssl.SSLContext" and configures the SSL settings of the underlying HTTPS connection.The returned instance is a proxy object with methods that can be used to invoke corresponding RPC calls on the remote server. If the remote server supports the introspection API, the proxy can also be used to query the remote server for the methods it supports (service discovery) and fetch other server-associated metadata.Types that are conformable (e.g. that can be marshalled through XML), include the following (and except where noted, they are unmarshalled as the same Python type): +------------------------+---------------------------------------------------------+ | XML-RPC type | Python type | |========================|=========================================================| | "boolean" | "bool" | +------------------------+---------------------------------------------------------+ | "int", "i1", "i2", | "int" in range from -2147483648 to 2147483647. Values | | "i4", "i8" or | get the "" tag. | | "biginteger" | | +------------------------+---------------------------------------------------------+ | "double" or "float" | "float". Values get the "" tag. | +------------------------+---------------------------------------------------------+ | "string" | "str" | +------------------------+---------------------------------------------------------+ | "array" | "list" or "tuple" containing conformable elements. | | | Arrays are returned as "lists". | +------------------------+---------------------------------------------------------+ | "struct" | "dict". Keys must be strings, values may be any | | | conformable type. Objects of user-defined classes can | | | be passed in; only their "__dict__" attribute is | | | transmitted. | +------------------------+---------------------------------------------------------+ | "dateTime.iso8601" | "DateTime" or "datetime.datetime". Returned type | | | depends on values of *use_builtin_types* and | | | *use_datetime* flags. | +------------------------+---------------------------------------------------------+ | "base64" | "Binary", "bytes" or "bytearray". Returned type | | | depends on the value of the *use_builtin_types* flag. | +------------------------+---------------------------------------------------------+ | "nil" | The "None" constant. Passing is allowed only if | | | *allow_none* is true. | +------------------------+---------------------------------------------------------+ | "bigdecimal" | "decimal.Decimal". Returned type only. | +------------------------+---------------------------------------------------------+ This is the full set of data types supported by XML-RPC. Method calls may also raise a special "Fault" instance, used to signal XML-RPC server errors, or "ProtocolError" used to signal an error in the HTTP/HTTPS transport layer. Both "Fault" and "ProtocolError" derive from a base class called "Error". Note that the xmlrpc client module currently does not marshal instances of subclasses of built-in types.When passing strings, characters special to XML such as "<", ">", and "&" will be automatically escaped. However, it’s the caller’s responsibility to ensure that the string is free of characters that aren’t allowed in XML, such as the control characters with ASCII values between 0 and 31 (except, of course, tab, newline and carriage return); failing to do this will result in an XML-RPC request that isn’t well-formed XML. If you have to pass arbitrary bytes via XML-RPC, use "bytes" or "bytearray" classes or the "Binary" wrapper class described below."Server" is retained as an alias for "ServerProxy" for backwards compatibility. New code should use "ServerProxy". Changed in version 3.5: Added the *context* argument. Changed in version 3.6: Added support of type tags with prefixes (e.g. "ex:nil"). Added support of unmarshalling additional types used by Apache XML-RPC implementation for numerics: "i1", "i2", "i8", "biginteger", "float" and "bigdecimal". See http://ws.apache.org/xmlrpc/types.html for a description. See also: XML-RPC HOWTO A good description of XML-RPC operation and client software in several languages. Contains pretty much everything an XML-RPC client developer needs to know. XML-RPC Introspection Describes the XML-RPC protocol extension for introspection. XML-RPC Specification The official specification. ServerProxy Objects =================== A "ServerProxy" instance has a method corresponding to each remote procedure call accepted by the XML-RPC server. Calling the method performs an RPC, dispatched by both name and argument signature (e.g. the same method name can be overloaded with multiple argument signatures). The RPC finishes by returning a value, which may be either returned data in a conformant type or a "Fault" or "ProtocolError" object indicating an error. Servers that support the XML introspection API support some common methods grouped under the reserved "system" attribute: ServerProxy.system.listMethods() This method returns a list of strings, one for each (non-system) method supported by the XML-RPC server. ServerProxy.system.methodSignature(name) This method takes one parameter, the name of a method implemented by the XML-RPC server. It returns an array of possible signatures for this method. A signature is an array of types. The first of these types is the return type of the method, the rest are parameters. Because multiple signatures (ie. overloading) is permitted, this method returns a list of signatures rather than a singleton. Signatures themselves are restricted to the top level parameters expected by a method. For instance if a method expects one array of structs as a parameter, and it returns a string, its signature is simply “string, array”. If it expects three integers and returns a string, its signature is “string, int, int, int”. If no signature is defined for the method, a non-array value is returned. In Python this means that the type of the returned value will be something other than list. ServerProxy.system.methodHelp(name) This method takes one parameter, the name of a method implemented by the XML-RPC server. It returns a documentation string describing the use of that method. If no such string is available, an empty string is returned. The documentation string may contain HTML markup. Changed in version 3.5: Instances of "ServerProxy" support the *context manager* protocol for closing the underlying transport. A working example follows. The server code: from xmlrpc.server import SimpleXMLRPCServer def is_even(n): return n % 2 == 0 server = SimpleXMLRPCServer(("localhost", 8000)) print("Listening on port 8000...") server.register_function(is_even, "is_even") server.serve_forever() The client code for the preceding server: import xmlrpc.client with xmlrpc.client.ServerProxy("http://localhost:8000/") as proxy: print("3 is even: %s" % str(proxy.is_even(3))) print("100 is even: %s" % str(proxy.is_even(100))) DateTime Objects ================ class xmlrpc.client.DateTime This class may be initialized with seconds since the epoch, a time tuple, an ISO 8601 time/date string, or a "datetime.datetime" instance. It has the following methods, supported mainly for internal use by the marshalling/unmarshalling code: decode(string) Accept a string as the instance’s new time value. encode(out) Write the XML-RPC encoding of this "DateTime" item to the *out* stream object. It also supports certain of Python’s built-in operators through rich comparison and "__repr__()" methods. A working example follows. The server code: import datetime from xmlrpc.server import SimpleXMLRPCServer import xmlrpc.client def today(): today = datetime.datetime.today() return xmlrpc.client.DateTime(today) server = SimpleXMLRPCServer(("localhost", 8000)) print("Listening on port 8000...") server.register_function(today, "today") server.serve_forever() The client code for the preceding server: import xmlrpc.client import datetime proxy = xmlrpc.client.ServerProxy("http://localhost:8000/") today = proxy.today() # convert the ISO8601 string to a datetime object converted = datetime.datetime.strptime(today.value, "%Y%m%dT%H:%M:%S") print("Today: %s" % converted.strftime("%d.%m.%Y, %H:%M")) Binary Objects ============== class xmlrpc.client.Binary This class may be initialized from bytes data (which may include NULs). The primary access to the content of a "Binary" object is provided by an attribute: data The binary data encapsulated by the "Binary" instance. The data is provided as a "bytes" object. "Binary" objects have the following methods, supported mainly for internal use by the marshalling/unmarshalling code: decode(bytes) Accept a base64 "bytes" object and decode it as the instance’s new data. encode(out) Write the XML-RPC base 64 encoding of this binary item to the *out* stream object. The encoded data will have newlines every 76 characters as per **RFC 2045 section 6.8**, which was the de facto standard base64 specification when the XML-RPC spec was written. It also supports certain of Python’s built-in operators through "__eq__()" and "__ne__()" methods. Example usage of the binary objects. We’re going to transfer an image over XMLRPC: from xmlrpc.server import SimpleXMLRPCServer import xmlrpc.client def python_logo(): with open("python_logo.jpg", "rb") as handle: return xmlrpc.client.Binary(handle.read()) server = SimpleXMLRPCServer(("localhost", 8000)) print("Listening on port 8000...") server.register_function(python_logo, 'python_logo') server.serve_forever() The client gets the image and saves it to a file: import xmlrpc.client proxy = xmlrpc.client.ServerProxy("http://localhost:8000/") with open("fetched_python_logo.jpg", "wb") as handle: handle.write(proxy.python_logo().data) Fault Objects ============= class xmlrpc.client.Fault A "Fault" object encapsulates the content of an XML-RPC fault tag. Fault objects have the following attributes: faultCode A string indicating the fault type. faultString A string containing a diagnostic message associated with the fault. In the following example we’re going to intentionally cause a "Fault" by returning a complex type object. The server code: from xmlrpc.server import SimpleXMLRPCServer # A marshalling error is going to occur because we're returning a # complex number def add(x, y): return x+y+0j server = SimpleXMLRPCServer(("localhost", 8000)) print("Listening on port 8000...") server.register_function(add, 'add') server.serve_forever() The client code for the preceding server: import xmlrpc.client proxy = xmlrpc.client.ServerProxy("http://localhost:8000/") try: proxy.add(2, 5) except xmlrpc.client.Fault as err: print("A fault occurred") print("Fault code: %d" % err.faultCode) print("Fault string: %s" % err.faultString) ProtocolError Objects ===================== class xmlrpc.client.ProtocolError A "ProtocolError" object describes a protocol error in the underlying transport layer (such as a 404 ‘not found’ error if the server named by the URI does not exist). It has the following attributes: url The URI or URL that triggered the error. errcode The error code. errmsg The error message or diagnostic string. headers A dict containing the headers of the HTTP/HTTPS request that triggered the error. In the following example we’re going to intentionally cause a "ProtocolError" by providing an invalid URI: import xmlrpc.client # create a ServerProxy with a URI that doesn't respond to XMLRPC requests proxy = xmlrpc.client.ServerProxy("http://google.com/") try: proxy.some_method() except xmlrpc.client.ProtocolError as err: print("A protocol error occurred") print("URL: %s" % err.url) print("HTTP/HTTPS headers: %s" % err.headers) print("Error code: %d" % err.errcode) print("Error message: %s" % err.errmsg) MultiCall Objects ================= The "MultiCall" object provides a way to encapsulate multiple calls to a remote server into a single request [1]. class xmlrpc.client.MultiCall(server) Create an object used to boxcar method calls. *server* is the eventual target of the call. Calls can be made to the result object, but they will immediately return "None", and only store the call name and parameters in the "MultiCall" object. Calling the object itself causes all stored calls to be transmitted as a single "system.multicall" request. The result of this call is a *generator*; iterating over this generator yields the individual results. A usage example of this class follows. The server code: from xmlrpc.server import SimpleXMLRPCServer def add(x, y): return x + y def subtract(x, y): return x - y def multiply(x, y): return x * y def divide(x, y): return x // y # A simple server with simple arithmetic functions server = SimpleXMLRPCServer(("localhost", 8000)) print("Listening on port 8000...") server.register_multicall_functions() server.register_function(add, 'add') server.register_function(subtract, 'subtract') server.register_function(multiply, 'multiply') server.register_function(divide, 'divide') server.serve_forever() The client code for the preceding server: import xmlrpc.client proxy = xmlrpc.client.ServerProxy("http://localhost:8000/") multicall = xmlrpc.client.MultiCall(proxy) multicall.add(7, 3) multicall.subtract(7, 3) multicall.multiply(7, 3) multicall.divide(7, 3) result = multicall() print("7+3=%d, 7-3=%d, 7*3=%d, 7//3=%d" % tuple(result)) Convenience Functions ===================== xmlrpc.client.dumps(params, methodname=None, methodresponse=None, encoding=None, allow_none=False) Convert *params* into an XML-RPC request. or into a response if *methodresponse* is true. *params* can be either a tuple of arguments or an instance of the "Fault" exception class. If *methodresponse* is true, only a single value can be returned, meaning that *params* must be of length 1. *encoding*, if supplied, is the encoding to use in the generated XML; the default is UTF-8. Python’s "None" value cannot be used in standard XML-RPC; to allow using it via an extension, provide a true value for *allow_none*. xmlrpc.client.loads(data, use_datetime=False, use_builtin_types=False) Convert an XML-RPC request or response into Python objects, a "(params, methodname)". *params* is a tuple of argument; *methodname* is a string, or "None" if no method name is present in the packet. If the XML-RPC packet represents a fault condition, this function will raise a "Fault" exception. The *use_builtin_types* flag can be used to cause date/time values to be presented as "datetime.datetime" objects and binary data to be presented as "bytes" objects; this flag is false by default. The obsolete *use_datetime* flag is similar to *use_builtin_types* but it applies only to date/time values. Changed in version 3.3: The *use_builtin_types* flag was added. Example of Client Usage ======================= # simple test program (from the XML-RPC specification) from xmlrpc.client import ServerProxy, Error # server = ServerProxy("http://localhost:8000") # local server with ServerProxy("http://betty.userland.com") as proxy: print(proxy) try: print(proxy.examples.getStateName(41)) except Error as v: print("ERROR", v) To access an XML-RPC server through a HTTP proxy, you need to define a custom transport. The following example shows how: import http.client import xmlrpc.client class ProxiedTransport(xmlrpc.client.Transport): def set_proxy(self, host, port=None, headers=None): self.proxy = host, port self.proxy_headers = headers def make_connection(self, host): connection = http.client.HTTPConnection(*self.proxy) connection.set_tunnel(host, headers=self.proxy_headers) self._connection = host, connection return connection transport = ProxiedTransport() transport.set_proxy('proxy-server', 8080) server = xmlrpc.client.ServerProxy('http://betty.userland.com', transport=transport) print(server.examples.getStateName(41)) Example of Client and Server Usage ================================== See SimpleXMLRPCServer Example. -[ Footnotes ]- [1] This approach has been first presented in a discussion on xmlrpc.com. "xmlrpc.server" — Basic XML-RPC servers *************************************** **Source code:** Lib/xmlrpc/server.py ====================================================================== The "xmlrpc.server" module provides a basic server framework for XML- RPC servers written in Python. Servers can either be free standing, using "SimpleXMLRPCServer", or embedded in a CGI environment, using "CGIXMLRPCRequestHandler". Warning: The "xmlrpc.server" module is not secure against maliciously constructed data. If you need to parse untrusted or unauthenticated data see XML vulnerabilities. class xmlrpc.server.SimpleXMLRPCServer(addr, requestHandler=SimpleXMLRPCRequestHandler, logRequests=True, allow_none=False, encoding=None, bind_and_activate=True, use_builtin_types=False) Create a new server instance. This class provides methods for registration of functions that can be called by the XML-RPC protocol. The *requestHandler* parameter should be a factory for request handler instances; it defaults to "SimpleXMLRPCRequestHandler". The *addr* and *requestHandler* parameters are passed to the "socketserver.TCPServer" constructor. If *logRequests* is true (the default), requests will be logged; setting this parameter to false will turn off logging. The *allow_none* and *encoding* parameters are passed on to "xmlrpc.client" and control the XML-RPC responses that will be returned from the server. The *bind_and_activate* parameter controls whether "server_bind()" and "server_activate()" are called immediately by the constructor; it defaults to true. Setting it to false allows code to manipulate the *allow_reuse_address* class variable before the address is bound. The *use_builtin_types* parameter is passed to the "loads()" function and controls which types are processed when date/times values or binary data are received; it defaults to false. Changed in version 3.3: The *use_builtin_types* flag was added. class xmlrpc.server.CGIXMLRPCRequestHandler(allow_none=False, encoding=None, use_builtin_types=False) Create a new instance to handle XML-RPC requests in a CGI environment. The *allow_none* and *encoding* parameters are passed on to "xmlrpc.client" and control the XML-RPC responses that will be returned from the server. The *use_builtin_types* parameter is passed to the "loads()" function and controls which types are processed when date/times values or binary data are received; it defaults to false. Changed in version 3.3: The *use_builtin_types* flag was added. class xmlrpc.server.SimpleXMLRPCRequestHandler Create a new request handler instance. This request handler supports "POST" requests and modifies logging so that the *logRequests* parameter to the "SimpleXMLRPCServer" constructor parameter is honored. SimpleXMLRPCServer Objects ========================== The "SimpleXMLRPCServer" class is based on "socketserver.TCPServer" and provides a means of creating simple, stand alone XML-RPC servers. SimpleXMLRPCServer.register_function(function=None, name=None) Register a function that can respond to XML-RPC requests. If *name* is given, it will be the method name associated with *function*, otherwise "function.__name__" will be used. *name* is a string, and may contain characters not legal in Python identifiers, including the period character. This method can also be used as a decorator. When used as a decorator, *name* can only be given as a keyword argument to register *function* under *name*. If no *name* is given, "function.__name__" will be used. Changed in version 3.7: "register_function()" can be used as a decorator. SimpleXMLRPCServer.register_instance(instance, allow_dotted_names=False) Register an object which is used to expose method names which have not been registered using "register_function()". If *instance* contains a "_dispatch()" method, it is called with the requested method name and the parameters from the request. Its API is "def _dispatch(self, method, params)" (note that *params* does not represent a variable argument list). If it calls an underlying function to perform its task, that function is called as "func(*params)", expanding the parameter list. The return value from "_dispatch()" is returned to the client as the result. If *instance* does not have a "_dispatch()" method, it is searched for an attribute matching the name of the requested method. If the optional *allow_dotted_names* argument is true and the instance does not have a "_dispatch()" method, then if the requested method name contains periods, each component of the method name is searched for individually, with the effect that a simple hierarchical search is performed. The value found from this search is then called with the parameters from the request, and the return value is passed back to the client. Warning: Enabling the *allow_dotted_names* option allows intruders to access your module’s global variables and may allow intruders to execute arbitrary code on your machine. Only use this option on a secure, closed network. SimpleXMLRPCServer.register_introspection_functions() Registers the XML-RPC introspection functions "system.listMethods", "system.methodHelp" and "system.methodSignature". SimpleXMLRPCServer.register_multicall_functions() Registers the XML-RPC multicall function system.multicall. SimpleXMLRPCRequestHandler.rpc_paths An attribute value that must be a tuple listing valid path portions of the URL for receiving XML-RPC requests. Requests posted to other paths will result in a 404 “no such page” HTTP error. If this tuple is empty, all paths will be considered valid. The default value is "('/', '/RPC2')". SimpleXMLRPCServer Example -------------------------- Server code: from xmlrpc.server import SimpleXMLRPCServer from xmlrpc.server import SimpleXMLRPCRequestHandler # Restrict to a particular path. class RequestHandler(SimpleXMLRPCRequestHandler): rpc_paths = ('/RPC2',) # Create server with SimpleXMLRPCServer(('localhost', 8000), requestHandler=RequestHandler) as server: server.register_introspection_functions() # Register pow() function; this will use the value of # pow.__name__ as the name, which is just 'pow'. server.register_function(pow) # Register a function under a different name def adder_function(x, y): return x + y server.register_function(adder_function, 'add') # Register an instance; all the methods of the instance are # published as XML-RPC methods (in this case, just 'mul'). class MyFuncs: def mul(self, x, y): return x * y server.register_instance(MyFuncs()) # Run the server's main loop server.serve_forever() The following client code will call the methods made available by the preceding server: import xmlrpc.client s = xmlrpc.client.ServerProxy('http://localhost:8000') print(s.pow(2,3)) # Returns 2**3 = 8 print(s.add(2,3)) # Returns 5 print(s.mul(5,2)) # Returns 5*2 = 10 # Print list of available methods print(s.system.listMethods()) "register_function()" can also be used as a decorator. The previous server example can register functions in a decorator way: from xmlrpc.server import SimpleXMLRPCServer from xmlrpc.server import SimpleXMLRPCRequestHandler class RequestHandler(SimpleXMLRPCRequestHandler): rpc_paths = ('/RPC2',) with SimpleXMLRPCServer(('localhost', 8000), requestHandler=RequestHandler) as server: server.register_introspection_functions() # Register pow() function; this will use the value of # pow.__name__ as the name, which is just 'pow'. server.register_function(pow) # Register a function under a different name, using # register_function as a decorator. *name* can only be given # as a keyword argument. @server.register_function(name='add') def adder_function(x, y): return x + y # Register a function under function.__name__. @server.register_function def mul(x, y): return x * y server.serve_forever() The following example included in the "Lib/xmlrpc/server.py" module shows a server allowing dotted names and registering a multicall function. Warning: Enabling the *allow_dotted_names* option allows intruders to access your module’s global variables and may allow intruders to execute arbitrary code on your machine. Only use this example only within a secure, closed network. import datetime class ExampleService: def getData(self): return '42' class currentTime: @staticmethod def getCurrentTime(): return datetime.datetime.now() with SimpleXMLRPCServer(("localhost", 8000)) as server: server.register_function(pow) server.register_function(lambda x,y: x+y, 'add') server.register_instance(ExampleService(), allow_dotted_names=True) server.register_multicall_functions() print('Serving XML-RPC on localhost port 8000') try: server.serve_forever() except KeyboardInterrupt: print("\nKeyboard interrupt received, exiting.") sys.exit(0) This ExampleService demo can be invoked from the command line: python -m xmlrpc.server The client that interacts with the above server is included in *Lib/xmlrpc/client.py*: server = ServerProxy("http://localhost:8000") try: print(server.currentTime.getCurrentTime()) except Error as v: print("ERROR", v) multi = MultiCall(server) multi.getData() multi.pow(2,9) multi.add(1,2) try: for response in multi(): print(response) except Error as v: print("ERROR", v) This client which interacts with the demo XMLRPC server can be invoked as: python -m xmlrpc.client CGIXMLRPCRequestHandler ======================= The "CGIXMLRPCRequestHandler" class can be used to handle XML-RPC requests sent to Python CGI scripts. CGIXMLRPCRequestHandler.register_function(function=None, name=None) Register a function that can respond to XML-RPC requests. If *name* is given, it will be the method name associated with *function*, otherwise "function.__name__" will be used. *name* is a string, and may contain characters not legal in Python identifiers, including the period character. This method can also be used as a decorator. When used as a decorator, *name* can only be given as a keyword argument to register *function* under *name*. If no *name* is given, "function.__name__" will be used. Changed in version 3.7: "register_function()" can be used as a decorator. CGIXMLRPCRequestHandler.register_instance(instance) Register an object which is used to expose method names which have not been registered using "register_function()". If instance contains a "_dispatch()" method, it is called with the requested method name and the parameters from the request; the return value is returned to the client as the result. If instance does not have a "_dispatch()" method, it is searched for an attribute matching the name of the requested method; if the requested method name contains periods, each component of the method name is searched for individually, with the effect that a simple hierarchical search is performed. The value found from this search is then called with the parameters from the request, and the return value is passed back to the client. CGIXMLRPCRequestHandler.register_introspection_functions() Register the XML-RPC introspection functions "system.listMethods", "system.methodHelp" and "system.methodSignature". CGIXMLRPCRequestHandler.register_multicall_functions() Register the XML-RPC multicall function "system.multicall". CGIXMLRPCRequestHandler.handle_request(request_text=None) Handle an XML-RPC request. If *request_text* is given, it should be the POST data provided by the HTTP server, otherwise the contents of stdin will be used. Example: class MyFuncs: def mul(self, x, y): return x * y handler = CGIXMLRPCRequestHandler() handler.register_function(pow) handler.register_function(lambda x,y: x+y, 'add') handler.register_introspection_functions() handler.register_instance(MyFuncs()) handler.handle_request() Documenting XMLRPC server ========================= These classes extend the above classes to serve HTML documentation in response to HTTP GET requests. Servers can either be free standing, using "DocXMLRPCServer", or embedded in a CGI environment, using "DocCGIXMLRPCRequestHandler". class xmlrpc.server.DocXMLRPCServer(addr, requestHandler=DocXMLRPCRequestHandler, logRequests=True, allow_none=False, encoding=None, bind_and_activate=True, use_builtin_types=True) Create a new server instance. All parameters have the same meaning as for "SimpleXMLRPCServer"; *requestHandler* defaults to "DocXMLRPCRequestHandler". Changed in version 3.3: The *use_builtin_types* flag was added. class xmlrpc.server.DocCGIXMLRPCRequestHandler Create a new instance to handle XML-RPC requests in a CGI environment. class xmlrpc.server.DocXMLRPCRequestHandler Create a new request handler instance. This request handler supports XML-RPC POST requests, documentation GET requests, and modifies logging so that the *logRequests* parameter to the "DocXMLRPCServer" constructor parameter is honored. DocXMLRPCServer Objects ======================= The "DocXMLRPCServer" class is derived from "SimpleXMLRPCServer" and provides a means of creating self-documenting, stand alone XML-RPC servers. HTTP POST requests are handled as XML-RPC method calls. HTTP GET requests are handled by generating pydoc-style HTML documentation. This allows a server to provide its own web-based documentation. DocXMLRPCServer.set_server_title(server_title) Set the title used in the generated HTML documentation. This title will be used inside the HTML “title” element. DocXMLRPCServer.set_server_name(server_name) Set the name used in the generated HTML documentation. This name will appear at the top of the generated documentation inside a “h1” element. DocXMLRPCServer.set_server_documentation(server_documentation) Set the description used in the generated HTML documentation. This description will appear as a paragraph, below the server name, in the documentation. DocCGIXMLRPCRequestHandler ========================== The "DocCGIXMLRPCRequestHandler" class is derived from "CGIXMLRPCRequestHandler" and provides a means of creating self- documenting, XML-RPC CGI scripts. HTTP POST requests are handled as XML-RPC method calls. HTTP GET requests are handled by generating pydoc-style HTML documentation. This allows a server to provide its own web-based documentation. DocCGIXMLRPCRequestHandler.set_server_title(server_title) Set the title used in the generated HTML documentation. This title will be used inside the HTML “title” element. DocCGIXMLRPCRequestHandler.set_server_name(server_name) Set the name used in the generated HTML documentation. This name will appear at the top of the generated documentation inside a “h1” element. DocCGIXMLRPCRequestHandler.set_server_documentation(server_documentation) Set the description used in the generated HTML documentation. This description will appear as a paragraph, below the server name, in the documentation. "xmlrpc" — XMLRPC server and client modules ******************************************* XML-RPC is a Remote Procedure Call method that uses XML passed via HTTP as a transport. With it, a client can call methods with parameters on a remote server (the server is named by a URI) and get back structured data. "xmlrpc" is a package that collects server and client modules implementing XML-RPC. The modules are: * "xmlrpc.client" * "xmlrpc.server" "zipapp" — Manage executable Python zip archives ************************************************ New in version 3.5. **Source code:** Lib/zipapp.py ====================================================================== This module provides tools to manage the creation of zip files containing Python code, which can be executed directly by the Python interpreter. The module provides both a Command-Line Interface and a Python API. Basic Example ============= The following example shows how the Command-Line Interface can be used to create an executable archive from a directory containing Python code. When run, the archive will execute the "main" function from the module "myapp" in the archive. $ python -m zipapp myapp -m "myapp:main" $ python myapp.pyz Command-Line Interface ====================== When called as a program from the command line, the following form is used: $ python -m zipapp source [options] If *source* is a directory, this will create an archive from the contents of *source*. If *source* is a file, it should be an archive, and it will be copied to the target archive (or the contents of its shebang line will be displayed if the –info option is specified). The following options are understood: -o , --output= Write the output to a file named *output*. If this option is not specified, the output filename will be the same as the input *source*, with the extension ".pyz" added. If an explicit filename is given, it is used as is (so a ".pyz" extension should be included if required). An output filename must be specified if the *source* is an archive (and in that case, *output* must not be the same as *source*). -p , --python= Add a "#!" line to the archive specifying *interpreter* as the command to run. Also, on POSIX, make the archive executable. The default is to write no "#!" line, and not make the file executable. -m , --main= Write a "__main__.py" file to the archive that executes *mainfn*. The *mainfn* argument should have the form “pkg.mod:fn”, where “pkg.mod” is a package/module in the archive, and “fn” is a callable in the given module. The "__main__.py" file will execute that callable. "--main" cannot be specified when copying an archive. -c, --compress Compress files with the deflate method, reducing the size of the output file. By default, files are stored uncompressed in the archive. "--compress" has no effect when copying an archive. New in version 3.7. --info Display the interpreter embedded in the archive, for diagnostic purposes. In this case, any other options are ignored and SOURCE must be an archive, not a directory. -h, --help Print a short usage message and exit. Python API ========== The module defines two convenience functions: zipapp.create_archive(source, target=None, interpreter=None, main=None, filter=None, compressed=False) Create an application archive from *source*. The source can be any of the following: * The name of a directory, or a *path-like object* referring to a directory, in which case a new application archive will be created from the content of that directory. * The name of an existing application archive file, or a *path-like object* referring to such a file, in which case the file is copied to the target (modifying it to reflect the value given for the *interpreter* argument). The file name should include the ".pyz" extension, if required. * A file object open for reading in bytes mode. The content of the file should be an application archive, and the file object is assumed to be positioned at the start of the archive. The *target* argument determines where the resulting archive will be written: * If it is the name of a file, or a *path-like object*, the archive will be written to that file. * If it is an open file object, the archive will be written to that file object, which must be open for writing in bytes mode. * If the target is omitted (or "None"), the source must be a directory and the target will be a file with the same name as the source, with a ".pyz" extension added. The *interpreter* argument specifies the name of the Python interpreter with which the archive will be executed. It is written as a “shebang” line at the start of the archive. On POSIX, this will be interpreted by the OS, and on Windows it will be handled by the Python launcher. Omitting the *interpreter* results in no shebang line being written. If an interpreter is specified, and the target is a filename, the executable bit of the target file will be set. The *main* argument specifies the name of a callable which will be used as the main program for the archive. It can only be specified if the source is a directory, and the source does not already contain a "__main__.py" file. The *main* argument should take the form “pkg.module:callable” and the archive will be run by importing “pkg.module” and executing the given callable with no arguments. It is an error to omit *main* if the source is a directory and does not contain a "__main__.py" file, as otherwise the resulting archive would not be executable. The optional *filter* argument specifies a callback function that is passed a Path object representing the path to the file being added (relative to the source directory). It should return "True" if the file is to be added. The optional *compressed* argument determines whether files are compressed. If set to "True", files in the archive are compressed with the deflate method; otherwise, files are stored uncompressed. This argument has no effect when copying an existing archive. If a file object is specified for *source* or *target*, it is the caller’s responsibility to close it after calling create_archive. When copying an existing archive, file objects supplied only need "read" and "readline", or "write" methods. When creating an archive from a directory, if the target is a file object it will be passed to the "zipfile.ZipFile" class, and must supply the methods needed by that class. New in version 3.7: Added the *filter* and *compressed* arguments. zipapp.get_interpreter(archive) Return the interpreter specified in the "#!" line at the start of the archive. If there is no "#!" line, return "None". The *archive* argument can be a filename or a file-like object open for reading in bytes mode. It is assumed to be at the start of the archive. Examples ======== Pack up a directory into an archive, and run it. $ python -m zipapp myapp $ python myapp.pyz The same can be done using the "create_archive()" function: >>> import zipapp >>> zipapp.create_archive('myapp', 'myapp.pyz') To make the application directly executable on POSIX, specify an interpreter to use. $ python -m zipapp myapp -p "/usr/bin/env python" $ ./myapp.pyz To replace the shebang line on an existing archive, create a modified archive using the "create_archive()" function: >>> import zipapp >>> zipapp.create_archive('old_archive.pyz', 'new_archive.pyz', '/usr/bin/python3') To update the file in place, do the replacement in memory using a "BytesIO" object, and then overwrite the source afterwards. Note that there is a risk when overwriting a file in place that an error will result in the loss of the original file. This code does not protect against such errors, but production code should do so. Also, this method will only work if the archive fits in memory: >>> import zipapp >>> import io >>> temp = io.BytesIO() >>> zipapp.create_archive('myapp.pyz', temp, '/usr/bin/python2') >>> with open('myapp.pyz', 'wb') as f: >>> f.write(temp.getvalue()) Specifying the Interpreter ========================== Note that if you specify an interpreter and then distribute your application archive, you need to ensure that the interpreter used is portable. The Python launcher for Windows supports most common forms of POSIX "#!" line, but there are other issues to consider: * If you use “/usr/bin/env python” (or other forms of the “python” command, such as “/usr/bin/python”), you need to consider that your users may have either Python 2 or Python 3 as their default, and write your code to work under both versions. * If you use an explicit version, for example “/usr/bin/env python3” your application will not work for users who do not have that version. (This may be what you want if you have not made your code Python 2 compatible). * There is no way to say “python X.Y or later”, so be careful of using an exact version like “/usr/bin/env python3.4” as you will need to change your shebang line for users of Python 3.5, for example. Typically, you should use an “/usr/bin/env python2” or “/usr/bin/env python3”, depending on whether your code is written for Python 2 or 3. Creating Standalone Applications with zipapp ============================================ Using the "zipapp" module, it is possible to create self-contained Python programs, which can be distributed to end users who only need to have a suitable version of Python installed on their system. The key to doing this is to bundle all of the application’s dependencies into the archive, along with the application code. The steps to create a standalone archive are as follows: 1. Create your application in a directory as normal, so you have a "myapp" directory containing a "__main__.py" file, and any supporting application code. 2. Install all of your application’s dependencies into the "myapp" directory, using pip: $ python -m pip install -r requirements.txt --target myapp (this assumes you have your project requirements in a "requirements.txt" file - if not, you can just list the dependencies manually on the pip command line). 3. Optionally, delete the ".dist-info" directories created by pip in the "myapp" directory. These hold metadata for pip to manage the packages, and as you won’t be making any further use of pip they aren’t required - although it won’t do any harm if you leave them. 4. Package the application using: $ python -m zipapp -p "interpreter" myapp This will produce a standalone executable, which can be run on any machine with the appropriate interpreter available. See Specifying the Interpreter for details. It can be shipped to users as a single file. On Unix, the "myapp.pyz" file is executable as it stands. You can rename the file to remove the ".pyz" extension if you prefer a “plain” command name. On Windows, the "myapp.pyz[w]" file is executable by virtue of the fact that the Python interpreter registers the ".pyz" and ".pyzw" file extensions when installed. Making a Windows executable --------------------------- On Windows, registration of the ".pyz" extension is optional, and furthermore, there are certain places that don’t recognise registered extensions “transparently” (the simplest example is that "subprocess.run(['myapp'])" won’t find your application - you need to explicitly specify the extension). On Windows, therefore, it is often preferable to create an executable from the zipapp. This is relatively easy, although it does require a C compiler. The basic approach relies on the fact that zipfiles can have arbitrary data prepended, and Windows exe files can have arbitrary data appended. So by creating a suitable launcher and tacking the ".pyz" file onto the end of it, you end up with a single- file executable that runs your application. A suitable launcher can be as simple as the following: #define Py_LIMITED_API 1 #include "Python.h" #define WIN32_LEAN_AND_MEAN #include #ifdef WINDOWS int WINAPI wWinMain( HINSTANCE hInstance, /* handle to current instance */ HINSTANCE hPrevInstance, /* handle to previous instance */ LPWSTR lpCmdLine, /* pointer to command line */ int nCmdShow /* show state of window */ ) #else int wmain() #endif { wchar_t **myargv = _alloca((__argc + 1) * sizeof(wchar_t*)); myargv[0] = __wargv[0]; memcpy(myargv + 1, __wargv, __argc * sizeof(wchar_t *)); return Py_Main(__argc+1, myargv); } If you define the "WINDOWS" preprocessor symbol, this will generate a GUI executable, and without it, a console executable. To compile the executable, you can either just use the standard MSVC command line tools, or you can take advantage of the fact that distutils knows how to compile Python source: >>> from distutils.ccompiler import new_compiler >>> import distutils.sysconfig >>> import sys >>> import os >>> from pathlib import Path >>> def compile(src): >>> src = Path(src) >>> cc = new_compiler() >>> exe = src.stem >>> cc.add_include_dir(distutils.sysconfig.get_python_inc()) >>> cc.add_library_dir(os.path.join(sys.base_exec_prefix, 'libs')) >>> # First the CLI executable >>> objs = cc.compile([str(src)]) >>> cc.link_executable(objs, exe) >>> # Now the GUI executable >>> cc.define_macro('WINDOWS') >>> objs = cc.compile([str(src)]) >>> cc.link_executable(objs, exe + 'w') >>> if __name__ == "__main__": >>> compile("zastub.c") The resulting launcher uses the “Limited ABI”, so it will run unchanged with any version of Python 3.x. All it needs is for Python ("python3.dll") to be on the user’s "PATH". For a fully standalone distribution, you can distribute the launcher with your application appended, bundled with the Python “embedded” distribution. This will run on any PC with the appropriate architecture (32 bit or 64 bit). Caveats ------- There are some limitations to the process of bundling your application into a single file. In most, if not all, cases they can be addressed without needing major changes to your application. 1. If your application depends on a package that includes a C extension, that package cannot be run from a zip file (this is an OS limitation, as executable code must be present in the filesystem for the OS loader to load it). In this case, you can exclude that dependency from the zipfile, and either require your users to have it installed, or ship it alongside your zipfile and add code to your "__main__.py" to include the directory containing the unzipped module in "sys.path". In this case, you will need to make sure to ship appropriate binaries for your target architecture(s) (and potentially pick the correct version to add to "sys.path" at runtime, based on the user’s machine). 2. If you are shipping a Windows executable as described above, you either need to ensure that your users have "python3.dll" on their PATH (which is not the default behaviour of the installer) or you should bundle your application with the embedded distribution. 3. The suggested launcher above uses the Python embedding API. This means that in your application, "sys.executable" will be your application, and *not* a conventional Python interpreter. Your code and its dependencies need to be prepared for this possibility. For example, if your application uses the "multiprocessing" module, it will need to call "multiprocessing.set_executable()" to let the module know where to find the standard Python interpreter. The Python Zip Application Archive Format ========================================= Python has been able to execute zip files which contain a "__main__.py" file since version 2.6. In order to be executed by Python, an application archive simply has to be a standard zip file containing a "__main__.py" file which will be run as the entry point for the application. As usual for any Python script, the parent of the script (in this case the zip file) will be placed on "sys.path" and thus further modules can be imported from the zip file. The zip file format allows arbitrary data to be prepended to a zip file. The zip application format uses this ability to prepend a standard POSIX “shebang” line to the file ("#!/path/to/interpreter"). Formally, the Python zip application format is therefore: 1. An optional shebang line, containing the characters "b'#!'" followed by an interpreter name, and then a newline ("b'\n'") character. The interpreter name can be anything acceptable to the OS “shebang” processing, or the Python launcher on Windows. The interpreter should be encoded in UTF-8 on Windows, and in "sys.getfilesystemencoding()" on POSIX. 2. Standard zipfile data, as generated by the "zipfile" module. The zipfile content *must* include a file called "__main__.py" (which must be in the “root” of the zipfile - i.e., it cannot be in a subdirectory). The zipfile data can be compressed or uncompressed. If an application archive has a shebang line, it may have the executable bit set on POSIX systems, to allow it to be executed directly. There is no requirement that the tools in this module are used to create application archives - the module is a convenience, but archives in the above format created by any means are acceptable to Python. "zipfile" — Work with ZIP archives ********************************** **Source code:** Lib/zipfile.py ====================================================================== The ZIP file format is a common archive and compression standard. This module provides tools to create, read, write, append, and list a ZIP file. Any advanced use of this module will require an understanding of the format, as defined in PKZIP Application Note. This module does not currently handle multi-disk ZIP files. It can handle ZIP files that use the ZIP64 extensions (that is ZIP files that are more than 4 GiB in size). It supports decryption of encrypted files in ZIP archives, but it currently cannot create an encrypted file. Decryption is extremely slow as it is implemented in native Python rather than C. The module defines the following items: exception zipfile.BadZipFile The error raised for bad ZIP files. New in version 3.2. exception zipfile.BadZipfile Alias of "BadZipFile", for compatibility with older Python versions. Deprecated since version 3.2. exception zipfile.LargeZipFile The error raised when a ZIP file would require ZIP64 functionality but that has not been enabled. class zipfile.ZipFile The class for reading and writing ZIP files. See section ZipFile Objects for constructor details. class zipfile.Path A pathlib-compatible wrapper for zip files. See section Path Objects for details. New in version 3.8. class zipfile.PyZipFile Class for creating ZIP archives containing Python libraries. class zipfile.ZipInfo(filename='NoName', date_time=(1980, 1, 1, 0, 0, 0)) Class used to represent information about a member of an archive. Instances of this class are returned by the "getinfo()" and "infolist()" methods of "ZipFile" objects. Most users of the "zipfile" module will not need to create these, but only use those created by this module. *filename* should be the full name of the archive member, and *date_time* should be a tuple containing six fields which describe the time of the last modification to the file; the fields are described in section ZipInfo Objects. zipfile.is_zipfile(filename) Returns "True" if *filename* is a valid ZIP file based on its magic number, otherwise returns "False". *filename* may be a file or file-like object too. Changed in version 3.1: Support for file and file-like objects. zipfile.ZIP_STORED The numeric constant for an uncompressed archive member. zipfile.ZIP_DEFLATED The numeric constant for the usual ZIP compression method. This requires the "zlib" module. zipfile.ZIP_BZIP2 The numeric constant for the BZIP2 compression method. This requires the "bz2" module. New in version 3.3. zipfile.ZIP_LZMA The numeric constant for the LZMA compression method. This requires the "lzma" module. New in version 3.3. Note: The ZIP file format specification has included support for bzip2 compression since 2001, and for LZMA compression since 2006. However, some tools (including older Python releases) do not support these compression methods, and may either refuse to process the ZIP file altogether, or fail to extract individual files. See also: PKZIP Application Note Documentation on the ZIP file format by Phil Katz, the creator of the format and algorithms used. Info-ZIP Home Page Information about the Info-ZIP project’s ZIP archive programs and development libraries. ZipFile Objects =============== class zipfile.ZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True, compresslevel=None, *, strict_timestamps=True) Open a ZIP file, where *file* can be a path to a file (a string), a file-like object or a *path-like object*. The *mode* parameter should be "'r'" to read an existing file, "'w'" to truncate and write a new file, "'a'" to append to an existing file, or "'x'" to exclusively create and write a new file. If *mode* is "'x'" and *file* refers to an existing file, a "FileExistsError" will be raised. If *mode* is "'a'" and *file* refers to an existing ZIP file, then additional files are added to it. If *file* does not refer to a ZIP file, then a new ZIP archive is appended to the file. This is meant for adding a ZIP archive to another file (such as "python.exe"). If *mode* is "'a'" and the file does not exist at all, it is created. If *mode* is "'r'" or "'a'", the file should be seekable. *compression* is the ZIP compression method to use when writing the archive, and should be "ZIP_STORED", "ZIP_DEFLATED", "ZIP_BZIP2" or "ZIP_LZMA"; unrecognized values will cause "NotImplementedError" to be raised. If "ZIP_DEFLATED", "ZIP_BZIP2" or "ZIP_LZMA" is specified but the corresponding module ("zlib", "bz2" or "lzma") is not available, "RuntimeError" is raised. The default is "ZIP_STORED". If *allowZip64* is "True" (the default) zipfile will create ZIP files that use the ZIP64 extensions when the zipfile is larger than 4 GiB. If it is "false" "zipfile" will raise an exception when the ZIP file would require ZIP64 extensions. The *compresslevel* parameter controls the compression level to use when writing files to the archive. When using "ZIP_STORED" or "ZIP_LZMA" it has no effect. When using "ZIP_DEFLATED" integers "0" through "9" are accepted (see "zlib" for more information). When using "ZIP_BZIP2" integers "1" through "9" are accepted (see "bz2" for more information). The *strict_timestamps* argument, when set to "False", allows to zip files older than 1980-01-01 at the cost of setting the timestamp to 1980-01-01. Similar behavior occurs with files newer than 2107-12-31, the timestamp is also set to the limit. If the file is created with mode "'w'", "'x'" or "'a'" and then "closed" without adding any files to the archive, the appropriate ZIP structures for an empty archive will be written to the file. ZipFile is also a context manager and therefore supports the "with" statement. In the example, *myzip* is closed after the "with" statement’s suite is finished—even if an exception occurs: with ZipFile('spam.zip', 'w') as myzip: myzip.write('eggs.txt') New in version 3.2: Added the ability to use "ZipFile" as a context manager. Changed in version 3.3: Added support for "bzip2" and "lzma" compression. Changed in version 3.4: ZIP64 extensions are enabled by default. Changed in version 3.5: Added support for writing to unseekable streams. Added support for the "'x'" mode. Changed in version 3.6: Previously, a plain "RuntimeError" was raised for unrecognized compression values. Changed in version 3.6.2: The *file* parameter accepts a *path-like object*. Changed in version 3.7: Add the *compresslevel* parameter. New in version 3.8: The *strict_timestamps* keyword-only argument ZipFile.close() Close the archive file. You must call "close()" before exiting your program or essential records will not be written. ZipFile.getinfo(name) Return a "ZipInfo" object with information about the archive member *name*. Calling "getinfo()" for a name not currently contained in the archive will raise a "KeyError". ZipFile.infolist() Return a list containing a "ZipInfo" object for each member of the archive. The objects are in the same order as their entries in the actual ZIP file on disk if an existing archive was opened. ZipFile.namelist() Return a list of archive members by name. ZipFile.open(name, mode='r', pwd=None, *, force_zip64=False) Access a member of the archive as a binary file-like object. *name* can be either the name of a file within the archive or a "ZipInfo" object. The *mode* parameter, if included, must be "'r'" (the default) or "'w'". *pwd* is the password used to decrypt encrypted ZIP files. "open()" is also a context manager and therefore supports the "with" statement: with ZipFile('spam.zip') as myzip: with myzip.open('eggs.txt') as myfile: print(myfile.read()) With *mode* "'r'" the file-like object ("ZipExtFile") is read-only and provides the following methods: "read()", "readline()", "readlines()", "seek()", "tell()", "__iter__()", "__next__()". These objects can operate independently of the ZipFile. With "mode='w'", a writable file handle is returned, which supports the "write()" method. While a writable file handle is open, attempting to read or write other files in the ZIP file will raise a "ValueError". When writing a file, if the file size is not known in advance but may exceed 2 GiB, pass "force_zip64=True" to ensure that the header format is capable of supporting large files. If the file size is known in advance, construct a "ZipInfo" object with "file_size" set, and use that as the *name* parameter. Note: The "open()", "read()" and "extract()" methods can take a filename or a "ZipInfo" object. You will appreciate this when trying to read a ZIP file that contains members with duplicate names. Changed in version 3.6: Removed support of "mode='U'". Use "io.TextIOWrapper" for reading compressed text files in *universal newlines* mode. Changed in version 3.6: "ZipFile.open()" can now be used to write files into the archive with the "mode='w'" option. Changed in version 3.6: Calling "open()" on a closed ZipFile will raise a "ValueError". Previously, a "RuntimeError" was raised. ZipFile.extract(member, path=None, pwd=None) Extract a member from the archive to the current working directory; *member* must be its full name or a "ZipInfo" object. Its file information is extracted as accurately as possible. *path* specifies a different directory to extract to. *member* can be a filename or a "ZipInfo" object. *pwd* is the password used for encrypted files. Returns the normalized path created (a directory or new file). Note: If a member filename is an absolute path, a drive/UNC sharepoint and leading (back)slashes will be stripped, e.g.: "///foo/bar" becomes "foo/bar" on Unix, and "C:\foo\bar" becomes "foo\bar" on Windows. And all "".."" components in a member filename will be removed, e.g.: "../../foo../../ba..r" becomes "foo../ba..r". On Windows illegal characters (":", "<", ">", "|", """, "?", and "*") replaced by underscore ("_"). Changed in version 3.6: Calling "extract()" on a closed ZipFile will raise a "ValueError". Previously, a "RuntimeError" was raised. Changed in version 3.6.2: The *path* parameter accepts a *path-like object*. ZipFile.extractall(path=None, members=None, pwd=None) Extract all members from the archive to the current working directory. *path* specifies a different directory to extract to. *members* is optional and must be a subset of the list returned by "namelist()". *pwd* is the password used for encrypted files. Warning: Never extract archives from untrusted sources without prior inspection. It is possible that files are created outside of *path*, e.g. members that have absolute filenames starting with ""/"" or filenames with two dots "".."". This module attempts to prevent that. See "extract()" note. Changed in version 3.6: Calling "extractall()" on a closed ZipFile will raise a "ValueError". Previously, a "RuntimeError" was raised. Changed in version 3.6.2: The *path* parameter accepts a *path-like object*. ZipFile.printdir() Print a table of contents for the archive to "sys.stdout". ZipFile.setpassword(pwd) Set *pwd* as default password to extract encrypted files. ZipFile.read(name, pwd=None) Return the bytes of the file *name* in the archive. *name* is the name of the file in the archive, or a "ZipInfo" object. The archive must be open for read or append. *pwd* is the password used for encrypted files and, if specified, it will override the default password set with "setpassword()". Calling "read()" on a ZipFile that uses a compression method other than "ZIP_STORED", "ZIP_DEFLATED", "ZIP_BZIP2" or "ZIP_LZMA" will raise a "NotImplementedError". An error will also be raised if the corresponding compression module is not available. Changed in version 3.6: Calling "read()" on a closed ZipFile will raise a "ValueError". Previously, a "RuntimeError" was raised. ZipFile.testzip() Read all the files in the archive and check their CRC’s and file headers. Return the name of the first bad file, or else return "None". Changed in version 3.6: Calling "testzip()" on a closed ZipFile will raise a "ValueError". Previously, a "RuntimeError" was raised. ZipFile.write(filename, arcname=None, compress_type=None, compresslevel=None) Write the file named *filename* to the archive, giving it the archive name *arcname* (by default, this will be the same as *filename*, but without a drive letter and with leading path separators removed). If given, *compress_type* overrides the value given for the *compression* parameter to the constructor for the new entry. Similarly, *compresslevel* will override the constructor if given. The archive must be open with mode "'w'", "'x'" or "'a'". Note: Archive names should be relative to the archive root, that is, they should not start with a path separator. Note: If "arcname" (or "filename", if "arcname" is not given) contains a null byte, the name of the file in the archive will be truncated at the null byte. Note: A leading slash in the filename may lead to the archive being impossible to open in some zip programs on Windows systems. Changed in version 3.6: Calling "write()" on a ZipFile created with mode "'r'" or a closed ZipFile will raise a "ValueError". Previously, a "RuntimeError" was raised. ZipFile.writestr(zinfo_or_arcname, data, compress_type=None, compresslevel=None) Write a file into the archive. The contents is *data*, which may be either a "str" or a "bytes" instance; if it is a "str", it is encoded as UTF-8 first. *zinfo_or_arcname* is either the file name it will be given in the archive, or a "ZipInfo" instance. If it’s an instance, at least the filename, date, and time must be given. If it’s a name, the date and time is set to the current date and time. The archive must be opened with mode "'w'", "'x'" or "'a'". If given, *compress_type* overrides the value given for the *compression* parameter to the constructor for the new entry, or in the *zinfo_or_arcname* (if that is a "ZipInfo" instance). Similarly, *compresslevel* will override the constructor if given. Note: When passing a "ZipInfo" instance as the *zinfo_or_arcname* parameter, the compression method used will be that specified in the *compress_type* member of the given "ZipInfo" instance. By default, the "ZipInfo" constructor sets this member to "ZIP_STORED". Changed in version 3.2: The *compress_type* argument. Changed in version 3.6: Calling "writestr()" on a ZipFile created with mode "'r'" or a closed ZipFile will raise a "ValueError". Previously, a "RuntimeError" was raised. The following data attributes are also available: ZipFile.filename Name of the ZIP file. ZipFile.debug The level of debug output to use. This may be set from "0" (the default, no output) to "3" (the most output). Debugging information is written to "sys.stdout". ZipFile.comment The comment associated with the ZIP file as a "bytes" object. If assigning a comment to a "ZipFile" instance created with mode "'w'", "'x'" or "'a'", it should be no longer than 65535 bytes. Comments longer than this will be truncated. Path Objects ============ class zipfile.Path(root, at='') Construct a Path object from a "root" zipfile (which may be a "ZipFile" instance or "file" suitable for passing to the "ZipFile" constructor). "at" specifies the location of this Path within the zipfile, e.g. ‘dir/file.txt’, ‘dir/’, or ‘’. Defaults to the empty string, indicating the root. Path objects expose the following features of "pathlib.Path" objects: Path objects are traversable using the "/" operator. Path.name The final path component. Path.open(mode='r', *, pwd, **) Invoke "ZipFile.open()" on the current path. Allows opening for read or write, text or binary through supported modes: ‘r’, ‘w’, ‘rb’, ‘wb’. Positional and keyword arguments are passed through to "io.TextIOWrapper" when opened as text and ignored otherwise. "pwd" is the "pwd" parameter to "ZipFile.open()". Changed in version 3.9: Added support for text and binary modes for open. Default mode is now text. Path.iterdir() Enumerate the children of the current directory. Path.is_dir() Return "True" if the current context references a directory. Path.is_file() Return "True" if the current context references a file. Path.exists() Return "True" if the current context references a file or directory in the zip file. Path.read_text(*, **) Read the current file as unicode text. Positional and keyword arguments are passed through to "io.TextIOWrapper" (except "buffer", which is implied by the context). Path.read_bytes() Read the current file as bytes. PyZipFile Objects ================= The "PyZipFile" constructor takes the same parameters as the "ZipFile" constructor, and one additional parameter, *optimize*. class zipfile.PyZipFile(file, mode='r', compression=ZIP_STORED, allowZip64=True, optimize=-1) New in version 3.2: The *optimize* parameter. Changed in version 3.4: ZIP64 extensions are enabled by default. Instances have one method in addition to those of "ZipFile" objects: writepy(pathname, basename='', filterfunc=None) Search for files "*.py" and add the corresponding file to the archive. If the *optimize* parameter to "PyZipFile" was not given or "-1", the corresponding file is a "*.pyc" file, compiling if necessary. If the *optimize* parameter to "PyZipFile" was "0", "1" or "2", only files with that optimization level (see "compile()") are added to the archive, compiling if necessary. If *pathname* is a file, the filename must end with ".py", and just the (corresponding "*.pyc") file is added at the top level (no path information). If *pathname* is a file that does not end with ".py", a "RuntimeError" will be raised. If it is a directory, and the directory is not a package directory, then all the files "*.pyc" are added at the top level. If the directory is a package directory, then all "*.pyc" are added under the package name as a file path, and if any subdirectories are package directories, all of these are added recursively in sorted order. *basename* is intended for internal use only. *filterfunc*, if given, must be a function taking a single string argument. It will be passed each path (including each individual full file path) before it is added to the archive. If *filterfunc* returns a false value, the path will not be added, and if it is a directory its contents will be ignored. For example, if our test files are all either in "test" directories or start with the string "test_", we can use a *filterfunc* to exclude them: >>> zf = PyZipFile('myprog.zip') >>> def notests(s): ... fn = os.path.basename(s) ... return (not (fn == 'test' or fn.startswith('test_'))) >>> zf.writepy('myprog', filterfunc=notests) The "writepy()" method makes archives with file names like this: string.pyc # Top level name test/__init__.pyc # Package directory test/testall.pyc # Module test.testall test/bogus/__init__.pyc # Subpackage directory test/bogus/myfile.pyc # Submodule test.bogus.myfile New in version 3.4: The *filterfunc* parameter. Changed in version 3.6.2: The *pathname* parameter accepts a *path-like object*. Changed in version 3.7: Recursion sorts directory entries. ZipInfo Objects =============== Instances of the "ZipInfo" class are returned by the "getinfo()" and "infolist()" methods of "ZipFile" objects. Each object stores information about a single member of the ZIP archive. There is one classmethod to make a "ZipInfo" instance for a filesystem file: classmethod ZipInfo.from_file(filename, arcname=None, *, strict_timestamps=True) Construct a "ZipInfo" instance for a file on the filesystem, in preparation for adding it to a zip file. *filename* should be the path to a file or directory on the filesystem. If *arcname* is specified, it is used as the name within the archive. If *arcname* is not specified, the name will be the same as *filename*, but with any drive letter and leading path separators removed. The *strict_timestamps* argument, when set to "False", allows to zip files older than 1980-01-01 at the cost of setting the timestamp to 1980-01-01. Similar behavior occurs with files newer than 2107-12-31, the timestamp is also set to the limit. New in version 3.6. Changed in version 3.6.2: The *filename* parameter accepts a *path- like object*. New in version 3.8: The *strict_timestamps* keyword-only argument Instances have the following methods and attributes: ZipInfo.is_dir() Return "True" if this archive member is a directory. This uses the entry’s name: directories should always end with "/". New in version 3.6. ZipInfo.filename Name of the file in the archive. ZipInfo.date_time The time and date of the last modification to the archive member. This is a tuple of six values: +---------+----------------------------+ | Index | Value | |=========|============================| | "0" | Year (>= 1980) | +---------+----------------------------+ | "1" | Month (one-based) | +---------+----------------------------+ | "2" | Day of month (one-based) | +---------+----------------------------+ | "3" | Hours (zero-based) | +---------+----------------------------+ | "4" | Minutes (zero-based) | +---------+----------------------------+ | "5" | Seconds (zero-based) | +---------+----------------------------+ Note: The ZIP file format does not support timestamps before 1980. ZipInfo.compress_type Type of compression for the archive member. ZipInfo.comment Comment for the individual archive member as a "bytes" object. ZipInfo.extra Expansion field data. The PKZIP Application Note contains some comments on the internal structure of the data contained in this "bytes" object. ZipInfo.create_system System which created ZIP archive. ZipInfo.create_version PKZIP version which created ZIP archive. ZipInfo.extract_version PKZIP version needed to extract archive. ZipInfo.reserved Must be zero. ZipInfo.flag_bits ZIP flag bits. ZipInfo.volume Volume number of file header. ZipInfo.internal_attr Internal attributes. ZipInfo.external_attr External file attributes. ZipInfo.header_offset Byte offset to the file header. ZipInfo.CRC CRC-32 of the uncompressed file. ZipInfo.compress_size Size of the compressed data. ZipInfo.file_size Size of the uncompressed file. Command-Line Interface ====================== The "zipfile" module provides a simple command-line interface to interact with ZIP archives. If you want to create a new ZIP archive, specify its name after the "-c" option and then list the filename(s) that should be included: $ python -m zipfile -c monty.zip spam.txt eggs.txt Passing a directory is also acceptable: $ python -m zipfile -c monty.zip life-of-brian_1979/ If you want to extract a ZIP archive into the specified directory, use the "-e" option: $ python -m zipfile -e monty.zip target-dir/ For a list of the files in a ZIP archive, use the "-l" option: $ python -m zipfile -l monty.zip Command-line options -------------------- -l --list List files in a zipfile. -c ... --create ... Create zipfile from source files. -e --extract Extract zipfile into target directory. -t --test Test whether the zipfile is valid or not. Decompression pitfalls ====================== The extraction in zipfile module might fail due to some pitfalls listed below. From file itself ---------------- Decompression may fail due to incorrect password / CRC checksum / ZIP format or unsupported compression method / decryption. File System limitations ----------------------- Exceeding limitations on different file systems can cause decompression failed. Such as allowable characters in the directory entries, length of the file name, length of the pathname, size of a single file, and number of files, etc. Resources limitations --------------------- The lack of memory or disk volume would lead to decompression failed. For example, decompression bombs (aka ZIP bomb) apply to zipfile library that can cause disk volume exhaustion. Interruption ------------ Interruption during the decompression, such as pressing control-C or killing the decompression process may result in incomplete decompression of the archive. Default behaviors of extraction ------------------------------- Not knowing the default extraction behaviors can cause unexpected decompression results. For example, when extracting the same archive twice, it overwrites files without asking. "zipimport" — Import modules from Zip archives ********************************************** **Source code:** Lib/zipimport.py ====================================================================== This module adds the ability to import Python modules ("*.py", "*.pyc") and packages from ZIP-format archives. It is usually not needed to use the "zipimport" module explicitly; it is automatically used by the built-in "import" mechanism for "sys.path" items that are paths to ZIP archives. Typically, "sys.path" is a list of directory names as strings. This module also allows an item of "sys.path" to be a string naming a ZIP file archive. The ZIP archive can contain a subdirectory structure to support package imports, and a path within the archive can be specified to only import from a subdirectory. For example, the path "example.zip/lib/" would only import from the "lib/" subdirectory within the archive. Any files may be present in the ZIP archive, but importers are only invoked for ".py" and ".pyc" files. ZIP import of dynamic modules (".pyd", ".so") is disallowed. Note that if an archive only contains ".py" files, Python will not attempt to modify the archive by adding the corresponding ".pyc" file, meaning that if a ZIP archive doesn’t contain ".pyc" files, importing may be rather slow. Changed in version 3.8: Previously, ZIP archives with an archive comment were not supported. See also: PKZIP Application Note Documentation on the ZIP file format by Phil Katz, the creator of the format and algorithms used. **PEP 273** - Import Modules from Zip Archives Written by James C. Ahlstrom, who also provided an implementation. Python 2.3 follows the specification in **PEP 273**, but uses an implementation written by Just van Rossum that uses the import hooks described in **PEP 302**. **PEP 302** - New Import Hooks The PEP to add the import hooks that help this module work. This module defines an exception: exception zipimport.ZipImportError Exception raised by zipimporter objects. It’s a subclass of "ImportError", so it can be caught as "ImportError", too. zipimporter Objects =================== "zipimporter" is the class for importing ZIP files. class zipimport.zipimporter(archivepath) Create a new zipimporter instance. *archivepath* must be a path to a ZIP file, or to a specific path within a ZIP file. For example, an *archivepath* of "foo/bar.zip/lib" will look for modules in the "lib" directory inside the ZIP file "foo/bar.zip" (provided that it exists). "ZipImportError" is raised if *archivepath* doesn’t point to a valid ZIP archive. find_module(fullname[, path]) Search for a module specified by *fullname*. *fullname* must be the fully qualified (dotted) module name. It returns the zipimporter instance itself if the module was found, or "None" if it wasn’t. The optional *path* argument is ignored—it’s there for compatibility with the importer protocol. get_code(fullname) Return the code object for the specified module. Raise "ZipImportError" if the module couldn’t be found. get_data(pathname) Return the data associated with *pathname*. Raise "OSError" if the file wasn’t found. Changed in version 3.3: "IOError" used to be raised instead of "OSError". get_filename(fullname) Return the value "__file__" would be set to if the specified module was imported. Raise "ZipImportError" if the module couldn’t be found. New in version 3.1. get_source(fullname) Return the source code for the specified module. Raise "ZipImportError" if the module couldn’t be found, return "None" if the archive does contain the module, but has no source for it. is_package(fullname) Return "True" if the module specified by *fullname* is a package. Raise "ZipImportError" if the module couldn’t be found. load_module(fullname) Load the module specified by *fullname*. *fullname* must be the fully qualified (dotted) module name. It returns the imported module, or raises "ZipImportError" if it wasn’t found. archive The file name of the importer’s associated ZIP file, without a possible subpath. prefix The subpath within the ZIP file where modules are searched. This is the empty string for zipimporter objects which point to the root of the ZIP file. The "archive" and "prefix" attributes, when combined with a slash, equal the original *archivepath* argument given to the "zipimporter" constructor. Examples ======== Here is an example that imports a module from a ZIP archive - note that the "zipimport" module is not explicitly used. $ unzip -l example.zip Archive: example.zip Length Date Time Name -------- ---- ---- ---- 8467 11-26-02 22:30 jwzthreading.py -------- ------- 8467 1 file $ ./python Python 2.3 (#1, Aug 1 2003, 19:54:32) >>> import sys >>> sys.path.insert(0, 'example.zip') # Add .zip file to front of path >>> import jwzthreading >>> jwzthreading.__file__ 'example.zip/jwzthreading.py' "zlib" — Compression compatible with **gzip** ********************************************* ====================================================================== For applications that require data compression, the functions in this module allow compression and decompression, using the zlib library. The zlib library has its own home page at https://www.zlib.net. There are known incompatibilities between the Python module and versions of the zlib library earlier than 1.1.3; 1.1.3 has a security vulnerability, so we recommend using 1.1.4 or later. zlib’s functions have many options and often need to be used in a particular order. This documentation doesn’t attempt to cover all of the permutations; consult the zlib manual at http://www.zlib.net/manual.html for authoritative information. For reading and writing ".gz" files see the "gzip" module. The available exception and functions in this module are: exception zlib.error Exception raised on compression and decompression errors. zlib.adler32(data[, value]) Computes an Adler-32 checksum of *data*. (An Adler-32 checksum is almost as reliable as a CRC32 but can be computed much more quickly.) The result is an unsigned 32-bit integer. If *value* is present, it is used as the starting value of the checksum; otherwise, a default value of 1 is used. Passing in *value* allows computing a running checksum over the concatenation of several inputs. The algorithm is not cryptographically strong, and should not be used for authentication or digital signatures. Since the algorithm is designed for use as a checksum algorithm, it is not suitable for use as a general hash algorithm. Changed in version 3.0: The result is always unsigned. To generate the same numeric value when using Python 2 or earlier, use "adler32(data) & 0xffffffff". zlib.compress(data, /, level=-1) Compresses the bytes in *data*, returning a bytes object containing compressed data. *level* is an integer from "0" to "9" or "-1" controlling the level of compression; "1" (Z_BEST_SPEED) is fastest and produces the least compression, "9" (Z_BEST_COMPRESSION) is slowest and produces the most. "0" (Z_NO_COMPRESSION) is no compression. The default value is "-1" (Z_DEFAULT_COMPRESSION). Z_DEFAULT_COMPRESSION represents a default compromise between speed and compression (currently equivalent to level 6). Raises the "error" exception if any error occurs. Changed in version 3.6: *level* can now be used as a keyword parameter. zlib.compressobj(level=-1, method=DEFLATED, wbits=MAX_WBITS, memLevel=DEF_MEM_LEVEL, strategy=Z_DEFAULT_STRATEGY[, zdict]) Returns a compression object, to be used for compressing data streams that won’t fit into memory at once. *level* is the compression level – an integer from "0" to "9" or "-1". A value of "1" (Z_BEST_SPEED) is fastest and produces the least compression, while a value of "9" (Z_BEST_COMPRESSION) is slowest and produces the most. "0" (Z_NO_COMPRESSION) is no compression. The default value is "-1" (Z_DEFAULT_COMPRESSION). Z_DEFAULT_COMPRESSION represents a default compromise between speed and compression (currently equivalent to level 6). *method* is the compression algorithm. Currently, the only supported value is "DEFLATED". The *wbits* argument controls the size of the history buffer (or the “window size”) used when compressing data, and whether a header and trailer is included in the output. It can take several ranges of values, defaulting to "15" (MAX_WBITS): * +9 to +15: The base-two logarithm of the window size, which therefore ranges between 512 and 32768. Larger values produce better compression at the expense of greater memory usage. The resulting output will include a zlib-specific header and trailer. * −9 to −15: Uses the absolute value of *wbits* as the window size logarithm, while producing a raw output stream with no header or trailing checksum. * +25 to +31 = 16 + (9 to 15): Uses the low 4 bits of the value as the window size logarithm, while including a basic **gzip** header and trailing checksum in the output. The *memLevel* argument controls the amount of memory used for the internal compression state. Valid values range from "1" to "9". Higher values use more memory, but are faster and produce smaller output. *strategy* is used to tune the compression algorithm. Possible values are "Z_DEFAULT_STRATEGY", "Z_FILTERED", "Z_HUFFMAN_ONLY", "Z_RLE" (zlib 1.2.0.1) and "Z_FIXED" (zlib 1.2.2.2). *zdict* is a predefined compression dictionary. This is a sequence of bytes (such as a "bytes" object) containing subsequences that are expected to occur frequently in the data that is to be compressed. Those subsequences that are expected to be most common should come at the end of the dictionary. Changed in version 3.3: Added the *zdict* parameter and keyword argument support. zlib.crc32(data[, value]) Computes a CRC (Cyclic Redundancy Check) checksum of *data*. The result is an unsigned 32-bit integer. If *value* is present, it is used as the starting value of the checksum; otherwise, a default value of 0 is used. Passing in *value* allows computing a running checksum over the concatenation of several inputs. The algorithm is not cryptographically strong, and should not be used for authentication or digital signatures. Since the algorithm is designed for use as a checksum algorithm, it is not suitable for use as a general hash algorithm. Changed in version 3.0: The result is always unsigned. To generate the same numeric value when using Python 2 or earlier, use "crc32(data) & 0xffffffff". zlib.decompress(data, /, wbits=MAX_WBITS, bufsize=DEF_BUF_SIZE) Decompresses the bytes in *data*, returning a bytes object containing the uncompressed data. The *wbits* parameter depends on the format of *data*, and is discussed further below. If *bufsize* is given, it is used as the initial size of the output buffer. Raises the "error" exception if any error occurs. The *wbits* parameter controls the size of the history buffer (or “window size”), and what header and trailer format is expected. It is similar to the parameter for "compressobj()", but accepts more ranges of values: * +8 to +15: The base-two logarithm of the window size. The input must include a zlib header and trailer. * 0: Automatically determine the window size from the zlib header. Only supported since zlib 1.2.3.5. * −8 to −15: Uses the absolute value of *wbits* as the window size logarithm. The input must be a raw stream with no header or trailer. * +24 to +31 = 16 + (8 to 15): Uses the low 4 bits of the value as the window size logarithm. The input must include a gzip header and trailer. * +40 to +47 = 32 + (8 to 15): Uses the low 4 bits of the value as the window size logarithm, and automatically accepts either the zlib or gzip format. When decompressing a stream, the window size must not be smaller than the size originally used to compress the stream; using a too- small value may result in an "error" exception. The default *wbits* value corresponds to the largest window size and requires a zlib header and trailer to be included. *bufsize* is the initial size of the buffer used to hold decompressed data. If more space is required, the buffer size will be increased as needed, so you don’t have to get this value exactly right; tuning it will only save a few calls to "malloc()". Changed in version 3.6: *wbits* and *bufsize* can be used as keyword arguments. zlib.decompressobj(wbits=MAX_WBITS[, zdict]) Returns a decompression object, to be used for decompressing data streams that won’t fit into memory at once. The *wbits* parameter controls the size of the history buffer (or the “window size”), and what header and trailer format is expected. It has the same meaning as described for decompress(). The *zdict* parameter specifies a predefined compression dictionary. If provided, this must be the same dictionary as was used by the compressor that produced the data that is to be decompressed. Note: If *zdict* is a mutable object (such as a "bytearray"), you must not modify its contents between the call to "decompressobj()" and the first call to the decompressor’s "decompress()" method. Changed in version 3.3: Added the *zdict* parameter. Compression objects support the following methods: Compress.compress(data) Compress *data*, returning a bytes object containing compressed data for at least part of the data in *data*. This data should be concatenated to the output produced by any preceding calls to the "compress()" method. Some input may be kept in internal buffers for later processing. Compress.flush([mode]) All pending input is processed, and a bytes object containing the remaining compressed output is returned. *mode* can be selected from the constants "Z_NO_FLUSH", "Z_PARTIAL_FLUSH", "Z_SYNC_FLUSH", "Z_FULL_FLUSH", "Z_BLOCK" (zlib 1.2.3.4), or "Z_FINISH", defaulting to "Z_FINISH". Except "Z_FINISH", all constants allow compressing further bytestrings of data, while "Z_FINISH" finishes the compressed stream and prevents compressing any more data. After calling "flush()" with *mode* set to "Z_FINISH", the "compress()" method cannot be called again; the only realistic action is to delete the object. Compress.copy() Returns a copy of the compression object. This can be used to efficiently compress a set of data that share a common initial prefix. Changed in version 3.8: Added "copy.copy()" and "copy.deepcopy()" support to compression objects. Decompression objects support the following methods and attributes: Decompress.unused_data A bytes object which contains any bytes past the end of the compressed data. That is, this remains "b""" until the last byte that contains compression data is available. If the whole bytestring turned out to contain compressed data, this is "b""", an empty bytes object. Decompress.unconsumed_tail A bytes object that contains any data that was not consumed by the last "decompress()" call because it exceeded the limit for the uncompressed data buffer. This data has not yet been seen by the zlib machinery, so you must feed it (possibly with further data concatenated to it) back to a subsequent "decompress()" method call in order to get correct output. Decompress.eof A boolean indicating whether the end of the compressed data stream has been reached. This makes it possible to distinguish between a properly-formed compressed stream, and an incomplete or truncated one. New in version 3.3. Decompress.decompress(data, max_length=0) Decompress *data*, returning a bytes object containing the uncompressed data corresponding to at least part of the data in *string*. This data should be concatenated to the output produced by any preceding calls to the "decompress()" method. Some of the input data may be preserved in internal buffers for later processing. If the optional parameter *max_length* is non-zero then the return value will be no longer than *max_length*. This may mean that not all of the compressed input can be processed; and unconsumed data will be stored in the attribute "unconsumed_tail". This bytestring must be passed to a subsequent call to "decompress()" if decompression is to continue. If *max_length* is zero then the whole input is decompressed, and "unconsumed_tail" is empty. Changed in version 3.6: *max_length* can be used as a keyword argument. Decompress.flush([length]) All pending input is processed, and a bytes object containing the remaining uncompressed output is returned. After calling "flush()", the "decompress()" method cannot be called again; the only realistic action is to delete the object. The optional parameter *length* sets the initial size of the output buffer. Decompress.copy() Returns a copy of the decompression object. This can be used to save the state of the decompressor midway through the data stream in order to speed up random seeks into the stream at a future point. Changed in version 3.8: Added "copy.copy()" and "copy.deepcopy()" support to decompression objects. Information about the version of the zlib library in use is available through the following constants: zlib.ZLIB_VERSION The version string of the zlib library that was used for building the module. This may be different from the zlib library actually used at runtime, which is available as "ZLIB_RUNTIME_VERSION". zlib.ZLIB_RUNTIME_VERSION The version string of the zlib library actually loaded by the interpreter. New in version 3.3. See also: Module "gzip" Reading and writing **gzip**-format files. http://www.zlib.net The zlib library home page. http://www.zlib.net/manual.html The zlib manual explains the semantics and usage of the library’s many functions. "zoneinfo" — IANA time zone support *********************************** New in version 3.9. ====================================================================== The "zoneinfo" module provides a concrete time zone implementation to support the IANA time zone database as originally specified in **PEP 615**. By default, "zoneinfo" uses the system’s time zone data if available; if no system time zone data is available, the library will fall back to using the first-party tzdata package available on PyPI. See also: Module: "datetime" Provides the "time" and "datetime" types with which the "ZoneInfo" class is designed to be used. Package tzdata First-party package maintained by the CPython core developers to supply time zone data via PyPI. Using "ZoneInfo" ================ "ZoneInfo" is a concrete implementation of the "datetime.tzinfo" abstract base class, and is intended to be attached to "tzinfo", either via the constructor, the "datetime.replace" method or "datetime.astimezone": >>> from zoneinfo import ZoneInfo >>> from datetime import datetime, timedelta >>> dt = datetime(2020, 10, 31, 12, tzinfo=ZoneInfo("America/Los_Angeles")) >>> print(dt) 2020-10-31 12:00:00-07:00 >>> dt.tzname() 'PDT' Datetimes constructed in this way are compatible with datetime arithmetic and handle daylight saving time transitions with no further intervention: >>> dt_add = dt + timedelta(days=1) >>> print(dt_add) 2020-11-01 12:00:00-08:00 >>> dt_add.tzname() 'PST' These time zones also support the "fold" attribute introduced in **PEP 495**. During offset transitions which induce ambiguous times (such as a daylight saving time to standard time transition), the offset from *before* the transition is used when "fold=0", and the offset *after* the transition is used when "fold=1", for example: >>> dt = datetime(2020, 11, 1, 1, tzinfo=ZoneInfo("America/Los_Angeles")) >>> print(dt) 2020-11-01 01:00:00-07:00 >>> print(dt.replace(fold=1)) 2020-11-01 01:00:00-08:00 When converting from another time zone, the fold will be set to the correct value: >>> from datetime import timezone >>> LOS_ANGELES = ZoneInfo("America/Los_Angeles") >>> dt_utc = datetime(2020, 11, 1, 8, tzinfo=timezone.utc) >>> # Before the PDT -> PST transition >>> print(dt_utc.astimezone(LOS_ANGELES)) 2020-11-01 01:00:00-07:00 >>> # After the PDT -> PST transition >>> print((dt_utc + timedelta(hours=1)).astimezone(LOS_ANGELES)) 2020-11-01 01:00:00-08:00 Data sources ============ The "zoneinfo" module does not directly provide time zone data, and instead pulls time zone information from the system time zone database or the first-party PyPI package tzdata, if available. Some systems, including notably Windows systems, do not have an IANA database available, and so for projects targeting cross-platform compatibility that require time zone data, it is recommended to declare a dependency on tzdata. If neither system data nor tzdata are available, all calls to "ZoneInfo" will raise "ZoneInfoNotFoundError". Configuring the data sources ---------------------------- When "ZoneInfo(key)" is called, the constructor first searches the directories specified in "TZPATH" for a file matching "key", and on failure looks for a match in the tzdata package. This behavior can be configured in three ways: 1. The default "TZPATH" when not otherwise specified can be configured at compile time. 2. "TZPATH" can be configured using an environment variable. 3. At runtime, the search path can be manipulated using the "reset_tzpath()" function. Compile-time configuration ~~~~~~~~~~~~~~~~~~~~~~~~~~ The default "TZPATH" includes several common deployment locations for the time zone database (except on Windows, where there are no “well- known” locations for time zone data). On POSIX systems, downstream distributors and those building Python from source who know where their system time zone data is deployed may change the default time zone path by specifying the compile-time option "TZPATH" (or, more likely, the "configure" flag "--with-tzpath"), which should be a string delimited by "os.pathsep". On all platforms, the configured value is available as the "TZPATH" key in "sysconfig.get_config_var()". Environment configuration ~~~~~~~~~~~~~~~~~~~~~~~~~ When initializing "TZPATH" (either at import time or whenever "reset_tzpath()" is called with no arguments), the "zoneinfo" module will use the environment variable "PYTHONTZPATH", if it exists, to set the search path. PYTHONTZPATH This is an "os.pathsep"-separated string containing the time zone search path to use. It must consist of only absolute rather than relative paths. Relative components specified in "PYTHONTZPATH" will not be used, but otherwise the behavior when a relative path is specified is implementation-defined; CPython will raise "InvalidTZPathWarning", but other implementations are free to silently ignore the erroneous component or raise an exception. To set the system to ignore the system data and use the tzdata package instead, set "PYTHONTZPATH=""". Runtime configuration ~~~~~~~~~~~~~~~~~~~~~ The TZ search path can also be configured at runtime using the "reset_tzpath()" function. This is generally not an advisable operation, though it is reasonable to use it in test functions that require the use of a specific time zone path (or require disabling access to the system time zones). The "ZoneInfo" class ==================== class zoneinfo.ZoneInfo(key) A concrete "datetime.tzinfo" subclass that represents an IANA time zone specified by the string "key". Calls to the primary constructor will always return objects that compare identically; put another way, barring cache invalidation via "ZoneInfo.clear_cache()", for all values of "key", the following assertion will always be true: a = ZoneInfo(key) b = ZoneInfo(key) assert a is b "key" must be in the form of a relative, normalized POSIX path, with no up-level references. The constructor will raise "ValueError" if a non-conforming key is passed. If no file matching "key" is found, the constructor will raise "ZoneInfoNotFoundError". The "ZoneInfo" class has two alternate constructors: classmethod ZoneInfo.from_file(fobj, /, key=None) Constructs a "ZoneInfo" object from a file-like object returning bytes (e.g. a file opened in binary mode or an "io.BytesIO" object). Unlike the primary constructor, this always constructs a new object. The "key" parameter sets the name of the zone for the purposes of "__str__()" and "__repr__()". Objects created via this constructor cannot be pickled (see pickling). classmethod ZoneInfo.no_cache(key) An alternate constructor that bypasses the constructor’s cache. It is identical to the primary constructor, but returns a new object on each call. This is most likely to be useful for testing or demonstration purposes, but it can also be used to create a system with a different cache invalidation strategy. Objects created via this constructor will also bypass the cache of a deserializing process when unpickled. Caution: Using this constructor may change the semantics of your datetimes in surprising ways, only use it if you know that you need to. The following class methods are also available: classmethod ZoneInfo.clear_cache(*, only_keys=None) A method for invalidating the cache on the "ZoneInfo" class. If no arguments are passed, all caches are invalidated and the next call to the primary constructor for each key will return a new instance. If an iterable of key names is passed to the "only_keys" parameter, only the specified keys will be removed from the cache. Keys passed to "only_keys" but not found in the cache are ignored. Warning: Invoking this function may change the semantics of datetimes using "ZoneInfo" in surprising ways; this modifies process-wide global state and thus may have wide-ranging effects. Only use it if you know that you need to. The class has one attribute: ZoneInfo.key This is a read-only *attribute* that returns the value of "key" passed to the constructor, which should be a lookup key in the IANA time zone database (e.g. "America/New_York", "Europe/Paris" or "Asia/Tokyo"). For zones constructed from file without specifying a "key" parameter, this will be set to "None". Note: Although it is a somewhat common practice to expose these to end users, these values are designed to be primary keys for representing the relevant zones and not necessarily user-facing elements. Projects like CLDR (the Unicode Common Locale Data Repository) can be used to get more user-friendly strings from these keys. String representations ---------------------- The string representation returned when calling "str" on a "ZoneInfo" object defaults to using the "ZoneInfo.key" attribute (see the note on usage in the attribute documentation): >>> zone = ZoneInfo("Pacific/Kwajalein") >>> str(zone) 'Pacific/Kwajalein' >>> dt = datetime(2020, 4, 1, 3, 15, tzinfo=zone) >>> f"{dt.isoformat()} [{dt.tzinfo}]" '2020-04-01T03:15:00+12:00 [Pacific/Kwajalein]' For objects constructed from a file without specifying a "key" parameter, "str" falls back to calling "repr()". "ZoneInfo"’s "repr" is implementation-defined and not necessarily stable between versions, but it is guaranteed not to be a valid "ZoneInfo" key. Pickle serialization -------------------- Rather than serializing all transition data, "ZoneInfo" objects are serialized by key, and "ZoneInfo" objects constructed from files (even those with a value for "key" specified) cannot be pickled. The behavior of a "ZoneInfo" file depends on how it was constructed: 1. "ZoneInfo(key)": When constructed with the primary constructor, a "ZoneInfo" object is serialized by key, and when deserialized, the deserializing process uses the primary and thus it is expected that these are expected to be the same object as other references to the same time zone. For example, if "europe_berlin_pkl" is a string containing a pickle constructed from "ZoneInfo("Europe/Berlin")", one would expect the following behavior: >>> a = ZoneInfo("Europe/Berlin") >>> b = pickle.loads(europe_berlin_pkl) >>> a is b True 2. "ZoneInfo.no_cache(key)": When constructed from the cache-bypassing constructor, the "ZoneInfo" object is also serialized by key, but when deserialized, the deserializing process uses the cache bypassing constructor. If "europe_berlin_pkl_nc" is a string containing a pickle constructed from "ZoneInfo.no_cache("Europe/Berlin")", one would expect the following behavior: >>> a = ZoneInfo("Europe/Berlin") >>> b = pickle.loads(europe_berlin_pkl_nc) >>> a is b False 3. "ZoneInfo.from_file(fobj, /, key=None)": When constructed from a file, the "ZoneInfo" object raises an exception on pickling. If an end user wants to pickle a "ZoneInfo" constructed from a file, it is recommended that they use a wrapper type or a custom serialization function: either serializing by key or storing the contents of the file object and serializing that. This method of serialization requires that the time zone data for the required key be available on both the serializing and deserializing side, similar to the way that references to classes and functions are expected to exist in both the serializing and deserializing environments. It also means that no guarantees are made about the consistency of results when unpickling a "ZoneInfo" pickled in an environment with a different version of the time zone data. Functions ========= zoneinfo.available_timezones() Get a set containing all the valid keys for IANA time zones available anywhere on the time zone path. This is recalculated on every call to the function. This function only includes canonical zone names and does not include “special” zones such as those under the "posix/" and "right/" directories, or the "posixrules" zone. Caution: This function may open a large number of files, as the best way to determine if a file on the time zone path is a valid time zone is to read the “magic string” at the beginning. Note: These values are not designed to be exposed to end-users; for user facing elements, applications should use something like CLDR (the Unicode Common Locale Data Repository) to get more user- friendly strings. See also the cautionary note on "ZoneInfo.key". zoneinfo.reset_tzpath(to=None) Sets or resets the time zone search path ("TZPATH") for the module. When called with no arguments, "TZPATH" is set to the default value. Calling "reset_tzpath" will not invalidate the "ZoneInfo" cache, and so calls to the primary "ZoneInfo" constructor will only use the new "TZPATH" in the case of a cache miss. The "to" parameter must be a *sequence* of strings or "os.PathLike" and not a string, all of which must be absolute paths. "ValueError" will be raised if something other than an absolute path is passed. Globals ======= zoneinfo.TZPATH A read-only sequence representing the time zone search path – when constructing a "ZoneInfo" from a key, the key is joined to each entry in the "TZPATH", and the first file found is used. "TZPATH" may contain only absolute paths, never relative paths, regardless of how it is configured. The object that "zoneinfo.TZPATH" points to may change in response to a call to "reset_tzpath()", so it is recommended to use "zoneinfo.TZPATH" rather than importing "TZPATH" from "zoneinfo" or assigning a long-lived variable to "zoneinfo.TZPATH". For more information on configuring the time zone search path, see Configuring the data sources. Exceptions and warnings ======================= exception zoneinfo.ZoneInfoNotFoundError Raised when construction of a "ZoneInfo" object fails because the specified key could not be found on the system. This is a subclass of "KeyError". exception zoneinfo.InvalidTZPathWarning Raised when "PYTHONTZPATH" contains an invalid component that will be filtered out, such as a relative path. History and License ******************* History of the software ======================= Python was created in the early 1990s by Guido van Rossum at Stichting Mathematisch Centrum (CWI, see https://www.cwi.nl/) in the Netherlands as a successor of a language called ABC. Guido remains Python’s principal author, although it includes many contributions from others. In 1995, Guido continued his work on Python at the Corporation for National Research Initiatives (CNRI, see https://www.cnri.reston.va.us/) in Reston, Virginia where he released several versions of the software. In May 2000, Guido and the Python core development team moved to BeOpen.com to form the BeOpen PythonLabs team. In October of the same year, the PythonLabs team moved to Digital Creations (now Zope Corporation; see https://www.zope.org/). In 2001, the Python Software Foundation (PSF, see https://www.python.org/psf/) was formed, a non- profit organization created specifically to own Python-related Intellectual Property. Zope Corporation is a sponsoring member of the PSF. All Python releases are Open Source (see https://opensource.org/ for the Open Source Definition). Historically, most, but not all, Python releases have also been GPL-compatible; the table below summarizes the various releases. +------------------+----------------+--------------+--------------+-------------------+ | Release | Derived from | Year | Owner | GPL compatible? | |==================|================|==============|==============|===================| | 0.9.0 thru 1.2 | n/a | 1991-1995 | CWI | yes | +------------------+----------------+--------------+--------------+-------------------+ | 1.3 thru 1.5.2 | 1.2 | 1995-1999 | CNRI | yes | +------------------+----------------+--------------+--------------+-------------------+ | 1.6 | 1.5.2 | 2000 | CNRI | no | +------------------+----------------+--------------+--------------+-------------------+ | 2.0 | 1.6 | 2000 | BeOpen.com | no | +------------------+----------------+--------------+--------------+-------------------+ | 1.6.1 | 1.6 | 2001 | CNRI | no | +------------------+----------------+--------------+--------------+-------------------+ | 2.1 | 2.0+1.6.1 | 2001 | PSF | no | +------------------+----------------+--------------+--------------+-------------------+ | 2.0.1 | 2.0+1.6.1 | 2001 | PSF | yes | +------------------+----------------+--------------+--------------+-------------------+ | 2.1.1 | 2.1+2.0.1 | 2001 | PSF | yes | +------------------+----------------+--------------+--------------+-------------------+ | 2.1.2 | 2.1.1 | 2002 | PSF | yes | +------------------+----------------+--------------+--------------+-------------------+ | 2.1.3 | 2.1.2 | 2002 | PSF | yes | +------------------+----------------+--------------+--------------+-------------------+ | 2.2 and above | 2.1.1 | 2001-now | PSF | yes | +------------------+----------------+--------------+--------------+-------------------+ Note: GPL-compatible doesn’t mean that we’re distributing Python under the GPL. All Python licenses, unlike the GPL, let you distribute a modified version without making your changes open source. The GPL- compatible licenses make it possible to combine Python with other software that is released under the GPL; the others don’t. Thanks to the many outside volunteers who have worked under Guido’s direction to make these releases possible. Terms and conditions for accessing or otherwise using Python ============================================================ Python software and documentation are licensed under the PSF License Agreement. Starting with Python 3.8.6, examples, recipes, and other code in the documentation are dual licensed under the PSF License Agreement and the Zero-Clause BSD license. Some software incorporated into Python is under different licenses. The licenses are listed with code falling under that license. See Licenses and Acknowledgements for Incorporated Software for an incomplete list of these licenses. PSF LICENSE AGREEMENT FOR PYTHON 3.9.23 --------------------------------------- 1. This LICENSE AGREEMENT is between the Python Software Foundation ("PSF"), and the Individual or Organization ("Licensee") accessing and otherwise using Python 3.9.23 software in source or binary form and its associated documentation. 2. Subject to the terms and conditions of this License Agreement, PSF hereby grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use Python 3.9.23 alone or in any derivative version, provided, however, that PSF's License Agreement and PSF's notice of copyright, i.e., "Copyright © 2001-2023 Python Software Foundation; All Rights Reserved" are retained in Python 3.9.23 alone or in any derivative version prepared by Licensee. 3. In the event Licensee prepares a derivative work that is based on or incorporates Python 3.9.23 or any part thereof, and wants to make the derivative work available to others as provided herein, then Licensee hereby agrees to include in any such work a brief summary of the changes made to Python 3.9.23. 4. PSF is making Python 3.9.23 available to Licensee on an "AS IS" basis. PSF MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, PSF MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 3.9.23 WILL NOT INFRINGE ANY THIRD PARTY RIGHTS. 5. PSF SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON 3.9.23 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 3.9.23, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. 6. This License Agreement will automatically terminate upon a material breach of its terms and conditions. 7. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between PSF and Licensee. This License Agreement does not grant permission to use PSF trademarks or trade name in a trademark sense to endorse or promote products or services of Licensee, or any third party. 8. By copying, installing or otherwise using Python 3.9.23, Licensee agrees to be bound by the terms and conditions of this License Agreement. BEOPEN.COM LICENSE AGREEMENT FOR PYTHON 2.0 ------------------------------------------- BEOPEN PYTHON OPEN SOURCE LICENSE AGREEMENT VERSION 1 1. This LICENSE AGREEMENT is between BeOpen.com ("BeOpen"), having an office at 160 Saratoga Avenue, Santa Clara, CA 95051, and the Individual or Organization ("Licensee") accessing and otherwise using this software in source or binary form and its associated documentation ("the Software"). 2. Subject to the terms and conditions of this BeOpen Python License Agreement, BeOpen hereby grants Licensee a non-exclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use the Software alone or in any derivative version, provided, however, that the BeOpen Python License is retained in the Software, alone or in any derivative version prepared by Licensee. 3. BeOpen is making the Software available to Licensee on an "AS IS" basis. BEOPEN MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, BEOPEN MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF THE SOFTWARE WILL NOT INFRINGE ANY THIRD PARTY RIGHTS. 4. BEOPEN SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF THE SOFTWARE FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF USING, MODIFYING OR DISTRIBUTING THE SOFTWARE, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. 5. This License Agreement will automatically terminate upon a material breach of its terms and conditions. 6. This License Agreement shall be governed by and interpreted in all respects by the law of the State of California, excluding conflict of law provisions. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between BeOpen and Licensee. This License Agreement does not grant permission to use BeOpen trademarks or trade names in a trademark sense to endorse or promote products or services of Licensee, or any third party. As an exception, the "BeOpen Python" logos available at http://www.pythonlabs.com/logos.html may be used according to the permissions granted on that web page. 7. By copying, installing or otherwise using the software, Licensee agrees to be bound by the terms and conditions of this License Agreement. CNRI LICENSE AGREEMENT FOR PYTHON 1.6.1 --------------------------------------- 1. This LICENSE AGREEMENT is between the Corporation for National Research Initiatives, having an office at 1895 Preston White Drive, Reston, VA 20191 ("CNRI"), and the Individual or Organization ("Licensee") accessing and otherwise using Python 1.6.1 software in source or binary form and its associated documentation. 2. Subject to the terms and conditions of this License Agreement, CNRI hereby grants Licensee a nonexclusive, royalty-free, world-wide license to reproduce, analyze, test, perform and/or display publicly, prepare derivative works, distribute, and otherwise use Python 1.6.1 alone or in any derivative version, provided, however, that CNRI's License Agreement and CNRI's notice of copyright, i.e., "Copyright © 1995-2001 Corporation for National Research Initiatives; All Rights Reserved" are retained in Python 1.6.1 alone or in any derivative version prepared by Licensee. Alternately, in lieu of CNRI's License Agreement, Licensee may substitute the following text (omitting the quotes): "Python 1.6.1 is made available subject to the terms and conditions in CNRI's License Agreement. This Agreement together with Python 1.6.1 may be located on the Internet using the following unique, persistent identifier (known as a handle): 1895.22/1013. This Agreement may also be obtained from a proxy server on the Internet using the following URL: http://hdl.handle.net/1895.22/1013." 3. In the event Licensee prepares a derivative work that is based on or incorporates Python 1.6.1 or any part thereof, and wants to make the derivative work available to others as provided herein, then Licensee hereby agrees to include in any such work a brief summary of the changes made to Python 1.6.1. 4. CNRI is making Python 1.6.1 available to Licensee on an "AS IS" basis. CNRI MAKES NO REPRESENTATIONS OR WARRANTIES, EXPRESS OR IMPLIED. BY WAY OF EXAMPLE, BUT NOT LIMITATION, CNRI MAKES NO AND DISCLAIMS ANY REPRESENTATION OR WARRANTY OF MERCHANTABILITY OR FITNESS FOR ANY PARTICULAR PURPOSE OR THAT THE USE OF PYTHON 1.6.1 WILL NOT INFRINGE ANY THIRD PARTY RIGHTS. 5. CNRI SHALL NOT BE LIABLE TO LICENSEE OR ANY OTHER USERS OF PYTHON 1.6.1 FOR ANY INCIDENTAL, SPECIAL, OR CONSEQUENTIAL DAMAGES OR LOSS AS A RESULT OF MODIFYING, DISTRIBUTING, OR OTHERWISE USING PYTHON 1.6.1, OR ANY DERIVATIVE THEREOF, EVEN IF ADVISED OF THE POSSIBILITY THEREOF. 6. This License Agreement will automatically terminate upon a material breach of its terms and conditions. 7. This License Agreement shall be governed by the federal intellectual property law of the United States, including without limitation the federal copyright law, and, to the extent such U.S. federal law does not apply, by the law of the Commonwealth of Virginia, excluding Virginia's conflict of law provisions. Notwithstanding the foregoing, with regard to derivative works based on Python 1.6.1 that incorporate non-separable material that was previously distributed under the GNU General Public License (GPL), the law of the Commonwealth of Virginia shall govern this License Agreement only as to issues arising under or with respect to Paragraphs 4, 5, and 7 of this License Agreement. Nothing in this License Agreement shall be deemed to create any relationship of agency, partnership, or joint venture between CNRI and Licensee. This License Agreement does not grant permission to use CNRI trademarks or trade name in a trademark sense to endorse or promote products or services of Licensee, or any third party. 8. By clicking on the "ACCEPT" button where indicated, or by copying, installing or otherwise using Python 1.6.1, Licensee agrees to be bound by the terms and conditions of this License Agreement. CWI LICENSE AGREEMENT FOR PYTHON 0.9.0 THROUGH 1.2 -------------------------------------------------- Copyright © 1991 - 1995, Stichting Mathematisch Centrum Amsterdam, The Netherlands. All rights reserved. Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Stichting Mathematisch Centrum or CWI not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. STICHTING MATHEMATISCH CENTRUM DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL STICHTING MATHEMATISCH CENTRUM BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. ZERO-CLAUSE BSD LICENSE FOR CODE IN THE PYTHON 3.9.23 DOCUMENTATION ------------------------------------------------------------------- Permission to use, copy, modify, and/or distribute this software for any purpose with or without fee is hereby granted. THE SOFTWARE IS PROVIDED "AS IS" AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS. IN NO EVENT SHALL THE AUTHOR BE LIABLE FOR ANY SPECIAL, DIRECT, INDIRECT, OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. Licenses and Acknowledgements for Incorporated Software ======================================================= This section is an incomplete, but growing list of licenses and acknowledgements for third-party software incorporated in the Python distribution. Mersenne Twister ---------------- The "_random" module includes code based on a download from http://ww w.math.sci.hiroshima-u.ac.jp/~m-mat/MT/MT2002/emt19937ar.html. The following are the verbatim comments from the original code: A C-program for MT19937, with initialization improved 2002/1/26. Coded by Takuji Nishimura and Makoto Matsumoto. Before using, initialize the state by using init_genrand(seed) or init_by_array(init_key, key_length). Copyright (C) 1997 - 2002, Makoto Matsumoto and Takuji Nishimura, All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. The names of its contributors may not be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Any feedback is very welcome. http://www.math.sci.hiroshima-u.ac.jp/~m-mat/MT/emt.html email: m-mat @ math.sci.hiroshima-u.ac.jp (remove space) Sockets ------- The "socket" module uses the functions, "getaddrinfo()", and "getnameinfo()", which are coded in separate source files from the WIDE Project, http://www.wide.ad.jp/. Copyright (C) 1995, 1996, 1997, and 1998 WIDE Project. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. 3. Neither the name of the project nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE PROJECT AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE PROJECT OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. Asynchronous socket services ---------------------------- The "asynchat" and "asyncore" modules contain the following notice: Copyright 1996 by Sam Rushing All Rights Reserved Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Sam Rushing not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. SAM RUSHING DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL SAM RUSHING BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. Cookie management ----------------- The "http.cookies" module contains the following notice: Copyright 2000 by Timothy O'Malley All Rights Reserved Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Timothy O'Malley not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. Timothy O'Malley DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL Timothy O'Malley BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. Execution tracing ----------------- The "trace" module contains the following notice: portions copyright 2001, Autonomous Zones Industries, Inc., all rights... err... reserved and offered to the public under the terms of the Python 2.2 license. Author: Zooko O'Whielacronx http://zooko.com/ mailto:zooko@zooko.com Copyright 2000, Mojam Media, Inc., all rights reserved. Author: Skip Montanaro Copyright 1999, Bioreason, Inc., all rights reserved. Author: Andrew Dalke Copyright 1995-1997, Automatrix, Inc., all rights reserved. Author: Skip Montanaro Copyright 1991-1995, Stichting Mathematisch Centrum, all rights reserved. Permission to use, copy, modify, and distribute this Python software and its associated documentation for any purpose without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of neither Automatrix, Bioreason or Mojam Media be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. UUencode and UUdecode functions ------------------------------- The "uu" module contains the following notice: Copyright 1994 by Lance Ellinghouse Cathedral City, California Republic, United States of America. All Rights Reserved Permission to use, copy, modify, and distribute this software and its documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appear in all copies and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Lance Ellinghouse not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. LANCE ELLINGHOUSE DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS, IN NO EVENT SHALL LANCE ELLINGHOUSE CENTRUM BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. Modified by Jack Jansen, CWI, July 1995: - Use binascii module to do the actual line-by-line conversion between ascii and binary. This results in a 1000-fold speedup. The C version is still 5 times faster, though. - Arguments more compliant with Python standard XML Remote Procedure Calls -------------------------- The "xmlrpc.client" module contains the following notice: The XML-RPC client interface is Copyright (c) 1999-2002 by Secret Labs AB Copyright (c) 1999-2002 by Fredrik Lundh By obtaining, using, and/or copying this software and/or its associated documentation, you agree that you have read, understood, and will comply with the following terms and conditions: Permission to use, copy, modify, and distribute this software and its associated documentation for any purpose and without fee is hereby granted, provided that the above copyright notice appears in all copies, and that both that copyright notice and this permission notice appear in supporting documentation, and that the name of Secret Labs AB or the author not be used in advertising or publicity pertaining to distribution of the software without specific, written prior permission. SECRET LABS AB AND THE AUTHOR DISCLAIMS ALL WARRANTIES WITH REGARD TO THIS SOFTWARE, INCLUDING ALL IMPLIED WARRANTIES OF MERCHANT- ABILITY AND FITNESS. IN NO EVENT SHALL SECRET LABS AB OR THE AUTHOR BE LIABLE FOR ANY SPECIAL, INDIRECT OR CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER RESULTING FROM LOSS OF USE, DATA OR PROFITS, WHETHER IN AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING OUT OF OR IN CONNECTION WITH THE USE OR PERFORMANCE OF THIS SOFTWARE. test_epoll ---------- The "test_epoll" module contains the following notice: Copyright (c) 2001-2006 Twisted Matrix Laboratories. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. Select kqueue ------------- The "select" module contains the following notice for the kqueue interface: Copyright (c) 2000 Doug White, 2006 James Knight, 2007 Christian Heimes All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS ``AS IS'' AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. SipHash24 --------- The file "Python/pyhash.c" contains Marek Majkowski’ implementation of Dan Bernstein’s SipHash24 algorithm. It contains the following note: Copyright (c) 2013 Marek Majkowski Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. Original location: https://github.com/majek/csiphash/ Solution inspired by code from: Samuel Neves (supercop/crypto_auth/siphash24/little) djb (supercop/crypto_auth/siphash24/little2) Jean-Philippe Aumasson (https://131002.net/siphash/siphash24.c) strtod and dtoa --------------- The file "Python/dtoa.c", which supplies C functions dtoa and strtod for conversion of C doubles to and from strings, is derived from the file of the same name by David M. Gay, currently available from http://www.netlib.org/fp/. The original file, as retrieved on March 16, 2009, contains the following copyright and licensing notice: /**************************************************************** * * The author of this software is David M. Gay. * * Copyright (c) 1991, 2000, 2001 by Lucent Technologies. * * Permission to use, copy, modify, and distribute this software for any * purpose without fee is hereby granted, provided that this entire notice * is included in all copies of any software which is or includes a copy * or modification of this software and in all copies of the supporting * documentation for such software. * * THIS SOFTWARE IS BEING PROVIDED "AS IS", WITHOUT ANY EXPRESS OR IMPLIED * WARRANTY. IN PARTICULAR, NEITHER THE AUTHOR NOR LUCENT MAKES ANY * REPRESENTATION OR WARRANTY OF ANY KIND CONCERNING THE MERCHANTABILITY * OF THIS SOFTWARE OR ITS FITNESS FOR ANY PARTICULAR PURPOSE. * ***************************************************************/ OpenSSL ------- The modules "hashlib", "posix", "ssl", "crypt" use the OpenSSL library for added performance if made available by the operating system. Additionally, the Windows and macOS installers for Python may include a copy of the OpenSSL libraries, so we include a copy of the OpenSSL license here: LICENSE ISSUES ============== The OpenSSL toolkit stays under a dual license, i.e. both the conditions of the OpenSSL License and the original SSLeay license apply to the toolkit. See below for the actual license texts. Actually both licenses are BSD-style Open Source licenses. In case of any license issues related to OpenSSL please contact openssl-core@openssl.org. OpenSSL License --------------- /* ==================================================================== * Copyright (c) 1998-2008 The OpenSSL Project. All rights reserved. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * * 1. Redistributions of source code must retain the above copyright * notice, this list of conditions and the following disclaimer. * * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in * the documentation and/or other materials provided with the * distribution. * * 3. All advertising materials mentioning features or use of this * software must display the following acknowledgment: * "This product includes software developed by the OpenSSL Project * for use in the OpenSSL Toolkit. (http://www.openssl.org/)" * * 4. The names "OpenSSL Toolkit" and "OpenSSL Project" must not be used to * endorse or promote products derived from this software without * prior written permission. For written permission, please contact * openssl-core@openssl.org. * * 5. Products derived from this software may not be called "OpenSSL" * nor may "OpenSSL" appear in their names without prior written * permission of the OpenSSL Project. * * 6. Redistributions of any form whatsoever must retain the following * acknowledgment: * "This product includes software developed by the OpenSSL Project * for use in the OpenSSL Toolkit (http://www.openssl.org/)" * * THIS SOFTWARE IS PROVIDED BY THE OpenSSL PROJECT ``AS IS'' AND ANY * EXPRESSED OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR * PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE OpenSSL PROJECT OR * ITS CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, * SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT * NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; * LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, * STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) * ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED * OF THE POSSIBILITY OF SUCH DAMAGE. * ==================================================================== * * This product includes cryptographic software written by Eric Young * (eay@cryptsoft.com). This product includes software written by Tim * Hudson (tjh@cryptsoft.com). * */ Original SSLeay License ----------------------- /* Copyright (C) 1995-1998 Eric Young (eay@cryptsoft.com) * All rights reserved. * * This package is an SSL implementation written * by Eric Young (eay@cryptsoft.com). * The implementation was written so as to conform with Netscapes SSL. * * This library is free for commercial and non-commercial use as long as * the following conditions are aheared to. The following conditions * apply to all code found in this distribution, be it the RC4, RSA, * lhash, DES, etc., code; not just the SSL code. The SSL documentation * included with this distribution is covered by the same copyright terms * except that the holder is Tim Hudson (tjh@cryptsoft.com). * * Copyright remains Eric Young's, and as such any Copyright notices in * the code are not to be removed. * If this package is used in a product, Eric Young should be given attribution * as the author of the parts of the library used. * This can be in the form of a textual message at program startup or * in documentation (online or textual) provided with the package. * * Redistribution and use in source and binary forms, with or without * modification, are permitted provided that the following conditions * are met: * 1. Redistributions of source code must retain the copyright * notice, this list of conditions and the following disclaimer. * 2. Redistributions in binary form must reproduce the above copyright * notice, this list of conditions and the following disclaimer in the * documentation and/or other materials provided with the distribution. * 3. All advertising materials mentioning features or use of this software * must display the following acknowledgement: * "This product includes cryptographic software written by * Eric Young (eay@cryptsoft.com)" * The word 'cryptographic' can be left out if the rouines from the library * being used are not cryptographic related :-). * 4. If you include any Windows specific code (or a derivative thereof) from * the apps directory (application code) you must include an acknowledgement: * "This product includes software written by Tim Hudson (tjh@cryptsoft.com)" * * THIS SOFTWARE IS PROVIDED BY ERIC YOUNG ``AS IS'' AND * ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE * IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE * ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE * FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL * DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS * OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) * HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT * LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY * OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF * SUCH DAMAGE. * * The licence and distribution terms for any publically available version or * derivative of this code cannot be changed. i.e. this code cannot simply be * copied and put under another distribution licence * [including the GNU Public Licence.] */ expat ----- The "pyexpat" extension is built using an included copy of the expat sources unless the build is configured "--with-system-expat": Copyright (c) 1998, 1999, 2000 Thai Open Source Software Center Ltd and Clark Cooper Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. libffi ------ The "_ctypes" extension is built using an included copy of the libffi sources unless the build is configured "--with-system-libffi": Copyright (c) 1996-2008 Red Hat, Inc and others. Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the ``Software''), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. THE SOFTWARE IS PROVIDED ``AS IS'', WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. zlib ---- The "zlib" extension is built using an included copy of the zlib sources if the zlib version found on the system is too old to be used for the build: Copyright (C) 1995-2011 Jean-loup Gailly and Mark Adler This software is provided 'as-is', without any express or implied warranty. In no event will the authors be held liable for any damages arising from the use of this software. Permission is granted to anyone to use this software for any purpose, including commercial applications, and to alter it and redistribute it freely, subject to the following restrictions: 1. The origin of this software must not be misrepresented; you must not claim that you wrote the original software. If you use this software in a product, an acknowledgment in the product documentation would be appreciated but is not required. 2. Altered source versions must be plainly marked as such, and must not be misrepresented as being the original software. 3. This notice may not be removed or altered from any source distribution. Jean-loup Gailly Mark Adler jloup@gzip.org madler@alumni.caltech.edu cfuhash ------- The implementation of the hash table used by the "tracemalloc" is based on the cfuhash project: Copyright (c) 2005 Don Owens All rights reserved. This code is released under the BSD license: Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of the author nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. libmpdec -------- The "_decimal" module is built using an included copy of the libmpdec library unless the build is configured "--with-system-libmpdec": Copyright (c) 2008-2020 Stefan Krah. All rights reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: 1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer. 2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. THIS SOFTWARE IS PROVIDED BY THE AUTHOR AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE AUTHOR OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. W3C C14N test suite ------------------- The C14N 2.0 test suite in the "test" package ("Lib/test/xmltestdata/c14n-20/") was retrieved from the W3C website at https://www.w3.org/TR/xml-c14n2-testcases/ and is distributed under the 3-clause BSD license: Copyright (c) 2013 W3C(R) (MIT, ERCIM, Keio, Beihang), All Rights Reserved. Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met: * Redistributions of works must retain the original copyright notice, this list of conditions and the following disclaimer. * Redistributions in binary form must reproduce the original copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution. * Neither the name of the W3C nor the names of its contributors may be used to endorse or promote products derived from this work without specific prior written permission. THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 8. Compound statements ********************** Compound statements contain (groups of) other statements; they affect or control the execution of those other statements in some way. In general, compound statements span multiple lines, although in simple incarnations a whole compound statement may be contained in one line. The "if", "while" and "for" statements implement traditional control flow constructs. "try" specifies exception handlers and/or cleanup code for a group of statements, while the "with" statement allows the execution of initialization and finalization code around a block of code. Function and class definitions are also syntactically compound statements. A compound statement consists of one or more ‘clauses.’ A clause consists of a header and a ‘suite.’ The clause headers of a particular compound statement are all at the same indentation level. Each clause header begins with a uniquely identifying keyword and ends with a colon. A suite is a group of statements controlled by a clause. A suite can be one or more semicolon-separated simple statements on the same line as the header, following the header’s colon, or it can be one or more indented statements on subsequent lines. Only the latter form of a suite can contain nested compound statements; the following is illegal, mostly because it wouldn’t be clear to which "if" clause a following "else" clause would belong: if test1: if test2: print(x) Also note that the semicolon binds tighter than the colon in this context, so that in the following example, either all or none of the "print()" calls are executed: if x < y < z: print(x); print(y); print(z) Summarizing: compound_stmt ::= if_stmt | while_stmt | for_stmt | try_stmt | with_stmt | funcdef | classdef | async_with_stmt | async_for_stmt | async_funcdef suite ::= stmt_list NEWLINE | NEWLINE INDENT statement+ DEDENT statement ::= stmt_list NEWLINE | compound_stmt stmt_list ::= simple_stmt (";" simple_stmt)* [";"] Note that statements always end in a "NEWLINE" possibly followed by a "DEDENT". Also note that optional continuation clauses always begin with a keyword that cannot start a statement, thus there are no ambiguities (the ‘dangling "else"’ problem is solved in Python by requiring nested "if" statements to be indented). The formatting of the grammar rules in the following sections places each clause on a separate line for clarity. 8.1. The "if" statement ======================= The "if" statement is used for conditional execution: if_stmt ::= "if" assignment_expression ":" suite ("elif" assignment_expression ":" suite)* ["else" ":" suite] It selects exactly one of the suites by evaluating the expressions one by one until one is found to be true (see section Boolean operations for the definition of true and false); then that suite is executed (and no other part of the "if" statement is executed or evaluated). If all expressions are false, the suite of the "else" clause, if present, is executed. 8.2. The "while" statement ========================== The "while" statement is used for repeated execution as long as an expression is true: while_stmt ::= "while" assignment_expression ":" suite ["else" ":" suite] This repeatedly tests the expression and, if it is true, executes the first suite; if the expression is false (which may be the first time it is tested) the suite of the "else" clause, if present, is executed and the loop terminates. A "break" statement executed in the first suite terminates the loop without executing the "else" clause’s suite. A "continue" statement executed in the first suite skips the rest of the suite and goes back to testing the expression. 8.3. The "for" statement ======================== The "for" statement is used to iterate over the elements of a sequence (such as a string, tuple or list) or other iterable object: for_stmt ::= "for" target_list "in" expression_list ":" suite ["else" ":" suite] The expression list is evaluated once; it should yield an iterable object. An iterator is created for the result of the "expression_list". The suite is then executed once for each item provided by the iterator, in the order returned by the iterator. Each item in turn is assigned to the target list using the standard rules for assignments (see Assignment statements), and then the suite is executed. When the items are exhausted (which is immediately when the sequence is empty or an iterator raises a "StopIteration" exception), the suite in the "else" clause, if present, is executed, and the loop terminates. A "break" statement executed in the first suite terminates the loop without executing the "else" clause’s suite. A "continue" statement executed in the first suite skips the rest of the suite and continues with the next item, or with the "else" clause if there is no next item. The for-loop makes assignments to the variables in the target list. This overwrites all previous assignments to those variables including those made in the suite of the for-loop: for i in range(10): print(i) i = 5 # this will not affect the for-loop # because i will be overwritten with the next # index in the range Names in the target list are not deleted when the loop is finished, but if the sequence is empty, they will not have been assigned to at all by the loop. Hint: the built-in function "range()" returns an iterator of integers suitable to emulate the effect of Pascal’s "for i := a to b do"; e.g., "list(range(3))" returns the list "[0, 1, 2]". Note: There is a subtlety when the sequence is being modified by the loop (this can only occur for mutable sequences, e.g. lists). An internal counter is used to keep track of which item is used next, and this is incremented on each iteration. When this counter has reached the length of the sequence the loop terminates. This means that if the suite deletes the current (or a previous) item from the sequence, the next item will be skipped (since it gets the index of the current item which has already been treated). Likewise, if the suite inserts an item in the sequence before the current item, the current item will be treated again the next time through the loop. This can lead to nasty bugs that can be avoided by making a temporary copy using a slice of the whole sequence, e.g., for x in a[:]: if x < 0: a.remove(x) 8.4. The "try" statement ======================== The "try" statement specifies exception handlers and/or cleanup code for a group of statements: try_stmt ::= try1_stmt | try2_stmt try1_stmt ::= "try" ":" suite ("except" [expression ["as" identifier]] ":" suite)+ ["else" ":" suite] ["finally" ":" suite] try2_stmt ::= "try" ":" suite "finally" ":" suite The "except" clause(s) specify one or more exception handlers. When no exception occurs in the "try" clause, no exception handler is executed. When an exception occurs in the "try" suite, a search for an exception handler is started. This search inspects the except clauses in turn until one is found that matches the exception. An expression- less except clause, if present, must be last; it matches any exception. For an except clause with an expression, that expression is evaluated, and the clause matches the exception if the resulting object is “compatible” with the exception. An object is compatible with an exception if the object is the class or a *non-virtual base class* of the exception object, or a tuple containing an item that is the class or a non-virtual base class of the exception object. If no except clause matches the exception, the search for an exception handler continues in the surrounding code and on the invocation stack. [1] If the evaluation of an expression in the header of an except clause raises an exception, the original search for a handler is canceled and a search starts for the new exception in the surrounding code and on the call stack (it is treated as if the entire "try" statement raised the exception). When a matching except clause is found, the exception is assigned to the target specified after the "as" keyword in that except clause, if present, and the except clause’s suite is executed. All except clauses must have an executable block. When the end of this block is reached, execution continues normally after the entire try statement. (This means that if two nested handlers exist for the same exception, and the exception occurs in the try clause of the inner handler, the outer handler will not handle the exception.) When an exception has been assigned using "as target", it is cleared at the end of the except clause. This is as if except E as N: foo was translated to except E as N: try: foo finally: del N This means the exception must be assigned to a different name to be able to refer to it after the except clause. Exceptions are cleared because with the traceback attached to them, they form a reference cycle with the stack frame, keeping all locals in that frame alive until the next garbage collection occurs. Before an except clause’s suite is executed, details about the exception are stored in the "sys" module and can be accessed via "sys.exc_info()". "sys.exc_info()" returns a 3-tuple consisting of the exception class, the exception instance and a traceback object (see section The standard type hierarchy) identifying the point in the program where the exception occurred. "sys.exc_info()" values are restored to their previous values (before the call) when returning from a function that handled an exception. The optional "else" clause is executed if the control flow leaves the "try" suite, no exception was raised, and no "return", "continue", or "break" statement was executed. Exceptions in the "else" clause are not handled by the preceding "except" clauses. If "finally" is present, it specifies a ‘cleanup’ handler. The "try" clause is executed, including any "except" and "else" clauses. If an exception occurs in any of the clauses and is not handled, the exception is temporarily saved. The "finally" clause is executed. If there is a saved exception it is re-raised at the end of the "finally" clause. If the "finally" clause raises another exception, the saved exception is set as the context of the new exception. If the "finally" clause executes a "return", "break" or "continue" statement, the saved exception is discarded: >>> def f(): ... try: ... 1/0 ... finally: ... return 42 ... >>> f() 42 The exception information is not available to the program during execution of the "finally" clause. When a "return", "break" or "continue" statement is executed in the "try" suite of a "try"…"finally" statement, the "finally" clause is also executed ‘on the way out.’ The return value of a function is determined by the last "return" statement executed. Since the "finally" clause always executes, a "return" statement executed in the "finally" clause will always be the last one executed: >>> def foo(): ... try: ... return 'try' ... finally: ... return 'finally' ... >>> foo() 'finally' Additional information on exceptions can be found in section Exceptions, and information on using the "raise" statement to generate exceptions may be found in section The raise statement. Changed in version 3.8: Prior to Python 3.8, a "continue" statement was illegal in the "finally" clause due to a problem with the implementation. 8.5. The "with" statement ========================= The "with" statement is used to wrap the execution of a block with methods defined by a context manager (see section With Statement Context Managers). This allows common "try"…"except"…"finally" usage patterns to be encapsulated for convenient reuse. with_stmt ::= "with" with_item ("," with_item)* ":" suite with_item ::= expression ["as" target] The execution of the "with" statement with one “item” proceeds as follows: 1. The context expression (the expression given in the "with_item") is evaluated to obtain a context manager. 2. The context manager’s "__enter__()" is loaded for later use. 3. The context manager’s "__exit__()" is loaded for later use. 4. The context manager’s "__enter__()" method is invoked. 5. If a target was included in the "with" statement, the return value from "__enter__()" is assigned to it. Note: The "with" statement guarantees that if the "__enter__()" method returns without an error, then "__exit__()" will always be called. Thus, if an error occurs during the assignment to the target list, it will be treated the same as an error occurring within the suite would be. See step 6 below. 6. The suite is executed. 7. The context manager’s "__exit__()" method is invoked. If an exception caused the suite to be exited, its type, value, and traceback are passed as arguments to "__exit__()". Otherwise, three "None" arguments are supplied. If the suite was exited due to an exception, and the return value from the "__exit__()" method was false, the exception is reraised. If the return value was true, the exception is suppressed, and execution continues with the statement following the "with" statement. If the suite was exited for any reason other than an exception, the return value from "__exit__()" is ignored, and execution proceeds at the normal location for the kind of exit that was taken. The following code: with EXPRESSION as TARGET: SUITE is semantically equivalent to: manager = (EXPRESSION) enter = type(manager).__enter__ exit = type(manager).__exit__ value = enter(manager) hit_except = False try: TARGET = value SUITE except: hit_except = True if not exit(manager, *sys.exc_info()): raise finally: if not hit_except: exit(manager, None, None, None) With more than one item, the context managers are processed as if multiple "with" statements were nested: with A() as a, B() as b: SUITE is semantically equivalent to: with A() as a: with B() as b: SUITE Changed in version 3.1: Support for multiple context expressions. See also: **PEP 343** - The “with” statement The specification, background, and examples for the Python "with" statement. 8.6. Function definitions ========================= A function definition defines a user-defined function object (see section The standard type hierarchy): funcdef ::= [decorators] "def" funcname "(" [parameter_list] ")" ["->" expression] ":" suite decorators ::= decorator+ decorator ::= "@" assignment_expression NEWLINE parameter_list ::= defparameter ("," defparameter)* "," "/" ["," [parameter_list_no_posonly]] | parameter_list_no_posonly parameter_list_no_posonly ::= defparameter ("," defparameter)* ["," [parameter_list_starargs]] | parameter_list_starargs parameter_list_starargs ::= "*" [parameter] ("," defparameter)* ["," ["**" parameter [","]]] | "**" parameter [","] parameter ::= identifier [":" expression] defparameter ::= parameter ["=" expression] funcname ::= identifier A function definition is an executable statement. Its execution binds the function name in the current local namespace to a function object (a wrapper around the executable code for the function). This function object contains a reference to the current global namespace as the global namespace to be used when the function is called. The function definition does not execute the function body; this gets executed only when the function is called. [2] A function definition may be wrapped by one or more *decorator* expressions. Decorator expressions are evaluated when the function is defined, in the scope that contains the function definition. The result must be a callable, which is invoked with the function object as the only argument. The returned value is bound to the function name instead of the function object. Multiple decorators are applied in nested fashion. For example, the following code @f1(arg) @f2 def func(): pass is roughly equivalent to def func(): pass func = f1(arg)(f2(func)) except that the original function is not temporarily bound to the name "func". Changed in version 3.9: Functions may be decorated with any valid "assignment_expression". Previously, the grammar was much more restrictive; see **PEP 614** for details. When one or more *parameters* have the form *parameter* "=" *expression*, the function is said to have “default parameter values.” For a parameter with a default value, the corresponding *argument* may be omitted from a call, in which case the parameter’s default value is substituted. If a parameter has a default value, all following parameters up until the “"*"” must also have a default value — this is a syntactic restriction that is not expressed by the grammar. **Default parameter values are evaluated from left to right when the function definition is executed.** This means that the expression is evaluated once, when the function is defined, and that the same “pre- computed” value is used for each call. This is especially important to understand when a default parameter is a mutable object, such as a list or a dictionary: if the function modifies the object (e.g. by appending an item to a list), the default value is in effect modified. This is generally not what was intended. A way around this is to use "None" as the default, and explicitly test for it in the body of the function, e.g.: def whats_on_the_telly(penguin=None): if penguin is None: penguin = [] penguin.append("property of the zoo") return penguin Function call semantics are described in more detail in section Calls. A function call always assigns values to all parameters mentioned in the parameter list, either from positional arguments, from keyword arguments, or from default values. If the form “"*identifier"” is present, it is initialized to a tuple receiving any excess positional parameters, defaulting to the empty tuple. If the form “"**identifier"” is present, it is initialized to a new ordered mapping receiving any excess keyword arguments, defaulting to a new empty mapping of the same type. Parameters after “"*"” or “"*identifier"” are keyword-only parameters and may only be passed by keyword arguments. Parameters before “"/"” are positional-only parameters and may only be passed by positional arguments. Changed in version 3.8: The "/" function parameter syntax may be used to indicate positional-only parameters. See **PEP 570** for details. Parameters may have an *annotation* of the form “": expression"” following the parameter name. Any parameter may have an annotation, even those of the form "*identifier" or "**identifier". Functions may have “return” annotation of the form “"-> expression"” after the parameter list. These annotations can be any valid Python expression. The presence of annotations does not change the semantics of a function. The annotation values are available as values of a dictionary keyed by the parameters’ names in the "__annotations__" attribute of the function object. If the "annotations" import from "__future__" is used, annotations are preserved as strings at runtime which enables postponed evaluation. Otherwise, they are evaluated when the function definition is executed. In this case annotations may be evaluated in a different order than they appear in the source code. It is also possible to create anonymous functions (functions not bound to a name), for immediate use in expressions. This uses lambda expressions, described in section Lambdas. Note that the lambda expression is merely a shorthand for a simplified function definition; a function defined in a “"def"” statement can be passed around or assigned to another name just like a function defined by a lambda expression. The “"def"” form is actually more powerful since it allows the execution of multiple statements and annotations. **Programmer’s note:** Functions are first-class objects. A “"def"” statement executed inside a function definition defines a local function that can be returned or passed around. Free variables used in the nested function can access the local variables of the function containing the def. See section Naming and binding for details. See also: **PEP 3107** - Function Annotations The original specification for function annotations. **PEP 484** - Type Hints Definition of a standard meaning for annotations: type hints. **PEP 526** - Syntax for Variable Annotations Ability to type hint variable declarations, including class variables and instance variables **PEP 563** - Postponed Evaluation of Annotations Support for forward references within annotations by preserving annotations in a string form at runtime instead of eager evaluation. 8.7. Class definitions ====================== A class definition defines a class object (see section The standard type hierarchy): classdef ::= [decorators] "class" classname [inheritance] ":" suite inheritance ::= "(" [argument_list] ")" classname ::= identifier A class definition is an executable statement. The inheritance list usually gives a list of base classes (see Metaclasses for more advanced uses), so each item in the list should evaluate to a class object which allows subclassing. Classes without an inheritance list inherit, by default, from the base class "object"; hence, class Foo: pass is equivalent to class Foo(object): pass The class’s suite is then executed in a new execution frame (see Naming and binding), using a newly created local namespace and the original global namespace. (Usually, the suite contains mostly function definitions.) When the class’s suite finishes execution, its execution frame is discarded but its local namespace is saved. [3] A class object is then created using the inheritance list for the base classes and the saved local namespace for the attribute dictionary. The class name is bound to this class object in the original local namespace. The order in which attributes are defined in the class body is preserved in the new class’s "__dict__". Note that this is reliable only right after the class is created and only for classes that were defined using the definition syntax. Class creation can be customized heavily using metaclasses. Classes can also be decorated: just like when decorating functions, @f1(arg) @f2 class Foo: pass is roughly equivalent to class Foo: pass Foo = f1(arg)(f2(Foo)) The evaluation rules for the decorator expressions are the same as for function decorators. The result is then bound to the class name. Changed in version 3.9: Classes may be decorated with any valid "assignment_expression". Previously, the grammar was much more restrictive; see **PEP 614** for details. **Programmer’s note:** Variables defined in the class definition are class attributes; they are shared by instances. Instance attributes can be set in a method with "self.name = value". Both class and instance attributes are accessible through the notation “"self.name"”, and an instance attribute hides a class attribute with the same name when accessed in this way. Class attributes can be used as defaults for instance attributes, but using mutable values there can lead to unexpected results. Descriptors can be used to create instance variables with different implementation details. See also: **PEP 3115** - Metaclasses in Python 3000 The proposal that changed the declaration of metaclasses to the current syntax, and the semantics for how classes with metaclasses are constructed. **PEP 3129** - Class Decorators The proposal that added class decorators. Function and method decorators were introduced in **PEP 318**. 8.8. Coroutines =============== New in version 3.5. 8.8.1. Coroutine function definition ------------------------------------ async_funcdef ::= [decorators] "async" "def" funcname "(" [parameter_list] ")" ["->" expression] ":" suite Execution of Python coroutines can be suspended and resumed at many points (see *coroutine*). Inside the body of a coroutine function, "await" and "async" identifiers become reserved keywords; "await" expressions, "async for" and "async with" can only be used in coroutine function bodies. Functions defined with "async def" syntax are always coroutine functions, even if they do not contain "await" or "async" keywords. It is a "SyntaxError" to use a "yield from" expression inside the body of a coroutine function. An example of a coroutine function: async def func(param1, param2): do_stuff() await some_coroutine() 8.8.2. The "async for" statement -------------------------------- async_for_stmt ::= "async" for_stmt An *asynchronous iterable* provides an "__aiter__" method that directly returns an *asynchronous iterator*, which can call asynchronous code in its "__anext__" method. The "async for" statement allows convenient iteration over asynchronous iterables. The following code: async for TARGET in ITER: SUITE else: SUITE2 Is semantically equivalent to: iter = (ITER) iter = type(iter).__aiter__(iter) running = True while running: try: TARGET = await type(iter).__anext__(iter) except StopAsyncIteration: running = False else: SUITE else: SUITE2 See also "__aiter__()" and "__anext__()" for details. It is a "SyntaxError" to use an "async for" statement outside the body of a coroutine function. 8.8.3. The "async with" statement --------------------------------- async_with_stmt ::= "async" with_stmt An *asynchronous context manager* is a *context manager* that is able to suspend execution in its *enter* and *exit* methods. The following code: async with EXPRESSION as TARGET: SUITE is semantically equivalent to: manager = (EXPRESSION) aenter = type(manager).__aenter__ aexit = type(manager).__aexit__ value = await aenter(manager) hit_except = False try: TARGET = value SUITE except: hit_except = True if not await aexit(manager, *sys.exc_info()): raise finally: if not hit_except: await aexit(manager, None, None, None) See also "__aenter__()" and "__aexit__()" for details. It is a "SyntaxError" to use an "async with" statement outside the body of a coroutine function. See also: **PEP 492** - Coroutines with async and await syntax The proposal that made coroutines a proper standalone concept in Python, and added supporting syntax. -[ Footnotes ]- [1] The exception is propagated to the invocation stack unless there is a "finally" clause which happens to raise another exception. That new exception causes the old one to be lost. [2] A string literal appearing as the first statement in the function body is transformed into the function’s "__doc__" attribute and therefore the function’s *docstring*. [3] A string literal appearing as the first statement in the class body is transformed into the namespace’s "__doc__" item and therefore the class’s *docstring*. 3. Data model ************* 3.1. Objects, values and types ============================== *Objects* are Python’s abstraction for data. All data in a Python program is represented by objects or by relations between objects. (In a sense, and in conformance to Von Neumann’s model of a “stored program computer”, code is also represented by objects.) Every object has an identity, a type and a value. An object’s *identity* never changes once it has been created; you may think of it as the object’s address in memory. The ‘"is"’ operator compares the identity of two objects; the "id()" function returns an integer representing its identity. **CPython implementation detail:** For CPython, "id(x)" is the memory address where "x" is stored. An object’s type determines the operations that the object supports (e.g., “does it have a length?”) and also defines the possible values for objects of that type. The "type()" function returns an object’s type (which is an object itself). Like its identity, an object’s *type* is also unchangeable. [1] The *value* of some objects can change. Objects whose value can change are said to be *mutable*; objects whose value is unchangeable once they are created are called *immutable*. (The value of an immutable container object that contains a reference to a mutable object can change when the latter’s value is changed; however the container is still considered immutable, because the collection of objects it contains cannot be changed. So, immutability is not strictly the same as having an unchangeable value, it is more subtle.) An object’s mutability is determined by its type; for instance, numbers, strings and tuples are immutable, while dictionaries and lists are mutable. Objects are never explicitly destroyed; however, when they become unreachable they may be garbage-collected. An implementation is allowed to postpone garbage collection or omit it altogether — it is a matter of implementation quality how garbage collection is implemented, as long as no objects are collected that are still reachable. **CPython implementation detail:** CPython currently uses a reference- counting scheme with (optional) delayed detection of cyclically linked garbage, which collects most objects as soon as they become unreachable, but is not guaranteed to collect garbage containing circular references. See the documentation of the "gc" module for information on controlling the collection of cyclic garbage. Other implementations act differently and CPython may change. Do not depend on immediate finalization of objects when they become unreachable (so you should always close files explicitly). Note that the use of the implementation’s tracing or debugging facilities may keep objects alive that would normally be collectable. Also note that catching an exception with a ‘"try"…"except"’ statement may keep objects alive. Some objects contain references to “external” resources such as open files or windows. It is understood that these resources are freed when the object is garbage-collected, but since garbage collection is not guaranteed to happen, such objects also provide an explicit way to release the external resource, usually a "close()" method. Programs are strongly recommended to explicitly close such objects. The ‘"try"…"finally"’ statement and the ‘"with"’ statement provide convenient ways to do this. Some objects contain references to other objects; these are called *containers*. Examples of containers are tuples, lists and dictionaries. The references are part of a container’s value. In most cases, when we talk about the value of a container, we imply the values, not the identities of the contained objects; however, when we talk about the mutability of a container, only the identities of the immediately contained objects are implied. So, if an immutable container (like a tuple) contains a reference to a mutable object, its value changes if that mutable object is changed. Types affect almost all aspects of object behavior. Even the importance of object identity is affected in some sense: for immutable types, operations that compute new values may actually return a reference to any existing object with the same type and value, while for mutable objects this is not allowed. E.g., after "a = 1; b = 1", "a" and "b" may or may not refer to the same object with the value one, depending on the implementation, but after "c = []; d = []", "c" and "d" are guaranteed to refer to two different, unique, newly created empty lists. (Note that "c = d = []" assigns the same object to both "c" and "d".) 3.2. The standard type hierarchy ================================ Below is a list of the types that are built into Python. Extension modules (written in C, Java, or other languages, depending on the implementation) can define additional types. Future versions of Python may add types to the type hierarchy (e.g., rational numbers, efficiently stored arrays of integers, etc.), although such additions will often be provided via the standard library instead. Some of the type descriptions below contain a paragraph listing ‘special attributes.’ These are attributes that provide access to the implementation and are not intended for general use. Their definition may change in the future. None This type has a single value. There is a single object with this value. This object is accessed through the built-in name "None". It is used to signify the absence of a value in many situations, e.g., it is returned from functions that don’t explicitly return anything. Its truth value is false. NotImplemented This type has a single value. There is a single object with this value. This object is accessed through the built-in name "NotImplemented". Numeric methods and rich comparison methods should return this value if they do not implement the operation for the operands provided. (The interpreter will then try the reflected operation, or some other fallback, depending on the operator.) It should not be evaluated in a boolean context. See Implementing the arithmetic operations for more details. Changed in version 3.9: Evaluating "NotImplemented" in a boolean context is deprecated. While it currently evaluates as true, it will emit a "DeprecationWarning". It will raise a "TypeError" in a future version of Python. Ellipsis This type has a single value. There is a single object with this value. This object is accessed through the literal "..." or the built-in name "Ellipsis". Its truth value is true. "numbers.Number" These are created by numeric literals and returned as results by arithmetic operators and arithmetic built-in functions. Numeric objects are immutable; once created their value never changes. Python numbers are of course strongly related to mathematical numbers, but subject to the limitations of numerical representation in computers. The string representations of the numeric classes, computed by "__repr__()" and "__str__()", have the following properties: * They are valid numeric literals which, when passed to their class constructor, produce an object having the value of the original numeric. * The representation is in base 10, when possible. * Leading zeros, possibly excepting a single zero before a decimal point, are not shown. * Trailing zeros, possibly excepting a single zero after a decimal point, are not shown. * A sign is shown only when the number is negative. Python distinguishes between integers, floating point numbers, and complex numbers: "numbers.Integral" These represent elements from the mathematical set of integers (positive and negative). There are two types of integers: Integers ("int") These represent numbers in an unlimited range, subject to available (virtual) memory only. For the purpose of shift and mask operations, a binary representation is assumed, and negative numbers are represented in a variant of 2’s complement which gives the illusion of an infinite string of sign bits extending to the left. Booleans ("bool") These represent the truth values False and True. The two objects representing the values "False" and "True" are the only Boolean objects. The Boolean type is a subtype of the integer type, and Boolean values behave like the values 0 and 1, respectively, in almost all contexts, the exception being that when converted to a string, the strings ""False"" or ""True"" are returned, respectively. The rules for integer representation are intended to give the most meaningful interpretation of shift and mask operations involving negative integers. "numbers.Real" ("float") These represent machine-level double precision floating point numbers. You are at the mercy of the underlying machine architecture (and C or Java implementation) for the accepted range and handling of overflow. Python does not support single- precision floating point numbers; the savings in processor and memory usage that are usually the reason for using these are dwarfed by the overhead of using objects in Python, so there is no reason to complicate the language with two kinds of floating point numbers. "numbers.Complex" ("complex") These represent complex numbers as a pair of machine-level double precision floating point numbers. The same caveats apply as for floating point numbers. The real and imaginary parts of a complex number "z" can be retrieved through the read-only attributes "z.real" and "z.imag". Sequences These represent finite ordered sets indexed by non-negative numbers. The built-in function "len()" returns the number of items of a sequence. When the length of a sequence is *n*, the index set contains the numbers 0, 1, …, *n*-1. Item *i* of sequence *a* is selected by "a[i]". Sequences also support slicing: "a[i:j]" selects all items with index *k* such that *i* "<=" *k* "<" *j*. When used as an expression, a slice is a sequence of the same type. This implies that the index set is renumbered so that it starts at 0. Some sequences also support “extended slicing” with a third “step” parameter: "a[i:j:k]" selects all items of *a* with index *x* where "x = i + n*k", *n* ">=" "0" and *i* "<=" *x* "<" *j*. Sequences are distinguished according to their mutability: Immutable sequences An object of an immutable sequence type cannot change once it is created. (If the object contains references to other objects, these other objects may be mutable and may be changed; however, the collection of objects directly referenced by an immutable object cannot change.) The following types are immutable sequences: Strings A string is a sequence of values that represent Unicode code points. All the code points in the range "U+0000 - U+10FFFF" can be represented in a string. Python doesn’t have a "char" type; instead, every code point in the string is represented as a string object with length "1". The built-in function "ord()" converts a code point from its string form to an integer in the range "0 - 10FFFF"; "chr()" converts an integer in the range "0 - 10FFFF" to the corresponding length "1" string object. "str.encode()" can be used to convert a "str" to "bytes" using the given text encoding, and "bytes.decode()" can be used to achieve the opposite. Tuples The items of a tuple are arbitrary Python objects. Tuples of two or more items are formed by comma-separated lists of expressions. A tuple of one item (a ‘singleton’) can be formed by affixing a comma to an expression (an expression by itself does not create a tuple, since parentheses must be usable for grouping of expressions). An empty tuple can be formed by an empty pair of parentheses. Bytes A bytes object is an immutable array. The items are 8-bit bytes, represented by integers in the range 0 <= x < 256. Bytes literals (like "b'abc'") and the built-in "bytes()" constructor can be used to create bytes objects. Also, bytes objects can be decoded to strings via the "decode()" method. Mutable sequences Mutable sequences can be changed after they are created. The subscription and slicing notations can be used as the target of assignment and "del" (delete) statements. There are currently two intrinsic mutable sequence types: Lists The items of a list are arbitrary Python objects. Lists are formed by placing a comma-separated list of expressions in square brackets. (Note that there are no special cases needed to form lists of length 0 or 1.) Byte Arrays A bytearray object is a mutable array. They are created by the built-in "bytearray()" constructor. Aside from being mutable (and hence unhashable), byte arrays otherwise provide the same interface and functionality as immutable "bytes" objects. The extension module "array" provides an additional example of a mutable sequence type, as does the "collections" module. Set types These represent unordered, finite sets of unique, immutable objects. As such, they cannot be indexed by any subscript. However, they can be iterated over, and the built-in function "len()" returns the number of items in a set. Common uses for sets are fast membership testing, removing duplicates from a sequence, and computing mathematical operations such as intersection, union, difference, and symmetric difference. For set elements, the same immutability rules apply as for dictionary keys. Note that numeric types obey the normal rules for numeric comparison: if two numbers compare equal (e.g., "1" and "1.0"), only one of them can be contained in a set. There are currently two intrinsic set types: Sets These represent a mutable set. They are created by the built-in "set()" constructor and can be modified afterwards by several methods, such as "add()". Frozen sets These represent an immutable set. They are created by the built-in "frozenset()" constructor. As a frozenset is immutable and *hashable*, it can be used again as an element of another set, or as a dictionary key. Mappings These represent finite sets of objects indexed by arbitrary index sets. The subscript notation "a[k]" selects the item indexed by "k" from the mapping "a"; this can be used in expressions and as the target of assignments or "del" statements. The built-in function "len()" returns the number of items in a mapping. There is currently a single intrinsic mapping type: Dictionaries These represent finite sets of objects indexed by nearly arbitrary values. The only types of values not acceptable as keys are values containing lists or dictionaries or other mutable types that are compared by value rather than by object identity, the reason being that the efficient implementation of dictionaries requires a key’s hash value to remain constant. Numeric types used for keys obey the normal rules for numeric comparison: if two numbers compare equal (e.g., "1" and "1.0") then they can be used interchangeably to index the same dictionary entry. Dictionaries preserve insertion order, meaning that keys will be produced in the same order they were added sequentially over the dictionary. Replacing an existing key does not change the order, however removing a key and re-inserting it will add it to the end instead of keeping its old place. Dictionaries are mutable; they can be created by the "{...}" notation (see section Dictionary displays). The extension modules "dbm.ndbm" and "dbm.gnu" provide additional examples of mapping types, as does the "collections" module. Changed in version 3.7: Dictionaries did not preserve insertion order in versions of Python before 3.6. In CPython 3.6, insertion order was preserved, but it was considered an implementation detail at that time rather than a language guarantee. Callable types These are the types to which the function call operation (see section Calls) can be applied: User-defined functions A user-defined function object is created by a function definition (see section Function definitions). It should be called with an argument list containing the same number of items as the function’s formal parameter list. Special attributes: +---------------------------+---------------------------------+-------------+ | Attribute | Meaning | | |===========================|=================================|=============| | "__doc__" | The function’s documentation | Writable | | | string, or "None" if | | | | unavailable; not inherited by | | | | subclasses. | | +---------------------------+---------------------------------+-------------+ | "__name__" | The function’s name. | Writable | +---------------------------+---------------------------------+-------------+ | "__qualname__" | The function’s *qualified | Writable | | | name*. New in version 3.3. | | +---------------------------+---------------------------------+-------------+ | "__module__" | The name of the module the | Writable | | | function was defined in, or | | | | "None" if unavailable. | | +---------------------------+---------------------------------+-------------+ | "__defaults__" | A tuple containing default | Writable | | | argument values for those | | | | arguments that have defaults, | | | | or "None" if no arguments have | | | | a default value. | | +---------------------------+---------------------------------+-------------+ | "__code__" | The code object representing | Writable | | | the compiled function body. | | +---------------------------+---------------------------------+-------------+ | "__globals__" | A reference to the dictionary | Read-only | | | that holds the function’s | | | | global variables — the global | | | | namespace of the module in | | | | which the function was defined. | | +---------------------------+---------------------------------+-------------+ | "__dict__" | The namespace supporting | Writable | | | arbitrary function attributes. | | +---------------------------+---------------------------------+-------------+ | "__closure__" | "None" or a tuple of cells that | Read-only | | | contain bindings for the | | | | function’s free variables. See | | | | below for information on the | | | | "cell_contents" attribute. | | +---------------------------+---------------------------------+-------------+ | "__annotations__" | A dict containing annotations | Writable | | | of parameters. The keys of the | | | | dict are the parameter names, | | | | and "'return'" for the return | | | | annotation, if provided. | | +---------------------------+---------------------------------+-------------+ | "__kwdefaults__" | A dict containing defaults for | Writable | | | keyword-only parameters. | | +---------------------------+---------------------------------+-------------+ Most of the attributes labelled “Writable” check the type of the assigned value. Function objects also support getting and setting arbitrary attributes, which can be used, for example, to attach metadata to functions. Regular attribute dot-notation is used to get and set such attributes. *Note that the current implementation only supports function attributes on user-defined functions. Function attributes on built-in functions may be supported in the future.* A cell object has the attribute "cell_contents". This can be used to get the value of the cell, as well as set the value. Additional information about a function’s definition can be retrieved from its code object; see the description of internal types below. The "cell" type can be accessed in the "types" module. Instance methods An instance method object combines a class, a class instance and any callable object (normally a user-defined function). Special read-only attributes: "__self__" is the class instance object, "__func__" is the function object; "__doc__" is the method’s documentation (same as "__func__.__doc__"); "__name__" is the method name (same as "__func__.__name__"); "__module__" is the name of the module the method was defined in, or "None" if unavailable. Methods also support accessing (but not setting) the arbitrary function attributes on the underlying function object. User-defined method objects may be created when getting an attribute of a class (perhaps via an instance of that class), if that attribute is a user-defined function object or a class method object. When an instance method object is created by retrieving a user- defined function object from a class via one of its instances, its "__self__" attribute is the instance, and the method object is said to be bound. The new method’s "__func__" attribute is the original function object. When an instance method object is created by retrieving a class method object from a class or instance, its "__self__" attribute is the class itself, and its "__func__" attribute is the function object underlying the class method. When an instance method object is called, the underlying function ("__func__") is called, inserting the class instance ("__self__") in front of the argument list. For instance, when "C" is a class which contains a definition for a function "f()", and "x" is an instance of "C", calling "x.f(1)" is equivalent to calling "C.f(x, 1)". When an instance method object is derived from a class method object, the “class instance” stored in "__self__" will actually be the class itself, so that calling either "x.f(1)" or "C.f(1)" is equivalent to calling "f(C,1)" where "f" is the underlying function. Note that the transformation from function object to instance method object happens each time the attribute is retrieved from the instance. In some cases, a fruitful optimization is to assign the attribute to a local variable and call that local variable. Also notice that this transformation only happens for user-defined functions; other callable objects (and all non- callable objects) are retrieved without transformation. It is also important to note that user-defined functions which are attributes of a class instance are not converted to bound methods; this *only* happens when the function is an attribute of the class. Generator functions A function or method which uses the "yield" statement (see section The yield statement) is called a *generator function*. Such a function, when called, always returns an iterator object which can be used to execute the body of the function: calling the iterator’s "iterator.__next__()" method will cause the function to execute until it provides a value using the "yield" statement. When the function executes a "return" statement or falls off the end, a "StopIteration" exception is raised and the iterator will have reached the end of the set of values to be returned. Coroutine functions A function or method which is defined using "async def" is called a *coroutine function*. Such a function, when called, returns a *coroutine* object. It may contain "await" expressions, as well as "async with" and "async for" statements. See also the Coroutine Objects section. Asynchronous generator functions A function or method which is defined using "async def" and which uses the "yield" statement is called a *asynchronous generator function*. Such a function, when called, returns an asynchronous iterator object which can be used in an "async for" statement to execute the body of the function. Calling the asynchronous iterator’s "aiterator.__anext__" method will return an *awaitable* which when awaited will execute until it provides a value using the "yield" expression. When the function executes an empty "return" statement or falls off the end, a "StopAsyncIteration" exception is raised and the asynchronous iterator will have reached the end of the set of values to be yielded. Built-in functions A built-in function object is a wrapper around a C function. Examples of built-in functions are "len()" and "math.sin()" ("math" is a standard built-in module). The number and type of the arguments are determined by the C function. Special read- only attributes: "__doc__" is the function’s documentation string, or "None" if unavailable; "__name__" is the function’s name; "__self__" is set to "None" (but see the next item); "__module__" is the name of the module the function was defined in or "None" if unavailable. Built-in methods This is really a different disguise of a built-in function, this time containing an object passed to the C function as an implicit extra argument. An example of a built-in method is "alist.append()", assuming *alist* is a list object. In this case, the special read-only attribute "__self__" is set to the object denoted by *alist*. Classes Classes are callable. These objects normally act as factories for new instances of themselves, but variations are possible for class types that override "__new__()". The arguments of the call are passed to "__new__()" and, in the typical case, to "__init__()" to initialize the new instance. Class Instances Instances of arbitrary classes can be made callable by defining a "__call__()" method in their class. Modules Modules are a basic organizational unit of Python code, and are created by the import system as invoked either by the "import" statement, or by calling functions such as "importlib.import_module()" and built-in "__import__()". A module object has a namespace implemented by a dictionary object (this is the dictionary referenced by the "__globals__" attribute of functions defined in the module). Attribute references are translated to lookups in this dictionary, e.g., "m.x" is equivalent to "m.__dict__["x"]". A module object does not contain the code object used to initialize the module (since it isn’t needed once the initialization is done). Attribute assignment updates the module’s namespace dictionary, e.g., "m.x = 1" is equivalent to "m.__dict__["x"] = 1". Predefined (writable) attributes: "__name__" is the module’s name; "__doc__" is the module’s documentation string, or "None" if unavailable; "__annotations__" (optional) is a dictionary containing *variable annotations* collected during module body execution; "__file__" is the pathname of the file from which the module was loaded, if it was loaded from a file. The "__file__" attribute may be missing for certain types of modules, such as C modules that are statically linked into the interpreter; for extension modules loaded dynamically from a shared library, it is the pathname of the shared library file. Special read-only attribute: "__dict__" is the module’s namespace as a dictionary object. **CPython implementation detail:** Because of the way CPython clears module dictionaries, the module dictionary will be cleared when the module falls out of scope even if the dictionary still has live references. To avoid this, copy the dictionary or keep the module around while using its dictionary directly. Custom classes Custom class types are typically created by class definitions (see section Class definitions). A class has a namespace implemented by a dictionary object. Class attribute references are translated to lookups in this dictionary, e.g., "C.x" is translated to "C.__dict__["x"]" (although there are a number of hooks which allow for other means of locating attributes). When the attribute name is not found there, the attribute search continues in the base classes. This search of the base classes uses the C3 method resolution order which behaves correctly even in the presence of ‘diamond’ inheritance structures where there are multiple inheritance paths leading back to a common ancestor. Additional details on the C3 MRO used by Python can be found in the documentation accompanying the 2.3 release at https://www.python.org/download/releases/2.3/mro/. When a class attribute reference (for class "C", say) would yield a class method object, it is transformed into an instance method object whose "__self__" attribute is "C". When it would yield a static method object, it is transformed into the object wrapped by the static method object. See section Implementing Descriptors for another way in which attributes retrieved from a class may differ from those actually contained in its "__dict__". Class attribute assignments update the class’s dictionary, never the dictionary of a base class. A class object can be called (see above) to yield a class instance (see below). Special attributes: "__name__" is the class name; "__module__" is the module name in which the class was defined; "__dict__" is the dictionary containing the class’s namespace; "__bases__" is a tuple containing the base classes, in the order of their occurrence in the base class list; "__doc__" is the class’s documentation string, or "None" if undefined; "__annotations__" (optional) is a dictionary containing *variable annotations* collected during class body execution. Class instances A class instance is created by calling a class object (see above). A class instance has a namespace implemented as a dictionary which is the first place in which attribute references are searched. When an attribute is not found there, and the instance’s class has an attribute by that name, the search continues with the class attributes. If a class attribute is found that is a user-defined function object, it is transformed into an instance method object whose "__self__" attribute is the instance. Static method and class method objects are also transformed; see above under “Classes”. See section Implementing Descriptors for another way in which attributes of a class retrieved via its instances may differ from the objects actually stored in the class’s "__dict__". If no class attribute is found, and the object’s class has a "__getattr__()" method, that is called to satisfy the lookup. Attribute assignments and deletions update the instance’s dictionary, never a class’s dictionary. If the class has a "__setattr__()" or "__delattr__()" method, this is called instead of updating the instance dictionary directly. Class instances can pretend to be numbers, sequences, or mappings if they have methods with certain special names. See section Special method names. Special attributes: "__dict__" is the attribute dictionary; "__class__" is the instance’s class. I/O objects (also known as file objects) A *file object* represents an open file. Various shortcuts are available to create file objects: the "open()" built-in function, and also "os.popen()", "os.fdopen()", and the "makefile()" method of socket objects (and perhaps by other functions or methods provided by extension modules). The objects "sys.stdin", "sys.stdout" and "sys.stderr" are initialized to file objects corresponding to the interpreter’s standard input, output and error streams; they are all open in text mode and therefore follow the interface defined by the "io.TextIOBase" abstract class. Internal types A few types used internally by the interpreter are exposed to the user. Their definitions may change with future versions of the interpreter, but they are mentioned here for completeness. Code objects Code objects represent *byte-compiled* executable Python code, or *bytecode*. The difference between a code object and a function object is that the function object contains an explicit reference to the function’s globals (the module in which it was defined), while a code object contains no context; also the default argument values are stored in the function object, not in the code object (because they represent values calculated at run-time). Unlike function objects, code objects are immutable and contain no references (directly or indirectly) to mutable objects. Special read-only attributes: "co_name" gives the function name; "co_argcount" is the total number of positional arguments (including positional-only arguments and arguments with default values); "co_posonlyargcount" is the number of positional-only arguments (including arguments with default values); "co_kwonlyargcount" is the number of keyword-only arguments (including arguments with default values); "co_nlocals" is the number of local variables used by the function (including arguments); "co_varnames" is a tuple containing the names of the local variables (starting with the argument names); "co_cellvars" is a tuple containing the names of local variables that are referenced by nested functions; "co_freevars" is a tuple containing the names of free variables; "co_code" is a string representing the sequence of bytecode instructions; "co_consts" is a tuple containing the literals used by the bytecode; "co_names" is a tuple containing the names used by the bytecode; "co_filename" is the filename from which the code was compiled; "co_firstlineno" is the first line number of the function; "co_lnotab" is a string encoding the mapping from bytecode offsets to line numbers (for details see the source code of the interpreter); "co_stacksize" is the required stack size; "co_flags" is an integer encoding a number of flags for the interpreter. The following flag bits are defined for "co_flags": bit "0x04" is set if the function uses the "*arguments" syntax to accept an arbitrary number of positional arguments; bit "0x08" is set if the function uses the "**keywords" syntax to accept arbitrary keyword arguments; bit "0x20" is set if the function is a generator. Future feature declarations ("from __future__ import division") also use bits in "co_flags" to indicate whether a code object was compiled with a particular feature enabled: bit "0x2000" is set if the function was compiled with future division enabled; bits "0x10" and "0x1000" were used in earlier versions of Python. Other bits in "co_flags" are reserved for internal use. If a code object represents a function, the first item in "co_consts" is the documentation string of the function, or "None" if undefined. Frame objects Frame objects represent execution frames. They may occur in traceback objects (see below), and are also passed to registered trace functions. Special read-only attributes: "f_back" is to the previous stack frame (towards the caller), or "None" if this is the bottom stack frame; "f_code" is the code object being executed in this frame; "f_locals" is the dictionary used to look up local variables; "f_globals" is used for global variables; "f_builtins" is used for built-in (intrinsic) names; "f_lasti" gives the precise instruction (this is an index into the bytecode string of the code object). Accessing "f_code" raises an auditing event "object.__getattr__" with arguments "obj" and ""f_code"". Special writable attributes: "f_trace", if not "None", is a function called for various events during code execution (this is used by the debugger). Normally an event is triggered for each new source line - this can be disabled by setting "f_trace_lines" to "False". Implementations *may* allow per-opcode events to be requested by setting "f_trace_opcodes" to "True". Note that this may lead to undefined interpreter behaviour if exceptions raised by the trace function escape to the function being traced. "f_lineno" is the current line number of the frame — writing to this from within a trace function jumps to the given line (only for the bottom-most frame). A debugger can implement a Jump command (aka Set Next Statement) by writing to f_lineno. Frame objects support one method: frame.clear() This method clears all references to local variables held by the frame. Also, if the frame belonged to a generator, the generator is finalized. This helps break reference cycles involving frame objects (for example when catching an exception and storing its traceback for later use). "RuntimeError" is raised if the frame is currently executing. New in version 3.4. Traceback objects Traceback objects represent a stack trace of an exception. A traceback object is implicitly created when an exception occurs, and may also be explicitly created by calling "types.TracebackType". For implicitly created tracebacks, when the search for an exception handler unwinds the execution stack, at each unwound level a traceback object is inserted in front of the current traceback. When an exception handler is entered, the stack trace is made available to the program. (See section The try statement.) It is accessible as the third item of the tuple returned by "sys.exc_info()", and as the "__traceback__" attribute of the caught exception. When the program contains no suitable handler, the stack trace is written (nicely formatted) to the standard error stream; if the interpreter is interactive, it is also made available to the user as "sys.last_traceback". For explicitly created tracebacks, it is up to the creator of the traceback to determine how the "tb_next" attributes should be linked to form a full stack trace. Special read-only attributes: "tb_frame" points to the execution frame of the current level; "tb_lineno" gives the line number where the exception occurred; "tb_lasti" indicates the precise instruction. The line number and last instruction in the traceback may differ from the line number of its frame object if the exception occurred in a "try" statement with no matching except clause or with a finally clause. Accessing "tb_frame" raises an auditing event "object.__getattr__" with arguments "obj" and ""tb_frame"". Special writable attribute: "tb_next" is the next level in the stack trace (towards the frame where the exception occurred), or "None" if there is no next level. Changed in version 3.7: Traceback objects can now be explicitly instantiated from Python code, and the "tb_next" attribute of existing instances can be updated. Slice objects Slice objects are used to represent slices for "__getitem__()" methods. They are also created by the built-in "slice()" function. Special read-only attributes: "start" is the lower bound; "stop" is the upper bound; "step" is the step value; each is "None" if omitted. These attributes can have any type. Slice objects support one method: slice.indices(self, length) This method takes a single integer argument *length* and computes information about the slice that the slice object would describe if applied to a sequence of *length* items. It returns a tuple of three integers; respectively these are the *start* and *stop* indices and the *step* or stride length of the slice. Missing or out-of-bounds indices are handled in a manner consistent with regular slices. Static method objects Static method objects provide a way of defeating the transformation of function objects to method objects described above. A static method object is a wrapper around any other object, usually a user-defined method object. When a static method object is retrieved from a class or a class instance, the object actually returned is the wrapped object, which is not subject to any further transformation. Static method objects are not themselves callable, although the objects they wrap usually are. Static method objects are created by the built-in "staticmethod()" constructor. Class method objects A class method object, like a static method object, is a wrapper around another object that alters the way in which that object is retrieved from classes and class instances. The behaviour of class method objects upon such retrieval is described above, under “User-defined methods”. Class method objects are created by the built-in "classmethod()" constructor. 3.3. Special method names ========================= A class can implement certain operations that are invoked by special syntax (such as arithmetic operations or subscripting and slicing) by defining methods with special names. This is Python’s approach to *operator overloading*, allowing classes to define their own behavior with respect to language operators. For instance, if a class defines a method named "__getitem__()", and "x" is an instance of this class, then "x[i]" is roughly equivalent to "type(x).__getitem__(x, i)". Except where mentioned, attempts to execute an operation raise an exception when no appropriate method is defined (typically "AttributeError" or "TypeError"). Setting a special method to "None" indicates that the corresponding operation is not available. For example, if a class sets "__iter__()" to "None", the class is not iterable, so calling "iter()" on its instances will raise a "TypeError" (without falling back to "__getitem__()"). [2] When implementing a class that emulates any built-in type, it is important that the emulation only be implemented to the degree that it makes sense for the object being modelled. For example, some sequences may work well with retrieval of individual elements, but extracting a slice may not make sense. (One example of this is the "NodeList" interface in the W3C’s Document Object Model.) 3.3.1. Basic customization -------------------------- object.__new__(cls[, ...]) Called to create a new instance of class *cls*. "__new__()" is a static method (special-cased so you need not declare it as such) that takes the class of which an instance was requested as its first argument. The remaining arguments are those passed to the object constructor expression (the call to the class). The return value of "__new__()" should be the new object instance (usually an instance of *cls*). Typical implementations create a new instance of the class by invoking the superclass’s "__new__()" method using "super().__new__(cls[, ...])" with appropriate arguments and then modifying the newly-created instance as necessary before returning it. If "__new__()" is invoked during object construction and it returns an instance of *cls*, then the new instance’s "__init__()" method will be invoked like "__init__(self[, ...])", where *self* is the new instance and the remaining arguments are the same as were passed to the object constructor. If "__new__()" does not return an instance of *cls*, then the new instance’s "__init__()" method will not be invoked. "__new__()" is intended mainly to allow subclasses of immutable types (like int, str, or tuple) to customize instance creation. It is also commonly overridden in custom metaclasses in order to customize class creation. object.__init__(self[, ...]) Called after the instance has been created (by "__new__()"), but before it is returned to the caller. The arguments are those passed to the class constructor expression. If a base class has an "__init__()" method, the derived class’s "__init__()" method, if any, must explicitly call it to ensure proper initialization of the base class part of the instance; for example: "super().__init__([args...])". Because "__new__()" and "__init__()" work together in constructing objects ("__new__()" to create it, and "__init__()" to customize it), no non-"None" value may be returned by "__init__()"; doing so will cause a "TypeError" to be raised at runtime. object.__del__(self) Called when the instance is about to be destroyed. This is also called a finalizer or (improperly) a destructor. If a base class has a "__del__()" method, the derived class’s "__del__()" method, if any, must explicitly call it to ensure proper deletion of the base class part of the instance. It is possible (though not recommended!) for the "__del__()" method to postpone destruction of the instance by creating a new reference to it. This is called object *resurrection*. It is implementation-dependent whether "__del__()" is called a second time when a resurrected object is about to be destroyed; the current *CPython* implementation only calls it once. It is not guaranteed that "__del__()" methods are called for objects that still exist when the interpreter exits. Note: "del x" doesn’t directly call "x.__del__()" — the former decrements the reference count for "x" by one, and the latter is only called when "x"’s reference count reaches zero. **CPython implementation detail:** It is possible for a reference cycle to prevent the reference count of an object from going to zero. In this case, the cycle will be later detected and deleted by the *cyclic garbage collector*. A common cause of reference cycles is when an exception has been caught in a local variable. The frame’s locals then reference the exception, which references its own traceback, which references the locals of all frames caught in the traceback. See also: Documentation for the "gc" module. Warning: Due to the precarious circumstances under which "__del__()" methods are invoked, exceptions that occur during their execution are ignored, and a warning is printed to "sys.stderr" instead. In particular: * "__del__()" can be invoked when arbitrary code is being executed, including from any arbitrary thread. If "__del__()" needs to take a lock or invoke any other blocking resource, it may deadlock as the resource may already be taken by the code that gets interrupted to execute "__del__()". * "__del__()" can be executed during interpreter shutdown. As a consequence, the global variables it needs to access (including other modules) may already have been deleted or set to "None". Python guarantees that globals whose name begins with a single underscore are deleted from their module before other globals are deleted; if no other references to such globals exist, this may help in assuring that imported modules are still available at the time when the "__del__()" method is called. object.__repr__(self) Called by the "repr()" built-in function to compute the “official” string representation of an object. If at all possible, this should look like a valid Python expression that could be used to recreate an object with the same value (given an appropriate environment). If this is not possible, a string of the form "<...some useful description...>" should be returned. The return value must be a string object. If a class defines "__repr__()" but not "__str__()", then "__repr__()" is also used when an “informal” string representation of instances of that class is required. This is typically used for debugging, so it is important that the representation is information-rich and unambiguous. object.__str__(self) Called by "str(object)" and the built-in functions "format()" and "print()" to compute the “informal” or nicely printable string representation of an object. The return value must be a string object. This method differs from "object.__repr__()" in that there is no expectation that "__str__()" return a valid Python expression: a more convenient or concise representation can be used. The default implementation defined by the built-in type "object" calls "object.__repr__()". object.__bytes__(self) Called by bytes to compute a byte-string representation of an object. This should return a "bytes" object. object.__format__(self, format_spec) Called by the "format()" built-in function, and by extension, evaluation of formatted string literals and the "str.format()" method, to produce a “formatted” string representation of an object. The *format_spec* argument is a string that contains a description of the formatting options desired. The interpretation of the *format_spec* argument is up to the type implementing "__format__()", however most classes will either delegate formatting to one of the built-in types, or use a similar formatting option syntax. See Format Specification Mini-Language for a description of the standard formatting syntax. The return value must be a string object. Changed in version 3.4: The __format__ method of "object" itself raises a "TypeError" if passed any non-empty string. Changed in version 3.7: "object.__format__(x, '')" is now equivalent to "str(x)" rather than "format(str(x), '')". object.__lt__(self, other) object.__le__(self, other) object.__eq__(self, other) object.__ne__(self, other) object.__gt__(self, other) object.__ge__(self, other) These are the so-called “rich comparison” methods. The correspondence between operator symbols and method names is as follows: "xy" calls "x.__gt__(y)", and "x>=y" calls "x.__ge__(y)". A rich comparison method may return the singleton "NotImplemented" if it does not implement the operation for a given pair of arguments. By convention, "False" and "True" are returned for a successful comparison. However, these methods can return any value, so if the comparison operator is used in a Boolean context (e.g., in the condition of an "if" statement), Python will call "bool()" on the value to determine if the result is true or false. By default, "object" implements "__eq__()" by using "is", returning "NotImplemented" in the case of a false comparison: "True if x is y else NotImplemented". For "__ne__()", by default it delegates to "__eq__()" and inverts the result unless it is "NotImplemented". There are no other implied relationships among the comparison operators or default implementations; for example, the truth of "(x.__hash__". If a class that does not override "__eq__()" wishes to suppress hash support, it should include "__hash__ = None" in the class definition. A class which defines its own "__hash__()" that explicitly raises a "TypeError" would be incorrectly identified as hashable by an "isinstance(obj, collections.abc.Hashable)" call. Note: By default, the "__hash__()" values of str and bytes objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.This is intended to provide protection against a denial-of-service caused by carefully-chosen inputs that exploit the worst case performance of a dict insertion, O(n^2) complexity. See http://www.ocert.org/advisories/ocert-2011-003.html for details.Changing hash values affects the iteration order of sets. Python has never made guarantees about this ordering (and it typically varies between 32-bit and 64-bit builds).See also "PYTHONHASHSEED". Changed in version 3.3: Hash randomization is enabled by default. object.__bool__(self) Called to implement truth value testing and the built-in operation "bool()"; should return "False" or "True". When this method is not defined, "__len__()" is called, if it is defined, and the object is considered true if its result is nonzero. If a class defines neither "__len__()" nor "__bool__()", all its instances are considered true. 3.3.2. Customizing attribute access ----------------------------------- The following methods can be defined to customize the meaning of attribute access (use of, assignment to, or deletion of "x.name") for class instances. object.__getattr__(self, name) Called when the default attribute access fails with an "AttributeError" (either "__getattribute__()" raises an "AttributeError" because *name* is not an instance attribute or an attribute in the class tree for "self"; or "__get__()" of a *name* property raises "AttributeError"). This method should either return the (computed) attribute value or raise an "AttributeError" exception. Note that if the attribute is found through the normal mechanism, "__getattr__()" is not called. (This is an intentional asymmetry between "__getattr__()" and "__setattr__()".) This is done both for efficiency reasons and because otherwise "__getattr__()" would have no way to access other attributes of the instance. Note that at least for instance variables, you can fake total control by not inserting any values in the instance attribute dictionary (but instead inserting them in another object). See the "__getattribute__()" method below for a way to actually get total control over attribute access. object.__getattribute__(self, name) Called unconditionally to implement attribute accesses for instances of the class. If the class also defines "__getattr__()", the latter will not be called unless "__getattribute__()" either calls it explicitly or raises an "AttributeError". This method should return the (computed) attribute value or raise an "AttributeError" exception. In order to avoid infinite recursion in this method, its implementation should always call the base class method with the same name to access any attributes it needs, for example, "object.__getattribute__(self, name)". Note: This method may still be bypassed when looking up special methods as the result of implicit invocation via language syntax or built-in functions. See Special method lookup. For certain sensitive attribute accesses, raises an auditing event "object.__getattr__" with arguments "obj" and "name". object.__setattr__(self, name, value) Called when an attribute assignment is attempted. This is called instead of the normal mechanism (i.e. store the value in the instance dictionary). *name* is the attribute name, *value* is the value to be assigned to it. If "__setattr__()" wants to assign to an instance attribute, it should call the base class method with the same name, for example, "object.__setattr__(self, name, value)". For certain sensitive attribute assignments, raises an auditing event "object.__setattr__" with arguments "obj", "name", "value". object.__delattr__(self, name) Like "__setattr__()" but for attribute deletion instead of assignment. This should only be implemented if "del obj.name" is meaningful for the object. For certain sensitive attribute deletions, raises an auditing event "object.__delattr__" with arguments "obj" and "name". object.__dir__(self) Called when "dir()" is called on the object. A sequence must be returned. "dir()" converts the returned sequence to a list and sorts it. 3.3.2.1. Customizing module attribute access ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Special names "__getattr__" and "__dir__" can be also used to customize access to module attributes. The "__getattr__" function at the module level should accept one argument which is the name of an attribute and return the computed value or raise an "AttributeError". If an attribute is not found on a module object through the normal lookup, i.e. "object.__getattribute__()", then "__getattr__" is searched in the module "__dict__" before raising an "AttributeError". If found, it is called with the attribute name and the result is returned. The "__dir__" function should accept no arguments, and return a sequence of strings that represents the names accessible on module. If present, this function overrides the standard "dir()" search on a module. For a more fine grained customization of the module behavior (setting attributes, properties, etc.), one can set the "__class__" attribute of a module object to a subclass of "types.ModuleType". For example: import sys from types import ModuleType class VerboseModule(ModuleType): def __repr__(self): return f'Verbose {self.__name__}' def __setattr__(self, attr, value): print(f'Setting {attr}...') super().__setattr__(attr, value) sys.modules[__name__].__class__ = VerboseModule Note: Defining module "__getattr__" and setting module "__class__" only affect lookups made using the attribute access syntax – directly accessing the module globals (whether by code within the module, or via a reference to the module’s globals dictionary) is unaffected. Changed in version 3.5: "__class__" module attribute is now writable. New in version 3.7: "__getattr__" and "__dir__" module attributes. See also: **PEP 562** - Module __getattr__ and __dir__ Describes the "__getattr__" and "__dir__" functions on modules. 3.3.2.2. Implementing Descriptors ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The following methods only apply when an instance of the class containing the method (a so-called *descriptor* class) appears in an *owner* class (the descriptor must be in either the owner’s class dictionary or in the class dictionary for one of its parents). In the examples below, “the attribute” refers to the attribute whose name is the key of the property in the owner class’ "__dict__". object.__get__(self, instance, owner=None) Called to get the attribute of the owner class (class attribute access) or of an instance of that class (instance attribute access). The optional *owner* argument is the owner class, while *instance* is the instance that the attribute was accessed through, or "None" when the attribute is accessed through the *owner*. This method should return the computed attribute value or raise an "AttributeError" exception. **PEP 252** specifies that "__get__()" is callable with one or two arguments. Python’s own built-in descriptors support this specification; however, it is likely that some third-party tools have descriptors that require both arguments. Python’s own "__getattribute__()" implementation always passes in both arguments whether they are required or not. object.__set__(self, instance, value) Called to set the attribute on an instance *instance* of the owner class to a new value, *value*. Note, adding "__set__()" or "__delete__()" changes the kind of descriptor to a “data descriptor”. See Invoking Descriptors for more details. object.__delete__(self, instance) Called to delete the attribute on an instance *instance* of the owner class. object.__set_name__(self, owner, name) Called at the time the owning class *owner* is created. The descriptor has been assigned to *name*. Note: "__set_name__()" is only called implicitly as part of the "type" constructor, so it will need to be called explicitly with the appropriate parameters when a descriptor is added to a class after initial creation: class A: pass descr = custom_descriptor() A.attr = descr descr.__set_name__(A, 'attr') See Creating the class object for more details. New in version 3.6. The attribute "__objclass__" is interpreted by the "inspect" module as specifying the class where this object was defined (setting this appropriately can assist in runtime introspection of dynamic class attributes). For callables, it may indicate that an instance of the given type (or a subclass) is expected or required as the first positional argument (for example, CPython sets this attribute for unbound methods that are implemented in C). 3.3.2.3. Invoking Descriptors ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ In general, a descriptor is an object attribute with “binding behavior”, one whose attribute access has been overridden by methods in the descriptor protocol: "__get__()", "__set__()", and "__delete__()". If any of those methods are defined for an object, it is said to be a descriptor. The default behavior for attribute access is to get, set, or delete the attribute from an object’s dictionary. For instance, "a.x" has a lookup chain starting with "a.__dict__['x']", then "type(a).__dict__['x']", and continuing through the base classes of "type(a)" excluding metaclasses. However, if the looked-up value is an object defining one of the descriptor methods, then Python may override the default behavior and invoke the descriptor method instead. Where this occurs in the precedence chain depends on which descriptor methods were defined and how they were called. The starting point for descriptor invocation is a binding, "a.x". How the arguments are assembled depends on "a": Direct Call The simplest and least common call is when user code directly invokes a descriptor method: "x.__get__(a)". Instance Binding If binding to an object instance, "a.x" is transformed into the call: "type(a).__dict__['x'].__get__(a, type(a))". Class Binding If binding to a class, "A.x" is transformed into the call: "A.__dict__['x'].__get__(None, A)". Super Binding If "a" is an instance of "super", then the binding "super(B, obj).m()" searches "obj.__class__.__mro__" for the base class "A" immediately following "B" and then invokes the descriptor with the call: "A.__dict__['m'].__get__(obj, obj.__class__)". For instance bindings, the precedence of descriptor invocation depends on which descriptor methods are defined. A descriptor can define any combination of "__get__()", "__set__()" and "__delete__()". If it does not define "__get__()", then accessing the attribute will return the descriptor object itself unless there is a value in the object’s instance dictionary. If the descriptor defines "__set__()" and/or "__delete__()", it is a data descriptor; if it defines neither, it is a non-data descriptor. Normally, data descriptors define both "__get__()" and "__set__()", while non-data descriptors have just the "__get__()" method. Data descriptors with "__get__()" and "__set__()" (and/or "__delete__()") defined always override a redefinition in an instance dictionary. In contrast, non-data descriptors can be overridden by instances. Python methods (including those decorated with "@staticmethod" and "@classmethod") are implemented as non-data descriptors. Accordingly, instances can redefine and override methods. This allows individual instances to acquire behaviors that differ from other instances of the same class. The "property()" function is implemented as a data descriptor. Accordingly, instances cannot override the behavior of a property. 3.3.2.4. __slots__ ~~~~~~~~~~~~~~~~~~ *__slots__* allow us to explicitly declare data members (like properties) and deny the creation of "__dict__" and *__weakref__* (unless explicitly declared in *__slots__* or available in a parent.) The space saved over using "__dict__" can be significant. Attribute lookup speed can be significantly improved as well. object.__slots__ This class variable can be assigned a string, iterable, or sequence of strings with variable names used by instances. *__slots__* reserves space for the declared variables and prevents the automatic creation of "__dict__" and *__weakref__* for each instance. 3.3.2.4.1. Notes on using *__slots__* """"""""""""""""""""""""""""""""""""" * When inheriting from a class without *__slots__*, the "__dict__" and *__weakref__* attribute of the instances will always be accessible. * Without a "__dict__" variable, instances cannot be assigned new variables not listed in the *__slots__* definition. Attempts to assign to an unlisted variable name raises "AttributeError". If dynamic assignment of new variables is desired, then add "'__dict__'" to the sequence of strings in the *__slots__* declaration. * Without a *__weakref__* variable for each instance, classes defining *__slots__* do not support "weak references" to its instances. If weak reference support is needed, then add "'__weakref__'" to the sequence of strings in the *__slots__* declaration. * *__slots__* are implemented at the class level by creating descriptors for each variable name. As a result, class attributes cannot be used to set default values for instance variables defined by *__slots__*; otherwise, the class attribute would overwrite the descriptor assignment. * The action of a *__slots__* declaration is not limited to the class where it is defined. *__slots__* declared in parents are available in child classes. However, child subclasses will get a "__dict__" and *__weakref__* unless they also define *__slots__* (which should only contain names of any *additional* slots). * If a class defines a slot also defined in a base class, the instance variable defined by the base class slot is inaccessible (except by retrieving its descriptor directly from the base class). This renders the meaning of the program undefined. In the future, a check may be added to prevent this. * Nonempty *__slots__* does not work for classes derived from “variable-length” built-in types such as "int", "bytes" and "tuple". * Any non-string *iterable* may be assigned to *__slots__*. * If a "dictionary" is used to assign *__slots__*, the dictionary keys will be used as the slot names. The values of the dictionary can be used to provide per-attribute docstrings that will be recognised by "inspect.getdoc()" and displayed in the output of "help()". * "__class__" assignment works only if both classes have the same *__slots__*. * Multiple inheritance with multiple slotted parent classes can be used, but only one parent is allowed to have attributes created by slots (the other bases must have empty slot layouts) - violations raise "TypeError". * If an *iterator* is used for *__slots__* then a *descriptor* is created for each of the iterator’s values. However, the *__slots__* attribute will be an empty iterator. 3.3.3. Customizing class creation --------------------------------- Whenever a class inherits from another class, "__init_subclass__()" is called on the parent class. This way, it is possible to write classes which change the behavior of subclasses. This is closely related to class decorators, but where class decorators only affect the specific class they’re applied to, "__init_subclass__" solely applies to future subclasses of the class defining the method. classmethod object.__init_subclass__(cls) This method is called whenever the containing class is subclassed. *cls* is then the new subclass. If defined as a normal instance method, this method is implicitly converted to a class method. Keyword arguments which are given to a new class are passed to the parent’s class "__init_subclass__". For compatibility with other classes using "__init_subclass__", one should take out the needed keyword arguments and pass the others over to the base class, as in: class Philosopher: def __init_subclass__(cls, /, default_name, **kwargs): super().__init_subclass__(**kwargs) cls.default_name = default_name class AustralianPhilosopher(Philosopher, default_name="Bruce"): pass The default implementation "object.__init_subclass__" does nothing, but raises an error if it is called with any arguments. Note: The metaclass hint "metaclass" is consumed by the rest of the type machinery, and is never passed to "__init_subclass__" implementations. The actual metaclass (rather than the explicit hint) can be accessed as "type(cls)". New in version 3.6. 3.3.3.1. Metaclasses ~~~~~~~~~~~~~~~~~~~~ By default, classes are constructed using "type()". The class body is executed in a new namespace and the class name is bound locally to the result of "type(name, bases, namespace)". The class creation process can be customized by passing the "metaclass" keyword argument in the class definition line, or by inheriting from an existing class that included such an argument. In the following example, both "MyClass" and "MySubclass" are instances of "Meta": class Meta(type): pass class MyClass(metaclass=Meta): pass class MySubclass(MyClass): pass Any other keyword arguments that are specified in the class definition are passed through to all metaclass operations described below. When a class definition is executed, the following steps occur: * MRO entries are resolved; * the appropriate metaclass is determined; * the class namespace is prepared; * the class body is executed; * the class object is created. 3.3.3.2. Resolving MRO entries ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If a base that appears in class definition is not an instance of "type", then an "__mro_entries__" method is searched on it. If found, it is called with the original bases tuple. This method must return a tuple of classes that will be used instead of this base. The tuple may be empty, in such case the original base is ignored. See also: **PEP 560** - Core support for typing module and generic types 3.3.3.3. Determining the appropriate metaclass ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The appropriate metaclass for a class definition is determined as follows: * if no bases and no explicit metaclass are given, then "type()" is used; * if an explicit metaclass is given and it is *not* an instance of "type()", then it is used directly as the metaclass; * if an instance of "type()" is given as the explicit metaclass, or bases are defined, then the most derived metaclass is used. The most derived metaclass is selected from the explicitly specified metaclass (if any) and the metaclasses (i.e. "type(cls)") of all specified base classes. The most derived metaclass is one which is a subtype of *all* of these candidate metaclasses. If none of the candidate metaclasses meets that criterion, then the class definition will fail with "TypeError". 3.3.3.4. Preparing the class namespace ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Once the appropriate metaclass has been identified, then the class namespace is prepared. If the metaclass has a "__prepare__" attribute, it is called as "namespace = metaclass.__prepare__(name, bases, **kwds)" (where the additional keyword arguments, if any, come from the class definition). The "__prepare__" method should be implemented as a "classmethod". The namespace returned by "__prepare__" is passed in to "__new__", but when the final class object is created the namespace is copied into a new "dict". If the metaclass has no "__prepare__" attribute, then the class namespace is initialised as an empty ordered mapping. See also: **PEP 3115** - Metaclasses in Python 3000 Introduced the "__prepare__" namespace hook 3.3.3.5. Executing the class body ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The class body is executed (approximately) as "exec(body, globals(), namespace)". The key difference from a normal call to "exec()" is that lexical scoping allows the class body (including any methods) to reference names from the current and outer scopes when the class definition occurs inside a function. However, even when the class definition occurs inside the function, methods defined inside the class still cannot see names defined at the class scope. Class variables must be accessed through the first parameter of instance or class methods, or through the implicit lexically scoped "__class__" reference described in the next section. 3.3.3.6. Creating the class object ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Once the class namespace has been populated by executing the class body, the class object is created by calling "metaclass(name, bases, namespace, **kwds)" (the additional keywords passed here are the same as those passed to "__prepare__"). This class object is the one that will be referenced by the zero- argument form of "super()". "__class__" is an implicit closure reference created by the compiler if any methods in a class body refer to either "__class__" or "super". This allows the zero argument form of "super()" to correctly identify the class being defined based on lexical scoping, while the class or instance that was used to make the current call is identified based on the first argument passed to the method. **CPython implementation detail:** In CPython 3.6 and later, the "__class__" cell is passed to the metaclass as a "__classcell__" entry in the class namespace. If present, this must be propagated up to the "type.__new__" call in order for the class to be initialised correctly. Failing to do so will result in a "RuntimeError" in Python 3.8. When using the default metaclass "type", or any metaclass that ultimately calls "type.__new__", the following additional customisation steps are invoked after creating the class object: * first, "type.__new__" collects all of the descriptors in the class namespace that define a "__set_name__()" method; * second, all of these "__set_name__" methods are called with the class being defined and the assigned name of that particular descriptor; * finally, the "__init_subclass__()" hook is called on the immediate parent of the new class in its method resolution order. After the class object is created, it is passed to the class decorators included in the class definition (if any) and the resulting object is bound in the local namespace as the defined class. When a new class is created by "type.__new__", the object provided as the namespace parameter is copied to a new ordered mapping and the original object is discarded. The new copy is wrapped in a read-only proxy, which becomes the "__dict__" attribute of the class object. See also: **PEP 3135** - New super Describes the implicit "__class__" closure reference 3.3.3.7. Uses for metaclasses ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The potential uses for metaclasses are boundless. Some ideas that have been explored include enum, logging, interface checking, automatic delegation, automatic property creation, proxies, frameworks, and automatic resource locking/synchronization. 3.3.4. Customizing instance and subclass checks ----------------------------------------------- The following methods are used to override the default behavior of the "isinstance()" and "issubclass()" built-in functions. In particular, the metaclass "abc.ABCMeta" implements these methods in order to allow the addition of Abstract Base Classes (ABCs) as “virtual base classes” to any class or type (including built-in types), including other ABCs. class.__instancecheck__(self, instance) Return true if *instance* should be considered a (direct or indirect) instance of *class*. If defined, called to implement "isinstance(instance, class)". class.__subclasscheck__(self, subclass) Return true if *subclass* should be considered a (direct or indirect) subclass of *class*. If defined, called to implement "issubclass(subclass, class)". Note that these methods are looked up on the type (metaclass) of a class. They cannot be defined as class methods in the actual class. This is consistent with the lookup of special methods that are called on instances, only in this case the instance is itself a class. See also: **PEP 3119** - Introducing Abstract Base Classes Includes the specification for customizing "isinstance()" and "issubclass()" behavior through "__instancecheck__()" and "__subclasscheck__()", with motivation for this functionality in the context of adding Abstract Base Classes (see the "abc" module) to the language. 3.3.5. Emulating generic types ------------------------------ When using *type annotations*, it is often useful to *parameterize* a *generic type* using Python’s square-brackets notation. For example, the annotation "list[int]" might be used to signify a "list" in which all the elements are of type "int". See also: **PEP 484** - Type Hints Introducing Python’s framework for type annotations Generic Alias Types Documentation for objects representing parameterized generic classes Generics, user-defined generics and "typing.Generic" Documentation on how to implement generic classes that can be parameterized at runtime and understood by static type-checkers. A class can *generally* only be parameterized if it defines the special class method "__class_getitem__()". classmethod object.__class_getitem__(cls, key) Return an object representing the specialization of a generic class by type arguments found in *key*. When defined on a class, "__class_getitem__()" is automatically a class method. As such, there is no need for it to be decorated with "@classmethod" when it is defined. 3.3.5.1. The purpose of *__class_getitem__* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The purpose of "__class_getitem__()" is to allow runtime parameterization of standard-library generic classes in order to more easily apply *type hints* to these classes. To implement custom generic classes that can be parameterized at runtime and understood by static type-checkers, users should either inherit from a standard library class that already implements "__class_getitem__()", or inherit from "typing.Generic", which has its own implementation of "__class_getitem__()". Custom implementations of "__class_getitem__()" on classes defined outside of the standard library may not be understood by third-party type-checkers such as mypy. Using "__class_getitem__()" on any class for purposes other than type hinting is discouraged. 3.3.5.2. *__class_getitem__* versus *__getitem__* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Usually, the subscription of an object using square brackets will call the "__getitem__()" instance method defined on the object’s class. However, if the object being subscribed is itself a class, the class method "__class_getitem__()" may be called instead. "__class_getitem__()" should return a GenericAlias object if it is properly defined. Presented with the *expression* "obj[x]", the Python interpreter follows something like the following process to decide whether "__getitem__()" or "__class_getitem__()" should be called: from inspect import isclass def subscribe(obj, x): """Return the result of the expression `obj[x]`""" class_of_obj = type(obj) # If the class of obj defines __getitem__, # call class_of_obj.__getitem__(obj, x) if hasattr(class_of_obj, '__getitem__'): return class_of_obj.__getitem__(obj, x) # Else, if obj is a class and defines __class_getitem__, # call obj.__class_getitem__(x) elif isclass(obj) and hasattr(obj, '__class_getitem__'): return obj.__class_getitem__(x) # Else, raise an exception else: raise TypeError( f"'{class_of_obj.__name__}' object is not subscriptable" ) In Python, all classes are themselves instances of other classes. The class of a class is known as that class’s *metaclass*, and most classes have the "type" class as their metaclass. "type" does not define "__getitem__()", meaning that expressions such as "list[int]", "dict[str, float]" and "tuple[str, bytes]" all result in "__class_getitem__()" being called: >>> # list has class "type" as its metaclass, like most classes: >>> type(list) >>> type(dict) == type(list) == type(tuple) == type(str) == type(bytes) True >>> # "list[int]" calls "list.__class_getitem__(int)" >>> list[int] list[int] >>> # list.__class_getitem__ returns a GenericAlias object: >>> type(list[int]) However, if a class has a custom metaclass that defines "__getitem__()", subscribing the class may result in different behaviour. An example of this can be found in the "enum" module: >>> from enum import Enum >>> class Menu(Enum): ... """A breakfast menu""" ... SPAM = 'spam' ... BACON = 'bacon' ... >>> # Enum classes have a custom metaclass: >>> type(Menu) >>> # EnumMeta defines __getitem__, >>> # so __class_getitem__ is not called, >>> # and the result is not a GenericAlias object: >>> Menu['SPAM'] >>> type(Menu['SPAM']) See also: **PEP 560** - Core Support for typing module and generic types Introducing "__class_getitem__()", and outlining when a subscription results in "__class_getitem__()" being called instead of "__getitem__()" 3.3.6. Emulating callable objects --------------------------------- object.__call__(self[, args...]) Called when the instance is “called” as a function; if this method is defined, "x(arg1, arg2, ...)" roughly translates to "type(x).__call__(x, arg1, ...)". 3.3.7. Emulating container types -------------------------------- The following methods can be defined to implement container objects. Containers usually are *sequences* (such as "lists" or "tuples") or *mappings* (like "dictionaries"), but can represent other containers as well. The first set of methods is used either to emulate a sequence or to emulate a mapping; the difference is that for a sequence, the allowable keys should be the integers *k* for which "0 <= k < N" where *N* is the length of the sequence, or "slice" objects, which define a range of items. It is also recommended that mappings provide the methods "keys()", "values()", "items()", "get()", "clear()", "setdefault()", "pop()", "popitem()", "copy()", and "update()" behaving similar to those for Python’s standard "dictionary" objects. The "collections.abc" module provides a "MutableMapping" *abstract base class* to help create those methods from a base set of "__getitem__()", "__setitem__()", "__delitem__()", and "keys()". Mutable sequences should provide methods "append()", "count()", "index()", "extend()", "insert()", "pop()", "remove()", "reverse()" and "sort()", like Python standard "list" objects. Finally, sequence types should implement addition (meaning concatenation) and multiplication (meaning repetition) by defining the methods "__add__()", "__radd__()", "__iadd__()", "__mul__()", "__rmul__()" and "__imul__()" described below; they should not define other numerical operators. It is recommended that both mappings and sequences implement the "__contains__()" method to allow efficient use of the "in" operator; for mappings, "in" should search the mapping’s keys; for sequences, it should search through the values. It is further recommended that both mappings and sequences implement the "__iter__()" method to allow efficient iteration through the container; for mappings, "__iter__()" should iterate through the object’s keys; for sequences, it should iterate through the values. object.__len__(self) Called to implement the built-in function "len()". Should return the length of the object, an integer ">=" 0. Also, an object that doesn’t define a "__bool__()" method and whose "__len__()" method returns zero is considered to be false in a Boolean context. **CPython implementation detail:** In CPython, the length is required to be at most "sys.maxsize". If the length is larger than "sys.maxsize" some features (such as "len()") may raise "OverflowError". To prevent raising "OverflowError" by truth value testing, an object must define a "__bool__()" method. object.__length_hint__(self) Called to implement "operator.length_hint()". Should return an estimated length for the object (which may be greater or less than the actual length). The length must be an integer ">=" 0. The return value may also be "NotImplemented", which is treated the same as if the "__length_hint__" method didn’t exist at all. This method is purely an optimization and is never required for correctness. New in version 3.4. Note: Slicing is done exclusively with the following three methods. A call like a[1:2] = b is translated to a[slice(1, 2, None)] = b and so forth. Missing slice items are always filled in with "None". object.__getitem__(self, key) Called to implement evaluation of "self[key]". For *sequence* types, the accepted keys should be integers and slice objects. Note that the special interpretation of negative indexes (if the class wishes to emulate a *sequence* type) is up to the "__getitem__()" method. If *key* is of an inappropriate type, "TypeError" may be raised; if of a value outside the set of indexes for the sequence (after any special interpretation of negative values), "IndexError" should be raised. For *mapping* types, if *key* is missing (not in the container), "KeyError" should be raised. Note: "for" loops expect that an "IndexError" will be raised for illegal indexes to allow proper detection of the end of the sequence. Note: When subscripting a *class*, the special class method "__class_getitem__()" may be called instead of "__getitem__()". See __class_getitem__ versus __getitem__ for more details. object.__setitem__(self, key, value) Called to implement assignment to "self[key]". Same note as for "__getitem__()". This should only be implemented for mappings if the objects support changes to the values for keys, or if new keys can be added, or for sequences if elements can be replaced. The same exceptions should be raised for improper *key* values as for the "__getitem__()" method. object.__delitem__(self, key) Called to implement deletion of "self[key]". Same note as for "__getitem__()". This should only be implemented for mappings if the objects support removal of keys, or for sequences if elements can be removed from the sequence. The same exceptions should be raised for improper *key* values as for the "__getitem__()" method. object.__missing__(self, key) Called by "dict"."__getitem__()" to implement "self[key]" for dict subclasses when key is not in the dictionary. object.__iter__(self) This method is called when an iterator is required for a container. This method should return a new iterator object that can iterate over all the objects in the container. For mappings, it should iterate over the keys of the container. Iterator objects also need to implement this method; they are required to return themselves. For more information on iterator objects, see Iterator Types. object.__reversed__(self) Called (if present) by the "reversed()" built-in to implement reverse iteration. It should return a new iterator object that iterates over all the objects in the container in reverse order. If the "__reversed__()" method is not provided, the "reversed()" built-in will fall back to using the sequence protocol ("__len__()" and "__getitem__()"). Objects that support the sequence protocol should only provide "__reversed__()" if they can provide an implementation that is more efficient than the one provided by "reversed()". The membership test operators ("in" and "not in") are normally implemented as an iteration through a container. However, container objects can supply the following special method with a more efficient implementation, which also does not require the object be iterable. object.__contains__(self, item) Called to implement membership test operators. Should return true if *item* is in *self*, false otherwise. For mapping objects, this should consider the keys of the mapping rather than the values or the key-item pairs. For objects that don’t define "__contains__()", the membership test first tries iteration via "__iter__()", then the old sequence iteration protocol via "__getitem__()", see this section in the language reference. 3.3.8. Emulating numeric types ------------------------------ The following methods can be defined to emulate numeric objects. Methods corresponding to operations that are not supported by the particular kind of number implemented (e.g., bitwise operations for non-integral numbers) should be left undefined. object.__add__(self, other) object.__sub__(self, other) object.__mul__(self, other) object.__matmul__(self, other) object.__truediv__(self, other) object.__floordiv__(self, other) object.__mod__(self, other) object.__divmod__(self, other) object.__pow__(self, other[, modulo]) object.__lshift__(self, other) object.__rshift__(self, other) object.__and__(self, other) object.__xor__(self, other) object.__or__(self, other) These methods are called to implement the binary arithmetic operations ("+", "-", "*", "@", "/", "//", "%", "divmod()", "pow()", "**", "<<", ">>", "&", "^", "|"). For instance, to evaluate the expression "x + y", where *x* is an instance of a class that has an "__add__()" method, "x.__add__(y)" is called. The "__divmod__()" method should be the equivalent to using "__floordiv__()" and "__mod__()"; it should not be related to "__truediv__()". Note that "__pow__()" should be defined to accept an optional third argument if the ternary version of the built-in "pow()" function is to be supported. If one of those methods does not support the operation with the supplied arguments, it should return "NotImplemented". object.__radd__(self, other) object.__rsub__(self, other) object.__rmul__(self, other) object.__rmatmul__(self, other) object.__rtruediv__(self, other) object.__rfloordiv__(self, other) object.__rmod__(self, other) object.__rdivmod__(self, other) object.__rpow__(self, other[, modulo]) object.__rlshift__(self, other) object.__rrshift__(self, other) object.__rand__(self, other) object.__rxor__(self, other) object.__ror__(self, other) These methods are called to implement the binary arithmetic operations ("+", "-", "*", "@", "/", "//", "%", "divmod()", "pow()", "**", "<<", ">>", "&", "^", "|") with reflected (swapped) operands. These functions are only called if the left operand does not support the corresponding operation [3] and the operands are of different types. [4] For instance, to evaluate the expression "x - y", where *y* is an instance of a class that has an "__rsub__()" method, "y.__rsub__(x)" is called if "x.__sub__(y)" returns *NotImplemented*. Note that ternary "pow()" will not try calling "__rpow__()" (the coercion rules would become too complicated). Note: If the right operand’s type is a subclass of the left operand’s type and that subclass provides a different implementation of the reflected method for the operation, this method will be called before the left operand’s non-reflected method. This behavior allows subclasses to override their ancestors’ operations. object.__iadd__(self, other) object.__isub__(self, other) object.__imul__(self, other) object.__imatmul__(self, other) object.__itruediv__(self, other) object.__ifloordiv__(self, other) object.__imod__(self, other) object.__ipow__(self, other[, modulo]) object.__ilshift__(self, other) object.__irshift__(self, other) object.__iand__(self, other) object.__ixor__(self, other) object.__ior__(self, other) These methods are called to implement the augmented arithmetic assignments ("+=", "-=", "*=", "@=", "/=", "//=", "%=", "**=", "<<=", ">>=", "&=", "^=", "|="). These methods should attempt to do the operation in-place (modifying *self*) and return the result (which could be, but does not have to be, *self*). If a specific method is not defined, the augmented assignment falls back to the normal methods. For instance, if *x* is an instance of a class with an "__iadd__()" method, "x += y" is equivalent to "x = x.__iadd__(y)" . Otherwise, "x.__add__(y)" and "y.__radd__(x)" are considered, as with the evaluation of "x + y". In certain situations, augmented assignment can result in unexpected errors (see Why does a_tuple[i] += [‘item’] raise an exception when the addition works?), but this behavior is in fact part of the data model. Note: Due to a bug in the dispatching mechanism for "**=", a class that defines "__ipow__()" but returns "NotImplemented" would fail to fall back to "x.__pow__(y)" and "y.__rpow__(x)". This bug is fixed in Python 3.10. object.__neg__(self) object.__pos__(self) object.__abs__(self) object.__invert__(self) Called to implement the unary arithmetic operations ("-", "+", "abs()" and "~"). object.__complex__(self) object.__int__(self) object.__float__(self) Called to implement the built-in functions "complex()", "int()" and "float()". Should return a value of the appropriate type. object.__index__(self) Called to implement "operator.index()", and whenever Python needs to losslessly convert the numeric object to an integer object (such as in slicing, or in the built-in "bin()", "hex()" and "oct()" functions). Presence of this method indicates that the numeric object is an integer type. Must return an integer. If "__int__()", "__float__()" and "__complex__()" are not defined then corresponding built-in functions "int()", "float()" and "complex()" fall back to "__index__()". object.__round__(self[, ndigits]) object.__trunc__(self) object.__floor__(self) object.__ceil__(self) Called to implement the built-in function "round()" and "math" functions "trunc()", "floor()" and "ceil()". Unless *ndigits* is passed to "__round__()" all these methods should return the value of the object truncated to an "Integral" (typically an "int"). The built-in function "int()" falls back to "__trunc__()" if neither "__int__()" nor "__index__()" is defined. 3.3.9. With Statement Context Managers -------------------------------------- A *context manager* is an object that defines the runtime context to be established when executing a "with" statement. The context manager handles the entry into, and the exit from, the desired runtime context for the execution of the block of code. Context managers are normally invoked using the "with" statement (described in section The with statement), but can also be used by directly invoking their methods. Typical uses of context managers include saving and restoring various kinds of global state, locking and unlocking resources, closing opened files, etc. For more information on context managers, see Context Manager Types. object.__enter__(self) Enter the runtime context related to this object. The "with" statement will bind this method’s return value to the target(s) specified in the "as" clause of the statement, if any. object.__exit__(self, exc_type, exc_value, traceback) Exit the runtime context related to this object. The parameters describe the exception that caused the context to be exited. If the context was exited without an exception, all three arguments will be "None". If an exception is supplied, and the method wishes to suppress the exception (i.e., prevent it from being propagated), it should return a true value. Otherwise, the exception will be processed normally upon exit from this method. Note that "__exit__()" methods should not reraise the passed-in exception; this is the caller’s responsibility. See also: **PEP 343** - The “with” statement The specification, background, and examples for the Python "with" statement. 3.3.10. Special method lookup ----------------------------- For custom classes, implicit invocations of special methods are only guaranteed to work correctly if defined on an object’s type, not in the object’s instance dictionary. That behaviour is the reason why the following code raises an exception: >>> class C: ... pass ... >>> c = C() >>> c.__len__ = lambda: 5 >>> len(c) Traceback (most recent call last): File "", line 1, in TypeError: object of type 'C' has no len() The rationale behind this behaviour lies with a number of special methods such as "__hash__()" and "__repr__()" that are implemented by all objects, including type objects. If the implicit lookup of these methods used the conventional lookup process, they would fail when invoked on the type object itself: >>> 1 .__hash__() == hash(1) True >>> int.__hash__() == hash(int) Traceback (most recent call last): File "", line 1, in TypeError: descriptor '__hash__' of 'int' object needs an argument Incorrectly attempting to invoke an unbound method of a class in this way is sometimes referred to as ‘metaclass confusion’, and is avoided by bypassing the instance when looking up special methods: >>> type(1).__hash__(1) == hash(1) True >>> type(int).__hash__(int) == hash(int) True In addition to bypassing any instance attributes in the interest of correctness, implicit special method lookup generally also bypasses the "__getattribute__()" method even of the object’s metaclass: >>> class Meta(type): ... def __getattribute__(*args): ... print("Metaclass getattribute invoked") ... return type.__getattribute__(*args) ... >>> class C(object, metaclass=Meta): ... def __len__(self): ... return 10 ... def __getattribute__(*args): ... print("Class getattribute invoked") ... return object.__getattribute__(*args) ... >>> c = C() >>> c.__len__() # Explicit lookup via instance Class getattribute invoked 10 >>> type(c).__len__(c) # Explicit lookup via type Metaclass getattribute invoked 10 >>> len(c) # Implicit lookup 10 Bypassing the "__getattribute__()" machinery in this fashion provides significant scope for speed optimisations within the interpreter, at the cost of some flexibility in the handling of special methods (the special method *must* be set on the class object itself in order to be consistently invoked by the interpreter). 3.4. Coroutines =============== 3.4.1. Awaitable Objects ------------------------ An *awaitable* object generally implements an "__await__()" method. *Coroutine objects* returned from "async def" functions are awaitable. Note: The *generator iterator* objects returned from generators decorated with "types.coroutine()" or "asyncio.coroutine()" are also awaitable, but they do not implement "__await__()". object.__await__(self) Must return an *iterator*. Should be used to implement *awaitable* objects. For instance, "asyncio.Future" implements this method to be compatible with the "await" expression. New in version 3.5. See also: **PEP 492** for additional information about awaitable objects. 3.4.2. Coroutine Objects ------------------------ *Coroutine objects* are *awaitable* objects. A coroutine’s execution can be controlled by calling "__await__()" and iterating over the result. When the coroutine has finished executing and returns, the iterator raises "StopIteration", and the exception’s "value" attribute holds the return value. If the coroutine raises an exception, it is propagated by the iterator. Coroutines should not directly raise unhandled "StopIteration" exceptions. Coroutines also have the methods listed below, which are analogous to those of generators (see Generator-iterator methods). However, unlike generators, coroutines do not directly support iteration. Changed in version 3.5.2: It is a "RuntimeError" to await on a coroutine more than once. coroutine.send(value) Starts or resumes execution of the coroutine. If *value* is "None", this is equivalent to advancing the iterator returned by "__await__()". If *value* is not "None", this method delegates to the "send()" method of the iterator that caused the coroutine to suspend. The result (return value, "StopIteration", or other exception) is the same as when iterating over the "__await__()" return value, described above. coroutine.throw(value) coroutine.throw(type[, value[, traceback]]) Raises the specified exception in the coroutine. This method delegates to the "throw()" method of the iterator that caused the coroutine to suspend, if it has such a method. Otherwise, the exception is raised at the suspension point. The result (return value, "StopIteration", or other exception) is the same as when iterating over the "__await__()" return value, described above. If the exception is not caught in the coroutine, it propagates back to the caller. coroutine.close() Causes the coroutine to clean itself up and exit. If the coroutine is suspended, this method first delegates to the "close()" method of the iterator that caused the coroutine to suspend, if it has such a method. Then it raises "GeneratorExit" at the suspension point, causing the coroutine to immediately clean itself up. Finally, the coroutine is marked as having finished executing, even if it was never started. Coroutine objects are automatically closed using the above process when they are about to be destroyed. 3.4.3. Asynchronous Iterators ----------------------------- An *asynchronous iterator* can call asynchronous code in its "__anext__" method. Asynchronous iterators can be used in an "async for" statement. object.__aiter__(self) Must return an *asynchronous iterator* object. object.__anext__(self) Must return an *awaitable* resulting in a next value of the iterator. Should raise a "StopAsyncIteration" error when the iteration is over. An example of an asynchronous iterable object: class Reader: async def readline(self): ... def __aiter__(self): return self async def __anext__(self): val = await self.readline() if val == b'': raise StopAsyncIteration return val New in version 3.5. Changed in version 3.7: Prior to Python 3.7, "__aiter__()" could return an *awaitable* that would resolve to an *asynchronous iterator*.Starting with Python 3.7, "__aiter__()" must return an asynchronous iterator object. Returning anything else will result in a "TypeError" error. 3.4.4. Asynchronous Context Managers ------------------------------------ An *asynchronous context manager* is a *context manager* that is able to suspend execution in its "__aenter__" and "__aexit__" methods. Asynchronous context managers can be used in an "async with" statement. object.__aenter__(self) Semantically similar to "__enter__()", the only difference being that it must return an *awaitable*. object.__aexit__(self, exc_type, exc_value, traceback) Semantically similar to "__exit__()", the only difference being that it must return an *awaitable*. An example of an asynchronous context manager class: class AsyncContextManager: async def __aenter__(self): await log('entering context') async def __aexit__(self, exc_type, exc, tb): await log('exiting context') New in version 3.5. -[ Footnotes ]- [1] It *is* possible in some cases to change an object’s type, under certain controlled conditions. It generally isn’t a good idea though, since it can lead to some very strange behaviour if it is handled incorrectly. [2] The "__hash__()", "__iter__()", "__reversed__()", and "__contains__()" methods have special handling for this; others will still raise a "TypeError", but may do so by relying on the behavior that "None" is not callable. [3] “Does not support” here means that the class has no such method, or the method returns "NotImplemented". Do not set the method to "None" if you want to force fallback to the right operand’s reflected method—that will instead have the opposite effect of explicitly *blocking* such fallback. [4] For operands of the same type, it is assumed that if the non- reflected method – such as "__add__()" – fails then the overall operation is not supported, which is why the reflected method is not called. 4. Execution model ****************** 4.1. Structure of a program =========================== A Python program is constructed from code blocks. A *block* is a piece of Python program text that is executed as a unit. The following are blocks: a module, a function body, and a class definition. Each command typed interactively is a block. A script file (a file given as standard input to the interpreter or specified as a command line argument to the interpreter) is a code block. A script command (a command specified on the interpreter command line with the "-c" option) is a code block. A module run as a top level script (as module "__main__") from the command line using a "-m" argument is also a code block. The string argument passed to the built-in functions "eval()" and "exec()" is a code block. A code block is executed in an *execution frame*. A frame contains some administrative information (used for debugging) and determines where and how execution continues after the code block’s execution has completed. 4.2. Naming and binding ======================= 4.2.1. Binding of names ----------------------- *Names* refer to objects. Names are introduced by name binding operations. The following constructs bind names: formal parameters to functions, "import" statements, class and function definitions (these bind the class or function name in the defining block), and targets that are identifiers if occurring in an assignment, "for" loop header, or after "as" in a "with" statement or "except" clause. The "import" statement of the form "from ... import *" binds all names defined in the imported module, except those beginning with an underscore. This form may only be used at the module level. A target occurring in a "del" statement is also considered bound for this purpose (though the actual semantics are to unbind the name). Each assignment or import statement occurs within a block defined by a class or function definition or at the module level (the top-level code block). If a name is bound in a block, it is a local variable of that block, unless declared as "nonlocal" or "global". If a name is bound at the module level, it is a global variable. (The variables of the module code block are local and global.) If a variable is used in a code block but not defined there, it is a *free variable*. Each occurrence of a name in the program text refers to the *binding* of that name established by the following name resolution rules. 4.2.2. Resolution of names -------------------------- A *scope* defines the visibility of a name within a block. If a local variable is defined in a block, its scope includes that block. If the definition occurs in a function block, the scope extends to any blocks contained within the defining one, unless a contained block introduces a different binding for the name. When a name is used in a code block, it is resolved using the nearest enclosing scope. The set of all such scopes visible to a code block is called the block’s *environment*. When a name is not found at all, a "NameError" exception is raised. If the current scope is a function scope, and the name refers to a local variable that has not yet been bound to a value at the point where the name is used, an "UnboundLocalError" exception is raised. "UnboundLocalError" is a subclass of "NameError". If a name binding operation occurs anywhere within a code block, all uses of the name within the block are treated as references to the current block. This can lead to errors when a name is used within a block before it is bound. This rule is subtle. Python lacks declarations and allows name binding operations to occur anywhere within a code block. The local variables of a code block can be determined by scanning the entire text of the block for name binding operations. If the "global" statement occurs within a block, all uses of the names specified in the statement refer to the bindings of those names in the top-level namespace. Names are resolved in the top-level namespace by searching the global namespace, i.e. the namespace of the module containing the code block, and the builtins namespace, the namespace of the module "builtins". The global namespace is searched first. If the names are not found there, the builtins namespace is searched. The "global" statement must precede all uses of the listed names. The "global" statement has the same scope as a name binding operation in the same block. If the nearest enclosing scope for a free variable contains a global statement, the free variable is treated as a global. The "nonlocal" statement causes corresponding names to refer to previously bound variables in the nearest enclosing function scope. "SyntaxError" is raised at compile time if the given name does not exist in any enclosing function scope. The namespace for a module is automatically created the first time a module is imported. The main module for a script is always called "__main__". Class definition blocks and arguments to "exec()" and "eval()" are special in the context of name resolution. A class definition is an executable statement that may use and define names. These references follow the normal rules for name resolution with an exception that unbound local variables are looked up in the global namespace. The namespace of the class definition becomes the attribute dictionary of the class. The scope of names defined in a class block is limited to the class block; it does not extend to the code blocks of methods – this includes comprehensions and generator expressions since they are implemented using a function scope. This means that the following will fail: class A: a = 42 b = list(a + i for i in range(10)) 4.2.3. Builtins and restricted execution ---------------------------------------- **CPython implementation detail:** Users should not touch "__builtins__"; it is strictly an implementation detail. Users wanting to override values in the builtins namespace should "import" the "builtins" module and modify its attributes appropriately. The builtins namespace associated with the execution of a code block is actually found by looking up the name "__builtins__" in its global namespace; this should be a dictionary or a module (in the latter case the module’s dictionary is used). By default, when in the "__main__" module, "__builtins__" is the built-in module "builtins"; when in any other module, "__builtins__" is an alias for the dictionary of the "builtins" module itself. 4.2.4. Interaction with dynamic features ---------------------------------------- Name resolution of free variables occurs at runtime, not at compile time. This means that the following code will print 42: i = 10 def f(): print(i) i = 42 f() The "eval()" and "exec()" functions do not have access to the full environment for resolving names. Names may be resolved in the local and global namespaces of the caller. Free variables are not resolved in the nearest enclosing namespace, but in the global namespace. [1] The "exec()" and "eval()" functions have optional arguments to override the global and local namespace. If only one namespace is specified, it is used for both. 4.3. Exceptions =============== Exceptions are a means of breaking out of the normal flow of control of a code block in order to handle errors or other exceptional conditions. An exception is *raised* at the point where the error is detected; it may be *handled* by the surrounding code block or by any code block that directly or indirectly invoked the code block where the error occurred. The Python interpreter raises an exception when it detects a run-time error (such as division by zero). A Python program can also explicitly raise an exception with the "raise" statement. Exception handlers are specified with the "try" … "except" statement. The "finally" clause of such a statement can be used to specify cleanup code which does not handle the exception, but is executed whether an exception occurred or not in the preceding code. Python uses the “termination” model of error handling: an exception handler can find out what happened and continue execution at an outer level, but it cannot repair the cause of the error and retry the failing operation (except by re-entering the offending piece of code from the top). When an exception is not handled at all, the interpreter terminates execution of the program, or returns to its interactive main loop. In either case, it prints a stack traceback, except when the exception is "SystemExit". Exceptions are identified by class instances. The "except" clause is selected depending on the class of the instance: it must reference the class of the instance or a *non-virtual base class* thereof. The instance can be received by the handler and can carry additional information about the exceptional condition. Note: Exception messages are not part of the Python API. Their contents may change from one version of Python to the next without warning and should not be relied on by code which will run under multiple versions of the interpreter. See also the description of the "try" statement in section The try statement and "raise" statement in section The raise statement. -[ Footnotes ]- [1] This limitation occurs because the code that is executed by these operations is not available at the time the module is compiled. 6. Expressions ************** This chapter explains the meaning of the elements of expressions in Python. **Syntax Notes:** In this and the following chapters, extended BNF notation will be used to describe syntax, not lexical analysis. When (one alternative of) a syntax rule has the form name ::= othername and no semantics are given, the semantics of this form of "name" are the same as for "othername". 6.1. Arithmetic conversions =========================== When a description of an arithmetic operator below uses the phrase “the numeric arguments are converted to a common type”, this means that the operator implementation for built-in types works as follows: * If either argument is a complex number, the other is converted to complex; * otherwise, if either argument is a floating point number, the other is converted to floating point; * otherwise, both must be integers and no conversion is necessary. Some additional rules apply for certain operators (e.g., a string as a left argument to the ‘%’ operator). Extensions must define their own conversion behavior. 6.2. Atoms ========== Atoms are the most basic elements of expressions. The simplest atoms are identifiers or literals. Forms enclosed in parentheses, brackets or braces are also categorized syntactically as atoms. The syntax for atoms is: atom ::= identifier | literal | enclosure enclosure ::= parenth_form | list_display | dict_display | set_display | generator_expression | yield_atom 6.2.1. Identifiers (Names) -------------------------- An identifier occurring as an atom is a name. See section Identifiers and keywords for lexical definition and section Naming and binding for documentation of naming and binding. When the name is bound to an object, evaluation of the atom yields that object. When a name is not bound, an attempt to evaluate it raises a "NameError" exception. **Private name mangling:** When an identifier that textually occurs in a class definition begins with two or more underscore characters and does not end in two or more underscores, it is considered a *private name* of that class. Private names are transformed to a longer form before code is generated for them. The transformation inserts the class name, with leading underscores removed and a single underscore inserted, in front of the name. For example, the identifier "__spam" occurring in a class named "Ham" will be transformed to "_Ham__spam". This transformation is independent of the syntactical context in which the identifier is used. If the transformed name is extremely long (longer than 255 characters), implementation defined truncation may happen. If the class name consists only of underscores, no transformation is done. 6.2.2. Literals --------------- Python supports string and bytes literals and various numeric literals: literal ::= stringliteral | bytesliteral | integer | floatnumber | imagnumber Evaluation of a literal yields an object of the given type (string, bytes, integer, floating point number, complex number) with the given value. The value may be approximated in the case of floating point and imaginary (complex) literals. See section Literals for details. All literals correspond to immutable data types, and hence the object’s identity is less important than its value. Multiple evaluations of literals with the same value (either the same occurrence in the program text or a different occurrence) may obtain the same object or a different object with the same value. 6.2.3. Parenthesized forms -------------------------- A parenthesized form is an optional expression list enclosed in parentheses: parenth_form ::= "(" [starred_expression] ")" A parenthesized expression list yields whatever that expression list yields: if the list contains at least one comma, it yields a tuple; otherwise, it yields the single expression that makes up the expression list. An empty pair of parentheses yields an empty tuple object. Since tuples are immutable, the same rules as for literals apply (i.e., two occurrences of the empty tuple may or may not yield the same object). Note that tuples are not formed by the parentheses, but rather by use of the comma operator. The exception is the empty tuple, for which parentheses *are* required — allowing unparenthesized “nothing” in expressions would cause ambiguities and allow common typos to pass uncaught. 6.2.4. Displays for lists, sets and dictionaries ------------------------------------------------ For constructing a list, a set or a dictionary Python provides special syntax called “displays”, each of them in two flavors: * either the container contents are listed explicitly, or * they are computed via a set of looping and filtering instructions, called a *comprehension*. Common syntax elements for comprehensions are: comprehension ::= assignment_expression comp_for comp_for ::= ["async"] "for" target_list "in" or_test [comp_iter] comp_iter ::= comp_for | comp_if comp_if ::= "if" or_test [comp_iter] The comprehension consists of a single expression followed by at least one "for" clause and zero or more "for" or "if" clauses. In this case, the elements of the new container are those that would be produced by considering each of the "for" or "if" clauses a block, nesting from left to right, and evaluating the expression to produce an element each time the innermost block is reached. However, aside from the iterable expression in the leftmost "for" clause, the comprehension is executed in a separate implicitly nested scope. This ensures that names assigned to in the target list don’t “leak” into the enclosing scope. The iterable expression in the leftmost "for" clause is evaluated directly in the enclosing scope and then passed as an argument to the implicitly nested scope. Subsequent "for" clauses and any filter condition in the leftmost "for" clause cannot be evaluated in the enclosing scope as they may depend on the values obtained from the leftmost iterable. For example: "[x*y for x in range(10) for y in range(x, x+10)]". To ensure the comprehension always results in a container of the appropriate type, "yield" and "yield from" expressions are prohibited in the implicitly nested scope. Since Python 3.6, in an "async def" function, an "async for" clause may be used to iterate over a *asynchronous iterator*. A comprehension in an "async def" function may consist of either a "for" or "async for" clause following the leading expression, may contain additional "for" or "async for" clauses, and may also use "await" expressions. If a comprehension contains either "async for" clauses or "await" expressions it is called an *asynchronous comprehension*. An asynchronous comprehension may suspend the execution of the coroutine function in which it appears. See also **PEP 530**. New in version 3.6: Asynchronous comprehensions were introduced. Changed in version 3.8: "yield" and "yield from" prohibited in the implicitly nested scope. 6.2.5. List displays -------------------- A list display is a possibly empty series of expressions enclosed in square brackets: list_display ::= "[" [starred_list | comprehension] "]" A list display yields a new list object, the contents being specified by either a list of expressions or a comprehension. When a comma- separated list of expressions is supplied, its elements are evaluated from left to right and placed into the list object in that order. When a comprehension is supplied, the list is constructed from the elements resulting from the comprehension. 6.2.6. Set displays ------------------- A set display is denoted by curly braces and distinguishable from dictionary displays by the lack of colons separating keys and values: set_display ::= "{" (starred_list | comprehension) "}" A set display yields a new mutable set object, the contents being specified by either a sequence of expressions or a comprehension. When a comma-separated list of expressions is supplied, its elements are evaluated from left to right and added to the set object. When a comprehension is supplied, the set is constructed from the elements resulting from the comprehension. An empty set cannot be constructed with "{}"; this literal constructs an empty dictionary. 6.2.7. Dictionary displays -------------------------- A dictionary display is a possibly empty series of key/datum pairs enclosed in curly braces: dict_display ::= "{" [key_datum_list | dict_comprehension] "}" key_datum_list ::= key_datum ("," key_datum)* [","] key_datum ::= expression ":" expression | "**" or_expr dict_comprehension ::= expression ":" expression comp_for A dictionary display yields a new dictionary object. If a comma-separated sequence of key/datum pairs is given, they are evaluated from left to right to define the entries of the dictionary: each key object is used as a key into the dictionary to store the corresponding datum. This means that you can specify the same key multiple times in the key/datum list, and the final dictionary’s value for that key will be the last one given. A double asterisk "**" denotes *dictionary unpacking*. Its operand must be a *mapping*. Each mapping item is added to the new dictionary. Later values replace values already set by earlier key/datum pairs and earlier dictionary unpackings. New in version 3.5: Unpacking into dictionary displays, originally proposed by **PEP 448**. A dict comprehension, in contrast to list and set comprehensions, needs two expressions separated with a colon followed by the usual “for” and “if” clauses. When the comprehension is run, the resulting key and value elements are inserted in the new dictionary in the order they are produced. Restrictions on the types of the key values are listed earlier in section The standard type hierarchy. (To summarize, the key type should be *hashable*, which excludes all mutable objects.) Clashes between duplicate keys are not detected; the last datum (textually rightmost in the display) stored for a given key value prevails. Changed in version 3.8: Prior to Python 3.8, in dict comprehensions, the evaluation order of key and value was not well-defined. In CPython, the value was evaluated before the key. Starting with 3.8, the key is evaluated before the value, as proposed by **PEP 572**. 6.2.8. Generator expressions ---------------------------- A generator expression is a compact generator notation in parentheses: generator_expression ::= "(" expression comp_for ")" A generator expression yields a new generator object. Its syntax is the same as for comprehensions, except that it is enclosed in parentheses instead of brackets or curly braces. Variables used in the generator expression are evaluated lazily when the "__next__()" method is called for the generator object (in the same fashion as normal generators). However, the iterable expression in the leftmost "for" clause is immediately evaluated, so that an error produced by it will be emitted at the point where the generator expression is defined, rather than at the point where the first value is retrieved. Subsequent "for" clauses and any filter condition in the leftmost "for" clause cannot be evaluated in the enclosing scope as they may depend on the values obtained from the leftmost iterable. For example: "(x*y for x in range(10) for y in range(x, x+10))". The parentheses can be omitted on calls with only one argument. See section Calls for details. To avoid interfering with the expected operation of the generator expression itself, "yield" and "yield from" expressions are prohibited in the implicitly defined generator. If a generator expression contains either "async for" clauses or "await" expressions it is called an *asynchronous generator expression*. An asynchronous generator expression returns a new asynchronous generator object, which is an asynchronous iterator (see Asynchronous Iterators). New in version 3.6: Asynchronous generator expressions were introduced. Changed in version 3.7: Prior to Python 3.7, asynchronous generator expressions could only appear in "async def" coroutines. Starting with 3.7, any function can use asynchronous generator expressions. Changed in version 3.8: "yield" and "yield from" prohibited in the implicitly nested scope. 6.2.9. Yield expressions ------------------------ yield_atom ::= "(" yield_expression ")" yield_expression ::= "yield" [expression_list | "from" expression] The yield expression is used when defining a *generator* function or an *asynchronous generator* function and thus can only be used in the body of a function definition. Using a yield expression in a function’s body causes that function to be a generator function, and using it in an "async def" function’s body causes that coroutine function to be an asynchronous generator function. For example: def gen(): # defines a generator function yield 123 async def agen(): # defines an asynchronous generator function yield 123 Due to their side effects on the containing scope, "yield" expressions are not permitted as part of the implicitly defined scopes used to implement comprehensions and generator expressions. Changed in version 3.8: Yield expressions prohibited in the implicitly nested scopes used to implement comprehensions and generator expressions. Generator functions are described below, while asynchronous generator functions are described separately in section Asynchronous generator functions. When a generator function is called, it returns an iterator known as a generator. That generator then controls the execution of the generator function. The execution starts when one of the generator’s methods is called. At that time, the execution proceeds to the first yield expression, where it is suspended again, returning the value of "expression_list" to the generator’s caller. By suspended, we mean that all local state is retained, including the current bindings of local variables, the instruction pointer, the internal evaluation stack, and the state of any exception handling. When the execution is resumed by calling one of the generator’s methods, the function can proceed exactly as if the yield expression were just another external call. The value of the yield expression after resuming depends on the method which resumed the execution. If "__next__()" is used (typically via either a "for" or the "next()" builtin) then the result is "None". Otherwise, if "send()" is used, then the result will be the value passed in to that method. All of this makes generator functions quite similar to coroutines; they yield multiple times, they have more than one entry point and their execution can be suspended. The only difference is that a generator function cannot control where the execution should continue after it yields; the control is always transferred to the generator’s caller. Yield expressions are allowed anywhere in a "try" construct. If the generator is not resumed before it is finalized (by reaching a zero reference count or by being garbage collected), the generator- iterator’s "close()" method will be called, allowing any pending "finally" clauses to execute. When "yield from " is used, the supplied expression must be an iterable. The values produced by iterating that iterable are passed directly to the caller of the current generator’s methods. Any values passed in with "send()" and any exceptions passed in with "throw()" are passed to the underlying iterator if it has the appropriate methods. If this is not the case, then "send()" will raise "AttributeError" or "TypeError", while "throw()" will just raise the passed in exception immediately. When the underlying iterator is complete, the "value" attribute of the raised "StopIteration" instance becomes the value of the yield expression. It can be either set explicitly when raising "StopIteration", or automatically when the subiterator is a generator (by returning a value from the subgenerator). Changed in version 3.3: Added "yield from " to delegate control flow to a subiterator. The parentheses may be omitted when the yield expression is the sole expression on the right hand side of an assignment statement. See also: **PEP 255** - Simple Generators The proposal for adding generators and the "yield" statement to Python. **PEP 342** - Coroutines via Enhanced Generators The proposal to enhance the API and syntax of generators, making them usable as simple coroutines. **PEP 380** - Syntax for Delegating to a Subgenerator The proposal to introduce the "yield_from" syntax, making delegation to subgenerators easy. **PEP 525** - Asynchronous Generators The proposal that expanded on **PEP 492** by adding generator capabilities to coroutine functions. 6.2.9.1. Generator-iterator methods ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This subsection describes the methods of a generator iterator. They can be used to control the execution of a generator function. Note that calling any of the generator methods below when the generator is already executing raises a "ValueError" exception. generator.__next__() Starts the execution of a generator function or resumes it at the last executed yield expression. When a generator function is resumed with a "__next__()" method, the current yield expression always evaluates to "None". The execution then continues to the next yield expression, where the generator is suspended again, and the value of the "expression_list" is returned to "__next__()"’s caller. If the generator exits without yielding another value, a "StopIteration" exception is raised. This method is normally called implicitly, e.g. by a "for" loop, or by the built-in "next()" function. generator.send(value) Resumes the execution and “sends” a value into the generator function. The *value* argument becomes the result of the current yield expression. The "send()" method returns the next value yielded by the generator, or raises "StopIteration" if the generator exits without yielding another value. When "send()" is called to start the generator, it must be called with "None" as the argument, because there is no yield expression that could receive the value. generator.throw(value) generator.throw(type[, value[, traceback]]) Raises an exception at the point where the generator was paused, and returns the next value yielded by the generator function. If the generator exits without yielding another value, a "StopIteration" exception is raised. If the generator function does not catch the passed-in exception, or raises a different exception, then that exception propagates to the caller. In typical use, this is called with a single exception instance similar to the way the "raise" keyword is used. For backwards compatability, however, the second signature is supported, following a convention from older versions of Python. The *type* argument should be an exception class, and *value* should be an exception instance. If the *value* is not provided, the *type* constructor is called to get an instance. If *traceback* is provided, it is set on the exception, otherwise any existing "__traceback__" attribute stored in *value* may be cleared. generator.close() Raises a "GeneratorExit" at the point where the generator function was paused. If the generator function then exits gracefully, is already closed, or raises "GeneratorExit" (by not catching the exception), close returns to its caller. If the generator yields a value, a "RuntimeError" is raised. If the generator raises any other exception, it is propagated to the caller. "close()" does nothing if the generator has already exited due to an exception or normal exit. 6.2.9.2. Examples ~~~~~~~~~~~~~~~~~ Here is a simple example that demonstrates the behavior of generators and generator functions: >>> def echo(value=None): ... print("Execution starts when 'next()' is called for the first time.") ... try: ... while True: ... try: ... value = (yield value) ... except Exception as e: ... value = e ... finally: ... print("Don't forget to clean up when 'close()' is called.") ... >>> generator = echo(1) >>> print(next(generator)) Execution starts when 'next()' is called for the first time. 1 >>> print(next(generator)) None >>> print(generator.send(2)) 2 >>> generator.throw(TypeError, "spam") TypeError('spam',) >>> generator.close() Don't forget to clean up when 'close()' is called. For examples using "yield from", see PEP 380: Syntax for Delegating to a Subgenerator in “What’s New in Python.” 6.2.9.3. Asynchronous generator functions ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The presence of a yield expression in a function or method defined using "async def" further defines the function as an *asynchronous generator* function. When an asynchronous generator function is called, it returns an asynchronous iterator known as an asynchronous generator object. That object then controls the execution of the generator function. An asynchronous generator object is typically used in an "async for" statement in a coroutine function analogously to how a generator object would be used in a "for" statement. Calling one of the asynchronous generator’s methods returns an *awaitable* object, and the execution starts when this object is awaited on. At that time, the execution proceeds to the first yield expression, where it is suspended again, returning the value of "expression_list" to the awaiting coroutine. As with a generator, suspension means that all local state is retained, including the current bindings of local variables, the instruction pointer, the internal evaluation stack, and the state of any exception handling. When the execution is resumed by awaiting on the next object returned by the asynchronous generator’s methods, the function can proceed exactly as if the yield expression were just another external call. The value of the yield expression after resuming depends on the method which resumed the execution. If "__anext__()" is used then the result is "None". Otherwise, if "asend()" is used, then the result will be the value passed in to that method. In an asynchronous generator function, yield expressions are allowed anywhere in a "try" construct. However, if an asynchronous generator is not resumed before it is finalized (by reaching a zero reference count or by being garbage collected), then a yield expression within a "try" construct could result in a failure to execute pending "finally" clauses. In this case, it is the responsibility of the event loop or scheduler running the asynchronous generator to call the asynchronous generator-iterator’s "aclose()" method and run the resulting coroutine object, thus allowing any pending "finally" clauses to execute. To take care of finalization, an event loop should define a *finalizer* function which takes an asynchronous generator-iterator and presumably calls "aclose()" and executes the coroutine. This *finalizer* may be registered by calling "sys.set_asyncgen_hooks()". When first iterated over, an asynchronous generator-iterator will store the registered *finalizer* to be called upon finalization. For a reference example of a *finalizer* method see the implementation of "asyncio.Loop.shutdown_asyncgens" in Lib/asyncio/base_events.py. The expression "yield from " is a syntax error when used in an asynchronous generator function. 6.2.9.4. Asynchronous generator-iterator methods ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ This subsection describes the methods of an asynchronous generator iterator, which are used to control the execution of a generator function. coroutine agen.__anext__() Returns an awaitable which when run starts to execute the asynchronous generator or resumes it at the last executed yield expression. When an asynchronous generator function is resumed with an "__anext__()" method, the current yield expression always evaluates to "None" in the returned awaitable, which when run will continue to the next yield expression. The value of the "expression_list" of the yield expression is the value of the "StopIteration" exception raised by the completing coroutine. If the asynchronous generator exits without yielding another value, the awaitable instead raises a "StopAsyncIteration" exception, signalling that the asynchronous iteration has completed. This method is normally called implicitly by a "async for" loop. coroutine agen.asend(value) Returns an awaitable which when run resumes the execution of the asynchronous generator. As with the "send()" method for a generator, this “sends” a value into the asynchronous generator function, and the *value* argument becomes the result of the current yield expression. The awaitable returned by the "asend()" method will return the next value yielded by the generator as the value of the raised "StopIteration", or raises "StopAsyncIteration" if the asynchronous generator exits without yielding another value. When "asend()" is called to start the asynchronous generator, it must be called with "None" as the argument, because there is no yield expression that could receive the value. coroutine agen.athrow(value) coroutine agen.athrow(type[, value[, traceback]]) Returns an awaitable that raises an exception of type "type" at the point where the asynchronous generator was paused, and returns the next value yielded by the generator function as the value of the raised "StopIteration" exception. If the asynchronous generator exits without yielding another value, a "StopAsyncIteration" exception is raised by the awaitable. If the generator function does not catch the passed-in exception, or raises a different exception, then when the awaitable is run that exception propagates to the caller of the awaitable. coroutine agen.aclose() Returns an awaitable that when run will throw a "GeneratorExit" into the asynchronous generator function at the point where it was paused. If the asynchronous generator function then exits gracefully, is already closed, or raises "GeneratorExit" (by not catching the exception), then the returned awaitable will raise a "StopIteration" exception. Any further awaitables returned by subsequent calls to the asynchronous generator will raise a "StopAsyncIteration" exception. If the asynchronous generator yields a value, a "RuntimeError" is raised by the awaitable. If the asynchronous generator raises any other exception, it is propagated to the caller of the awaitable. If the asynchronous generator has already exited due to an exception or normal exit, then further calls to "aclose()" will return an awaitable that does nothing. 6.3. Primaries ============== Primaries represent the most tightly bound operations of the language. Their syntax is: primary ::= atom | attributeref | subscription | slicing | call 6.3.1. Attribute references --------------------------- An attribute reference is a primary followed by a period and a name: attributeref ::= primary "." identifier The primary must evaluate to an object of a type that supports attribute references, which most objects do. This object is then asked to produce the attribute whose name is the identifier. This production can be customized by overriding the "__getattr__()" method. If this attribute is not available, the exception "AttributeError" is raised. Otherwise, the type and value of the object produced is determined by the object. Multiple evaluations of the same attribute reference may yield different objects. 6.3.2. Subscriptions -------------------- The subscription of an instance of a container class will generally select an element from the container. The subscription of a *generic class* will generally return a GenericAlias object. subscription ::= primary "[" expression_list "]" When an object is subscripted, the interpreter will evaluate the primary and the expression list. The primary must evaluate to an object that supports subscription. An object may support subscription through defining one or both of "__getitem__()" and "__class_getitem__()". When the primary is subscripted, the evaluated result of the expression list will be passed to one of these methods. For more details on when "__class_getitem__" is called instead of "__getitem__", see __class_getitem__ versus __getitem__. If the expression list contains at least one comma, it will evaluate to a "tuple" containing the items of the expression list. Otherwise, the expression list will evaluate to the value of the list’s sole member. For built-in objects, there are two types of objects that support subscription via "__getitem__()": 1. Mappings. If the primary is a *mapping*, the expression list must evaluate to an object whose value is one of the keys of the mapping, and the subscription selects the value in the mapping that corresponds to that key. An example of a builtin mapping class is the "dict" class. 2. Sequences. If the primary is a *sequence*, the expression list must evaluate to an "int" or a "slice" (as discussed in the following section). Examples of builtin sequence classes include the "str", "list" and "tuple" classes. The formal syntax makes no special provision for negative indices in *sequences*. However, built-in sequences all provide a "__getitem__()" method that interprets negative indices by adding the length of the sequence to the index so that, for example, "x[-1]" selects the last item of "x". The resulting value must be a nonnegative integer less than the number of items in the sequence, and the subscription selects the item whose index is that value (counting from zero). Since the support for negative indices and slicing occurs in the object’s "__getitem__()" method, subclasses overriding this method will need to explicitly add that support. A "string" is a special kind of sequence whose items are *characters*. A character is not a separate data type but a string of exactly one character. 6.3.3. Slicings --------------- A slicing selects a range of items in a sequence object (e.g., a string, tuple or list). Slicings may be used as expressions or as targets in assignment or "del" statements. The syntax for a slicing: slicing ::= primary "[" slice_list "]" slice_list ::= slice_item ("," slice_item)* [","] slice_item ::= expression | proper_slice proper_slice ::= [lower_bound] ":" [upper_bound] [ ":" [stride] ] lower_bound ::= expression upper_bound ::= expression stride ::= expression There is ambiguity in the formal syntax here: anything that looks like an expression list also looks like a slice list, so any subscription can be interpreted as a slicing. Rather than further complicating the syntax, this is disambiguated by defining that in this case the interpretation as a subscription takes priority over the interpretation as a slicing (this is the case if the slice list contains no proper slice). The semantics for a slicing are as follows. The primary is indexed (using the same "__getitem__()" method as normal subscription) with a key that is constructed from the slice list, as follows. If the slice list contains at least one comma, the key is a tuple containing the conversion of the slice items; otherwise, the conversion of the lone slice item is the key. The conversion of a slice item that is an expression is that expression. The conversion of a proper slice is a slice object (see section The standard type hierarchy) whose "start", "stop" and "step" attributes are the values of the expressions given as lower bound, upper bound and stride, respectively, substituting "None" for missing expressions. 6.3.4. Calls ------------ A call calls a callable object (e.g., a *function*) with a possibly empty series of *arguments*: call ::= primary "(" [argument_list [","] | comprehension] ")" argument_list ::= positional_arguments ["," starred_and_keywords] ["," keywords_arguments] | starred_and_keywords ["," keywords_arguments] | keywords_arguments positional_arguments ::= positional_item ("," positional_item)* positional_item ::= assignment_expression | "*" expression starred_and_keywords ::= ("*" expression | keyword_item) ("," "*" expression | "," keyword_item)* keywords_arguments ::= (keyword_item | "**" expression) ("," keyword_item | "," "**" expression)* keyword_item ::= identifier "=" expression An optional trailing comma may be present after the positional and keyword arguments but does not affect the semantics. The primary must evaluate to a callable object (user-defined functions, built-in functions, methods of built-in objects, class objects, methods of class instances, and all objects having a "__call__()" method are callable). All argument expressions are evaluated before the call is attempted. Please refer to section Function definitions for the syntax of formal *parameter* lists. If keyword arguments are present, they are first converted to positional arguments, as follows. First, a list of unfilled slots is created for the formal parameters. If there are N positional arguments, they are placed in the first N slots. Next, for each keyword argument, the identifier is used to determine the corresponding slot (if the identifier is the same as the first formal parameter name, the first slot is used, and so on). If the slot is already filled, a "TypeError" exception is raised. Otherwise, the value of the argument is placed in the slot, filling it (even if the expression is "None", it fills the slot). When all arguments have been processed, the slots that are still unfilled are filled with the corresponding default value from the function definition. (Default values are calculated, once, when the function is defined; thus, a mutable object such as a list or dictionary used as default value will be shared by all calls that don’t specify an argument value for the corresponding slot; this should usually be avoided.) If there are any unfilled slots for which no default value is specified, a "TypeError" exception is raised. Otherwise, the list of filled slots is used as the argument list for the call. **CPython implementation detail:** An implementation may provide built-in functions whose positional parameters do not have names, even if they are ‘named’ for the purpose of documentation, and which therefore cannot be supplied by keyword. In CPython, this is the case for functions implemented in C that use "PyArg_ParseTuple()" to parse their arguments. If there are more positional arguments than there are formal parameter slots, a "TypeError" exception is raised, unless a formal parameter using the syntax "*identifier" is present; in this case, that formal parameter receives a tuple containing the excess positional arguments (or an empty tuple if there were no excess positional arguments). If any keyword argument does not correspond to a formal parameter name, a "TypeError" exception is raised, unless a formal parameter using the syntax "**identifier" is present; in this case, that formal parameter receives a dictionary containing the excess keyword arguments (using the keywords as keys and the argument values as corresponding values), or a (new) empty dictionary if there were no excess keyword arguments. If the syntax "*expression" appears in the function call, "expression" must evaluate to an *iterable*. Elements from these iterables are treated as if they were additional positional arguments. For the call "f(x1, x2, *y, x3, x4)", if *y* evaluates to a sequence *y1*, …, *yM*, this is equivalent to a call with M+4 positional arguments *x1*, *x2*, *y1*, …, *yM*, *x3*, *x4*. A consequence of this is that although the "*expression" syntax may appear *after* explicit keyword arguments, it is processed *before* the keyword arguments (and any "**expression" arguments – see below). So: >>> def f(a, b): ... print(a, b) ... >>> f(b=1, *(2,)) 2 1 >>> f(a=1, *(2,)) Traceback (most recent call last): File "", line 1, in TypeError: f() got multiple values for keyword argument 'a' >>> f(1, *(2,)) 1 2 It is unusual for both keyword arguments and the "*expression" syntax to be used in the same call, so in practice this confusion does not arise. If the syntax "**expression" appears in the function call, "expression" must evaluate to a *mapping*, the contents of which are treated as additional keyword arguments. If a keyword is already present (as an explicit keyword argument, or from another unpacking), a "TypeError" exception is raised. Formal parameters using the syntax "*identifier" or "**identifier" cannot be used as positional argument slots or as keyword argument names. Changed in version 3.5: Function calls accept any number of "*" and "**" unpackings, positional arguments may follow iterable unpackings ("*"), and keyword arguments may follow dictionary unpackings ("**"). Originally proposed by **PEP 448**. A call always returns some value, possibly "None", unless it raises an exception. How this value is computed depends on the type of the callable object. If it is— a user-defined function: The code block for the function is executed, passing it the argument list. The first thing the code block will do is bind the formal parameters to the arguments; this is described in section Function definitions. When the code block executes a "return" statement, this specifies the return value of the function call. a built-in function or method: The result is up to the interpreter; see Built-in Functions for the descriptions of built-in functions and methods. a class object: A new instance of that class is returned. a class instance method: The corresponding user-defined function is called, with an argument list that is one longer than the argument list of the call: the instance becomes the first argument. a class instance: The class must define a "__call__()" method; the effect is then the same as if that method was called. 6.4. Await expression ===================== Suspend the execution of *coroutine* on an *awaitable* object. Can only be used inside a *coroutine function*. await_expr ::= "await" primary New in version 3.5. 6.5. The power operator ======================= The power operator binds more tightly than unary operators on its left; it binds less tightly than unary operators on its right. The syntax is: power ::= (await_expr | primary) ["**" u_expr] Thus, in an unparenthesized sequence of power and unary operators, the operators are evaluated from right to left (this does not constrain the evaluation order for the operands): "-1**2" results in "-1". The power operator has the same semantics as the built-in "pow()" function, when called with two arguments: it yields its left argument raised to the power of its right argument. The numeric arguments are first converted to a common type, and the result is of that type. For int operands, the result has the same type as the operands unless the second argument is negative; in that case, all arguments are converted to float and a float result is delivered. For example, "10**2" returns "100", but "10**-2" returns "0.01". Raising "0.0" to a negative power results in a "ZeroDivisionError". Raising a negative number to a fractional power results in a "complex" number. (In earlier versions it raised a "ValueError".) This operation can be customized using the special "__pow__()" method. 6.6. Unary arithmetic and bitwise operations ============================================ All unary arithmetic and bitwise operations have the same priority: u_expr ::= power | "-" u_expr | "+" u_expr | "~" u_expr The unary "-" (minus) operator yields the negation of its numeric argument; the operation can be overridden with the "__neg__()" special method. The unary "+" (plus) operator yields its numeric argument unchanged; the operation can be overridden with the "__pos__()" special method. The unary "~" (invert) operator yields the bitwise inversion of its integer argument. The bitwise inversion of "x" is defined as "-(x+1)". It only applies to integral numbers or to custom objects that override the "__invert__()" special method. In all three cases, if the argument does not have the proper type, a "TypeError" exception is raised. 6.7. Binary arithmetic operations ================================= The binary arithmetic operations have the conventional priority levels. Note that some of these operations also apply to certain non- numeric types. Apart from the power operator, there are only two levels, one for multiplicative operators and one for additive operators: m_expr ::= u_expr | m_expr "*" u_expr | m_expr "@" m_expr | m_expr "//" u_expr | m_expr "/" u_expr | m_expr "%" u_expr a_expr ::= m_expr | a_expr "+" m_expr | a_expr "-" m_expr The "*" (multiplication) operator yields the product of its arguments. The arguments must either both be numbers, or one argument must be an integer and the other must be a sequence. In the former case, the numbers are converted to a common type and then multiplied together. In the latter case, sequence repetition is performed; a negative repetition factor yields an empty sequence. This operation can be customized using the special "__mul__()" and "__rmul__()" methods. The "@" (at) operator is intended to be used for matrix multiplication. No builtin Python types implement this operator. New in version 3.5. The "/" (division) and "//" (floor division) operators yield the quotient of their arguments. The numeric arguments are first converted to a common type. Division of integers yields a float, while floor division of integers results in an integer; the result is that of mathematical division with the ‘floor’ function applied to the result. Division by zero raises the "ZeroDivisionError" exception. This operation can be customized using the special "__truediv__()" and "__floordiv__()" methods. The "%" (modulo) operator yields the remainder from the division of the first argument by the second. The numeric arguments are first converted to a common type. A zero right argument raises the "ZeroDivisionError" exception. The arguments may be floating point numbers, e.g., "3.14%0.7" equals "0.34" (since "3.14" equals "4*0.7 + 0.34".) The modulo operator always yields a result with the same sign as its second operand (or zero); the absolute value of the result is strictly smaller than the absolute value of the second operand [1]. The floor division and modulo operators are connected by the following identity: "x == (x//y)*y + (x%y)". Floor division and modulo are also connected with the built-in function "divmod()": "divmod(x, y) == (x//y, x%y)". [2]. In addition to performing the modulo operation on numbers, the "%" operator is also overloaded by string objects to perform old-style string formatting (also known as interpolation). The syntax for string formatting is described in the Python Library Reference, section printf-style String Formatting. The *modulo* operation can be customized using the special "__mod__()" method. The floor division operator, the modulo operator, and the "divmod()" function are not defined for complex numbers. Instead, convert to a floating point number using the "abs()" function if appropriate. The "+" (addition) operator yields the sum of its arguments. The arguments must either both be numbers or both be sequences of the same type. In the former case, the numbers are converted to a common type and then added together. In the latter case, the sequences are concatenated. This operation can be customized using the special "__add__()" and "__radd__()" methods. The "-" (subtraction) operator yields the difference of its arguments. The numeric arguments are first converted to a common type. This operation can be customized using the special "__sub__()" method. 6.8. Shifting operations ======================== The shifting operations have lower priority than the arithmetic operations: shift_expr ::= a_expr | shift_expr ("<<" | ">>") a_expr These operators accept integers as arguments. They shift the first argument to the left or right by the number of bits given by the second argument. This operation can be customized using the special "__lshift__()" and "__rshift__()" methods. A right shift by *n* bits is defined as floor division by "pow(2,n)". A left shift by *n* bits is defined as multiplication with "pow(2,n)". 6.9. Binary bitwise operations ============================== Each of the three bitwise operations has a different priority level: and_expr ::= shift_expr | and_expr "&" shift_expr xor_expr ::= and_expr | xor_expr "^" and_expr or_expr ::= xor_expr | or_expr "|" xor_expr The "&" operator yields the bitwise AND of its arguments, which must be integers or one of them must be a custom object overriding "__and__()" or "__rand__()" special methods. The "^" operator yields the bitwise XOR (exclusive OR) of its arguments, which must be integers or one of them must be a custom object overriding "__xor__()" or "__rxor__()" special methods. The "|" operator yields the bitwise (inclusive) OR of its arguments, which must be integers or one of them must be a custom object overriding "__or__()" or "__ror__()" special methods. 6.10. Comparisons ================= Unlike C, all comparison operations in Python have the same priority, which is lower than that of any arithmetic, shifting or bitwise operation. Also unlike C, expressions like "a < b < c" have the interpretation that is conventional in mathematics: comparison ::= or_expr (comp_operator or_expr)* comp_operator ::= "<" | ">" | "==" | ">=" | "<=" | "!=" | "is" ["not"] | ["not"] "in" Comparisons yield boolean values: "True" or "False". Custom *rich comparison methods* may return non-boolean values. In this case Python will call "bool()" on such value in boolean contexts. Comparisons can be chained arbitrarily, e.g., "x < y <= z" is equivalent to "x < y and y <= z", except that "y" is evaluated only once (but in both cases "z" is not evaluated at all when "x < y" is found to be false). Formally, if *a*, *b*, *c*, …, *y*, *z* are expressions and *op1*, *op2*, …, *opN* are comparison operators, then "a op1 b op2 c ... y opN z" is equivalent to "a op1 b and b op2 c and ... y opN z", except that each expression is evaluated at most once. Note that "a op1 b op2 c" doesn’t imply any kind of comparison between *a* and *c*, so that, e.g., "x < y > z" is perfectly legal (though perhaps not pretty). 6.10.1. Value comparisons ------------------------- The operators "<", ">", "==", ">=", "<=", and "!=" compare the values of two objects. The objects do not need to have the same type. Chapter Objects, values and types states that objects have a value (in addition to type and identity). The value of an object is a rather abstract notion in Python: For example, there is no canonical access method for an object’s value. Also, there is no requirement that the value of an object should be constructed in a particular way, e.g. comprised of all its data attributes. Comparison operators implement a particular notion of what the value of an object is. One can think of them as defining the value of an object indirectly, by means of their comparison implementation. Because all types are (direct or indirect) subtypes of "object", they inherit the default comparison behavior from "object". Types can customize their comparison behavior by implementing *rich comparison methods* like "__lt__()", described in Basic customization. The default behavior for equality comparison ("==" and "!=") is based on the identity of the objects. Hence, equality comparison of instances with the same identity results in equality, and equality comparison of instances with different identities results in inequality. A motivation for this default behavior is the desire that all objects should be reflexive (i.e. "x is y" implies "x == y"). A default order comparison ("<", ">", "<=", and ">=") is not provided; an attempt raises "TypeError". A motivation for this default behavior is the lack of a similar invariant as for equality. The behavior of the default equality comparison, that instances with different identities are always unequal, may be in contrast to what types will need that have a sensible definition of object value and value-based equality. Such types will need to customize their comparison behavior, and in fact, a number of built-in types have done that. The following list describes the comparison behavior of the most important built-in types. * Numbers of built-in numeric types (Numeric Types — int, float, complex) and of the standard library types "fractions.Fraction" and "decimal.Decimal" can be compared within and across their types, with the restriction that complex numbers do not support order comparison. Within the limits of the types involved, they compare mathematically (algorithmically) correct without loss of precision. The not-a-number values "float('NaN')" and "decimal.Decimal('NaN')" are special. Any ordered comparison of a number to a not-a-number value is false. A counter-intuitive implication is that not-a-number values are not equal to themselves. For example, if "x = float('NaN')", "3 < x", "x < 3" and "x == x" are all false, while "x != x" is true. This behavior is compliant with IEEE 754. * "None" and "NotImplemented" are singletons. **PEP 8** advises that comparisons for singletons should always be done with "is" or "is not", never the equality operators. * Binary sequences (instances of "bytes" or "bytearray") can be compared within and across their types. They compare lexicographically using the numeric values of their elements. * Strings (instances of "str") compare lexicographically using the numerical Unicode code points (the result of the built-in function "ord()") of their characters. [3] Strings and binary sequences cannot be directly compared. * Sequences (instances of "tuple", "list", or "range") can be compared only within each of their types, with the restriction that ranges do not support order comparison. Equality comparison across these types results in inequality, and ordering comparison across these types raises "TypeError". Sequences compare lexicographically using comparison of corresponding elements. The built-in containers typically assume identical objects are equal to themselves. That lets them bypass equality tests for identical objects to improve performance and to maintain their internal invariants. Lexicographical comparison between built-in collections works as follows: * For two collections to compare equal, they must be of the same type, have the same length, and each pair of corresponding elements must compare equal (for example, "[1,2] == (1,2)" is false because the type is not the same). * Collections that support order comparison are ordered the same as their first unequal elements (for example, "[1,2,x] <= [1,2,y]" has the same value as "x <= y"). If a corresponding element does not exist, the shorter collection is ordered first (for example, "[1,2] < [1,2,3]" is true). * Mappings (instances of "dict") compare equal if and only if they have equal *(key, value)* pairs. Equality comparison of the keys and values enforces reflexivity. Order comparisons ("<", ">", "<=", and ">=") raise "TypeError". * Sets (instances of "set" or "frozenset") can be compared within and across their types. They define order comparison operators to mean subset and superset tests. Those relations do not define total orderings (for example, the two sets "{1,2}" and "{2,3}" are not equal, nor subsets of one another, nor supersets of one another). Accordingly, sets are not appropriate arguments for functions which depend on total ordering (for example, "min()", "max()", and "sorted()" produce undefined results given a list of sets as inputs). Comparison of sets enforces reflexivity of its elements. * Most other built-in types have no comparison methods implemented, so they inherit the default comparison behavior. User-defined classes that customize their comparison behavior should follow some consistency rules, if possible: * Equality comparison should be reflexive. In other words, identical objects should compare equal: "x is y" implies "x == y" * Comparison should be symmetric. In other words, the following expressions should have the same result: "x == y" and "y == x" "x != y" and "y != x" "x < y" and "y > x" "x <= y" and "y >= x" * Comparison should be transitive. The following (non-exhaustive) examples illustrate that: "x > y and y > z" implies "x > z" "x < y and y <= z" implies "x < z" * Inverse comparison should result in the boolean negation. In other words, the following expressions should have the same result: "x == y" and "not x != y" "x < y" and "not x >= y" (for total ordering) "x > y" and "not x <= y" (for total ordering) The last two expressions apply to totally ordered collections (e.g. to sequences, but not to sets or mappings). See also the "total_ordering()" decorator. * The "hash()" result should be consistent with equality. Objects that are equal should either have the same hash value, or be marked as unhashable. Python does not enforce these consistency rules. In fact, the not-a-number values are an example for not following these rules. 6.10.2. Membership test operations ---------------------------------- The operators "in" and "not in" test for membership. "x in s" evaluates to "True" if *x* is a member of *s*, and "False" otherwise. "x not in s" returns the negation of "x in s". All built-in sequences and set types support this as well as dictionary, for which "in" tests whether the dictionary has a given key. For container types such as list, tuple, set, frozenset, dict, or collections.deque, the expression "x in y" is equivalent to "any(x is e or x == e for e in y)". For the string and bytes types, "x in y" is "True" if and only if *x* is a substring of *y*. An equivalent test is "y.find(x) != -1". Empty strings are always considered to be a substring of any other string, so """ in "abc"" will return "True". For user-defined classes which define the "__contains__()" method, "x in y" returns "True" if "y.__contains__(x)" returns a true value, and "False" otherwise. For user-defined classes which do not define "__contains__()" but do define "__iter__()", "x in y" is "True" if some value "z", for which the expression "x is z or x == z" is true, is produced while iterating over "y". If an exception is raised during the iteration, it is as if "in" raised that exception. Lastly, the old-style iteration protocol is tried: if a class defines "__getitem__()", "x in y" is "True" if and only if there is a non- negative integer index *i* such that "x is y[i] or x == y[i]", and no lower integer index raises the "IndexError" exception. (If any other exception is raised, it is as if "in" raised that exception). The operator "not in" is defined to have the inverse truth value of "in". 6.10.3. Identity comparisons ---------------------------- The operators "is" and "is not" test for an object’s identity: "x is y" is true if and only if *x* and *y* are the same object. An Object’s identity is determined using the "id()" function. "x is not y" yields the inverse truth value. [4] 6.11. Boolean operations ======================== or_test ::= and_test | or_test "or" and_test and_test ::= not_test | and_test "and" not_test not_test ::= comparison | "not" not_test In the context of Boolean operations, and also when expressions are used by control flow statements, the following values are interpreted as false: "False", "None", numeric zero of all types, and empty strings and containers (including strings, tuples, lists, dictionaries, sets and frozensets). All other values are interpreted as true. User-defined objects can customize their truth value by providing a "__bool__()" method. The operator "not" yields "True" if its argument is false, "False" otherwise. The expression "x and y" first evaluates *x*; if *x* is false, its value is returned; otherwise, *y* is evaluated and the resulting value is returned. The expression "x or y" first evaluates *x*; if *x* is true, its value is returned; otherwise, *y* is evaluated and the resulting value is returned. Note that neither "and" nor "or" restrict the value and type they return to "False" and "True", but rather return the last evaluated argument. This is sometimes useful, e.g., if "s" is a string that should be replaced by a default value if it is empty, the expression "s or 'foo'" yields the desired value. Because "not" has to create a new value, it returns a boolean value regardless of the type of its argument (for example, "not 'foo'" produces "False" rather than "''".) 6.12. Assignment expressions ============================ assignment_expression ::= [identifier ":="] expression An assignment expression (sometimes also called a “named expression” or “walrus”) assigns an "expression" to an "identifier", while also returning the value of the "expression". One common use case is when handling matched regular expressions: if matching := pattern.search(data): do_something(matching) Or, when processing a file stream in chunks: while chunk := file.read(9000): process(chunk) New in version 3.8: See **PEP 572** for more details about assignment expressions. 6.13. Conditional expressions ============================= conditional_expression ::= or_test ["if" or_test "else" expression] expression ::= conditional_expression | lambda_expr Conditional expressions (sometimes called a “ternary operator”) have the lowest priority of all Python operations. The expression "x if C else y" first evaluates the condition, *C* rather than *x*. If *C* is true, *x* is evaluated and its value is returned; otherwise, *y* is evaluated and its value is returned. See **PEP 308** for more details about conditional expressions. 6.14. Lambdas ============= lambda_expr ::= "lambda" [parameter_list] ":" expression Lambda expressions (sometimes called lambda forms) are used to create anonymous functions. The expression "lambda parameters: expression" yields a function object. The unnamed object behaves like a function object defined with: def (parameters): return expression See section Function definitions for the syntax of parameter lists. Note that functions created with lambda expressions cannot contain statements or annotations. 6.15. Expression lists ====================== expression_list ::= expression ("," expression)* [","] starred_list ::= starred_item ("," starred_item)* [","] starred_expression ::= expression | (starred_item ",")* [starred_item] starred_item ::= assignment_expression | "*" or_expr Except when part of a list or set display, an expression list containing at least one comma yields a tuple. The length of the tuple is the number of expressions in the list. The expressions are evaluated from left to right. An asterisk "*" denotes *iterable unpacking*. Its operand must be an *iterable*. The iterable is expanded into a sequence of items, which are included in the new tuple, list, or set, at the site of the unpacking. New in version 3.5: Iterable unpacking in expression lists, originally proposed by **PEP 448**. The trailing comma is required only to create a single tuple (a.k.a. a *singleton*); it is optional in all other cases. A single expression without a trailing comma doesn’t create a tuple, but rather yields the value of that expression. (To create an empty tuple, use an empty pair of parentheses: "()".) 6.16. Evaluation order ====================== Python evaluates expressions from left to right. Notice that while evaluating an assignment, the right-hand side is evaluated before the left-hand side. In the following lines, expressions will be evaluated in the arithmetic order of their suffixes: expr1, expr2, expr3, expr4 (expr1, expr2, expr3, expr4) {expr1: expr2, expr3: expr4} expr1 + expr2 * (expr3 - expr4) expr1(expr2, expr3, *expr4, **expr5) expr3, expr4 = expr1, expr2 6.17. Operator precedence ========================= The following table summarizes the operator precedence in Python, from highest precedence (most binding) to lowest precedence (least binding). Operators in the same box have the same precedence. Unless the syntax is explicitly given, operators are binary. Operators in the same box group left to right (except for exponentiation, which groups from right to left). Note that comparisons, membership tests, and identity tests, all have the same precedence and have a left-to-right chaining feature as described in the Comparisons section. +-------------------------------------------------+---------------------------------------+ | Operator | Description | |=================================================|=======================================| | "(expressions...)", "[expressions...]", "{key: | Binding or parenthesized expression, | | value...}", "{expressions...}" | list display, dictionary display, set | | | display | +-------------------------------------------------+---------------------------------------+ | "x[index]", "x[index:index]", | Subscription, slicing, call, | | "x(arguments...)", "x.attribute" | attribute reference | +-------------------------------------------------+---------------------------------------+ | "await x" | Await expression | +-------------------------------------------------+---------------------------------------+ | "**" | Exponentiation [5] | +-------------------------------------------------+---------------------------------------+ | "+x", "-x", "~x" | Positive, negative, bitwise NOT | +-------------------------------------------------+---------------------------------------+ | "*", "@", "/", "//", "%" | Multiplication, matrix | | | multiplication, division, floor | | | division, remainder [6] | +-------------------------------------------------+---------------------------------------+ | "+", "-" | Addition and subtraction | +-------------------------------------------------+---------------------------------------+ | "<<", ">>" | Shifts | +-------------------------------------------------+---------------------------------------+ | "&" | Bitwise AND | +-------------------------------------------------+---------------------------------------+ | "^" | Bitwise XOR | +-------------------------------------------------+---------------------------------------+ | "|" | Bitwise OR | +-------------------------------------------------+---------------------------------------+ | "in", "not in", "is", "is not", "<", "<=", ">", | Comparisons, including membership | | ">=", "!=", "==" | tests and identity tests | +-------------------------------------------------+---------------------------------------+ | "not x" | Boolean NOT | +-------------------------------------------------+---------------------------------------+ | "and" | Boolean AND | +-------------------------------------------------+---------------------------------------+ | "or" | Boolean OR | +-------------------------------------------------+---------------------------------------+ | "if" – "else" | Conditional expression | +-------------------------------------------------+---------------------------------------+ | "lambda" | Lambda expression | +-------------------------------------------------+---------------------------------------+ | ":=" | Assignment expression | +-------------------------------------------------+---------------------------------------+ -[ Footnotes ]- [1] While "abs(x%y) < abs(y)" is true mathematically, for floats it may not be true numerically due to roundoff. For example, and assuming a platform on which a Python float is an IEEE 754 double- precision number, in order that "-1e-100 % 1e100" have the same sign as "1e100", the computed result is "-1e-100 + 1e100", which is numerically exactly equal to "1e100". The function "math.fmod()" returns a result whose sign matches the sign of the first argument instead, and so returns "-1e-100" in this case. Which approach is more appropriate depends on the application. [2] If x is very close to an exact integer multiple of y, it’s possible for "x//y" to be one larger than "(x-x%y)//y" due to rounding. In such cases, Python returns the latter result, in order to preserve that "divmod(x,y)[0] * y + x % y" be very close to "x". [3] The Unicode standard distinguishes between *code points* (e.g. U+0041) and *abstract characters* (e.g. “LATIN CAPITAL LETTER A”). While most abstract characters in Unicode are only represented using one code point, there is a number of abstract characters that can in addition be represented using a sequence of more than one code point. For example, the abstract character “LATIN CAPITAL LETTER C WITH CEDILLA” can be represented as a single *precomposed character* at code position U+00C7, or as a sequence of a *base character* at code position U+0043 (LATIN CAPITAL LETTER C), followed by a *combining character* at code position U+0327 (COMBINING CEDILLA). The comparison operators on strings compare at the level of Unicode code points. This may be counter-intuitive to humans. For example, ""\u00C7" == "\u0043\u0327"" is "False", even though both strings represent the same abstract character “LATIN CAPITAL LETTER C WITH CEDILLA”. To compare strings at the level of abstract characters (that is, in a way intuitive to humans), use "unicodedata.normalize()". [4] Due to automatic garbage-collection, free lists, and the dynamic nature of descriptors, you may notice seemingly unusual behaviour in certain uses of the "is" operator, like those involving comparisons between instance methods, or constants. Check their documentation for more info. [5] The power operator "**" binds less tightly than an arithmetic or bitwise unary operator on its right, that is, "2**-1" is "0.5". [6] The "%" operator is also used for string formatting; the same precedence applies. 10. Full Grammar specification ****************************** This is the full Python grammar, derived directly from the grammar used to generate the CPython parser (see Grammar/python.gram). The version here omits details related to code generation and error recovery. The notation is a mixture of EBNF and PEG. In particular, "&" followed by a symbol, token or parenthesized group indicates a positive lookahead (i.e., is required to match but not consumed), while "!" indicates a negative lookahead (i.e., is required _not_ to match). We use the "|" separator to mean PEG’s “ordered choice” (written as "/" in traditional PEG grammars). # PEG grammar for Python @trailer ''' void * _PyPegen_parse(Parser *p) { // Initialize keywords p->keywords = reserved_keywords; p->n_keyword_lists = n_keyword_lists; // Run parser void *result = NULL; if (p->start_rule == Py_file_input) { result = file_rule(p); } else if (p->start_rule == Py_single_input) { result = interactive_rule(p); } else if (p->start_rule == Py_eval_input) { result = eval_rule(p); } else if (p->start_rule == Py_func_type_input) { result = func_type_rule(p); } else if (p->start_rule == Py_fstring_input) { result = fstring_rule(p); } return result; } // The end ''' file[mod_ty]: a=[statements] ENDMARKER { _PyPegen_make_module(p, a) } interactive[mod_ty]: a=statement_newline { Interactive(a, p->arena) } eval[mod_ty]: a=expressions NEWLINE* ENDMARKER { Expression(a, p->arena) } func_type[mod_ty]: '(' a=[type_expressions] ')' '->' b=expression NEWLINE* ENDMARKER { FunctionType(a, b, p->arena) } fstring[expr_ty]: star_expressions # type_expressions allow */** but ignore them type_expressions[asdl_seq*]: | a=','.expression+ ',' '*' b=expression ',' '**' c=expression { _PyPegen_seq_append_to_end(p, CHECK(_PyPegen_seq_append_to_end(p, a, b)), c) } | a=','.expression+ ',' '*' b=expression { _PyPegen_seq_append_to_end(p, a, b) } | a=','.expression+ ',' '**' b=expression { _PyPegen_seq_append_to_end(p, a, b) } | '*' a=expression ',' '**' b=expression { _PyPegen_seq_append_to_end(p, CHECK(_PyPegen_singleton_seq(p, a)), b) } | '*' a=expression { _PyPegen_singleton_seq(p, a) } | '**' a=expression { _PyPegen_singleton_seq(p, a) } | ','.expression+ statements[asdl_seq*]: a=statement+ { _PyPegen_seq_flatten(p, a) } statement[asdl_seq*]: a=compound_stmt { _PyPegen_singleton_seq(p, a) } | simple_stmt statement_newline[asdl_seq*]: | a=compound_stmt NEWLINE { _PyPegen_singleton_seq(p, a) } | simple_stmt | NEWLINE { _PyPegen_singleton_seq(p, CHECK(_Py_Pass(EXTRA))) } | ENDMARKER { _PyPegen_interactive_exit(p) } simple_stmt[asdl_seq*]: | a=small_stmt !';' NEWLINE { _PyPegen_singleton_seq(p, a) } # Not needed, there for speedup | a=';'.small_stmt+ [';'] NEWLINE { a } # NOTE: assignment MUST precede expression, else parsing a simple assignment # will throw a SyntaxError. small_stmt[stmt_ty] (memo): | assignment | e=star_expressions { _Py_Expr(e, EXTRA) } | &'return' return_stmt | &('import' | 'from') import_stmt | &'raise' raise_stmt | 'pass' { _Py_Pass(EXTRA) } | &'del' del_stmt | &'yield' yield_stmt | &'assert' assert_stmt | 'break' { _Py_Break(EXTRA) } | 'continue' { _Py_Continue(EXTRA) } | &'global' global_stmt | &'nonlocal' nonlocal_stmt compound_stmt[stmt_ty]: | &('def' | '@' | ASYNC) function_def | &'if' if_stmt | &('class' | '@') class_def | &('with' | ASYNC) with_stmt | &('for' | ASYNC) for_stmt | &'try' try_stmt | &'while' while_stmt # NOTE: annotated_rhs may start with 'yield'; yield_expr must start with 'yield' assignment[stmt_ty]: | a=NAME ':' b=expression c=['=' d=annotated_rhs { d }] { CHECK_VERSION( 6, "Variable annotation syntax is", _Py_AnnAssign(CHECK(_PyPegen_set_expr_context(p, a, Store)), b, c, 1, EXTRA) ) } | a=('(' b=single_target ')' { b } | single_subscript_attribute_target) ':' b=expression c=['=' d=annotated_rhs { d }] { CHECK_VERSION(6, "Variable annotations syntax is", _Py_AnnAssign(a, b, c, 0, EXTRA)) } | a=(z=star_targets '=' { z })+ b=(yield_expr | star_expressions) !'=' tc=[TYPE_COMMENT] { _Py_Assign(a, b, NEW_TYPE_COMMENT(p, tc), EXTRA) } | a=single_target b=augassign ~ c=(yield_expr | star_expressions) { _Py_AugAssign(a, b->kind, c, EXTRA) } | invalid_assignment augassign[AugOperator*]: | '+=' { _PyPegen_augoperator(p, Add) } | '-=' { _PyPegen_augoperator(p, Sub) } | '*=' { _PyPegen_augoperator(p, Mult) } | '@=' { CHECK_VERSION(5, "The '@' operator is", _PyPegen_augoperator(p, MatMult)) } | '/=' { _PyPegen_augoperator(p, Div) } | '%=' { _PyPegen_augoperator(p, Mod) } | '&=' { _PyPegen_augoperator(p, BitAnd) } | '|=' { _PyPegen_augoperator(p, BitOr) } | '^=' { _PyPegen_augoperator(p, BitXor) } | '<<=' { _PyPegen_augoperator(p, LShift) } | '>>=' { _PyPegen_augoperator(p, RShift) } | '**=' { _PyPegen_augoperator(p, Pow) } | '//=' { _PyPegen_augoperator(p, FloorDiv) } global_stmt[stmt_ty]: 'global' a=','.NAME+ { _Py_Global(CHECK(_PyPegen_map_names_to_ids(p, a)), EXTRA) } nonlocal_stmt[stmt_ty]: 'nonlocal' a=','.NAME+ { _Py_Nonlocal(CHECK(_PyPegen_map_names_to_ids(p, a)), EXTRA) } yield_stmt[stmt_ty]: y=yield_expr { _Py_Expr(y, EXTRA) } assert_stmt[stmt_ty]: 'assert' a=expression b=[',' z=expression { z }] { _Py_Assert(a, b, EXTRA) } del_stmt[stmt_ty]: | 'del' a=del_targets &(';' | NEWLINE) { _Py_Delete(a, EXTRA) } | invalid_del_stmt import_stmt[stmt_ty]: import_name | import_from import_name[stmt_ty]: 'import' a=dotted_as_names { _Py_Import(a, EXTRA) } # note below: the ('.' | '...') is necessary because '...' is tokenized as ELLIPSIS import_from[stmt_ty]: | 'from' a=('.' | '...')* b=dotted_name 'import' c=import_from_targets { _Py_ImportFrom(b->v.Name.id, c, _PyPegen_seq_count_dots(a), EXTRA) } | 'from' a=('.' | '...')+ 'import' b=import_from_targets { _Py_ImportFrom(NULL, b, _PyPegen_seq_count_dots(a), EXTRA) } import_from_targets[asdl_seq*]: | '(' a=import_from_as_names [','] ')' { a } | import_from_as_names !',' | '*' { _PyPegen_singleton_seq(p, CHECK(_PyPegen_alias_for_star(p))) } | invalid_import_from_targets import_from_as_names[asdl_seq*]: | a=','.import_from_as_name+ { a } import_from_as_name[alias_ty]: | a=NAME b=['as' z=NAME { z }] { _Py_alias(a->v.Name.id, (b) ? ((expr_ty) b)->v.Name.id : NULL, p->arena) } dotted_as_names[asdl_seq*]: | a=','.dotted_as_name+ { a } dotted_as_name[alias_ty]: | a=dotted_name b=['as' z=NAME { z }] { _Py_alias(a->v.Name.id, (b) ? ((expr_ty) b)->v.Name.id : NULL, p->arena) } dotted_name[expr_ty]: | a=dotted_name '.' b=NAME { _PyPegen_join_names_with_dot(p, a, b) } | NAME if_stmt[stmt_ty]: | 'if' a=named_expression ':' b=block c=elif_stmt { _Py_If(a, b, CHECK(_PyPegen_singleton_seq(p, c)), EXTRA) } | 'if' a=named_expression ':' b=block c=[else_block] { _Py_If(a, b, c, EXTRA) } elif_stmt[stmt_ty]: | 'elif' a=named_expression ':' b=block c=elif_stmt { _Py_If(a, b, CHECK(_PyPegen_singleton_seq(p, c)), EXTRA) } | 'elif' a=named_expression ':' b=block c=[else_block] { _Py_If(a, b, c, EXTRA) } else_block[asdl_seq*]: 'else' ':' b=block { b } while_stmt[stmt_ty]: | 'while' a=named_expression ':' b=block c=[else_block] { _Py_While(a, b, c, EXTRA) } for_stmt[stmt_ty]: | 'for' t=star_targets 'in' ~ ex=star_expressions ':' tc=[TYPE_COMMENT] b=block el=[else_block] { _Py_For(t, ex, b, el, NEW_TYPE_COMMENT(p, tc), EXTRA) } | ASYNC 'for' t=star_targets 'in' ~ ex=star_expressions ':' tc=[TYPE_COMMENT] b=block el=[else_block] { CHECK_VERSION(5, "Async for loops are", _Py_AsyncFor(t, ex, b, el, NEW_TYPE_COMMENT(p, tc), EXTRA)) } | invalid_for_target with_stmt[stmt_ty]: | 'with' '(' a=','.with_item+ ','? ')' ':' b=block { _Py_With(a, b, NULL, EXTRA) } | 'with' a=','.with_item+ ':' tc=[TYPE_COMMENT] b=block { _Py_With(a, b, NEW_TYPE_COMMENT(p, tc), EXTRA) } | ASYNC 'with' '(' a=','.with_item+ ','? ')' ':' b=block { CHECK_VERSION(5, "Async with statements are", _Py_AsyncWith(a, b, NULL, EXTRA)) } | ASYNC 'with' a=','.with_item+ ':' tc=[TYPE_COMMENT] b=block { CHECK_VERSION(5, "Async with statements are", _Py_AsyncWith(a, b, NEW_TYPE_COMMENT(p, tc), EXTRA)) } with_item[withitem_ty]: | e=expression 'as' t=star_target &(',' | ')' | ':') { _Py_withitem(e, t, p->arena) } | invalid_with_item | e=expression { _Py_withitem(e, NULL, p->arena) } try_stmt[stmt_ty]: | 'try' ':' b=block f=finally_block { _Py_Try(b, NULL, NULL, f, EXTRA) } | 'try' ':' b=block ex=except_block+ el=[else_block] f=[finally_block] { _Py_Try(b, ex, el, f, EXTRA) } except_block[excepthandler_ty]: | 'except' e=expression t=['as' z=NAME { z }] ':' b=block { _Py_ExceptHandler(e, (t) ? ((expr_ty) t)->v.Name.id : NULL, b, EXTRA) } | 'except' ':' b=block { _Py_ExceptHandler(NULL, NULL, b, EXTRA) } finally_block[asdl_seq*]: 'finally' ':' a=block { a } return_stmt[stmt_ty]: | 'return' a=[star_expressions] { _Py_Return(a, EXTRA) } raise_stmt[stmt_ty]: | 'raise' a=expression b=['from' z=expression { z }] { _Py_Raise(a, b, EXTRA) } | 'raise' { _Py_Raise(NULL, NULL, EXTRA) } function_def[stmt_ty]: | d=decorators f=function_def_raw { _PyPegen_function_def_decorators(p, d, f) } | function_def_raw function_def_raw[stmt_ty]: | 'def' n=NAME '(' params=[params] ')' a=['->' z=expression { z }] ':' tc=[func_type_comment] b=block { _Py_FunctionDef(n->v.Name.id, (params) ? params : CHECK(_PyPegen_empty_arguments(p)), b, NULL, a, NEW_TYPE_COMMENT(p, tc), EXTRA) } | ASYNC 'def' n=NAME '(' params=[params] ')' a=['->' z=expression { z }] ':' tc=[func_type_comment] b=block { CHECK_VERSION( 5, "Async functions are", _Py_AsyncFunctionDef(n->v.Name.id, (params) ? params : CHECK(_PyPegen_empty_arguments(p)), b, NULL, a, NEW_TYPE_COMMENT(p, tc), EXTRA) ) } func_type_comment[Token*]: | NEWLINE t=TYPE_COMMENT &(NEWLINE INDENT) { t } # Must be followed by indented block | invalid_double_type_comments | TYPE_COMMENT params[arguments_ty]: | invalid_parameters | parameters parameters[arguments_ty]: | a=slash_no_default b=param_no_default* c=param_with_default* d=[star_etc] { _PyPegen_make_arguments(p, a, NULL, b, c, d) } | a=slash_with_default b=param_with_default* c=[star_etc] { _PyPegen_make_arguments(p, NULL, a, NULL, b, c) } | a=param_no_default+ b=param_with_default* c=[star_etc] { _PyPegen_make_arguments(p, NULL, NULL, a, b, c) } | a=param_with_default+ b=[star_etc] { _PyPegen_make_arguments(p, NULL, NULL, NULL, a, b)} | a=star_etc { _PyPegen_make_arguments(p, NULL, NULL, NULL, NULL, a) } # Some duplication here because we can't write (',' | &')'), # which is because we don't support empty alternatives (yet). # slash_no_default[asdl_seq*]: | a=param_no_default+ '/' ',' { a } | a=param_no_default+ '/' &')' { a } slash_with_default[SlashWithDefault*]: | a=param_no_default* b=param_with_default+ '/' ',' { _PyPegen_slash_with_default(p, a, b) } | a=param_no_default* b=param_with_default+ '/' &')' { _PyPegen_slash_with_default(p, a, b) } star_etc[StarEtc*]: | '*' a=param_no_default b=param_maybe_default* c=[kwds] { _PyPegen_star_etc(p, a, b, c) } | '*' ',' b=param_maybe_default+ c=[kwds] { _PyPegen_star_etc(p, NULL, b, c) } | a=kwds { _PyPegen_star_etc(p, NULL, NULL, a) } | invalid_star_etc kwds[arg_ty]: '**' a=param_no_default { a } # One parameter. This *includes* a following comma and type comment. # # There are three styles: # - No default # - With default # - Maybe with default # # There are two alternative forms of each, to deal with type comments: # - Ends in a comma followed by an optional type comment # - No comma, optional type comment, must be followed by close paren # The latter form is for a final parameter without trailing comma. # param_no_default[arg_ty]: | a=param ',' tc=TYPE_COMMENT? { _PyPegen_add_type_comment_to_arg(p, a, tc) } | a=param tc=TYPE_COMMENT? &')' { _PyPegen_add_type_comment_to_arg(p, a, tc) } param_with_default[NameDefaultPair*]: | a=param c=default ',' tc=TYPE_COMMENT? { _PyPegen_name_default_pair(p, a, c, tc) } | a=param c=default tc=TYPE_COMMENT? &')' { _PyPegen_name_default_pair(p, a, c, tc) } param_maybe_default[NameDefaultPair*]: | a=param c=default? ',' tc=TYPE_COMMENT? { _PyPegen_name_default_pair(p, a, c, tc) } | a=param c=default? tc=TYPE_COMMENT? &')' { _PyPegen_name_default_pair(p, a, c, tc) } param[arg_ty]: a=NAME b=annotation? { _Py_arg(a->v.Name.id, b, NULL, EXTRA) } annotation[expr_ty]: ':' a=expression { a } default[expr_ty]: '=' a=expression { a } decorators[asdl_seq*]: a=('@' f=named_expression NEWLINE { f })+ { a } class_def[stmt_ty]: | a=decorators b=class_def_raw { _PyPegen_class_def_decorators(p, a, b) } | class_def_raw class_def_raw[stmt_ty]: | 'class' a=NAME b=['(' z=[arguments] ')' { z }] ':' c=block { _Py_ClassDef(a->v.Name.id, (b) ? ((expr_ty) b)->v.Call.args : NULL, (b) ? ((expr_ty) b)->v.Call.keywords : NULL, c, NULL, EXTRA) } block[asdl_seq*] (memo): | NEWLINE INDENT a=statements DEDENT { a } | simple_stmt | invalid_block star_expressions[expr_ty]: | a=star_expression b=(',' c=star_expression { c })+ [','] { _Py_Tuple(CHECK(_PyPegen_seq_insert_in_front(p, a, b)), Load, EXTRA) } | a=star_expression ',' { _Py_Tuple(CHECK(_PyPegen_singleton_seq(p, a)), Load, EXTRA) } | star_expression star_expression[expr_ty] (memo): | '*' a=bitwise_or { _Py_Starred(a, Load, EXTRA) } | expression star_named_expressions[asdl_seq*]: a=','.star_named_expression+ [','] { a } star_named_expression[expr_ty]: | '*' a=bitwise_or { _Py_Starred(a, Load, EXTRA) } | named_expression named_expression[expr_ty]: | a=NAME ':=' ~ b=expression { _Py_NamedExpr(CHECK(_PyPegen_set_expr_context(p, a, Store)), b, EXTRA) } | expression !':=' | invalid_named_expression annotated_rhs[expr_ty]: yield_expr | star_expressions expressions[expr_ty]: | a=expression b=(',' c=expression { c })+ [','] { _Py_Tuple(CHECK(_PyPegen_seq_insert_in_front(p, a, b)), Load, EXTRA) } | a=expression ',' { _Py_Tuple(CHECK(_PyPegen_singleton_seq(p, a)), Load, EXTRA) } | expression expression[expr_ty] (memo): | a=disjunction 'if' b=disjunction 'else' c=expression { _Py_IfExp(b, a, c, EXTRA) } | disjunction | lambdef lambdef[expr_ty]: | 'lambda' a=[lambda_params] ':' b=expression { _Py_Lambda((a) ? a : CHECK(_PyPegen_empty_arguments(p)), b, EXTRA) } lambda_params[arguments_ty]: | invalid_lambda_parameters | lambda_parameters # lambda_parameters etc. duplicates parameters but without annotations # or type comments, and if there's no comma after a parameter, we expect # a colon, not a close parenthesis. (For more, see parameters above.) # lambda_parameters[arguments_ty]: | a=lambda_slash_no_default b=lambda_param_no_default* c=lambda_param_with_default* d=[lambda_star_etc] { _PyPegen_make_arguments(p, a, NULL, b, c, d) } | a=lambda_slash_with_default b=lambda_param_with_default* c=[lambda_star_etc] { _PyPegen_make_arguments(p, NULL, a, NULL, b, c) } | a=lambda_param_no_default+ b=lambda_param_with_default* c=[lambda_star_etc] { _PyPegen_make_arguments(p, NULL, NULL, a, b, c) } | a=lambda_param_with_default+ b=[lambda_star_etc] { _PyPegen_make_arguments(p, NULL, NULL, NULL, a, b)} | a=lambda_star_etc { _PyPegen_make_arguments(p, NULL, NULL, NULL, NULL, a) } lambda_slash_no_default[asdl_seq*]: | a=lambda_param_no_default+ '/' ',' { a } | a=lambda_param_no_default+ '/' &':' { a } lambda_slash_with_default[SlashWithDefault*]: | a=lambda_param_no_default* b=lambda_param_with_default+ '/' ',' { _PyPegen_slash_with_default(p, a, b) } | a=lambda_param_no_default* b=lambda_param_with_default+ '/' &':' { _PyPegen_slash_with_default(p, a, b) } lambda_star_etc[StarEtc*]: | '*' a=lambda_param_no_default b=lambda_param_maybe_default* c=[lambda_kwds] { _PyPegen_star_etc(p, a, b, c) } | '*' ',' b=lambda_param_maybe_default+ c=[lambda_kwds] { _PyPegen_star_etc(p, NULL, b, c) } | a=lambda_kwds { _PyPegen_star_etc(p, NULL, NULL, a) } | invalid_lambda_star_etc lambda_kwds[arg_ty]: '**' a=lambda_param_no_default { a } lambda_param_no_default[arg_ty]: | a=lambda_param ',' { a } | a=lambda_param &':' { a } lambda_param_with_default[NameDefaultPair*]: | a=lambda_param c=default ',' { _PyPegen_name_default_pair(p, a, c, NULL) } | a=lambda_param c=default &':' { _PyPegen_name_default_pair(p, a, c, NULL) } lambda_param_maybe_default[NameDefaultPair*]: | a=lambda_param c=default? ',' { _PyPegen_name_default_pair(p, a, c, NULL) } | a=lambda_param c=default? &':' { _PyPegen_name_default_pair(p, a, c, NULL) } lambda_param[arg_ty]: a=NAME { _Py_arg(a->v.Name.id, NULL, NULL, EXTRA) } disjunction[expr_ty] (memo): | a=conjunction b=('or' c=conjunction { c })+ { _Py_BoolOp( Or, CHECK(_PyPegen_seq_insert_in_front(p, a, b)), EXTRA) } | conjunction conjunction[expr_ty] (memo): | a=inversion b=('and' c=inversion { c })+ { _Py_BoolOp( And, CHECK(_PyPegen_seq_insert_in_front(p, a, b)), EXTRA) } | inversion inversion[expr_ty] (memo): | 'not' a=inversion { _Py_UnaryOp(Not, a, EXTRA) } | comparison comparison[expr_ty]: | a=bitwise_or b=compare_op_bitwise_or_pair+ { _Py_Compare(a, CHECK(_PyPegen_get_cmpops(p, b)), CHECK(_PyPegen_get_exprs(p, b)), EXTRA) } | bitwise_or compare_op_bitwise_or_pair[CmpopExprPair*]: | eq_bitwise_or | noteq_bitwise_or | lte_bitwise_or | lt_bitwise_or | gte_bitwise_or | gt_bitwise_or | notin_bitwise_or | in_bitwise_or | isnot_bitwise_or | is_bitwise_or eq_bitwise_or[CmpopExprPair*]: '==' a=bitwise_or { _PyPegen_cmpop_expr_pair(p, Eq, a) } noteq_bitwise_or[CmpopExprPair*]: | (tok='!=' { _PyPegen_check_barry_as_flufl(p, tok) ? NULL : tok}) a=bitwise_or {_PyPegen_cmpop_expr_pair(p, NotEq, a) } lte_bitwise_or[CmpopExprPair*]: '<=' a=bitwise_or { _PyPegen_cmpop_expr_pair(p, LtE, a) } lt_bitwise_or[CmpopExprPair*]: '<' a=bitwise_or { _PyPegen_cmpop_expr_pair(p, Lt, a) } gte_bitwise_or[CmpopExprPair*]: '>=' a=bitwise_or { _PyPegen_cmpop_expr_pair(p, GtE, a) } gt_bitwise_or[CmpopExprPair*]: '>' a=bitwise_or { _PyPegen_cmpop_expr_pair(p, Gt, a) } notin_bitwise_or[CmpopExprPair*]: 'not' 'in' a=bitwise_or { _PyPegen_cmpop_expr_pair(p, NotIn, a) } in_bitwise_or[CmpopExprPair*]: 'in' a=bitwise_or { _PyPegen_cmpop_expr_pair(p, In, a) } isnot_bitwise_or[CmpopExprPair*]: 'is' 'not' a=bitwise_or { _PyPegen_cmpop_expr_pair(p, IsNot, a) } is_bitwise_or[CmpopExprPair*]: 'is' a=bitwise_or { _PyPegen_cmpop_expr_pair(p, Is, a) } bitwise_or[expr_ty]: | a=bitwise_or '|' b=bitwise_xor { _Py_BinOp(a, BitOr, b, EXTRA) } | bitwise_xor bitwise_xor[expr_ty]: | a=bitwise_xor '^' b=bitwise_and { _Py_BinOp(a, BitXor, b, EXTRA) } | bitwise_and bitwise_and[expr_ty]: | a=bitwise_and '&' b=shift_expr { _Py_BinOp(a, BitAnd, b, EXTRA) } | shift_expr shift_expr[expr_ty]: | a=shift_expr '<<' b=sum { _Py_BinOp(a, LShift, b, EXTRA) } | a=shift_expr '>>' b=sum { _Py_BinOp(a, RShift, b, EXTRA) } | sum sum[expr_ty]: | a=sum '+' b=term { _Py_BinOp(a, Add, b, EXTRA) } | a=sum '-' b=term { _Py_BinOp(a, Sub, b, EXTRA) } | term term[expr_ty]: | a=term '*' b=factor { _Py_BinOp(a, Mult, b, EXTRA) } | a=term '/' b=factor { _Py_BinOp(a, Div, b, EXTRA) } | a=term '//' b=factor { _Py_BinOp(a, FloorDiv, b, EXTRA) } | a=term '%' b=factor { _Py_BinOp(a, Mod, b, EXTRA) } | a=term '@' b=factor { CHECK_VERSION(5, "The '@' operator is", _Py_BinOp(a, MatMult, b, EXTRA)) } | factor factor[expr_ty] (memo): | '+' a=factor { _Py_UnaryOp(UAdd, a, EXTRA) } | '-' a=factor { _Py_UnaryOp(USub, a, EXTRA) } | '~' a=factor { _Py_UnaryOp(Invert, a, EXTRA) } | power power[expr_ty]: | a=await_primary '**' b=factor { _Py_BinOp(a, Pow, b, EXTRA) } | await_primary await_primary[expr_ty] (memo): | AWAIT a=primary { CHECK_VERSION(5, "Await expressions are", _Py_Await(a, EXTRA)) } | primary primary[expr_ty]: | invalid_primary # must be before 'primay genexp' because of invalid_genexp | a=primary '.' b=NAME { _Py_Attribute(a, b->v.Name.id, Load, EXTRA) } | a=primary b=genexp { _Py_Call(a, CHECK(_PyPegen_singleton_seq(p, b)), NULL, EXTRA) } | a=primary '(' b=[arguments] ')' { _Py_Call(a, (b) ? ((expr_ty) b)->v.Call.args : NULL, (b) ? ((expr_ty) b)->v.Call.keywords : NULL, EXTRA) } | a=primary '[' b=slices ']' { _Py_Subscript(a, b, Load, EXTRA) } | atom slices[expr_ty]: | a=slice !',' { a } | a=','.slice+ [','] { _Py_Tuple(a, Load, EXTRA) } slice[expr_ty]: | a=[expression] ':' b=[expression] c=[':' d=[expression] { d }] { _Py_Slice(a, b, c, EXTRA) } | a=expression { a } atom[expr_ty]: | NAME | 'True' { _Py_Constant(Py_True, NULL, EXTRA) } | 'False' { _Py_Constant(Py_False, NULL, EXTRA) } | 'None' { _Py_Constant(Py_None, NULL, EXTRA) } | '__peg_parser__' { RAISE_SYNTAX_ERROR("You found it!") } | &STRING strings | NUMBER | &'(' (tuple | group | genexp) | &'[' (list | listcomp) | &'{' (dict | set | dictcomp | setcomp) | '...' { _Py_Constant(Py_Ellipsis, NULL, EXTRA) } strings[expr_ty] (memo): a=STRING+ { _PyPegen_concatenate_strings(p, a) } list[expr_ty]: | '[' a=[star_named_expressions] ']' { _Py_List(a, Load, EXTRA) } listcomp[expr_ty]: | '[' a=named_expression ~ b=for_if_clauses ']' { _Py_ListComp(a, b, EXTRA) } | invalid_comprehension tuple[expr_ty]: | '(' a=[y=star_named_expression ',' z=[star_named_expressions] { _PyPegen_seq_insert_in_front(p, y, z) } ] ')' { _Py_Tuple(a, Load, EXTRA) } group[expr_ty]: | '(' a=(yield_expr | named_expression) ')' { a } | invalid_group genexp[expr_ty]: | '(' a=named_expression ~ b=for_if_clauses ')' { _Py_GeneratorExp(a, b, EXTRA) } | invalid_comprehension set[expr_ty]: '{' a=star_named_expressions '}' { _Py_Set(a, EXTRA) } setcomp[expr_ty]: | '{' a=named_expression ~ b=for_if_clauses '}' { _Py_SetComp(a, b, EXTRA) } | invalid_comprehension dict[expr_ty]: | '{' a=[double_starred_kvpairs] '}' { _Py_Dict(CHECK(_PyPegen_get_keys(p, a)), CHECK(_PyPegen_get_values(p, a)), EXTRA) } dictcomp[expr_ty]: | '{' a=kvpair b=for_if_clauses '}' { _Py_DictComp(a->key, a->value, b, EXTRA) } | invalid_dict_comprehension double_starred_kvpairs[asdl_seq*]: a=','.double_starred_kvpair+ [','] { a } double_starred_kvpair[KeyValuePair*]: | '**' a=bitwise_or { _PyPegen_key_value_pair(p, NULL, a) } | kvpair kvpair[KeyValuePair*]: a=expression ':' b=expression { _PyPegen_key_value_pair(p, a, b) } for_if_clauses[asdl_seq*]: | for_if_clause+ for_if_clause[comprehension_ty]: | ASYNC 'for' a=star_targets 'in' ~ b=disjunction c=('if' z=disjunction { z })* { CHECK_VERSION(6, "Async comprehensions are", _Py_comprehension(a, b, c, 1, p->arena)) } | 'for' a=star_targets 'in' ~ b=disjunction c=('if' z=disjunction { z })* { _Py_comprehension(a, b, c, 0, p->arena) } | invalid_for_target yield_expr[expr_ty]: | 'yield' 'from' a=expression { _Py_YieldFrom(a, EXTRA) } | 'yield' a=[star_expressions] { _Py_Yield(a, EXTRA) } arguments[expr_ty] (memo): | a=args [','] &')' { a } | invalid_arguments args[expr_ty]: | a=','.(starred_expression | named_expression !'=')+ b=[',' k=kwargs {k}] { _PyPegen_collect_call_seqs(p, a, b, EXTRA) } | a=kwargs { _Py_Call(_PyPegen_dummy_name(p), CHECK_NULL_ALLOWED(_PyPegen_seq_extract_starred_exprs(p, a)), CHECK_NULL_ALLOWED(_PyPegen_seq_delete_starred_exprs(p, a)), EXTRA) } kwargs[asdl_seq*]: | a=','.kwarg_or_starred+ ',' b=','.kwarg_or_double_starred+ { _PyPegen_join_sequences(p, a, b) } | ','.kwarg_or_starred+ | ','.kwarg_or_double_starred+ starred_expression[expr_ty]: | '*' a=expression { _Py_Starred(a, Load, EXTRA) } kwarg_or_starred[KeywordOrStarred*]: | a=NAME '=' b=expression { _PyPegen_keyword_or_starred(p, CHECK(_Py_keyword(a->v.Name.id, b, EXTRA)), 1) } | a=starred_expression { _PyPegen_keyword_or_starred(p, a, 0) } | invalid_kwarg kwarg_or_double_starred[KeywordOrStarred*]: | a=NAME '=' b=expression { _PyPegen_keyword_or_starred(p, CHECK(_Py_keyword(a->v.Name.id, b, EXTRA)), 1) } | '**' a=expression { _PyPegen_keyword_or_starred(p, CHECK(_Py_keyword(NULL, a, EXTRA)), 1) } | invalid_kwarg # NOTE: star_targets may contain *bitwise_or, targets may not. star_targets[expr_ty]: | a=star_target !',' { a } | a=star_target b=(',' c=star_target { c })* [','] { _Py_Tuple(CHECK(_PyPegen_seq_insert_in_front(p, a, b)), Store, EXTRA) } star_targets_list_seq[asdl_seq*]: a=','.star_target+ [','] { a } star_targets_tuple_seq[asdl_seq*]: | a=star_target b=(',' c=star_target { c })+ [','] { _PyPegen_seq_insert_in_front(p, a, b) } | a=star_target ',' { _PyPegen_singleton_seq(p, a) } star_target[expr_ty] (memo): | '*' a=(!'*' star_target) { _Py_Starred(CHECK(_PyPegen_set_expr_context(p, a, Store)), Store, EXTRA) } | target_with_star_atom target_with_star_atom[expr_ty] (memo): | a=t_primary '.' b=NAME !t_lookahead { _Py_Attribute(a, b->v.Name.id, Store, EXTRA) } | a=t_primary '[' b=slices ']' !t_lookahead { _Py_Subscript(a, b, Store, EXTRA) } | star_atom star_atom[expr_ty]: | a=NAME { _PyPegen_set_expr_context(p, a, Store) } | '(' a=target_with_star_atom ')' { _PyPegen_set_expr_context(p, a, Store) } | '(' a=[star_targets_tuple_seq] ')' { _Py_Tuple(a, Store, EXTRA) } | '[' a=[star_targets_list_seq] ']' { _Py_List(a, Store, EXTRA) } single_target[expr_ty]: | single_subscript_attribute_target | a=NAME { _PyPegen_set_expr_context(p, a, Store) } | '(' a=single_target ')' { a } single_subscript_attribute_target[expr_ty]: | a=t_primary '.' b=NAME !t_lookahead { _Py_Attribute(a, b->v.Name.id, Store, EXTRA) } | a=t_primary '[' b=slices ']' !t_lookahead { _Py_Subscript(a, b, Store, EXTRA) } del_targets[asdl_seq*]: a=','.del_target+ [','] { a } del_target[expr_ty] (memo): | a=t_primary '.' b=NAME !t_lookahead { _Py_Attribute(a, b->v.Name.id, Del, EXTRA) } | a=t_primary '[' b=slices ']' !t_lookahead { _Py_Subscript(a, b, Del, EXTRA) } | del_t_atom del_t_atom[expr_ty]: | a=NAME { _PyPegen_set_expr_context(p, a, Del) } | '(' a=del_target ')' { _PyPegen_set_expr_context(p, a, Del) } | '(' a=[del_targets] ')' { _Py_Tuple(a, Del, EXTRA) } | '[' a=[del_targets] ']' { _Py_List(a, Del, EXTRA) } t_primary[expr_ty]: | a=t_primary '.' b=NAME &t_lookahead { _Py_Attribute(a, b->v.Name.id, Load, EXTRA) } | a=t_primary '[' b=slices ']' &t_lookahead { _Py_Subscript(a, b, Load, EXTRA) } | a=t_primary b=genexp &t_lookahead { _Py_Call(a, CHECK(_PyPegen_singleton_seq(p, b)), NULL, EXTRA) } | a=t_primary '(' b=[arguments] ')' &t_lookahead { _Py_Call(a, (b) ? ((expr_ty) b)->v.Call.args : NULL, (b) ? ((expr_ty) b)->v.Call.keywords : NULL, EXTRA) } | a=atom &t_lookahead { a } t_lookahead: '(' | '[' | '.' # From here on, there are rules for invalid syntax with specialised error messages invalid_arguments: | args ',' '*' { RAISE_SYNTAX_ERROR("iterable argument unpacking follows keyword argument unpacking") } | a=expression for_if_clauses ',' [args | expression for_if_clauses] { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "Generator expression must be parenthesized") } | a=args for_if_clauses { _PyPegen_nonparen_genexp_in_call(p, a) } | args ',' a=expression for_if_clauses { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "Generator expression must be parenthesized") } | a=args ',' args { _PyPegen_arguments_parsing_error(p, a) } invalid_kwarg: | !(NAME '=') a=expression b='=' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION( a, "expression cannot contain assignment, perhaps you meant \"==\"?") } invalid_named_expression: | a=expression ':=' expression { RAISE_SYNTAX_ERROR_KNOWN_LOCATION( a, "cannot use assignment expressions with %s", _PyPegen_get_expr_name(a)) } invalid_assignment: | a=invalid_ann_assign_target ':' expression { RAISE_SYNTAX_ERROR_KNOWN_LOCATION( a, "only single target (not %s) can be annotated", _PyPegen_get_expr_name(a) )} | a=star_named_expression ',' star_named_expressions* ':' expression { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "only single target (not tuple) can be annotated") } | a=expression ':' expression { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "illegal target for annotation") } | (star_targets '=')* a=star_expressions '=' { RAISE_SYNTAX_ERROR_INVALID_TARGET(STAR_TARGETS, a) } | (star_targets '=')* a=yield_expr '=' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "assignment to yield expression not possible") } | a=star_expressions augassign (yield_expr | star_expressions) { RAISE_SYNTAX_ERROR_KNOWN_LOCATION( a, "'%s' is an illegal expression for augmented assignment", _PyPegen_get_expr_name(a) )} invalid_ann_assign_target[expr_ty]: | list | tuple | '(' a=invalid_ann_assign_target ')' { a } invalid_del_stmt: | 'del' a=star_expressions { RAISE_SYNTAX_ERROR_INVALID_TARGET(DEL_TARGETS, a) } invalid_block: | NEWLINE !INDENT { RAISE_INDENTATION_ERROR("expected an indented block") } invalid_primary: | primary a='{' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "invalid syntax") } invalid_comprehension: | ('[' | '(' | '{') a=starred_expression for_if_clauses { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "iterable unpacking cannot be used in comprehension") } invalid_dict_comprehension: | '{' a='**' bitwise_or for_if_clauses '}' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "dict unpacking cannot be used in dict comprehension") } invalid_parameters: | param_no_default* (slash_with_default | param_with_default+) param_no_default { RAISE_SYNTAX_ERROR("non-default argument follows default argument") } invalid_lambda_parameters: | lambda_param_no_default* (lambda_slash_with_default | lambda_param_with_default+) lambda_param_no_default { RAISE_SYNTAX_ERROR("non-default argument follows default argument") } invalid_star_etc: | '*' (')' | ',' (')' | '**')) { RAISE_SYNTAX_ERROR("named arguments must follow bare *") } | '*' ',' TYPE_COMMENT { RAISE_SYNTAX_ERROR("bare * has associated type comment") } invalid_lambda_star_etc: | '*' (':' | ',' (':' | '**')) { RAISE_SYNTAX_ERROR("named arguments must follow bare *") } invalid_double_type_comments: | TYPE_COMMENT NEWLINE TYPE_COMMENT NEWLINE INDENT { RAISE_SYNTAX_ERROR("Cannot have two type comments on def") } invalid_with_item: | expression 'as' a=expression { RAISE_SYNTAX_ERROR_INVALID_TARGET(STAR_TARGETS, a) } invalid_for_target: | ASYNC? 'for' a=star_expressions { RAISE_SYNTAX_ERROR_INVALID_TARGET(FOR_TARGETS, a) } invalid_group: | '(' a=starred_expression ')' { RAISE_SYNTAX_ERROR_KNOWN_LOCATION(a, "can't use starred expression here") } invalid_import_from_targets: | import_from_as_names ',' NEWLINE { RAISE_SYNTAX_ERROR("trailing comma not allowed without surrounding parentheses") } 5. The import system ******************** Python code in one *module* gains access to the code in another module by the process of *importing* it. The "import" statement is the most common way of invoking the import machinery, but it is not the only way. Functions such as "importlib.import_module()" and built-in "__import__()" can also be used to invoke the import machinery. The "import" statement combines two operations; it searches for the named module, then it binds the results of that search to a name in the local scope. The search operation of the "import" statement is defined as a call to the "__import__()" function, with the appropriate arguments. The return value of "__import__()" is used to perform the name binding operation of the "import" statement. See the "import" statement for the exact details of that name binding operation. A direct call to "__import__()" performs only the module search and, if found, the module creation operation. While certain side-effects may occur, such as the importing of parent packages, and the updating of various caches (including "sys.modules"), only the "import" statement performs a name binding operation. When an "import" statement is executed, the standard builtin "__import__()" function is called. Other mechanisms for invoking the import system (such as "importlib.import_module()") may choose to bypass "__import__()" and use their own solutions to implement import semantics. When a module is first imported, Python searches for the module and if found, it creates a module object [1], initializing it. If the named module cannot be found, a "ModuleNotFoundError" is raised. Python implements various strategies to search for the named module when the import machinery is invoked. These strategies can be modified and extended by using various hooks described in the sections below. Changed in version 3.3: The import system has been updated to fully implement the second phase of **PEP 302**. There is no longer any implicit import machinery - the full import system is exposed through "sys.meta_path". In addition, native namespace package support has been implemented (see **PEP 420**). 5.1. "importlib" ================ The "importlib" module provides a rich API for interacting with the import system. For example "importlib.import_module()" provides a recommended, simpler API than built-in "__import__()" for invoking the import machinery. Refer to the "importlib" library documentation for additional detail. 5.2. Packages ============= Python has only one type of module object, and all modules are of this type, regardless of whether the module is implemented in Python, C, or something else. To help organize modules and provide a naming hierarchy, Python has a concept of *packages*. You can think of packages as the directories on a file system and modules as files within directories, but don’t take this analogy too literally since packages and modules need not originate from the file system. For the purposes of this documentation, we’ll use this convenient analogy of directories and files. Like file system directories, packages are organized hierarchically, and packages may themselves contain subpackages, as well as regular modules. It’s important to keep in mind that all packages are modules, but not all modules are packages. Or put another way, packages are just a special kind of module. Specifically, any module that contains a "__path__" attribute is considered a package. All modules have a name. Subpackage names are separated from their parent package name by a dot, akin to Python’s standard attribute access syntax. Thus you might have a package called "email", which in turn has a subpackage called "email.mime" and a module within that subpackage called "email.mime.text". 5.2.1. Regular packages ----------------------- Python defines two types of packages, *regular packages* and *namespace packages*. Regular packages are traditional packages as they existed in Python 3.2 and earlier. A regular package is typically implemented as a directory containing an "__init__.py" file. When a regular package is imported, this "__init__.py" file is implicitly executed, and the objects it defines are bound to names in the package’s namespace. The "__init__.py" file can contain the same Python code that any other module can contain, and Python will add some additional attributes to the module when it is imported. For example, the following file system layout defines a top level "parent" package with three subpackages: parent/ __init__.py one/ __init__.py two/ __init__.py three/ __init__.py Importing "parent.one" will implicitly execute "parent/__init__.py" and "parent/one/__init__.py". Subsequent imports of "parent.two" or "parent.three" will execute "parent/two/__init__.py" and "parent/three/__init__.py" respectively. 5.2.2. Namespace packages ------------------------- A namespace package is a composite of various *portions*, where each portion contributes a subpackage to the parent package. Portions may reside in different locations on the file system. Portions may also be found in zip files, on the network, or anywhere else that Python searches during import. Namespace packages may or may not correspond directly to objects on the file system; they may be virtual modules that have no concrete representation. Namespace packages do not use an ordinary list for their "__path__" attribute. They instead use a custom iterable type which will automatically perform a new search for package portions on the next import attempt within that package if the path of their parent package (or "sys.path" for a top level package) changes. With namespace packages, there is no "parent/__init__.py" file. In fact, there may be multiple "parent" directories found during import search, where each one is provided by a different portion. Thus "parent/one" may not be physically located next to "parent/two". In this case, Python will create a namespace package for the top-level "parent" package whenever it or one of its subpackages is imported. See also **PEP 420** for the namespace package specification. 5.3. Searching ============== To begin the search, Python needs the *fully qualified* name of the module (or package, but for the purposes of this discussion, the difference is immaterial) being imported. This name may come from various arguments to the "import" statement, or from the parameters to the "importlib.import_module()" or "__import__()" functions. This name will be used in various phases of the import search, and it may be the dotted path to a submodule, e.g. "foo.bar.baz". In this case, Python first tries to import "foo", then "foo.bar", and finally "foo.bar.baz". If any of the intermediate imports fail, a "ModuleNotFoundError" is raised. 5.3.1. The module cache ----------------------- The first place checked during import search is "sys.modules". This mapping serves as a cache of all modules that have been previously imported, including the intermediate paths. So if "foo.bar.baz" was previously imported, "sys.modules" will contain entries for "foo", "foo.bar", and "foo.bar.baz". Each key will have as its value the corresponding module object. During import, the module name is looked up in "sys.modules" and if present, the associated value is the module satisfying the import, and the process completes. However, if the value is "None", then a "ModuleNotFoundError" is raised. If the module name is missing, Python will continue searching for the module. "sys.modules" is writable. Deleting a key may not destroy the associated module (as other modules may hold references to it), but it will invalidate the cache entry for the named module, causing Python to search anew for the named module upon its next import. The key can also be assigned to "None", forcing the next import of the module to result in a "ModuleNotFoundError". Beware though, as if you keep a reference to the module object, invalidate its cache entry in "sys.modules", and then re-import the named module, the two module objects will *not* be the same. By contrast, "importlib.reload()" will reuse the *same* module object, and simply reinitialise the module contents by rerunning the module’s code. 5.3.2. Finders and loaders -------------------------- If the named module is not found in "sys.modules", then Python’s import protocol is invoked to find and load the module. This protocol consists of two conceptual objects, *finders* and *loaders*. A finder’s job is to determine whether it can find the named module using whatever strategy it knows about. Objects that implement both of these interfaces are referred to as *importers* - they return themselves when they find that they can load the requested module. Python includes a number of default finders and importers. The first one knows how to locate built-in modules, and the second knows how to locate frozen modules. A third default finder searches an *import path* for modules. The *import path* is a list of locations that may name file system paths or zip files. It can also be extended to search for any locatable resource, such as those identified by URLs. The import machinery is extensible, so new finders can be added to extend the range and scope of module searching. Finders do not actually load modules. If they can find the named module, they return a *module spec*, an encapsulation of the module’s import-related information, which the import machinery then uses when loading the module. The following sections describe the protocol for finders and loaders in more detail, including how you can create and register new ones to extend the import machinery. Changed in version 3.4: In previous versions of Python, finders returned *loaders* directly, whereas now they return module specs which *contain* loaders. Loaders are still used during import but have fewer responsibilities. 5.3.3. Import hooks ------------------- The import machinery is designed to be extensible; the primary mechanism for this are the *import hooks*. There are two types of import hooks: *meta hooks* and *import path hooks*. Meta hooks are called at the start of import processing, before any other import processing has occurred, other than "sys.modules" cache look up. This allows meta hooks to override "sys.path" processing, frozen modules, or even built-in modules. Meta hooks are registered by adding new finder objects to "sys.meta_path", as described below. Import path hooks are called as part of "sys.path" (or "package.__path__") processing, at the point where their associated path item is encountered. Import path hooks are registered by adding new callables to "sys.path_hooks" as described below. 5.3.4. The meta path -------------------- When the named module is not found in "sys.modules", Python next searches "sys.meta_path", which contains a list of meta path finder objects. These finders are queried in order to see if they know how to handle the named module. Meta path finders must implement a method called "find_spec()" which takes three arguments: a name, an import path, and (optionally) a target module. The meta path finder can use any strategy it wants to determine whether it can handle the named module or not. If the meta path finder knows how to handle the named module, it returns a spec object. If it cannot handle the named module, it returns "None". If "sys.meta_path" processing reaches the end of its list without returning a spec, then a "ModuleNotFoundError" is raised. Any other exceptions raised are simply propagated up, aborting the import process. The "find_spec()" method of meta path finders is called with two or three arguments. The first is the fully qualified name of the module being imported, for example "foo.bar.baz". The second argument is the path entries to use for the module search. For top-level modules, the second argument is "None", but for submodules or subpackages, the second argument is the value of the parent package’s "__path__" attribute. If the appropriate "__path__" attribute cannot be accessed, a "ModuleNotFoundError" is raised. The third argument is an existing module object that will be the target of loading later. The import system passes in a target module only during reload. The meta path may be traversed multiple times for a single import request. For example, assuming none of the modules involved has already been cached, importing "foo.bar.baz" will first perform a top level import, calling "mpf.find_spec("foo", None, None)" on each meta path finder ("mpf"). After "foo" has been imported, "foo.bar" will be imported by traversing the meta path a second time, calling "mpf.find_spec("foo.bar", foo.__path__, None)". Once "foo.bar" has been imported, the final traversal will call "mpf.find_spec("foo.bar.baz", foo.bar.__path__, None)". Some meta path finders only support top level imports. These importers will always return "None" when anything other than "None" is passed as the second argument. Python’s default "sys.meta_path" has three meta path finders, one that knows how to import built-in modules, one that knows how to import frozen modules, and one that knows how to import modules from an *import path* (i.e. the *path based finder*). Changed in version 3.4: The "find_spec()" method of meta path finders replaced "find_module()", which is now deprecated. While it will continue to work without change, the import machinery will try it only if the finder does not implement "find_spec()". 5.4. Loading ============ If and when a module spec is found, the import machinery will use it (and the loader it contains) when loading the module. Here is an approximation of what happens during the loading portion of import: module = None if spec.loader is not None and hasattr(spec.loader, 'create_module'): # It is assumed 'exec_module' will also be defined on the loader. module = spec.loader.create_module(spec) if module is None: module = ModuleType(spec.name) # The import-related module attributes get set here: _init_module_attrs(spec, module) if spec.loader is None: # unsupported raise ImportError if spec.origin is None and spec.submodule_search_locations is not None: # namespace package sys.modules[spec.name] = module elif not hasattr(spec.loader, 'exec_module'): module = spec.loader.load_module(spec.name) # Set __loader__ and __package__ if missing. else: sys.modules[spec.name] = module try: spec.loader.exec_module(module) except BaseException: try: del sys.modules[spec.name] except KeyError: pass raise return sys.modules[spec.name] Note the following details: * If there is an existing module object with the given name in "sys.modules", import will have already returned it. * The module will exist in "sys.modules" before the loader executes the module code. This is crucial because the module code may (directly or indirectly) import itself; adding it to "sys.modules" beforehand prevents unbounded recursion in the worst case and multiple loading in the best. * If loading fails, the failing module – and only the failing module – gets removed from "sys.modules". Any module already in the "sys.modules" cache, and any module that was successfully loaded as a side-effect, must remain in the cache. This contrasts with reloading where even the failing module is left in "sys.modules". * After the module is created but before execution, the import machinery sets the import-related module attributes (“_init_module_attrs” in the pseudo-code example above), as summarized in a later section. * Module execution is the key moment of loading in which the module’s namespace gets populated. Execution is entirely delegated to the loader, which gets to decide what gets populated and how. * The module created during loading and passed to exec_module() may not be the one returned at the end of import [2]. Changed in version 3.4: The import system has taken over the boilerplate responsibilities of loaders. These were previously performed by the "importlib.abc.Loader.load_module()" method. 5.4.1. Loaders -------------- Module loaders provide the critical function of loading: module execution. The import machinery calls the "importlib.abc.Loader.exec_module()" method with a single argument, the module object to execute. Any value returned from "exec_module()" is ignored. Loaders must satisfy the following requirements: * If the module is a Python module (as opposed to a built-in module or a dynamically loaded extension), the loader should execute the module’s code in the module’s global name space ("module.__dict__"). * If the loader cannot execute the module, it should raise an "ImportError", although any other exception raised during "exec_module()" will be propagated. In many cases, the finder and loader can be the same object; in such cases the "find_spec()" method would just return a spec with the loader set to "self". Module loaders may opt in to creating the module object during loading by implementing a "create_module()" method. It takes one argument, the module spec, and returns the new module object to use during loading. "create_module()" does not need to set any attributes on the module object. If the method returns "None", the import machinery will create the new module itself. New in version 3.4: The "create_module()" method of loaders. Changed in version 3.4: The "load_module()" method was replaced by "exec_module()" and the import machinery assumed all the boilerplate responsibilities of loading.For compatibility with existing loaders, the import machinery will use the "load_module()" method of loaders if it exists and the loader does not also implement "exec_module()". However, "load_module()" has been deprecated and loaders should implement "exec_module()" instead.The "load_module()" method must implement all the boilerplate loading functionality described above in addition to executing the module. All the same constraints apply, with some additional clarification: * If there is an existing module object with the given name in "sys.modules", the loader must use that existing module. (Otherwise, "importlib.reload()" will not work correctly.) If the named module does not exist in "sys.modules", the loader must create a new module object and add it to "sys.modules". * The module *must* exist in "sys.modules" before the loader executes the module code, to prevent unbounded recursion or multiple loading. * If loading fails, the loader must remove any modules it has inserted into "sys.modules", but it must remove **only** the failing module(s), and only if the loader itself has loaded the module(s) explicitly. Changed in version 3.5: A "DeprecationWarning" is raised when "exec_module()" is defined but "create_module()" is not. Changed in version 3.6: An "ImportError" is raised when "exec_module()" is defined but "create_module()" is not. 5.4.2. Submodules ----------------- When a submodule is loaded using any mechanism (e.g. "importlib" APIs, the "import" or "import-from" statements, or built-in "__import__()") a binding is placed in the parent module’s namespace to the submodule object. For example, if package "spam" has a submodule "foo", after importing "spam.foo", "spam" will have an attribute "foo" which is bound to the submodule. Let’s say you have the following directory structure: spam/ __init__.py foo.py and "spam/__init__.py" has the following line in it: from .foo import Foo then executing the following puts name bindings for "foo" and "Foo" in the "spam" module: >>> import spam >>> spam.foo >>> spam.Foo Given Python’s familiar name binding rules this might seem surprising, but it’s actually a fundamental feature of the import system. The invariant holding is that if you have "sys.modules['spam']" and "sys.modules['spam.foo']" (as you would after the above import), the latter must appear as the "foo" attribute of the former. 5.4.3. Module spec ------------------ The import machinery uses a variety of information about each module during import, especially before loading. Most of the information is common to all modules. The purpose of a module’s spec is to encapsulate this import-related information on a per-module basis. Using a spec during import allows state to be transferred between import system components, e.g. between the finder that creates the module spec and the loader that executes it. Most importantly, it allows the import machinery to perform the boilerplate operations of loading, whereas without a module spec the loader had that responsibility. The module’s spec is exposed as the "__spec__" attribute on a module object. See "ModuleSpec" for details on the contents of the module spec. New in version 3.4. 5.4.4. Import-related module attributes --------------------------------------- The import machinery fills in these attributes on each module object during loading, based on the module’s spec, before the loader executes the module. __name__ The "__name__" attribute must be set to the fully-qualified name of the module. This name is used to uniquely identify the module in the import system. __loader__ The "__loader__" attribute must be set to the loader object that the import machinery used when loading the module. This is mostly for introspection, but can be used for additional loader-specific functionality, for example getting data associated with a loader. __package__ The module’s "__package__" attribute must be set. Its value must be a string, but it can be the same value as its "__name__". When the module is a package, its "__package__" value should be set to its "__name__". When the module is not a package, "__package__" should be set to the empty string for top-level modules, or for submodules, to the parent package’s name. See **PEP 366** for further details. This attribute is used instead of "__name__" to calculate explicit relative imports for main modules, as defined in **PEP 366**. It is expected to have the same value as "__spec__.parent". Changed in version 3.6: The value of "__package__" is expected to be the same as "__spec__.parent". __spec__ The "__spec__" attribute must be set to the module spec that was used when importing the module. Setting "__spec__" appropriately applies equally to modules initialized during interpreter startup. The one exception is "__main__", where "__spec__" is set to None in some cases. When "__package__" is not defined, "__spec__.parent" is used as a fallback. New in version 3.4. Changed in version 3.6: "__spec__.parent" is used as a fallback when "__package__" is not defined. __path__ If the module is a package (either regular or namespace), the module object’s "__path__" attribute must be set. The value must be iterable, but may be empty if "__path__" has no further significance. If "__path__" is not empty, it must produce strings when iterated over. More details on the semantics of "__path__" are given below. Non-package modules should not have a "__path__" attribute. __file__ __cached__ "__file__" is optional. If set, this attribute’s value must be a string. The import system may opt to leave "__file__" unset if it has no semantic meaning (e.g. a module loaded from a database). If "__file__" is set, it may also be appropriate to set the "__cached__" attribute which is the path to any compiled version of the code (e.g. byte-compiled file). The file does not need to exist to set this attribute; the path can simply point to where the compiled file would exist (see **PEP 3147**). It is also appropriate to set "__cached__" when "__file__" is not set. However, that scenario is quite atypical. Ultimately, the loader is what makes use of "__file__" and/or "__cached__". So if a loader can load from a cached module but otherwise does not load from a file, that atypical scenario may be appropriate. 5.4.5. module.__path__ ---------------------- By definition, if a module has a "__path__" attribute, it is a package. A package’s "__path__" attribute is used during imports of its subpackages. Within the import machinery, it functions much the same as "sys.path", i.e. providing a list of locations to search for modules during import. However, "__path__" is typically much more constrained than "sys.path". "__path__" must be an iterable of strings, but it may be empty. The same rules used for "sys.path" also apply to a package’s "__path__", and "sys.path_hooks" (described below) are consulted when traversing a package’s "__path__". A package’s "__init__.py" file may set or alter the package’s "__path__" attribute, and this was typically the way namespace packages were implemented prior to **PEP 420**. With the adoption of **PEP 420**, namespace packages no longer need to supply "__init__.py" files containing only "__path__" manipulation code; the import machinery automatically sets "__path__" correctly for the namespace package. 5.4.6. Module reprs ------------------- By default, all modules have a usable repr, however depending on the attributes set above, and in the module’s spec, you can more explicitly control the repr of module objects. If the module has a spec ("__spec__"), the import machinery will try to generate a repr from it. If that fails or there is no spec, the import system will craft a default repr using whatever information is available on the module. It will try to use the "module.__name__", "module.__file__", and "module.__loader__" as input into the repr, with defaults for whatever information is missing. Here are the exact rules used: * If the module has a "__spec__" attribute, the information in the spec is used to generate the repr. The “name”, “loader”, “origin”, and “has_location” attributes are consulted. * If the module has a "__file__" attribute, this is used as part of the module’s repr. * If the module has no "__file__" but does have a "__loader__" that is not "None", then the loader’s repr is used as part of the module’s repr. * Otherwise, just use the module’s "__name__" in the repr. Changed in version 3.4: Use of "loader.module_repr()" has been deprecated and the module spec is now used by the import machinery to generate a module repr.For backward compatibility with Python 3.3, the module repr will be generated by calling the loader’s "module_repr()" method, if defined, before trying either approach described above. However, the method is deprecated. 5.4.7. Cached bytecode invalidation ----------------------------------- Before Python loads cached bytecode from a ".pyc" file, it checks whether the cache is up-to-date with the source ".py" file. By default, Python does this by storing the source’s last-modified timestamp and size in the cache file when writing it. At runtime, the import system then validates the cache file by checking the stored metadata in the cache file against the source’s metadata. Python also supports “hash-based” cache files, which store a hash of the source file’s contents rather than its metadata. There are two variants of hash-based ".pyc" files: checked and unchecked. For checked hash-based ".pyc" files, Python validates the cache file by hashing the source file and comparing the resulting hash with the hash in the cache file. If a checked hash-based cache file is found to be invalid, Python regenerates it and writes a new checked hash-based cache file. For unchecked hash-based ".pyc" files, Python simply assumes the cache file is valid if it exists. Hash-based ".pyc" files validation behavior may be overridden with the "--check-hash-based- pycs" flag. Changed in version 3.7: Added hash-based ".pyc" files. Previously, Python only supported timestamp-based invalidation of bytecode caches. 5.5. The Path Based Finder ========================== As mentioned previously, Python comes with several default meta path finders. One of these, called the *path based finder* ("PathFinder"), searches an *import path*, which contains a list of *path entries*. Each path entry names a location to search for modules. The path based finder itself doesn’t know how to import anything. Instead, it traverses the individual path entries, associating each of them with a path entry finder that knows how to handle that particular kind of path. The default set of path entry finders implement all the semantics for finding modules on the file system, handling special file types such as Python source code (".py" files), Python byte code (".pyc" files) and shared libraries (e.g. ".so" files). When supported by the "zipimport" module in the standard library, the default path entry finders also handle loading all of these file types (other than shared libraries) from zipfiles. Path entries need not be limited to file system locations. They can refer to URLs, database queries, or any other location that can be specified as a string. The path based finder provides additional hooks and protocols so that you can extend and customize the types of searchable path entries. For example, if you wanted to support path entries as network URLs, you could write a hook that implements HTTP semantics to find modules on the web. This hook (a callable) would return a *path entry finder* supporting the protocol described below, which was then used to get a loader for the module from the web. A word of warning: this section and the previous both use the term *finder*, distinguishing between them by using the terms *meta path finder* and *path entry finder*. These two types of finders are very similar, support similar protocols, and function in similar ways during the import process, but it’s important to keep in mind that they are subtly different. In particular, meta path finders operate at the beginning of the import process, as keyed off the "sys.meta_path" traversal. By contrast, path entry finders are in a sense an implementation detail of the path based finder, and in fact, if the path based finder were to be removed from "sys.meta_path", none of the path entry finder semantics would be invoked. 5.5.1. Path entry finders ------------------------- The *path based finder* is responsible for finding and loading Python modules and packages whose location is specified with a string *path entry*. Most path entries name locations in the file system, but they need not be limited to this. As a meta path finder, the *path based finder* implements the "find_spec()" protocol previously described, however it exposes additional hooks that can be used to customize how modules are found and loaded from the *import path*. Three variables are used by the *path based finder*, "sys.path", "sys.path_hooks" and "sys.path_importer_cache". The "__path__" attributes on package objects are also used. These provide additional ways that the import machinery can be customized. "sys.path" contains a list of strings providing search locations for modules and packages. It is initialized from the "PYTHONPATH" environment variable and various other installation- and implementation-specific defaults. Entries in "sys.path" can name directories on the file system, zip files, and potentially other “locations” (see the "site" module) that should be searched for modules, such as URLs, or database queries. Only strings and bytes should be present on "sys.path"; all other data types are ignored. The encoding of bytes entries is determined by the individual *path entry finders*. The *path based finder* is a *meta path finder*, so the import machinery begins the *import path* search by calling the path based finder’s "find_spec()" method as described previously. When the "path" argument to "find_spec()" is given, it will be a list of string paths to traverse - typically a package’s "__path__" attribute for an import within that package. If the "path" argument is "None", this indicates a top level import and "sys.path" is used. The path based finder iterates over every entry in the search path, and for each of these, looks for an appropriate *path entry finder* ("PathEntryFinder") for the path entry. Because this can be an expensive operation (e.g. there may be *stat()* call overheads for this search), the path based finder maintains a cache mapping path entries to path entry finders. This cache is maintained in "sys.path_importer_cache" (despite the name, this cache actually stores finder objects rather than being limited to *importer* objects). In this way, the expensive search for a particular *path entry* location’s *path entry finder* need only be done once. User code is free to remove cache entries from "sys.path_importer_cache" forcing the path based finder to perform the path entry search again [3]. If the path entry is not present in the cache, the path based finder iterates over every callable in "sys.path_hooks". Each of the *path entry hooks* in this list is called with a single argument, the path entry to be searched. This callable may either return a *path entry finder* that can handle the path entry, or it may raise "ImportError". An "ImportError" is used by the path based finder to signal that the hook cannot find a *path entry finder* for that *path entry*. The exception is ignored and *import path* iteration continues. The hook should expect either a string or bytes object; the encoding of bytes objects is up to the hook (e.g. it may be a file system encoding, UTF-8, or something else), and if the hook cannot decode the argument, it should raise "ImportError". If "sys.path_hooks" iteration ends with no *path entry finder* being returned, then the path based finder’s "find_spec()" method will store "None" in "sys.path_importer_cache" (to indicate that there is no finder for this path entry) and return "None", indicating that this *meta path finder* could not find the module. If a *path entry finder* *is* returned by one of the *path entry hook* callables on "sys.path_hooks", then the following protocol is used to ask the finder for a module spec, which is then used when loading the module. The current working directory – denoted by an empty string – is handled slightly differently from other entries on "sys.path". First, if the current working directory is found to not exist, no value is stored in "sys.path_importer_cache". Second, the value for the current working directory is looked up fresh for each module lookup. Third, the path used for "sys.path_importer_cache" and returned by "importlib.machinery.PathFinder.find_spec()" will be the actual current working directory and not the empty string. 5.5.2. Path entry finder protocol --------------------------------- In order to support imports of modules and initialized packages and also to contribute portions to namespace packages, path entry finders must implement the "find_spec()" method. "find_spec()" takes two arguments: the fully qualified name of the module being imported, and the (optional) target module. "find_spec()" returns a fully populated spec for the module. This spec will always have “loader” set (with one exception). To indicate to the import machinery that the spec represents a namespace *portion*, the path entry finder sets “submodule_search_locations” to a list containing the portion. Changed in version 3.4: "find_spec()" replaced "find_loader()" and "find_module()", both of which are now deprecated, but will be used if "find_spec()" is not defined.Older path entry finders may implement one of these two deprecated methods instead of "find_spec()". The methods are still respected for the sake of backward compatibility. However, if "find_spec()" is implemented on the path entry finder, the legacy methods are ignored."find_loader()" takes one argument, the fully qualified name of the module being imported. "find_loader()" returns a 2-tuple where the first item is the loader and the second item is a namespace *portion*.For backwards compatibility with other implementations of the import protocol, many path entry finders also support the same, traditional "find_module()" method that meta path finders support. However path entry finder "find_module()" methods are never called with a "path" argument (they are expected to record the appropriate path information from the initial call to the path hook).The "find_module()" method on path entry finders is deprecated, as it does not allow the path entry finder to contribute portions to namespace packages. If both "find_loader()" and "find_module()" exist on a path entry finder, the import system will always call "find_loader()" in preference to "find_module()". 5.6. Replacing the standard import system ========================================= The most reliable mechanism for replacing the entire import system is to delete the default contents of "sys.meta_path", replacing them entirely with a custom meta path hook. If it is acceptable to only alter the behaviour of import statements without affecting other APIs that access the import system, then replacing the builtin "__import__()" function may be sufficient. This technique may also be employed at the module level to only alter the behaviour of import statements within that module. To selectively prevent the import of some modules from a hook early on the meta path (rather than disabling the standard import system entirely), it is sufficient to raise "ModuleNotFoundError" directly from "find_spec()" instead of returning "None". The latter indicates that the meta path search should continue, while raising an exception terminates it immediately. 5.7. Package Relative Imports ============================= Relative imports use leading dots. A single leading dot indicates a relative import, starting with the current package. Two or more leading dots indicate a relative import to the parent(s) of the current package, one level per dot after the first. For example, given the following package layout: package/ __init__.py subpackage1/ __init__.py moduleX.py moduleY.py subpackage2/ __init__.py moduleZ.py moduleA.py In either "subpackage1/moduleX.py" or "subpackage1/__init__.py", the following are valid relative imports: from .moduleY import spam from .moduleY import spam as ham from . import moduleY from ..subpackage1 import moduleY from ..subpackage2.moduleZ import eggs from ..moduleA import foo Absolute imports may use either the "import <>" or "from <> import <>" syntax, but relative imports may only use the second form; the reason for this is that: import XXX.YYY.ZZZ should expose "XXX.YYY.ZZZ" as a usable expression, but .moduleY is not a valid expression. 5.8. Special considerations for __main__ ======================================== The "__main__" module is a special case relative to Python’s import system. As noted elsewhere, the "__main__" module is directly initialized at interpreter startup, much like "sys" and "builtins". However, unlike those two, it doesn’t strictly qualify as a built-in module. This is because the manner in which "__main__" is initialized depends on the flags and other options with which the interpreter is invoked. 5.8.1. __main__.__spec__ ------------------------ Depending on how "__main__" is initialized, "__main__.__spec__" gets set appropriately or to "None". When Python is started with the "-m" option, "__spec__" is set to the module spec of the corresponding module or package. "__spec__" is also populated when the "__main__" module is loaded as part of executing a directory, zipfile or other "sys.path" entry. In the remaining cases "__main__.__spec__" is set to "None", as the code used to populate the "__main__" does not correspond directly with an importable module: * interactive prompt * "-c" option * running from stdin * running directly from a source or bytecode file Note that "__main__.__spec__" is always "None" in the last case, *even if* the file could technically be imported directly as a module instead. Use the "-m" switch if valid module metadata is desired in "__main__". Note also that even when "__main__" corresponds with an importable module and "__main__.__spec__" is set accordingly, they’re still considered *distinct* modules. This is due to the fact that blocks guarded by "if __name__ == "__main__":" checks only execute when the module is used to populate the "__main__" namespace, and not during normal import. 5.9. Open issues ================ XXX It would be really nice to have a diagram. XXX * (import_machinery.rst) how about a section devoted just to the attributes of modules and packages, perhaps expanding upon or supplanting the related entries in the data model reference page? XXX runpy, pkgutil, et al in the library manual should all get “See Also” links at the top pointing to the new import system section. XXX Add more explanation regarding the different ways in which "__main__" is initialized? XXX Add more info on "__main__" quirks/pitfalls (i.e. copy from **PEP 395**). 5.10. References ================ The import machinery has evolved considerably since Python’s early days. The original specification for packages is still available to read, although some details have changed since the writing of that document. The original specification for "sys.meta_path" was **PEP 302**, with subsequent extension in **PEP 420**. **PEP 420** introduced *namespace packages* for Python 3.3. **PEP 420** also introduced the "find_loader()" protocol as an alternative to "find_module()". **PEP 366** describes the addition of the "__package__" attribute for explicit relative imports in main modules. **PEP 328** introduced absolute and explicit relative imports and initially proposed "__name__" for semantics **PEP 366** would eventually specify for "__package__". **PEP 338** defines executing modules as scripts. **PEP 451** adds the encapsulation of per-module import state in spec objects. It also off-loads most of the boilerplate responsibilities of loaders back onto the import machinery. These changes allow the deprecation of several APIs in the import system and also addition of new methods to finders and loaders. -[ Footnotes ]- [1] See "types.ModuleType". [2] The importlib implementation avoids using the return value directly. Instead, it gets the module object by looking the module name up in "sys.modules". The indirect effect of this is that an imported module may replace itself in "sys.modules". This is implementation-specific behavior that is not guaranteed to work in other Python implementations. [3] In legacy code, it is possible to find instances of "imp.NullImporter" in the "sys.path_importer_cache". It is recommended that code be changed to use "None" instead. See Porting Python code for more details. The Python Language Reference ***************************** This reference manual describes the syntax and “core semantics” of the language. It is terse, but attempts to be exact and complete. The semantics of non-essential built-in object types and of the built-in functions and modules are described in The Python Standard Library. For an informal introduction to the language, see The Python Tutorial. For C or C++ programmers, two additional manuals exist: Extending and Embedding the Python Interpreter describes the high-level picture of how to write a Python extension module, and the Python/C API Reference Manual describes the interfaces available to C/C++ programmers in detail. * 1. Introduction * 1.1. Alternate Implementations * 1.2. Notation * 2. Lexical analysis * 2.1. Line structure * 2.2. Other tokens * 2.3. Identifiers and keywords * 2.4. Literals * 2.5. Operators * 2.6. Delimiters * 3. Data model * 3.1. Objects, values and types * 3.2. The standard type hierarchy * 3.3. Special method names * 3.4. Coroutines * 4. Execution model * 4.1. Structure of a program * 4.2. Naming and binding * 4.3. Exceptions * 5. The import system * 5.1. "importlib" * 5.2. Packages * 5.3. Searching * 5.4. Loading * 5.5. The Path Based Finder * 5.6. Replacing the standard import system * 5.7. Package Relative Imports * 5.8. Special considerations for __main__ * 5.9. Open issues * 5.10. References * 6. Expressions * 6.1. Arithmetic conversions * 6.2. Atoms * 6.3. Primaries * 6.4. Await expression * 6.5. The power operator * 6.6. Unary arithmetic and bitwise operations * 6.7. Binary arithmetic operations * 6.8. Shifting operations * 6.9. Binary bitwise operations * 6.10. Comparisons * 6.11. Boolean operations * 6.12. Assignment expressions * 6.13. Conditional expressions * 6.14. Lambdas * 6.15. Expression lists * 6.16. Evaluation order * 6.17. Operator precedence * 7. Simple statements * 7.1. Expression statements * 7.2. Assignment statements * 7.3. The "assert" statement * 7.4. The "pass" statement * 7.5. The "del" statement * 7.6. The "return" statement * 7.7. The "yield" statement * 7.8. The "raise" statement * 7.9. The "break" statement * 7.10. The "continue" statement * 7.11. The "import" statement * 7.12. The "global" statement * 7.13. The "nonlocal" statement * 8. Compound statements * 8.1. The "if" statement * 8.2. The "while" statement * 8.3. The "for" statement * 8.4. The "try" statement * 8.5. The "with" statement * 8.6. Function definitions * 8.7. Class definitions * 8.8. Coroutines * 9. Top-level components * 9.1. Complete Python programs * 9.2. File input * 9.3. Interactive input * 9.4. Expression input * 10. Full Grammar specification 1. Introduction *************** This reference manual describes the Python programming language. It is not intended as a tutorial. While I am trying to be as precise as possible, I chose to use English rather than formal specifications for everything except syntax and lexical analysis. This should make the document more understandable to the average reader, but will leave room for ambiguities. Consequently, if you were coming from Mars and tried to re-implement Python from this document alone, you might have to guess things and in fact you would probably end up implementing quite a different language. On the other hand, if you are using Python and wonder what the precise rules about a particular area of the language are, you should definitely be able to find them here. If you would like to see a more formal definition of the language, maybe you could volunteer your time — or invent a cloning machine :-). It is dangerous to add too many implementation details to a language reference document — the implementation may change, and other implementations of the same language may work differently. On the other hand, CPython is the one Python implementation in widespread use (although alternate implementations continue to gain support), and its particular quirks are sometimes worth being mentioned, especially where the implementation imposes additional limitations. Therefore, you’ll find short “implementation notes” sprinkled throughout the text. Every Python implementation comes with a number of built-in and standard modules. These are documented in The Python Standard Library. A few built-in modules are mentioned when they interact in a significant way with the language definition. 1.1. Alternate Implementations ============================== Though there is one Python implementation which is by far the most popular, there are some alternate implementations which are of particular interest to different audiences. Known implementations include: CPython This is the original and most-maintained implementation of Python, written in C. New language features generally appear here first. Jython Python implemented in Java. This implementation can be used as a scripting language for Java applications, or can be used to create applications using the Java class libraries. It is also often used to create tests for Java libraries. More information can be found at the Jython website. Python for .NET This implementation actually uses the CPython implementation, but is a managed .NET application and makes .NET libraries available. It was created by Brian Lloyd. For more information, see the Python for .NET home page. IronPython An alternate Python for .NET. Unlike Python.NET, this is a complete Python implementation that generates IL, and compiles Python code directly to .NET assemblies. It was created by Jim Hugunin, the original creator of Jython. For more information, see the IronPython website. PyPy An implementation of Python written completely in Python. It supports several advanced features not found in other implementations like stackless support and a Just in Time compiler. One of the goals of the project is to encourage experimentation with the language itself by making it easier to modify the interpreter (since it is written in Python). Additional information is available on the PyPy project’s home page. Each of these implementations varies in some way from the language as documented in this manual, or introduces specific information beyond what’s covered in the standard Python documentation. Please refer to the implementation-specific documentation to determine what else you need to know about the specific implementation you’re using. 1.2. Notation ============= The descriptions of lexical analysis and syntax use a modified BNF grammar notation. This uses the following style of definition: name ::= lc_letter (lc_letter | "_")* lc_letter ::= "a"..."z" The first line says that a "name" is an "lc_letter" followed by a sequence of zero or more "lc_letter"s and underscores. An "lc_letter" in turn is any of the single characters "'a'" through "'z'". (This rule is actually adhered to for the names defined in lexical and grammar rules in this document.) Each rule begins with a name (which is the name defined by the rule) and "::=". A vertical bar ("|") is used to separate alternatives; it is the least binding operator in this notation. A star ("*") means zero or more repetitions of the preceding item; likewise, a plus ("+") means one or more repetitions, and a phrase enclosed in square brackets ("[ ]") means zero or one occurrences (in other words, the enclosed phrase is optional). The "*" and "+" operators bind as tightly as possible; parentheses are used for grouping. Literal strings are enclosed in quotes. White space is only meaningful to separate tokens. Rules are normally contained on a single line; rules with many alternatives may be formatted alternatively with each line after the first beginning with a vertical bar. In lexical definitions (as the example above), two more conventions are used: Two literal characters separated by three dots mean a choice of any single character in the given (inclusive) range of ASCII characters. A phrase between angular brackets ("<...>") gives an informal description of the symbol defined; e.g., this could be used to describe the notion of ‘control character’ if needed. Even though the notation used is almost the same, there is a big difference between the meaning of lexical and syntactic definitions: a lexical definition operates on the individual characters of the input source, while a syntax definition operates on the stream of tokens generated by the lexical analysis. All uses of BNF in the next chapter (“Lexical Analysis”) are lexical definitions; uses in subsequent chapters are syntactic definitions. 2. Lexical analysis ******************* A Python program is read by a *parser*. Input to the parser is a stream of *tokens*, generated by the *lexical analyzer*. This chapter describes how the lexical analyzer breaks a file into tokens. Python reads program text as Unicode code points; the encoding of a source file can be given by an encoding declaration and defaults to UTF-8, see **PEP 3120** for details. If the source file cannot be decoded, a "SyntaxError" is raised. 2.1. Line structure =================== A Python program is divided into a number of *logical lines*. 2.1.1. Logical lines -------------------- The end of a logical line is represented by the token NEWLINE. Statements cannot cross logical line boundaries except where NEWLINE is allowed by the syntax (e.g., between statements in compound statements). A logical line is constructed from one or more *physical lines* by following the explicit or implicit *line joining* rules. 2.1.2. Physical lines --------------------- A physical line is a sequence of characters terminated by an end-of- line sequence. In source files and strings, any of the standard platform line termination sequences can be used - the Unix form using ASCII LF (linefeed), the Windows form using the ASCII sequence CR LF (return followed by linefeed), or the old Macintosh form using the ASCII CR (return) character. All of these forms can be used equally, regardless of platform. The end of input also serves as an implicit terminator for the final physical line. When embedding Python, source code strings should be passed to Python APIs using the standard C conventions for newline characters (the "\n" character, representing ASCII LF, is the line terminator). 2.1.3. Comments --------------- A comment starts with a hash character ("#") that is not part of a string literal, and ends at the end of the physical line. A comment signifies the end of the logical line unless the implicit line joining rules are invoked. Comments are ignored by the syntax. 2.1.4. Encoding declarations ---------------------------- If a comment in the first or second line of the Python script matches the regular expression "coding[=:]\s*([-\w.]+)", this comment is processed as an encoding declaration; the first group of this expression names the encoding of the source code file. The encoding declaration must appear on a line of its own. If it is the second line, the first line must also be a comment-only line. The recommended forms of an encoding expression are # -*- coding: -*- which is recognized also by GNU Emacs, and # vim:fileencoding= which is recognized by Bram Moolenaar’s VIM. If no encoding declaration is found, the default encoding is UTF-8. In addition, if the first bytes of the file are the UTF-8 byte-order mark ("b'\xef\xbb\xbf'"), the declared file encoding is UTF-8 (this is supported, among others, by Microsoft’s **notepad**). If an encoding is declared, the encoding name must be recognized by Python (see Standard Encodings). The encoding is used for all lexical analysis, including string literals, comments and identifiers. 2.1.5. Explicit line joining ---------------------------- Two or more physical lines may be joined into logical lines using backslash characters ("\"), as follows: when a physical line ends in a backslash that is not part of a string literal or comment, it is joined with the following forming a single logical line, deleting the backslash and the following end-of-line character. For example: if 1900 < year < 2100 and 1 <= month <= 12 \ and 1 <= day <= 31 and 0 <= hour < 24 \ and 0 <= minute < 60 and 0 <= second < 60: # Looks like a valid date return 1 A line ending in a backslash cannot carry a comment. A backslash does not continue a comment. A backslash does not continue a token except for string literals (i.e., tokens other than string literals cannot be split across physical lines using a backslash). A backslash is illegal elsewhere on a line outside a string literal. 2.1.6. Implicit line joining ---------------------------- Expressions in parentheses, square brackets or curly braces can be split over more than one physical line without using backslashes. For example: month_names = ['Januari', 'Februari', 'Maart', # These are the 'April', 'Mei', 'Juni', # Dutch names 'Juli', 'Augustus', 'September', # for the months 'Oktober', 'November', 'December'] # of the year Implicitly continued lines can carry comments. The indentation of the continuation lines is not important. Blank continuation lines are allowed. There is no NEWLINE token between implicit continuation lines. Implicitly continued lines can also occur within triple-quoted strings (see below); in that case they cannot carry comments. 2.1.7. Blank lines ------------------ A logical line that contains only spaces, tabs, formfeeds and possibly a comment, is ignored (i.e., no NEWLINE token is generated). During interactive input of statements, handling of a blank line may differ depending on the implementation of the read-eval-print loop. In the standard interactive interpreter, an entirely blank logical line (i.e. one containing not even whitespace or a comment) terminates a multi- line statement. 2.1.8. Indentation ------------------ Leading whitespace (spaces and tabs) at the beginning of a logical line is used to compute the indentation level of the line, which in turn is used to determine the grouping of statements. Tabs are replaced (from left to right) by one to eight spaces such that the total number of characters up to and including the replacement is a multiple of eight (this is intended to be the same rule as used by Unix). The total number of spaces preceding the first non-blank character then determines the line’s indentation. Indentation cannot be split over multiple physical lines using backslashes; the whitespace up to the first backslash determines the indentation. Indentation is rejected as inconsistent if a source file mixes tabs and spaces in a way that makes the meaning dependent on the worth of a tab in spaces; a "TabError" is raised in that case. **Cross-platform compatibility note:** because of the nature of text editors on non-UNIX platforms, it is unwise to use a mixture of spaces and tabs for the indentation in a single source file. It should also be noted that different platforms may explicitly limit the maximum indentation level. A formfeed character may be present at the start of the line; it will be ignored for the indentation calculations above. Formfeed characters occurring elsewhere in the leading whitespace have an undefined effect (for instance, they may reset the space count to zero). The indentation levels of consecutive lines are used to generate INDENT and DEDENT tokens, using a stack, as follows. Before the first line of the file is read, a single zero is pushed on the stack; this will never be popped off again. The numbers pushed on the stack will always be strictly increasing from bottom to top. At the beginning of each logical line, the line’s indentation level is compared to the top of the stack. If it is equal, nothing happens. If it is larger, it is pushed on the stack, and one INDENT token is generated. If it is smaller, it *must* be one of the numbers occurring on the stack; all numbers on the stack that are larger are popped off, and for each number popped off a DEDENT token is generated. At the end of the file, a DEDENT token is generated for each number remaining on the stack that is larger than zero. Here is an example of a correctly (though confusingly) indented piece of Python code: def perm(l): # Compute the list of all permutations of l if len(l) <= 1: return [l] r = [] for i in range(len(l)): s = l[:i] + l[i+1:] p = perm(s) for x in p: r.append(l[i:i+1] + x) return r The following example shows various indentation errors: def perm(l): # error: first line indented for i in range(len(l)): # error: not indented s = l[:i] + l[i+1:] p = perm(l[:i] + l[i+1:]) # error: unexpected indent for x in p: r.append(l[i:i+1] + x) return r # error: inconsistent dedent (Actually, the first three errors are detected by the parser; only the last error is found by the lexical analyzer — the indentation of "return r" does not match a level popped off the stack.) 2.1.9. Whitespace between tokens -------------------------------- Except at the beginning of a logical line or in string literals, the whitespace characters space, tab and formfeed can be used interchangeably to separate tokens. Whitespace is needed between two tokens only if their concatenation could otherwise be interpreted as a different token (e.g., ab is one token, but a b is two tokens). 2.2. Other tokens ================= Besides NEWLINE, INDENT and DEDENT, the following categories of tokens exist: *identifiers*, *keywords*, *literals*, *operators*, and *delimiters*. Whitespace characters (other than line terminators, discussed earlier) are not tokens, but serve to delimit tokens. Where ambiguity exists, a token comprises the longest possible string that forms a legal token, when read from left to right. 2.3. Identifiers and keywords ============================= Identifiers (also referred to as *names*) are described by the following lexical definitions. The syntax of identifiers in Python is based on the Unicode standard annex UAX-31, with elaboration and changes as defined below; see also **PEP 3131** for further details. Within the ASCII range (U+0001..U+007F), the valid characters for identifiers are the same as in Python 2.x: the uppercase and lowercase letters "A" through "Z", the underscore "_" and, except for the first character, the digits "0" through "9". Python 3.0 introduces additional characters from outside the ASCII range (see **PEP 3131**). For these characters, the classification uses the version of the Unicode Character Database as included in the "unicodedata" module. Identifiers are unlimited in length. Case is significant. identifier ::= xid_start xid_continue* id_start ::= id_continue ::= xid_start ::= xid_continue ::= The Unicode category codes mentioned above stand for: * *Lu* - uppercase letters * *Ll* - lowercase letters * *Lt* - titlecase letters * *Lm* - modifier letters * *Lo* - other letters * *Nl* - letter numbers * *Mn* - nonspacing marks * *Mc* - spacing combining marks * *Nd* - decimal numbers * *Pc* - connector punctuations * *Other_ID_Start* - explicit list of characters in PropList.txt to support backwards compatibility * *Other_ID_Continue* - likewise All identifiers are converted into the normal form NFKC while parsing; comparison of identifiers is based on NFKC. A non-normative HTML file listing all valid identifier characters for Unicode 4.1 can be found at https://www.unicode.org/Public/13.0.0/ucd/DerivedCoreProperties.txt 2.3.1. Keywords --------------- The following identifiers are used as reserved words, or *keywords* of the language, and cannot be used as ordinary identifiers. They must be spelled exactly as written here: False await else import pass None break except in raise True class finally is return and continue for lambda try as def from nonlocal while assert del global not with async elif if or yield 2.3.2. Reserved classes of identifiers -------------------------------------- Certain classes of identifiers (besides keywords) have special meanings. These classes are identified by the patterns of leading and trailing underscore characters: "_*" Not imported by "from module import *". The special identifier "_" is used in the interactive interpreter to store the result of the last evaluation; it is stored in the "builtins" module. When not in interactive mode, "_" has no special meaning and is not defined. See section The import statement. Note: The name "_" is often used in conjunction with internationalization; refer to the documentation for the "gettext" module for more information on this convention. "__*__" System-defined names, informally known as “dunder” names. These names are defined by the interpreter and its implementation (including the standard library). Current system names are discussed in the Special method names section and elsewhere. More will likely be defined in future versions of Python. *Any* use of "__*__" names, in any context, that does not follow explicitly documented use, is subject to breakage without warning. "__*" Class-private names. Names in this category, when used within the context of a class definition, are re-written to use a mangled form to help avoid name clashes between “private” attributes of base and derived classes. See section Identifiers (Names). 2.4. Literals ============= Literals are notations for constant values of some built-in types. 2.4.1. String and Bytes literals -------------------------------- String literals are described by the following lexical definitions: stringliteral ::= [stringprefix](shortstring | longstring) stringprefix ::= "r" | "u" | "R" | "U" | "f" | "F" | "fr" | "Fr" | "fR" | "FR" | "rf" | "rF" | "Rf" | "RF" shortstring ::= "'" shortstringitem* "'" | '"' shortstringitem* '"' longstring ::= "'''" longstringitem* "'''" | '"""' longstringitem* '"""' shortstringitem ::= shortstringchar | stringescapeseq longstringitem ::= longstringchar | stringescapeseq shortstringchar ::= longstringchar ::= stringescapeseq ::= "\" bytesliteral ::= bytesprefix(shortbytes | longbytes) bytesprefix ::= "b" | "B" | "br" | "Br" | "bR" | "BR" | "rb" | "rB" | "Rb" | "RB" shortbytes ::= "'" shortbytesitem* "'" | '"' shortbytesitem* '"' longbytes ::= "'''" longbytesitem* "'''" | '"""' longbytesitem* '"""' shortbytesitem ::= shortbyteschar | bytesescapeseq longbytesitem ::= longbyteschar | bytesescapeseq shortbyteschar ::= longbyteschar ::= bytesescapeseq ::= "\" One syntactic restriction not indicated by these productions is that whitespace is not allowed between the "stringprefix" or "bytesprefix" and the rest of the literal. The source character set is defined by the encoding declaration; it is UTF-8 if no encoding declaration is given in the source file; see section Encoding declarations. In plain English: Both types of literals can be enclosed in matching single quotes ("'") or double quotes ("""). They can also be enclosed in matching groups of three single or double quotes (these are generally referred to as *triple-quoted strings*). The backslash ("\") character is used to give special meaning to otherwise ordinary characters like "n", which means ‘newline’ when escaped ("\n"). It can also be used to escape characters that otherwise have a special meaning, such as newline, backslash itself, or the quote character. See escape sequences below for examples. Bytes literals are always prefixed with "'b'" or "'B'"; they produce an instance of the "bytes" type instead of the "str" type. They may only contain ASCII characters; bytes with a numeric value of 128 or greater must be expressed with escapes. Both string and bytes literals may optionally be prefixed with a letter "'r'" or "'R'"; such strings are called *raw strings* and treat backslashes as literal characters. As a result, in string literals, "'\U'" and "'\u'" escapes in raw strings are not treated specially. Given that Python 2.x’s raw unicode literals behave differently than Python 3.x’s the "'ur'" syntax is not supported. New in version 3.3: The "'rb'" prefix of raw bytes literals has been added as a synonym of "'br'". New in version 3.3: Support for the unicode legacy literal ("u'value'") was reintroduced to simplify the maintenance of dual Python 2.x and 3.x codebases. See **PEP 414** for more information. A string literal with "'f'" or "'F'" in its prefix is a *formatted string literal*; see Formatted string literals. The "'f'" may be combined with "'r'", but not with "'b'" or "'u'", therefore raw formatted strings are possible, but formatted bytes literals are not. In triple-quoted literals, unescaped newlines and quotes are allowed (and are retained), except that three unescaped quotes in a row terminate the literal. (A “quote” is the character used to open the literal, i.e. either "'" or """.) Unless an "'r'" or "'R'" prefix is present, escape sequences in string and bytes literals are interpreted according to rules similar to those used by Standard C. The recognized escape sequences are: +-------------------+-----------------------------------+---------+ | Escape Sequence | Meaning | Notes | |===================|===================================|=========| | "\newline" | Backslash and newline ignored | | +-------------------+-----------------------------------+---------+ | "\\" | Backslash ("\") | | +-------------------+-----------------------------------+---------+ | "\'" | Single quote ("'") | | +-------------------+-----------------------------------+---------+ | "\"" | Double quote (""") | | +-------------------+-----------------------------------+---------+ | "\a" | ASCII Bell (BEL) | | +-------------------+-----------------------------------+---------+ | "\b" | ASCII Backspace (BS) | | +-------------------+-----------------------------------+---------+ | "\f" | ASCII Formfeed (FF) | | +-------------------+-----------------------------------+---------+ | "\n" | ASCII Linefeed (LF) | | +-------------------+-----------------------------------+---------+ | "\r" | ASCII Carriage Return (CR) | | +-------------------+-----------------------------------+---------+ | "\t" | ASCII Horizontal Tab (TAB) | | +-------------------+-----------------------------------+---------+ | "\v" | ASCII Vertical Tab (VT) | | +-------------------+-----------------------------------+---------+ | "\ooo" | Character with octal value *ooo* | (1,3) | +-------------------+-----------------------------------+---------+ | "\xhh" | Character with hex value *hh* | (2,3) | +-------------------+-----------------------------------+---------+ Escape sequences only recognized in string literals are: +-------------------+-----------------------------------+---------+ | Escape Sequence | Meaning | Notes | |===================|===================================|=========| | "\N{name}" | Character named *name* in the | (4) | | | Unicode database | | +-------------------+-----------------------------------+---------+ | "\uxxxx" | Character with 16-bit hex value | (5) | | | *xxxx* | | +-------------------+-----------------------------------+---------+ | "\Uxxxxxxxx" | Character with 32-bit hex value | (6) | | | *xxxxxxxx* | | +-------------------+-----------------------------------+---------+ Notes: 1. As in Standard C, up to three octal digits are accepted. 2. Unlike in Standard C, exactly two hex digits are required. 3. In a bytes literal, hexadecimal and octal escapes denote the byte with the given value. In a string literal, these escapes denote a Unicode character with the given value. 4. Changed in version 3.3: Support for name aliases [1] has been added. 5. Exactly four hex digits are required. 6. Any Unicode character can be encoded this way. Exactly eight hex digits are required. Unlike Standard C, all unrecognized escape sequences are left in the string unchanged, i.e., *the backslash is left in the result*. (This behavior is useful when debugging: if an escape sequence is mistyped, the resulting output is more easily recognized as broken.) It is also important to note that the escape sequences only recognized in string literals fall into the category of unrecognized escapes for bytes literals. Changed in version 3.6: Unrecognized escape sequences produce a "DeprecationWarning". In a future Python version they will be a "SyntaxWarning" and eventually a "SyntaxError". Even in a raw literal, quotes can be escaped with a backslash, but the backslash remains in the result; for example, "r"\""" is a valid string literal consisting of two characters: a backslash and a double quote; "r"\"" is not a valid string literal (even a raw string cannot end in an odd number of backslashes). Specifically, *a raw literal cannot end in a single backslash* (since the backslash would escape the following quote character). Note also that a single backslash followed by a newline is interpreted as those two characters as part of the literal, *not* as a line continuation. 2.4.2. String literal concatenation ----------------------------------- Multiple adjacent string or bytes literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation. Thus, ""hello" 'world'" is equivalent to ""helloworld"". This feature can be used to reduce the number of backslashes needed, to split long strings conveniently across long lines, or even to add comments to parts of strings, for example: re.compile("[A-Za-z_]" # letter or underscore "[A-Za-z0-9_]*" # letter, digit or underscore ) Note that this feature is defined at the syntactical level, but implemented at compile time. The ‘+’ operator must be used to concatenate string expressions at run time. Also note that literal concatenation can use different quoting styles for each component (even mixing raw strings and triple quoted strings), and formatted string literals may be concatenated with plain string literals. 2.4.3. Formatted string literals -------------------------------- New in version 3.6. A *formatted string literal* or *f-string* is a string literal that is prefixed with "'f'" or "'F'". These strings may contain replacement fields, which are expressions delimited by curly braces "{}". While other string literals always have a constant value, formatted strings are really expressions evaluated at run time. Escape sequences are decoded like in ordinary string literals (except when a literal is also marked as a raw string). After decoding, the grammar for the contents of the string is: f_string ::= (literal_char | "{{" | "}}" | replacement_field)* replacement_field ::= "{" f_expression ["="] ["!" conversion] [":" format_spec] "}" f_expression ::= (conditional_expression | "*" or_expr) ("," conditional_expression | "," "*" or_expr)* [","] | yield_expression conversion ::= "s" | "r" | "a" format_spec ::= (literal_char | NULL | replacement_field)* literal_char ::= The parts of the string outside curly braces are treated literally, except that any doubled curly braces "'{{'" or "'}}'" are replaced with the corresponding single curly brace. A single opening curly bracket "'{'" marks a replacement field, which starts with a Python expression. To display both the expression text and its value after evaluation, (useful in debugging), an equal sign "'='" may be added after the expression. A conversion field, introduced by an exclamation point "'!'" may follow. A format specifier may also be appended, introduced by a colon "':'". A replacement field ends with a closing curly bracket "'}'". Expressions in formatted string literals are treated like regular Python expressions surrounded by parentheses, with a few exceptions. An empty expression is not allowed, and both "lambda" and assignment expressions ":=" must be surrounded by explicit parentheses. Replacement expressions can contain line breaks (e.g. in triple-quoted strings), but they cannot contain comments. Each expression is evaluated in the context where the formatted string literal appears, in order from left to right. Changed in version 3.7: Prior to Python 3.7, an "await" expression and comprehensions containing an "async for" clause were illegal in the expressions in formatted string literals due to a problem with the implementation. When the equal sign "'='" is provided, the output will have the expression text, the "'='" and the evaluated value. Spaces after the opening brace "'{'", within the expression and after the "'='" are all retained in the output. By default, the "'='" causes the "repr()" of the expression to be provided, unless there is a format specified. When a format is specified it defaults to the "str()" of the expression unless a conversion "'!r'" is declared. New in version 3.8: The equal sign "'='". If a conversion is specified, the result of evaluating the expression is converted before formatting. Conversion "'!s'" calls "str()" on the result, "'!r'" calls "repr()", and "'!a'" calls "ascii()". The result is then formatted using the "format()" protocol. The format specifier is passed to the "__format__()" method of the expression or conversion result. An empty string is passed when the format specifier is omitted. The formatted result is then included in the final value of the whole string. Top-level format specifiers may include nested replacement fields. These nested fields may include their own conversion fields and format specifiers, but may not include more deeply-nested replacement fields. The format specifier mini-language is the same as that used by the "str.format()" method. Formatted string literals may be concatenated, but replacement fields cannot be split across literals. Some examples of formatted string literals: >>> name = "Fred" >>> f"He said his name is {name!r}." "He said his name is 'Fred'." >>> f"He said his name is {repr(name)}." # repr() is equivalent to !r "He said his name is 'Fred'." >>> width = 10 >>> precision = 4 >>> value = decimal.Decimal("12.34567") >>> f"result: {value:{width}.{precision}}" # nested fields 'result: 12.35' >>> today = datetime(year=2017, month=1, day=27) >>> f"{today:%B %d, %Y}" # using date format specifier 'January 27, 2017' >>> f"{today=:%B %d, %Y}" # using date format specifier and debugging 'today=January 27, 2017' >>> number = 1024 >>> f"{number:#0x}" # using integer format specifier '0x400' >>> foo = "bar" >>> f"{ foo = }" # preserves whitespace " foo = 'bar'" >>> line = "The mill's closed" >>> f"{line = }" 'line = "The mill\'s closed"' >>> f"{line = :20}" "line = The mill's closed " >>> f"{line = !r:20}" 'line = "The mill\'s closed" ' A consequence of sharing the same syntax as regular string literals is that characters in the replacement fields must not conflict with the quoting used in the outer formatted string literal: f"abc {a["x"]} def" # error: outer string literal ended prematurely f"abc {a['x']} def" # workaround: use different quoting Backslashes are not allowed in format expressions and will raise an error: f"newline: {ord('\n')}" # raises SyntaxError To include a value in which a backslash escape is required, create a temporary variable. >>> newline = ord('\n') >>> f"newline: {newline}" 'newline: 10' Formatted string literals cannot be used as docstrings, even if they do not include expressions. >>> def foo(): ... f"Not a docstring" ... >>> foo.__doc__ is None True See also **PEP 498** for the proposal that added formatted string literals, and "str.format()", which uses a related format string mechanism. 2.4.4. Numeric literals ----------------------- There are three types of numeric literals: integers, floating point numbers, and imaginary numbers. There are no complex literals (complex numbers can be formed by adding a real number and an imaginary number). Note that numeric literals do not include a sign; a phrase like "-1" is actually an expression composed of the unary operator ‘"-"’ and the literal "1". 2.4.5. Integer literals ----------------------- Integer literals are described by the following lexical definitions: integer ::= decinteger | bininteger | octinteger | hexinteger decinteger ::= nonzerodigit (["_"] digit)* | "0"+ (["_"] "0")* bininteger ::= "0" ("b" | "B") (["_"] bindigit)+ octinteger ::= "0" ("o" | "O") (["_"] octdigit)+ hexinteger ::= "0" ("x" | "X") (["_"] hexdigit)+ nonzerodigit ::= "1"..."9" digit ::= "0"..."9" bindigit ::= "0" | "1" octdigit ::= "0"..."7" hexdigit ::= digit | "a"..."f" | "A"..."F" There is no limit for the length of integer literals apart from what can be stored in available memory. Underscores are ignored for determining the numeric value of the literal. They can be used to group digits for enhanced readability. One underscore can occur between digits, and after base specifiers like "0x". Note that leading zeros in a non-zero decimal number are not allowed. This is for disambiguation with C-style octal literals, which Python used before version 3.0. Some examples of integer literals: 7 2147483647 0o177 0b100110111 3 79228162514264337593543950336 0o377 0xdeadbeef 100_000_000_000 0b_1110_0101 Changed in version 3.6: Underscores are now allowed for grouping purposes in literals. 2.4.6. Floating point literals ------------------------------ Floating point literals are described by the following lexical definitions: floatnumber ::= pointfloat | exponentfloat pointfloat ::= [digitpart] fraction | digitpart "." exponentfloat ::= (digitpart | pointfloat) exponent digitpart ::= digit (["_"] digit)* fraction ::= "." digitpart exponent ::= ("e" | "E") ["+" | "-"] digitpart Note that the integer and exponent parts are always interpreted using radix 10. For example, "077e010" is legal, and denotes the same number as "77e10". The allowed range of floating point literals is implementation-dependent. As in integer literals, underscores are supported for digit grouping. Some examples of floating point literals: 3.14 10. .001 1e100 3.14e-10 0e0 3.14_15_93 Changed in version 3.6: Underscores are now allowed for grouping purposes in literals. 2.4.7. Imaginary literals ------------------------- Imaginary literals are described by the following lexical definitions: imagnumber ::= (floatnumber | digitpart) ("j" | "J") An imaginary literal yields a complex number with a real part of 0.0. Complex numbers are represented as a pair of floating point numbers and have the same restrictions on their range. To create a complex number with a nonzero real part, add a floating point number to it, e.g., "(3+4j)". Some examples of imaginary literals: 3.14j 10.j 10j .001j 1e100j 3.14e-10j 3.14_15_93j 2.5. Operators ============== The following tokens are operators: + - * ** / // % @ << >> & | ^ ~ := < > <= >= == != 2.6. Delimiters =============== The following tokens serve as delimiters in the grammar: ( ) [ ] { } , : . ; @ = -> += -= *= /= //= %= @= &= |= ^= >>= <<= **= The period can also occur in floating-point and imaginary literals. A sequence of three periods has a special meaning as an ellipsis literal. The second half of the list, the augmented assignment operators, serve lexically as delimiters, but also perform an operation. The following printing ASCII characters have special meaning as part of other tokens or are otherwise significant to the lexical analyzer: ' " # \ The following printing ASCII characters are not used in Python. Their occurrence outside string literals and comments is an unconditional error: $ ? ` -[ Footnotes ]- [1] https://www.unicode.org/Public/11.0.0/ucd/NameAliases.txt 7. Simple statements ******************** A simple statement is comprised within a single logical line. Several simple statements may occur on a single line separated by semicolons. The syntax for simple statements is: simple_stmt ::= expression_stmt | assert_stmt | assignment_stmt | augmented_assignment_stmt | annotated_assignment_stmt | pass_stmt | del_stmt | return_stmt | yield_stmt | raise_stmt | break_stmt | continue_stmt | import_stmt | future_stmt | global_stmt | nonlocal_stmt 7.1. Expression statements ========================== Expression statements are used (mostly interactively) to compute and write a value, or (usually) to call a procedure (a function that returns no meaningful result; in Python, procedures return the value "None"). Other uses of expression statements are allowed and occasionally useful. The syntax for an expression statement is: expression_stmt ::= starred_expression An expression statement evaluates the expression list (which may be a single expression). In interactive mode, if the value is not "None", it is converted to a string using the built-in "repr()" function and the resulting string is written to standard output on a line by itself (except if the result is "None", so that procedure calls do not cause any output.) 7.2. Assignment statements ========================== Assignment statements are used to (re)bind names to values and to modify attributes or items of mutable objects: assignment_stmt ::= (target_list "=")+ (starred_expression | yield_expression) target_list ::= target ("," target)* [","] target ::= identifier | "(" [target_list] ")" | "[" [target_list] "]" | attributeref | subscription | slicing | "*" target (See section Primaries for the syntax definitions for *attributeref*, *subscription*, and *slicing*.) An assignment statement evaluates the expression list (remember that this can be a single expression or a comma-separated list, the latter yielding a tuple) and assigns the single resulting object to each of the target lists, from left to right. Assignment is defined recursively depending on the form of the target (list). When a target is part of a mutable object (an attribute reference, subscription or slicing), the mutable object must ultimately perform the assignment and decide about its validity, and may raise an exception if the assignment is unacceptable. The rules observed by various types and the exceptions raised are given with the definition of the object types (see section The standard type hierarchy). Assignment of an object to a target list, optionally enclosed in parentheses or square brackets, is recursively defined as follows. * If the target list is a single target with no trailing comma, optionally in parentheses, the object is assigned to that target. * Else: * If the target list contains one target prefixed with an asterisk, called a “starred” target: The object must be an iterable with at least as many items as there are targets in the target list, minus one. The first items of the iterable are assigned, from left to right, to the targets before the starred target. The final items of the iterable are assigned to the targets after the starred target. A list of the remaining items in the iterable is then assigned to the starred target (the list can be empty). * Else: The object must be an iterable with the same number of items as there are targets in the target list, and the items are assigned, from left to right, to the corresponding targets. Assignment of an object to a single target is recursively defined as follows. * If the target is an identifier (name): * If the name does not occur in a "global" or "nonlocal" statement in the current code block: the name is bound to the object in the current local namespace. * Otherwise: the name is bound to the object in the global namespace or the outer namespace determined by "nonlocal", respectively. The name is rebound if it was already bound. This may cause the reference count for the object previously bound to the name to reach zero, causing the object to be deallocated and its destructor (if it has one) to be called. * If the target is an attribute reference: The primary expression in the reference is evaluated. It should yield an object with assignable attributes; if this is not the case, "TypeError" is raised. That object is then asked to assign the assigned object to the given attribute; if it cannot perform the assignment, it raises an exception (usually but not necessarily "AttributeError"). Note: If the object is a class instance and the attribute reference occurs on both sides of the assignment operator, the right-hand side expression, "a.x" can access either an instance attribute or (if no instance attribute exists) a class attribute. The left-hand side target "a.x" is always set as an instance attribute, creating it if necessary. Thus, the two occurrences of "a.x" do not necessarily refer to the same attribute: if the right-hand side expression refers to a class attribute, the left-hand side creates a new instance attribute as the target of the assignment: class Cls: x = 3 # class variable inst = Cls() inst.x = inst.x + 1 # writes inst.x as 4 leaving Cls.x as 3 This description does not necessarily apply to descriptor attributes, such as properties created with "property()". * If the target is a subscription: The primary expression in the reference is evaluated. It should yield either a mutable sequence object (such as a list) or a mapping object (such as a dictionary). Next, the subscript expression is evaluated. If the primary is a mutable sequence object (such as a list), the subscript must yield an integer. If it is negative, the sequence’s length is added to it. The resulting value must be a nonnegative integer less than the sequence’s length, and the sequence is asked to assign the assigned object to its item with that index. If the index is out of range, "IndexError" is raised (assignment to a subscripted sequence cannot add new items to a list). If the primary is a mapping object (such as a dictionary), the subscript must have a type compatible with the mapping’s key type, and the mapping is then asked to create a key/datum pair which maps the subscript to the assigned object. This can either replace an existing key/value pair with the same key value, or insert a new key/value pair (if no key with the same value existed). For user-defined objects, the "__setitem__()" method is called with appropriate arguments. * If the target is a slicing: The primary expression in the reference is evaluated. It should yield a mutable sequence object (such as a list). The assigned object should be a sequence object of the same type. Next, the lower and upper bound expressions are evaluated, insofar they are present; defaults are zero and the sequence’s length. The bounds should evaluate to integers. If either bound is negative, the sequence’s length is added to it. The resulting bounds are clipped to lie between zero and the sequence’s length, inclusive. Finally, the sequence object is asked to replace the slice with the items of the assigned sequence. The length of the slice may be different from the length of the assigned sequence, thus changing the length of the target sequence, if the target sequence allows it. **CPython implementation detail:** In the current implementation, the syntax for targets is taken to be the same as for expressions, and invalid syntax is rejected during the code generation phase, causing less detailed error messages. Although the definition of assignment implies that overlaps between the left-hand side and the right-hand side are ‘simultaneous’ (for example "a, b = b, a" swaps two variables), overlaps *within* the collection of assigned-to variables occur left-to-right, sometimes resulting in confusion. For instance, the following program prints "[0, 2]": x = [0, 1] i = 0 i, x[i] = 1, 2 # i is updated, then x[i] is updated print(x) See also: **PEP 3132** - Extended Iterable Unpacking The specification for the "*target" feature. 7.2.1. Augmented assignment statements -------------------------------------- Augmented assignment is the combination, in a single statement, of a binary operation and an assignment statement: augmented_assignment_stmt ::= augtarget augop (expression_list | yield_expression) augtarget ::= identifier | attributeref | subscription | slicing augop ::= "+=" | "-=" | "*=" | "@=" | "/=" | "//=" | "%=" | "**=" | ">>=" | "<<=" | "&=" | "^=" | "|=" (See section Primaries for the syntax definitions of the last three symbols.) An augmented assignment evaluates the target (which, unlike normal assignment statements, cannot be an unpacking) and the expression list, performs the binary operation specific to the type of assignment on the two operands, and assigns the result to the original target. The target is only evaluated once. An augmented assignment expression like "x += 1" can be rewritten as "x = x + 1" to achieve a similar, but not exactly equal effect. In the augmented version, "x" is only evaluated once. Also, when possible, the actual operation is performed *in-place*, meaning that rather than creating a new object and assigning that to the target, the old object is modified instead. Unlike normal assignments, augmented assignments evaluate the left- hand side *before* evaluating the right-hand side. For example, "a[i] += f(x)" first looks-up "a[i]", then it evaluates "f(x)" and performs the addition, and lastly, it writes the result back to "a[i]". With the exception of assigning to tuples and multiple targets in a single statement, the assignment done by augmented assignment statements is handled the same way as normal assignments. Similarly, with the exception of the possible *in-place* behavior, the binary operation performed by augmented assignment is the same as the normal binary operations. For targets which are attribute references, the same caveat about class and instance attributes applies as for regular assignments. 7.2.2. Annotated assignment statements -------------------------------------- *Annotation* assignment is the combination, in a single statement, of a variable or attribute annotation and an optional assignment statement: annotated_assignment_stmt ::= augtarget ":" expression ["=" (starred_expression | yield_expression)] The difference from normal Assignment statements is that only single target is allowed. For simple names as assignment targets, if in class or module scope, the annotations are evaluated and stored in a special class or module attribute "__annotations__" that is a dictionary mapping from variable names (mangled if private) to evaluated annotations. This attribute is writable and is automatically created at the start of class or module body execution, if annotations are found statically. For expressions as assignment targets, the annotations are evaluated if in class or module scope, but not stored. If a name is annotated in a function scope, then this name is local for that scope. Annotations are never evaluated and stored in function scopes. If the right hand side is present, an annotated assignment performs the actual assignment before evaluating annotations (where applicable). If the right hand side is not present for an expression target, then the interpreter evaluates the target except for the last "__setitem__()" or "__setattr__()" call. See also: **PEP 526** - Syntax for Variable Annotations The proposal that added syntax for annotating the types of variables (including class variables and instance variables), instead of expressing them through comments. **PEP 484** - Type hints The proposal that added the "typing" module to provide a standard syntax for type annotations that can be used in static analysis tools and IDEs. Changed in version 3.8: Now annotated assignments allow same expressions in the right hand side as the regular assignments. Previously, some expressions (like un-parenthesized tuple expressions) caused a syntax error. 7.3. The "assert" statement =========================== Assert statements are a convenient way to insert debugging assertions into a program: assert_stmt ::= "assert" expression ["," expression] The simple form, "assert expression", is equivalent to if __debug__: if not expression: raise AssertionError The extended form, "assert expression1, expression2", is equivalent to if __debug__: if not expression1: raise AssertionError(expression2) These equivalences assume that "__debug__" and "AssertionError" refer to the built-in variables with those names. In the current implementation, the built-in variable "__debug__" is "True" under normal circumstances, "False" when optimization is requested (command line option "-O"). The current code generator emits no code for an assert statement when optimization is requested at compile time. Note that it is unnecessary to include the source code for the expression that failed in the error message; it will be displayed as part of the stack trace. Assignments to "__debug__" are illegal. The value for the built-in variable is determined when the interpreter starts. 7.4. The "pass" statement ========================= pass_stmt ::= "pass" "pass" is a null operation — when it is executed, nothing happens. It is useful as a placeholder when a statement is required syntactically, but no code needs to be executed, for example: def f(arg): pass # a function that does nothing (yet) class C: pass # a class with no methods (yet) 7.5. The "del" statement ======================== del_stmt ::= "del" target_list Deletion is recursively defined very similar to the way assignment is defined. Rather than spelling it out in full details, here are some hints. Deletion of a target list recursively deletes each target, from left to right. Deletion of a name removes the binding of that name from the local or global namespace, depending on whether the name occurs in a "global" statement in the same code block. If the name is unbound, a "NameError" exception will be raised. Deletion of attribute references, subscriptions and slicings is passed to the primary object involved; deletion of a slicing is in general equivalent to assignment of an empty slice of the right type (but even this is determined by the sliced object). Changed in version 3.2: Previously it was illegal to delete a name from the local namespace if it occurs as a free variable in a nested block. 7.6. The "return" statement =========================== return_stmt ::= "return" [expression_list] "return" may only occur syntactically nested in a function definition, not within a nested class definition. If an expression list is present, it is evaluated, else "None" is substituted. "return" leaves the current function call with the expression list (or "None") as return value. When "return" passes control out of a "try" statement with a "finally" clause, that "finally" clause is executed before really leaving the function. In a generator function, the "return" statement indicates that the generator is done and will cause "StopIteration" to be raised. The returned value (if any) is used as an argument to construct "StopIteration" and becomes the "StopIteration.value" attribute. In an asynchronous generator function, an empty "return" statement indicates that the asynchronous generator is done and will cause "StopAsyncIteration" to be raised. A non-empty "return" statement is a syntax error in an asynchronous generator function. 7.7. The "yield" statement ========================== yield_stmt ::= yield_expression A "yield" statement is semantically equivalent to a yield expression. The yield statement can be used to omit the parentheses that would otherwise be required in the equivalent yield expression statement. For example, the yield statements yield yield from are equivalent to the yield expression statements (yield ) (yield from ) Yield expressions and statements are only used when defining a *generator* function, and are only used in the body of the generator function. Using yield in a function definition is sufficient to cause that definition to create a generator function instead of a normal function. For full details of "yield" semantics, refer to the Yield expressions section. 7.8. The "raise" statement ========================== raise_stmt ::= "raise" [expression ["from" expression]] If no expressions are present, "raise" re-raises the exception that is currently being handled, which is also known as the *active exception*. If there isn’t currently an active exception, a "RuntimeError" exception is raised indicating that this is an error. Otherwise, "raise" evaluates the first expression as the exception object. It must be either a subclass or an instance of "BaseException". If it is a class, the exception instance will be obtained when needed by instantiating the class with no arguments. The *type* of the exception is the exception instance’s class, the *value* is the instance itself. A traceback object is normally created automatically when an exception is raised and attached to it as the "__traceback__" attribute, which is writable. You can create an exception and set your own traceback in one step using the "with_traceback()" exception method (which returns the same exception instance, with its traceback set to its argument), like so: raise Exception("foo occurred").with_traceback(tracebackobj) The "from" clause is used for exception chaining: if given, the second *expression* must be another exception class or instance. If the second expression is an exception instance, it will be attached to the raised exception as the "__cause__" attribute (which is writable). If the expression is an exception class, the class will be instantiated and the resulting exception instance will be attached to the raised exception as the "__cause__" attribute. If the raised exception is not handled, both exceptions will be printed: >>> try: ... print(1 / 0) ... except Exception as exc: ... raise RuntimeError("Something bad happened") from exc ... Traceback (most recent call last): File "", line 2, in ZeroDivisionError: division by zero The above exception was the direct cause of the following exception: Traceback (most recent call last): File "", line 4, in RuntimeError: Something bad happened A similar mechanism works implicitly if a new exception is raised when an exception is already being handled. An exception may be handled when an "except" or "finally" clause, or a "with" statement, is used. The previous exception is then attached as the new exception’s "__context__" attribute: >>> try: ... print(1 / 0) ... except: ... raise RuntimeError("Something bad happened") ... Traceback (most recent call last): File "", line 2, in ZeroDivisionError: division by zero During handling of the above exception, another exception occurred: Traceback (most recent call last): File "", line 4, in RuntimeError: Something bad happened Exception chaining can be explicitly suppressed by specifying "None" in the "from" clause: >>> try: ... print(1 / 0) ... except: ... raise RuntimeError("Something bad happened") from None ... Traceback (most recent call last): File "", line 4, in RuntimeError: Something bad happened Additional information on exceptions can be found in section Exceptions, and information about handling exceptions is in section The try statement. Changed in version 3.3: "None" is now permitted as "Y" in "raise X from Y". New in version 3.3: The "__suppress_context__" attribute to suppress automatic display of the exception context. 7.9. The "break" statement ========================== break_stmt ::= "break" "break" may only occur syntactically nested in a "for" or "while" loop, but not nested in a function or class definition within that loop. It terminates the nearest enclosing loop, skipping the optional "else" clause if the loop has one. If a "for" loop is terminated by "break", the loop control target keeps its current value. When "break" passes control out of a "try" statement with a "finally" clause, that "finally" clause is executed before really leaving the loop. 7.10. The "continue" statement ============================== continue_stmt ::= "continue" "continue" may only occur syntactically nested in a "for" or "while" loop, but not nested in a function or class definition within that loop. It continues with the next cycle of the nearest enclosing loop. When "continue" passes control out of a "try" statement with a "finally" clause, that "finally" clause is executed before really starting the next loop cycle. 7.11. The "import" statement ============================ import_stmt ::= "import" module ["as" identifier] ("," module ["as" identifier])* | "from" relative_module "import" identifier ["as" identifier] ("," identifier ["as" identifier])* | "from" relative_module "import" "(" identifier ["as" identifier] ("," identifier ["as" identifier])* [","] ")" | "from" relative_module "import" "*" module ::= (identifier ".")* identifier relative_module ::= "."* module | "."+ The basic import statement (no "from" clause) is executed in two steps: 1. find a module, loading and initializing it if necessary 2. define a name or names in the local namespace for the scope where the "import" statement occurs. When the statement contains multiple clauses (separated by commas) the two steps are carried out separately for each clause, just as though the clauses had been separated out into individual import statements. The details of the first step, finding and loading modules are described in greater detail in the section on the import system, which also describes the various types of packages and modules that can be imported, as well as all the hooks that can be used to customize the import system. Note that failures in this step may indicate either that the module could not be located, *or* that an error occurred while initializing the module, which includes execution of the module’s code. If the requested module is retrieved successfully, it will be made available in the local namespace in one of three ways: * If the module name is followed by "as", then the name following "as" is bound directly to the imported module. * If no other name is specified, and the module being imported is a top level module, the module’s name is bound in the local namespace as a reference to the imported module * If the module being imported is *not* a top level module, then the name of the top level package that contains the module is bound in the local namespace as a reference to the top level package. The imported module must be accessed using its full qualified name rather than directly The "from" form uses a slightly more complex process: 1. find the module specified in the "from" clause, loading and initializing it if necessary; 2. for each of the identifiers specified in the "import" clauses: 1. check if the imported module has an attribute by that name 2. if not, attempt to import a submodule with that name and then check the imported module again for that attribute 3. if the attribute is not found, "ImportError" is raised. 4. otherwise, a reference to that value is stored in the local namespace, using the name in the "as" clause if it is present, otherwise using the attribute name Examples: import foo # foo imported and bound locally import foo.bar.baz # foo, foo.bar, and foo.bar.baz imported, foo bound locally import foo.bar.baz as fbb # foo, foo.bar, and foo.bar.baz imported, foo.bar.baz bound as fbb from foo.bar import baz # foo, foo.bar, and foo.bar.baz imported, foo.bar.baz bound as baz from foo import attr # foo imported and foo.attr bound as attr If the list of identifiers is replaced by a star ("'*'"), all public names defined in the module are bound in the local namespace for the scope where the "import" statement occurs. The *public names* defined by a module are determined by checking the module’s namespace for a variable named "__all__"; if defined, it must be a sequence of strings which are names defined or imported by that module. The names given in "__all__" are all considered public and are required to exist. If "__all__" is not defined, the set of public names includes all names found in the module’s namespace which do not begin with an underscore character ("'_'"). "__all__" should contain the entire public API. It is intended to avoid accidentally exporting items that are not part of the API (such as library modules which were imported and used within the module). The wild card form of import — "from module import *" — is only allowed at the module level. Attempting to use it in class or function definitions will raise a "SyntaxError". When specifying what module to import you do not have to specify the absolute name of the module. When a module or package is contained within another package it is possible to make a relative import within the same top package without having to mention the package name. By using leading dots in the specified module or package after "from" you can specify how high to traverse up the current package hierarchy without specifying exact names. One leading dot means the current package where the module making the import exists. Two dots means up one package level. Three dots is up two levels, etc. So if you execute "from . import mod" from a module in the "pkg" package then you will end up importing "pkg.mod". If you execute "from ..subpkg2 import mod" from within "pkg.subpkg1" you will import "pkg.subpkg2.mod". The specification for relative imports is contained in the Package Relative Imports section. "importlib.import_module()" is provided to support applications that determine dynamically the modules to be loaded. Raises an auditing event "import" with arguments "module", "filename", "sys.path", "sys.meta_path", "sys.path_hooks". 7.11.1. Future statements ------------------------- A *future statement* is a directive to the compiler that a particular module should be compiled using syntax or semantics that will be available in a specified future release of Python where the feature becomes standard. The future statement is intended to ease migration to future versions of Python that introduce incompatible changes to the language. It allows use of the new features on a per-module basis before the release in which the feature becomes standard. future_stmt ::= "from" "__future__" "import" feature ["as" identifier] ("," feature ["as" identifier])* | "from" "__future__" "import" "(" feature ["as" identifier] ("," feature ["as" identifier])* [","] ")" feature ::= identifier A future statement must appear near the top of the module. The only lines that can appear before a future statement are: * the module docstring (if any), * comments, * blank lines, and * other future statements. The only feature that requires using the future statement is "annotations" (see **PEP 563**). All historical features enabled by the future statement are still recognized by Python 3. The list includes "absolute_import", "division", "generators", "generator_stop", "unicode_literals", "print_function", "nested_scopes" and "with_statement". They are all redundant because they are always enabled, and only kept for backwards compatibility. A future statement is recognized and treated specially at compile time: Changes to the semantics of core constructs are often implemented by generating different code. It may even be the case that a new feature introduces new incompatible syntax (such as a new reserved word), in which case the compiler may need to parse the module differently. Such decisions cannot be pushed off until runtime. For any given release, the compiler knows which feature names have been defined, and raises a compile-time error if a future statement contains a feature not known to it. The direct runtime semantics are the same as for any import statement: there is a standard module "__future__", described later, and it will be imported in the usual way at the time the future statement is executed. The interesting runtime semantics depend on the specific feature enabled by the future statement. Note that there is nothing special about the statement: import __future__ [as name] That is not a future statement; it’s an ordinary import statement with no special semantics or syntax restrictions. Code compiled by calls to the built-in functions "exec()" and "compile()" that occur in a module "M" containing a future statement will, by default, use the new syntax or semantics associated with the future statement. This can be controlled by optional arguments to "compile()" — see the documentation of that function for details. A future statement typed at an interactive interpreter prompt will take effect for the rest of the interpreter session. If an interpreter is started with the "-i" option, is passed a script name to execute, and the script includes a future statement, it will be in effect in the interactive session started after the script is executed. See also: **PEP 236** - Back to the __future__ The original proposal for the __future__ mechanism. 7.12. The "global" statement ============================ global_stmt ::= "global" identifier ("," identifier)* The "global" statement is a declaration which holds for the entire current code block. It means that the listed identifiers are to be interpreted as globals. It would be impossible to assign to a global variable without "global", although free variables may refer to globals without being declared global. Names listed in a "global" statement must not be used in the same code block textually preceding that "global" statement. Names listed in a "global" statement must not be defined as formal parameters or in a "for" loop control target, "class" definition, function definition, "import" statement, or variable annotation. **CPython implementation detail:** The current implementation does not enforce some of these restrictions, but programs should not abuse this freedom, as future implementations may enforce them or silently change the meaning of the program. **Programmer’s note:** "global" is a directive to the parser. It applies only to code parsed at the same time as the "global" statement. In particular, a "global" statement contained in a string or code object supplied to the built-in "exec()" function does not affect the code block *containing* the function call, and code contained in such a string is unaffected by "global" statements in the code containing the function call. The same applies to the "eval()" and "compile()" functions. 7.13. The "nonlocal" statement ============================== nonlocal_stmt ::= "nonlocal" identifier ("," identifier)* The "nonlocal" statement causes the listed identifiers to refer to previously bound variables in the nearest enclosing scope excluding globals. This is important because the default behavior for binding is to search the local namespace first. The statement allows encapsulated code to rebind variables outside of the local scope besides the global (module) scope. Names listed in a "nonlocal" statement, unlike those listed in a "global" statement, must refer to pre-existing bindings in an enclosing scope (the scope in which a new binding should be created cannot be determined unambiguously). Names listed in a "nonlocal" statement must not collide with pre- existing bindings in the local scope. See also: **PEP 3104** - Access to Names in Outer Scopes The specification for the "nonlocal" statement. 9. Top-level components *********************** The Python interpreter can get its input from a number of sources: from a script passed to it as standard input or as program argument, typed in interactively, from a module source file, etc. This chapter gives the syntax used in these cases. 9.1. Complete Python programs ============================= While a language specification need not prescribe how the language interpreter is invoked, it is useful to have a notion of a complete Python program. A complete Python program is executed in a minimally initialized environment: all built-in and standard modules are available, but none have been initialized, except for "sys" (various system services), "builtins" (built-in functions, exceptions and "None") and "__main__". The latter is used to provide the local and global namespace for execution of the complete program. The syntax for a complete Python program is that for file input, described in the next section. The interpreter may also be invoked in interactive mode; in this case, it does not read and execute a complete program but reads and executes one statement (possibly compound) at a time. The initial environment is identical to that of a complete program; each statement is executed in the namespace of "__main__". A complete program can be passed to the interpreter in three forms: with the "-c" *string* command line option, as a file passed as the first command line argument, or as standard input. If the file or standard input is a tty device, the interpreter enters interactive mode; otherwise, it executes the file as a complete program. 9.2. File input =============== All input read from non-interactive files has the same form: file_input ::= (NEWLINE | statement)* This syntax is used in the following situations: * when parsing a complete Python program (from a file or from a string); * when parsing a module; * when parsing a string passed to the "exec()" function; 9.3. Interactive input ====================== Input in interactive mode is parsed using the following grammar: interactive_input ::= [stmt_list] NEWLINE | compound_stmt NEWLINE Note that a (top-level) compound statement must be followed by a blank line in interactive mode; this is needed to help the parser detect the end of the input. 9.4. Expression input ===================== "eval()" is used for expression input. It ignores leading whitespace. The string argument to "eval()" must have the following form: eval_input ::= expression_list NEWLINE* 16. Appendix ************ 16.1. Interactive Mode ====================== 16.1.1. Error Handling ---------------------- When an error occurs, the interpreter prints an error message and a stack trace. In interactive mode, it then returns to the primary prompt; when input came from a file, it exits with a nonzero exit status after printing the stack trace. (Exceptions handled by an "except" clause in a "try" statement are not errors in this context.) Some errors are unconditionally fatal and cause an exit with a nonzero exit; this applies to internal inconsistencies and some cases of running out of memory. All error messages are written to the standard error stream; normal output from executed commands is written to standard output. Typing the interrupt character (usually "Control-C" or "Delete") to the primary or secondary prompt cancels the input and returns to the primary prompt. [1] Typing an interrupt while a command is executing raises the "KeyboardInterrupt" exception, which may be handled by a "try" statement. 16.1.2. Executable Python Scripts --------------------------------- On BSD’ish Unix systems, Python scripts can be made directly executable, like shell scripts, by putting the line #!/usr/bin/env python3.5 (assuming that the interpreter is on the user’s "PATH") at the beginning of the script and giving the file an executable mode. The "#!" must be the first two characters of the file. On some platforms, this first line must end with a Unix-style line ending ("'\n'"), not a Windows ("'\r\n'") line ending. Note that the hash, or pound, character, "'#'", is used to start a comment in Python. The script can be given an executable mode, or permission, using the **chmod** command. $ chmod +x myscript.py On Windows systems, there is no notion of an “executable mode”. The Python installer automatically associates ".py" files with "python.exe" so that a double-click on a Python file will run it as a script. The extension can also be ".pyw", in that case, the console window that normally appears is suppressed. 16.1.3. The Interactive Startup File ------------------------------------ When you use Python interactively, it is frequently handy to have some standard commands executed every time the interpreter is started. You can do this by setting an environment variable named "PYTHONSTARTUP" to the name of a file containing your start-up commands. This is similar to the ".profile" feature of the Unix shells. This file is only read in interactive sessions, not when Python reads commands from a script, and not when "/dev/tty" is given as the explicit source of commands (which otherwise behaves like an interactive session). It is executed in the same namespace where interactive commands are executed, so that objects that it defines or imports can be used without qualification in the interactive session. You can also change the prompts "sys.ps1" and "sys.ps2" in this file. If you want to read an additional start-up file from the current directory, you can program this in the global start-up file using code like "if os.path.isfile('.pythonrc.py'): exec(open('.pythonrc.py').read())". If you want to use the startup file in a script, you must do this explicitly in the script: import os filename = os.environ.get('PYTHONSTARTUP') if filename and os.path.isfile(filename): with open(filename) as fobj: startup_file = fobj.read() exec(startup_file) 16.1.4. The Customization Modules --------------------------------- Python provides two hooks to let you customize it: "sitecustomize" and "usercustomize". To see how it works, you need first to find the location of your user site-packages directory. Start Python and run this code: >>> import site >>> site.getusersitepackages() '/home/user/.local/lib/python3.5/site-packages' Now you can create a file named "usercustomize.py" in that directory and put anything you want in it. It will affect every invocation of Python, unless it is started with the "-s" option to disable the automatic import. "sitecustomize" works in the same way, but is typically created by an administrator of the computer in the global site-packages directory, and is imported before "usercustomize". See the documentation of the "site" module for more details. -[ Footnotes ]- [1] A problem with the GNU Readline package may prevent this. 1. Whetting Your Appetite ************************* If you do much work on computers, eventually you find that there’s some task you’d like to automate. For example, you may wish to perform a search-and-replace over a large number of text files, or rename and rearrange a bunch of photo files in a complicated way. Perhaps you’d like to write a small custom database, or a specialized GUI application, or a simple game. If you’re a professional software developer, you may have to work with several C/C++/Java libraries but find the usual write/compile/test/re- compile cycle is too slow. Perhaps you’re writing a test suite for such a library and find writing the testing code a tedious task. Or maybe you’ve written a program that could use an extension language, and you don’t want to design and implement a whole new language for your application. Python is just the language for you. You could write a Unix shell script or Windows batch files for some of these tasks, but shell scripts are best at moving around files and changing text data, not well-suited for GUI applications or games. You could write a C/C++/Java program, but it can take a lot of development time to get even a first-draft program. Python is simpler to use, available on Windows, macOS, and Unix operating systems, and will help you get the job done more quickly. Python is simple to use, but it is a real programming language, offering much more structure and support for large programs than shell scripts or batch files can offer. On the other hand, Python also offers much more error checking than C, and, being a *very-high-level language*, it has high-level data types built in, such as flexible arrays and dictionaries. Because of its more general data types Python is applicable to a much larger problem domain than Awk or even Perl, yet many things are at least as easy in Python as in those languages. Python allows you to split your program into modules that can be reused in other Python programs. It comes with a large collection of standard modules that you can use as the basis of your programs — or as examples to start learning to program in Python. Some of these modules provide things like file I/O, system calls, sockets, and even interfaces to graphical user interface toolkits like Tk. Python is an interpreted language, which can save you considerable time during program development because no compilation and linking is necessary. The interpreter can be used interactively, which makes it easy to experiment with features of the language, to write throw-away programs, or to test functions during bottom-up program development. It is also a handy desk calculator. Python enables programs to be written compactly and readably. Programs written in Python are typically much shorter than equivalent C, C++, or Java programs, for several reasons: * the high-level data types allow you to express complex operations in a single statement; * statement grouping is done by indentation instead of beginning and ending brackets; * no variable or argument declarations are necessary. Python is *extensible*: if you know how to program in C it is easy to add a new built-in function or module to the interpreter, either to perform critical operations at maximum speed, or to link Python programs to libraries that may only be available in binary form (such as a vendor-specific graphics library). Once you are really hooked, you can link the Python interpreter into an application written in C and use it as an extension or command language for that application. By the way, the language is named after the BBC show “Monty Python’s Flying Circus” and has nothing to do with reptiles. Making references to Monty Python skits in documentation is not only allowed, it is encouraged! Now that you are all excited about Python, you’ll want to examine it in some more detail. Since the best way to learn a language is to use it, the tutorial invites you to play with the Python interpreter as you read. In the next chapter, the mechanics of using the interpreter are explained. This is rather mundane information, but essential for trying out the examples shown later. The rest of the tutorial introduces various features of the Python language and system through examples, beginning with simple expressions, statements and data types, through functions and modules, and finally touching upon advanced concepts like exceptions and user- defined classes. 9. Classes ********** Classes provide a means of bundling data and functionality together. Creating a new class creates a new *type* of object, allowing new *instances* of that type to be made. Each class instance can have attributes attached to it for maintaining its state. Class instances can also have methods (defined by its class) for modifying its state. Compared with other programming languages, Python’s class mechanism adds classes with a minimum of new syntax and semantics. It is a mixture of the class mechanisms found in C++ and Modula-3. Python classes provide all the standard features of Object Oriented Programming: the class inheritance mechanism allows multiple base classes, a derived class can override any methods of its base class or classes, and a method can call the method of a base class with the same name. Objects can contain arbitrary amounts and kinds of data. As is true for modules, classes partake of the dynamic nature of Python: they are created at runtime, and can be modified further after creation. In C++ terminology, normally class members (including the data members) are *public* (except see below Private Variables), and all member functions are *virtual*. As in Modula-3, there are no shorthands for referencing the object’s members from its methods: the method function is declared with an explicit first argument representing the object, which is provided implicitly by the call. As in Smalltalk, classes themselves are objects. This provides semantics for importing and renaming. Unlike C++ and Modula-3, built-in types can be used as base classes for extension by the user. Also, like in C++, most built-in operators with special syntax (arithmetic operators, subscripting etc.) can be redefined for class instances. (Lacking universally accepted terminology to talk about classes, I will make occasional use of Smalltalk and C++ terms. I would use Modula-3 terms, since its object-oriented semantics are closer to those of Python than C++, but I expect that few readers have heard of it.) 9.1. A Word About Names and Objects =================================== Objects have individuality, and multiple names (in multiple scopes) can be bound to the same object. This is known as aliasing in other languages. This is usually not appreciated on a first glance at Python, and can be safely ignored when dealing with immutable basic types (numbers, strings, tuples). However, aliasing has a possibly surprising effect on the semantics of Python code involving mutable objects such as lists, dictionaries, and most other types. This is usually used to the benefit of the program, since aliases behave like pointers in some respects. For example, passing an object is cheap since only a pointer is passed by the implementation; and if a function modifies an object passed as an argument, the caller will see the change — this eliminates the need for two different argument passing mechanisms as in Pascal. 9.2. Python Scopes and Namespaces ================================= Before introducing classes, I first have to tell you something about Python’s scope rules. Class definitions play some neat tricks with namespaces, and you need to know how scopes and namespaces work to fully understand what’s going on. Incidentally, knowledge about this subject is useful for any advanced Python programmer. Let’s begin with some definitions. A *namespace* is a mapping from names to objects. Most namespaces are currently implemented as Python dictionaries, but that’s normally not noticeable in any way (except for performance), and it may change in the future. Examples of namespaces are: the set of built-in names (containing functions such as "abs()", and built-in exception names); the global names in a module; and the local names in a function invocation. In a sense the set of attributes of an object also form a namespace. The important thing to know about namespaces is that there is absolutely no relation between names in different namespaces; for instance, two different modules may both define a function "maximize" without confusion — users of the modules must prefix it with the module name. By the way, I use the word *attribute* for any name following a dot — for example, in the expression "z.real", "real" is an attribute of the object "z". Strictly speaking, references to names in modules are attribute references: in the expression "modname.funcname", "modname" is a module object and "funcname" is an attribute of it. In this case there happens to be a straightforward mapping between the module’s attributes and the global names defined in the module: they share the same namespace! [1] Attributes may be read-only or writable. In the latter case, assignment to attributes is possible. Module attributes are writable: you can write "modname.the_answer = 42". Writable attributes may also be deleted with the "del" statement. For example, "del modname.the_answer" will remove the attribute "the_answer" from the object named by "modname". Namespaces are created at different moments and have different lifetimes. The namespace containing the built-in names is created when the Python interpreter starts up, and is never deleted. The global namespace for a module is created when the module definition is read in; normally, module namespaces also last until the interpreter quits. The statements executed by the top-level invocation of the interpreter, either read from a script file or interactively, are considered part of a module called "__main__", so they have their own global namespace. (The built-in names actually also live in a module; this is called "builtins".) The local namespace for a function is created when the function is called, and deleted when the function returns or raises an exception that is not handled within the function. (Actually, forgetting would be a better way to describe what actually happens.) Of course, recursive invocations each have their own local namespace. A *scope* is a textual region of a Python program where a namespace is directly accessible. “Directly accessible” here means that an unqualified reference to a name attempts to find the name in the namespace. Although scopes are determined statically, they are used dynamically. At any time during execution, there are 3 or 4 nested scopes whose namespaces are directly accessible: * the innermost scope, which is searched first, contains the local names * the scopes of any enclosing functions, which are searched starting with the nearest enclosing scope, contains non-local, but also non- global names * the next-to-last scope contains the current module’s global names * the outermost scope (searched last) is the namespace containing built-in names If a name is declared global, then all references and assignments go directly to the middle scope containing the module’s global names. To rebind variables found outside of the innermost scope, the "nonlocal" statement can be used; if not declared nonlocal, those variables are read-only (an attempt to write to such a variable will simply create a *new* local variable in the innermost scope, leaving the identically named outer variable unchanged). Usually, the local scope references the local names of the (textually) current function. Outside functions, the local scope references the same namespace as the global scope: the module’s namespace. Class definitions place yet another namespace in the local scope. It is important to realize that scopes are determined textually: the global scope of a function defined in a module is that module’s namespace, no matter from where or by what alias the function is called. On the other hand, the actual search for names is done dynamically, at run time — however, the language definition is evolving towards static name resolution, at “compile” time, so don’t rely on dynamic name resolution! (In fact, local variables are already determined statically.) A special quirk of Python is that – if no "global" or "nonlocal" statement is in effect – assignments to names always go into the innermost scope. Assignments do not copy data — they just bind names to objects. The same is true for deletions: the statement "del x" removes the binding of "x" from the namespace referenced by the local scope. In fact, all operations that introduce new names use the local scope: in particular, "import" statements and function definitions bind the module or function name in the local scope. The "global" statement can be used to indicate that particular variables live in the global scope and should be rebound there; the "nonlocal" statement indicates that particular variables live in an enclosing scope and should be rebound there. 9.2.1. Scopes and Namespaces Example ------------------------------------ This is an example demonstrating how to reference the different scopes and namespaces, and how "global" and "nonlocal" affect variable binding: def scope_test(): def do_local(): spam = "local spam" def do_nonlocal(): nonlocal spam spam = "nonlocal spam" def do_global(): global spam spam = "global spam" spam = "test spam" do_local() print("After local assignment:", spam) do_nonlocal() print("After nonlocal assignment:", spam) do_global() print("After global assignment:", spam) scope_test() print("In global scope:", spam) The output of the example code is: After local assignment: test spam After nonlocal assignment: nonlocal spam After global assignment: nonlocal spam In global scope: global spam Note how the *local* assignment (which is default) didn’t change *scope_test*’s binding of *spam*. The "nonlocal" assignment changed *scope_test*’s binding of *spam*, and the "global" assignment changed the module-level binding. You can also see that there was no previous binding for *spam* before the "global" assignment. 9.3. A First Look at Classes ============================ Classes introduce a little bit of new syntax, three new object types, and some new semantics. 9.3.1. Class Definition Syntax ------------------------------ The simplest form of class definition looks like this: class ClassName: . . . Class definitions, like function definitions ("def" statements) must be executed before they have any effect. (You could conceivably place a class definition in a branch of an "if" statement, or inside a function.) In practice, the statements inside a class definition will usually be function definitions, but other statements are allowed, and sometimes useful — we’ll come back to this later. The function definitions inside a class normally have a peculiar form of argument list, dictated by the calling conventions for methods — again, this is explained later. When a class definition is entered, a new namespace is created, and used as the local scope — thus, all assignments to local variables go into this new namespace. In particular, function definitions bind the name of the new function here. When a class definition is left normally (via the end), a *class object* is created. This is basically a wrapper around the contents of the namespace created by the class definition; we’ll learn more about class objects in the next section. The original local scope (the one in effect just before the class definition was entered) is reinstated, and the class object is bound here to the class name given in the class definition header ("ClassName" in the example). 9.3.2. Class Objects -------------------- Class objects support two kinds of operations: attribute references and instantiation. *Attribute references* use the standard syntax used for all attribute references in Python: "obj.name". Valid attribute names are all the names that were in the class’s namespace when the class object was created. So, if the class definition looked like this: class MyClass: """A simple example class""" i = 12345 def f(self): return 'hello world' then "MyClass.i" and "MyClass.f" are valid attribute references, returning an integer and a function object, respectively. Class attributes can also be assigned to, so you can change the value of "MyClass.i" by assignment. "__doc__" is also a valid attribute, returning the docstring belonging to the class: ""A simple example class"". Class *instantiation* uses function notation. Just pretend that the class object is a parameterless function that returns a new instance of the class. For example (assuming the above class): x = MyClass() creates a new *instance* of the class and assigns this object to the local variable "x". The instantiation operation (“calling” a class object) creates an empty object. Many classes like to create objects with instances customized to a specific initial state. Therefore a class may define a special method named "__init__()", like this: def __init__(self): self.data = [] When a class defines an "__init__()" method, class instantiation automatically invokes "__init__()" for the newly-created class instance. So in this example, a new, initialized instance can be obtained by: x = MyClass() Of course, the "__init__()" method may have arguments for greater flexibility. In that case, arguments given to the class instantiation operator are passed on to "__init__()". For example, >>> class Complex: ... def __init__(self, realpart, imagpart): ... self.r = realpart ... self.i = imagpart ... >>> x = Complex(3.0, -4.5) >>> x.r, x.i (3.0, -4.5) 9.3.3. Instance Objects ----------------------- Now what can we do with instance objects? The only operations understood by instance objects are attribute references. There are two kinds of valid attribute names: data attributes and methods. *Data attributes* correspond to “instance variables” in Smalltalk, and to “data members” in C++. Data attributes need not be declared; like local variables, they spring into existence when they are first assigned to. For example, if "x" is the instance of "MyClass" created above, the following piece of code will print the value "16", without leaving a trace: x.counter = 1 while x.counter < 10: x.counter = x.counter * 2 print(x.counter) del x.counter The other kind of instance attribute reference is a *method*. A method is a function that “belongs to” an object. (In Python, the term method is not unique to class instances: other object types can have methods as well. For example, list objects have methods called append, insert, remove, sort, and so on. However, in the following discussion, we’ll use the term method exclusively to mean methods of class instance objects, unless explicitly stated otherwise.) Valid method names of an instance object depend on its class. By definition, all attributes of a class that are function objects define corresponding methods of its instances. So in our example, "x.f" is a valid method reference, since "MyClass.f" is a function, but "x.i" is not, since "MyClass.i" is not. But "x.f" is not the same thing as "MyClass.f" — it is a *method object*, not a function object. 9.3.4. Method Objects --------------------- Usually, a method is called right after it is bound: x.f() In the "MyClass" example, this will return the string "'hello world'". However, it is not necessary to call a method right away: "x.f" is a method object, and can be stored away and called at a later time. For example: xf = x.f while True: print(xf()) will continue to print "hello world" until the end of time. What exactly happens when a method is called? You may have noticed that "x.f()" was called without an argument above, even though the function definition for "f()" specified an argument. What happened to the argument? Surely Python raises an exception when a function that requires an argument is called without any — even if the argument isn’t actually used… Actually, you may have guessed the answer: the special thing about methods is that the instance object is passed as the first argument of the function. In our example, the call "x.f()" is exactly equivalent to "MyClass.f(x)". In general, calling a method with a list of *n* arguments is equivalent to calling the corresponding function with an argument list that is created by inserting the method’s instance object before the first argument. If you still don’t understand how methods work, a look at the implementation can perhaps clarify matters. When a non-data attribute of an instance is referenced, the instance’s class is searched. If the name denotes a valid class attribute that is a function object, a method object is created by packing (pointers to) the instance object and the function object just found together in an abstract object: this is the method object. When the method object is called with an argument list, a new argument list is constructed from the instance object and the argument list, and the function object is called with this new argument list. 9.3.5. Class and Instance Variables ----------------------------------- Generally speaking, instance variables are for data unique to each instance and class variables are for attributes and methods shared by all instances of the class: class Dog: kind = 'canine' # class variable shared by all instances def __init__(self, name): self.name = name # instance variable unique to each instance >>> d = Dog('Fido') >>> e = Dog('Buddy') >>> d.kind # shared by all dogs 'canine' >>> e.kind # shared by all dogs 'canine' >>> d.name # unique to d 'Fido' >>> e.name # unique to e 'Buddy' As discussed in A Word About Names and Objects, shared data can have possibly surprising effects with involving *mutable* objects such as lists and dictionaries. For example, the *tricks* list in the following code should not be used as a class variable because just a single list would be shared by all *Dog* instances: class Dog: tricks = [] # mistaken use of a class variable def __init__(self, name): self.name = name def add_trick(self, trick): self.tricks.append(trick) >>> d = Dog('Fido') >>> e = Dog('Buddy') >>> d.add_trick('roll over') >>> e.add_trick('play dead') >>> d.tricks # unexpectedly shared by all dogs ['roll over', 'play dead'] Correct design of the class should use an instance variable instead: class Dog: def __init__(self, name): self.name = name self.tricks = [] # creates a new empty list for each dog def add_trick(self, trick): self.tricks.append(trick) >>> d = Dog('Fido') >>> e = Dog('Buddy') >>> d.add_trick('roll over') >>> e.add_trick('play dead') >>> d.tricks ['roll over'] >>> e.tricks ['play dead'] 9.4. Random Remarks =================== If the same attribute name occurs in both an instance and in a class, then attribute lookup prioritizes the instance: >>> class Warehouse: purpose = 'storage' region = 'west' >>> w1 = Warehouse() >>> print(w1.purpose, w1.region) storage west >>> w2 = Warehouse() >>> w2.region = 'east' >>> print(w2.purpose, w2.region) storage east Data attributes may be referenced by methods as well as by ordinary users (“clients”) of an object. In other words, classes are not usable to implement pure abstract data types. In fact, nothing in Python makes it possible to enforce data hiding — it is all based upon convention. (On the other hand, the Python implementation, written in C, can completely hide implementation details and control access to an object if necessary; this can be used by extensions to Python written in C.) Clients should use data attributes with care — clients may mess up invariants maintained by the methods by stamping on their data attributes. Note that clients may add data attributes of their own to an instance object without affecting the validity of the methods, as long as name conflicts are avoided — again, a naming convention can save a lot of headaches here. There is no shorthand for referencing data attributes (or other methods!) from within methods. I find that this actually increases the readability of methods: there is no chance of confusing local variables and instance variables when glancing through a method. Often, the first argument of a method is called "self". This is nothing more than a convention: the name "self" has absolutely no special meaning to Python. Note, however, that by not following the convention your code may be less readable to other Python programmers, and it is also conceivable that a *class browser* program might be written that relies upon such a convention. Any function object that is a class attribute defines a method for instances of that class. It is not necessary that the function definition is textually enclosed in the class definition: assigning a function object to a local variable in the class is also ok. For example: # Function defined outside the class def f1(self, x, y): return min(x, x+y) class C: f = f1 def g(self): return 'hello world' h = g Now "f", "g" and "h" are all attributes of class "C" that refer to function objects, and consequently they are all methods of instances of "C" — "h" being exactly equivalent to "g". Note that this practice usually only serves to confuse the reader of a program. Methods may call other methods by using method attributes of the "self" argument: class Bag: def __init__(self): self.data = [] def add(self, x): self.data.append(x) def addtwice(self, x): self.add(x) self.add(x) Methods may reference global names in the same way as ordinary functions. The global scope associated with a method is the module containing its definition. (A class is never used as a global scope.) While one rarely encounters a good reason for using global data in a method, there are many legitimate uses of the global scope: for one thing, functions and modules imported into the global scope can be used by methods, as well as functions and classes defined in it. Usually, the class containing the method is itself defined in this global scope, and in the next section we’ll find some good reasons why a method would want to reference its own class. Each value is an object, and therefore has a *class* (also called its *type*). It is stored as "object.__class__". 9.5. Inheritance ================ Of course, a language feature would not be worthy of the name “class” without supporting inheritance. The syntax for a derived class definition looks like this: class DerivedClassName(BaseClassName): . . . The name "BaseClassName" must be defined in a scope containing the derived class definition. In place of a base class name, other arbitrary expressions are also allowed. This can be useful, for example, when the base class is defined in another module: class DerivedClassName(modname.BaseClassName): Execution of a derived class definition proceeds the same as for a base class. When the class object is constructed, the base class is remembered. This is used for resolving attribute references: if a requested attribute is not found in the class, the search proceeds to look in the base class. This rule is applied recursively if the base class itself is derived from some other class. There’s nothing special about instantiation of derived classes: "DerivedClassName()" creates a new instance of the class. Method references are resolved as follows: the corresponding class attribute is searched, descending down the chain of base classes if necessary, and the method reference is valid if this yields a function object. Derived classes may override methods of their base classes. Because methods have no special privileges when calling other methods of the same object, a method of a base class that calls another method defined in the same base class may end up calling a method of a derived class that overrides it. (For C++ programmers: all methods in Python are effectively "virtual".) An overriding method in a derived class may in fact want to extend rather than simply replace the base class method of the same name. There is a simple way to call the base class method directly: just call "BaseClassName.methodname(self, arguments)". This is occasionally useful to clients as well. (Note that this only works if the base class is accessible as "BaseClassName" in the global scope.) Python has two built-in functions that work with inheritance: * Use "isinstance()" to check an instance’s type: "isinstance(obj, int)" will be "True" only if "obj.__class__" is "int" or some class derived from "int". * Use "issubclass()" to check class inheritance: "issubclass(bool, int)" is "True" since "bool" is a subclass of "int". However, "issubclass(float, int)" is "False" since "float" is not a subclass of "int". 9.5.1. Multiple Inheritance --------------------------- Python supports a form of multiple inheritance as well. A class definition with multiple base classes looks like this: class DerivedClassName(Base1, Base2, Base3): . . . For most purposes, in the simplest cases, you can think of the search for attributes inherited from a parent class as depth-first, left-to- right, not searching twice in the same class where there is an overlap in the hierarchy. Thus, if an attribute is not found in "DerivedClassName", it is searched for in "Base1", then (recursively) in the base classes of "Base1", and if it was not found there, it was searched for in "Base2", and so on. In fact, it is slightly more complex than that; the method resolution order changes dynamically to support cooperative calls to "super()". This approach is known in some other multiple-inheritance languages as call-next-method and is more powerful than the super call found in single-inheritance languages. Dynamic ordering is necessary because all cases of multiple inheritance exhibit one or more diamond relationships (where at least one of the parent classes can be accessed through multiple paths from the bottommost class). For example, all classes inherit from "object", so any case of multiple inheritance provides more than one path to reach "object". To keep the base classes from being accessed more than once, the dynamic algorithm linearizes the search order in a way that preserves the left-to-right ordering specified in each class, that calls each parent only once, and that is monotonic (meaning that a class can be subclassed without affecting the precedence order of its parents). Taken together, these properties make it possible to design reliable and extensible classes with multiple inheritance. For more detail, see https://www.python.org/download/releases/2.3/mro/. 9.6. Private Variables ====================== “Private” instance variables that cannot be accessed except from inside an object don’t exist in Python. However, there is a convention that is followed by most Python code: a name prefixed with an underscore (e.g. "_spam") should be treated as a non-public part of the API (whether it is a function, a method or a data member). It should be considered an implementation detail and subject to change without notice. Since there is a valid use-case for class-private members (namely to avoid name clashes of names with names defined by subclasses), there is limited support for such a mechanism, called *name mangling*. Any identifier of the form "__spam" (at least two leading underscores, at most one trailing underscore) is textually replaced with "_classname__spam", where "classname" is the current class name with leading underscore(s) stripped. This mangling is done without regard to the syntactic position of the identifier, as long as it occurs within the definition of a class. Name mangling is helpful for letting subclasses override methods without breaking intraclass method calls. For example: class Mapping: def __init__(self, iterable): self.items_list = [] self.__update(iterable) def update(self, iterable): for item in iterable: self.items_list.append(item) __update = update # private copy of original update() method class MappingSubclass(Mapping): def update(self, keys, values): # provides new signature for update() # but does not break __init__() for item in zip(keys, values): self.items_list.append(item) The above example would work even if "MappingSubclass" were to introduce a "__update" identifier since it is replaced with "_Mapping__update" in the "Mapping" class and "_MappingSubclass__update" in the "MappingSubclass" class respectively. Note that the mangling rules are designed mostly to avoid accidents; it still is possible to access or modify a variable that is considered private. This can even be useful in special circumstances, such as in the debugger. Notice that code passed to "exec()" or "eval()" does not consider the classname of the invoking class to be the current class; this is similar to the effect of the "global" statement, the effect of which is likewise restricted to code that is byte-compiled together. The same restriction applies to "getattr()", "setattr()" and "delattr()", as well as when referencing "__dict__" directly. 9.7. Odds and Ends ================== Sometimes it is useful to have a data type similar to the Pascal “record” or C “struct”, bundling together a few named data items. An empty class definition will do nicely: class Employee: pass john = Employee() # Create an empty employee record # Fill the fields of the record john.name = 'John Doe' john.dept = 'computer lab' john.salary = 1000 A piece of Python code that expects a particular abstract data type can often be passed a class that emulates the methods of that data type instead. For instance, if you have a function that formats some data from a file object, you can define a class with methods "read()" and "readline()" that get the data from a string buffer instead, and pass it as an argument. Instance method objects have attributes, too: "m.__self__" is the instance object with the method "m()", and "m.__func__" is the function object corresponding to the method. 9.8. Iterators ============== By now you have probably noticed that most container objects can be looped over using a "for" statement: for element in [1, 2, 3]: print(element) for element in (1, 2, 3): print(element) for key in {'one':1, 'two':2}: print(key) for char in "123": print(char) for line in open("myfile.txt"): print(line, end='') This style of access is clear, concise, and convenient. The use of iterators pervades and unifies Python. Behind the scenes, the "for" statement calls "iter()" on the container object. The function returns an iterator object that defines the method "__next__()" which accesses elements in the container one at a time. When there are no more elements, "__next__()" raises a "StopIteration" exception which tells the "for" loop to terminate. You can call the "__next__()" method using the "next()" built-in function; this example shows how it all works: >>> s = 'abc' >>> it = iter(s) >>> it >>> next(it) 'a' >>> next(it) 'b' >>> next(it) 'c' >>> next(it) Traceback (most recent call last): File "", line 1, in next(it) StopIteration Having seen the mechanics behind the iterator protocol, it is easy to add iterator behavior to your classes. Define an "__iter__()" method which returns an object with a "__next__()" method. If the class defines "__next__()", then "__iter__()" can just return "self": class Reverse: """Iterator for looping over a sequence backwards.""" def __init__(self, data): self.data = data self.index = len(data) def __iter__(self): return self def __next__(self): if self.index == 0: raise StopIteration self.index = self.index - 1 return self.data[self.index] >>> rev = Reverse('spam') >>> iter(rev) <__main__.Reverse object at 0x00A1DB50> >>> for char in rev: ... print(char) ... m a p s 9.9. Generators =============== *Generators* are a simple and powerful tool for creating iterators. They are written like regular functions but use the "yield" statement whenever they want to return data. Each time "next()" is called on it, the generator resumes where it left off (it remembers all the data values and which statement was last executed). An example shows that generators can be trivially easy to create: def reverse(data): for index in range(len(data)-1, -1, -1): yield data[index] >>> for char in reverse('golf'): ... print(char) ... f l o g Anything that can be done with generators can also be done with class- based iterators as described in the previous section. What makes generators so compact is that the "__iter__()" and "__next__()" methods are created automatically. Another key feature is that the local variables and execution state are automatically saved between calls. This made the function easier to write and much more clear than an approach using instance variables like "self.index" and "self.data". In addition to automatic method creation and saving program state, when generators terminate, they automatically raise "StopIteration". In combination, these features make it easy to create iterators with no more effort than writing a regular function. 9.10. Generator Expressions =========================== Some simple generators can be coded succinctly as expressions using a syntax similar to list comprehensions but with parentheses instead of square brackets. These expressions are designed for situations where the generator is used right away by an enclosing function. Generator expressions are more compact but less versatile than full generator definitions and tend to be more memory friendly than equivalent list comprehensions. Examples: >>> sum(i*i for i in range(10)) # sum of squares 285 >>> xvec = [10, 20, 30] >>> yvec = [7, 5, 3] >>> sum(x*y for x,y in zip(xvec, yvec)) # dot product 260 >>> unique_words = set(word for line in page for word in line.split()) >>> valedictorian = max((student.gpa, student.name) for student in graduates) >>> data = 'golf' >>> list(data[i] for i in range(len(data)-1, -1, -1)) ['f', 'l', 'o', 'g'] -[ Footnotes ]- [1] Except for one thing. Module objects have a secret read-only attribute called "__dict__" which returns the dictionary used to implement the module’s namespace; the name "__dict__" is an attribute but not a global name. Obviously, using this violates the abstraction of namespace implementation, and should be restricted to things like post-mortem debuggers. 4. More Control Flow Tools ************************** Besides the "while" statement just introduced, Python uses the usual flow control statements known from other languages, with some twists. 4.1. "if" Statements ==================== Perhaps the most well-known statement type is the "if" statement. For example: >>> x = int(input("Please enter an integer: ")) Please enter an integer: 42 >>> if x < 0: ... x = 0 ... print('Negative changed to zero') ... elif x == 0: ... print('Zero') ... elif x == 1: ... print('Single') ... else: ... print('More') ... More There can be zero or more "elif" parts, and the "else" part is optional. The keyword ‘"elif"’ is short for ‘else if’, and is useful to avoid excessive indentation. An "if" … "elif" … "elif" … sequence is a substitute for the "switch" or "case" statements found in other languages. 4.2. "for" Statements ===================== The "for" statement in Python differs a bit from what you may be used to in C or Pascal. Rather than always iterating over an arithmetic progression of numbers (like in Pascal), or giving the user the ability to define both the iteration step and halting condition (as C), Python’s "for" statement iterates over the items of any sequence (a list or a string), in the order that they appear in the sequence. For example (no pun intended): >>> # Measure some strings: ... words = ['cat', 'window', 'defenestrate'] >>> for w in words: ... print(w, len(w)) ... cat 3 window 6 defenestrate 12 Code that modifies a collection while iterating over that same collection can be tricky to get right. Instead, it is usually more straight-forward to loop over a copy of the collection or to create a new collection: # Strategy: Iterate over a copy for user, status in users.copy().items(): if status == 'inactive': del users[user] # Strategy: Create a new collection active_users = {} for user, status in users.items(): if status == 'active': active_users[user] = status 4.3. The "range()" Function =========================== If you do need to iterate over a sequence of numbers, the built-in function "range()" comes in handy. It generates arithmetic progressions: >>> for i in range(5): ... print(i) ... 0 1 2 3 4 The given end point is never part of the generated sequence; "range(10)" generates 10 values, the legal indices for items of a sequence of length 10. It is possible to let the range start at another number, or to specify a different increment (even negative; sometimes this is called the ‘step’): >>> list(range(5, 10)) [5, 6, 7, 8, 9] >>> list(range(0, 10, 3)) [0, 3, 6, 9] >>> list(range(-10, -100, -30)) [-10, -40, -70] To iterate over the indices of a sequence, you can combine "range()" and "len()" as follows: >>> a = ['Mary', 'had', 'a', 'little', 'lamb'] >>> for i in range(len(a)): ... print(i, a[i]) ... 0 Mary 1 had 2 a 3 little 4 lamb In most such cases, however, it is convenient to use the "enumerate()" function, see Looping Techniques. A strange thing happens if you just print a range: >>> range(10) range(0, 10) In many ways the object returned by "range()" behaves as if it is a list, but in fact it isn’t. It is an object which returns the successive items of the desired sequence when you iterate over it, but it doesn’t really make the list, thus saving space. We say such an object is *iterable*, that is, suitable as a target for functions and constructs that expect something from which they can obtain successive items until the supply is exhausted. We have seen that the "for" statement is such a construct, while an example of a function that takes an iterable is "sum()": >>> sum(range(4)) # 0 + 1 + 2 + 3 6 Later we will see more functions that return iterables and take iterables as arguments. In chapter Data Structures, we will discuss in more detail about "list()". 4.4. "break" and "continue" Statements, and "else" Clauses on Loops =================================================================== The "break" statement, like in C, breaks out of the innermost enclosing "for" or "while" loop. Loop statements may have an "else" clause; it is executed when the loop terminates through exhaustion of the iterable (with "for") or when the condition becomes false (with "while"), but not when the loop is terminated by a "break" statement. This is exemplified by the following loop, which searches for prime numbers: >>> for n in range(2, 10): ... for x in range(2, n): ... if n % x == 0: ... print(n, 'equals', x, '*', n//x) ... break ... else: ... # loop fell through without finding a factor ... print(n, 'is a prime number') ... 2 is a prime number 3 is a prime number 4 equals 2 * 2 5 is a prime number 6 equals 2 * 3 7 is a prime number 8 equals 2 * 4 9 equals 3 * 3 (Yes, this is the correct code. Look closely: the "else" clause belongs to the "for" loop, **not** the "if" statement.) When used with a loop, the "else" clause has more in common with the "else" clause of a "try" statement than it does with that of "if" statements: a "try" statement’s "else" clause runs when no exception occurs, and a loop’s "else" clause runs when no "break" occurs. For more on the "try" statement and exceptions, see Handling Exceptions. The "continue" statement, also borrowed from C, continues with the next iteration of the loop: >>> for num in range(2, 10): ... if num % 2 == 0: ... print("Found an even number", num) ... continue ... print("Found an odd number", num) ... Found an even number 2 Found an odd number 3 Found an even number 4 Found an odd number 5 Found an even number 6 Found an odd number 7 Found an even number 8 Found an odd number 9 4.5. "pass" Statements ====================== The "pass" statement does nothing. It can be used when a statement is required syntactically but the program requires no action. For example: >>> while True: ... pass # Busy-wait for keyboard interrupt (Ctrl+C) ... This is commonly used for creating minimal classes: >>> class MyEmptyClass: ... pass ... Another place "pass" can be used is as a place-holder for a function or conditional body when you are working on new code, allowing you to keep thinking at a more abstract level. The "pass" is silently ignored: >>> def initlog(*args): ... pass # Remember to implement this! ... 4.6. Defining Functions ======================= We can create a function that writes the Fibonacci series to an arbitrary boundary: >>> def fib(n): # write Fibonacci series up to n ... """Print a Fibonacci series up to n.""" ... a, b = 0, 1 ... while a < n: ... print(a, end=' ') ... a, b = b, a+b ... print() ... >>> # Now call the function we just defined: ... fib(2000) 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 1597 The keyword "def" introduces a function *definition*. It must be followed by the function name and the parenthesized list of formal parameters. The statements that form the body of the function start at the next line, and must be indented. The first statement of the function body can optionally be a string literal; this string literal is the function’s documentation string, or *docstring*. (More about docstrings can be found in the section Documentation Strings.) There are tools which use docstrings to automatically produce online or printed documentation, or to let the user interactively browse through code; it’s good practice to include docstrings in code that you write, so make a habit of it. The *execution* of a function introduces a new symbol table used for the local variables of the function. More precisely, all variable assignments in a function store the value in the local symbol table; whereas variable references first look in the local symbol table, then in the local symbol tables of enclosing functions, then in the global symbol table, and finally in the table of built-in names. Thus, global variables and variables of enclosing functions cannot be directly assigned a value within a function (unless, for global variables, named in a "global" statement, or, for variables of enclosing functions, named in a "nonlocal" statement), although they may be referenced. The actual parameters (arguments) to a function call are introduced in the local symbol table of the called function when it is called; thus, arguments are passed using *call by value* (where the *value* is always an object *reference*, not the value of the object). [1] When a function calls another function, or calls itself recursively, a new local symbol table is created for that call. A function definition associates the function name with the function object in the current symbol table. The interpreter recognizes the object pointed to by that name as a user-defined function. Other names can also point to that same function object and can also be used to access the function: >>> fib >>> f = fib >>> f(100) 0 1 1 2 3 5 8 13 21 34 55 89 Coming from other languages, you might object that "fib" is not a function but a procedure since it doesn’t return a value. In fact, even functions without a "return" statement do return a value, albeit a rather boring one. This value is called "None" (it’s a built-in name). Writing the value "None" is normally suppressed by the interpreter if it would be the only value written. You can see it if you really want to using "print()": >>> fib(0) >>> print(fib(0)) None It is simple to write a function that returns a list of the numbers of the Fibonacci series, instead of printing it: >>> def fib2(n): # return Fibonacci series up to n ... """Return a list containing the Fibonacci series up to n.""" ... result = [] ... a, b = 0, 1 ... while a < n: ... result.append(a) # see below ... a, b = b, a+b ... return result ... >>> f100 = fib2(100) # call it >>> f100 # write the result [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89] This example, as usual, demonstrates some new Python features: * The "return" statement returns with a value from a function. "return" without an expression argument returns "None". Falling off the end of a function also returns "None". * The statement "result.append(a)" calls a *method* of the list object "result". A method is a function that ‘belongs’ to an object and is named "obj.methodname", where "obj" is some object (this may be an expression), and "methodname" is the name of a method that is defined by the object’s type. Different types define different methods. Methods of different types may have the same name without causing ambiguity. (It is possible to define your own object types and methods, using *classes*, see Classes) The method "append()" shown in the example is defined for list objects; it adds a new element at the end of the list. In this example it is equivalent to "result = result + [a]", but more efficient. 4.7. More on Defining Functions =============================== It is also possible to define functions with a variable number of arguments. There are three forms, which can be combined. 4.7.1. Default Argument Values ------------------------------ The most useful form is to specify a default value for one or more arguments. This creates a function that can be called with fewer arguments than it is defined to allow. For example: def ask_ok(prompt, retries=4, reminder='Please try again!'): while True: ok = input(prompt) if ok in ('y', 'ye', 'yes'): return True if ok in ('n', 'no', 'nop', 'nope'): return False retries = retries - 1 if retries < 0: raise ValueError('invalid user response') print(reminder) This function can be called in several ways: * giving only the mandatory argument: "ask_ok('Do you really want to quit?')" * giving one of the optional arguments: "ask_ok('OK to overwrite the file?', 2)" * or even giving all arguments: "ask_ok('OK to overwrite the file?', 2, 'Come on, only yes or no!')" This example also introduces the "in" keyword. This tests whether or not a sequence contains a certain value. The default values are evaluated at the point of function definition in the *defining* scope, so that i = 5 def f(arg=i): print(arg) i = 6 f() will print "5". **Important warning:** The default value is evaluated only once. This makes a difference when the default is a mutable object such as a list, dictionary, or instances of most classes. For example, the following function accumulates the arguments passed to it on subsequent calls: def f(a, L=[]): L.append(a) return L print(f(1)) print(f(2)) print(f(3)) This will print [1] [1, 2] [1, 2, 3] If you don’t want the default to be shared between subsequent calls, you can write the function like this instead: def f(a, L=None): if L is None: L = [] L.append(a) return L 4.7.2. Keyword Arguments ------------------------ Functions can also be called using *keyword arguments* of the form "kwarg=value". For instance, the following function: def parrot(voltage, state='a stiff', action='voom', type='Norwegian Blue'): print("-- This parrot wouldn't", action, end=' ') print("if you put", voltage, "volts through it.") print("-- Lovely plumage, the", type) print("-- It's", state, "!") accepts one required argument ("voltage") and three optional arguments ("state", "action", and "type"). This function can be called in any of the following ways: parrot(1000) # 1 positional argument parrot(voltage=1000) # 1 keyword argument parrot(voltage=1000000, action='VOOOOOM') # 2 keyword arguments parrot(action='VOOOOOM', voltage=1000000) # 2 keyword arguments parrot('a million', 'bereft of life', 'jump') # 3 positional arguments parrot('a thousand', state='pushing up the daisies') # 1 positional, 1 keyword but all the following calls would be invalid: parrot() # required argument missing parrot(voltage=5.0, 'dead') # non-keyword argument after a keyword argument parrot(110, voltage=220) # duplicate value for the same argument parrot(actor='John Cleese') # unknown keyword argument In a function call, keyword arguments must follow positional arguments. All the keyword arguments passed must match one of the arguments accepted by the function (e.g. "actor" is not a valid argument for the "parrot" function), and their order is not important. This also includes non-optional arguments (e.g. "parrot(voltage=1000)" is valid too). No argument may receive a value more than once. Here’s an example that fails due to this restriction: >>> def function(a): ... pass ... >>> function(0, a=0) Traceback (most recent call last): File "", line 1, in TypeError: function() got multiple values for argument 'a' When a final formal parameter of the form "**name" is present, it receives a dictionary (see Mapping Types — dict) containing all keyword arguments except for those corresponding to a formal parameter. This may be combined with a formal parameter of the form "*name" (described in the next subsection) which receives a tuple containing the positional arguments beyond the formal parameter list. ("*name" must occur before "**name".) For example, if we define a function like this: def cheeseshop(kind, *arguments, **keywords): print("-- Do you have any", kind, "?") print("-- I'm sorry, we're all out of", kind) for arg in arguments: print(arg) print("-" * 40) for kw in keywords: print(kw, ":", keywords[kw]) It could be called like this: cheeseshop("Limburger", "It's very runny, sir.", "It's really very, VERY runny, sir.", shopkeeper="Michael Palin", client="John Cleese", sketch="Cheese Shop Sketch") and of course it would print: -- Do you have any Limburger ? -- I'm sorry, we're all out of Limburger It's very runny, sir. It's really very, VERY runny, sir. ---------------------------------------- shopkeeper : Michael Palin client : John Cleese sketch : Cheese Shop Sketch Note that the order in which the keyword arguments are printed is guaranteed to match the order in which they were provided in the function call. 4.7.3. Special parameters ------------------------- By default, arguments may be passed to a Python function either by position or explicitly by keyword. For readability and performance, it makes sense to restrict the way arguments can be passed so that a developer need only look at the function definition to determine if items are passed by position, by position or keyword, or by keyword. A function definition may look like: def f(pos1, pos2, /, pos_or_kwd, *, kwd1, kwd2): ----------- ---------- ---------- | | | | Positional or keyword | | - Keyword only -- Positional only where "/" and "*" are optional. If used, these symbols indicate the kind of parameter by how the arguments may be passed to the function: positional-only, positional-or-keyword, and keyword-only. Keyword parameters are also referred to as named parameters. 4.7.3.1. Positional-or-Keyword Arguments ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ If "/" and "*" are not present in the function definition, arguments may be passed to a function by position or by keyword. 4.7.3.2. Positional-Only Parameters ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Looking at this in a bit more detail, it is possible to mark certain parameters as *positional-only*. If *positional-only*, the parameters’ order matters, and the parameters cannot be passed by keyword. Positional-only parameters are placed before a "/" (forward-slash). The "/" is used to logically separate the positional-only parameters from the rest of the parameters. If there is no "/" in the function definition, there are no positional-only parameters. Parameters following the "/" may be *positional-or-keyword* or *keyword-only*. 4.7.3.3. Keyword-Only Arguments ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ To mark parameters as *keyword-only*, indicating the parameters must be passed by keyword argument, place an "*" in the arguments list just before the first *keyword-only* parameter. 4.7.3.4. Function Examples ~~~~~~~~~~~~~~~~~~~~~~~~~~ Consider the following example function definitions paying close attention to the markers "/" and "*": >>> def standard_arg(arg): ... print(arg) ... >>> def pos_only_arg(arg, /): ... print(arg) ... >>> def kwd_only_arg(*, arg): ... print(arg) ... >>> def combined_example(pos_only, /, standard, *, kwd_only): ... print(pos_only, standard, kwd_only) The first function definition, "standard_arg", the most familiar form, places no restrictions on the calling convention and arguments may be passed by position or keyword: >>> standard_arg(2) 2 >>> standard_arg(arg=2) 2 The second function "pos_only_arg" is restricted to only use positional parameters as there is a "/" in the function definition: >>> pos_only_arg(1) 1 >>> pos_only_arg(arg=1) Traceback (most recent call last): File "", line 1, in TypeError: pos_only_arg() got some positional-only arguments passed as keyword arguments: 'arg' The third function "kwd_only_args" only allows keyword arguments as indicated by a "*" in the function definition: >>> kwd_only_arg(3) Traceback (most recent call last): File "", line 1, in TypeError: kwd_only_arg() takes 0 positional arguments but 1 was given >>> kwd_only_arg(arg=3) 3 And the last uses all three calling conventions in the same function definition: >>> combined_example(1, 2, 3) Traceback (most recent call last): File "", line 1, in TypeError: combined_example() takes 2 positional arguments but 3 were given >>> combined_example(1, 2, kwd_only=3) 1 2 3 >>> combined_example(1, standard=2, kwd_only=3) 1 2 3 >>> combined_example(pos_only=1, standard=2, kwd_only=3) Traceback (most recent call last): File "", line 1, in TypeError: combined_example() got some positional-only arguments passed as keyword arguments: 'pos_only' Finally, consider this function definition which has a potential collision between the positional argument "name" and "**kwds" which has "name" as a key: def foo(name, **kwds): return 'name' in kwds There is no possible call that will make it return "True" as the keyword "'name'" will always bind to the first parameter. For example: >>> foo(1, **{'name': 2}) Traceback (most recent call last): File "", line 1, in TypeError: foo() got multiple values for argument 'name' >>> But using "/" (positional only arguments), it is possible since it allows "name" as a positional argument and "'name'" as a key in the keyword arguments: def foo(name, /, **kwds): return 'name' in kwds >>> foo(1, **{'name': 2}) True In other words, the names of positional-only parameters can be used in "**kwds" without ambiguity. 4.7.3.5. Recap ~~~~~~~~~~~~~~ The use case will determine which parameters to use in the function definition: def f(pos1, pos2, /, pos_or_kwd, *, kwd1, kwd2): As guidance: * Use positional-only if you want the name of the parameters to not be available to the user. This is useful when parameter names have no real meaning, if you want to enforce the order of the arguments when the function is called or if you need to take some positional parameters and arbitrary keywords. * Use keyword-only when names have meaning and the function definition is more understandable by being explicit with names or you want to prevent users relying on the position of the argument being passed. * For an API, use positional-only to prevent breaking API changes if the parameter’s name is modified in the future. 4.7.4. Arbitrary Argument Lists ------------------------------- Finally, the least frequently used option is to specify that a function can be called with an arbitrary number of arguments. These arguments will be wrapped up in a tuple (see Tuples and Sequences). Before the variable number of arguments, zero or more normal arguments may occur. def write_multiple_items(file, separator, *args): file.write(separator.join(args)) Normally, these *variadic* arguments will be last in the list of formal parameters, because they scoop up all remaining input arguments that are passed to the function. Any formal parameters which occur after the "*args" parameter are ‘keyword-only’ arguments, meaning that they can only be used as keywords rather than positional arguments. >>> def concat(*args, sep="/"): ... return sep.join(args) ... >>> concat("earth", "mars", "venus") 'earth/mars/venus' >>> concat("earth", "mars", "venus", sep=".") 'earth.mars.venus' 4.7.5. Unpacking Argument Lists ------------------------------- The reverse situation occurs when the arguments are already in a list or tuple but need to be unpacked for a function call requiring separate positional arguments. For instance, the built-in "range()" function expects separate *start* and *stop* arguments. If they are not available separately, write the function call with the "*"-operator to unpack the arguments out of a list or tuple: >>> list(range(3, 6)) # normal call with separate arguments [3, 4, 5] >>> args = [3, 6] >>> list(range(*args)) # call with arguments unpacked from a list [3, 4, 5] In the same fashion, dictionaries can deliver keyword arguments with the "**"-operator: >>> def parrot(voltage, state='a stiff', action='voom'): ... print("-- This parrot wouldn't", action, end=' ') ... print("if you put", voltage, "volts through it.", end=' ') ... print("E's", state, "!") ... >>> d = {"voltage": "four million", "state": "bleedin' demised", "action": "VOOM"} >>> parrot(**d) -- This parrot wouldn't VOOM if you put four million volts through it. E's bleedin' demised ! 4.7.6. Lambda Expressions ------------------------- Small anonymous functions can be created with the "lambda" keyword. This function returns the sum of its two arguments: "lambda a, b: a+b". Lambda functions can be used wherever function objects are required. They are syntactically restricted to a single expression. Semantically, they are just syntactic sugar for a normal function definition. Like nested function definitions, lambda functions can reference variables from the containing scope: >>> def make_incrementor(n): ... return lambda x: x + n ... >>> f = make_incrementor(42) >>> f(0) 42 >>> f(1) 43 The above example uses a lambda expression to return a function. Another use is to pass a small function as an argument: >>> pairs = [(1, 'one'), (2, 'two'), (3, 'three'), (4, 'four')] >>> pairs.sort(key=lambda pair: pair[1]) >>> pairs [(4, 'four'), (1, 'one'), (3, 'three'), (2, 'two')] 4.7.7. Documentation Strings ---------------------------- Here are some conventions about the content and formatting of documentation strings. The first line should always be a short, concise summary of the object’s purpose. For brevity, it should not explicitly state the object’s name or type, since these are available by other means (except if the name happens to be a verb describing a function’s operation). This line should begin with a capital letter and end with a period. If there are more lines in the documentation string, the second line should be blank, visually separating the summary from the rest of the description. The following lines should be one or more paragraphs describing the object’s calling conventions, its side effects, etc. The Python parser does not strip indentation from multi-line string literals in Python, so tools that process documentation have to strip indentation if desired. This is done using the following convention. The first non-blank line *after* the first line of the string determines the amount of indentation for the entire documentation string. (We can’t use the first line since it is generally adjacent to the string’s opening quotes so its indentation is not apparent in the string literal.) Whitespace “equivalent” to this indentation is then stripped from the start of all lines of the string. Lines that are indented less should not occur, but if they occur all their leading whitespace should be stripped. Equivalence of whitespace should be tested after expansion of tabs (to 8 spaces, normally). Here is an example of a multi-line docstring: >>> def my_function(): ... """Do nothing, but document it. ... ... No, really, it doesn't do anything. ... """ ... pass ... >>> print(my_function.__doc__) Do nothing, but document it. No, really, it doesn't do anything. 4.7.8. Function Annotations --------------------------- Function annotations are completely optional metadata information about the types used by user-defined functions (see **PEP 3107** and **PEP 484** for more information). *Annotations* are stored in the "__annotations__" attribute of the function as a dictionary and have no effect on any other part of the function. Parameter annotations are defined by a colon after the parameter name, followed by an expression evaluating to the value of the annotation. Return annotations are defined by a literal "->", followed by an expression, between the parameter list and the colon denoting the end of the "def" statement. The following example has a required argument, an optional argument, and the return value annotated: >>> def f(ham: str, eggs: str = 'eggs') -> str: ... print("Annotations:", f.__annotations__) ... print("Arguments:", ham, eggs) ... return ham + ' and ' + eggs ... >>> f('spam') Annotations: {'ham': , 'return': , 'eggs': } Arguments: spam eggs 'spam and eggs' 4.8. Intermezzo: Coding Style ============================= Now that you are about to write longer, more complex pieces of Python, it is a good time to talk about *coding style*. Most languages can be written (or more concise, *formatted*) in different styles; some are more readable than others. Making it easy for others to read your code is always a good idea, and adopting a nice coding style helps tremendously for that. For Python, **PEP 8** has emerged as the style guide that most projects adhere to; it promotes a very readable and eye-pleasing coding style. Every Python developer should read it at some point; here are the most important points extracted for you: * Use 4-space indentation, and no tabs. 4 spaces are a good compromise between small indentation (allows greater nesting depth) and large indentation (easier to read). Tabs introduce confusion, and are best left out. * Wrap lines so that they don’t exceed 79 characters. This helps users with small displays and makes it possible to have several code files side-by-side on larger displays. * Use blank lines to separate functions and classes, and larger blocks of code inside functions. * When possible, put comments on a line of their own. * Use docstrings. * Use spaces around operators and after commas, but not directly inside bracketing constructs: "a = f(1, 2) + g(3, 4)". * Name your classes and functions consistently; the convention is to use "UpperCamelCase" for classes and "lowercase_with_underscores" for functions and methods. Always use "self" as the name for the first method argument (see A First Look at Classes for more on classes and methods). * Don’t use fancy encodings if your code is meant to be used in international environments. Python’s default, UTF-8, or even plain ASCII work best in any case. * Likewise, don’t use non-ASCII characters in identifiers if there is only the slightest chance people speaking a different language will read or maintain the code. -[ Footnotes ]- [1] Actually, *call by object reference* would be a better description, since if a mutable object is passed, the caller will see any changes the callee makes to it (items inserted into a list). 5. Data Structures ****************** This chapter describes some things you’ve learned about already in more detail, and adds some new things as well. 5.1. More on Lists ================== The list data type has some more methods. Here are all of the methods of list objects: list.append(x) Add an item to the end of the list. Equivalent to "a[len(a):] = [x]". list.extend(iterable) Extend the list by appending all the items from the iterable. Equivalent to "a[len(a):] = iterable". list.insert(i, x) Insert an item at a given position. The first argument is the index of the element before which to insert, so "a.insert(0, x)" inserts at the front of the list, and "a.insert(len(a), x)" is equivalent to "a.append(x)". list.remove(x) Remove the first item from the list whose value is equal to *x*. It raises a "ValueError" if there is no such item. list.pop([i]) Remove the item at the given position in the list, and return it. If no index is specified, "a.pop()" removes and returns the last item in the list. (The square brackets around the *i* in the method signature denote that the parameter is optional, not that you should type square brackets at that position. You will see this notation frequently in the Python Library Reference.) list.clear() Remove all items from the list. Equivalent to "del a[:]". list.index(x[, start[, end]]) Return zero-based index in the list of the first item whose value is equal to *x*. Raises a "ValueError" if there is no such item. The optional arguments *start* and *end* are interpreted as in the slice notation and are used to limit the search to a particular subsequence of the list. The returned index is computed relative to the beginning of the full sequence rather than the *start* argument. list.count(x) Return the number of times *x* appears in the list. list.sort(*, key=None, reverse=False) Sort the items of the list in place (the arguments can be used for sort customization, see "sorted()" for their explanation). list.reverse() Reverse the elements of the list in place. list.copy() Return a shallow copy of the list. Equivalent to "a[:]". An example that uses most of the list methods: >>> fruits = ['orange', 'apple', 'pear', 'banana', 'kiwi', 'apple', 'banana'] >>> fruits.count('apple') 2 >>> fruits.count('tangerine') 0 >>> fruits.index('banana') 3 >>> fruits.index('banana', 4) # Find next banana starting a position 4 6 >>> fruits.reverse() >>> fruits ['banana', 'apple', 'kiwi', 'banana', 'pear', 'apple', 'orange'] >>> fruits.append('grape') >>> fruits ['banana', 'apple', 'kiwi', 'banana', 'pear', 'apple', 'orange', 'grape'] >>> fruits.sort() >>> fruits ['apple', 'apple', 'banana', 'banana', 'grape', 'kiwi', 'orange', 'pear'] >>> fruits.pop() 'pear' You might have noticed that methods like "insert", "remove" or "sort" that only modify the list have no return value printed – they return the default "None". [1] This is a design principle for all mutable data structures in Python. Another thing you might notice is that not all data can be sorted or compared. For instance, "[None, 'hello', 10]" doesn’t sort because integers can’t be compared to strings and *None* can’t be compared to other types. Also, there are some types that don’t have a defined ordering relation. For example, "3+4j < 5+7j" isn’t a valid comparison. 5.1.1. Using Lists as Stacks ---------------------------- The list methods make it very easy to use a list as a stack, where the last element added is the first element retrieved (“last-in, first- out”). To add an item to the top of the stack, use "append()". To retrieve an item from the top of the stack, use "pop()" without an explicit index. For example: >>> stack = [3, 4, 5] >>> stack.append(6) >>> stack.append(7) >>> stack [3, 4, 5, 6, 7] >>> stack.pop() 7 >>> stack [3, 4, 5, 6] >>> stack.pop() 6 >>> stack.pop() 5 >>> stack [3, 4] 5.1.2. Using Lists as Queues ---------------------------- It is also possible to use a list as a queue, where the first element added is the first element retrieved (“first-in, first-out”); however, lists are not efficient for this purpose. While appends and pops from the end of list are fast, doing inserts or pops from the beginning of a list is slow (because all of the other elements have to be shifted by one). To implement a queue, use "collections.deque" which was designed to have fast appends and pops from both ends. For example: >>> from collections import deque >>> queue = deque(["Eric", "John", "Michael"]) >>> queue.append("Terry") # Terry arrives >>> queue.append("Graham") # Graham arrives >>> queue.popleft() # The first to arrive now leaves 'Eric' >>> queue.popleft() # The second to arrive now leaves 'John' >>> queue # Remaining queue in order of arrival deque(['Michael', 'Terry', 'Graham']) 5.1.3. List Comprehensions -------------------------- List comprehensions provide a concise way to create lists. Common applications are to make new lists where each element is the result of some operations applied to each member of another sequence or iterable, or to create a subsequence of those elements that satisfy a certain condition. For example, assume we want to create a list of squares, like: >>> squares = [] >>> for x in range(10): ... squares.append(x**2) ... >>> squares [0, 1, 4, 9, 16, 25, 36, 49, 64, 81] Note that this creates (or overwrites) a variable named "x" that still exists after the loop completes. We can calculate the list of squares without any side effects using: squares = list(map(lambda x: x**2, range(10))) or, equivalently: squares = [x**2 for x in range(10)] which is more concise and readable. A list comprehension consists of brackets containing an expression followed by a "for" clause, then zero or more "for" or "if" clauses. The result will be a new list resulting from evaluating the expression in the context of the "for" and "if" clauses which follow it. For example, this listcomp combines the elements of two lists if they are not equal: >>> [(x, y) for x in [1,2,3] for y in [3,1,4] if x != y] [(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)] and it’s equivalent to: >>> combs = [] >>> for x in [1,2,3]: ... for y in [3,1,4]: ... if x != y: ... combs.append((x, y)) ... >>> combs [(1, 3), (1, 4), (2, 3), (2, 1), (2, 4), (3, 1), (3, 4)] Note how the order of the "for" and "if" statements is the same in both these snippets. If the expression is a tuple (e.g. the "(x, y)" in the previous example), it must be parenthesized. >>> vec = [-4, -2, 0, 2, 4] >>> # create a new list with the values doubled >>> [x*2 for x in vec] [-8, -4, 0, 4, 8] >>> # filter the list to exclude negative numbers >>> [x for x in vec if x >= 0] [0, 2, 4] >>> # apply a function to all the elements >>> [abs(x) for x in vec] [4, 2, 0, 2, 4] >>> # call a method on each element >>> freshfruit = [' banana', ' loganberry ', 'passion fruit '] >>> [weapon.strip() for weapon in freshfruit] ['banana', 'loganberry', 'passion fruit'] >>> # create a list of 2-tuples like (number, square) >>> [(x, x**2) for x in range(6)] [(0, 0), (1, 1), (2, 4), (3, 9), (4, 16), (5, 25)] >>> # the tuple must be parenthesized, otherwise an error is raised >>> [x, x**2 for x in range(6)] File "", line 1, in [x, x**2 for x in range(6)] ^ SyntaxError: invalid syntax >>> # flatten a list using a listcomp with two 'for' >>> vec = [[1,2,3], [4,5,6], [7,8,9]] >>> [num for elem in vec for num in elem] [1, 2, 3, 4, 5, 6, 7, 8, 9] List comprehensions can contain complex expressions and nested functions: >>> from math import pi >>> [str(round(pi, i)) for i in range(1, 6)] ['3.1', '3.14', '3.142', '3.1416', '3.14159'] 5.1.4. Nested List Comprehensions --------------------------------- The initial expression in a list comprehension can be any arbitrary expression, including another list comprehension. Consider the following example of a 3x4 matrix implemented as a list of 3 lists of length 4: >>> matrix = [ ... [1, 2, 3, 4], ... [5, 6, 7, 8], ... [9, 10, 11, 12], ... ] The following list comprehension will transpose rows and columns: >>> [[row[i] for row in matrix] for i in range(4)] [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]] As we saw in the previous section, the nested listcomp is evaluated in the context of the "for" that follows it, so this example is equivalent to: >>> transposed = [] >>> for i in range(4): ... transposed.append([row[i] for row in matrix]) ... >>> transposed [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]] which, in turn, is the same as: >>> transposed = [] >>> for i in range(4): ... # the following 3 lines implement the nested listcomp ... transposed_row = [] ... for row in matrix: ... transposed_row.append(row[i]) ... transposed.append(transposed_row) ... >>> transposed [[1, 5, 9], [2, 6, 10], [3, 7, 11], [4, 8, 12]] In the real world, you should prefer built-in functions to complex flow statements. The "zip()" function would do a great job for this use case: >>> list(zip(*matrix)) [(1, 5, 9), (2, 6, 10), (3, 7, 11), (4, 8, 12)] See Unpacking Argument Lists for details on the asterisk in this line. 5.2. The "del" statement ======================== There is a way to remove an item from a list given its index instead of its value: the "del" statement. This differs from the "pop()" method which returns a value. The "del" statement can also be used to remove slices from a list or clear the entire list (which we did earlier by assignment of an empty list to the slice). For example: >>> a = [-1, 1, 66.25, 333, 333, 1234.5] >>> del a[0] >>> a [1, 66.25, 333, 333, 1234.5] >>> del a[2:4] >>> a [1, 66.25, 1234.5] >>> del a[:] >>> a [] "del" can also be used to delete entire variables: >>> del a Referencing the name "a" hereafter is an error (at least until another value is assigned to it). We’ll find other uses for "del" later. 5.3. Tuples and Sequences ========================= We saw that lists and strings have many common properties, such as indexing and slicing operations. They are two examples of *sequence* data types (see Sequence Types — list, tuple, range). Since Python is an evolving language, other sequence data types may be added. There is also another standard sequence data type: the *tuple*. A tuple consists of a number of values separated by commas, for instance: >>> t = 12345, 54321, 'hello!' >>> t[0] 12345 >>> t (12345, 54321, 'hello!') >>> # Tuples may be nested: ... u = t, (1, 2, 3, 4, 5) >>> u ((12345, 54321, 'hello!'), (1, 2, 3, 4, 5)) >>> # Tuples are immutable: ... t[0] = 88888 Traceback (most recent call last): File "", line 1, in TypeError: 'tuple' object does not support item assignment >>> # but they can contain mutable objects: ... v = ([1, 2, 3], [3, 2, 1]) >>> v ([1, 2, 3], [3, 2, 1]) As you see, on output tuples are always enclosed in parentheses, so that nested tuples are interpreted correctly; they may be input with or without surrounding parentheses, although often parentheses are necessary anyway (if the tuple is part of a larger expression). It is not possible to assign to the individual items of a tuple, however it is possible to create tuples which contain mutable objects, such as lists. Though tuples may seem similar to lists, they are often used in different situations and for different purposes. Tuples are *immutable*, and usually contain a heterogeneous sequence of elements that are accessed via unpacking (see later in this section) or indexing (or even by attribute in the case of "namedtuples"). Lists are *mutable*, and their elements are usually homogeneous and are accessed by iterating over the list. A special problem is the construction of tuples containing 0 or 1 items: the syntax has some extra quirks to accommodate these. Empty tuples are constructed by an empty pair of parentheses; a tuple with one item is constructed by following a value with a comma (it is not sufficient to enclose a single value in parentheses). Ugly, but effective. For example: >>> empty = () >>> singleton = 'hello', # <-- note trailing comma >>> len(empty) 0 >>> len(singleton) 1 >>> singleton ('hello',) The statement "t = 12345, 54321, 'hello!'" is an example of *tuple packing*: the values "12345", "54321" and "'hello!'" are packed together in a tuple. The reverse operation is also possible: >>> x, y, z = t This is called, appropriately enough, *sequence unpacking* and works for any sequence on the right-hand side. Sequence unpacking requires that there are as many variables on the left side of the equals sign as there are elements in the sequence. Note that multiple assignment is really just a combination of tuple packing and sequence unpacking. 5.4. Sets ========= Python also includes a data type for *sets*. A set is an unordered collection with no duplicate elements. Basic uses include membership testing and eliminating duplicate entries. Set objects also support mathematical operations like union, intersection, difference, and symmetric difference. Curly braces or the "set()" function can be used to create sets. Note: to create an empty set you have to use "set()", not "{}"; the latter creates an empty dictionary, a data structure that we discuss in the next section. Here is a brief demonstration: >>> basket = {'apple', 'orange', 'apple', 'pear', 'orange', 'banana'} >>> print(basket) # show that duplicates have been removed {'orange', 'banana', 'pear', 'apple'} >>> 'orange' in basket # fast membership testing True >>> 'crabgrass' in basket False >>> # Demonstrate set operations on unique letters from two words ... >>> a = set('abracadabra') >>> b = set('alacazam') >>> a # unique letters in a {'a', 'r', 'b', 'c', 'd'} >>> a - b # letters in a but not in b {'r', 'd', 'b'} >>> a | b # letters in a or b or both {'a', 'c', 'r', 'd', 'b', 'm', 'z', 'l'} >>> a & b # letters in both a and b {'a', 'c'} >>> a ^ b # letters in a or b but not both {'r', 'd', 'b', 'm', 'z', 'l'} Similarly to list comprehensions, set comprehensions are also supported: >>> a = {x for x in 'abracadabra' if x not in 'abc'} >>> a {'r', 'd'} 5.5. Dictionaries ================= Another useful data type built into Python is the *dictionary* (see Mapping Types — dict). Dictionaries are sometimes found in other languages as “associative memories” or “associative arrays”. Unlike sequences, which are indexed by a range of numbers, dictionaries are indexed by *keys*, which can be any immutable type; strings and numbers can always be keys. Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object either directly or indirectly, it cannot be used as a key. You can’t use lists as keys, since lists can be modified in place using index assignments, slice assignments, or methods like "append()" and "extend()". It is best to think of a dictionary as a set of *key: value* pairs, with the requirement that the keys are unique (within one dictionary). A pair of braces creates an empty dictionary: "{}". Placing a comma- separated list of key:value pairs within the braces adds initial key:value pairs to the dictionary; this is also the way dictionaries are written on output. The main operations on a dictionary are storing a value with some key and extracting the value given the key. It is also possible to delete a key:value pair with "del". If you store using a key that is already in use, the old value associated with that key is forgotten. It is an error to extract a value using a non-existent key. Performing "list(d)" on a dictionary returns a list of all the keys used in the dictionary, in insertion order (if you want it sorted, just use "sorted(d)" instead). To check whether a single key is in the dictionary, use the "in" keyword. Here is a small example using a dictionary: >>> tel = {'jack': 4098, 'sape': 4139} >>> tel['guido'] = 4127 >>> tel {'jack': 4098, 'sape': 4139, 'guido': 4127} >>> tel['jack'] 4098 >>> del tel['sape'] >>> tel['irv'] = 4127 >>> tel {'jack': 4098, 'guido': 4127, 'irv': 4127} >>> list(tel) ['jack', 'guido', 'irv'] >>> sorted(tel) ['guido', 'irv', 'jack'] >>> 'guido' in tel True >>> 'jack' not in tel False The "dict()" constructor builds dictionaries directly from sequences of key-value pairs: >>> dict([('sape', 4139), ('guido', 4127), ('jack', 4098)]) {'sape': 4139, 'guido': 4127, 'jack': 4098} In addition, dict comprehensions can be used to create dictionaries from arbitrary key and value expressions: >>> {x: x**2 for x in (2, 4, 6)} {2: 4, 4: 16, 6: 36} When the keys are simple strings, it is sometimes easier to specify pairs using keyword arguments: >>> dict(sape=4139, guido=4127, jack=4098) {'sape': 4139, 'guido': 4127, 'jack': 4098} 5.6. Looping Techniques ======================= When looping through dictionaries, the key and corresponding value can be retrieved at the same time using the "items()" method. >>> knights = {'gallahad': 'the pure', 'robin': 'the brave'} >>> for k, v in knights.items(): ... print(k, v) ... gallahad the pure robin the brave When looping through a sequence, the position index and corresponding value can be retrieved at the same time using the "enumerate()" function. >>> for i, v in enumerate(['tic', 'tac', 'toe']): ... print(i, v) ... 0 tic 1 tac 2 toe To loop over two or more sequences at the same time, the entries can be paired with the "zip()" function. >>> questions = ['name', 'quest', 'favorite color'] >>> answers = ['lancelot', 'the holy grail', 'blue'] >>> for q, a in zip(questions, answers): ... print('What is your {0}? It is {1}.'.format(q, a)) ... What is your name? It is lancelot. What is your quest? It is the holy grail. What is your favorite color? It is blue. To loop over a sequence in reverse, first specify the sequence in a forward direction and then call the "reversed()" function. >>> for i in reversed(range(1, 10, 2)): ... print(i) ... 9 7 5 3 1 To loop over a sequence in sorted order, use the "sorted()" function which returns a new sorted list while leaving the source unaltered. >>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana'] >>> for i in sorted(basket): ... print(i) ... apple apple banana orange orange pear Using "set()" on a sequence eliminates duplicate elements. The use of "sorted()" in combination with "set()" over a sequence is an idiomatic way to loop over unique elements of the sequence in sorted order. >>> basket = ['apple', 'orange', 'apple', 'pear', 'orange', 'banana'] >>> for f in sorted(set(basket)): ... print(f) ... apple banana orange pear It is sometimes tempting to change a list while you are looping over it; however, it is often simpler and safer to create a new list instead. >>> import math >>> raw_data = [56.2, float('NaN'), 51.7, 55.3, 52.5, float('NaN'), 47.8] >>> filtered_data = [] >>> for value in raw_data: ... if not math.isnan(value): ... filtered_data.append(value) ... >>> filtered_data [56.2, 51.7, 55.3, 52.5, 47.8] 5.7. More on Conditions ======================= The conditions used in "while" and "if" statements can contain any operators, not just comparisons. The comparison operators "in" and "not in" check whether a value occurs (does not occur) in a sequence. The operators "is" and "is not" compare whether two objects are really the same object. All comparison operators have the same priority, which is lower than that of all numerical operators. Comparisons can be chained. For example, "a < b == c" tests whether "a" is less than "b" and moreover "b" equals "c". Comparisons may be combined using the Boolean operators "and" and "or", and the outcome of a comparison (or of any other Boolean expression) may be negated with "not". These have lower priorities than comparison operators; between them, "not" has the highest priority and "or" the lowest, so that "A and not B or C" is equivalent to "(A and (not B)) or C". As always, parentheses can be used to express the desired composition. The Boolean operators "and" and "or" are so-called *short-circuit* operators: their arguments are evaluated from left to right, and evaluation stops as soon as the outcome is determined. For example, if "A" and "C" are true but "B" is false, "A and B and C" does not evaluate the expression "C". When used as a general value and not as a Boolean, the return value of a short-circuit operator is the last evaluated argument. It is possible to assign the result of a comparison or other Boolean expression to a variable. For example, >>> string1, string2, string3 = '', 'Trondheim', 'Hammer Dance' >>> non_null = string1 or string2 or string3 >>> non_null 'Trondheim' Note that in Python, unlike C, assignment inside expressions must be done explicitly with the walrus operator ":=". This avoids a common class of problems encountered in C programs: typing "=" in an expression when "==" was intended. 5.8. Comparing Sequences and Other Types ======================================== Sequence objects typically may be compared to other objects with the same sequence type. The comparison uses *lexicographical* ordering: first the first two items are compared, and if they differ this determines the outcome of the comparison; if they are equal, the next two items are compared, and so on, until either sequence is exhausted. If two items to be compared are themselves sequences of the same type, the lexicographical comparison is carried out recursively. If all items of two sequences compare equal, the sequences are considered equal. If one sequence is an initial sub-sequence of the other, the shorter sequence is the smaller (lesser) one. Lexicographical ordering for strings uses the Unicode code point number to order individual characters. Some examples of comparisons between sequences of the same type: (1, 2, 3) < (1, 2, 4) [1, 2, 3] < [1, 2, 4] 'ABC' < 'C' < 'Pascal' < 'Python' (1, 2, 3, 4) < (1, 2, 4) (1, 2) < (1, 2, -1) (1, 2, 3) == (1.0, 2.0, 3.0) (1, 2, ('aa', 'ab')) < (1, 2, ('abc', 'a'), 4) Note that comparing objects of different types with "<" or ">" is legal provided that the objects have appropriate comparison methods. For example, mixed numeric types are compared according to their numeric value, so 0 equals 0.0, etc. Otherwise, rather than providing an arbitrary ordering, the interpreter will raise a "TypeError" exception. -[ Footnotes ]- [1] Other languages may return the mutated object, which allows method chaining, such as "d->insert("a")->remove("b")->sort();". 8. Errors and Exceptions ************************ Until now error messages haven’t been more than mentioned, but if you have tried out the examples you have probably seen some. There are (at least) two distinguishable kinds of errors: *syntax errors* and *exceptions*. 8.1. Syntax Errors ================== Syntax errors, also known as parsing errors, are perhaps the most common kind of complaint you get while you are still learning Python: >>> while True print('Hello world') File "", line 1 while True print('Hello world') ^ SyntaxError: invalid syntax The parser repeats the offending line and displays a little ‘arrow’ pointing at the earliest point in the line where the error was detected. The error is caused by (or at least detected at) the token *preceding* the arrow: in the example, the error is detected at the function "print()", since a colon ("':'") is missing before it. File name and line number are printed so you know where to look in case the input came from a script. 8.2. Exceptions =============== Even if a statement or expression is syntactically correct, it may cause an error when an attempt is made to execute it. Errors detected during execution are called *exceptions* and are not unconditionally fatal: you will soon learn how to handle them in Python programs. Most exceptions are not handled by programs, however, and result in error messages as shown here: >>> 10 * (1/0) Traceback (most recent call last): File "", line 1, in ZeroDivisionError: division by zero >>> 4 + spam*3 Traceback (most recent call last): File "", line 1, in NameError: name 'spam' is not defined >>> '2' + 2 Traceback (most recent call last): File "", line 1, in TypeError: can only concatenate str (not "int") to str The last line of the error message indicates what happened. Exceptions come in different types, and the type is printed as part of the message: the types in the example are "ZeroDivisionError", "NameError" and "TypeError". The string printed as the exception type is the name of the built-in exception that occurred. This is true for all built- in exceptions, but need not be true for user-defined exceptions (although it is a useful convention). Standard exception names are built-in identifiers (not reserved keywords). The rest of the line provides detail based on the type of exception and what caused it. The preceding part of the error message shows the context where the exception occurred, in the form of a stack traceback. In general it contains a stack traceback listing source lines; however, it will not display lines read from standard input. Built-in Exceptions lists the built-in exceptions and their meanings. 8.3. Handling Exceptions ======================== It is possible to write programs that handle selected exceptions. Look at the following example, which asks the user for input until a valid integer has been entered, but allows the user to interrupt the program (using "Control-C" or whatever the operating system supports); note that a user-generated interruption is signalled by raising the "KeyboardInterrupt" exception. >>> while True: ... try: ... x = int(input("Please enter a number: ")) ... break ... except ValueError: ... print("Oops! That was no valid number. Try again...") ... The "try" statement works as follows. * First, the *try clause* (the statement(s) between the "try" and "except" keywords) is executed. * If no exception occurs, the *except clause* is skipped and execution of the "try" statement is finished. * If an exception occurs during execution of the try clause, the rest of the clause is skipped. Then if its type matches the exception named after the "except" keyword, the except clause is executed, and then execution continues after the "try" statement. * If an exception occurs which does not match the exception named in the except clause, it is passed on to outer "try" statements; if no handler is found, it is an *unhandled exception* and execution stops with a message as shown above. A "try" statement may have more than one except clause, to specify handlers for different exceptions. At most one handler will be executed. Handlers only handle exceptions that occur in the corresponding try clause, not in other handlers of the same "try" statement. An except clause may name multiple exceptions as a parenthesized tuple, for example: ... except (RuntimeError, TypeError, NameError): ... pass A class in an "except" clause is compatible with an exception if it is the same class or a base class thereof (but not the other way around — an except clause listing a derived class is not compatible with a base class). For example, the following code will print B, C, D in that order: class B(Exception): pass class C(B): pass class D(C): pass for cls in [B, C, D]: try: raise cls() except D: print("D") except C: print("C") except B: print("B") Note that if the except clauses were reversed (with "except B" first), it would have printed B, B, B — the first matching except clause is triggered. The last except clause may omit the exception name(s), to serve as a wildcard. Use this with extreme caution, since it is easy to mask a real programming error in this way! It can also be used to print an error message and then re-raise the exception (allowing a caller to handle the exception as well): import sys try: f = open('myfile.txt') s = f.readline() i = int(s.strip()) except OSError as err: print("OS error: {0}".format(err)) except ValueError: print("Could not convert data to an integer.") except: print("Unexpected error:", sys.exc_info()[0]) raise The "try" … "except" statement has an optional *else clause*, which, when present, must follow all except clauses. It is useful for code that must be executed if the try clause does not raise an exception. For example: for arg in sys.argv[1:]: try: f = open(arg, 'r') except OSError: print('cannot open', arg) else: print(arg, 'has', len(f.readlines()), 'lines') f.close() The use of the "else" clause is better than adding additional code to the "try" clause because it avoids accidentally catching an exception that wasn’t raised by the code being protected by the "try" … "except" statement. When an exception occurs, it may have an associated value, also known as the exception’s *argument*. The presence and type of the argument depend on the exception type. The except clause may specify a variable after the exception name. The variable is bound to an exception instance with the arguments stored in "instance.args". For convenience, the exception instance defines "__str__()" so the arguments can be printed directly without having to reference ".args". One may also instantiate an exception first before raising it and add any attributes to it as desired. >>> try: ... raise Exception('spam', 'eggs') ... except Exception as inst: ... print(type(inst)) # the exception instance ... print(inst.args) # arguments stored in .args ... print(inst) # __str__ allows args to be printed directly, ... # but may be overridden in exception subclasses ... x, y = inst.args # unpack args ... print('x =', x) ... print('y =', y) ... ('spam', 'eggs') ('spam', 'eggs') x = spam y = eggs If an exception has arguments, they are printed as the last part (‘detail’) of the message for unhandled exceptions. Exception handlers don’t just handle exceptions if they occur immediately in the try clause, but also if they occur inside functions that are called (even indirectly) in the try clause. For example: >>> def this_fails(): ... x = 1/0 ... >>> try: ... this_fails() ... except ZeroDivisionError as err: ... print('Handling run-time error:', err) ... Handling run-time error: division by zero 8.4. Raising Exceptions ======================= The "raise" statement allows the programmer to force a specified exception to occur. For example: >>> raise NameError('HiThere') Traceback (most recent call last): File "", line 1, in NameError: HiThere The sole argument to "raise" indicates the exception to be raised. This must be either an exception instance or an exception class (a class that derives from "Exception"). If an exception class is passed, it will be implicitly instantiated by calling its constructor with no arguments: raise ValueError # shorthand for 'raise ValueError()' If you need to determine whether an exception was raised but don’t intend to handle it, a simpler form of the "raise" statement allows you to re-raise the exception: >>> try: ... raise NameError('HiThere') ... except NameError: ... print('An exception flew by!') ... raise ... An exception flew by! Traceback (most recent call last): File "", line 2, in NameError: HiThere 8.5. Exception Chaining ======================= The "raise" statement allows an optional "from" which enables chaining exceptions. For example: # exc must be exception instance or None. raise RuntimeError from exc This can be useful when you are transforming exceptions. For example: >>> def func(): ... raise IOError ... >>> try: ... func() ... except IOError as exc: ... raise RuntimeError('Failed to open database') from exc ... Traceback (most recent call last): File "", line 2, in File "", line 2, in func OSError The above exception was the direct cause of the following exception: Traceback (most recent call last): File "", line 4, in RuntimeError: Failed to open database Exception chaining happens automatically when an exception is raised inside an "except" or "finally" section. Exception chaining can be disabled by using "from None" idiom: >>> try: ... open('database.sqlite') ... except OSError: ... raise RuntimeError from None ... Traceback (most recent call last): File "", line 4, in RuntimeError For more information about chaining mechanics, see Built-in Exceptions. 8.6. User-defined Exceptions ============================ Programs may name their own exceptions by creating a new exception class (see Classes for more about Python classes). Exceptions should typically be derived from the "Exception" class, either directly or indirectly. Exception classes can be defined which do anything any other class can do, but are usually kept simple, often only offering a number of attributes that allow information about the error to be extracted by handlers for the exception. Most exceptions are defined with names that end in “Error”, similar to the naming of the standard exceptions. Many standard modules define their own exceptions to report errors that may occur in functions they define. More information on classes is presented in chapter Classes. 8.7. Defining Clean-up Actions ============================== The "try" statement has another optional clause which is intended to define clean-up actions that must be executed under all circumstances. For example: >>> try: ... raise KeyboardInterrupt ... finally: ... print('Goodbye, world!') ... Goodbye, world! Traceback (most recent call last): File "", line 2, in KeyboardInterrupt If a "finally" clause is present, the "finally" clause will execute as the last task before the "try" statement completes. The "finally" clause runs whether or not the "try" statement produces an exception. The following points discuss more complex cases when an exception occurs: * If an exception occurs during execution of the "try" clause, the exception may be handled by an "except" clause. If the exception is not handled by an "except" clause, the exception is re-raised after the "finally" clause has been executed. * An exception could occur during execution of an "except" or "else" clause. Again, the exception is re-raised after the "finally" clause has been executed. * If the "finally" clause executes a "break", "continue" or "return" statement, exceptions are not re-raised. * If the "try" statement reaches a "break", "continue" or "return" statement, the "finally" clause will execute just prior to the "break", "continue" or "return" statement’s execution. * If a "finally" clause includes a "return" statement, the returned value will be the one from the "finally" clause’s "return" statement, not the value from the "try" clause’s "return" statement. For example: >>> def bool_return(): ... try: ... return True ... finally: ... return False ... >>> bool_return() False A more complicated example: >>> def divide(x, y): ... try: ... result = x / y ... except ZeroDivisionError: ... print("division by zero!") ... else: ... print("result is", result) ... finally: ... print("executing finally clause") ... >>> divide(2, 1) result is 2.0 executing finally clause >>> divide(2, 0) division by zero! executing finally clause >>> divide("2", "1") executing finally clause Traceback (most recent call last): File "", line 1, in File "", line 3, in divide TypeError: unsupported operand type(s) for /: 'str' and 'str' As you can see, the "finally" clause is executed in any event. The "TypeError" raised by dividing two strings is not handled by the "except" clause and therefore re-raised after the "finally" clause has been executed. In real world applications, the "finally" clause is useful for releasing external resources (such as files or network connections), regardless of whether the use of the resource was successful. 8.8. Predefined Clean-up Actions ================================ Some objects define standard clean-up actions to be undertaken when the object is no longer needed, regardless of whether or not the operation using the object succeeded or failed. Look at the following example, which tries to open a file and print its contents to the screen. for line in open("myfile.txt"): print(line, end="") The problem with this code is that it leaves the file open for an indeterminate amount of time after this part of the code has finished executing. This is not an issue in simple scripts, but can be a problem for larger applications. The "with" statement allows objects like files to be used in a way that ensures they are always cleaned up promptly and correctly. with open("myfile.txt") as f: for line in f: print(line, end="") After the statement is executed, the file *f* is always closed, even if a problem was encountered while processing the lines. Objects which, like files, provide predefined clean-up actions will indicate this in their documentation. 15. Floating Point Arithmetic: Issues and Limitations ****************************************************** Floating-point numbers are represented in computer hardware as base 2 (binary) fractions. For example, the decimal fraction 0.125 has value 1/10 + 2/100 + 5/1000, and in the same way the binary fraction 0.001 has value 0/2 + 0/4 + 1/8. These two fractions have identical values, the only real difference being that the first is written in base 10 fractional notation, and the second in base 2. Unfortunately, most decimal fractions cannot be represented exactly as binary fractions. A consequence is that, in general, the decimal floating-point numbers you enter are only approximated by the binary floating-point numbers actually stored in the machine. The problem is easier to understand at first in base 10. Consider the fraction 1/3. You can approximate that as a base 10 fraction: 0.3 or, better, 0.33 or, better, 0.333 and so on. No matter how many digits you’re willing to write down, the result will never be exactly 1/3, but will be an increasingly better approximation of 1/3. In the same way, no matter how many base 2 digits you’re willing to use, the decimal value 0.1 cannot be represented exactly as a base 2 fraction. In base 2, 1/10 is the infinitely repeating fraction 0.0001100110011001100110011001100110011001100110011... Stop at any finite number of bits, and you get an approximation. On most machines today, floats are approximated using a binary fraction with the numerator using the first 53 bits starting with the most significant bit and with the denominator as a power of two. In the case of 1/10, the binary fraction is "3602879701896397 / 2 ** 55" which is close to but not exactly equal to the true value of 1/10. Many users are not aware of the approximation because of the way values are displayed. Python only prints a decimal approximation to the true decimal value of the binary approximation stored by the machine. On most machines, if Python were to print the true decimal value of the binary approximation stored for 0.1, it would have to display >>> 0.1 0.1000000000000000055511151231257827021181583404541015625 That is more digits than most people find useful, so Python keeps the number of digits manageable by displaying a rounded value instead >>> 1 / 10 0.1 Just remember, even though the printed result looks like the exact value of 1/10, the actual stored value is the nearest representable binary fraction. Interestingly, there are many different decimal numbers that share the same nearest approximate binary fraction. For example, the numbers "0.1" and "0.10000000000000001" and "0.1000000000000000055511151231257827021181583404541015625" are all approximated by "3602879701896397 / 2 ** 55". Since all of these decimal values share the same approximation, any one of them could be displayed while still preserving the invariant "eval(repr(x)) == x". Historically, the Python prompt and built-in "repr()" function would choose the one with 17 significant digits, "0.10000000000000001". Starting with Python 3.1, Python (on most systems) is now able to choose the shortest of these and simply display "0.1". Note that this is in the very nature of binary floating-point: this is not a bug in Python, and it is not a bug in your code either. You’ll see the same kind of thing in all languages that support your hardware’s floating-point arithmetic (although some languages may not *display* the difference by default, or in all output modes). For more pleasant output, you may wish to use string formatting to produce a limited number of significant digits: >>> format(math.pi, '.12g') # give 12 significant digits '3.14159265359' >>> format(math.pi, '.2f') # give 2 digits after the point '3.14' >>> repr(math.pi) '3.141592653589793' It’s important to realize that this is, in a real sense, an illusion: you’re simply rounding the *display* of the true machine value. One illusion may beget another. For example, since 0.1 is not exactly 1/10, summing three values of 0.1 may not yield exactly 0.3, either: >>> .1 + .1 + .1 == .3 False Also, since the 0.1 cannot get any closer to the exact value of 1/10 and 0.3 cannot get any closer to the exact value of 3/10, then pre- rounding with "round()" function cannot help: >>> round(.1, 1) + round(.1, 1) + round(.1, 1) == round(.3, 1) False Though the numbers cannot be made closer to their intended exact values, the "round()" function can be useful for post-rounding so that results with inexact values become comparable to one another: >>> round(.1 + .1 + .1, 10) == round(.3, 10) True Binary floating-point arithmetic holds many surprises like this. The problem with “0.1” is explained in precise detail below, in the “Representation Error” section. See The Perils of Floating Point for a more complete account of other common surprises. As that says near the end, “there are no easy answers.” Still, don’t be unduly wary of floating-point! The errors in Python float operations are inherited from the floating-point hardware, and on most machines are on the order of no more than 1 part in 2**53 per operation. That’s more than adequate for most tasks, but you do need to keep in mind that it’s not decimal arithmetic and that every float operation can suffer a new rounding error. While pathological cases do exist, for most casual use of floating- point arithmetic you’ll see the result you expect in the end if you simply round the display of your final results to the number of decimal digits you expect. "str()" usually suffices, and for finer control see the "str.format()" method’s format specifiers in Format String Syntax. For use cases which require exact decimal representation, try using the "decimal" module which implements decimal arithmetic suitable for accounting applications and high-precision applications. Another form of exact arithmetic is supported by the "fractions" module which implements arithmetic based on rational numbers (so the numbers like 1/3 can be represented exactly). If you are a heavy user of floating point operations you should take a look at the NumPy package and many other packages for mathematical and statistical operations supplied by the SciPy project. See . Python provides tools that may help on those rare occasions when you really *do* want to know the exact value of a float. The "float.as_integer_ratio()" method expresses the value of a float as a fraction: >>> x = 3.14159 >>> x.as_integer_ratio() (3537115888337719, 1125899906842624) Since the ratio is exact, it can be used to losslessly recreate the original value: >>> x == 3537115888337719 / 1125899906842624 True The "float.hex()" method expresses a float in hexadecimal (base 16), again giving the exact value stored by your computer: >>> x.hex() '0x1.921f9f01b866ep+1' This precise hexadecimal representation can be used to reconstruct the float value exactly: >>> x == float.fromhex('0x1.921f9f01b866ep+1') True Since the representation is exact, it is useful for reliably porting values across different versions of Python (platform independence) and exchanging data with other languages that support the same format (such as Java and C99). Another helpful tool is the "math.fsum()" function which helps mitigate loss-of-precision during summation. It tracks “lost digits” as values are added onto a running total. That can make a difference in overall accuracy so that the errors do not accumulate to the point where they affect the final total: >>> sum([0.1] * 10) == 1.0 False >>> math.fsum([0.1] * 10) == 1.0 True 15.1. Representation Error ========================== This section explains the “0.1” example in detail, and shows how you can perform an exact analysis of cases like this yourself. Basic familiarity with binary floating-point representation is assumed. *Representation error* refers to the fact that some (most, actually) decimal fractions cannot be represented exactly as binary (base 2) fractions. This is the chief reason why Python (or Perl, C, C++, Java, Fortran, and many others) often won’t display the exact decimal number you expect. Why is that? 1/10 is not exactly representable as a binary fraction. Almost all machines today (November 2000) use IEEE-754 floating point arithmetic, and almost all platforms map Python floats to IEEE-754 “double precision”. 754 doubles contain 53 bits of precision, so on input the computer strives to convert 0.1 to the closest fraction it can of the form *J*/2***N* where *J* is an integer containing exactly 53 bits. Rewriting 1 / 10 ~= J / (2**N) as J ~= 2**N / 10 and recalling that *J* has exactly 53 bits (is ">= 2**52" but "< 2**53"), the best value for *N* is 56: >>> 2**52 <= 2**56 // 10 < 2**53 True That is, 56 is the only value for *N* that leaves *J* with exactly 53 bits. The best possible value for *J* is then that quotient rounded: >>> q, r = divmod(2**56, 10) >>> r 6 Since the remainder is more than half of 10, the best approximation is obtained by rounding up: >>> q+1 7205759403792794 Therefore the best possible approximation to 1/10 in 754 double precision is: 7205759403792794 / 2 ** 56 Dividing both the numerator and denominator by two reduces the fraction to: 3602879701896397 / 2 ** 55 Note that since we rounded up, this is actually a little bit larger than 1/10; if we had not rounded up, the quotient would have been a little bit smaller than 1/10. But in no case can it be *exactly* 1/10! So the computer never “sees” 1/10: what it sees is the exact fraction given above, the best 754 double approximation it can get: >>> 0.1 * 2 ** 55 3602879701896397.0 If we multiply that fraction by 10**55, we can see the value out to 55 decimal digits: >>> 3602879701896397 * 10 ** 55 // 2 ** 55 1000000000000000055511151231257827021181583404541015625 meaning that the exact number stored in the computer is equal to the decimal value 0.1000000000000000055511151231257827021181583404541015625. Instead of displaying the full decimal value, many languages (including older versions of Python), round the result to 17 significant digits: >>> format(0.1, '.17f') '0.10000000000000001' The "fractions" and "decimal" modules make these calculations easy: >>> from decimal import Decimal >>> from fractions import Fraction >>> Fraction.from_float(0.1) Fraction(3602879701896397, 36028797018963968) >>> (0.1).as_integer_ratio() (3602879701896397, 36028797018963968) >>> Decimal.from_float(0.1) Decimal('0.1000000000000000055511151231257827021181583404541015625') >>> format(Decimal.from_float(0.1), '.17') '0.10000000000000001' The Python Tutorial ******************* Python is an easy to learn, powerful programming language. It has efficient high-level data structures and a simple but effective approach to object-oriented programming. Python’s elegant syntax and dynamic typing, together with its interpreted nature, make it an ideal language for scripting and rapid application development in many areas on most platforms. The Python interpreter and the extensive standard library are freely available in source or binary form for all major platforms from the Python Web site, https://www.python.org/, and may be freely distributed. The same site also contains distributions of and pointers to many free third party Python modules, programs and tools, and additional documentation. The Python interpreter is easily extended with new functions and data types implemented in C or C++ (or other languages callable from C). Python is also suitable as an extension language for customizable applications. This tutorial introduces the reader informally to the basic concepts and features of the Python language and system. It helps to have a Python interpreter handy for hands-on experience, but all examples are self-contained, so the tutorial can be read off-line as well. For a description of standard objects and modules, see The Python Standard Library. The Python Language Reference gives a more formal definition of the language. To write extensions in C or C++, read Extending and Embedding the Python Interpreter and Python/C API Reference Manual. There are also several books covering Python in depth. This tutorial does not attempt to be comprehensive and cover every single feature, or even every commonly used feature. Instead, it introduces many of Python’s most noteworthy features, and will give you a good idea of the language’s flavor and style. After reading it, you will be able to read and write Python modules and programs, and you will be ready to learn more about the various Python library modules described in The Python Standard Library. The Glossary is also worth going through. * 1. Whetting Your Appetite * 2. Using the Python Interpreter * 2.1. Invoking the Interpreter * 2.1.1. Argument Passing * 2.1.2. Interactive Mode * 2.2. The Interpreter and Its Environment * 2.2.1. Source Code Encoding * 3. An Informal Introduction to Python * 3.1. Using Python as a Calculator * 3.1.1. Numbers * 3.1.2. Strings * 3.1.3. Lists * 3.2. First Steps Towards Programming * 4. More Control Flow Tools * 4.1. "if" Statements * 4.2. "for" Statements * 4.3. The "range()" Function * 4.4. "break" and "continue" Statements, and "else" Clauses on Loops * 4.5. "pass" Statements * 4.6. Defining Functions * 4.7. More on Defining Functions * 4.7.1. Default Argument Values * 4.7.2. Keyword Arguments * 4.7.3. Special parameters * 4.7.3.1. Positional-or-Keyword Arguments * 4.7.3.2. Positional-Only Parameters * 4.7.3.3. Keyword-Only Arguments * 4.7.3.4. Function Examples * 4.7.3.5. Recap * 4.7.4. Arbitrary Argument Lists * 4.7.5. Unpacking Argument Lists * 4.7.6. Lambda Expressions * 4.7.7. Documentation Strings * 4.7.8. Function Annotations * 4.8. Intermezzo: Coding Style * 5. Data Structures * 5.1. More on Lists * 5.1.1. Using Lists as Stacks * 5.1.2. Using Lists as Queues * 5.1.3. List Comprehensions * 5.1.4. Nested List Comprehensions * 5.2. The "del" statement * 5.3. Tuples and Sequences * 5.4. Sets * 5.5. Dictionaries * 5.6. Looping Techniques * 5.7. More on Conditions * 5.8. Comparing Sequences and Other Types * 6. Modules * 6.1. More on Modules * 6.1.1. Executing modules as scripts * 6.1.2. The Module Search Path * 6.1.3. “Compiled” Python files * 6.2. Standard Modules * 6.3. The "dir()" Function * 6.4. Packages * 6.4.1. Importing * From a Package * 6.4.2. Intra-package References * 6.4.3. Packages in Multiple Directories * 7. Input and Output * 7.1. Fancier Output Formatting * 7.1.1. Formatted String Literals * 7.1.2. The String format() Method * 7.1.3. Manual String Formatting * 7.1.4. Old string formatting * 7.2. Reading and Writing Files * 7.2.1. Methods of File Objects * 7.2.2. Saving structured data with "json" * 8. Errors and Exceptions * 8.1. Syntax Errors * 8.2. Exceptions * 8.3. Handling Exceptions * 8.4. Raising Exceptions * 8.5. Exception Chaining * 8.6. User-defined Exceptions * 8.7. Defining Clean-up Actions * 8.8. Predefined Clean-up Actions * 9. Classes * 9.1. A Word About Names and Objects * 9.2. Python Scopes and Namespaces * 9.2.1. Scopes and Namespaces Example * 9.3. A First Look at Classes * 9.3.1. Class Definition Syntax * 9.3.2. Class Objects * 9.3.3. Instance Objects * 9.3.4. Method Objects * 9.3.5. Class and Instance Variables * 9.4. Random Remarks * 9.5. Inheritance * 9.5.1. Multiple Inheritance * 9.6. Private Variables * 9.7. Odds and Ends * 9.8. Iterators * 9.9. Generators * 9.10. Generator Expressions * 10. Brief Tour of the Standard Library * 10.1. Operating System Interface * 10.2. File Wildcards * 10.3. Command Line Arguments * 10.4. Error Output Redirection and Program Termination * 10.5. String Pattern Matching * 10.6. Mathematics * 10.7. Internet Access * 10.8. Dates and Times * 10.9. Data Compression * 10.10. Performance Measurement * 10.11. Quality Control * 10.12. Batteries Included * 11. Brief Tour of the Standard Library — Part II * 11.1. Output Formatting * 11.2. Templating * 11.3. Working with Binary Data Record Layouts * 11.4. Multi-threading * 11.5. Logging * 11.6. Weak References * 11.7. Tools for Working with Lists * 11.8. Decimal Floating Point Arithmetic * 12. Virtual Environments and Packages * 12.1. Introduction * 12.2. Creating Virtual Environments * 12.3. Managing Packages with pip * 13. What Now? * 14. Interactive Input Editing and History Substitution * 14.1. Tab Completion and History Editing * 14.2. Alternatives to the Interactive Interpreter * 15. Floating Point Arithmetic: Issues and Limitations * 15.1. Representation Error * 16. Appendix * 16.1. Interactive Mode * 16.1.1. Error Handling * 16.1.2. Executable Python Scripts * 16.1.3. The Interactive Startup File * 16.1.4. The Customization Modules 7. Input and Output ******************* There are several ways to present the output of a program; data can be printed in a human-readable form, or written to a file for future use. This chapter will discuss some of the possibilities. 7.1. Fancier Output Formatting ============================== So far we’ve encountered two ways of writing values: *expression statements* and the "print()" function. (A third way is using the "write()" method of file objects; the standard output file can be referenced as "sys.stdout". See the Library Reference for more information on this.) Often you’ll want more control over the formatting of your output than simply printing space-separated values. There are several ways to format output. * To use formatted string literals, begin a string with "f" or "F" before the opening quotation mark or triple quotation mark. Inside this string, you can write a Python expression between "{" and "}" characters that can refer to variables or literal values. >>> year = 2016 >>> event = 'Referendum' >>> f'Results of the {year} {event}' 'Results of the 2016 Referendum' * The "str.format()" method of strings requires more manual effort. You’ll still use "{" and "}" to mark where a variable will be substituted and can provide detailed formatting directives, but you’ll also need to provide the information to be formatted. >>> yes_votes = 42_572_654 >>> no_votes = 43_132_495 >>> percentage = yes_votes / (yes_votes + no_votes) >>> '{:-9} YES votes {:2.2%}'.format(yes_votes, percentage) ' 42572654 YES votes 49.67%' * Finally, you can do all the string handling yourself by using string slicing and concatenation operations to create any layout you can imagine. The string type has some methods that perform useful operations for padding strings to a given column width. When you don’t need fancy output but just want a quick display of some variables for debugging purposes, you can convert any value to a string with the "repr()" or "str()" functions. The "str()" function is meant to return representations of values which are fairly human-readable, while "repr()" is meant to generate representations which can be read by the interpreter (or will force a "SyntaxError" if there is no equivalent syntax). For objects which don’t have a particular representation for human consumption, "str()" will return the same value as "repr()". Many values, such as numbers or structures like lists and dictionaries, have the same representation using either function. Strings, in particular, have two distinct representations. Some examples: >>> s = 'Hello, world.' >>> str(s) 'Hello, world.' >>> repr(s) "'Hello, world.'" >>> str(1/7) '0.14285714285714285' >>> x = 10 * 3.25 >>> y = 200 * 200 >>> s = 'The value of x is ' + repr(x) + ', and y is ' + repr(y) + '...' >>> print(s) The value of x is 32.5, and y is 40000... >>> # The repr() of a string adds string quotes and backslashes: ... hello = 'hello, world\n' >>> hellos = repr(hello) >>> print(hellos) 'hello, world\n' >>> # The argument to repr() may be any Python object: ... repr((x, y, ('spam', 'eggs'))) "(32.5, 40000, ('spam', 'eggs'))" The "string" module contains a "Template" class that offers yet another way to substitute values into strings, using placeholders like "$x" and replacing them with values from a dictionary, but offers much less control of the formatting. 7.1.1. Formatted String Literals -------------------------------- Formatted string literals (also called f-strings for short) let you include the value of Python expressions inside a string by prefixing the string with "f" or "F" and writing expressions as "{expression}". An optional format specifier can follow the expression. This allows greater control over how the value is formatted. The following example rounds pi to three places after the decimal: >>> import math >>> print(f'The value of pi is approximately {math.pi:.3f}.') The value of pi is approximately 3.142. Passing an integer after the "':'" will cause that field to be a minimum number of characters wide. This is useful for making columns line up. >>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 7678} >>> for name, phone in table.items(): ... print(f'{name:10} ==> {phone:10d}') ... Sjoerd ==> 4127 Jack ==> 4098 Dcab ==> 7678 Other modifiers can be used to convert the value before it is formatted. "'!a'" applies "ascii()", "'!s'" applies "str()", and "'!r'" applies "repr()": >>> animals = 'eels' >>> print(f'My hovercraft is full of {animals}.') My hovercraft is full of eels. >>> print(f'My hovercraft is full of {animals!r}.') My hovercraft is full of 'eels'. For a reference on these format specifications, see the reference guide for the Format Specification Mini-Language. 7.1.2. The String format() Method --------------------------------- Basic usage of the "str.format()" method looks like this: >>> print('We are the {} who say "{}!"'.format('knights', 'Ni')) We are the knights who say "Ni!" The brackets and characters within them (called format fields) are replaced with the objects passed into the "str.format()" method. A number in the brackets can be used to refer to the position of the object passed into the "str.format()" method. >>> print('{0} and {1}'.format('spam', 'eggs')) spam and eggs >>> print('{1} and {0}'.format('spam', 'eggs')) eggs and spam If keyword arguments are used in the "str.format()" method, their values are referred to by using the name of the argument. >>> print('This {food} is {adjective}.'.format( ... food='spam', adjective='absolutely horrible')) This spam is absolutely horrible. Positional and keyword arguments can be arbitrarily combined: >>> print('The story of {0}, {1}, and {other}.'.format('Bill', 'Manfred', other='Georg')) The story of Bill, Manfred, and Georg. If you have a really long format string that you don’t want to split up, it would be nice if you could reference the variables to be formatted by name instead of by position. This can be done by simply passing the dict and using square brackets "'[]'" to access the keys. >>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 8637678} >>> print('Jack: {0[Jack]:d}; Sjoerd: {0[Sjoerd]:d}; ' ... 'Dcab: {0[Dcab]:d}'.format(table)) Jack: 4098; Sjoerd: 4127; Dcab: 8637678 This could also be done by passing the table as keyword arguments with the ‘**’ notation. >>> table = {'Sjoerd': 4127, 'Jack': 4098, 'Dcab': 8637678} >>> print('Jack: {Jack:d}; Sjoerd: {Sjoerd:d}; Dcab: {Dcab:d}'.format(**table)) Jack: 4098; Sjoerd: 4127; Dcab: 8637678 This is particularly useful in combination with the built-in function "vars()", which returns a dictionary containing all local variables. As an example, the following lines produce a tidily-aligned set of columns giving integers and their squares and cubes: >>> for x in range(1, 11): ... print('{0:2d} {1:3d} {2:4d}'.format(x, x*x, x*x*x)) ... 1 1 1 2 4 8 3 9 27 4 16 64 5 25 125 6 36 216 7 49 343 8 64 512 9 81 729 10 100 1000 For a complete overview of string formatting with "str.format()", see Format String Syntax. 7.1.3. Manual String Formatting ------------------------------- Here’s the same table of squares and cubes, formatted manually: >>> for x in range(1, 11): ... print(repr(x).rjust(2), repr(x*x).rjust(3), end=' ') ... # Note use of 'end' on previous line ... print(repr(x*x*x).rjust(4)) ... 1 1 1 2 4 8 3 9 27 4 16 64 5 25 125 6 36 216 7 49 343 8 64 512 9 81 729 10 100 1000 (Note that the one space between each column was added by the way "print()" works: it always adds spaces between its arguments.) The "str.rjust()" method of string objects right-justifies a string in a field of a given width by padding it with spaces on the left. There are similar methods "str.ljust()" and "str.center()". These methods do not write anything, they just return a new string. If the input string is too long, they don’t truncate it, but return it unchanged; this will mess up your column lay-out but that’s usually better than the alternative, which would be lying about a value. (If you really want truncation you can always add a slice operation, as in "x.ljust(n)[:n]".) There is another method, "str.zfill()", which pads a numeric string on the left with zeros. It understands about plus and minus signs: >>> '12'.zfill(5) '00012' >>> '-3.14'.zfill(7) '-003.14' >>> '3.14159265359'.zfill(5) '3.14159265359' 7.1.4. Old string formatting ---------------------------- The % operator (modulo) can also be used for string formatting. Given "'string' % values", instances of "%" in "string" are replaced with zero or more elements of "values". This operation is commonly known as string interpolation. For example: >>> import math >>> print('The value of pi is approximately %5.3f.' % math.pi) The value of pi is approximately 3.142. More information can be found in the printf-style String Formatting section. 7.2. Reading and Writing Files ============================== "open()" returns a *file object*, and is most commonly used with two positional arguments and one keyword argument: "open(filename, mode, encoding=None)" >>> f = open('workfile', 'w', encoding="utf-8") The first argument is a string containing the filename. The second argument is another string containing a few characters describing the way in which the file will be used. *mode* can be "'r'" when the file will only be read, "'w'" for only writing (an existing file with the same name will be erased), and "'a'" opens the file for appending; any data written to the file is automatically added to the end. "'r+'" opens the file for both reading and writing. The *mode* argument is optional; "'r'" will be assumed if it’s omitted. Normally, files are opened in *text mode*, that means, you read and write strings from and to the file, which are encoded in a specific *encoding*. If *encoding* is not specified, the default is platform dependent (see "open()"). Because UTF-8 is the modern de-facto standard, "encoding="utf-8"" is recommended unless you know that you need to use a different encoding. Appending a "'b'" to the mode opens the file in *binary mode*. Binary mode data is read and written as "bytes" objects. You can not specify *encoding* when opening file in binary mode. In text mode, the default when reading is to convert platform-specific line endings ("\n" on Unix, "\r\n" on Windows) to just "\n". When writing in text mode, the default is to convert occurrences of "\n" back to platform-specific line endings. This behind-the-scenes modification to file data is fine for text files, but will corrupt binary data like that in "JPEG" or "EXE" files. Be very careful to use binary mode when reading and writing such files. It is good practice to use the "with" keyword when dealing with file objects. The advantage is that the file is properly closed after its suite finishes, even if an exception is raised at some point. Using "with" is also much shorter than writing equivalent "try"-"finally" blocks: >>> with open('workfile', encoding="utf-8") as f: ... read_data = f.read() >>> # We can check that the file has been automatically closed. >>> f.closed True If you’re not using the "with" keyword, then you should call "f.close()" to close the file and immediately free up any system resources used by it. Warning: Calling "f.write()" without using the "with" keyword or calling "f.close()" **might** result in the arguments of "f.write()" not being completely written to the disk, even if the program exits successfully. After a file object is closed, either by a "with" statement or by calling "f.close()", attempts to use the file object will automatically fail. >>> f.close() >>> f.read() Traceback (most recent call last): File "", line 1, in ValueError: I/O operation on closed file. 7.2.1. Methods of File Objects ------------------------------ The rest of the examples in this section will assume that a file object called "f" has already been created. To read a file’s contents, call "f.read(size)", which reads some quantity of data and returns it as a string (in text mode) or bytes object (in binary mode). *size* is an optional numeric argument. When *size* is omitted or negative, the entire contents of the file will be read and returned; it’s your problem if the file is twice as large as your machine’s memory. Otherwise, at most *size* characters (in text mode) or *size* bytes (in binary mode) are read and returned. If the end of the file has been reached, "f.read()" will return an empty string ("''"). >>> f.read() 'This is the entire file.\n' >>> f.read() '' "f.readline()" reads a single line from the file; a newline character ("\n") is left at the end of the string, and is only omitted on the last line of the file if the file doesn’t end in a newline. This makes the return value unambiguous; if "f.readline()" returns an empty string, the end of the file has been reached, while a blank line is represented by "'\n'", a string containing only a single newline. >>> f.readline() 'This is the first line of the file.\n' >>> f.readline() 'Second line of the file\n' >>> f.readline() '' For reading lines from a file, you can loop over the file object. This is memory efficient, fast, and leads to simple code: >>> for line in f: ... print(line, end='') ... This is the first line of the file. Second line of the file If you want to read all the lines of a file in a list you can also use "list(f)" or "f.readlines()". "f.write(string)" writes the contents of *string* to the file, returning the number of characters written. >>> f.write('This is a test\n') 15 Other types of objects need to be converted – either to a string (in text mode) or a bytes object (in binary mode) – before writing them: >>> value = ('the answer', 42) >>> s = str(value) # convert the tuple to string >>> f.write(s) 18 "f.tell()" returns an integer giving the file object’s current position in the file represented as number of bytes from the beginning of the file when in binary mode and an opaque number when in text mode. To change the file object’s position, use "f.seek(offset, whence)". The position is computed from adding *offset* to a reference point; the reference point is selected by the *whence* argument. A *whence* value of 0 measures from the beginning of the file, 1 uses the current file position, and 2 uses the end of the file as the reference point. *whence* can be omitted and defaults to 0, using the beginning of the file as the reference point. >>> f = open('workfile', 'rb+') >>> f.write(b'0123456789abcdef') 16 >>> f.seek(5) # Go to the 6th byte in the file 5 >>> f.read(1) b'5' >>> f.seek(-3, 2) # Go to the 3rd byte before the end 13 >>> f.read(1) b'd' In text files (those opened without a "b" in the mode string), only seeks relative to the beginning of the file are allowed (the exception being seeking to the very file end with "seek(0, 2)") and the only valid *offset* values are those returned from the "f.tell()", or zero. Any other *offset* value produces undefined behaviour. File objects have some additional methods, such as "isatty()" and "truncate()" which are less frequently used; consult the Library Reference for a complete guide to file objects. 7.2.2. Saving structured data with "json" ----------------------------------------- Strings can easily be written to and read from a file. Numbers take a bit more effort, since the "read()" method only returns strings, which will have to be passed to a function like "int()", which takes a string like "'123'" and returns its numeric value 123. When you want to save more complex data types like nested lists and dictionaries, parsing and serializing by hand becomes complicated. Rather than having users constantly writing and debugging code to save complicated data types to files, Python allows you to use the popular data interchange format called JSON (JavaScript Object Notation). The standard module called "json" can take Python data hierarchies, and convert them to string representations; this process is called *serializing*. Reconstructing the data from the string representation is called *deserializing*. Between serializing and deserializing, the string representing the object may have been stored in a file or data, or sent over a network connection to some distant machine. Note: The JSON format is commonly used by modern applications to allow for data exchange. Many programmers are already familiar with it, which makes it a good choice for interoperability. If you have an object "x", you can view its JSON string representation with a simple line of code: >>> import json >>> x = [1, 'simple', 'list'] >>> json.dumps(x) '[1, "simple", "list"]' Another variant of the "dumps()" function, called "dump()", simply serializes the object to a *text file*. So if "f" is a *text file* object opened for writing, we can do this: json.dump(x, f) To decode the object again, if "f" is a *binary file* or *text file* object which has been opened for reading: x = json.load(f) Note: JSON files must be encoded in UTF-8. Use "encoding="utf-8"" when opening JSON file as a *text file* for both of reading and writing. This simple serialization technique can handle lists and dictionaries, but serializing arbitrary class instances in JSON requires a bit of extra effort. The reference for the "json" module contains an explanation of this. See also: "pickle" - the pickle module Contrary to JSON, *pickle* is a protocol which allows the serialization of arbitrarily complex Python objects. As such, it is specific to Python and cannot be used to communicate with applications written in other languages. It is also insecure by default: deserializing pickle data coming from an untrusted source can execute arbitrary code, if the data was crafted by a skilled attacker. 14. Interactive Input Editing and History Substitution ****************************************************** Some versions of the Python interpreter support editing of the current input line and history substitution, similar to facilities found in the Korn shell and the GNU Bash shell. This is implemented using the GNU Readline library, which supports various styles of editing. This library has its own documentation which we won’t duplicate here. 14.1. Tab Completion and History Editing ======================================== Completion of variable and module names is automatically enabled at interpreter startup so that the "Tab" key invokes the completion function; it looks at Python statement names, the current local variables, and the available module names. For dotted expressions such as "string.a", it will evaluate the expression up to the final "'.'" and then suggest completions from the attributes of the resulting object. Note that this may execute application-defined code if an object with a "__getattr__()" method is part of the expression. The default configuration also saves your history into a file named ".python_history" in your user directory. The history will be available again during the next interactive interpreter session. 14.2. Alternatives to the Interactive Interpreter ================================================= This facility is an enormous step forward compared to earlier versions of the interpreter; however, some wishes are left: It would be nice if the proper indentation were suggested on continuation lines (the parser knows if an indent token is required next). The completion mechanism might use the interpreter’s symbol table. A command to check (or even suggest) matching parentheses, quotes, etc., would also be useful. One alternative enhanced interactive interpreter that has been around for quite some time is IPython, which features tab completion, object exploration and advanced history management. It can also be thoroughly customized and embedded into other applications. Another similar enhanced interactive environment is bpython. 2. Using the Python Interpreter ******************************* 2.1. Invoking the Interpreter ============================= The Python interpreter is usually installed as "/usr/local/bin/python3.9" on those machines where it is available; putting "/usr/local/bin" in your Unix shell’s search path makes it possible to start it by typing the command: python3.9 to the shell. [1] Since the choice of the directory where the interpreter lives is an installation option, other places are possible; check with your local Python guru or system administrator. (E.g., "/usr/local/python" is a popular alternative location.) On Windows machines where you have installed Python from the Microsoft Store, the "python3.9" command will be available. If you have the py.exe launcher installed, you can use the "py" command. See Excursus: Setting environment variables for other ways to launch Python. Typing an end-of-file character ("Control-D" on Unix, "Control-Z" on Windows) at the primary prompt causes the interpreter to exit with a zero exit status. If that doesn’t work, you can exit the interpreter by typing the following command: "quit()". The interpreter’s line-editing features include interactive editing, history substitution and code completion on systems that support the GNU Readline library. Perhaps the quickest check to see whether command line editing is supported is typing "Control-P" to the first Python prompt you get. If it beeps, you have command line editing; see Appendix Interactive Input Editing and History Substitution for an introduction to the keys. If nothing appears to happen, or if "^P" is echoed, command line editing isn’t available; you’ll only be able to use backspace to remove characters from the current line. The interpreter operates somewhat like the Unix shell: when called with standard input connected to a tty device, it reads and executes commands interactively; when called with a file name argument or with a file as standard input, it reads and executes a *script* from that file. A second way of starting the interpreter is "python -c command [arg] ...", which executes the statement(s) in *command*, analogous to the shell’s "-c" option. Since Python statements often contain spaces or other characters that are special to the shell, it is usually advised to quote *command* in its entirety with single quotes. Some Python modules are also useful as scripts. These can be invoked using "python -m module [arg] ...", which executes the source file for *module* as if you had spelled out its full name on the command line. When a script file is used, it is sometimes useful to be able to run the script and enter interactive mode afterwards. This can be done by passing "-i" before the script. All command line options are described in Command line and environment. 2.1.1. Argument Passing ----------------------- When known to the interpreter, the script name and additional arguments thereafter are turned into a list of strings and assigned to the "argv" variable in the "sys" module. You can access this list by executing "import sys". The length of the list is at least one; when no script and no arguments are given, "sys.argv[0]" is an empty string. When the script name is given as "'-'" (meaning standard input), "sys.argv[0]" is set to "'-'". When "-c" *command* is used, "sys.argv[0]" is set to "'-c'". When "-m" *module* is used, "sys.argv[0]" is set to the full name of the located module. Options found after "-c" *command* or "-m" *module* are not consumed by the Python interpreter’s option processing but left in "sys.argv" for the command or module to handle. 2.1.2. Interactive Mode ----------------------- When commands are read from a tty, the interpreter is said to be in *interactive mode*. In this mode it prompts for the next command with the *primary prompt*, usually three greater-than signs (">>>"); for continuation lines it prompts with the *secondary prompt*, by default three dots ("..."). The interpreter prints a welcome message stating its version number and a copyright notice before printing the first prompt: $ python3.9 Python 3.9 (default, June 4 2019, 09:25:04) [GCC 4.8.2] on linux Type "help", "copyright", "credits" or "license" for more information. >>> Continuation lines are needed when entering a multi-line construct. As an example, take a look at this "if" statement: >>> the_world_is_flat = True >>> if the_world_is_flat: ... print("Be careful not to fall off!") ... Be careful not to fall off! For more on interactive mode, see Interactive Mode. 2.2. The Interpreter and Its Environment ======================================== 2.2.1. Source Code Encoding --------------------------- By default, Python source files are treated as encoded in UTF-8. In that encoding, characters of most languages in the world can be used simultaneously in string literals, identifiers and comments — although the standard library only uses ASCII characters for identifiers, a convention that any portable code should follow. To display all these characters properly, your editor must recognize that the file is UTF-8, and it must use a font that supports all the characters in the file. To declare an encoding other than the default one, a special comment line should be added as the *first* line of the file. The syntax is as follows: # -*- coding: encoding -*- where *encoding* is one of the valid "codecs" supported by Python. For example, to declare that Windows-1252 encoding is to be used, the first line of your source code file should be: # -*- coding: cp1252 -*- One exception to the *first line* rule is when the source code starts with a UNIX “shebang” line. In this case, the encoding declaration should be added as the second line of the file. For example: #!/usr/bin/env python3 # -*- coding: cp1252 -*- -[ Footnotes ]- [1] On Unix, the Python 3.x interpreter is by default not installed with the executable named "python", so that it does not conflict with a simultaneously installed Python 2.x executable. 3. An Informal Introduction to Python ************************************* In the following examples, input and output are distinguished by the presence or absence of prompts (*>>>* and *…*): to repeat the example, you must type everything after the prompt, when the prompt appears; lines that do not begin with a prompt are output from the interpreter. Note that a secondary prompt on a line by itself in an example means you must type a blank line; this is used to end a multi-line command. Many of the examples in this manual, even those entered at the interactive prompt, include comments. Comments in Python start with the hash character, "#", and extend to the end of the physical line. A comment may appear at the start of a line or following whitespace or code, but not within a string literal. A hash character within a string literal is just a hash character. Since comments are to clarify code and are not interpreted by Python, they may be omitted when typing in examples. Some examples: # this is the first comment spam = 1 # and this is the second comment # ... and now a third! text = "# This is not a comment because it's inside quotes." 3.1. Using Python as a Calculator ================================= Let’s try some simple Python commands. Start the interpreter and wait for the primary prompt, ">>>". (It shouldn’t take long.) 3.1.1. Numbers -------------- The interpreter acts as a simple calculator: you can type an expression at it and it will write the value. Expression syntax is straightforward: the operators "+", "-", "*" and "/" work just like in most other languages (for example, Pascal or C); parentheses ("()") can be used for grouping. For example: >>> 2 + 2 4 >>> 50 - 5*6 20 >>> (50 - 5*6) / 4 5.0 >>> 8 / 5 # division always returns a floating point number 1.6 The integer numbers (e.g. "2", "4", "20") have type "int", the ones with a fractional part (e.g. "5.0", "1.6") have type "float". We will see more about numeric types later in the tutorial. Division ("/") always returns a float. To do *floor division* and get an integer result (discarding any fractional result) you can use the "//" operator; to calculate the remainder you can use "%": >>> 17 / 3 # classic division returns a float 5.666666666666667 >>> >>> 17 // 3 # floor division discards the fractional part 5 >>> 17 % 3 # the % operator returns the remainder of the division 2 >>> 5 * 3 + 2 # floored quotient * divisor + remainder 17 With Python, it is possible to use the "**" operator to calculate powers [1]: >>> 5 ** 2 # 5 squared 25 >>> 2 ** 7 # 2 to the power of 7 128 The equal sign ("=") is used to assign a value to a variable. Afterwards, no result is displayed before the next interactive prompt: >>> width = 20 >>> height = 5 * 9 >>> width * height 900 If a variable is not “defined” (assigned a value), trying to use it will give you an error: >>> n # try to access an undefined variable Traceback (most recent call last): File "", line 1, in NameError: name 'n' is not defined There is full support for floating point; operators with mixed type operands convert the integer operand to floating point: >>> 4 * 3.75 - 1 14.0 In interactive mode, the last printed expression is assigned to the variable "_". This means that when you are using Python as a desk calculator, it is somewhat easier to continue calculations, for example: >>> tax = 12.5 / 100 >>> price = 100.50 >>> price * tax 12.5625 >>> price + _ 113.0625 >>> round(_, 2) 113.06 This variable should be treated as read-only by the user. Don’t explicitly assign a value to it — you would create an independent local variable with the same name masking the built-in variable with its magic behavior. In addition to "int" and "float", Python supports other types of numbers, such as "Decimal" and "Fraction". Python also has built-in support for complex numbers, and uses the "j" or "J" suffix to indicate the imaginary part (e.g. "3+5j"). 3.1.2. Strings -------------- Besides numbers, Python can also manipulate strings, which can be expressed in several ways. They can be enclosed in single quotes ("'...'") or double quotes (""..."") with the same result [2]. "\" can be used to escape quotes: >>> 'spam eggs' # single quotes 'spam eggs' >>> 'doesn\'t' # use \' to escape the single quote... "doesn't" >>> "doesn't" # ...or use double quotes instead "doesn't" >>> '"Yes," they said.' '"Yes," they said.' >>> "\"Yes,\" they said." '"Yes," they said.' >>> '"Isn\'t," they said.' '"Isn\'t," they said.' In the interactive interpreter, the output string is enclosed in quotes and special characters are escaped with backslashes. While this might sometimes look different from the input (the enclosing quotes could change), the two strings are equivalent. The string is enclosed in double quotes if the string contains a single quote and no double quotes, otherwise it is enclosed in single quotes. The "print()" function produces a more readable output, by omitting the enclosing quotes and by printing escaped and special characters: >>> '"Isn\'t," they said.' '"Isn\'t," they said.' >>> print('"Isn\'t," they said.') "Isn't," they said. >>> s = 'First line.\nSecond line.' # \n means newline >>> s # without print(), \n is included in the output 'First line.\nSecond line.' >>> print(s) # with print(), \n produces a new line First line. Second line. If you don’t want characters prefaced by "\" to be interpreted as special characters, you can use *raw strings* by adding an "r" before the first quote: >>> print('C:\some\name') # here \n means newline! C:\some ame >>> print(r'C:\some\name') # note the r before the quote C:\some\name String literals can span multiple lines. One way is using triple- quotes: """"..."""" or "'''...'''". End of lines are automatically included in the string, but it’s possible to prevent this by adding a "\" at the end of the line. The following example: print("""\ Usage: thingy [OPTIONS] -h Display this usage message -H hostname Hostname to connect to """) produces the following output (note that the initial newline is not included): Usage: thingy [OPTIONS] -h Display this usage message -H hostname Hostname to connect to Strings can be concatenated (glued together) with the "+" operator, and repeated with "*": >>> # 3 times 'un', followed by 'ium' >>> 3 * 'un' + 'ium' 'unununium' Two or more *string literals* (i.e. the ones enclosed between quotes) next to each other are automatically concatenated. >>> 'Py' 'thon' 'Python' This feature is particularly useful when you want to break long strings: >>> text = ('Put several strings within parentheses ' ... 'to have them joined together.') >>> text 'Put several strings within parentheses to have them joined together.' This only works with two literals though, not with variables or expressions: >>> prefix = 'Py' >>> prefix 'thon' # can't concatenate a variable and a string literal File "", line 1 prefix 'thon' ^ SyntaxError: invalid syntax >>> ('un' * 3) 'ium' File "", line 1 ('un' * 3) 'ium' ^ SyntaxError: invalid syntax If you want to concatenate variables or a variable and a literal, use "+": >>> prefix + 'thon' 'Python' Strings can be *indexed* (subscripted), with the first character having index 0. There is no separate character type; a character is simply a string of size one: >>> word = 'Python' >>> word[0] # character in position 0 'P' >>> word[5] # character in position 5 'n' Indices may also be negative numbers, to start counting from the right: >>> word[-1] # last character 'n' >>> word[-2] # second-last character 'o' >>> word[-6] 'P' Note that since -0 is the same as 0, negative indices start from -1. In addition to indexing, *slicing* is also supported. While indexing is used to obtain individual characters, *slicing* allows you to obtain substring: >>> word[0:2] # characters from position 0 (included) to 2 (excluded) 'Py' >>> word[2:5] # characters from position 2 (included) to 5 (excluded) 'tho' Slice indices have useful defaults; an omitted first index defaults to zero, an omitted second index defaults to the size of the string being sliced. >>> word[:2] # character from the beginning to position 2 (excluded) 'Py' >>> word[4:] # characters from position 4 (included) to the end 'on' >>> word[-2:] # characters from the second-last (included) to the end 'on' Note how the start is always included, and the end always excluded. This makes sure that "s[:i] + s[i:]" is always equal to "s": >>> word[:2] + word[2:] 'Python' >>> word[:4] + word[4:] 'Python' One way to remember how slices work is to think of the indices as pointing *between* characters, with the left edge of the first character numbered 0. Then the right edge of the last character of a string of *n* characters has index *n*, for example: +---+---+---+---+---+---+ | P | y | t | h | o | n | +---+---+---+---+---+---+ 0 1 2 3 4 5 6 -6 -5 -4 -3 -2 -1 The first row of numbers gives the position of the indices 0…6 in the string; the second row gives the corresponding negative indices. The slice from *i* to *j* consists of all characters between the edges labeled *i* and *j*, respectively. For non-negative indices, the length of a slice is the difference of the indices, if both are within bounds. For example, the length of "word[1:3]" is 2. Attempting to use an index that is too large will result in an error: >>> word[42] # the word only has 6 characters Traceback (most recent call last): File "", line 1, in IndexError: string index out of range However, out of range slice indexes are handled gracefully when used for slicing: >>> word[4:42] 'on' >>> word[42:] '' Python strings cannot be changed — they are *immutable*. Therefore, assigning to an indexed position in the string results in an error: >>> word[0] = 'J' Traceback (most recent call last): File "", line 1, in TypeError: 'str' object does not support item assignment >>> word[2:] = 'py' Traceback (most recent call last): File "", line 1, in TypeError: 'str' object does not support item assignment If you need a different string, you should create a new one: >>> 'J' + word[1:] 'Jython' >>> word[:2] + 'py' 'Pypy' The built-in function "len()" returns the length of a string: >>> s = 'supercalifragilisticexpialidocious' >>> len(s) 34 See also: Text Sequence Type — str Strings are examples of *sequence types*, and support the common operations supported by such types. String Methods Strings support a large number of methods for basic transformations and searching. Formatted string literals String literals that have embedded expressions. Format String Syntax Information about string formatting with "str.format()". printf-style String Formatting The old formatting operations invoked when strings are the left operand of the "%" operator are described in more detail here. 3.1.3. Lists ------------ Python knows a number of *compound* data types, used to group together other values. The most versatile is the *list*, which can be written as a list of comma-separated values (items) between square brackets. Lists might contain items of different types, but usually the items all have the same type. >>> squares = [1, 4, 9, 16, 25] >>> squares [1, 4, 9, 16, 25] Like strings (and all other built-in *sequence* types), lists can be indexed and sliced: >>> squares[0] # indexing returns the item 1 >>> squares[-1] 25 >>> squares[-3:] # slicing returns a new list [9, 16, 25] All slice operations return a new list containing the requested elements. This means that the following slice returns a shallow copy of the list: >>> squares[:] [1, 4, 9, 16, 25] Lists also support operations like concatenation: >>> squares + [36, 49, 64, 81, 100] [1, 4, 9, 16, 25, 36, 49, 64, 81, 100] Unlike strings, which are *immutable*, lists are a *mutable* type, i.e. it is possible to change their content: >>> cubes = [1, 8, 27, 65, 125] # something's wrong here >>> 4 ** 3 # the cube of 4 is 64, not 65! 64 >>> cubes[3] = 64 # replace the wrong value >>> cubes [1, 8, 27, 64, 125] You can also add new items at the end of the list, by using the "append()" *method* (we will see more about methods later): >>> cubes.append(216) # add the cube of 6 >>> cubes.append(7 ** 3) # and the cube of 7 >>> cubes [1, 8, 27, 64, 125, 216, 343] Assignment to slices is also possible, and this can even change the size of the list or clear it entirely: >>> letters = ['a', 'b', 'c', 'd', 'e', 'f', 'g'] >>> letters ['a', 'b', 'c', 'd', 'e', 'f', 'g'] >>> # replace some values >>> letters[2:5] = ['C', 'D', 'E'] >>> letters ['a', 'b', 'C', 'D', 'E', 'f', 'g'] >>> # now remove them >>> letters[2:5] = [] >>> letters ['a', 'b', 'f', 'g'] >>> # clear the list by replacing all the elements with an empty list >>> letters[:] = [] >>> letters [] The built-in function "len()" also applies to lists: >>> letters = ['a', 'b', 'c', 'd'] >>> len(letters) 4 It is possible to nest lists (create lists containing other lists), for example: >>> a = ['a', 'b', 'c'] >>> n = [1, 2, 3] >>> x = [a, n] >>> x [['a', 'b', 'c'], [1, 2, 3]] >>> x[0] ['a', 'b', 'c'] >>> x[0][1] 'b' 3.2. First Steps Towards Programming ==================================== Of course, we can use Python for more complicated tasks than adding two and two together. For instance, we can write an initial sub- sequence of the Fibonacci series as follows: >>> # Fibonacci series: ... # the sum of two elements defines the next ... a, b = 0, 1 >>> while a < 10: ... print(a) ... a, b = b, a+b ... 0 1 1 2 3 5 8 This example introduces several new features. * The first line contains a *multiple assignment*: the variables "a" and "b" simultaneously get the new values 0 and 1. On the last line this is used again, demonstrating that the expressions on the right- hand side are all evaluated first before any of the assignments take place. The right-hand side expressions are evaluated from the left to the right. * The "while" loop executes as long as the condition (here: "a < 10") remains true. In Python, like in C, any non-zero integer value is true; zero is false. The condition may also be a string or list value, in fact any sequence; anything with a non-zero length is true, empty sequences are false. The test used in the example is a simple comparison. The standard comparison operators are written the same as in C: "<" (less than), ">" (greater than), "==" (equal to), "<=" (less than or equal to), ">=" (greater than or equal to) and "!=" (not equal to). * The *body* of the loop is *indented*: indentation is Python’s way of grouping statements. At the interactive prompt, you have to type a tab or space(s) for each indented line. In practice you will prepare more complicated input for Python with a text editor; all decent text editors have an auto-indent facility. When a compound statement is entered interactively, it must be followed by a blank line to indicate completion (since the parser cannot guess when you have typed the last line). Note that each line within a basic block must be indented by the same amount. * The "print()" function writes the value of the argument(s) it is given. It differs from just writing the expression you want to write (as we did earlier in the calculator examples) in the way it handles multiple arguments, floating point quantities, and strings. Strings are printed without quotes, and a space is inserted between items, so you can format things nicely, like this: >>> i = 256*256 >>> print('The value of i is', i) The value of i is 65536 The keyword argument *end* can be used to avoid the newline after the output, or end the output with a different string: >>> a, b = 0, 1 >>> while a < 1000: ... print(a, end=',') ... a, b = b, a+b ... 0,1,1,2,3,5,8,13,21,34,55,89,144,233,377,610,987, -[ Footnotes ]- [1] Since "**" has higher precedence than "-", "-3**2" will be interpreted as "-(3**2)" and thus result in "-9". To avoid this and get "9", you can use "(-3)**2". [2] Unlike other languages, special characters such as "\n" have the same meaning with both single ("'...'") and double (""..."") quotes. The only difference between the two is that within single quotes you don’t need to escape """ (but you have to escape "\'") and vice versa. 6. Modules ********** If you quit from the Python interpreter and enter it again, the definitions you have made (functions and variables) are lost. Therefore, if you want to write a somewhat longer program, you are better off using a text editor to prepare the input for the interpreter and running it with that file as input instead. This is known as creating a *script*. As your program gets longer, you may want to split it into several files for easier maintenance. You may also want to use a handy function that you’ve written in several programs without copying its definition into each program. To support this, Python has a way to put definitions in a file and use them in a script or in an interactive instance of the interpreter. Such a file is called a *module*; definitions from a module can be *imported* into other modules or into the *main* module (the collection of variables that you have access to in a script executed at the top level and in calculator mode). A module is a file containing Python definitions and statements. The file name is the module name with the suffix ".py" appended. Within a module, the module’s name (as a string) is available as the value of the global variable "__name__". For instance, use your favorite text editor to create a file called "fibo.py" in the current directory with the following contents: # Fibonacci numbers module def fib(n): # write Fibonacci series up to n a, b = 0, 1 while a < n: print(a, end=' ') a, b = b, a+b print() def fib2(n): # return Fibonacci series up to n result = [] a, b = 0, 1 while a < n: result.append(a) a, b = b, a+b return result Now enter the Python interpreter and import this module with the following command: >>> import fibo This does not enter the names of the functions defined in "fibo" directly in the current symbol table; it only enters the module name "fibo" there. Using the module name you can access the functions: >>> fibo.fib(1000) 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 610 987 >>> fibo.fib2(100) [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89] >>> fibo.__name__ 'fibo' If you intend to use a function often you can assign it to a local name: >>> fib = fibo.fib >>> fib(500) 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 6.1. More on Modules ==================== A module can contain executable statements as well as function definitions. These statements are intended to initialize the module. They are executed only the *first* time the module name is encountered in an import statement. [1] (They are also run if the file is executed as a script.) Each module has its own private symbol table, which is used as the global symbol table by all functions defined in the module. Thus, the author of a module can use global variables in the module without worrying about accidental clashes with a user’s global variables. On the other hand, if you know what you are doing you can touch a module’s global variables with the same notation used to refer to its functions, "modname.itemname". Modules can import other modules. It is customary but not required to place all "import" statements at the beginning of a module (or script, for that matter). The imported module names are placed in the importing module’s global symbol table. There is a variant of the "import" statement that imports names from a module directly into the importing module’s symbol table. For example: >>> from fibo import fib, fib2 >>> fib(500) 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 This does not introduce the module name from which the imports are taken in the local symbol table (so in the example, "fibo" is not defined). There is even a variant to import all names that a module defines: >>> from fibo import * >>> fib(500) 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 This imports all names except those beginning with an underscore ("_"). In most cases Python programmers do not use this facility since it introduces an unknown set of names into the interpreter, possibly hiding some things you have already defined. Note that in general the practice of importing "*" from a module or package is frowned upon, since it often causes poorly readable code. However, it is okay to use it to save typing in interactive sessions. If the module name is followed by "as", then the name following "as" is bound directly to the imported module. >>> import fibo as fib >>> fib.fib(500) 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 This is effectively importing the module in the same way that "import fibo" will do, with the only difference of it being available as "fib". It can also be used when utilising "from" with similar effects: >>> from fibo import fib as fibonacci >>> fibonacci(500) 0 1 1 2 3 5 8 13 21 34 55 89 144 233 377 Note: For efficiency reasons, each module is only imported once per interpreter session. Therefore, if you change your modules, you must restart the interpreter – or, if it’s just one module you want to test interactively, use "importlib.reload()", e.g. "import importlib; importlib.reload(modulename)". 6.1.1. Executing modules as scripts ----------------------------------- When you run a Python module with python fibo.py the code in the module will be executed, just as if you imported it, but with the "__name__" set to ""__main__"". That means that by adding this code at the end of your module: if __name__ == "__main__": import sys fib(int(sys.argv[1])) you can make the file usable as a script as well as an importable module, because the code that parses the command line only runs if the module is executed as the “main” file: $ python fibo.py 50 0 1 1 2 3 5 8 13 21 34 If the module is imported, the code is not run: >>> import fibo >>> This is often used either to provide a convenient user interface to a module, or for testing purposes (running the module as a script executes a test suite). 6.1.2. The Module Search Path ----------------------------- When a module named "spam" is imported, the interpreter first searches for a built-in module with that name. These module names are listed in "sys.builtin_module_names". If not found, it then searches for a file named "spam.py" in a list of directories given by the variable "sys.path". "sys.path" is initialized from these locations: * The directory containing the input script (or the current directory when no file is specified). * "PYTHONPATH" (a list of directory names, with the same syntax as the shell variable "PATH"). * The installation-dependent default (by convention including a "site- packages" directory, handled by the "site" module). Note: On file systems which support symlinks, the directory containing the input script is calculated after the symlink is followed. In other words the directory containing the symlink is **not** added to the module search path. After initialization, Python programs can modify "sys.path". The directory containing the script being run is placed at the beginning of the search path, ahead of the standard library path. This means that scripts in that directory will be loaded instead of modules of the same name in the library directory. This is an error unless the replacement is intended. See section Standard Modules for more information. 6.1.3. “Compiled” Python files ------------------------------ To speed up loading modules, Python caches the compiled version of each module in the "__pycache__" directory under the name "module.*version*.pyc", where the version encodes the format of the compiled file; it generally contains the Python version number. For example, in CPython release 3.3 the compiled version of spam.py would be cached as "__pycache__/spam.cpython-33.pyc". This naming convention allows compiled modules from different releases and different versions of Python to coexist. Python checks the modification date of the source against the compiled version to see if it’s out of date and needs to be recompiled. This is a completely automatic process. Also, the compiled modules are platform-independent, so the same library can be shared among systems with different architectures. Python does not check the cache in two circumstances. First, it always recompiles and does not store the result for the module that’s loaded directly from the command line. Second, it does not check the cache if there is no source module. To support a non-source (compiled only) distribution, the compiled module must be in the source directory, and there must not be a source module. Some tips for experts: * You can use the "-O" or "-OO" switches on the Python command to reduce the size of a compiled module. The "-O" switch removes assert statements, the "-OO" switch removes both assert statements and __doc__ strings. Since some programs may rely on having these available, you should only use this option if you know what you’re doing. “Optimized” modules have an "opt-" tag and are usually smaller. Future releases may change the effects of optimization. * A program doesn’t run any faster when it is read from a ".pyc" file than when it is read from a ".py" file; the only thing that’s faster about ".pyc" files is the speed with which they are loaded. * The module "compileall" can create .pyc files for all modules in a directory. * There is more detail on this process, including a flow chart of the decisions, in **PEP 3147**. 6.2. Standard Modules ===================== Python comes with a library of standard modules, described in a separate document, the Python Library Reference (“Library Reference” hereafter). Some modules are built into the interpreter; these provide access to operations that are not part of the core of the language but are nevertheless built in, either for efficiency or to provide access to operating system primitives such as system calls. The set of such modules is a configuration option which also depends on the underlying platform. For example, the "winreg" module is only provided on Windows systems. One particular module deserves some attention: "sys", which is built into every Python interpreter. The variables "sys.ps1" and "sys.ps2" define the strings used as primary and secondary prompts: >>> import sys >>> sys.ps1 '>>> ' >>> sys.ps2 '... ' >>> sys.ps1 = 'C> ' C> print('Yuck!') Yuck! C> These two variables are only defined if the interpreter is in interactive mode. The variable "sys.path" is a list of strings that determines the interpreter’s search path for modules. It is initialized to a default path taken from the environment variable "PYTHONPATH", or from a built-in default if "PYTHONPATH" is not set. You can modify it using standard list operations: >>> import sys >>> sys.path.append('/ufs/guido/lib/python') 6.3. The "dir()" Function ========================= The built-in function "dir()" is used to find out which names a module defines. It returns a sorted list of strings: >>> import fibo, sys >>> dir(fibo) ['__name__', 'fib', 'fib2'] >>> dir(sys) ['__breakpointhook__', '__displayhook__', '__doc__', '__excepthook__', '__interactivehook__', '__loader__', '__name__', '__package__', '__spec__', '__stderr__', '__stdin__', '__stdout__', '__unraisablehook__', '_clear_type_cache', '_current_frames', '_debugmallocstats', '_framework', '_getframe', '_git', '_home', '_xoptions', 'abiflags', 'addaudithook', 'api_version', 'argv', 'audit', 'base_exec_prefix', 'base_prefix', 'breakpointhook', 'builtin_module_names', 'byteorder', 'call_tracing', 'callstats', 'copyright', 'displayhook', 'dont_write_bytecode', 'exc_info', 'excepthook', 'exec_prefix', 'executable', 'exit', 'flags', 'float_info', 'float_repr_style', 'get_asyncgen_hooks', 'get_coroutine_origin_tracking_depth', 'getallocatedblocks', 'getdefaultencoding', 'getdlopenflags', 'getfilesystemencodeerrors', 'getfilesystemencoding', 'getprofile', 'getrecursionlimit', 'getrefcount', 'getsizeof', 'getswitchinterval', 'gettrace', 'hash_info', 'hexversion', 'implementation', 'int_info', 'intern', 'is_finalizing', 'last_traceback', 'last_type', 'last_value', 'maxsize', 'maxunicode', 'meta_path', 'modules', 'path', 'path_hooks', 'path_importer_cache', 'platform', 'prefix', 'ps1', 'ps2', 'pycache_prefix', 'set_asyncgen_hooks', 'set_coroutine_origin_tracking_depth', 'setdlopenflags', 'setprofile', 'setrecursionlimit', 'setswitchinterval', 'settrace', 'stderr', 'stdin', 'stdout', 'thread_info', 'unraisablehook', 'version', 'version_info', 'warnoptions'] Without arguments, "dir()" lists the names you have defined currently: >>> a = [1, 2, 3, 4, 5] >>> import fibo >>> fib = fibo.fib >>> dir() ['__builtins__', '__name__', 'a', 'fib', 'fibo', 'sys'] Note that it lists all types of names: variables, modules, functions, etc. "dir()" does not list the names of built-in functions and variables. If you want a list of those, they are defined in the standard module "builtins": >>> import builtins >>> dir(builtins) ['ArithmeticError', 'AssertionError', 'AttributeError', 'BaseException', 'BlockingIOError', 'BrokenPipeError', 'BufferError', 'BytesWarning', 'ChildProcessError', 'ConnectionAbortedError', 'ConnectionError', 'ConnectionRefusedError', 'ConnectionResetError', 'DeprecationWarning', 'EOFError', 'Ellipsis', 'EnvironmentError', 'Exception', 'False', 'FileExistsError', 'FileNotFoundError', 'FloatingPointError', 'FutureWarning', 'GeneratorExit', 'IOError', 'ImportError', 'ImportWarning', 'IndentationError', 'IndexError', 'InterruptedError', 'IsADirectoryError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'NameError', 'None', 'NotADirectoryError', 'NotImplemented', 'NotImplementedError', 'OSError', 'OverflowError', 'PendingDeprecationWarning', 'PermissionError', 'ProcessLookupError', 'ReferenceError', 'ResourceWarning', 'RuntimeError', 'RuntimeWarning', 'StopIteration', 'SyntaxError', 'SyntaxWarning', 'SystemError', 'SystemExit', 'TabError', 'TimeoutError', 'True', 'TypeError', 'UnboundLocalError', 'UnicodeDecodeError', 'UnicodeEncodeError', 'UnicodeError', 'UnicodeTranslateError', 'UnicodeWarning', 'UserWarning', 'ValueError', 'Warning', 'ZeroDivisionError', '_', '__build_class__', '__debug__', '__doc__', '__import__', '__name__', '__package__', 'abs', 'all', 'any', 'ascii', 'bin', 'bool', 'bytearray', 'bytes', 'callable', 'chr', 'classmethod', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dict', 'dir', 'divmod', 'enumerate', 'eval', 'exec', 'exit', 'filter', 'float', 'format', 'frozenset', 'getattr', 'globals', 'hasattr', 'hash', 'help', 'hex', 'id', 'input', 'int', 'isinstance', 'issubclass', 'iter', 'len', 'license', 'list', 'locals', 'map', 'max', 'memoryview', 'min', 'next', 'object', 'oct', 'open', 'ord', 'pow', 'print', 'property', 'quit', 'range', 'repr', 'reversed', 'round', 'set', 'setattr', 'slice', 'sorted', 'staticmethod', 'str', 'sum', 'super', 'tuple', 'type', 'vars', 'zip'] 6.4. Packages ============= Packages are a way of structuring Python’s module namespace by using “dotted module names”. For example, the module name "A.B" designates a submodule named "B" in a package named "A". Just like the use of modules saves the authors of different modules from having to worry about each other’s global variable names, the use of dotted module names saves the authors of multi-module packages like NumPy or Pillow from having to worry about each other’s module names. Suppose you want to design a collection of modules (a “package”) for the uniform handling of sound files and sound data. There are many different sound file formats (usually recognized by their extension, for example: ".wav", ".aiff", ".au"), so you may need to create and maintain a growing collection of modules for the conversion between the various file formats. There are also many different operations you might want to perform on sound data (such as mixing, adding echo, applying an equalizer function, creating an artificial stereo effect), so in addition you will be writing a never-ending stream of modules to perform these operations. Here’s a possible structure for your package (expressed in terms of a hierarchical filesystem): sound/ Top-level package __init__.py Initialize the sound package formats/ Subpackage for file format conversions __init__.py wavread.py wavwrite.py aiffread.py aiffwrite.py auread.py auwrite.py ... effects/ Subpackage for sound effects __init__.py echo.py surround.py reverse.py ... filters/ Subpackage for filters __init__.py equalizer.py vocoder.py karaoke.py ... When importing the package, Python searches through the directories on "sys.path" looking for the package subdirectory. The "__init__.py" files are required to make Python treat directories containing the file as packages. This prevents directories with a common name, such as "string", unintentionally hiding valid modules that occur later on the module search path. In the simplest case, "__init__.py" can just be an empty file, but it can also execute initialization code for the package or set the "__all__" variable, described later. Users of the package can import individual modules from the package, for example: import sound.effects.echo This loads the submodule "sound.effects.echo". It must be referenced with its full name. sound.effects.echo.echofilter(input, output, delay=0.7, atten=4) An alternative way of importing the submodule is: from sound.effects import echo This also loads the submodule "echo", and makes it available without its package prefix, so it can be used as follows: echo.echofilter(input, output, delay=0.7, atten=4) Yet another variation is to import the desired function or variable directly: from sound.effects.echo import echofilter Again, this loads the submodule "echo", but this makes its function "echofilter()" directly available: echofilter(input, output, delay=0.7, atten=4) Note that when using "from package import item", the item can be either a submodule (or subpackage) of the package, or some other name defined in the package, like a function, class or variable. The "import" statement first tests whether the item is defined in the package; if not, it assumes it is a module and attempts to load it. If it fails to find it, an "ImportError" exception is raised. Contrarily, when using syntax like "import item.subitem.subsubitem", each item except for the last must be a package; the last item can be a module or a package but can’t be a class or function or variable defined in the previous item. 6.4.1. Importing * From a Package --------------------------------- Now what happens when the user writes "from sound.effects import *"? Ideally, one would hope that this somehow goes out to the filesystem, finds which submodules are present in the package, and imports them all. This could take a long time and importing sub-modules might have unwanted side-effects that should only happen when the sub-module is explicitly imported. The only solution is for the package author to provide an explicit index of the package. The "import" statement uses the following convention: if a package’s "__init__.py" code defines a list named "__all__", it is taken to be the list of module names that should be imported when "from package import *" is encountered. It is up to the package author to keep this list up-to-date when a new version of the package is released. Package authors may also decide not to support it, if they don’t see a use for importing * from their package. For example, the file "sound/effects/__init__.py" could contain the following code: __all__ = ["echo", "surround", "reverse"] This would mean that "from sound.effects import *" would import the three named submodules of the "sound.effects" package. If "__all__" is not defined, the statement "from sound.effects import *" does *not* import all submodules from the package "sound.effects" into the current namespace; it only ensures that the package "sound.effects" has been imported (possibly running any initialization code in "__init__.py") and then imports whatever names are defined in the package. This includes any names defined (and submodules explicitly loaded) by "__init__.py". It also includes any submodules of the package that were explicitly loaded by previous "import" statements. Consider this code: import sound.effects.echo import sound.effects.surround from sound.effects import * In this example, the "echo" and "surround" modules are imported in the current namespace because they are defined in the "sound.effects" package when the "from...import" statement is executed. (This also works when "__all__" is defined.) Although certain modules are designed to export only names that follow certain patterns when you use "import *", it is still considered bad practice in production code. Remember, there is nothing wrong with using "from package import specific_submodule"! In fact, this is the recommended notation unless the importing module needs to use submodules with the same name from different packages. 6.4.2. Intra-package References ------------------------------- When packages are structured into subpackages (as with the "sound" package in the example), you can use absolute imports to refer to submodules of siblings packages. For example, if the module "sound.filters.vocoder" needs to use the "echo" module in the "sound.effects" package, it can use "from sound.effects import echo". You can also write relative imports, with the "from module import name" form of import statement. These imports use leading dots to indicate the current and parent packages involved in the relative import. From the "surround" module for example, you might use: from . import echo from .. import formats from ..filters import equalizer Note that relative imports are based on the name of the current module. Since the name of the main module is always ""__main__"", modules intended for use as the main module of a Python application must always use absolute imports. 6.4.3. Packages in Multiple Directories --------------------------------------- Packages support one more special attribute, "__path__". This is initialized to be a list containing the name of the directory holding the package’s "__init__.py" before the code in that file is executed. This variable can be modified; doing so affects future searches for modules and subpackages contained in the package. While this feature is not often needed, it can be used to extend the set of modules found in a package. -[ Footnotes ]- [1] In fact function definitions are also ‘statements’ that are ‘executed’; the execution of a module-level function definition enters the function name in the module’s global symbol table. 10. Brief Tour of the Standard Library ************************************** 10.1. Operating System Interface ================================ The "os" module provides dozens of functions for interacting with the operating system: >>> import os >>> os.getcwd() # Return the current working directory 'C:\\Python39' >>> os.chdir('/server/accesslogs') # Change current working directory >>> os.system('mkdir today') # Run the command mkdir in the system shell 0 Be sure to use the "import os" style instead of "from os import *". This will keep "os.open()" from shadowing the built-in "open()" function which operates much differently. The built-in "dir()" and "help()" functions are useful as interactive aids for working with large modules like "os": >>> import os >>> dir(os) >>> help(os) For daily file and directory management tasks, the "shutil" module provides a higher level interface that is easier to use: >>> import shutil >>> shutil.copyfile('data.db', 'archive.db') 'archive.db' >>> shutil.move('/build/executables', 'installdir') 'installdir' 10.2. File Wildcards ==================== The "glob" module provides a function for making file lists from directory wildcard searches: >>> import glob >>> glob.glob('*.py') ['primes.py', 'random.py', 'quote.py'] 10.3. Command Line Arguments ============================ Common utility scripts often need to process command line arguments. These arguments are stored in the "sys" module’s *argv* attribute as a list. For instance the following output results from running "python demo.py one two three" at the command line: >>> import sys >>> print(sys.argv) ['demo.py', 'one', 'two', 'three'] The "argparse" module provides a more sophisticated mechanism to process command line arguments. The following script extracts one or more filenames and an optional number of lines to be displayed: import argparse parser = argparse.ArgumentParser( prog='top', description='Show top lines from each file') parser.add_argument('filenames', nargs='+') parser.add_argument('-l', '--lines', type=int, default=10) args = parser.parse_args() print(args) When run at the command line with "python top.py --lines=5 alpha.txt beta.txt", the script sets "args.lines" to "5" and "args.filenames" to "['alpha.txt', 'beta.txt']". 10.4. Error Output Redirection and Program Termination ====================================================== The "sys" module also has attributes for *stdin*, *stdout*, and *stderr*. The latter is useful for emitting warnings and error messages to make them visible even when *stdout* has been redirected: >>> sys.stderr.write('Warning, log file not found starting a new one\n') Warning, log file not found starting a new one The most direct way to terminate a script is to use "sys.exit()". 10.5. String Pattern Matching ============================= The "re" module provides regular expression tools for advanced string processing. For complex matching and manipulation, regular expressions offer succinct, optimized solutions: >>> import re >>> re.findall(r'\bf[a-z]*', 'which foot or hand fell fastest') ['foot', 'fell', 'fastest'] >>> re.sub(r'(\b[a-z]+) \1', r'\1', 'cat in the the hat') 'cat in the hat' When only simple capabilities are needed, string methods are preferred because they are easier to read and debug: >>> 'tea for too'.replace('too', 'two') 'tea for two' 10.6. Mathematics ================= The "math" module gives access to the underlying C library functions for floating point math: >>> import math >>> math.cos(math.pi / 4) 0.70710678118654757 >>> math.log(1024, 2) 10.0 The "random" module provides tools for making random selections: >>> import random >>> random.choice(['apple', 'pear', 'banana']) 'apple' >>> random.sample(range(100), 10) # sampling without replacement [30, 83, 16, 4, 8, 81, 41, 50, 18, 33] >>> random.random() # random float 0.17970987693706186 >>> random.randrange(6) # random integer chosen from range(6) 4 The "statistics" module calculates basic statistical properties (the mean, median, variance, etc.) of numeric data: >>> import statistics >>> data = [2.75, 1.75, 1.25, 0.25, 0.5, 1.25, 3.5] >>> statistics.mean(data) 1.6071428571428572 >>> statistics.median(data) 1.25 >>> statistics.variance(data) 1.3720238095238095 The SciPy project has many other modules for numerical computations. 10.7. Internet Access ===================== There are a number of modules for accessing the internet and processing internet protocols. Two of the simplest are "urllib.request" for retrieving data from URLs and "smtplib" for sending mail: >>> from urllib.request import urlopen >>> with urlopen('http://worldtimeapi.org/api/timezone/etc/UTC.txt') as response: ... for line in response: ... line = line.decode() # Convert bytes to a str ... if line.startswith('datetime'): ... print(line.rstrip()) # Remove trailing newline ... datetime: 2022-01-01T01:36:47.689215+00:00 >>> import smtplib >>> server = smtplib.SMTP('localhost') >>> server.sendmail('soothsayer@example.org', 'jcaesar@example.org', ... """To: jcaesar@example.org ... From: soothsayer@example.org ... ... Beware the Ides of March. ... """) >>> server.quit() (Note that the second example needs a mailserver running on localhost.) 10.8. Dates and Times ===================== The "datetime" module supplies classes for manipulating dates and times in both simple and complex ways. While date and time arithmetic is supported, the focus of the implementation is on efficient member extraction for output formatting and manipulation. The module also supports objects that are timezone aware. >>> # dates are easily constructed and formatted >>> from datetime import date >>> now = date.today() >>> now datetime.date(2003, 12, 2) >>> now.strftime("%m-%d-%y. %d %b %Y is a %A on the %d day of %B.") '12-02-03. 02 Dec 2003 is a Tuesday on the 02 day of December.' >>> # dates support calendar arithmetic >>> birthday = date(1964, 7, 31) >>> age = now - birthday >>> age.days 14368 10.9. Data Compression ====================== Common data archiving and compression formats are directly supported by modules including: "zlib", "gzip", "bz2", "lzma", "zipfile" and "tarfile". >>> import zlib >>> s = b'witch which has which witches wrist watch' >>> len(s) 41 >>> t = zlib.compress(s) >>> len(t) 37 >>> zlib.decompress(t) b'witch which has which witches wrist watch' >>> zlib.crc32(s) 226805979 10.10. Performance Measurement ============================== Some Python users develop a deep interest in knowing the relative performance of different approaches to the same problem. Python provides a measurement tool that answers those questions immediately. For example, it may be tempting to use the tuple packing and unpacking feature instead of the traditional approach to swapping arguments. The "timeit" module quickly demonstrates a modest performance advantage: >>> from timeit import Timer >>> Timer('t=a; a=b; b=t', 'a=1; b=2').timeit() 0.57535828626024577 >>> Timer('a,b = b,a', 'a=1; b=2').timeit() 0.54962537085770791 In contrast to "timeit"’s fine level of granularity, the "profile" and "pstats" modules provide tools for identifying time critical sections in larger blocks of code. 10.11. Quality Control ====================== One approach for developing high quality software is to write tests for each function as it is developed and to run those tests frequently during the development process. The "doctest" module provides a tool for scanning a module and validating tests embedded in a program’s docstrings. Test construction is as simple as cutting-and-pasting a typical call along with its results into the docstring. This improves the documentation by providing the user with an example and it allows the doctest module to make sure the code remains true to the documentation: def average(values): """Computes the arithmetic mean of a list of numbers. >>> print(average([20, 30, 70])) 40.0 """ return sum(values) / len(values) import doctest doctest.testmod() # automatically validate the embedded tests The "unittest" module is not as effortless as the "doctest" module, but it allows a more comprehensive set of tests to be maintained in a separate file: import unittest class TestStatisticalFunctions(unittest.TestCase): def test_average(self): self.assertEqual(average([20, 30, 70]), 40.0) self.assertEqual(round(average([1, 5, 7]), 1), 4.3) with self.assertRaises(ZeroDivisionError): average([]) with self.assertRaises(TypeError): average(20, 30, 70) unittest.main() # Calling from the command line invokes all tests 10.12. Batteries Included ========================= Python has a “batteries included” philosophy. This is best seen through the sophisticated and robust capabilities of its larger packages. For example: * The "xmlrpc.client" and "xmlrpc.server" modules make implementing remote procedure calls into an almost trivial task. Despite the modules’ names, no direct knowledge or handling of XML is needed. * The "email" package is a library for managing email messages, including MIME and other **RFC 2822**-based message documents. Unlike "smtplib" and "poplib" which actually send and receive messages, the email package has a complete toolset for building or decoding complex message structures (including attachments) and for implementing internet encoding and header protocols. * The "json" package provides robust support for parsing this popular data interchange format. The "csv" module supports direct reading and writing of files in Comma-Separated Value format, commonly supported by databases and spreadsheets. XML processing is supported by the "xml.etree.ElementTree", "xml.dom" and "xml.sax" packages. Together, these modules and packages greatly simplify data interchange between Python applications and other tools. * The "sqlite3" module is a wrapper for the SQLite database library, providing a persistent database that can be updated and accessed using slightly nonstandard SQL syntax. * Internationalization is supported by a number of modules including "gettext", "locale", and the "codecs" package. 11. Brief Tour of the Standard Library — Part II ************************************************ This second tour covers more advanced modules that support professional programming needs. These modules rarely occur in small scripts. 11.1. Output Formatting ======================= The "reprlib" module provides a version of "repr()" customized for abbreviated displays of large or deeply nested containers: >>> import reprlib >>> reprlib.repr(set('supercalifragilisticexpialidocious')) "{'a', 'c', 'd', 'e', 'f', 'g', ...}" The "pprint" module offers more sophisticated control over printing both built-in and user defined objects in a way that is readable by the interpreter. When the result is longer than one line, the “pretty printer” adds line breaks and indentation to more clearly reveal data structure: >>> import pprint >>> t = [[[['black', 'cyan'], 'white', ['green', 'red']], [['magenta', ... 'yellow'], 'blue']]] ... >>> pprint.pprint(t, width=30) [[[['black', 'cyan'], 'white', ['green', 'red']], [['magenta', 'yellow'], 'blue']]] The "textwrap" module formats paragraphs of text to fit a given screen width: >>> import textwrap >>> doc = """The wrap() method is just like fill() except that it returns ... a list of strings instead of one big string with newlines to separate ... the wrapped lines.""" ... >>> print(textwrap.fill(doc, width=40)) The wrap() method is just like fill() except that it returns a list of strings instead of one big string with newlines to separate the wrapped lines. The "locale" module accesses a database of culture specific data formats. The grouping attribute of locale’s format function provides a direct way of formatting numbers with group separators: >>> import locale >>> locale.setlocale(locale.LC_ALL, 'English_United States.1252') 'English_United States.1252' >>> conv = locale.localeconv() # get a mapping of conventions >>> x = 1234567.8 >>> locale.format("%d", x, grouping=True) '1,234,567' >>> locale.format_string("%s%.*f", (conv['currency_symbol'], ... conv['frac_digits'], x), grouping=True) '$1,234,567.80' 11.2. Templating ================ The "string" module includes a versatile "Template" class with a simplified syntax suitable for editing by end-users. This allows users to customize their applications without having to alter the application. The format uses placeholder names formed by "$" with valid Python identifiers (alphanumeric characters and underscores). Surrounding the placeholder with braces allows it to be followed by more alphanumeric letters with no intervening spaces. Writing "$$" creates a single escaped "$": >>> from string import Template >>> t = Template('${village}folk send $$10 to $cause.') >>> t.substitute(village='Nottingham', cause='the ditch fund') 'Nottinghamfolk send $10 to the ditch fund.' The "substitute()" method raises a "KeyError" when a placeholder is not supplied in a dictionary or a keyword argument. For mail-merge style applications, user supplied data may be incomplete and the "safe_substitute()" method may be more appropriate — it will leave placeholders unchanged if data is missing: >>> t = Template('Return the $item to $owner.') >>> d = dict(item='unladen swallow') >>> t.substitute(d) Traceback (most recent call last): ... KeyError: 'owner' >>> t.safe_substitute(d) 'Return the unladen swallow to $owner.' Template subclasses can specify a custom delimiter. For example, a batch renaming utility for a photo browser may elect to use percent signs for placeholders such as the current date, image sequence number, or file format: >>> import time, os.path >>> photofiles = ['img_1074.jpg', 'img_1076.jpg', 'img_1077.jpg'] >>> class BatchRename(Template): ... delimiter = '%' >>> fmt = input('Enter rename style (%d-date %n-seqnum %f-format): ') Enter rename style (%d-date %n-seqnum %f-format): Ashley_%n%f >>> t = BatchRename(fmt) >>> date = time.strftime('%d%b%y') >>> for i, filename in enumerate(photofiles): ... base, ext = os.path.splitext(filename) ... newname = t.substitute(d=date, n=i, f=ext) ... print('{0} --> {1}'.format(filename, newname)) img_1074.jpg --> Ashley_0.jpg img_1076.jpg --> Ashley_1.jpg img_1077.jpg --> Ashley_2.jpg Another application for templating is separating program logic from the details of multiple output formats. This makes it possible to substitute custom templates for XML files, plain text reports, and HTML web reports. 11.3. Working with Binary Data Record Layouts ============================================= The "struct" module provides "pack()" and "unpack()" functions for working with variable length binary record formats. The following example shows how to loop through header information in a ZIP file without using the "zipfile" module. Pack codes ""H"" and ""I"" represent two and four byte unsigned numbers respectively. The ""<"" indicates that they are standard size and in little-endian byte order: import struct with open('myfile.zip', 'rb') as f: data = f.read() start = 0 for i in range(3): # show the first 3 file headers start += 14 fields = struct.unpack('>> import weakref, gc >>> class A: ... def __init__(self, value): ... self.value = value ... def __repr__(self): ... return str(self.value) ... >>> a = A(10) # create a reference >>> d = weakref.WeakValueDictionary() >>> d['primary'] = a # does not create a reference >>> d['primary'] # fetch the object if it is still alive 10 >>> del a # remove the one reference >>> gc.collect() # run garbage collection right away 0 >>> d['primary'] # entry was automatically removed Traceback (most recent call last): File "", line 1, in d['primary'] # entry was automatically removed File "C:/python39/lib/weakref.py", line 46, in __getitem__ o = self.data[key]() KeyError: 'primary' 11.7. Tools for Working with Lists ================================== Many data structure needs can be met with the built-in list type. However, sometimes there is a need for alternative implementations with different performance trade-offs. The "array" module provides an "array()" object that is like a list that stores only homogeneous data and stores it more compactly. The following example shows an array of numbers stored as two byte unsigned binary numbers (typecode ""H"") rather than the usual 16 bytes per entry for regular lists of Python int objects: >>> from array import array >>> a = array('H', [4000, 10, 700, 22222]) >>> sum(a) 26932 >>> a[1:3] array('H', [10, 700]) The "collections" module provides a "deque()" object that is like a list with faster appends and pops from the left side but slower lookups in the middle. These objects are well suited for implementing queues and breadth first tree searches: >>> from collections import deque >>> d = deque(["task1", "task2", "task3"]) >>> d.append("task4") >>> print("Handling", d.popleft()) Handling task1 unsearched = deque([starting_node]) def breadth_first_search(unsearched): node = unsearched.popleft() for m in gen_moves(node): if is_goal(m): return m unsearched.append(m) In addition to alternative list implementations, the library also offers other tools such as the "bisect" module with functions for manipulating sorted lists: >>> import bisect >>> scores = [(100, 'perl'), (200, 'tcl'), (400, 'lua'), (500, 'python')] >>> bisect.insort(scores, (300, 'ruby')) >>> scores [(100, 'perl'), (200, 'tcl'), (300, 'ruby'), (400, 'lua'), (500, 'python')] The "heapq" module provides functions for implementing heaps based on regular lists. The lowest valued entry is always kept at position zero. This is useful for applications which repeatedly access the smallest element but do not want to run a full list sort: >>> from heapq import heapify, heappop, heappush >>> data = [1, 3, 5, 7, 9, 2, 4, 6, 8, 0] >>> heapify(data) # rearrange the list into heap order >>> heappush(data, -5) # add a new entry >>> [heappop(data) for i in range(3)] # fetch the three smallest entries [-5, 0, 1] 11.8. Decimal Floating Point Arithmetic ======================================= The "decimal" module offers a "Decimal" datatype for decimal floating point arithmetic. Compared to the built-in "float" implementation of binary floating point, the class is especially helpful for * financial applications and other uses which require exact decimal representation, * control over precision, * control over rounding to meet legal or regulatory requirements, * tracking of significant decimal places, or * applications where the user expects the results to match calculations done by hand. For example, calculating a 5% tax on a 70 cent phone charge gives different results in decimal floating point and binary floating point. The difference becomes significant if the results are rounded to the nearest cent: >>> from decimal import * >>> round(Decimal('0.70') * Decimal('1.05'), 2) Decimal('0.74') >>> round(.70 * 1.05, 2) 0.73 The "Decimal" result keeps a trailing zero, automatically inferring four place significance from multiplicands with two place significance. Decimal reproduces mathematics as done by hand and avoids issues that can arise when binary floating point cannot exactly represent decimal quantities. Exact representation enables the "Decimal" class to perform modulo calculations and equality tests that are unsuitable for binary floating point: >>> Decimal('1.00') % Decimal('.10') Decimal('0.00') >>> 1.00 % 0.10 0.09999999999999995 >>> sum([Decimal('0.1')]*10) == Decimal('1.0') True >>> sum([0.1]*10) == 1.0 False The "decimal" module provides arithmetic with as much precision as needed: >>> getcontext().prec = 36 >>> Decimal(1) / Decimal(7) Decimal('0.142857142857142857142857142857142857') 12. Virtual Environments and Packages ************************************* 12.1. Introduction ================== Python applications will often use packages and modules that don’t come as part of the standard library. Applications will sometimes need a specific version of a library, because the application may require that a particular bug has been fixed or the application may be written using an obsolete version of the library’s interface. This means it may not be possible for one Python installation to meet the requirements of every application. If application A needs version 1.0 of a particular module but application B needs version 2.0, then the requirements are in conflict and installing either version 1.0 or 2.0 will leave one application unable to run. The solution for this problem is to create a *virtual environment*, a self-contained directory tree that contains a Python installation for a particular version of Python, plus a number of additional packages. Different applications can then use different virtual environments. To resolve the earlier example of conflicting requirements, application A can have its own virtual environment with version 1.0 installed while application B has another virtual environment with version 2.0. If application B requires a library be upgraded to version 3.0, this will not affect application A’s environment. 12.2. Creating Virtual Environments =================================== The module used to create and manage virtual environments is called "venv". "venv" will usually install the most recent version of Python that you have available. If you have multiple versions of Python on your system, you can select a specific Python version by running "python3" or whichever version you want. To create a virtual environment, decide upon a directory where you want to place it, and run the "venv" module as a script with the directory path: python3 -m venv tutorial-env This will create the "tutorial-env" directory if it doesn’t exist, and also create directories inside it containing a copy of the Python interpreter, the standard library, and various supporting files. A common directory location for a virtual environment is ".venv". This name keeps the directory typically hidden in your shell and thus out of the way while giving it a name that explains why the directory exists. It also prevents clashing with ".env" environment variable definition files that some tooling supports. Once you’ve created a virtual environment, you may activate it. On Windows, run: tutorial-env\Scripts\activate.bat On Unix or MacOS, run: source tutorial-env/bin/activate (This script is written for the bash shell. If you use the **csh** or **fish** shells, there are alternate "activate.csh" and "activate.fish" scripts you should use instead.) Activating the virtual environment will change your shell’s prompt to show what virtual environment you’re using, and modify the environment so that running "python" will get you that particular version and installation of Python. For example: $ source ~/envs/tutorial-env/bin/activate (tutorial-env) $ python Python 3.5.1 (default, May 6 2016, 10:59:36) ... >>> import sys >>> sys.path ['', '/usr/local/lib/python35.zip', ..., '~/envs/tutorial-env/lib/python3.5/site-packages'] >>> 12.3. Managing Packages with pip ================================ You can install, upgrade, and remove packages using a program called **pip**. By default "pip" will install packages from the Python Package Index, . You can browse the Python Package Index by going to it in your web browser. "pip" has a number of subcommands: “install”, “uninstall”, “freeze”, etc. (Consult the Installing Python Modules guide for complete documentation for "pip".) You can install the latest version of a package by specifying a package’s name: (tutorial-env) $ python -m pip install novas Collecting novas Downloading novas-3.1.1.3.tar.gz (136kB) Installing collected packages: novas Running setup.py install for novas Successfully installed novas-3.1.1.3 You can also install a specific version of a package by giving the package name followed by "==" and the version number: (tutorial-env) $ python -m pip install requests==2.6.0 Collecting requests==2.6.0 Using cached requests-2.6.0-py2.py3-none-any.whl Installing collected packages: requests Successfully installed requests-2.6.0 If you re-run this command, "pip" will notice that the requested version is already installed and do nothing. You can supply a different version number to get that version, or you can run "pip install --upgrade" to upgrade the package to the latest version: (tutorial-env) $ python -m pip install --upgrade requests Collecting requests Installing collected packages: requests Found existing installation: requests 2.6.0 Uninstalling requests-2.6.0: Successfully uninstalled requests-2.6.0 Successfully installed requests-2.7.0 "pip uninstall" followed by one or more package names will remove the packages from the virtual environment. "pip show" will display information about a particular package: (tutorial-env) $ pip show requests --- Metadata-Version: 2.0 Name: requests Version: 2.7.0 Summary: Python HTTP for Humans. Home-page: http://python-requests.org Author: Kenneth Reitz Author-email: me@kennethreitz.com License: Apache 2.0 Location: /Users/akuchling/envs/tutorial-env/lib/python3.4/site-packages Requires: "pip list" will display all of the packages installed in the virtual environment: (tutorial-env) $ pip list novas (3.1.1.3) numpy (1.9.2) pip (7.0.3) requests (2.7.0) setuptools (16.0) "pip freeze" will produce a similar list of the installed packages, but the output uses the format that "pip install" expects. A common convention is to put this list in a "requirements.txt" file: (tutorial-env) $ pip freeze > requirements.txt (tutorial-env) $ cat requirements.txt novas==3.1.1.3 numpy==1.9.2 requests==2.7.0 The "requirements.txt" can then be committed to version control and shipped as part of an application. Users can then install all the necessary packages with "install -r": (tutorial-env) $ python -m pip install -r requirements.txt Collecting novas==3.1.1.3 (from -r requirements.txt (line 1)) ... Collecting numpy==1.9.2 (from -r requirements.txt (line 2)) ... Collecting requests==2.7.0 (from -r requirements.txt (line 3)) ... Installing collected packages: novas, numpy, requests Running setup.py install for novas Successfully installed novas-3.1.1.3 numpy-1.9.2 requests-2.7.0 "pip" has many more options. Consult the Installing Python Modules guide for complete documentation for "pip". When you’ve written a package and want to make it available on the Python Package Index, consult the Distributing Python Modules guide. 13. What Now? ************* Reading this tutorial has probably reinforced your interest in using Python — you should be eager to apply Python to solving your real- world problems. Where should you go to learn more? This tutorial is part of Python’s documentation set. Some other documents in the set are: * The Python Standard Library: You should browse through this manual, which gives complete (though terse) reference material about types, functions, and the modules in the standard library. The standard Python distribution includes a *lot* of additional code. There are modules to read Unix mailboxes, retrieve documents via HTTP, generate random numbers, parse command- line options, write CGI programs, compress data, and many other tasks. Skimming through the Library Reference will give you an idea of what’s available. * Installing Python Modules explains how to install additional modules written by other Python users. * The Python Language Reference: A detailed explanation of Python’s syntax and semantics. It’s heavy reading, but is useful as a complete guide to the language itself. More Python resources: * https://www.python.org: The major Python Web site. It contains code, documentation, and pointers to Python-related pages around the Web. This Web site is mirrored in various places around the world, such as Europe, Japan, and Australia; a mirror may be faster than the main site, depending on your geographical location. * https://docs.python.org: Fast access to Python’s documentation. * https://pypi.org: The Python Package Index, previously also nicknamed the Cheese Shop [1], is an index of user-created Python modules that are available for download. Once you begin releasing code, you can register it here so that others can find it. * https://code.activestate.com/recipes/langs/python/: The Python Cookbook is a sizable collection of code examples, larger modules, and useful scripts. Particularly notable contributions are collected in a book also titled Python Cookbook (O’Reilly & Associates, ISBN 0-596-00797-3.) * http://www.pyvideo.org collects links to Python-related videos from conferences and user-group meetings. * https://scipy.org: The Scientific Python project includes modules for fast array computations and manipulations plus a host of packages for such things as linear algebra, Fourier transforms, non- linear solvers, random number distributions, statistical analysis and the like. For Python-related questions and problem reports, you can post to the newsgroup *comp.lang.python*, or send them to the mailing list at python-list@python.org. The newsgroup and mailing list are gatewayed, so messages posted to one will automatically be forwarded to the other. There are hundreds of postings a day, asking (and answering) questions, suggesting new features, and announcing new modules. Mailing list archives are available at https://mail.python.org/pipermail/. Before posting, be sure to check the list of Frequently Asked Questions (also called the FAQ). The FAQ answers many of the questions that come up again and again, and may already contain the solution for your problem. -[ Footnotes ]- [1] “Cheese Shop” is a Monty Python’s sketch: a customer enters a cheese shop, but whatever cheese he asks for, the clerk says it’s missing. 1. Command line and environment ******************************* The CPython interpreter scans the command line and the environment for various settings. **CPython implementation detail:** Other implementations’ command line schemes may differ. See Alternate Implementations for further resources. 1.1. Command line ================= When invoking Python, you may specify any of these options: python [-bBdEhiIOqsSuvVWx?] [-c command | -m module-name | script | - ] [args] The most common use case is, of course, a simple invocation of a script: python myscript.py 1.1.1. Interface options ------------------------ The interpreter interface resembles that of the UNIX shell, but provides some additional methods of invocation: * When called with standard input connected to a tty device, it prompts for commands and executes them until an EOF (an end-of-file character, you can produce that with "Ctrl-D" on UNIX or "Ctrl-Z, Enter" on Windows) is read. * When called with a file name argument or with a file as standard input, it reads and executes a script from that file. * When called with a directory name argument, it reads and executes an appropriately named script from that directory. * When called with "-c command", it executes the Python statement(s) given as *command*. Here *command* may contain multiple statements separated by newlines. Leading whitespace is significant in Python statements! * When called with "-m module-name", the given module is located on the Python module path and executed as a script. In non-interactive mode, the entire input is parsed before it is executed. An interface option terminates the list of options consumed by the interpreter, all consecutive arguments will end up in "sys.argv" – note that the first element, subscript zero ("sys.argv[0]"), is a string reflecting the program’s source. -c Execute the Python code in *command*. *command* can be one or more statements separated by newlines, with significant leading whitespace as in normal module code. If this option is given, the first element of "sys.argv" will be ""-c"" and the current directory will be added to the start of "sys.path" (allowing modules in that directory to be imported as top level modules). Raises an auditing event "cpython.run_command" with argument "command". -m Search "sys.path" for the named module and execute its contents as the "__main__" module. Since the argument is a *module* name, you must not give a file extension (".py"). The module name should be a valid absolute Python module name, but the implementation may not always enforce this (e.g. it may allow you to use a name that includes a hyphen). Package names (including namespace packages) are also permitted. When a package name is supplied instead of a normal module, the interpreter will execute ".__main__" as the main module. This behaviour is deliberately similar to the handling of directories and zipfiles that are passed to the interpreter as the script argument. Note: This option cannot be used with built-in modules and extension modules written in C, since they do not have Python module files. However, it can still be used for precompiled modules, even if the original source file is not available. If this option is given, the first element of "sys.argv" will be the full path to the module file (while the module file is being located, the first element will be set to ""-m""). As with the "-c" option, the current directory will be added to the start of "sys.path". "-I" option can be used to run the script in isolated mode where "sys.path" contains neither the current directory nor the user’s site-packages directory. All "PYTHON*" environment variables are ignored, too. Many standard library modules contain code that is invoked on their execution as a script. An example is the "timeit" module: python -m timeit -s 'setup here' 'benchmarked code here' python -m timeit -h # for details Raises an auditing event "cpython.run_module" with argument "module-name". See also: "runpy.run_module()" Equivalent functionality directly available to Python code **PEP 338** – Executing modules as scripts Changed in version 3.1: Supply the package name to run a "__main__" submodule. Changed in version 3.4: namespace packages are also supported - Read commands from standard input ("sys.stdin"). If standard input is a terminal, "-i" is implied. If this option is given, the first element of "sys.argv" will be ""-"" and the current directory will be added to the start of "sys.path". Raises an auditing event "cpython.run_stdin" with no arguments.